When static typing is worth it




Some counters here:

Pascal Costanza wrote:
However, it
seems to be the case that once you have an extensive test suite for your
program, the correctness of your types is silently tested as well - a
program would simply fail at least some test cases if the types weren't
"correct." So why test the types twice, and especially, why putting up
with the involved overhead twice?

Two points here:

.. Your argument assumes that static and dynamic solutions require the same
amount of testing. They do not. Static typing typically reduces the need
for unit testing by 10-100x, to the point where automated unit tests and
debuggers are no longer a crippling concern as they are in dynamically
typed languages like Lisp.

.. Compiling is typically much faster than testing and testing sufficiently
can be prohibitively slow. Consequently, non-trivial applications can be
developed more quickly in modern statically typed languages than
dynamically typed ones.

[Yes, there is the objection that test cases only provide evidence for
single data points, whereas static type checks give guarantees for all
possible cases, but that's a theoretical claim. I am talking about
actual experience here. There is also the experience from people using
advanced static type systems who claim that once they get the types
right, their programs actually _are_ correct. The reason here may simply
be that different programmers prefer different programming approaches
for subjective reasons. All such claims about when programs and their
types are actually observably correct, both from the dynamic and static
camps, are anecdotal

For example:

http://members.verizon.net/~olsongt/c3a/

and cannot be proved right or wrong.]

On the contrary, the literature is full of examples demonstrating static
type systems proving correctness, e.g. one paper describes how static
typing can be used to prove that balanced binary tree implementations
retain balancing properties.

In my case, I am often able to prove that run-time errors have been removed
from a block of code by progressively rephrasing the code to leverage the
static type system and remove run-time checks.

As a trivial example, consider a function that requires a sequence that must
contain at least one element:

# let rec reduce f = function
| [h] -> h
| h::t -> f h (reduce f t)
| _ -> raise Not_found;;
val reduce : ('a -> 'a -> 'a) -> 'a list -> 'a = <fun>
# reduce (@) [[1]; [2]; [3]];;
- : int list = [1; 2; 3]
# reduce (@) [];;
Exception: Not_found.

The last clause in the pattern match:

| _ -> raise Not_found

incurs a run-time error (raising the Not_found exception) if the list is
empty. This exception should never occur but it will if the calling
function has a bug and calls this function with the empty list.

In a dynamic language like Lisp, you must therefore unit test all callers of
this function and try to find any that accidentally call it with the empty
list. This typically requires you to test many of their callers and so on.
This is like searching for a needle in a haystack. Not only are you
completely unable to eliminate the bugs, you cannot even quantify your
confidence in the code.

You might sit down with a pencil and try to prove your code correct.

Better yet, you might get the computer to check the proof directly from the
code.

Ideally, the computer would not only infer the proof from your code but it
would also provide machine-verified documentation synchronized with the
proof such that the proving and documenting are performed automatically
every incremental compilation and evolve as the code is developed. Modern
statically-typed functional programming languages let you do exactly this
easily.

In SML, OCaml, Haskell, F# or Scala, you can rewrite this function such that
the static type system proves that the sequence given to this function
contains at least one element:

# let rec reduce f (h, t) = match t with
| [] -> h
| h2::t -> f h (reduce f (h2, t))
| _ -> raise Not_found;;
Warning U: this match case is unused.
val reduce : ('a -> 'a -> 'a) -> 'a * 'a list -> 'a = <fun>

The OCaml compiler even tells you that the pattern match clause that raises
the run-time error is now unused, proving that this run-time error can
never occur. So we can simply remove it and the OCaml compiler even proves
that this can be done safely (if it were not safe, it would give
an "inexhaustive" warning):

# let rec reduce f (h, t) = match t with
| [] -> h
| h2::t -> f h (reduce f (h2, t));;
val reduce : ('a -> 'a -> 'a) -> 'a * 'a list -> 'a = <fun>
# reduce (@) ([1], [2]::[3]::[]);;
- : int list = [1; 2; 3]

Note that the "raise Not_found" has disappeared. We have completely
eliminated this source of run-time errors by getting the static type system
to prove that this run-time error can never occur. All code that uses this
is now guaranteed to be correct in this sense. The type inferred by the
compiler constitutes the machine-verified documentation.

With a powerful static type system (e.g. SML, OCaml, F# or Haskell), you can
eliminate a lot of run-time errors by getting the compiler to prove that
they cannot occur. This is precisely why the modern languages are vastly
superior and represent the evolution of Lisp.

However, there is yet another level to this debate: It can be beneficial
to be able to evolve a program at runtime, in the sense that you cannot
only change the values that variables are bound to, but also function
and method definitions, class definitions and other entities that make
up your program. If you perform such changes programmatically, you
basically have self-modifying programs.

From the static point of view, this is just run-time dispatch and
consequently, is solved using pattern matching in all modern FPLs.

Although the notion of a
self-modifying program has a bad reputation, there are actually
programming techniques that make such self-modifying programs feasible
and reasonably well to deal with. Lisp is an especially good basis for
such techniques (more so than Scheme or Smalltalk, which are
nevertheless also pretty good at that).

Few modern FPLs provide run-time compilation (MetaOCaml and F# are two) but
it is rarely useful compared to, for example, pattern matching.

Evolving programs at runtime helps you, at least, with developing and
testing large systems, when you don't want to shut down and restart a
system just because you changed a few definitions here and there. It can
also help you with deployment, for example when running server
applications where you don't want to shut them down for software updates
in order to decrease downtimes. There are even cooler applications for
that.

When you first hear about it, self-modifying code certainly does sound like
a very cool idea with cool applications. However, you only need to toy with
it to discover how it can completely undermine the reliability of any
significant software system. Consequently, prolific use of self-modifying
code is widely known to be fraught with peril and any good software
developer should know that it is bad practice except in the rarest of
circumstances.

The static solution is to provide flexibility by explicitly parameterizing
code over data. This clarifies all dependencies and prevents the rats-nest
characteristic that makes the dynamic approach unmaintainable. You are free
to change the data at run-time but you may also add an arbitrary amount of
static checking to your code. This provides the same spectrum of
flexibility but with a different trade-off in terms of ease of use. Some
systems (e.g. F#) provide the best of both worlds by combining run-time
code generation and collection with both static and dynamic typing. Lisp
provides an entirely dynamic approach with only the most rudimentary forms
of static checking provided inconsistently across compilers.

Common Lisp is especially remarkable because it provides this kind of
runtime flexibility while staying reasonably efficient in terms of
execution performance - much better than, say, Python, Ruby, and such
dynamic languages which are typically chosen in such scenarios. Common
Lisp also provides enough rope to tweak programs to such an extent that
you can squeeze out a lot of efficiency, to the degree that you can
write programs that are on par with static languages, like C, C++, etc.

The ray tracer language comparison showed that Lisp is harder to optimize
and still slower that many static languages:

http://www.ffconsultancy.com/languages/ray_tracer/results.html

The Lisp community were far more disgruntled with Lisp's appalling initial
performance results than any other community so they put much more effort
into optimizing the Lisp implementations. I received only a handful of
tweaks for the implementations in other languages but dozens of Lisp
implementations, almost all of which were substantially slower and longer
than Juho's.

Arguably the most vocal Lisp extremist of the time was Dr Thomas
Fischbacher, an expert in high performance computing from the Max Planck
Institute in Germany who was vehemently opposed to my language comparison
for many months, disgusted by the results, and he tried desperately to
improve Lisp's abysmal results:

http://groups.google.co.uk/group/comp.lang.functional/msg/e48420d41e74f38e

His best implementation of the ray tracer was not only slower than Juho's
but was seven times longer than faster OCaml versions. Perhaps this is why
Thomas is now an OCaml programmer:

http://groups.google.co.uk/groups/search?q=%2B%22Thomas+Fischbacher%22&start=0&scoring=d&hl=en&ie=UTF-8&oe=UTF-8&safe=off&;

In the intervening years, I have been lucky enough to participate in many
programming language performance comparisons and Lisp has fared terribly in
all but one (the Minim compiler, where Lisp's run-time compilation is an
advantage) where is was average. For example, the symbolic simplifier
benchmark by Mark Tarver demonstrates how the optimizing pattern match
compilers found in all modern functional programming languages place these
static languages a league ahead of Lisp in terms of performance, clarity
and development time:

http://www.lambdassociates.org/studies/study10.htm

As this result applies to a wide variety of real programming challenges,
perhaps Mark will also ditch Lisp and become an OCaml programmer. All of
the intelligent Lispers seem to be.

You typically have to make trade offs here, when you do that: Whenever
you improve the performance of a Common Lisp programs, you typically
lose some of the runtime flexibility. But Common Lisp is nevertheless
exceptional in that regard because it actually enables you to make these
choices yourself, and doesn't require you to buy into whatever model the
designers of other languages happened to prefer.

On the contrary, Lisp forces you to buy into the designer's dynamic style.
You cannot automate the static checking of even the most trivial of
constraints in Lisp, let alone prove valuable assurances about critical
parts of your program as you can do so easily with modern statically-typed
languages.

Obtaining dynamic behaviour in a static language is trivial (just box and
parameterize) but obtaining static behaviour in a dynamic language is a
practical impossibility.

This all is related to the discussion about static vs. dynamic types
because, to the best of my knowledge, it is not possible to combine this
kind of runtime flexibility with static type checking. As soon as a
program is able to modify itself, static types simply loose. And for
some (including me), runtime flexibility is far more important than the
illusion of safety static type systems give you.

Both in industry and in academia, the benefits of static typing are widely
appreciated.

If you want your code to work, choose static typing. If you're getting paid
in cash for lashing together porn sites, choose dynamic typing.

For thorough demonstrations of the superiority of modern statically-typed
functional programming languages, subscribe to our journals:

http://www.ffconsultancy.com/products/ocaml_journal/?cll
http://www.ffconsultancy.com/products/fsharp_journal/?cll

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com/products/?u
.



Relevant Pages

  • Re: What is Lisp used for?
    ... My main languages are Perl, Java, and ColdFusion. ... Lisp PHP FlamingThunder). ...
    (comp.lang.lisp)
  • Re: SBCL is now faster than Java, as fast as Ocaml, and getting better
    ... what MLish languages do in this case. ... If you are trying to compare my OCaml code with your Lisp translation then ... Lispers not only fail to understand the benefits of pattern matching ... but it is a Java extension just much as Clojure ...
    (comp.lang.lisp)
  • Re: LISPPA
    ... > memory resources. ... do in Common Lisp with code that runs about as fast. ... >> comparing Lisp with languages like C, Pascal and Basic, ... If by "visual programming" you mean the sort of ...
    (comp.lang.lisp)
  • Re: How Common Lisp sucks
    ... languages than there are jobs that require them as primary programming ... blog entries on Lisp. ... Lisp that sucks, ...
    (comp.lang.lisp)
  • Re: Python from Wise Guys Viewpoint
    ... > prove correctness, you prove that the code fulfils some properties given ... A static type system checks conformity to restrictions that may not be ... most languages perform index-out-of-bonds checks at ... C is as much a straw man wrt to untyped languages as Java is wrt ...
    (comp.lang.lisp)