Thursday, May 3, 2012

The Essence of Dynamic Typing

Discussions of dynamic vs. static typing have probably existed since Bertrand Russell tried to formalize set theory and somebody questioned his preferences. One would presume that, like other discussions in math and science, it would be mostly resolved by now under the sheer pressure of evidence and well-thought-out arguments from both sides. Sadly, like most debates in programming, this one is filled with misconceptions and both sides are famous for talking past each other. I'll try to find the essence of dynamic typing, the one feature that is supposed to add expressiveness to dynamic typing (not dynamic languages in general: this won't cover Lisp-like macros, eval, Smalltalk's become:, reflection, code loading, etc.) and see what it fundamentally means.

Dynamic typing offers us the possibility of writing programs that are partially defined and may fail on certain inputs. Let's see an example (using JavaScript syntax):

    function example(b, x) {
        if (b) {
            return length(x);
        } else {
            return x * 2;
        }
    }


The function above is well defined if b is true and x can be passed to length(), or if b is false and x can be multiplied by 2. The programs example(true, "abcdef") and example(false, 3) both give the same result (6) and should not fail at runtime, while example(false, "abcdef"), example(true, 3), example(3, false), etc., have undefined behavior. The essence of dynamic typing is to move this undefined behavior to run time, instead of trying to disallow it at compile time, allowing more programs to be expressed.


It's also possible to express, and try to run, an infinite number of programs that only evaluate to undefined behavior, with no hope of useful work being computed. Most, if not all, in the static typing camp find this downside greater than the added expressiveness.

Going back to our example, I think it's safe to say that there's no value in programs exhibiting only undefined behavior (i.e. all paths lead to crashes), and there's value in finding out as early as possible whether a program goes in that direction. OTOH, dynamically typed programs usually have many valuable paths even if undefined paths exist. Actually, the whole dynamic typing argument can be boiled down to:
The value of executable paths is greater than the value of unexecutable paths. Undefined behavior in unexecutable paths cannot be observed.
From this point of view, trying to avoid the execution of paths that a program doesn't execute is a wasteful tautology: "avoid doing what you won't do" is meaningless advice.

As an aside, it's interesting to notice that such a language can have an erasure model of compilation and doesn't require runtime tags. Of course it will be unsafe if the unexecutable paths are suddenly executed, but this is another issue.
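
To make the erasure point a bit more concrete (jumping ahead to the Haskell used for the encodings below), here is a rough sketch of what fully erased code could look like, using GHC's unsafeCoerce purely as an illustration; the function name and type here are my own, not part of the original example:

    import Unsafe.Coerce (unsafeCoerce)

    -- An "erased" version of the example: no tags are checked anywhere.
    -- It is only safe as long as callers stay on the defined paths; if an
    -- unexecutable path is suddenly executed the behavior is genuinely
    -- unchecked, which is exactly the trade-off described above.
    exampleErased :: Bool -> a -> Int
    exampleErased True  x = length (unsafeCoerce x :: [Int])
    exampleErased False x = (unsafeCoerce x :: Int) * 2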

The example program can also be written in OO style, but the idea is the same.

Fundamentally, dynamic typing relies on undefined behavior for some code paths, which, surprisingly, exists in almost all statically typed languages. In Haskell, for example, it's easy to add undefined behavior by using the aptly named undefined value. What dynamically typed languages offer is a single static type (i.e. any or object or stuff) for all expressions, so you can write down any kind of expression and they all type check. One could do the same in Haskell using a modified Prelude and not using data declarations, essentially having something like this:


    data Any = B Bool | N Int | C Any Any | Nil | F (Any -> Any)
    -- we support booleans, numbers, cons cells, nil and functions


    length (Nil      ) = N 0
    length (C car cdr) = case (length cdr) of
                             N n -> N (1 + n)
    -- let it crash on other paths


    example (B True)  x     = length x
    example (B False) (N n) = N (n * 2)
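
As a quick sanity check of the encoding, here are a few evaluations one would expect, mirroring the JavaScript calls above (a hypothetical GHCi session; a cons list stands in for the string, since this Any has no string constructor):

    -- example (B True)  (C (N 1) (C (N 2) Nil))   ==>  N 2
    -- example (B False) (N 3)                     ==>  N 6
    -- example (B True)  (N 3)                     ==>  crashes: pattern match failure in length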


This has the same behavior and static guarantees. For this small example static typing won't get in the way of writing the same programs we would write in a dynamically typed language. Of course this breaks down when we want to declare our own data types, because we need to extend Any to support structures. We can do it using other features of Haskell or extend the language, but this exercise is pointless. Any Turing-complete language can encode another Turing-complete language, and any static type system can express a dynamic type system, because a dynamic type system is statically unityped.
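
For concreteness, one possible shape of such an extension (just a sketch; the S constructor and the field-list representation are an arbitrary choice, and as I said the exercise quickly becomes pointless):

    data Any = B Bool | N Int | C Any Any | Nil | F (Any -> Any)
             | S String [(String, Any)]
    -- a structure is a constructor name plus named fields, all of type Any

    getField name (S _ fields) = case lookup name fields of
                                     Just v -> v
    -- let it crash on missing fields and non-structures, as before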


If Haskell had no undefined value (or any equivalent) we could still encode the undefined behavior like this, using U as the undefined value:

    data Any = B Bool | N Int | C Any Any | Nil | F (Any -> Any) | U


    length (Nil      ) = N 0
    length (C car cdr) = case (length cdr) of
                             N n -> N (1 + n)
                             U   -> U
                         -- we propagate the undefined case
    length _           = U
    -- all other cases return the undefined primitive

    example (B True)  x     = length x
    example (B False) (N n) = N (n * 2)
    example _         _     = U


This is less convenient, but it is a syntax issue not one of semantics.

Dynamic or static is a language style that is very hard to reconcile: even if statically typed languages can encode dynamically typed programs, you end up with two sub-languages that don't talk to each other. An encoded dynamically typed program can't use a function defined on the statically typed side, because the statically typed function has many properties expressed via the type system while the dynamically typed program has no static invariant whatsoever. Even if a statically typed language offers a decent encoding of dynamic typing, it would be no better than using a dynamically typed language directly. OTOH a dynamically typed language (without a before-runtime execution phase) can't offer any of the static typing guarantees, and such an encoding is impossible by design. This doesn't mean static typing is a fundamentally better approach; it depends on how much those additional guarantees are valued over the cost of having to prove your program to the static type checker.

Fundamentally, dynamically typed languages rely on the existence of undefined behavior, whether via failed tag matches, doesNotUnderstand: messages or plain crashes. This same feature exists in most statically typed languages, so it's possible to encode dynamic typing in them, albeit with less syntactic sugar, but it adds no value to them. Most dynamic languages reify this undefined behavior and allow it to be inspected at runtime, so a program can work around an undefined behavior, which adds a meta-programming feature to the language (e.g. method_missing in Ruby); but this is orthogonal to the basic idea of allowing programs with undefined paths to be expressed because of the value found in the defined/executable paths.
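
In the U encoding above this reification is essentially free: once the undefined behavior is an ordinary value, a program can pattern match on it and recover, which is roughly what a method_missing-style hook buys you. A minimal sketch, reusing the U-extended Any from the previous section (withDefault is my own name for the helper):

    withDefault fallback U = fallback
    withDefault _        v = v
    -- e.g. withDefault (N 0) (example (N 3) (B False)) evaluates to N 0
    -- instead of crashing, much like rescuing a failed dispatch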

Theoretically it's possible to make all behavior defined; you just need to consider all cases in the primitives. For example, if statements/expressions require a boolean, but many dynamic languages have the concept of truthiness, so one can use anything in the test part of the if and it'll never fail at runtime with some sort of undefined behavior error. You can extend this to all primitives (e.g. arithmetic operators return their unit when presented with invalid values, attribute access returns Nil for undefined attributes) and end up with a program that won't crash but may do very unexpected things. Such a language is equivalent to using On Error Resume Next in classic VB.
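
A sketch of what a total truthiness primitive could look like in the Any encoding; which values count as false here (N 0 and Nil) is my own arbitrary choice, and real languages disagree on exactly this point:

    truthy (B b) = b
    truthy (N n) = n /= 0
    truthy Nil   = False
    truthy _     = True
    -- every Any is a valid condition, so this primitive is total

    ifAny cond t e = if truthy cond then t else e
    -- an if over Any that can never fail with undefined behavior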

There's research on gradual typing, which tries to provide a single type system with static and dynamic parts and minimal overhead. The results from this research are very encouraging, as it would allow a program to have static guarantees over some areas and not over others, which can be very useful for some forms of programming.
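
The core mechanism in most gradual typing proposals is a cast at the boundary between the typed and untyped parts: a tag is checked once when a value crosses over, and the typed side keeps its guarantees from there on. A minimal sketch of such boundary casts, reusing the Any type above (the names are illustrative, not taken from any particular system):

    -- untyped to typed: check the tag once at the boundary
    castToInt (N n) = n
    castToInt _     = error "boundary cast failed: expected a number"

    -- typed to untyped: always succeeds, every Int has an Any representation
    injectInt n = N n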

7 comments:

  1. The point that dynamic languages can be seen as static languages with only one type has been made by Robert Harper:

    http://existentialtype.wordpress.com/2011/03/19/dynamic-languages-are-static-languages/

    For an example of a rather successful mix of typed and untyped languages, see Typed Racket. It has a type system that is expressive and complex enough to typecheck idiomatic Lisp programs (which are harder to type-check than idiomatic ML programs, because they replace type-informing pattern matching with boolean conditions).

    http://docs.racket-lang.org/ts-guide/

    For serious attempts at integrating dynamic-minded capabilities in a static language, see Alice ML (inspired by Oz), and more recently Scala (in particular reflection).

    http://www.ps.uni-saarland.de/alice/
    http://www.mozart-oz.org/

  2. Slight issue with the use of "undefined" for what most languages do when bad things happen: they throw an exception (or a condition or whatever). It's possible to put formal semantics on exceptions and it has been done. More pragmatically, as a programmer I can count on the behavior of exceptions when writing my program.

    When I read "undefined" I expect unsafe behavior that absolutely cannot be reasoned about, e.g. writing to a null pointer in C or having data races in shared mutable memory.

  3. @James undefined definitely is trapped by most languages in the form of exceptions, but I don't think it's essential to this issue. Perhaps I should've used a different word, but nothing comes to mind. Otherwise agree 100%.

  4. @gasche That point is older than Bob Harper's post; I first saw it in LtU discussions, made by Frank Atanassow. I didn't link to Harper's post because I find it a bit too pro static typing, which runs contrary to the goal of understanding the actual essence of dynamic typing.

  5. If we think of a "type" as the set of values which can be represented at a point in an expression, and if we back off from classifying languages as statically or dynamically typed, the task of integrating statically typed code and dynamically typed code becomes more tractable.

    And, as you point out, "statically typed languages" do need to deal with the same kinds of issues that we need to deal with in "dynamically typed languages". Quite often the set of values which can appear at the point where some function is referenced is larger than the domain of that function.

    Put differently, an "assertion" or an "error" can be treated as a "run-time error" or it can be treated as a "type declaration". If it is treated as a "type declaration", its constraints might propagate backwards in the code, or even show up as a compile-time error.

    The problem, here, is one of simplicity. Statically typed languages punt on some typing issues because those issues were deemed, by the language designer(s), to be too complex for static analysis to be a productive approach. In complex cases, full static analysis can approach the difficulty of the halting problem (or, in cases which are not that bad, can approach the difficulty of the knapsack problem).

    Anyways, in my experience, using static typing in an otherwise dynamic language can be a great technique for eliminating overhead which would otherwise be necessary for proper treatment of irrelevant cases.

  6. > From this point of view, trying to avoid the execution of paths that a program doesn't execute is a wasteful tautology: "avoid doing what you won't do" is meaningless advice.

    I keep hearing arguments like this. But out in the real world, the value of static typing, as experienced by a programmer, stems more from the help in avoiding mistakes that would otherwise cost a lot of time debugging at runtime (like typos, passing in wrong objects, missing implementations, etc.) and from productivity features like better IntelliSense and refactoring tools made possible by the type system.

  7. "program to have static guarantees over some areas and not over others"

    - Shen (http://shenlanguage.org/) has this ability. For a concrete example see https://groups.google.com/d/msg/qilang/PPT-j2gptIM/wbDHxmD7AyEJ
