Sunday, February 26, 2012

Using laziness to improve the performance of Haskell programs

To improve the performance of a program the only solution is to reduce the amount of work it does. In Haskell we have three options:

  1. Replace an algorithm and/or data structure by one with performance characteristics more suited to our problem. It'll work less because the algorithms work less.
  2. Add strictness to the program so it don't unnecessarily delay computations. It'll work less by not working on thunk creation and such.
  3. Add laziness to the program so it don't do unnecessary computations. It'll work less by not doing some work that will never be used.
Option 1 is usually the only one available in strict programming languages. Option 2 is the usual candidate when Haskell people are talking about improving performance. Option 3 is known, but as it seems counter intuitive when we're trying to optimize for performance we tend to forget about it. It's a great tool to use in some situations and it's always good to think about it.

Suppose we have, inside our program, a function that checks some property of a data structure and depending on the result of this property we branch the flow. Let's say we use the following property:
prop xs = length xs > 2
If the list has at least three elements the property is True, otherwise False. The problem here is we fold the entire list to find its length. Thanks to laziness we are only going through the spine of the list not the actual elements (which may do expensive computations on their own) but it's a waste to do all this work to check if the list has at least three elements. The following program does the same computation and works much less:
prop (_:(_:(_:_))) = True
prop _             = False
The pattern only traverses at most three conses of the list. Testing both versions in GHCi demonstrate the performance improvement. It seems a fact of life that these transformations are necessary, but actually this is due to some unnecessary strictness outside our program.

We only need to write down this new version because length returns an Int which is a strict, unboxed, value in GHC. So to check if length xs > 2 we need to compare two strict values which need to be fully evaluated, so length must be fully evaluated.

Now that we found the culprit, what can we do? As usual we are saved by theory and it's paladin. Let's digress a bit and find a lazier number representation.

The length of a list is natural number, not an integer. There's no list with negative length, it makes no sense. If we use the Peano axioms to express natural numbers we end up with the following datatype:
data N = Z   -- A natural N is either zero (Z)
       | S N -- or the sucessor of a natural (S N)
Naturals have a total order:
instance Ord N where
    Z   <= _   = True
    _   <= Z   = False
    S x <= S y = x <= y
This formulation of the naturals is as lazy as we need. The comparison will only go as far as it needs, until we reach a zero on either side.

Now we restate the property, but using genericLength instead of length and using our natural number formulation (assume the usual Eq and Num instances for N):
prop xs = genericLength xs > (2 :: N)
Now the strictness problem disappears. If we think about this a bit we'll realize our earlier performance improvement is isomorphic to this, because the spine of a list is isomorphic to the natural numbers.

Let that sink in for  a while.

If we write prop using naturals, but in the second form we have this:
prop xs = case (genericLength xs :: N) of
    (S (S (S _))) = True
    _             = False
The only difference in the patterns is we're using S x instead of _:x. Remembering that _:x means we care only about the shape of the list not its contents it becomes clear that empty lists are equivalent to zero and a cons is equivalent to a successor.

Our optimization was just stating in long form the isomorphism between the spine of a list and natural numbers.