Tuesday, December 20, 2011

Understanding higher-order code for free


Recently Rahul Goma Phulore asked for an intuition to understand the Haskell expression:
(∙ repeat) ∙ take
In Haskell, (f ∙) and (∙ g) are syntactic sugar for (λx → f ∙ x) and (λx → x ∙ g) respectively. These are called sections.
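To make the desugaring concrete, here is a small sketch (the function names are mine) that spells out both sections of composition, written with ASCII (.) so it compiles:

```haskell
-- Left and right sections of composition, written out explicitly.
-- (f .)  is sugar for  \g -> f . g   (f is fixed, we wait for g)
-- (. g)  is sugar for  \f -> f . g   (g is fixed, we wait for f)

leftSection :: (b -> c) -> ((a -> b) -> (a -> c))
leftSection f = (f .)

rightSection :: (a -> b) -> ((b -> c) -> (a -> c))
rightSection g = (. g)

main :: IO ()
main = do
  -- Both sections end up building the same function, show . (+ 1):
  print (leftSection show (+ 1) (41 :: Int))   -- "42"
  print (rightSection (+ 1) show (41 :: Int))  -- "42"
```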

Interestingly, sections and function composition are each pretty easy to grasp on their own, but when combined they become food for sphinxes' enigmas.

Runar Oli gave a couple of excellent answers to that particular question, but I thought this was a good example to walk through some beauties from logic and higher-order types. We could try to look at this code and build some sort of intuition based on spacesuits or burritos, but instead we can follow the golden road of logic and exploit a well-known fact about programs and types. The Curry-Howard isomorphism states that types are isomorphic to theorems and programs are isomorphic to proofs, so whenever we have a type and a program implementing that type, the program is a proof of whatever theorem the type represents. Usually a type is much more general than the particular program we're writing, so the type doesn't capture the entire semantics of our program. In a sense this is not a good situation, because our program depends on a stricter semantics than the types give, but I'll ignore this conundrum for now and instead focus on the types-as-theorems bit.

Once we start seeing types as theorems we can apply other results from logic to our problem. For example, if we know our theorem is related to another theorem (e.g. it implies something else) then we can exploit the results of the related theorem. In the excellent paper Theorems for Free! Wadler shows that parametricity gives us many useful theorems, because we can't produce polymorphic values ex nihilo. Suppose you have the following type:
f ∷ a → a
Ignoring the implementation (don't even try to think about it), what does this type say about f? What is this theorem? For starters, it's a valid, tautological implication that states "a implies a". It also states that there exists a program such that, if you give it something that is a proof of a theorem, it'll return something that is a proof of the same theorem, for all possible theorems imaginable. Let that thought sink in for a while: for all possible theorems imaginable.

There is only one proof that is valid for all theorems: a proof that exploits a loophole in the given logic system (if such loopholes exist) and proves anything. In Haskell this loophole is ⊥ (i.e. bottom), the value that inhabits all types, in other words the proof of all theorems. ⊥ can manifest in many guises, but essentially it's due to non-termination: all programs that fail to terminate are isomorphic to bottom and can be used to prove any theorem. Any Turing-complete language suffers from this problem. We can ignore ⊥ even in Haskell, because fast and loose reasoning is morally correct, so we'll pretend it doesn't exist for the rest of this post.

Ignoring ⊥, in all its forms, we can see that there's only one proof that's isomorphic to any given proof, namely the same given proof. In other words, the only program implementing a → a is the identity:
λx → x
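As a quick sanity check of that claim (nothing deep here, just the definition), the identity is trivially a proof of a → a at every type:

```haskell
-- The one total inhabitant of a -> a: with no way to inspect or
-- construct an `a`, all it can do is return its argument unchanged.
identity :: a -> a
identity x = x

main :: IO ()
main = do
  print (identity (42 :: Int))  -- 42
  print (identity "any proof")  -- "any proof"
```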
For more complex types we can have many possible proofs, which means the type is more lenient than our program; if we depend on the program's semantics we may find some bugs later, when we change the program and it still type checks but no longer does what we want. Let's take a look at the type of a function we're interested in:
repeat ∷ a → [a]
It states that, given any proof, it'll return a list of equivalent proofs. A list is either empty or an element followed by a list:
data List a = Empty | Cons a (List a)
We can encode algebraic data types as explicit sums of products. Ignoring the labels, the above is equivalent to:
data List a = () + (a × List a)
() is the unit of our algebra. If we see List as a logical term, then the equivalent of unit is ⊤ (i.e. true; in this formulation we can view ⊥ as false, which only means our proof proves nothing). We can also write × as conjunction ∧ and + as disjunction ∨:
data List a = ⊤ ∨ (a ∧ List a)
A list is either a trivial theorem or the conjunction of a theorem with a list of itself. A list is not a very interesting theorem, but the proofs represented by lists are more interesting. Going back to our example, repeat is a theorem that states "a theorem implies either ⊤ or itself", which is also not very interesting. Which kinds of proofs can satisfy this theorem (i.e. type check)? Below are some choices:
repeat x = []    -- ignores the given proof and returns the empty list
repeat x = [x]   -- returns a singleton list with the given proof
repeat x = [x,x] -- a list with the proof twice
repeat x = ...   -- and so on
Actually, as the list theorem is isomorphic to either ⊤ or a conjunction of any number of copies of a theorem, we can clearly see that it is isomorphic to the pair of a theorem and a natural number, so all possible proofs of repeat are lists containing any natural number of copies of a proof. This is all neat, but the most important thing here is a free theorem stating that repeat will return either [] or a list of some number of copies of the given element. It can't produce the elements of the list ex nihilo; they must all be equal to the given element.
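A sketch of this observation in code (the candidate names and the onlyCopiesOf checker are mine): every inhabitant of repeat's type can only hand back copies of its argument.

```haskell
-- A few of the possible proofs of a -> [a], plus the real repeat.
repeat0, repeat1, repeat2 :: a -> [a]
repeat0 _ = []      -- ignores the given proof
repeat1 x = [x]     -- one copy
repeat2 x = [x, x]  -- two copies

-- The free theorem: every element of the result equals the argument.
onlyCopiesOf :: Eq a => a -> [a] -> Bool
onlyCopiesOf x = all (== x)

main :: IO ()
main = do
  print (onlyCopiesOf 'a' (repeat0 'a'))          -- True
  print (onlyCopiesOf 'a' (repeat1 'a'))          -- True
  print (onlyCopiesOf 'a' (repeat2 'a'))          -- True
  print (onlyCopiesOf 'a' (take 5 (repeat 'a')))  -- True (real repeat, truncated)
```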

Going further, let's see the type of (∙ repeat):
(∙ repeat) ∷ ([a] → b) → (a → b)
It takes a function and an a, and returns a b. The function takes a [a] and returns a b. Now it's easier to see what it's doing. (∙ repeat) can't produce proofs of b ex nihilo; it must use the given function (remember we disallowed ⊥, so it can't just diverge). What are possible proofs that satisfy this theorem?
(∙ repeat) = λf → λx → f []
(∙ repeat) = λf → λx → f [x]
(∙ repeat) = λf → λx → f [x,x]
(∙ repeat) = ... -- and so on
The pattern is identical to the one we saw in the possible implementations of repeat, which is either surprising or unsurprising, depending on your taste. The main difference is that repeat just returned the value (i.e. [] or [x] or ...) while (∙ repeat) takes a function and applies it to the value.
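In ASCII, (∙ repeat) is just (. repeat); a small sketch (the name withRepeated is mine) of the "takes a function and applies it to the value" reading:

```haskell
-- (. repeat) :: ([a] -> b) -> a -> b
-- Given a consumer of lists, we get back a consumer of single elements:
-- the element is repeated forever and handed to the consumer.
withRepeated :: ([a] -> b) -> a -> b
withRepeated = (. repeat)

main :: IO ()
main = do
  print (withRepeated (take 3) 'x')  -- "xxx"
  print (withRepeated null 'x')      -- False: repeat never yields []
```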

Let's see the type of take:
take ∷ Int → [a] → [a]
This theorem has a few possible proofs:
take = λn → λxs → []
take = λn → λxs → xs
take = λn → λxs → case xs of {[] → []; (x:_) → [x]}
take = ...
None of these implementations do what we expect from take, but all of them type check (i.e. prove the theorem). Now we're seeing how flimsy our types are, but that's mostly because we expect too much from general theorems. Either way, we have the same free theorems here: the resulting list will either be [] or contain only elements from the original list. This free theorem comes from the parametricity of take; every parametric function has some related free theorem.
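We can spot-check that free theorem against those look-alikes and the real take (the names takeA, takeB, takeC and the fromInput checker are mine):

```haskell
-- Three proofs of Int -> [a] -> [a] that typecheck but aren't take.
takeA, takeB, takeC :: Int -> [a] -> [a]
takeA _ _  = []                                       -- always empty
takeB _ xs = xs                                       -- ignores the count
takeC _ xs = case xs of { [] -> []; (x : _) -> [x] }  -- at most one element

-- The free theorem: every output element is drawn from the input list.
fromInput :: Eq a => ([a] -> [a]) -> [a] -> Bool
fromInput f xs = all (`elem` xs) (f xs)

main :: IO ()
main =
  print (all (\f -> fromInput f "abc") [takeA 2, takeB 2, takeC 2, take 2])  -- True
```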

Now what is the type of (∙ take)?
(∙ take) ∷ (([a] → [a]) → b) → Int → b
It looks similar to the type of (∙ repeat), doesn't it? Actually we can make it even more similar; let's first contrast the types of repeat and the partially applied take:
repeat ∷  a  → [a]
take 1 ∷ [a] → [a]
Both have the same structure and their sections also look similar:
(∙ repeat) ∷ ([a] → b) → ( a  → b)
(∙ take 1) ∷ ([a] → b) → ([a] → b)
It seems that taking the right section of a function turns it inside out and "promotes" the types on both sides of the → to functions into b. You can see this as a CPS transform of the function, with the continuation as the first argument.
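Spelling out the CPS reading (the name cps is mine, and it is nothing more than a right section of composition):

```haskell
-- A right section (. h) takes the continuation k first:
-- (. h) k = k . h, i.e. "run h, then hand its result to k".
cps :: (a -> b) -> ((b -> c) -> (a -> c))
cps h = (. h)

main :: IO ()
main = do
  print (cps repeat (take 2) 'x')    -- "xx"
  print (cps (take 1) length "abc")  -- 1
```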

As with (∙ repeat), (∙ take 1) has more interesting free theorems, relating to how it can produce a b from [a]. In most higher-order compositions the number of free theorems increases, because we (usually) have more parametricity, and the universal quantification limits how we can compose those.

Dissecting the type of (∙ take) we see that it takes a higher-order function and an Int, and returns a b. Let's see some possible proofs for this theorem:
(∙ take) = λf → λn → f id
(∙ take) = λf → λn → f (const [])
(∙ take) = λf → λn → f (λxs → case xs of {[] → []; (x:_) → [x]})
(∙ take) = ...
As with the possible (∙ repeat) proofs, we are mainly repeating the possible proofs for take and making them higher-order. We can fake some things but not much: we can return [] (which is trivial, being isomorphic to ⊤), or we can pick some elements from the list, but we can't produce elements ex nihilo and we must call f (i.e. our continuation).

We can read the type of (∙ take) as "given a function that uses mappings from lists to lists to produce something, and an Int, it'll return that something". Whatever (∙ take) does is limited to looking at an Int and providing a function of type [a] → [a]. f can ignore the function provided by (∙ take) or use it, but it must provide a b.
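A small sketch of that reading (the consumer useOnList and its fixed list are mine): the consumer receives the partially applied take n and decides what to feed it, while (. take) itself never sees a list.

```haskell
-- (. take) :: (([a] -> [a]) -> b) -> Int -> b
withTake :: (([a] -> [a]) -> b) -> Int -> b
withTake = (. take)

-- A consumer that applies the provided prefix function to a fixed list.
useOnList :: ([Int] -> [Int]) -> [Int]
useOnList g = g [1, 2, 3, 4]

main :: IO ()
main = do
  print (withTake useOnList 2)          -- [1,2]
  print (withTake (const "ignored") 2)  -- the consumer may ignore the function
```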

Finally let's look at our original enigma. First we can use the right section of take to make everything prefix:
(∙ repeat) ∙ take = (∙ take) (∙ repeat)
Remembering the types of (∙ repeat) and (∙ take) we can use the basic application rule to find the resulting type (with some unification):
(∙ repeat) ∷ ( [a] →  b ) → (a → b)
(∙ take)   ∷ (([a] → [a]) →  c     ) → Int → c
As we apply (∙ take) to (∙ repeat) we unify b with [a] and c with (a → b), which is (a → [a]), and we end up with:
(∙ take) (∙ repeat) ∷ Int → (a → [a])
This final type has the same free theorems as take and repeat, namely it will either return [] or a list with some number of copies of the same element.
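Putting the whole thing together in ASCII, the original enigma is observably just replicate with the same argument order, which we can spot-check:

```haskell
-- (. repeat) . take :: Int -> a -> [a]
-- i.e. \n x -> take n (repeat x): n copies of x (for n >= 0).
enigma :: Int -> a -> [a]
enigma = (. repeat) . take

main :: IO ()
main = do
  print (enigma 3 'x')                     -- "xxx"
  print (enigma 3 'x' == replicate 3 'x')  -- True
  print (enigma 0 'x')                     -- ""
```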

Most of this exercise may seem pointless, but that's mostly due to the uninteresting list theorem and, consequently, the uninteresting take theorem. With some dependent type encoding the length of the list, we could have a much stronger (and less general) theorem for take, and even repeat would be a bit more interesting.

The relation of right sections to continuations is the more relevant point for understanding these types and how these combinators work. A left section is the reverse: the fixed function acts as the continuation of something else's result, instead of the section expecting a continuation:
(repeat ∙)       ∷ (b →  a        ) → (b → [a])
(uncurry take ∙) ∷ (b → (Int, [a])) → (b → [a])
Here I used uncurry to make the correspondence between the types clearer. The reasoning is similar and is left as an exercise for the reader.