Memoizing Polymorphic Functions with High School Algebra and Quantifiers
A little while back Conal Elliott asked about the memoization of polymorphic functions. I thought it'd be fun to describe how to memoize such functions in the same spirit as Ralph Hinze's use of tries to memoize non-polymorphic functions. Along the way I'll give a brief introduction to quantified types in Haskell and show the Yoneda lemma at work.
You can think of a generalized trie for a function type T as a type that's isomorphic to T but doesn't have an arrow '->' anywhere in its definition. It's something that contains all the same information as a function, but as a data structure rather than as a function. Hinze showed how to construct these by using the high school algebra axioms on non-polymorphic types. Polymorphic types are types involving quantification. So to make suitable tries for these we need to add some rules for handling quantifiers to high school algebra.
At first it seems unlikely that we could memoize polymorphic functions. When Hinze demonstrated how to construct generalized tries he showed how to make a tree structure tailored to the specific types at hand. With polymorphic functions we don't know what types we'll be dealing with, so we need a one-size-fits-all type. That sounds impossible, but it's not.
The first thing we need to look at is universal quantification. Suppose F(a) is a type expression involving the symbol a. Then the type ∀a.F(a) can be thought of as a bit like the product of F(a) over all possible types a. So ∀a.F(a) is a bit like the imaginary infinite tuple
data Forall f = Forall (f Bool, f Int, f Char, f String, f [Bool], f (IO (Int -> Char)), ...)
One reason you can think of it this way is that all of the projections exist. So for any type you choose, say B, there is a function (∀a.F(a)) -> F(B). In Haskell the ∀ is written as forall, and probably the best known example is the Haskell id function, of type forall a. a -> a. For any concrete type B, id gives us a function of type B -> B. Note that we usually write the signature simply as a -> a; Haskell implicitly prepends a forall for every free variable in a type. We have to use the following line of code if we want to be able to use forall explicitly (among other things):
{-# LANGUAGE RankNTypes, ExistentialQuantification, EmptyDataDecls #-}
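As a tiny warm-up example of my own (not from the original post), here's the 'projection' idea in code: with RankNTypes, using a universally quantified value at a concrete type requires no work at all.

-- Instantiate a polymorphic value at the concrete type Bool simply by
-- using it where an f Bool is expected.
projectBool :: (forall a. f a) -> f Bool
projectBool x = x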
I'll approach the tries through a series of propositions. So here's our first one:
Proposition 1
∀a. a = 0
0 is the type with no elements. ∀a. a is a type that can give us an element of type B for any B. There is no way to do this. How could such a function manufacture an element of B for any B with nothing to work from? It would have to work even for types that haven't been defined yet. (By the way, do you notice a similarity with the axiom of choice?) So ∀a. a is the type with no elements. Here's the usual way to write the type with no elements:
data Void
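In the style of the isomorphism witnesses later in this post, we can (informally) witness Proposition 1 directly. The names are mine, and the second half uses the EmptyCase extension from more recent GHCs:

prop1 :: (forall a. a) -> Void
prop1 x = x

-- A case expression with no alternatives is fine here because Void has
-- no constructors to match on.
prop1' :: Void -> a
prop1' v = case v of {}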
We also have:
Proposition 2
∀a. a^a = 1
If we have a function of type forall a. a -> a then for any element of type a you give it, it can give you back an element of type a. There's only one way to do this: it must give you back what you gave it. It can't transform that element in any way because there is no uniform transformation that works for every type a. So ∀a. a^a has one element, id.
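Here are witnesses for Proposition 2 in the same style (again, the names are mine):

prop2 :: (forall a. a -> a) -> ()
prop2 _ = ()

-- Since id is the only inhabitant, the inverse just returns it.
prop2' :: () -> (forall a. a -> a)
prop2' () = id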
A slightly more interesting proposition is this:
Proposition 3
∀a. a^(a.a) = 2
A function of type (a,a) -> a gives you an a when you give it a pair of a's. As we don't know in advance what type a will be, we can't write code that examines a in any way. So a function of this type must return one of the pair, and which one it returns can't depend on the value of the argument. So there are only two functions of this type, fst and snd.
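We can witness this proposition concretely too: applying such a function to the pair (True, False) tells us exactly which projection it is (names mine):

prop3 :: (forall a. (a, a) -> a) -> Bool
prop3 f = f (True, False)

-- prop3 (prop3' True) gives back True, and likewise for False.
prop3' :: Bool -> (forall a. (a, a) -> a)
prop3' True  = fst
prop3' False = snd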
We can rewrite the last proposition as ∀a. a^(a^2) = 2. That suggests that maybe ∀a. a^(a^n) = n for any type n. We can go one better. Here's another proposition:
Proposition 4
For any functor F and type n, ∀a. F(a)^(a^n) = F(n)
I've already talked about that result. Here's an implementation of the isomorphisms:
yoneda :: (forall b . (a -> b) -> f b) -> f a
yoneda t = t id
yoneda' :: Functor f => f a -> (forall b . (a -> b) -> f b)
yoneda' a f = fmap f a
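As a quick sanity check of the isomorphism (my own examples, at the list functor):

-- yonedaExample1 evaluates to [1,2,3]: yoneda' wraps the list up and
-- yoneda unwraps it again by applying the wrapped function to id.
yonedaExample1 :: [Int]
yonedaExample1 = yoneda (yoneda' [1, 2, 3])

-- yonedaExample2 evaluates to [False,True,True]: applying the wrapped
-- value to a function is just fmap in disguise.
yonedaExample2 :: [Bool]
yonedaExample2 = yoneda' [1, 2, 3] (> 1)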
Throughout this article I'll use the convention that if f is an isomorphism, f' is its inverse.
Now it's time to look at a kind of dual of the above propositions. Instead of universal quantification we'll consider existential quantification. The type ∃a.F(a) is a kind of infinite sum of all types of the form F(a). We can imagine it being a bit like the following definition:
data Exist f = ABool (f Bool) | AnInt (f Int) | AChar (f Char) | AString (f String) | AListOfBool (f [Bool]) | ...
The important point is that given an element of type F(b), for any b, we can turn it into an element of ∃a.F(a). You'd think that we could write this in Haskell as exists a. F(a) but unfortunately Haskell does things differently. The idea behind the notation is this: we can put anything of type F(b) into it, so if X = ∃a.F(a) then we have a function F(b) -> X for any b. In other words, we have a function of type ∀a. F(a) -> X. So although this type is existentially quantified, its constructor is universally quantified. We tell Haskell to make a type existentially quantified by telling it the constructor is universally quantified:
data Exist f = forall a. Exist (f a)
You can think of Exist as not being a single constructor, but an infinite family of constructors, like ABool, AnInt, etc. above.
If you have an element of an ordinary non-polymorphic algebraic sum type then the only way you can do stuff to it is to apply case analysis. To do something with an existential type means you have to perform a kind of infinite case analysis. So to do something with an element of ∃a. F(a) you need to provide an infinite family of functions, one for each possible type. In other words, you need to apply a function of type ∀a. F(a) -> B to it.
Time for another proposition:
Proposition 5
∃a. a = 1
It seems weird at first that the sum of all types is 1. But once you put something into this type, you can no longer get any information about it back out again. If you try doing case analysis you have to provide a polymorphic function of type ∀a. a -> B, and such a function can't inspect its argument at all, which is as good as saying you can't do any case analysis.
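To make this concrete, here's an informal pair of witnesses (the names, and the wrapper type Anything standing in for ∃a. a, are mine):

data Anything = forall a. Anything a

-- Once a value is wrapped in Anything nothing about it can be
-- recovered, so the only information-preserving result is ().
prop5 :: Anything -> ()
prop5 _ = ()

prop5' :: () -> Anything
prop5' () = Anything ()

Proposition 5 is actually a special case of the following: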
Proposition 6
For any functor f, ∃a. (n^a, f(a)) = f(n)
Briefly, the reason is that the only thing you can do with a matching pair of n^a and f(a) is apply the former to the latter using fmap. This is a kind of dual to the Yoneda lemma and I say more about it here.
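Here's a sketch of the witnesses, with a wrapper type I'll call CoYoneda (names mine; it's just Proposition 6 transcribed into Haskell using the same existential trick as Exist):

data CoYoneda f n = forall a. CoYoneda (a -> n) (f a)

-- The only thing to do with the pair: map the function over the
-- functorial value.
coyoneda :: Functor f => CoYoneda f n -> f n
coyoneda (CoYoneda g x) = fmap g x

-- Embedding f n is the special case where the function is id.
coyoneda' :: f n -> CoYoneda f n
coyoneda' = CoYoneda id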
We already know from high school algebra that this is true:
Proposition 7
x^(y+z) = x^y.x^z.
We can write the isomorphisms explicitly:
prop7 :: (Either a b -> c) -> (a -> c, b -> c)
prop7 f = (f . Left, f . Right)
prop7' :: (a -> c, b -> c) -> Either a b -> c
prop7' = uncurry either
It should be no surprise that the following 'infinite' version is true as well:
Proposition 8
x^(∃a. f(a)) = ∀a. x^f(a).
We can write the isomorphism directly:
prop8 :: (Exist f -> x) -> forall a. f a -> x
prop8 g x = g (Exist x)
prop8' :: (forall a. f a -> x) -> Exist f -> x
prop8' g (Exist x) = g x
We're now equipped to start constructing generalized tries for polymorphic functions. So let's consider memoizing the type forall a. [a] -> f a, for f a functor. At first this looks hard. We have to memoize a function that can take as argument a list of any type. How can we build a trie if we don't know anything in advance about the type of a? The solution is straightforward. We follow Hinze in applying a bit of high school algebra along with some of the propositions from above. By definition, L(a) = [a] is a solution to the equation L(a) = 1+a.L(a). So we want to simplify ∀a. f(a)^L(a). We have
f(a)^L(a) = f(a)^(1+a.L(a)) = f(a).f(a)^(a.L(a)) = f(a).f(a)^(a+a^2.L(a)) = f(a).f(a)^a.f(a)^(a^2.L(a))
I hope you can see a bit of a pattern forming. Let's define T(n) = f(a)^(a^n.L(a)). Then
T(n) = f(a)^(a^n.(1+a.L(a))) = f(a)^(a^n).T(n+1) = f(n).T(n+1)
where the last step uses Proposition 4.
That's it! We can translate this definition directly into Haskell.
data T f n = T (f n) (T f (Maybe n))
I'm using the fact that Maybe n is standard Haskell for the type n+1. (But note that this equality is only valid when we think of the list type as data, not codata. So, like with Hinze's original tries, values at infinite lists don't get memoized.)
To build the isomorphism we need to trace through the steps in the derivation. At one point we used a^n+a^(1+n).L(a) = a^n.L(a), which we can implement as the pair of isomorphisms:
lemma :: Either (n -> a) (Maybe n -> a, [a]) -> (n -> a, [a])
lemma (Left f) = (f, [])
lemma (Right (f, xs)) = (\n -> f (Just n), f Nothing : xs)
lemma' :: (n -> a, [a]) -> Either (n -> a) (Maybe n -> a, [a])
lemma' (f, []) = Left f
lemma' (f, x:xs) = Right (maybe x f, xs)
We can put the other steps together with this to give:
memoize :: Functor f => (forall a. (n -> a, [a]) -> f a) -> T f n
memoize f = let x = prop7 (f . lemma)
            in T (yoneda (fst x)) (memoize (snd x))
memoize' :: Functor f => T f n -> forall a. (n -> a, [a]) -> f a
memoize' (T a b) = let y = (yoneda' a, memoize' b)
                   in prop7' y . lemma'
Let's try a few examples. I'll use the identity functor for the first example.
data I a = I a deriving Show
instance Functor I where
  fmap f (I a) = I (f a)
Here's our first test function and some data to try it on:
test1 (f, xs) = I $ if length xs > 0 then head xs else f ()
data1 = (const 1,[2,3,4])
In data1 we're using a function to represent a kind of 'head' before the main list. For the next example we're leaving the first element of the pair undefined, so that data2 is effectively of list type:
test2 (f, xs) = reverse xs
data2 = (undefined,[1..10])
We can test them by building the memoized versions of these functions.
memo1 = memoize test1 :: T I ()
memo2 = memoize test2 :: T [] Void
and then apply them:
ex1 = memoize' memo1 data1
ex2 = memoize' memo2 data2
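If everything is working, ex1 should evaluate to I 2 (the head of [2,3,4], exactly what applying test1 to data1 directly gives) and ex2 to [10,9,8,7,6,5,4,3,2,1] (the reversal of [1..10]).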
It appears to work!
So what's actually going on? We have
T(0) = f(0).T(1) = f(0).f(1).T(2) = ... = f(0).f(1).f(2).f(3)...
Now consider a function g :: [a] -> f a applied to a list. If the list isn't an infinite stream then it must have a certain length, say n. From these elements it builds something of type f a. However this f a is constructed, each of the elements of type a in it must be constructed from one of the n elements in the list. So if we apply g to the list [0,1,2,...,n-1] it will construct an element of f a where each a in it contains a label saying which position in the list it came from. (Compare with Neel's comment here.) If we use integers we don't get a perfect trie because there are more elements of type f Integer than there are ways to indicate source positions. What we need is that for each length of list, n, we have a type with precisely n elements. And that's what the type n gives us.
We can memoize many different functions this way, though if the functor f involves function types you'll need to use some of Hinze's techniques to eliminate the arrows. You'll also notice I haven't used all of the propositions above; I've tried to give some extra tools to allow people to memoize more types than just my two examples.
One last thing: I wouldn't use the type above to memoize in a real-world application. But the methods above could be used to derive approximate tries that are efficient. One obvious example of an approximation would be to use Int instead of the finite types.
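Here's a rough sketch of what that approximation might look like (my own code, not from the original post): for each length n we apply the polymorphic function to the labelling list [0..n-1], just as in the explanation above, and we read the table back by substituting the real elements for the labels. It's only an approximation for the reason already given: f Int has more inhabitants than there are positions to label.

memoizeInt :: (forall a. [a] -> f a) -> [f Int]
memoizeInt g = [ g [0 .. n - 1] | n <- [0 ..] ]

-- Look up the entry for the argument's length, then replace each
-- positional label with the corresponding element of the actual list.
memoizeInt' :: Functor f => [f Int] -> [a] -> f a
memoizeInt' table xs = fmap (xs !!) (table !! length xs)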
Update: I forgot to provide one very important link. This post was inspired by Thorsten Altenkirch's post here.