Haskell: Birecursion Schemes
Alternatively: Recursion Schemes 2: Here We Go Again
Recursion: from the Latin recursio, roughly meaning "to return again".
I have long been a reader of computer science literature, but now I feel that I ought to contribute something back. As part of a larger project, I've been examining the concept of birecursion. Here, I will do my best to explain my motivations, and point out which bits of the resulting code I find interesting, and why.
If you just want to skip straight to the code, you can check out the experimental library that I've been working on, which implements all of this.
The implications of asking questions
There is an essential link between functors and recursion, and it goes in both directions. On the surface, we may think of recursion in terms of functors: if we fix a functor, we get a recursive data type - this is what functors say about recursion. But what of the deeper question? What does recursion say about functors?
Instead of 'a recursive data structure is a functor where the content is the same type as the original type', what if we took the perspective of 'a functor is a recursive data structure where the content type is different from the original type'?
Maybe this is nonsense, but I like asking nonsense questions sometimes, on the off chance that they might not be nonsense.
Given some functor data type `fa :: f a`, a natural transformation `natf :: forall a . f a -> g a`, and a function `f :: a -> Fix g`, we can trivially compose `hoist natf . fmap f` (or `fmap f . hoist natf`) to get `toG :: f a -> Fix g`. I am of course ignoring a bit of `Fix` wrapping and unwrapping, but the point is made - we have unified `f` and `a` into a single data type `g`.
If `g` internally preserves the distinction between `f` and `a`, and the proper inverses exist, then we can undo this by unwrapping a layer of `Fix g` to check whether it is an `a` (in which case we use `g :: Fix g -> a`), or an `f` (in which case we use `natg :: forall a . g a -> f a`). This allows us to (again, ignoring a bit of `Fix` wrapping and unwrapping) recover the original `f a`.
This relates to our earlier question, and raises others. What if `f` itself is recursive? What if `a` is a recursive data type too? If recursion is functors, and functors are recursion, and bifunctors exist, what does this mean for recursion?
Recursion is something I've long been fascinated with, and I spend a fair bit of time pondering things that are, on the surface, a tad ridiculous. Things like "what happens if you recurse recursion itself?" What does that even mean? Is it a game, trying to spot the recursion, or figure out 'the next way' that something recurses? Or is it meaningful, with there being a practical use for some of it?
This is what we'll be finding out today. Now on with the code!
Mono-recursion
We'll start with a definition of standard mono-recursion (akin to `monofunctor`). This is just the standard case of recursion, and we're writing it out for completeness, and to compare our extended variants against. First, the type classes:
```haskell
type family Base t :: * -> *

class Functor (Base t) => Recursive t where
  project :: t -> Base t t

class Functor (Base t) => Corecursive t where
  embed :: Base t t -> t

type Iso a b = (Recursive a, Corecursive b, Base a ~ Base b)
```
A quick refresher: a recursive structure can be interpreted as a stack of `Base` functor layers. That is, `t ~ Base t (Base t (Base t ...)) ~ Fix (Base t)`. You can unwrap and rewrap them one layer at a time with `project` and `embed`, and if two recursive types share a base, they are `Iso`-morphic, and you can freely turn one into the other.
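To make this concrete, here is a minimal sketch of these classes instantiated for Peano naturals (a hypothetical example; `Nat` and `NatF` are not from the library):

```haskell
-- Nat is the recursive type; NatF is its base functor, with the
-- recursive position replaced by the type parameter r.
data Nat = Z | S Nat

data NatF r = ZF | SF r
  deriving Functor  -- via the DeriveFunctor extension

type instance Base Nat = NatF

instance Recursive Nat where
  project Z     = ZF
  project (S n) = SF n

instance Corecursive Nat where
  embed ZF     = Z
  embed (SF n) = S n
```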
When traversing a recursive structure layer by layer, you can do something going down, you recurse, and you can do something coming back up. If you do something both going down and coming up, you've got a `hylo`morphism.
```haskell
hylo :: (Functor f) => (f b -> b) -> (a -> f a) -> a -> b
hylo alg coalg = h where h = alg . fmap h . coalg
```
This is easier to understand if we rename our arguments and flip the direction, using `>>>` instead of `.`. First we go `down`, then we recurse with `fmap`, and then we come back `up`:

```haskell
hylo up down = down >>> fmap (hylo up down) >>> up
```
If we expand this version of `hylo` a few times, we can more clearly see that we're just burrowing through layers using `fmap`:

```haskell
down >>> fmap (down >>> fmap (down >>> (...) >>> up) >>> up) >>> up
```
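As a small worked example, here is the classic factorial-as-a-hylomorphism, using the `ListF` base bifunctor defined towards the end of this post (its `Functor` instance maps over the second slot):

```haskell
-- Going down, count n, n-1, ..., 1; coming back up, multiply everything.
factorial :: Int -> Int
factorial = hylo alg coalg
  where
    coalg 0 = NilF                -- down: stop at zero
    coalg n = ConsF n (n - 1)     -- down: emit n, recurse on n - 1
    alg NilF          = 1         -- up: empty product
    alg (ConsF n acc) = n * acc   -- up: multiply
```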
It is important to understand how `hylo` works, and what it does, because we'll be using it as the core of our exploration. So let's explore with our next question: what are we doing while traversing this structure?
If you do nothing going down, and something coming up, you've got a `cata`morphism, where `Recursive` provides us with `project`, a 'do nothing going down' function. `cata` is great for combining results from the data structure into a single answer, like when counting the number of results, or producing a cryptographic hash.
```haskell
cata :: (Recursive t) => (Base t a -> a) -> t -> a
cata alg = hylo alg project
```
If you do something going down, and nothing coming up, you've got an `ana`morphism, where `Corecursive` provides us with `embed`, a 'do nothing coming up' function. `ana` is great for generating a data structure from a seed, like iterating over a function to produce a list, or tracking the history of a computation.
```haskell
ana :: (Corecursive t) => (a -> Base t a) -> a -> t
ana = hylo embed
```
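As usage sketches (again borrowing the `List` / `ListF` instances defined towards the end of this post), `cata` collapses a structure and `ana` grows one:

```haskell
-- Fold a List down to a single Int.
sumList :: List Int -> Int
sumList = cata alg
  where
    alg NilF          = 0
    alg (ConsF n acc) = n + acc

-- Unfold a countdown from a seed.
countdown :: Int -> List Int
countdown = ana coalg
  where
    coalg 0 = NilF
    coalg n = ConsF n (n - 1)
```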
There's a lot we can do with just these functions, but we can also trivially make monadic variants to allow for effects such as `IO` while using them - for example, `hyloM`:
```haskell
hyloM :: (Monad m, Traversable t) => (t b -> m b) -> (a -> m (t a)) -> a -> m b
hyloM alg coalg = h where h = alg <=< traverse h <=< coalg

-- hylo for comparison
hylo alg coalg = h where h = alg <<< fmap h <<< coalg
```
Using `hylo` for comparison, we can see that we've simply upgraded from functors to monads by using `traverse` instead of `fmap`, and Kleisli arrows `<=<` for monadic function composition instead of arrows `<<<` for function composition. Remember this, as we'll use this understanding to create variants in other ways.
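As a sketch of `hyloM` in action, here is an interactive variant - this one borrows `ListF` from the recursion-schemes library, whose `Nil` / `Cons` constructors come with a `Traversable` instance:

```haskell
import Data.Functor.Foldable (ListF (..))

-- Read lines from stdin until a blank line going down,
-- then concatenate them coming back up.
collectLines :: IO String
collectLines = hyloM alg coalg ()
  where
    coalg () = do
      l <- getLine
      pure (if null l then Nil else Cons l ())
    alg Nil          = pure ""
    alg (Cons l acc) = pure (l ++ "\n" ++ acc)
```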
Finally, we also have a few special functions - the `iso`morphism, and the natural `trans`formation.
```haskell
iso :: (Iso s t) => s -> t
iso = hylo embed project

trans :: (Recursive s, Corecursive t) => (forall a. Base s a -> Base t a) -> s -> t
trans f = hylo (embed . f) project
```
`iso` is what you get when you do nothing both going down and coming back up. However, it doesn't quite do 'nothing' - if two data types are isomorphic, we can change one type into the other because they have the same base functor (`Base s ~ Base t`). For example, `newtype AB a b = AB (a,b)` is isomorphic to `newtype BA a b = BA (b,a)`, and we should be able to convert freely between the two.
`trans` lets you do the same for things that are not isomorphic, if you happen to have a natural transformation function. Unlike `iso`, which is bidirectional, `trans` allows for one-way functions. For example, for `data List a = Nil | Cons a (List a)` and `data Tree a = Node a [Tree a]`, we can convert any list `Cons a (Cons b (Cons c Nil))` into a rose tree `Node a [Node b [Node c []]]` without losing any information - however, since a tree node may contain multiple children, we can't always perform the reverse trick without losing information.
If you haven't already, you should check out the recursion-schemes library, which covers these mono-recursive schemes and several others. The documentation has become increasingly friendly over the last several years, and if I were to re-implement everything in it here, I'd never get around to exploring the new concepts - hence why I am focusing on `hylo`, `cata`, and `ana` here.
Base bifunctors
So far, this is just the same stuff we'd find in the `recursion-schemes` library - we've defined it here to help with asking (and answering) our questions:
How do we recurse more? If this is 'mono-recursion', what then is 'bi-recursion'?
Given that recursion is about base functors, and that the answer to 'what is more functor?' is 'bifunctor', the answer to 'what is more recursion?' is 'base bifunctor'. This aligns with our perspective of there being two ways of looking at recursion - the 'same' recursion side, and the 'different' functor side. As a result, there are two ways of 'being bi-recursive': we can either recurse in one slot and not the other, or we can recurse in both slots. We can be 'half-recursive, half-functor', or we can be 'twice-recursive'. This splitting is quite appropriate, as by going up a dimension of recursion, we have induced a new symmetry to break everything along.

The former, which we will call 'direcursive', yields recursion in serial - recursive data types containing recursive data types - and the latter, keeping the name 'birecursive', yields recursion in parallel - partitioning recursion to the left and to the right.
Di-recursion: half-recursive half-functors, or: increasing the functor-ness from 0 to 1
The prefix 'di-' is used here to mean 'half'. Whereas a recursive data type has a base functor, a (right) direcursive functor only recurses in half of its base bifunctor's slots - the right slot. In addition to performing the same functions as mono-recursion schemes, di-recursion schemes are geared to allow for mapping over the recursive functor's content at the same time. This allows arbitrary access to inner layers, not unlike functor / monad transformer stacks, and is especially useful for partial / distributed data structures.
If we view recursion as functors a la 'fixed stack of base functors where the data varies' as one level of recursion, then 'fixed stack of recursive functors where the type (and data) varies' is the next level of recursion in the serial direction. It describes a stack of recursive functors, such as a `Tree (List (IntMap a))`.
Defining the `di-` variants of the recursive classes is simple - we extend them to include a new type parameter for the functor content type:
```haskell
type family Dibase (t :: * -> *) :: * -> * -> *

class (Bifunctor (Dibase f)) => Direcursive f where
  diproject :: f a -> Dibase f a (f a)

class (Bifunctor (Dibase f)) => Codirecursive f where
  diembed :: Dibase f a (f a) -> f a

type Diiso a b = (Direcursive a, Codirecursive b, Dibase a ~ Dibase b)
```
NOTE: Linking `Direcursive` to `Recursive` like `Bifunctor` to `Functor` is, at time of writing, incomplete.
Note the subtle difference in the kind of `Dibase` as compared to `Base`, and the exposure of the functor content type `a` within the class function signatures. We have now exposed the recursive data type as a functor, allowing us access to its content type, at the cost of having to mention that type whenever we use it.
Similarly, to extend `hylo`, we lift the functor type to a bifunctor, and add a new function argument for mapping the functor content. Because a direcursive is a recursive functor, instead of just recursing using `fmap`, we map and recurse at the same time using `bimap`. We're finally starting to put to use the knowledge we gained from dissecting how to upgrade `hylo` to use `traverse` - here we're upgrading it to use `bimap` instead.
```haskell
dihylo :: (Bifunctor f) => (a -> b) -> (f b d -> d) -> (c -> f a c) -> c -> d
dihylo f alg coalg = h where h = alg . bimap f h . coalg
```
With this in mind, dicata
and diana
are simple:
```haskell
dicata :: (Direcursive f) => (a -> b) -> (Dibase f b c -> c) -> f a -> c
dicata f alg = dihylo f alg diproject

diana :: (Codirecursive f) => (a -> b) -> (c -> Dibase f a c) -> c -> f b
diana f coalg = dihylo f diembed coalg
```
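As a usage sketch (the `List` / `ListF` instances appear later in this post), `dicata` lets us map each element and fold the structure in a single pass:

```haskell
-- Render every element to a String while collapsing the List
-- into an ordinary Haskell list, all in one traversal.
toStrings :: Show a => List a -> [String]
toStrings = dicata show alg
  where
    alg NilF         = []
    alg (ConsF s ss) = s : ss
```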
Because every `Direcursive` is `Recursive`, we know that the following should hold true:
```haskell
hylo = dihylo id
cata = dicata id
ana  = diana id
```
It is also just as easy to make monadic `di-` variants, by combining the tricks for making `di-` and `-M` variants. We can't use both `bimap` and `traverse` at the same time, so we'll use `bitraverse`, which combines the two. `Bitraversable` doesn't show up too often, but it sure is handy here.
```haskell
diprojectM :: (Monad m, Direcursive f) => f a -> m (Dibase f a (f a))
diprojectM = pure . diproject

diembedM :: (Monad m, Codirecursive f) => Dibase f a (f a) -> m (f a)
diembedM = pure . diembed

dicataM :: (Monad m, Direcursive f, Bitraversable (Dibase f)) => (a -> m b) -> (Dibase f b c -> m c) -> f a -> m c
dicataM f alg = dihyloM f alg diprojectM

dianaM :: (Monad m, Codirecursive f, Bitraversable (Dibase f)) => (a -> m b) -> (c -> m (Dibase f a c)) -> c -> m (f b)
dianaM f coalg = dihyloM f diembedM coalg

dihyloM :: (Monad m, Bitraversable f) => (a -> m b) -> (f b d -> m d) -> (c -> m (f a c)) -> c -> m d
dihyloM f alg coalg = h where h = alg <=< bitraverse f h <=< coalg
```
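For a usage sketch of the monadic variants, we first need `ListF` (defined later in this post) to be `Bitraversable`, which the post's instances don't cover - so the instances below are assumed additions:

```haskell
import Data.Bifoldable (Bifoldable (..))
import Data.Bitraversable (Bitraversable (..))

instance Bifoldable ListF where
  bifoldMap _ _ NilF        = mempty
  bifoldMap f g (ConsF a r) = f a <> g r

instance Bitraversable ListF where
  bitraverse _ _ NilF        = pure NilF
  bitraverse f g (ConsF a r) = ConsF <$> f a <*> g r

-- Print each element as we pass it, while collecting everything
-- into an ordinary Haskell list.
printAndCollect :: Show a => List a -> IO [a]
printAndCollect = dicataM (\a -> print a >> pure a) alg
  where
    alg NilF         = pure []
    alg (ConsF a as) = pure (a : as)
```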
Interestingly, unlike mono- and bi-recursion, direcursion allows us to define applicative variants of the recursive functions. This is because direcursives are recursive functors, which implies recursive applicatives. We're getting some good mileage out of `Bitraversable` here, aren't we?
```haskell
dicataA :: (Applicative f, Direcursive t, Bitraversable (Dibase t)) => (a -> f b) -> (Dibase t b c -> c) -> t a -> f c
dicataA f alg = dihyloA f alg diproject

dianaA :: (Applicative f, Codirecursive t, Bitraversable (Dibase t)) => (a -> f b) -> (c -> Dibase t a c) -> c -> f (t b)
dianaA f coalg = dihyloA f diembed coalg

dihyloA :: (Applicative f, Bitraversable t) => (a -> f c) -> (t c d -> d) -> (b -> t a b) -> b -> f d
dihyloA f alg coalg = h where h = fmap alg . bitraverse f h . coalg
```
With that out of the way, the `di-` functions could not be complete without their own versions of `iso` and `trans`.
```haskell
diiso :: (Diiso s t) => (a -> b) -> s a -> t b
diiso f = dihylo f diembed diproject

ditrans :: (Direcursive s, Codirecursive t) => (a -> b) -> (forall c . Dibase s b c -> Dibase t b c) -> s a -> t b
ditrans f g = dihylo f (diembed . g) diproject
```
These I find to be particularly interesting. Notably, the half-functor, half-recursive nature of things is evident here. It turns out that `fmap` is `diiso`, and `trans` (`hoist`) is `ditrans id`. This is a rather important result, as it shows how, for `f (g ...)`, a natural transformation of `g` at a lower level is a map transformation of `f` at a higher level, and how the `fmap` of a recursive functor is composed from many smaller `first`s and `second`s of its base bifunctor. There is a relationship / symmetry here between stacks of base functors and stacks of recursive functors.
We can take this as far as creating a `Difunctor` type class for all types that implement `fmap`- and `hoist`-like functions, although the function names do conflict with the existing `Profunctor` class, so I may have some renaming to do.
```haskell
class (Functor f, Direcursive f, Codirecursive f) => Difunctor f where

  dimap :: (a -> b) -> (forall c . Dibase f b c -> Dibase g b c) -> f a -> g b
  -- dimap = ditrans

  lmap :: (a -> b) -> f a -> f b
  -- lmap = diiso

  rmap :: (forall b. Dibase f a b -> Dibase g a b) -> f a -> g a
  -- rmap = ditrans id
```
I'm still playing around with this, but this emphasis of `fmap` and `trans / ditrans id` as two sides of the same process has struck me soundly.
Regardless, what's the point of these classes? We can actually implement them in terms of the original `Base`, `Recursive`, and `Corecursive` classes, relying on `bimap f g == first f . second g`:
```haskell
dihylo :: (forall x . Functor (f x), Bifunctor f) => (a -> b) -> (f b d -> d) -> (c -> f a c) -> c -> d
dihylo f alg coalg = hylo alg (first f . coalg)
```
However, an explicit distinction of recursive functors has its advantages. It allows for a precise discussion of partial data structures that may otherwise be too large to transfer across the network or fit in memory, and it makes it almost trivial to lazily fetch a data structure as needed, or to update just part of a data structure with a patch.
For example, we can fetch and filter a paged list of search results using `dihyloM`:
```haskell
searchArticles
  :: (ArticleURL -> IO Article)                -- fetchArticle
  -> (Page Article [Article] -> IO [Article])  -- filterPage
  -> (PageURL -> IO (Page ArticleURL PageURL)) -- fetchPage
  -> PageURL -> IO [Article]
searchArticles = dihyloM
```
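The post doesn't pin down `Page`, but a shape like the following hypothetical sketch would fit the signature above - each page carries some items plus an optional link to the next page:

```haskell
import Data.Bifunctor (Bifunctor (..))
import Data.Bifoldable (Bifoldable (..))
import Data.Bitraversable (Bitraversable (..))

data Page item next = Page [item] (Maybe next)

instance Bifunctor Page where
  bimap f g (Page items next) = Page (map f items) (fmap g next)

instance Bifoldable Page where
  bifoldMap f g (Page items next) = foldMap f items <> foldMap g next

instance Bitraversable Page where
  bitraverse f g (Page items next) =
    Page <$> traverse f items <*> traverse g next
```

With that, `dihyloM` fetches each page, fetches and filters its articles, and follows the `next` link until it runs out.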
The breaking up of a recursive functor into fragments allows it to be distributed by reference (a form of abstraction akin to naming and pointers), and fragmenting data structures in this manner is one part of solving the problem of generalized distributed computing - which sounds a bit blockchain-y, except that it doesn't rely on hashing or mining. The concept is flexible enough to talk about CPU registers and caches, C pointers and memory, file paths and disks, or network URLs and servers.
At this point, it may be worth providing an alternative, friendlier terminology. While a precise generative nomenclature was necessary for codifying all of this without confusing bi- and di-recursion, the original name for `dihylo` was actually `hyloMap`, and that is reflected here in the suffixes of these functions:
```haskell
type Base1 t = Dibase t
type Recursive1 f = Direcursive f
type Corecursive1 f = Codirecursive f
type Iso1 f g = Diiso f g
type Refunctor f = Difunctor f
type Fix1 f = Difix f
type Free1 f a b = Difree f a b
type Cofree1 f a b = Dicofree f a b

-- NOTE: Not Data.Foldable.foldMap
foldMap :: (Recursive1 f) => (a -> b) -> (Base1 f b c -> c) -> f a -> c
foldMap = dicata

unfoldMap :: (Corecursive1 f) => (a -> b) -> (c -> Base1 f a c) -> c -> f b
unfoldMap = diana

refoldMap :: (Bifunctor f) => (a -> b) -> (f b d -> d) -> (c -> f a c) -> c -> d
refoldMap = dihylo

refixMap :: (Iso1 s t) => (a -> b) -> s a -> t b
refixMap = diiso

hoistMap :: (Recursive1 s, Corecursive1 t) => (a -> b) -> (forall c . Base1 s b c -> Base1 t b c) -> s a -> t b
hoistMap = ditrans
```
Please let me know what you think of this terminology vs using the `di-` prefix.
So what's the point of all of this? Doesn't Haskell already do all of this recursive functor stuff for us automatically when we stack data types a la `Tree (List (IntMap a))`? Well, yes, but Haskell also does recursive data types for us, and yet it was worth writing `recursion-schemes` to break things into base functors.
Here, we can see that just implementing `diproject`, `diembed`, and `bimap` gets us a free implementation of `fmap` through `diiso`:
```haskell
data List a = Nil | Cons a (List a)

type instance Base (List a) = ListF a
type instance Dibase List = ListF

instance Recursive (List a) where
  project = diproject

instance Corecursive (List a) where
  embed = diembed

instance Direcursive List where
  diproject Nil = NilF
  diproject (Cons a r) = ConsF a r

instance Codirecursive List where
  diembed NilF = Nil
  diembed (ConsF a r) = Cons a r

instance Functor List where
  fmap = diiso

data ListF a r = NilF | ConsF a r

instance Functor (ListF a) where
  fmap = second

instance Bifunctor ListF where
  bimap _ _ NilF = NilF
  bimap f g (ConsF a r) = ConsF (f a) (g r)
```
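A quick sanity check that the derived `fmap` behaves as expected:

```haskell
-- diiso unwraps with diproject, maps the element with bimap,
-- and rewraps with diembed, one layer at a time.
example :: List Int
example = fmap (+ 1) (Cons 1 (Cons 2 Nil))
-- example == Cons 2 (Cons 3 Nil)
```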
Instead of the recursion being implicit and hidden by the compiler, it is now exposed, and we can get a ton of stuff either for free, or much more easily, using base functors.
Bi-recursion: bi-recursive bi-functors, or: increasing the recursive-ness from 1 to 2
If we view recursion as recursion, then 'fixed stack of base functors where the recursion varies' is the next level of recursion in the parallel direction.
I have admittedly spent less time considering this concept, but it seems promising for its own reasons, and I suspect that I have arrived at something Clowns-and-Jokers-esque, but involving recursion, and with a different meaning for `bihoist`. I'm sure that further investigation will tell me more in due time. For now, onto the definitions! We can follow the same pattern as before, as it's getting to be quite familiar.
Once again, we'll define our typeclasses:
```haskell
type family Bibase t :: * -> * -> *

class (Bifunctor (Bibase t)) => Birecursive t where
  biproject :: t -> Bibase t t t

class (Bifunctor (Bibase t)) => Cobirecursive t where
  biembed :: Bibase t t t -> t

type Biiso a b = (Birecursive a, Cobirecursive b, Bibase a ~ Bibase b)
```
Note the subtle difference in the kind of `Bibase` as compared to both `Base` and `Dibase`. Additionally, `biproject` is the projection of a bifunctor, but it only projects one layer at a time. The same goes for `biembed`.
As before, we extend `hylo` again, lifting the functor type parameter to a bifunctor, but this time there is no ancillary mapping function, because we recurse in both base bifunctor slots.
```haskell
bihylo :: (Bifunctor f) => (f b b -> b) -> (a -> f a a) -> a -> b
bihylo alg coalg = h where h = alg . bimap h h . coalg
```
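As a quick sketch of `bihylo` (with a hypothetical base bifunctor `FibF`, not from the post): naive Fibonacci recurses in both slots at once.

```haskell
data FibF l r = BaseF Int | StepF l r

instance Bifunctor FibF where
  bimap _ _ (BaseF n)   = BaseF n
  bimap f g (StepF l r) = StepF (f l) (g r)

-- Unfold n into a tree of subproblems, then sum both branches back up.
fib :: Int -> Int
fib = bihylo alg coalg
  where
    coalg n | n < 2     = BaseF n
            | otherwise = StepF (n - 1) (n - 2)
    alg (BaseF n)   = n
    alg (StepF a b) = a + b
```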
`bicata` and `biana` predictably follow:
```haskell
bicata :: (Birecursive a) => (Bibase a b b -> b) -> a -> b
bicata alg = bihylo alg biproject

biana :: (Cobirecursive b) => (a -> Bibase b a a) -> a -> b
biana = bihylo biembed
```
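And a sketch of the classes themselves, using a hypothetical binary tree (again, not from the post) whose base bifunctor recurses in both slots:

```haskell
data Tree2 = Leaf Int | Branch Tree2 Tree2

data Tree2F l r = LeafF Int | BranchF l r

instance Bifunctor Tree2F where
  bimap _ _ (LeafF n)     = LeafF n
  bimap f g (BranchF l r) = BranchF (f l) (g r)

type instance Bibase Tree2 = Tree2F

instance Birecursive Tree2 where
  biproject (Leaf n)     = LeafF n
  biproject (Branch l r) = BranchF l r

-- Sum every leaf with bicata.
sumTree2 :: Tree2 -> Int
sumTree2 = bicata alg
  where
    alg (LeafF n)     = n
    alg (BranchF l r) = l + r
```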
The monadic variants are easy too, using `bitraverse` instead of `traverse`:
```haskell
biprojectM :: (Monad m, Birecursive t) => t -> m (Bibase t t t)
biprojectM = pure . biproject

biembedM :: (Monad m, Cobirecursive t) => Bibase t t t -> m t
biembedM = pure . biembed

bicataM :: (Monad m, Bitraversable (Bibase a), Birecursive a) => (Bibase a b b -> m b) -> a -> m b
bicataM alg = bihyloM alg biprojectM

bianaM :: (Monad m, Bitraversable (Bibase b), Cobirecursive b) => (a -> m (Bibase b a a)) -> a -> m b
bianaM = bihyloM biembedM

bihyloM :: (Monad m, Bifunctor f, Bitraversable f) => (f b b -> m b) -> (a -> m (f a a)) -> a -> m b
bihyloM alg coalg = h where h = alg <=< bitraverse h h <=< coalg
```
And finally, of course, we have our `biiso` and `bitrans` as well:
```haskell
biiso :: (Biiso a b) => a -> b
biiso = bihylo biembed biproject

bitrans :: (Birecursive s, Cobirecursive t) => (forall a b. Bibase s a b -> Bibase t a b) -> s -> t
bitrans f = bihylo (biembed . f) biproject
```
It is worth noting that `bitrans` is the natural transformation of a bifunctor, and not a bi-transformation. The bifunctor has two functor slots, and we transform the bifunctor once without affecting either of the slots.
As I said before, I'm not quite sure what to use `Birecursive` for, but it seems it might be useful when combining multiple types involving recursion into one super-type (such as with `Generics`), or when considering that `Birecursive t ~ Recursive (Join t)`. We could even extend the concept further, but on second thought, let's not ~~go to Camelot~~ talk about trirecursion - 'tis a silly place.
Conclusion
Although they were confusing at first, once we figured out the pattern for deriving things, `di-` and `bi-`recursion turned out to be quite simple. These classes appear to be quite useful when used appropriately, with the `di-` variants being especially powerful for composing recursive types. In particular, that `fmap f = diiso f = ditrans f id` and `trans = ditrans id` is enlightening. The serial vs parallel nature of `di-` vs `bi-`recursion is also worth investigating more.
I think this gives recursion a place in the functor hierarchy, alongside things like bifunctors, profunctors, and indexed functors, and it composes nicely with them as well - we can easily consider things such as recursive profunctors or indexed recursive functors or whatnot.
You can check out this code as a library that I'm building to reflect all of this, and I hope to further extend these classes the next time I post, likely on the topic of indexed mono-, bi-, and di-recursives.
An aside about recursion and tetration
The next level of recursion is more than just doing the same thing again, extending deeper along the same dimension - it is extending deeper in a new direction. Sum types `a + b` (`a` or `b`) and product types `a * b` (`a` and `b`) are an example of this - one can recurse deeper in the same way by composing longer sum types `a + b + c + ...`, or in a new way and direction with product types `a * b * c * ...`. If we extend this system further, the next level of recursion is exponent types (functions), where `b ^ a` means the same thing as `a -> b`.
It is important to note a few things - the first of which is that chaining one level of recursion does not get you to the next level. That is, no matter how deep your sum type `a + b + c + ...` is, it never gets you a product type `a * b`, and no matter how deep your (sum-of-) product type `a * b * c * ...` is, it will never get you an exponential (function) type `b ^ a`.
The second thing to note is that neither does defining a sum-of-products data type give us a value, nor does defining a function type get us an implementation - that is, it does not generate a value that we can inspect, or a function that we can run; we have merely described the shape of such things.
These two things are important to remember when considering an attempt to define what a tetration (`B ^^ A`) type* is in this system. Chaining exponentiation `A -> B -> C` is still just exponentiation `A -> (B -> C)`, and not tetration, any more than chaining sum types begets a product type. This also explains why and how higher-kinded types are still just types, because expressions like `f :: * -> (* -> *)` just make them a deeper type. Our tetration type `B ^^ A`, then, must be some function `A :: X -> Y` (or `A :: Y ^ X`) repeated or iterated across a space defined by `B`.
* Research around this topic is sparse to find and dense to read, and so I am still playing with the concept.
I think that a natural description of distributed computing is to be found somewhere in this process, and this discussion of birecursion schemes is an essential part of it, since 'distributed' is in some sense 'and more' in yet another direction - the direction from one computer to many. A description of distributed computing does not create new computers, but it does describe the shape of how a function `A` runs within a network `B`. It appears to be part of our sums / products / exponents system, and it may be worth exploring this notion more thoroughly, in light of how our 'singular' computers are in fact made up of many smaller computers, down to individual logic gates.
This is a bigger discussion that ties many things together, including Birecursion, the Free Boolean Cube, and more. That discussion is going to take time, so stick around!