## We're still under construction!

And you just found out where. We're still working on this page. How did you get in here, anyway?

# Writing a Parser Combinator Library in Haskell

We write our own stripped-down version of the popular parsec library.

# Parser Combinators real quick

Parser-combinators are what truly got me into Haskell. I saw a video of a professor giving a lecture on building a parser from scratch. He did it with just a handful of code, over the course of an hour, explaining everything along the way.

I was at the time working with Erlang due to my interest in distributed computing and filesystems, but I was already entertaining the idea of learning haskell. There was a distributed computing library somewhat modeled after Erlang that I wanted to check out (the now-defunct cloud-haskell project, which I have carried on its spirit). This video made me jump ship entirely.

So what are parser combinators? They're a great way of building larger parsers our of bits of smaller parsers. You take *one parser* and **combine it** with *another parser*. Parser combinators! Quite often, you can end up with a parser that is roughly in the shape of the data it is parsing, which is a rather neat and useful structural trick.

So with that out of the way, let's get to defining our project `cabal`

file! It's rather short, being 90% boilerplate. We only are using the `base`

library, and we only have the one `Lambda.Parser`

module.

With the project file done, there's nowhere to go now except to the code.

*NOTE: We're going to try and keep it rather simple, and try to avoid using higher-level haskell concepts in favor of conceptual clarity.*

## cabal project file - lc-parser.cabal

`cabal-version: 3.0`

`name: lc-parser`

`version: 0.0.1`

`license: NONE`

`author: Leo`

`maintainer: leo@apotheca.io`

`build-type: Simple`

`extra-source-files: README.md`

`library`

`exposed-modules:`

`Lambda.Parser`

`default-extensions:`

`FlexibleContexts`

`NoImplicitPrelude`

`OverloadedStrings`

`build-depends:`

`base >= 4 && < 5`

`hs-source-dirs: src`

`default-language: Haskell2010`

# Module and Imports

First up, we have our module declaration and imports. We're not doing anything really fancy, so there's not all that many of them

We pull in `Prelude`

of course, and then some standard data types `Bool`

, `Char`

, `Either`

and `Maybe`

. This gives us the majority of what we need for building our parser.

We also pull in `Applicative`

and `Monad`

for a few important bits used for gluing parsers together - we'll cover the whole Functor-Applicative-Monad thing in depth some day.

Finally, we pull in a cheeky little `first`

function from `Bifunctor`

so we can map over the first element of a tuple later. We could use `first`

from `Arrow`

, as those two functions happen to coincide for tuples, but `Arrow`

generalizes in a different direction than what we want so we'll stick with `Bifunctor`

.

## haskell - src/Lambda/Parser.hs

`module Lambda.Parser where`

`import Prelude`

`import Data.Bool`

`import Data.Char`

`import Data.Either`

`import Data.Maybe`

`import Control.Applicative`

`import Control.Monad`

`import Data.Bifunctor (first)`

# Parser types

Next, we define our parser data type, plus a supporting error type.

So what is a parser anyway? In a broad sense, just it is a function that reads in some input, and produces some output, returning the result along with the rest of the remaining input. Then, the next parser can continue where the first one left off.

This corresponds to a type that looks like `String -> (a, String)`

, which works well so long as you only give it input that doesn't fail to parse. This makes it a bit nitpicky to use (not unlike partial functions), as in the real world, we don't always know if a parser is going to succeed ahead of time.

So in practice, our parser will handle the possibility of errors by wrapping the returned `(a, String)`

with an `Either ParserError`

. We could throw an error or exception instead of explicit error handling and keep the `String -> (a, String)`

type clean of mentioning the possibility of errors, but that would require hiding the error using the fact that the Parser is a Monad, and would involve invoking the dreaded exception-catching machinery which is out of scope for this article.

With that explanation out of the way, our `Parser`

data type is just a newtype wrapper around a `String -> Either ParseError (a, String)`

function. This makes it easy to just say `Parser a`

instead of constantly copy-pasting `String -> Either ParseError (a, String)`

everywhere. We also define our `ParseError`

which will tell us a bit about what went wrong in the case that things do. We also define a quick `parse`

function for completely ignoring the error handling that we just talked about.

```
````newtype Parser a = Parser { runParser :: String -> Either ParseError (a, String) }`

`data ParseError`

` = EndOfInput`

` | InputRemaining String`

` | Unexpected String`

` | Failure String`

` | Empty`

` deriving (Show)`

`parse :: Parser a -> String -> a`

`parse p s = case runParser p s of`

` Left e -> error $ show e`

` Right (a,[]) -> a`

` Right (a,s) -> error "unexpected input remaining"`

# Parser class instances

Now, we have a bit of boilerplate to define. As I said earlier, we'll cover the `Functor`

`Applicative`

`Monad`

stack and related concepts more fully elsewhere, but we are going to implement some conforming instances and use them, so we'll briefly touch on their concepts anyway. We've got to define this stuff because it lets us glue our parsers together easily.

`Functors`

are abstract contexts that generalize functions and containers, and allow you to `map`

over them. So they may 'contain' a thing in a concrete manner, like `Arrays`

and `Dictionaries`

, but they also may contain things in an abstract way, such as how a function 'contains' the answers to all of its results. We're using it in the second sense - our parsers 'contain' their result, even though we've not provided them any input yet! How strange!

Looking at our implmentation of map (or, fmap as it is actually named), we can see that it just takes a function f :: a -> b and applies it to the a in the `(a, String)`

, turning our `Parser a`

into a `Parser b`

.

`Applicative`

is `Functor`

with a bit extra - we start being able to 'put things in' to the to the parser with `pure`

, and can `ap`

ply functions that are 'in the functor' to data that is also 'in the functor', like how `fmap`

lets us apply a function *outside of the functor* data that is 'in the functor'. `Monad`

lets us do these things in a specific order, so we can be sure that the past happens before the future `Alternative`

gives us a way of continuing with a second parser if the first parser fails, allowing us to differentiate `many`

(zero or more) from `some`

(one or more).

There's simpler ways to define some of these implementations - taking advantage of Either's Monad instance, or using list comprehensions, or do syntax - but we've kept it simple to keep it self-contained.

Congratulations! This was actually the hardest part! Although we haven't parsed a single thing yet, that's because we havent actually defined any parsers yet, although we have set up most of the machinery that we'll need to do it!

```
````instance Functor Parser where`

` fmap f p = Parser $ \ s -> case runParser p s of`

` Left e -> Left e`

` Right (a, s') -> Right (f a, s')`

`instance Applicative Parser where`

` pure a = Parser $ \ s -> Right (a, s)`

` (<*>) p q = Parser $ \ s -> case runParser p s of`

` Left e -> Left e`

` Right (f, s') -> case runParser q s' of`

` Left e -> Left e`

` Right (a, s'') -> Right (f a, s'')`

`instance Alternative Parser where`

` empty = Parser $ \ s -> Left Empty`

` (<|>) p q = Parser $ \ s -> case runParser p s of`

` Left e -> runParser q s`

` a -> a`

`instance Monad Parser where`

` (>>=) p m = Parser $ \ s -> case runParser p s of`

` Left e -> Left e`

` Right (a, s') -> runParser (m a) s'`

# Char and String parsers

At last, we get to actual parsing. Where to begin?

Well, the first parser to write, the parser that we always have, is the parser that returns the first part of any input. Since our input is a `List of Chars aka a String`

, that means we need to write the parser `anyChar :: Parser Char`

. It simply takes the first character of the string, and returns it as the result with the rest of the string as the remaining input, else erroring with an unexpected EndOfInput if the string is empty.

We can now run `parse anyChar "a"`

, and get the result `'a'`

. If we try to parse an `empty string`

, it fails with `EndOfInput`

.

This alone isn't useful - we need to be able to match our input against a predicate to have behavior determined by the input. In other words, it helps to be able to ask what a piece of input is, or if it matches our criteria. For this, we need the parser `satisfy :: (Char -> Bool) -> Parser Char`

. It is like the `anyChar`

parser, except we check the input against the predicate, returning it if it passes, and returning an Unexpected error if it fails.

We can now run `parse (satisfy isAlpha)`

and succeed on "a" but fail on "0". In fact, we can implement a slew of simple character parsers using a bunch of predicates from `Data.Char`

.

Finally, we can also use create a `string`

parser that matches a sequence of characters!

```
````anyChar :: Parser Char`

`anyChar = Parser $ \ s -> case s of`

` [] -> Left EndOfInput`

` (c:cs) -> Right (c, cs)`

`satisfy :: (Char -> Bool) -> Parser Char`

`satisfy p = Parser $ \ input -> case input of`

` [] -> Left EndOfInput`

` (c:cs) -> if p c`

` then Right (c, cs)`

` else Left (Unexpected (c:[]))`

`char :: Char -> Parser Char`

`char c = satisfy (== c)`

`oneOf :: [Char] -> Parser Char`

`oneOf cs = satisfy (flip elem cs)`

`noneOf :: [Char] -> Parser Char`

`noneOf cs = satisfy (not . flip elem cs)`

`lower :: Parser Char`

`lower = satisfy isLower`

`upper :: Parser Char`

`upper = satisfy isUpper`

`letter :: Parser Char`

`letter = satisfy isAlpha `

`digit :: Parser Char`

`digit = satisfy isDigit`

`alphaNum :: Parser Char`

`alphaNum = satisfy isAlphaNum`

`string :: String -> Parser String`

`string [] = pure []`

`string (c:cs) = (:) <$> char c <*> string cs`

`spaces :: Parser String`

`spaces = many $ oneOf " \n\r\t"`

# Combinators

With our primitive character parsers in place, it is time to start combining them.

Just as it is nice to be able to satisfy a predicate because we aren't certain what is going to be next, sometimes we want to parse more than just a single input at a time - we want to be able to parse a list of something, or maybe we want a sequence of one thing separated by another. To that end, we've implemented some of the useful combinators from `Parsec`

.

Give it a try! Run `parse (between (char '(') (char ')') (some digit)) "(123)"`

and it should yield `"123"`

!

All of this enables us to write parsers of higher and higher order, allowing us to construct more complex data types from our primatives. Try building a parser that turns a string of digits into an integer.

```
````choice :: [Parser a] -> Parser a`

`choice ps = foldr (<|>) empty ps`

`count :: Int -> Parser a -> Parser [a] `

`count n p `

` | n <= 0 = return []`

` | otherwise = sequence (replicate n p)`

`between :: Parser a -> Parser b -> Parser c -> Parser c`

`between o c a = do`

` o`

` a' <- a`

` c`

` pure a'`

`surround :: Parser a -> Parser b -> Parser b`

`surround q a = q *> a <* q`

`chainl :: Parser a -> Parser (a -> a -> a) -> a -> Parser a`

`chainl p op a = chainl1 p op <|> return a`

`chainl1 :: Parser a -> Parser (a -> a -> a) -> Parser a`

`chainl1 p op = p >>= loop where `

` loop x = (do`

` f <- op`

` y <- p`

` loop (f x y)`

` ) <|> return x`

`manySep :: Parser a -> Parser sep -> Parser [a]`

`manySep p sep = someSep p sep <|> return []`

`someSep :: Parser a -> Parser sep -> Parser [a]`

`someSep p sep = do`

` x <- p`

` xs <- many (sep >> p)`

` return (x:xs)`

`someEnd :: Parser a -> Parser sep -> Parser [a]`

`someEnd p sep = some $ do`

` x <- p`

` sep`

` return x`

`manyEnd :: Parser a -> Parser sep -> Parser [a]`

`manyEnd p sep = many $ do`

` x <- p`

` sep`

` return x`

`manySepEnd :: Parser a -> Parser sep -> Parser [a]`

`manySepEnd p sep = someSepEnd p sep <|> return []`

`someSepEnd :: Parser a -> Parser sep -> Parser [a]`

`someSepEnd p sep = do`

` x <- p`

` (do`

` sep`

` xs <- manySepEnd p sep`

` return (x:xs)`

` ) <|> return [x]`

`eof :: Parser ()`

`eof = Parser $ \ s -> case s of`

` [] -> Right ((),[])`

` _ -> Left (InputRemaining s)`

# And that's it!

Aside from a bit of nomenclature (fooBy vs manyFoo, fooBy1 vs someFoo), it's practically a drop-in replacement for Parsec in our earlier use cases!

At some point in the future, we will actually replace our use of `parsec`

with this library. Part of the reason we kept it simple and avoided use of extended features such as even classes was so that we can implement a version of this using our lambda calculus interpreter!

Stay tuned!

TODO: Add repo / source download links.

- Leo D., Undated, revised / edited / published Aug 2022