We're still under construction!
And you just found out where. We're still working on this page. How did you get in here, anyway?
Writing a Parser Combinator Library in Haskell
We write our own stripped-down version of the popular parsec library.
Parser Combinators real quick
Parser-combinators are what truly got me into Haskell. I saw a video of a professor giving a lecture on building a parser from scratch. He did it with just a handful of code, over the course of an hour, explaining everything along the way.
I was at the time working with Erlang due to my interest in distributed computing and filesystems, but I was already entertaining the idea of learning haskell. There was a distributed computing library somewhat modeled after Erlang that I wanted to check out (the now-defunct cloud-haskell project, which I have carried on its spirit). This video made me jump ship entirely.
So what are parser combinators? They're a great way of building larger parsers our of bits of smaller parsers. You take one parser and combine it with another parser. Parser combinators! Quite often, you can end up with a parser that is roughly in the shape of the data it is parsing, which is a rather neat and useful structural trick.
So with that out of the way, let's get to defining our project cabal
file! It's rather short, being 90% boilerplate. We only are using the base
library, and we only have the one Lambda.Parser
module.
With the project file done, there's nowhere to go now except to the code.
NOTE: We're going to try and keep it rather simple, and try to avoid using higher-level haskell concepts in favor of conceptual clarity.
cabal project file - lc-parser.cabal
cabal-version: 3.0
name: lc-parser
version: 0.0.1
license: NONE
author: Leo
maintainer: leo@apotheca.io
build-type: Simple
extra-source-files: README.md
library
exposed-modules:
Lambda.Parser
default-extensions:
FlexibleContexts
NoImplicitPrelude
OverloadedStrings
build-depends:
base >= 4 && < 5
hs-source-dirs: src
default-language: Haskell2010
Module and Imports
First up, we have our module declaration and imports. We're not doing anything really fancy, so there's not all that many of them
We pull in Prelude
of course, and then some standard data types Bool
, Char
, Either
and Maybe
. This gives us the majority of what we need for building our parser.
We also pull in Applicative
and Monad
for a few important bits used for gluing parsers together - we'll cover the whole Functor-Applicative-Monad thing in depth some day.
Finally, we pull in a cheeky little first
function from Bifunctor
so we can map over the first element of a tuple later. We could use first
from Arrow
, as those two functions happen to coincide for tuples, but Arrow
generalizes in a different direction than what we want so we'll stick with Bifunctor
.
haskell - src/Lambda/Parser.hs
module Lambda.Parser where
import Prelude
import Data.Bool
import Data.Char
import Data.Either
import Data.Maybe
import Control.Applicative
import Control.Monad
import Data.Bifunctor (first)
Parser types
Next, we define our parser data type, plus a supporting error type.
So what is a parser anyway? In a broad sense, just it is a function that reads in some input, and produces some output, returning the result along with the rest of the remaining input. Then, the next parser can continue where the first one left off.
This corresponds to a type that looks like String -> (a, String)
, which works well so long as you only give it input that doesn't fail to parse. This makes it a bit nitpicky to use (not unlike partial functions), as in the real world, we don't always know if a parser is going to succeed ahead of time.
So in practice, our parser will handle the possibility of errors by wrapping the returned (a, String)
with an Either ParserError
. We could throw an error or exception instead of explicit error handling and keep the String -> (a, String)
type clean of mentioning the possibility of errors, but that would require hiding the error using the fact that the Parser is a Monad, and would involve invoking the dreaded exception-catching machinery which is out of scope for this article.
With that explanation out of the way, our Parser
data type is just a newtype wrapper around a String -> Either ParseError (a, String)
function. This makes it easy to just say Parser a
instead of constantly copy-pasting String -> Either ParseError (a, String)
everywhere. We also define our ParseError
which will tell us a bit about what went wrong in the case that things do. We also define a quick parse
function for completely ignoring the error handling that we just talked about.
newtype Parser a = Parser { runParser :: String -> Either ParseError (a, String) }
data ParseError
= EndOfInput
| InputRemaining String
| Unexpected String
| Failure String
| Empty
deriving (Show)
parse :: Parser a -> String -> a
parse p s = case runParser p s of
Left e -> error $ show e
Right (a,[]) -> a
Right (a,s) -> error "unexpected input remaining"
Parser class instances
Now, we have a bit of boilerplate to define. As I said earlier, we'll cover the Functor
Applicative
Monad
stack and related concepts more fully elsewhere, but we are going to implement some conforming instances and use them, so we'll briefly touch on their concepts anyway. We've got to define this stuff because it lets us glue our parsers together easily.
Functors
are abstract contexts that generalize functions and containers, and allow you to map
over them. So they may 'contain' a thing in a concrete manner, like Arrays
and Dictionaries
, but they also may contain things in an abstract way, such as how a function 'contains' the answers to all of its results. We're using it in the second sense - our parsers 'contain' their result, even though we've not provided them any input yet! How strange!
Looking at our implmentation of map (or, fmap as it is actually named), we can see that it just takes a function f :: a -> b and applies it to the a in the (a, String)
, turning our Parser a
into a Parser b
.
Applicative
is Functor
with a bit extra - we start being able to 'put things in' to the to the parser with pure
, and can ap
ply functions that are 'in the functor' to data that is also 'in the functor', like how fmap
lets us apply a function outside of the functor data that is 'in the functor'. Monad
lets us do these things in a specific order, so we can be sure that the past happens before the future Alternative
gives us a way of continuing with a second parser if the first parser fails, allowing us to differentiate many
(zero or more) from some
(one or more).
There's simpler ways to define some of these implementations - taking advantage of Either's Monad instance, or using list comprehensions, or do syntax - but we've kept it simple to keep it self-contained.
Congratulations! This was actually the hardest part! Although we haven't parsed a single thing yet, that's because we havent actually defined any parsers yet, although we have set up most of the machinery that we'll need to do it!
instance Functor Parser where
fmap f p = Parser $ \ s -> case runParser p s of
Left e -> Left e
Right (a, s') -> Right (f a, s')
instance Applicative Parser where
pure a = Parser $ \ s -> Right (a, s)
(<*>) p q = Parser $ \ s -> case runParser p s of
Left e -> Left e
Right (f, s') -> case runParser q s' of
Left e -> Left e
Right (a, s'') -> Right (f a, s'')
instance Alternative Parser where
empty = Parser $ \ s -> Left Empty
(<|>) p q = Parser $ \ s -> case runParser p s of
Left e -> runParser q s
a -> a
instance Monad Parser where
(>>=) p m = Parser $ \ s -> case runParser p s of
Left e -> Left e
Right (a, s') -> runParser (m a) s'
Char and String parsers
At last, we get to actual parsing. Where to begin?
Well, the first parser to write, the parser that we always have, is the parser that returns the first part of any input. Since our input is a List of Chars aka a String
, that means we need to write the parser anyChar :: Parser Char
. It simply takes the first character of the string, and returns it as the result with the rest of the string as the remaining input, else erroring with an unexpected EndOfInput if the string is empty.
We can now run parse anyChar "a"
, and get the result 'a'
. If we try to parse an empty string
, it fails with EndOfInput
.
This alone isn't useful - we need to be able to match our input against a predicate to have behavior determined by the input. In other words, it helps to be able to ask what a piece of input is, or if it matches our criteria. For this, we need the parser satisfy :: (Char -> Bool) -> Parser Char
. It is like the anyChar
parser, except we check the input against the predicate, returning it if it passes, and returning an Unexpected error if it fails.
We can now run parse (satisfy isAlpha)
and succeed on "a" but fail on "0". In fact, we can implement a slew of simple character parsers using a bunch of predicates from Data.Char
.
Finally, we can also use create a string
parser that matches a sequence of characters!
anyChar :: Parser Char
anyChar = Parser $ \ s -> case s of
[] -> Left EndOfInput
(c:cs) -> Right (c, cs)
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser $ \ input -> case input of
[] -> Left EndOfInput
(c:cs) -> if p c
then Right (c, cs)
else Left (Unexpected (c:[]))
char :: Char -> Parser Char
char c = satisfy (== c)
oneOf :: [Char] -> Parser Char
oneOf cs = satisfy (flip elem cs)
noneOf :: [Char] -> Parser Char
noneOf cs = satisfy (not . flip elem cs)
lower :: Parser Char
lower = satisfy isLower
upper :: Parser Char
upper = satisfy isUpper
letter :: Parser Char
letter = satisfy isAlpha
digit :: Parser Char
digit = satisfy isDigit
alphaNum :: Parser Char
alphaNum = satisfy isAlphaNum
string :: String -> Parser String
string [] = pure []
string (c:cs) = (:) <$> char c <*> string cs
spaces :: Parser String
spaces = many $ oneOf " \n\r\t"
Combinators
With our primitive character parsers in place, it is time to start combining them.
Just as it is nice to be able to satisfy a predicate because we aren't certain what is going to be next, sometimes we want to parse more than just a single input at a time - we want to be able to parse a list of something, or maybe we want a sequence of one thing separated by another. To that end, we've implemented some of the useful combinators from Parsec
.
Give it a try! Run parse (between (char '(') (char ')') (some digit)) "(123)"
and it should yield "123"
!
All of this enables us to write parsers of higher and higher order, allowing us to construct more complex data types from our primatives. Try building a parser that turns a string of digits into an integer.
choice :: [Parser a] -> Parser a
choice ps = foldr (<|>) empty ps
count :: Int -> Parser a -> Parser [a]
count n p
| n <= 0 = return []
| otherwise = sequence (replicate n p)
between :: Parser a -> Parser b -> Parser c -> Parser c
between o c a = do
o
a' <- a
c
pure a'
surround :: Parser a -> Parser b -> Parser b
surround q a = q *> a <* q
chainl :: Parser a -> Parser (a -> a -> a) -> a -> Parser a
chainl p op a = chainl1 p op <|> return a
chainl1 :: Parser a -> Parser (a -> a -> a) -> Parser a
chainl1 p op = p >>= loop where
loop x = (do
f <- op
y <- p
loop (f x y)
) <|> return x
manySep :: Parser a -> Parser sep -> Parser [a]
manySep p sep = someSep p sep <|> return []
someSep :: Parser a -> Parser sep -> Parser [a]
someSep p sep = do
x <- p
xs <- many (sep >> p)
return (x:xs)
someEnd :: Parser a -> Parser sep -> Parser [a]
someEnd p sep = some $ do
x <- p
sep
return x
manyEnd :: Parser a -> Parser sep -> Parser [a]
manyEnd p sep = many $ do
x <- p
sep
return x
manySepEnd :: Parser a -> Parser sep -> Parser [a]
manySepEnd p sep = someSepEnd p sep <|> return []
someSepEnd :: Parser a -> Parser sep -> Parser [a]
someSepEnd p sep = do
x <- p
(do
sep
xs <- manySepEnd p sep
return (x:xs)
) <|> return [x]
eof :: Parser ()
eof = Parser $ \ s -> case s of
[] -> Right ((),[])
_ -> Left (InputRemaining s)
And that's it!
Aside from a bit of nomenclature (fooBy vs manyFoo, fooBy1 vs someFoo), it's practically a drop-in replacement for Parsec in our earlier use cases!
At some point in the future, we will actually replace our use of parsec
with this library. Part of the reason we kept it simple and avoided use of extended features such as even classes was so that we can implement a version of this using our lambda calculus interpreter!
Stay tuned!
TODO: Add repo / source download links.
- Leo D., Undated, revised / edited / published Aug 2022