We're still under construction!

And you just found out where. We're still working on this page. How did you get in here, anyway?

Parsing JSON in 38 lines of Haskell

We build a tiny javascript object notation parser, while still remaining RFC compliant and without losing readability.

Let's get started!

First, let's get the module declaration out of the way.

haskell

module Tiny.Json where

Then, a handful of imports. We'll be using a few things from base, and parsec, a commonly-used parser-combinator library.

haskell

import Prelude
import Data.List (dropWhileEnd, intercalate)
import Control.Applicative (liftA2)
import Control.Monad (when)
import Data.Char (chr)
import Text.Parsec

We need to define our JSON data type. This is really easy.

haskell

data Json = JO [(String,Json)] | JA [Json] | JS String | JN (Int,Int) | JB Bool
    | JU deriving (Show, Eq)

Here we define some convenience functions for running our top-level json parser, which we will define next.

haskell

parseJson :: String -> Json -- To stop aggressive type checker over-inferring
parseJson str = either (error . show) id $ runJsonParser str
runJsonParser = runParser (json <* eof) () ""

Now we can get to the JSON parser. We define the top-level json parser as the sum of its subparsers.

haskell

json = spaces *> (jo <|> ja <|> js <|> jn <|> jb <|> ju) <* spaces <?> "json"

Now, we're define each of the sub-parsers.

haskell

jassoc = (,) <$> (spaces *> str) <*> (spaces *> char ':' *> json)
jo = JO <$> betweenChar '{' '}' (sepBy jassoc (char ',')) <?> "jobject"
ja = JA <$> betweenChar '[' ']' (sepBy json (char ',')) <?> "jarray"
js = JS <$> str <?> "jstring"
jn = JN <$> sci <?> "jnumber"
jb = JB <$> (string "true" $> True <|> string "false" $> False) <?> "jbool"
ju = string "null" $> JU <?> "jnil"

Then we need to define our sci scientific number parser.

haskell

sci = do
    (rsign, real) <- liftA2 (,) sign (many1 digit)
    frac <- dropWhileEnd (== '0') <$> option "" (char '.' *> many1 digit)
    (esign, exp) <- option (1,0)
        $ char 'e' *> liftA2 (,) sign (read <$> many1 digit)
    return ((rsign * read (real ++ frac), esign * exp - length frac)) <?> "sci")

And then the str escaped string parser too.

haskell

str = betweenChar '\"' '\"' (many escChar) <?> "escaped string" where
    escChar = (noneOf "\\\"\n\t\f\b\r" <?> "unescaped char")
        <|> (char '\\' *> (try unicode <|> unesc <$> esc) <?> "escaped char")
    esc = oneOf "\\/\"ntfbr" <?> "escape sequence"
    unesc x = maybe x id $ lookup x escapes
    escapes = [('n','\n'),('t','\t'),('f','\f'),('b','\b'),('r','\r')]
    unicode = do
        code <- char 'u' *> count 4 hexDigit
        when (null code) $ fail "bad unicode escape"
        (return $ chr $ read ("0x" ++ code)) <?> "unicode"

Finally, we're left with defining a few helper functions betweenChar and sign, which we use in a few places.

haskell

betweenChar a b = between (char a) (char b)
sign = option 1 (char '+' $> 1 <|> char '-' $> (-1)) <?> "sign"

All put together, it's only 38 lines of code, and it fully and properly implements the RFC JSON spec. Not too bad.

Originally, this parser was 34 lines, but a few of those lines broke the 80 character limit that I decided to impose for this article.


- Leo D., May 2022