We're still under construction!
And you just found out where. We're still working on this page. How did you get in here, anyway?
Parsing JSON in 38 lines of Haskell
We build a tiny javascript object notation parser, while still remaining RFC compliant and without losing readability.
Let's get started!
First, let's get the module declaration out of the way.
haskell
module Tiny.Json where
Then, a handful of imports. We'll be using a few things from base
, and parsec
, a commonly-used parser-combinator library.
haskell
import Prelude
import Data.List (dropWhileEnd, intercalate)
import Control.Applicative (liftA2)
import Control.Monad (when)
import Data.Char (chr)
import Text.Parsec
We need to define our JSON data type. This is really easy.
haskell
data Json = JO [(String,Json)] | JA [Json] | JS String | JN (Int,Int) | JB Bool
| JU deriving (Show, Eq)
Here we define some convenience functions for running our top-level json
parser, which we will define next.
haskell
parseJson :: String -> Json -- To stop aggressive type checker over-inferring
parseJson str = either (error . show) id $ runJsonParser str
runJsonParser = runParser (json <* eof) () "
"
Now we can get to the JSON parser. We define the top-level json
parser as the sum of its subparsers.
haskell
json = spaces *> (jo <|> ja <|> js <|> jn <|> jb <|> ju) <* spaces <?> "json"
Now, we're define each of the sub-parsers.
haskell
jassoc = (,) <$> (spaces *> str) <*> (spaces *> char ':' *> json)
jo = JO <$> betweenChar '{' '}' (sepBy jassoc (char ',')) <?> "jobject"
ja = JA <$> betweenChar '[' ']' (sepBy json (char ',')) <?> "jarray"
js = JS <$> str <?> "jstring"
jn = JN <$> sci <?> "jnumber"
jb = JB <$> (string "true" $> True <|> string "false" $> False) <?> "jbool"
ju = string "null" $> JU <?> "jnil"
Then we need to define our sci
scientific number parser.
haskell
sci = do
(rsign, real) <- liftA2 (,) sign (many1 digit)
frac <- dropWhileEnd (== '0') <$> option "" (char '.' *> many1 digit)
(esign, exp) <- option (1,0)
$ char 'e' *> liftA2 (,) sign (read <$> many1 digit)
return ((rsign * read (real ++ frac), esign * exp - length frac)) <?> "sci")
And then the str
escaped string parser too.
haskell
str = betweenChar '\"' '\"' (many escChar) <?> "escaped string" where
escChar = (noneOf "\\\"\n\t\f\b\r" <?> "unescaped char")
<|> (char '\\' *> (try unicode <|> unesc <$> esc) <?> "escaped char")
esc = oneOf "\\/\"ntfbr" <?> "escape sequence"
unesc x = maybe x id $ lookup x escapes
escapes = [('n','\n'),('t','\t'),('f','\f'),('b','\b'),('r','\r')]
unicode = do
code <- char 'u' *> count 4 hexDigit
when (null code) $ fail "bad unicode escape"
(return $ chr $ read ("0x" ++ code)) <?> "unicode"
Finally, we're left with defining a few helper functions betweenChar
and sign
, which we use in a few places.
haskell
betweenChar a b = between (char a) (char b)
sign = option 1 (char '+' $> 1 <|> char '-' $> (-1)) <?> "sign"
All put together, it's only 38 lines of code, and it fully and properly implements the RFC JSON spec. Not too bad.
Originally, this parser was 34 lines, but a few of those lines broke the 80 character limit that I decided to impose for this article.
- Leo D., May 2022