Reworking my language parser with fparsec
Since I was playing with fparsec last week, I decided to redo (or mostly) the parser for my homebrew language that I’ve previously posted about. Using fparsec made the parser surprisingly succinct and expressive. In fact I was able to do most of this in an afternoon, which is impressive consideringmy last C# attempt took 2 weeks to hammer out.
As always, it starts with the data
type Op =
| Plus
| Minus
| GreaterThan
| LessThan
| Mult
| Divide
| Carrot
type Ast =
| Statement of Ast
| Expression of Ex
| Function of string option \* Argument list option \* Ast
| Scope of Ast list option
| Class of Ex \* Ast
| Conditional of Ex \* Ast \* Ast option
| WhileLoop of Ex \* Ast
| ForLoop of Ast \* Ex \* Ex \* Ast
| Call of string \* Argument list option
| Assign of Ex \* Ex
and Ex =
| Single of Ast
| Full of Ex \* Op \* Ex
| Float of float
| Int of int
| Literal of string
| Variable of string
and Argument =
| Element of Ex
Operators
Parsing operators is trivial
let plus = pstring "+" \>\>% Plus
let minus = pstring "-" \>\>% Minus
let divide = pstring "/" \>\>% Divide
let mult = pstring "\*" \>\>% Mult
let carrot = pstring "^" \>\>% Carrot
let gt = pstring "\>" \>\>% GreaterThan
let lt = pstring "\<" \>\>% LessThan
let op = spaces \>\>. choice[plus;minus;divide;mult;carrot;gt;lt]
Expressions
But what was great was parsing expressions. These were complicated because I had to avoid left recursion, and in my C# parser I had a lot of edge conditions and had to deal with special backtracking. It was a nightmare. With FParsec you can create forward recursive parsers, basically you create a dummy variable that you use as the recursive parser. Later you populate a tied reference to it with what are the available recursive parser implementations.
// create a forward reference
// the expr is what we'll use in our parser combinators
// the exprImpl we'll populate with all the recursive options later
let expr, exprImpl = createParserForwardedToRef()
let expression1 = spaces \>\>? choice[floatNum;intNum;literal;variable]
let between a b p = pstring a \>\>. p .\>\> pstring b
let bracketExpression = expr |\> between "(" ")"
let lhExpression = choice[expression1; bracketExpression]
let expressionOperation = lhExpression \>\>=? fun operandL -\>
op \>\>=? fun operator -\>
choice[expr;bracketExpression] \>\>=? fun operandR -\>
preturn (operandL, operator, operandR) |\>\> Full
do exprImpl := spaces \>\>. choice[attempt expressionOperation;
attempt bracketExpression;
expression1]
expression1
is a type of expression that is just a single element. expressionOperation
is an expression that has an operator inbetween two expressions. To avoid left recursion, the left hand side of an expression is limited to either expressions of single elements, or expressions encapsulated in parenthesis. Then the right hand side can be either a parenthesis expression, or a regular expression. You’ll notice that expr
isn’t actually defined yet when its used here on line 16. It’s a placeholder for the recursive parser, which is tied to the mutable reference cell (exprImpl) that I populate after I’ve defined all the parsers. This lets you define a parser that can actually call itself recursively! Neat.
The >>=?
operator applies the result to the following function, but if it fails, backtracks to the beginning of that parsers state.
Scope
I defined a scope as any valid statements between two curly brackets.
let funcInners, funcInnersImpl = createParserForwardedToRef()
let scope = parse{
do! spaces
do! skipStringCI "{"
do! spaces
let! text = opt funcInners
do! spaces
do! skipStringCI "}"
do! spaces
return Scope(text)
}
The forward parser here is going to be populated at the very end of my parser, since it will allow for any kind of statement like while loops, for loops, conditionals, assignment, etc. All the scope parser cares about is that it got stuff between curly brackets. Using this we can leverage it anywhere else that has curly brackets. Later on in the program I set this up:
(\* things that can be in functions \*)
do funcInnersImpl := many1 (spaces \>\>? choice [scope; func; statement])
Which allows scopes, statements (delineated by semicolons), or other functions, to appear inside of a function or scope. So you can see how a scope and function can be recursive (by containing other scopes and functions inside of them).
Anyways, lets parse a function
let innerArgs = sepEndBy1 (expr |\>\> Element) (pstring ",")
let arguments = innerArgs |\> between "(" ")"
let func = parse {
do! skipStringCI "func"
do! spaces
let! name = opt (many1Chars (satisfy isAsciiLetter))
let! arguments = opt arguments
do! spaces
do! skipStringCI "-\>"
let! scope = scope
return Function(name, arguments, scope)
}
Conditionals
Conditionals were fun, because you can have an if statement, an if/else, or an if/elseif, or an if/elseif/…/else combo. In my previous C# parser I covered each type independently so I had a lot of extra overlap, but this time I wanted to see if I could create an aggregate parser combinator to handle all these scenarios for me in one.
let conditionalParser, conditionalParserImpl = createParserForwardedToRef()
let ifBlock = parse{
do! skipStringCI "if"
let! condition = expr |\> between "(" ")"
do! spaces
let! onTrue = scope
do! spaces
let elseKeyword = skipStringCI "else" .\>\> spaces
let elseParse = parse{
do! elseKeyword
let! onFalse = scope
return (condition, onTrue, Some(onFalse)) |\> Conditional
}
let elseIfParse = parse{
do! elseKeyword
let! onFalse = conditionalParser
return (condition, onTrue, Some(onFalse)) |\> Conditional
}
let noElseParse = parse{
return (condition, onTrue, None) |\> Conditional
}
let! result = choice[attempt elseIfParse;elseParse;noElseParse]
return result
}
This time, I created a recursive parser that optionally removes an else statement and captures the scope, or removes the else element and calls back into the if parser, or just terminates (so an if with no else). The final result is a 3 way alternative that the if block can evaluate.
Loops
Here is a while loop
let whileLoop = (pstring "while" \>\>. spaces) \>\>. (expr |\> between "(" ")") \>\>= fun predicate -\>
scope \>\>= fun body -\>
preturn (WhileLoop(predicate, body))
And here is a for loop
let assign = parse{
let! ex = expr
do! spaces
do! skipStringCI "="
do! spaces
let! assignEx = expr
do! spaces
return (ex, assignEx) |\> Assign
}
let forLoop =
let startCondition = assign .\>\> pstring ";"
let predicate = expr .\>\> pstring ";"
let endCondition = expr
let forKeyword = pstring "for" .\>\> spaces
let forItems = tuple3 startCondition predicate endCondition |\> between "(" ")"
forKeyword \>\>. forItems .\>\>. scope \>\>= fun ((start, predicate, end), body) -\>
preturn (start, predicate, end, body) |\>\> ForLoop
Which gives you a result like this
test "for(x=1;y\<z;y+1){}";;
val it : Ast list =
[ForLoop
(Assign (Variable "x",Float 1.0),
Full (Variable "y",LessThan,Variable "z"),
Full (Variable "y",Plus,Float 1.0),Scope null)]
Conclusion
My fiance is probably pissed that I spent 4th of july working on parser combinators, but fparsec is just too fun not to. I really can’t wait for an opportunity to use this for some production code, since I’m extremely happy with fparsecs abilities and the experience of working in it.
For the full parser check out this fsharp snippet.