I already wrote that I’m quite interested in what Marpa can do for writing parsers. The latest release of Marpa includes an addition that makes it practical to parse even stranger languages. The new functionality lets you control the parser from your own code without giving up the internal scanner.
To try out this new functionality in Marpa, I wrote a parser for a small language with just one function and heredocs. I chose this language because it can’t be parsed with the internal scanner alone. The scanner needs to find the end marker with the same name as the begin marker of the heredoc. The parser doesn’t know the name of the marker, just that it looks like a marker.
The code is in a repository
on GitHub. I will talk about the lines containing the expression,
heredoc and lexemes. The comments in the file explain what happens. The tests
in the t directory show a few ways in which heredocs can be combined in
statements and expressions.
The moment the parser finds the beginning of a heredoc, it pauses and passes
control to the external scanner. Where the parser pauses is controlled by the
:lexeme rule and the
The external scanner first finds the end of the line on which the beginning of the
heredoc was found. This is the position where the text of the heredoc
begins. Then it finds the name of the marker. If it finds the name, it
tries to find the same marker at the beginning of one of the following lines. The
marker and the literal are sent to the parser with two calls to the
lexeme_read method. Then it moves the
$last_heredoc_end to the end of the
line of text and the position of the input string to the position after the
beginning of the marker and passes control back to the parser with a call to
The program does this until the parser pauses again at the end of the line. Then it moves the position of the parser past the end of the heredocs, of which there may be several. This repeats until the end of the input is reached.
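To make the bookkeeping concrete, here is a minimal sketch of the external scanner's job, written in Python as plain string handling. It is not the code from the repository and it does not use Marpa at all. The `<<NAME` marker syntax, the function name `read_heredocs` and the sample input are assumptions made for illustration. Only the idea of tracking where the last heredoc ended comes from the description above.

```python
import re

# Assumed marker syntax for this sketch: "<<" followed by an identifier.
MARKER = re.compile(r"<<(\w+)")


def read_heredocs(text, line_start=0):
    """Collect every heredoc introduced on the line starting at line_start.

    Returns (bodies, resume_pos): a list of (name, body) pairs and the
    position just after the line holding the last end marker.
    """
    line_end = text.find("\n", line_start)
    if line_end == -1:
        line_end = len(text)
    # The text of the first heredoc begins on the line after the markers.
    last_heredoc_end = min(line_end + 1, len(text))
    bodies = []
    for match in MARKER.finditer(text, line_start, line_end):
        name = match.group(1)
        # The end marker must carry the same name and sit at the start
        # of one of the following lines.
        end_marker = re.compile(rf"^{re.escape(name)}[ \t]*$", re.MULTILINE)
        found = end_marker.search(text, last_heredoc_end)
        if found is None:
            raise ValueError(f"heredoc {name!r} is never terminated")
        bodies.append((name, text[last_heredoc_end:found.start()]))
        # The next heredoc (or the rest of the program) starts after
        # the line that holds this end marker.
        newline = text.find("\n", found.end())
        last_heredoc_end = len(text) if newline == -1 else newline + 1
    return bodies, last_heredoc_end


source = "say <<A, <<B;\nfirst\nA\nsecond\nB\nsay done;\n"
bodies, resume_pos = read_heredocs(source)
print(bodies)
print(repr(source[resume_pos:]))
```

In the real parser each `(name, body)` pair would be handed to the parser as lexemes instead of being collected in a list, and `resume_pos` corresponds to the position the parser jumps to once it reaches the end of the line.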
I was impressed with how short the code is and how easy it was to parse heredocs and pass control back and forth between the internal and external scanner. This allows for many new ways to parse text for all kinds of languages.