Peter Stuifzand

Parse a heredoc with Marpa

I have written before about how interested I am in what Marpa can do for writing parsers. The latest release of Marpa adds a feature that makes it practical to parse even stranger languages. The new functionality lets you control the parser from your own code without giving up the internal scanner.

To try out this new functionality I wrote a parser for a small language with just one function and heredocs. I chose this language because it can’t be parsed with the internal scanner alone. The scanner needs to find the end marker that has the same name as the begin marker of the heredoc. The parser doesn’t know the name of the marker, only what a marker looks like.

The code is in a repository on GitHub. I will talk about the lines containing the expression, heredoc and lexemes. The comments in the file explain what happens. The tests in the t directory show a few ways in which heredocs can be combined in statements and expressions.

The moment the parser finds the beginning of a heredoc it pauses and passes control to the external scanner, which is my own code. Where the parser pauses is controlled by the :lexeme rule and its pause adverb.

The external scanner first finds the end of the line on which the beginning of the heredoc was found. This is the position where the text of the heredoc begins. Then it finds the name of the marker. If it finds a name, it looks for that marker at the beginning of one of the following lines. The marker and the literal are sent to the parser with two calls to the lexeme_read method. Then the scanner moves $last_heredoc_end to the end of the line containing the end marker, sets the position in the input string to just after the begin marker, and passes control back to the parser with a call to the resume method.
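The steps of the external scanner can be sketched without Marpa at all. The Python below is only an illustration of the string handling, not the code from the repository: the function name scan_heredoc, the `<<NAME` operator syntax and the rule that the end marker must stand alone on its line are my assumptions. In the real parser the two returned tokens would go to lexeme_read and the returned position to resume.

```python
import re


def scan_heredoc(text, op_pos, body_start=None):
    """Scan one heredoc whose '<<' operator starts at op_pos.

    Hypothetical helper: returns (marker, body, resume_at, body_end).
    resume_at is the offset just after the begin marker, body_end is the
    offset just after the line that holds the end marker.
    """
    # The marker name directly follows the operator (assumed syntax: <<NAME).
    begin = re.compile(r"<<(\w+)").match(text, op_pos)
    if begin is None:
        raise ValueError("no heredoc marker at offset %d" % op_pos)
    marker = begin.group(1)

    # The heredoc text begins after the end of the line with the operator,
    # unless the caller says an earlier heredoc already used some lines.
    if body_start is None:
        newline = text.find("\n", op_pos)
        if newline == -1:
            raise ValueError("heredoc %r has no body" % marker)
        body_start = newline + 1

    # Look for the marker at the beginning of one of the following lines.
    # Assumption: the marker has to be the only thing on that line.
    terminator = re.compile(r"^%s$" % re.escape(marker), re.MULTILINE)
    end = terminator.search(text, body_start)
    if end is None:
        raise ValueError("heredoc %r is never closed" % marker)

    body = text[body_start:end.start()]
    body_end = min(end.end() + 1, len(text))  # step over the newline
    return marker, body, begin.end(), body_end
```

For the input `say(<<END);` followed by two lines of text and a line `END`, this returns the marker `END`, the two lines of text as the body, the offset of `);` as the place to resume, and the offset of the first line after `END`.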

The program does this until the parser pauses again at the end of the line. Then it moves the position of the parser to the end of the heredocs, of which there may be several. This repeats until the end of the input is reached.
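The loop as a whole, including several heredocs on one line, can be sketched in the same way. Again this is a Marpa-free illustration under my own assumptions: a regular expression stands in for the parser that recognizes the `<<NAME` operator, collect_heredocs is a made-up name, and the end marker must be alone on its line. The variable last_heredoc_end plays the same role as $last_heredoc_end in the description above.

```python
import re

# Stand-in for the parser: it only recognizes the begin marker (assumed syntax).
OPERATOR = re.compile(r"<<(\w+)")


def collect_heredocs(text):
    """Return (code_lines, heredocs) for the whole input.

    code_lines are the lines the parser itself would read; heredocs is a
    list of (marker, body) pairs in the order the operators appear.
    """
    code_lines, heredocs = [], []
    pos = 0
    while pos < len(text):
        newline = text.find("\n", pos)
        line_end = len(text) if newline == -1 else newline
        code_lines.append(text[pos:line_end])

        # The first heredoc of a line starts right after that line.
        last_heredoc_end = min(line_end + 1, len(text))
        for op in OPERATOR.finditer(text, pos, line_end):
            marker = op.group(1)
            terminator = re.compile(r"^%s$" % re.escape(marker), re.MULTILINE)
            end = terminator.search(text, last_heredoc_end)
            if end is None:
                raise ValueError("heredoc %r is never closed" % marker)
            heredocs.append((marker, text[last_heredoc_end:end.start()]))
            # The next heredoc on this line starts after this end marker.
            last_heredoc_end = min(end.end() + 1, len(text))

        # End of the line: jump over all heredoc text collected for it.
        pos = last_heredoc_end
    return code_lines, heredocs
```

A line such as `say(<<A, <<B);` therefore reads the text for A first, then the text for B from where A stopped, and only then continues with the next line of code. The sketch ignores things a real grammar handles, such as a `<<` inside a string.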

I was impressed with how short the code is and how easy it was to parse heredocs and pass control back and forth between the internal and external scanner. This allows for many new ways to parse text for all kinds of languages.

© 2023 Peter Stuifzand