In my last post I wrote about what the different rewrites of more advanced rules would look like. The question now is, when do you have a grammar that’s basic enough?
The answer is the basic configuration of the Marpa::XS class. In this configuration you can specify left-hand sides, right-hand sides, and the star and plus operators. If your grammar fits the following rules, then it's basic enough.
    Parser ::= Rule+
    Rule   ::= Lhs DeclareOp Rhs
    Lhs    ::= Name
    Rhs    ::= Names
    Rhs    ::= Name Star
    Rhs    ::= Name Plus
    Names  ::= Name+
This grammar consists of only 7 rules, so that's pretty good. These are the basics, but the grammar assumes a few things:
- Whitespace is ignored. The grammar has no rules for whitespace, so it only works if your tokenizer strips leading whitespace from each token before passing it to the Recognizer.
- DeclareOp, Name, Star and Plus are terminals which are recognized by the tokenizer.
- Tokens and actions are declared somewhere else. In the grammar that I used I can specify characters, regex tokens and actions. See github/MarpaX-Parser-Marpa.
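To make the first two assumptions concrete, here is a minimal tokenizer sketch in Python. It is not how MarpaX-Parser-Marpa does it; the token patterns (`::=` for DeclareOp, an identifier for Name, `*` and `+` for Star and Plus) and the function name `tokenize` are my own assumptions for illustration.

```python
import re

# Hypothetical token patterns: the terminal names come from the grammar above,
# but the concrete regexes are assumptions made for this sketch.
TOKEN_PATTERNS = [
    ("DeclareOp", re.compile(r"::=")),
    ("Name", re.compile(r"[A-Za-z_]\w*")),
    ("Star", re.compile(r"\*")),
    ("Plus", re.compile(r"\+")),
]


def tokenize(text):
    """Split grammar text into (terminal, value) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while True:
        # Strip whitespace in front of the next token, so the grammar
        # itself never has to deal with it.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        for terminal, pattern in TOKEN_PATTERNS:
            match = pattern.match(text, pos)
            if match:
                tokens.append((terminal, match.group()))
                pos = match.end()
                break
        else:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
    return tokens


print(tokenize("Names ::= Name+"))
```

The resulting list of pairs is the kind of token stream a recognizer for the seven rules above would consume, one terminal at a time.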
A grammar rewriter should be able to rewrite the advanced rules into a grammar that uses only the basic rules listed above.
I will leave you with the following example, which matches zero or one occurrences of B:

    A ::= B?  =>  A ::= Null
                  A ::= B
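A rewrite step like this could be sketched in Python as follows. This is only an illustration, not the rewriter from the project: the function name `rewrite_optional`, the representation of a rule as a left-hand side plus a list of symbol names, and the extension to right-hand sides with more than one symbol are all assumptions on my part. The `Null` symbol is taken from the example.

```python
from itertools import product

NULL = "Null"  # name of the empty symbol, as used in the example


def rewrite_optional(lhs, rhs):
    """Expand every `Symbol?` in rhs into rules with and without that symbol.

    rhs is a list of symbol names. Returns a list of (lhs, symbols) rules
    that no longer contain the `?` operator.
    """
    # For each position, list the possible choices: an optional symbol
    # may be absent (None) or present; any other symbol is always present.
    choices = []
    for symbol in rhs:
        if symbol.endswith("?"):
            choices.append((None, symbol[:-1]))
        else:
            choices.append((symbol,))

    rules = []
    for combination in product(*choices):
        symbols = [s for s in combination if s is not None]
        # An empty right-hand side is written as the Null symbol.
        rules.append((lhs, symbols or [NULL]))
    return rules


for lhs, symbols in rewrite_optional("A", ["B?"]):
    print(lhs, "::=", " ".join(symbols))
# A ::= Null
# A ::= B
```

With more symbols on the right-hand side, the same function produces one rule per combination, so `rewrite_optional("A", ["B?", "C"])` gives `A ::= C` and `A ::= B C`.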