What's allowed in the basic Marpa grammar

In my last post I wrote about what the different rewrites of more advanced rules would look like. The question now is, when do you have a grammar that's basic enough?

This is the basic configuration of the Marpa::XS class. In this configuration you can specify left-hand sides, right-hand sides, and the star and plus operators. If your grammar can be parsed by the following rules, then it's basic enough.

Parser    ::= Rule+
Rule      ::= Lhs DeclareOp Rhs
Lhs       ::= Name
Rhs       ::= Names
Rhs       ::= Name Star
Rhs       ::= Name Plus
Names     ::= Name+

This grammar consists of only 7 lines, so that's pretty good. These are the basics, but they assume a few things.

  1. Whitespace is ignored. The grammar has no rules for whitespace, so it only works if your tokenizer strips the leading whitespace from each token before passing it to the Recognizer.
  2. DeclareOp, Name, Star and Plus are terminals that are recognized by the tokenizer.
  3. Tokens and actions are declared somewhere else. In the grammar that I used I can specify characters, regex tokens and actions. See github/MarpaX-Parser-Marpa.
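
To make the first two assumptions concrete, here is a minimal tokenizer sketch in Python. It is not the tokenizer from MarpaX-Parser-Marpa; the function name `tokenize`, the spelling of DeclareOp as `::=`, and the pattern used for Name are assumptions made for illustration. It skips the whitespace in front of each token and emits only the four terminals named above.

```python
import re

# Assumed spellings of the four terminals (hypothetical, for illustration only).
TOKEN = re.compile(
    r"(?P<DeclareOp>::=)|(?P<Name>[A-Za-z_]\w*)|(?P<Star>\*)|(?P<Plus>\+)"
)
WHITESPACE = re.compile(r"\s*")

def tokenize(text):
    """Return a list of (token_type, value) pairs for one grammar source string."""
    tokens = []
    pos = 0
    while True:
        # Strip whitespace from the front of the next token.
        pos = WHITESPACE.match(text, pos).end()
        if pos >= len(text):
            break
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        # lastgroup is the name of the alternative that matched.
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Anything outside the four terminals, such as the `?` of an advanced rule, raises an error here, which is one simple way to notice that a grammar has not been rewritten yet.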

A grammar rewriter should be able to rewrite the advanced rules to a grammar that looks like this.
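
One way to test the output of such a rewriter is a small checker that accepts a rule only if it fits the basic grammar. The sketch below is an assumption-laden illustration, not part of Marpa: `is_basic_rule` is a hypothetical helper, it expects one rule per line, `::=` as the DeclareOp, and names made of word characters.

```python
import re

NAME = re.compile(r"[A-Za-z_]\w*")  # assumed shape of a Name

def is_basic_rule(line):
    """Return True if `line` fits Rule ::= Lhs DeclareOp Rhs of the basic grammar."""
    parts = line.split()
    # A rule needs an Lhs, the DeclareOp and at least one Rhs symbol.
    if len(parts) < 3 or parts[1] != "::=":
        return False
    lhs, rhs = parts[0], parts[2:]
    if not NAME.fullmatch(lhs):
        return False
    # Split an attached operator off the last symbol: "Rule+" becomes "Rule", "+".
    if len(rhs[-1]) > 1 and rhs[-1][-1] in "*+":
        rhs = rhs[:-1] + [rhs[-1][:-1], rhs[-1][-1]]
    if rhs[-1] in ("*", "+"):
        # Rhs ::= Name Star or Rhs ::= Name Plus: exactly one name before the operator.
        return len(rhs) == 2 and NAME.fullmatch(rhs[0]) is not None
    # Rhs ::= Names: one or more plain names.
    return all(NAME.fullmatch(sym) for sym in rhs)
```

With this, `is_basic_rule("Parser ::= Rule+")` and `is_basic_rule("Rule ::= Lhs DeclareOp Rhs")` hold, while a rule such as `A ::= B?` is rejected and still needs rewriting.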

I will leave you with the following example, which matches 0 or 1 occurrences of B.

A     ::= B?         => A ::= Null
                        A ::= B 
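
The same rewrite can be sketched as a small Python function. This is only an illustration under assumptions: `rewrite_optional` is a hypothetical name, a rule is represented as a pair of a left-hand side and a list of right-hand-side symbols, and `Null` is taken from the example above without showing how it would be declared to Marpa.

```python
def rewrite_optional(lhs, rhs):
    """Rewrite `lhs ::= name?` into two basic rules; pass other rules through."""
    if len(rhs) == 1 and len(rhs[0]) > 1 and rhs[0].endswith("?"):
        name = rhs[0][:-1]
        # Zero occurrences become the Null alternative, one occurrence the plain name.
        return [(lhs, ["Null"]), (lhs, [name])]
    return [(lhs, rhs)]

# The example from the post: A ::= B?
for rule_lhs, rule_rhs in rewrite_optional("A", ["B?"]):
    print(rule_lhs, "::=", " ".join(rule_rhs))
```

Rules that are already basic come back unchanged, so the function can be applied to every rule of a grammar.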


