The writings of Peter Stuifzand

Archive for April 2012

I had a nice idea a few days ago. A simple module that creates the rules part of a Marpa grammar, but without code generation and lexers. Just the bit that Marpa::XS::Grammar uses.

So I wrote it in a few minutes. It's not really complicated and provides an example how to write a short parser.

Lets start with a short example of what you would write.

use Marpa::XS;
use MarpaX::Simple::Rules 'parse_rules';


my $rules = parse_rules(<<"RULES");
RULES

my $grammar = Marpa::XS::Grammar->new({
    rules => $rules,
    ...
});

At the ... you should add more arguments to Marpa::XS depending on how you want to use it. Now let's write a simple example of some rules.

my $rules = parse_rules(<<"RULES");
expr   ::= term
term   ::= term plus term
term   ::= factor
factor ::= factor mul factor
factor ::= number
RULES

These rules don't yet contain actions, so Marpa will call the default_action or a function named like that left hand side name. For information you should take a look at the Marpa::XS documentation.

Now let's some specific action to all the rules. We do this by adding => action_name to the end of each of the rules.

my $rules = parse_rules(<<"RULES");
expr   ::= term                  => arg_0
term   ::= term plus term        => plus
term   ::= factor                => arg_0
factor ::= factor mul factor     => mul
factor ::= number                => arg_0
RULES

No functions are provided for you. The names like arg_0 are just generic functions that you should write yourself. If you create a package that uses this module you could use the actions argument for the grammar.

my $grammar = Marpa::XS::Grammar->new({
    ...
    actions => __PACKAGE__,
    ...
});

If you write the functions like the following, the Marpa will call the right thing.

package YYY;
...

sub arg_0 {
    shift; return $_[0];
}
...

I will release the code to CPAN so it will appear somewhat later, but you can already take a look at it at github/MarpaX::Simple::Rules.

The three problems that need to be fixed are subscription, feed reading and feed creation.

Subscription

Subscribing to feeds is really hard. It should be really simple. Unsubscribing should be just as simple.

  1. See something that you want to subscribe to? Click the subscribe button.
  2. Done!

Unsubscribing should be just as easy.

  1. You see a post in your feed reader that you don't want to see.
  2. Click Hide.
  3. Show an option for unsubscription.
  4. Click, gone!

Or:

  1. Visit the page you want unsubscribe from in your browser.
  2. Click the unsubscribe button.
  3. No step three!

That's easy. Why isn't this possible yet?

Feed reading

Reading a feed should also be really simple. Dave Winer has the River of News. A small paragraph with optional title, description and link. That's how simple it should be.

Do you see something that you want to read? Read it. If you don't want to read it, just scroll further for other posts.

Nothing is remembered or saved. Unless you want it too. Click 'star', 'favorite' or 'bookmark', and it is saved in a to read feed, so you can find it later.

Feed creation

You have a feed. Write an update in a textarea. Click 'Publish'. Done. It appears in your feed and the rivers of everyone who subscribed.

These three parts are easy and important.

In my last post I wrote about what the different rewrites of more advanced rules would look like. The question now is, when do you have a grammar that's basic enough?

This is the basic configuration of the Marpa::XS class. In this configuration you can specify left hand sides and right hand sides and the star and plus operator. If your tree looks like this, then it's basic enough.

Parser    ::= Rule+
Rule      ::= Lhs DeclareOp Rhs
Lhs       ::= Name
Rhs       ::= Names
Rhs       ::= Name Star
Rhs       ::= Name Plus
Names     ::= Name+

This grammar consists of only 7 lines, so that's pretty good. This are the basics, but it assumes a few things.

  1. Ignores whitespace. This grammar completely ignores whitespace. It will work if your tokenizer removes whitespace from the front of each token before it passes it to the Recognizer.
  2. DeclareOp, Name, Star and Plus are terminals which are recognized by the tokenizer.
  3. Tokens and actions are declared somewhere else. In the grammar that I used I can specify characters, regex tokens and actions. See github/MarpaX-Parser-Marpa.

A grammar rewriter should be able to rewrite the advanced rules to a grammar that looks like this.

I will leave you with the following example, that tries to match 0 or 1 occurences of B.

A     ::= B?         => A ::= Null
                        A ::= B 

Let's try to write some rewrite rules for a next level Marpa to a lower level Marpa. The syntax is not important, this is about how we can rewrite the rules.

In the actual grammars none of the left side patterns work, but all of the right side rewritten version should work.

Alternations

An alternation is a rule that can match either way B, C or D.

A ::= B | C     =>  A ::= B
                    A ::= C

A ::= B | C | D =>  A ::= B
                    A ::= C
                    A ::= D

Plus, Star

Plus means match one or more times, star means match zero of more times. Normally you can't use + or * inside of a rule. Only whole rules can be used like this. If you want to use these you need to create a new rule and use it inside.

A ::= B+ C      =>  A   ::= SR0 C
                    SR0 ::= B+

Subrule

A subrule is a rule that is part of another rule. A subrule behaves as if it is a rule.

A ::= (B C) D   =>  A   ::= SR0 D
                    SR0 ::= B C

A ::= (B C)+ D  =>  A   ::= SR0 D
                    SR0 ::= SR1+
                    SR1 ::= B C

Count

Match exactly n times.

A ::= B{n}

A ::= B{1}      =>  A   ::= B
A ::= B{2}      =>  A   ::= B B
A ::= B{10}     =>  A   ::= B B B B B B B B B B

Min

Match at least min times.

A ::= B{min,}

A ::= B{2,}     =>  A   ::= B B
                    A   ::= B B SR0
                    SR0 ::= B*

Max

Match no more than max times.

A ::= B{,max}

A ::= B{,10}    =>  A   ::= Null
                    A   ::= B
                    A   ::= B B
                    A   ::= B B B
                    A   ::= B B B B
                    A   ::= B B B B B
                    A   ::= B B B B B B
                    A   ::= B B B B B B B
                    A   ::= B B B B B B B B
                    A   ::= B B B B B B B B B
                    A   ::= B B B B B B B B B B

Min, Max

Match at least min times, but no more than max times.

A ::= B{min, max}

A ::= B{2,5}    =>  A   ::= B B
                    A   ::= B B B
                    A   ::= B B B B
                    A   ::= B B B B B

I will keep the namespaces, modules and includes in my brain for another time.