Grok - Peter Stuifzand

In this video, Steve Yegge talks about a project he’s working on called Grok. It seems ambitious and useful.

I’ve worked on a few parsers over the last few months, and it seems that most of the work still remains even after you’ve got a parse tree for a language.

Yegge explains that they don’t write the parsers themselves, but reuse the work compiler builders have already done. This is a good idea, because writing a parser isn’t easy, even if you have really good tools to build one with.

After you have a parse tree, you still need to generate code or evaluate the tree to do the actual work.
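To make that remaining work a bit more concrete, here is a minimal Perl sketch of the evaluation half. It assumes a hypothetical tree format where a leaf is a number and an inner node is an array ref of [operator, left, right]. Real parse trees carry much more (node types, source positions, scopes), so this only shows the shape of the idea.

```perl
use strict;
use warnings;

# Hypothetical tree format: a plain number is a leaf, an inner node is
# an array ref [operator, left, right].
my %ops = (
    '+' => sub { $_[0] + $_[1] },
    '-' => sub { $_[0] - $_[1] },
    '*' => sub { $_[0] * $_[1] },
    '/' => sub { $_[0] / $_[1] },
);

# Walk the tree recursively: leaves evaluate to themselves, inner nodes
# evaluate both children first and then apply their operator.
sub evaluate {
    my ($node) = @_;
    return $node unless ref $node;
    my ($op, $left, $right) = @$node;
    my $code = $ops{$op} or die "unknown operator '$op'";
    return $code->(evaluate($left), evaluate($right));
}

# The tree a parser might build for "1 + 2 * 3".
my $tree = ['+', 1, ['*', 2, 3]];
print evaluate($tree), "\n";    # prints: 7
```

Generating code follows the same pattern: instead of computing a value at each node, you emit text for it.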

It’s the same problem you have when you’re parsing HTML: why would you write that code (badly) when someone else already did it for you?

You have a piece of HTML and want to extract part of it. You start with regexes, and people start yelling at you for using the wrong tool for the job. So you move on to a better tool.
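A small illustration of why the regex approach breaks down, using only core Perl and made-up inputs: nested elements defeat a non-greedy match, and switching to a greedy match just moves the failure to sibling elements.

```perl
use strict;
use warnings;

# Nested markup: a <div> inside another <div>.
my $html = '<div>outer <div>inner</div> tail</div>';

# A "small regex" for the contents of the first <div>. The non-greedy .*?
# stops at the first </div>, which closes the inner element, so the outer
# element's content is cut short.
my ($lazy) = $html =~ m{<div>(.*?)</div>}s;
print "$lazy\n";      # prints: outer <div>inner

# A greedy .* happens to work on the input above, but on two sibling
# elements it swallows everything between the first <div> and the last </div>.
my ($greedy) = '<div>a</div><div>b</div>' =~ m{<div>(.*)</div>}s;
print "$greedy\n";    # prints: a</div><div>b
```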

That’s all well and good when you want to parse HTML (I would use Marpa::R2::HTML), but what if you want to parse C, JavaScript, or my language of choice, Perl? Writing your own grammar and parser for those languages would be the same madness, only more so.

If I needed to parse a few lines of C function declarations, I would start with a small regex, fail, and move on to a combination of regexes, an ad-hoc state machine, and finally an actual parser. Once you have an actual parser in your toolbox, you can take on more problems. It’s not always the right tool, but the next time I need to mangle and transmute text, I will try it first, even before I start with regexes.
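Here is roughly what that first small regex might look like, as a sketch in core Perl. The function_names helper and the sample declarations are hypothetical. It handles simple one-line prototypes, but gives up as soon as a parameter list contains nested parentheses, such as the function pointer argument of qsort.

```perl
use strict;
use warnings;

# First attempt: find C prototypes such as "int add(int a, int b);" and
# capture the identifier just before the parameter list. The parameter
# list is matched with [^()]*, so it cannot contain nested parentheses.
sub function_names {
    my ($source) = @_;
    my @names;
    while ($source =~ /^\s*(?:\w+\s+|\*\s*)+?(\w+)\s*\([^()]*\)\s*;/mg) {
        push @names, $1;
    }
    return @names;
}

my $header = <<'END';
int add(int a, int b);
static const char *name_of(int id);
void qsort(void *base, size_t n, size_t size, int (*cmp)(const void *, const void *));
END

# Finds add and name_of, but silently misses qsort.
print join(', ', function_names($header)), "\n";    # prints: add, name_of
```

It also cannot tell a declaration from a statement such as return add(1, 2); and would report add as a declared function. Patching each of these cases is how you end up with a combination of regexes and an ad-hoc state machine.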

There is a need for JavaScript and C parsers, and for general tools built on top of them. For example, parse a JavaScript file and list its function names:

use feature 'say';
use Parse::Javascript;
my $file = Parse::Javascript->parse($js_file);
say $_->name for $file->functions;

Or do the same for C:

use feature 'say';
use Parse::C;
my $file = Parse::C->parse($c_file);
say $_->name for $file->functions;

Or find all function calls that match one pattern and rewrite them into another:

use Refactor 'refactor';
my $refactored_file = refactor($file, 'compare(@a, @b) == 0', 'equal(@a, @b)');
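Without such a tool, the usual fallback is a text-level substitution. A minimal sketch in core Perl, using a hypothetical refactor_compare helper that only understands argument lists without nested parentheses:

```perl
use strict;
use warnings;

# Text-level rewrite of "compare(X) == 0" into "equal(X)", where X is an
# argument list without nested parentheses. refactor_compare is a
# hypothetical helper for illustration, not an existing module.
sub refactor_compare {
    my ($code) = @_;
    $code =~ s/\bcompare\(([^()]*)\)\s*==\s*0\b/equal($1)/g;
    return $code;
}

print refactor_compare('if (compare(a, b) == 0) { found(); }'), "\n";
# prints: if (equal(a, b)) { found(); }

# A nested call in the arguments defeats [^()]*, so this comes back unchanged.
print refactor_compare('if (compare(f(a), b) == 0) { found(); }'), "\n";
```

Because it works on text rather than a syntax tree, it would also rewrite matches inside strings and comments, which is exactly the kind of mistake a parser-based refactoring avoids.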

Or how about creating C code from a JavaScript AST?

my $c_file = Serialize::C->serialize($js_ast);
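At its core, a serializer like that is a tree walk that emits text for each node type. Here is a minimal sketch, assuming a hypothetical hash-based AST for function add(a, b) { return a + b; } and assuming every JavaScript value maps to a C int. A real translation would need type inference and a great deal more.

```perl
use strict;
use warnings;

# Hypothetical AST (hash refs with a 'type' key) for the JavaScript
#   function add(a, b) { return a + b; }
my $js_ast = {
    type   => 'function',
    name   => 'add',
    params => ['a', 'b'],
    body   => [
        { type => 'return',
          expr => { type  => 'binop', op => '+',
                    left  => { type => 'var', name => 'a' },
                    right => { type => 'var', name => 'b' } } },
    ],
};

# One emitter per node type. JavaScript values are untyped, so this sketch
# simply assumes every value is a C int.
my %emit = (
    'function' => sub {
        my ($n) = @_;
        my $params = join ', ', map { 'int ' . $_ } @{ $n->{params} };
        my $body   = join '', map { '    ' . to_c($_) . "\n" } @{ $n->{body} };
        return "int $n->{name}($params) {\n$body}\n";
    },
    'return' => sub { my ($n) = @_; return 'return ' . to_c($n->{expr}) . ';' },
    # Parenthesize every binary operation so C precedence cannot change the meaning.
    'binop'  => sub {
        my ($n) = @_;
        return '(' . to_c($n->{left}) . " $n->{op} " . to_c($n->{right}) . ')';
    },
    'var'    => sub { my ($n) = @_; return $n->{name} },
);

# Dispatch on the node type and serialize recursively.
sub to_c {
    my ($node) = @_;
    my $emitter = $emit{ $node->{type} } or die "no C emitter for '$node->{type}'";
    return $emitter->($node);
}

print to_c($js_ast);
# prints:
# int add(int a, int b) {
#     return (a + b);
# }
```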

As Steve Yegge says, creating a parser (with semantics), a code generator, tools and all the rest is a life’s work. Maybe we should try to make that easier?