In this video Steve Yegge talks about a project he’s working on called Grok. It seems ambitious and useful.
I’ve worked on a few parsers in the last few months, and it seems that most of the work is still ahead of you once you have a parse tree for a language.
Yegge explains that they don’t write the parsers themselves, but reuse the work compiler builders have already done. This is a good idea, because writing a parser isn’t easy, even if you have really good tools to build one with.
After you have a parse tree you still need to generate code or evaluate the tree to do the actual work.
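To make that concrete, here is a minimal sketch of the "evaluate the tree" half. The tree is built by hand rather than produced by a parser, and the node layout (a plain number, or an array of operator, left, right) is just an assumption for this example, not something Grok or any particular parser produces.

```perl
use strict;
use warnings;

# A hand-built tree for the expression 2 * (3 + 4).
# Each node is either a plain number or [operator, left, right].
# This layout is made up for the example.
my $tree = ['*', 2, ['+', 3, 4]];

# One small function per operator the evaluator understands.
my %apply = (
    '+' => sub { $_[0] + $_[1] },
    '-' => sub { $_[0] - $_[1] },
    '*' => sub { $_[0] * $_[1] },
    '/' => sub { $_[0] / $_[1] },
);

# Walk the tree bottom-up and compute its value.
sub evaluate {
    my ($node) = @_;
    return $node unless ref $node;    # leaf: a number
    my ($op, $left, $right) = @$node;
    my $fn = $apply{$op} or die "unknown operator '$op'";
    return $fn->(evaluate($left), evaluate($right));
}

print evaluate($tree), "\n";
```

Even for a toy like this, the parser would only deliver `$tree`. Everything in `evaluate` is the work that is still left.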
It’s the same problem you have when you’re parsing HTML: why would you write that code (badly) when someone else already did it for you?
You have a piece of HTML you want to use some part of. You start with regexes and people start yelling at you for using the wrong tool for the job. So you move on to a better tool.
If I need to parse a few lines of C function declarations, I would start with a small regex, fail and move on to a combination of regexes, an ad-hoc state machine and an actual parser. When you have an actual parser in your toolbox you can take on more problems. It’s not always the right tool. But I will try it first the next time I need to mangle and transmute text, even before I start with the regexes.
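The "small regex" stage might look something like the sketch below. The pattern and the `function_names` helper are made up for illustration. They handle simple one-line prototypes and nothing more.

```perl
use strict;
use warnings;

# Matches simple one-line prototypes such as "int add(int a, int b);".
my $decl = qr{
    ^ \s*
    ( [A-Za-z_][\w\s\*]*? )       # return type, as short as possible
    \b ( [A-Za-z_]\w* )           # function name
    \s* \( ( [^()]* ) \) \s* ;    # parameter list, no nesting allowed
}x;

# Return the function names found in the given lines.
sub function_names {
    my (@lines) = @_;
    my @names;
    for my $line (@lines) {
        push @names, $2 if $line =~ $decl;
    }
    return @names;
}

my @names = function_names(
    'int add(int a, int b);',
    'char *strdup(const char *s);',
    'void qsort(void *base, size_t n, size_t size, int (*cmp)(const void *, const void *));',
    'return foo(x);',
);
print "$_\n" for @names;
```

This finds `add` and `strdup`, but it misses `qsort` because of the function pointer parameter, and it reports `foo` even though that line is a statement and not a declaration. That is the "fail" step: every fix makes the regex longer, and at some point a parser is the simpler tool.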
Or the same for C: parse the file and print the name of every function in it.
use Parse::C;
my $file = Parse::C->parse($c_file);
say $_->name for $file->functions;
Or find all function calls that match one pattern and change them into another.
use Refactor 'refactor';
my $refactored_file = refactor($file, 'compare(@a, @b) == 0', 'equal(@a, @b)');
my $c_file = Serialize::C->serialize($js_ast);
Or, as Steve Yegge says, creating a parser (with semantics), a code generator, tools and all that is a life’s work. Maybe we should try to make that easier?