Peter Stuifzand

Use Perl 5.10: Named capture buffers

In Perl 5.10 the regex engine got a big upgrade, with many changes that make it faster and more correct for certain regexes. This time I will explain one of the new features: named capture buffers.

Named capture buffers are similar to the numbered capture buffers, like $1 and $2. They work the same way, except that you give each buffer a name, such as name or value. This helps document the regexes you use.

Here is a small example:

use 5.010;

# The regex with named capture buffers
my $regex = qr{(?<name>\w+)=(?<value>\d+)};

# For testing
my @lines = ('hello=1', 'test=2', 'perl=5010');

# The test program
for (@lines) {
    if (m/$regex/) {
        say 'Name: ', $+{name}, "\tValue: ", $+{value};
    }
}

The output of the program:

$ perl namevalue.pl
Name: hello Value: 1
Name: test  Value: 2
Name: perl  Value: 5010

The syntax for specifying the buffers in the regex is:

(?<name>pattern)

The name should match /^[_A-Za-z][_A-Za-z0-9]*\z/; the pattern can be any legal Perl regex.

After a successful match, you can refer to the captured values through the %+ hash. In the example, $+{name} holds the text captured by the buffer called name.
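Because %+ reflects the most recent successful match, it can be handy to copy it into an ordinary hash before running another regex. Here is a minimal sketch of that idea. The helper name parse_pair is made up for this illustration and is not part of Perl:

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper (not part of Perl): parse a "name=value" line
# and return the named captures as a hash reference.
sub parse_pair {
    my ($line) = @_;

    # Return nothing if the line does not have the expected shape
    return unless $line =~ m{^(?<name>\w+)=(?<value>\d+)\z};

    # Copy %+ right away, because the next successful match replaces it
    my %captures = %+;
    return \%captures;
}

my $pair = parse_pair('perl=5010');
say 'Name: ', $pair->{name}, "\tValue: ", $pair->{value} if $pair;
```

The copy is a plain hash with the keys name and value, so it stays valid even after later matches change %+.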

To create a named backreference to a named capture buffer, you can use the \k<name> syntax. You could for example do the following:

(?<name>\w+) \k<name>

This would match hello hello, for example.
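To see the backreference at work, here is a small sketch that looks for a word repeated twice in a row. The buffer name word, the helper repeated_word, and the sample strings are all chosen just for this illustration, and the \b anchors are an addition to keep the match on whole words:

```perl
use 5.010;
use strict;
use warnings;

# A word, one space, then the same word again (via the named backreference)
my $doubled = qr{\b(?<word>\w+) \k<word>\b};

# Hypothetical helper: return the repeated word, or undef if there is none
sub repeated_word {
    my ($text) = @_;
    return $text =~ $doubled ? $+{word} : undef;
}

# Sample strings, made up for testing
for my $text ('hello hello', 'hello world', 'this is is a typo') {
    my $word = repeated_word($text);
    if (defined $word) {
        say 'Repeated word in "', $text, '": ', $word;
    }
    else {
        say 'No repeated word in "', $text, '"';
    }
}
```

Only the first and the last string contain a repeated word. In hello world the second word differs from the first, so \k<word> fails to match.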

The best thing about this new feature is that it helps with documenting your program if you use meaningful names in your regexes.

© 2023 Peter Stuifzand