The writings of Peter Stuifzand

Weblog: perl

Monday I was creating a graph for my webshop software. I use the Chart module, but the colors and the graph of the default settings aren't really pretty. So I tried to create a better looking chart. I think I succeeded at least a bit.

Example of the blue and gray chart

Here is the code that I used to get this result:

my $c = Chart::Composite->new(1024, 250);

$c->set(composite_info => [
    [ 'Lines', [1] ],
    [ 'Points', [2] ],
]); 
$c->set(precision    => 0); 
$c->set(skip_x_ticks => 7); 
$c->set(max_y_ticks  => 11);
$c->set(min_y_ticks  => 2); 

$c->set(legend => 'none');
$c->set(grey_background => 'false');
$c->set(y_grid_lines => 'true');

$c->set(colors => { 
    'background' => [255,255,255],
    'misc' => [200,200,200],
    'x_grid_lines' => [220,220,220],
    'y_grid_lines' => [220,220,220],
    'dataset0'=>[150,180,250],
    'dataset1'=>[120,140,240],
}); 


my $label = 'x';
my $val   = 10;
$c->add_pt($label, $val, $val);

You should call add_pt for each point you want to show. I use $val twice because there are two graphs that both need the same value.

I also noticed that setting min_y_ticks to a lower value (default=6) will produce a nicer chart when you have few values.

Now that we have two programs that parse log files, we can start to take a look at how many lines the program parses per second. First we have to make the two programs as similar as possible. In pseudocode it looks likes this.

  1. Load all modules
  2. Take the start time using Time::HiRes
  3. Put the code of the program here
  4. Set line_count = 0
  5. Using stdin: loop through all lines
    1. Parse the line
    2. Set line_count++
  6. Find the time difference
  7. Divide and line_count / time as n lines/s

In Perl this looks like:

use Time::HiRes 'gettimeofday', 'tv_interval';

my $start = [gettimeofday];

# Your program

my $line_count = 0;

while (<>) {
    # Parse one line using your software
    $line_count++;
}

my $diff = tv_interval($start);
printf "%.2f lines/s\n", $line_count / $diff;

Now run the two programs a few times and look at the parsing speed. In my case there was a big difference between the speed of the two programs. I expect a difference in your run as well.

Writing an apache access log parser isn't that hard. Below is a parser that does just that. It creates Data::Dumper output of all the lines. No warranty.

use Data::Dumper;
use Parse::RecDescent;

$Parse::RecDescent::skip = '';

my $grammar = q{
line: ip ws '-' ws user ws datetime ws request ws status ws responsesize
            ws referrer ws useragent "\n" 
{ $return = {
            ip        => $item[1],
            user      => $item[5],
            datetime  => $item[7],
            method    => $item[9]->{method},
            url       => $item[9]->{url},
            protocol  => $item[9]->{protocol},
            status    => $item[11],
            size      => $item[13],
            referrer  => $item[15],
            useragent => $item[17],
        } }
user: '-' | /\w+/
request: '"' method ws url ws protocol '"' 
    { $return = { method => $item[2], url => $item[4], protocol => $item[6] } }
datetime: '[' date ':' time ws timezone ']' 
    { $return = $item[2] . ' ' . $item[4] . ' ' . $item[6] }
status: /\d{3}/
protocol: 'HTTP/' version
method: 'GET' | 'POST' | 'PUT' | 'DELETE'
ws: /[ ]+/
url: /\S+/
referrer: quotedstring2
responsesize: '-' | /\d+/
useragent: quotedstring2
date: day '/' month '/' year
    { $return = join('/', $item[1], $item[3], $item[5]) }
day: /\d+/
month: 'Jan' | 'Feb' | 'Mar' | 'Apr' | 'May' | 'Jun' |
    'Jul' | 'Aug' | 'Sep' | 'Oct' | 'Nov' | 'Dec'
year: /\d{4}/
time: /\d{2}:\d{2}:\d{2}/
timezone: ('+'|'-') /\d{4}/  { $return = $item[1].$item[2] }
octet: /\d+/
ip: octet ('.' octet)(3) { $return = $item[1] . '.' . join('.', @{$item[2]}) }    
version: /\d.\d/
quotedstring2: '"' /[^"]+/ '"'   {$return = $item[2]}
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad Grammer";
while (<>) {
    my $ret = $parser->line($_) or print "Parse error\n";
    print Dumper($ret);
}

Today I tried to create a report of some basic statistics about Abacus downloads. Normally I would use grep, awk and a few other commandline tools to find a rough estimate of these numbers. However this time I needed a bit more information than these tools could give me. A problem in need of a solution.

My first question was: how many people have downloaded Abacus? The answer is

grep '/abacus/files/Abacus' | grep -v '<localip>' \
    | grep -v 'somebots' | awk '{print $1}'
    | sort | uniq | wc -l

The pattern here is the following. First find the lines you want. Then remove the lines you don't want. Print the first field-the client-and makes this list unique. I don't want to count multiple downloads from the same ip.

The next questions was: where do people who download Abacus come from? For this I take the answer from the last question (without wc -l) and write it to a file. Now I can use the file as extra argument for grep like this:

grep -f abacus-downloads.txt -F logs/access.log 

This makes grep use the lines in abacus-downloads.txt as the patterns that it needs to find in logs/access.log. Now I need to find the first line where a match appears, which should contain the referrer where the person comes from. How to do that? I did the following:

  1. Download Parse::AccessLogEntry from CPAN
  2. Write a little script

This script will only print a line if it's the first line containing a client.

use Parse::AccessLogEntry;
my $p = Parse::AccessLogEntry->new();

my %hosts;

while (<>) {
    my $line = $p->parse($_);
    if (!$hosts{$line->{host}}) {
        print;
        $hosts{$line->{host}} = 1;
    }
}

I pipe the output of the previous grep through this program and now I have the lines with the referrers I'm looking for. A small improvement could be to filter out favicons because in my case one browser downloaded the favicon before it got the page itself.

Just add

next if $line->{file} =~ m{^/favicon};

at the appropriate spot. Now I need a list of the referrers from these lines. I could change the print statement in this program to that for me. That wouldn't be the unix way. So I wrote another small program that prints the field from the log if it's specified in the arguments.

use Parse::AccessLogEntry;
my $p = Parse::AccessLogEntry->new();
my @args = @ARGV;
@ARGV=();
while (<>) {
    my $line = $p->parse($_);
    print join("\t", map { $line->{$_} } @args) . "\n";
}

This program can be called using one or more arguments. The argument should be a key from the $line hashref, like host, user, date, time, diffgmt, rtype, file, proto, code, bytes, refer or agent.

Using refer as an argument, the program ave me a list of the referrers from the log file. Using sort | uniq -c | sort -rn on this gave me a top X list of the referrer where the people who downloaded Abacus came from.

Sometimes you need to create a large tree of subdirectories. But why? Two examples that I think of are structured directories for weblogs, e.g. /[year]/[month]/[day]/, or the automatic backing up of files e.g. invoices/[company]/[year].

Before you begin, you know that using a split and chdir, or some other combination of built-ins, will just make big mess. A call to mkdir -p could also work, but let's use the available modules this time.

If you use Perl there is always the CPAN that can help you. So, also this time. Enter File::Path.

use File::Path 'make_path';
make_path('posts/2011/04/04');

This will create this structure below the current directory. Simple. There are many of these modules hidden (or less hidden) in the CPAN. THey like the light.

I've been using Plack for a few days now, and I have to say, I really like it. I rewrote the base of my webshop platform to use it. The change wasn't really difficult and the code is a lot cleaner now.

Especially with the Middleware structure I was able to remove a lot of code from the main application. Middleware classes are classes that are called between the webserver and the main application. These classes have the opportunity to rewrite the request and the response classes for each request.

Because I use the same code for each webshop, but different databases for each customer, so I need a way to select the right database for a request. This is a great use for a Middleware class. The software selects and connects to the database based on the hostname in the request. The connection is then added to the environment hash. And the way back the database connection gets closed.

Furthermore there are already a few nice Middleware classes for logging and debugging. That is code you don't have to write. I really like it.

The latest version of Perl, version 5.12 has been released. There are many changes and bugfixes and I would recommend you check the release notes for the whole list.

  • Implicit strictures when you use an version number greater than 5.12.0
  • The yada, yada operator
  • Y2038 compliance
  • New package naming for including the version
  • Updated to the latest Unicode standard

The feature I'll talk about here is the given/when construct, which was added in perl 5.10. It works like switch/case in other programming languages, but is much more powerful. The matching is based on smart matching, which is another feature added in 5.10;

I will start with a simple example to give you an idea of the syntax that is used.

use 5.010;

my $x = <>;
chomp $x;

given ($x) {
    when ([0..99]) {
        say "Looking good";
    }
    when ([100..199]) {
        say "That's a bit much";
    }
    default {
        say "This could be a problem";
    }
}

This code compare the value of $x with the array's in the when statements. If $x is between 0 and 99 (inclusive) it will the text Looking good. If it's between 100 and 199 then it will say That's a bit much. The default block will be called when the value isn't matched by the when blocks.

Next I will give a more useful example, but not much more.

use 5.010;

my ($x, $y) = (0,0);

LINE: while (<>) {
    my @parts = split /\s+/;

    for (@parts) {
        when (/^x(\d+)/) {
            $x = $1;
        }
        when (/^y(\d+)/) {
            $y = $1;
        }
        when (/^p/) {
            say $x + $y;
        }
        when (/^q/) {
            last LINE;
        }
    }
}

This example reads lists of tokens from STDIN and matches them and executes code based on the input. In effect it's a small programming language. Notice that this code doesn't use the given statement. It's not needed here, because the for already assigns each element of @parts to $_.

It's also possible to use simple expressions like you would use in an if statement. For example:

use 5.010;

my $age = <>;
chomp $age;

given ($age) {
    when (!/^\d+$/) {
        say "Not a number";
    }
    when ($_ > 100) {
        say "That's quite old";
    }
    when (18) {
        say "Now your life begins...";
    }
    when (0) {
        say "Just born, and already using the computer.";
    }
    default {
        say "I have nothing useful to say about '$age'";
    }
}

As you can see when is quite smart about what to do with different expressions. The first when clause contains a negated regular expression. This will be matched using $age !~ m/REGEX/. The second one do what you expect. The 18 and 0 clauses will match using $age == 18 and $age == 0. You should watch out with comparing to 0 because this will also match empty strings or just strings. For example if $age = 'hello', when(0) will match.

Smartmatching is really powerful. With given and when it's easy to use this power for deciding what to do with the value that you've been given. You should take a look at the manual for more information about the possible smart matches and the things you can with given and when.

Example of how you should send email in 2010 with Perl.

#!/usr/bin/perl -w
use strict;

use Email::Sender::Simple qw(sendmail);
use Email::Simple;
use Email::Simple::Creator;

my $email = Email::Simple->create(
    header => [
        To      => '"Peter Foo" <foo@example.com>',
        From    => '"Peter Bar" <bar@example.org>',
        Subject => "What's up",
    ],
    body => "Hey, how are you doing?",
);

sendmail($email);

You can see that this is simple enough. First you create an Email::Simple object. This object will format the message that wil be sent.

After that sendmail will send this message using it's default transport.

The nice thing about using these modules is that you can, while testing, replace the default transport with testing modules.

use Test::More;
BEGIN { $ENV{EMAIL_SENDER_TRANSPORT} = 'Test' }
use YourModule;

YourModule->run;
my @deliveries = Email::Sender::Simple->default_transport->deliveries;
is(@deliveries, 1);

This way you can check how many emails were send using in the code your testing. By picking apart @deliviries you can check out the text of the emails that were send. See Email::Sender::Transport::Test for more information about this.

I use prove from Test::Harness to test the code of my Perl projects. The program prove runs tests files and summarizes the output. If tests fail then it shows the error in its output.

The problem is, my defaults aren't the same as the defaults that prove uses. I keep my main application files in app/lib and a few vendor libraries in vendor. The problem with this is that I have type out the options of prove every time I wanted to test. For the longest time I used a Makefile to solve this problem.

Then I found out about .proverc. If you put this file in your current directory of in your home directory, then prove will use each line as an extra command-line option.

The .proverc in the root dir of my project looks like this:

-Iapp/lib
-Ivendor
-r

It includes two extra directories in the Perl include path; app/lib and vendor. It also find all test files recursively in the t directory, which is also not a default setting of prove.

Perl 5.11.4 is released:

This is the fifth DEVELOPMENT release in the 5.11.x series leading to a stable release of Perl 5.12.0. You can find a list of high-profile changes in this release in the file perl5114delta.pod inside the distribution.

Plack is a Perl Web Server:

Plack is the superglue interface between perl web application frameworks and web servers, just like Perl is the duct tape of the internet.

This is interesting and maybe useful in my webshop platform.

Let's say I have this line of code

my $numbers = [ 1, 2, 3, 4, 5, 6 ];

and I want to reverse the items in the list. In Perl 5 there is this useful builtin function which does just that. There problem is, it only works on lists, not arrayrefs. If you use it on an arrayref, you will reverse the string representation of the arrayref. Which will return something like this:

)8936f22x0(YARRA

This is not very useful. I think, the only way to reverse a arrayref is by dereferencing it to an array, then reverse it, and then make it an arrayref again. Something like this:

my $newarray = [ reverse @$numbers ];

This line contains synthetic code. It should be easier to reverse arrays.

osfameron asked:

I've been prototyping ways for customers to import data into our system. [...] How would you tackle this task? Is there even an elegant way to do it imperatively?

He is trying to import data into his system, but his information is not all in a flat structure. It looks like this:

Name        Price   Tag
Brie        2.00    Dairy
                    Cheese
                    Food
Chablis     5.00    Wine
                    Alcohol
                    Drink

As you can see the there is a start row and a few extra data rows after that.

The program that I would write to import this data looks like this:

use strict;
use warnings;
use Product;
use Data::Dumper;

my @products;

while (<>) {
    if (my ($name, $price, $tag) = m/^(\w+)\s+(\d\.\d{2})\s+(\w+)$/) {
        my $product = Product->new(product => $name, price => $price);
        $product->add_tag($tag);
        push @products, $product;
    }
    elsif (($tag) = m/^\s+(\w+)$/) {
        my $product = $products[$#products];
        $product->add_tag($tag);
    }
}

print Dumper(\@products);

In the way that I have written this code it doesn't need two pushes. Pushing it on the array is the first thing that I do after creating the product. The other parts of the code that need to refer to the product that was added last use the last product in the array.

My program is different to osfamerons code by using regexes, but this allowed my to actually test my code.

This is a small Perl 5.10 tip. You can use named capture buffers to fill a hash. Before 5.10 you needed to 'unpack' the values from a regex match into a hash.

I will give a small example that shows the problem this creates.

my $str = "name=value";
my %result;

if ($str =~ m/(\w+)=(\w+)/) {
    $result{name} = $1;
    $result{value} = $2;
}

The problem with this is that you need to number the variables and assign each individually to the hash. In Perl 5.10 you can do the following.

if ($str =~ m/(?<name>\w+)=(?<value>\w+)/) {
    %result = %+;
}

This is a lot simpler, cleaner and contains less possible bugs.

This week I will show a simple example about how to use the underscore prototype which is new in Perl 5.10.

This feature was added to allow you to write functions that work like builtin functions. It's up to you to not abuse it and write unreadable code, of course.

This example shows how to use this new feature. It allows you to use the $_ variable as an argument for a user defined function.

use strict;
use feature 'say';

sub greeting(_) {
    my ($greeting) = @_; 
    $greeting //= 'world';
    say "Hello $greeting";
    return;
}

for (qw/planet people/) {
    greeting();
}

greeting();

The output of the program is:

Hello planet
Hello people
Hello world

As you can see, you don't need to specify an argument to the greeting function. The first two calls in the for loop use the two values from the loop.

The call outside the loop uses the default value ('world') that was specified in the function.

This is the first article in a series about new Perl 5.10 features. I'll try to write one article each week, but you'll never know.

This first article is about the new builtin function say. The say function is similar to print in how it works. Whenever you want to use say or other features from Perl 5.10, you need to declare that you want to use these features.

use 5.010;

After you have done that you can use the function like this.

say 'Hello world';

This will print the text 'Hello, world' to STDOUT and add a newline "\n" after it, equal to:

print "Hello, world\n";

This doesn't seem like a big feature, but it will also help with the following.

say for @lines;

This is instead of

print "$_\n" for @lines;

The say function removes some complexity in this example. You don't have to add the newline anymore.

Jesse Vincent released a new development version of Perl 5:

It gives me great pleasure to announce the release of Perl 5.11.0.

Perl 5.11.0 is a DEVELOPMENT release. We're making it available to you today to make it easy for you to test your software on what will eventually become Perl 5.12.

The nice thing about this release is that there will be releases of Perl every month, with the dates already scheduled. The next release (5.11.1) is coming on October 20.

I hope this will increase the development speed of Perl. I also think that this way it's easier to make new releases.

I created a small Perl program to convert relative dates to absolute dates in the format that I use for my calendar. My current calendar file looks like this.

2008
    08
        2008-08-12
        2008-08-13
    09
        2008-09-10
        ...

If I want to add a date and I don't know the actual numbers, I can use the following program to convert the date. It will also respect the whitespace in front of the text.

#!/usr/bin/perl -w
use v5.10;

use strict;
use warnings;

use Date::Manip;

Date_Init('Language=Dutch');

my $inp = <>;

if (my ($ws, $date) = $inp =~ m/^(\s*)(.+)$/) {
    say $ws . UnixDate($date, "%Y-%m-%d");
}
else {
    print $inp;
}

I use Date::Manip for parsing the date. It works with human language style dates like thursday. I added Date_Init so it will parse Dutch days like donderdag. Also I use Perl 5.10, because I can. It has some nice features.

To use it put this script in the path. I called it refdate.pl. To use it type a date and type !!refdate.pl<Enter>. In Vim this will call the program on the current line. The date will be formatted in the YYYY-MM-DD format.

The results for the perl survey are in. You can download the raw data from the results page. The analysis of the results will be on that same page in some time.

The first survey results are in about the conversion rates. There is still time to complete the questions. For all the reasons mentioned on the page: fill out the perl survey.

I was thinking about writing a list of the perl modules that I used most often, liked best or somehow used. Some are golden oldies, others are new this year. There is no order in the list, so no Top 10 or anything.

  • DBIx::DWIW this is a module that helps writing database code. It contains methods for retrieving data and executing query's. It removes almost all of the repetitive database coding work. It's nice to be able to subclass and create a file with the configuration of the databases.

    With all this niceness there are also a few hairy parts to this module. Especially when using it in a web environment. The 'verbose' statements are printed to STDOUT, which will result in 500 Internal server errors. The logging itself is really useful, but could use some work still.

    View the module on CPAN: DBIx::DWIW

  • File::Slurp, this is one of those modules that will make you really lazy. You provide a new and the functions of this module will write or read them. Not much more to be said.

     use File::Slurp qw/read_file/;
    
     my $text = read_file('config.yml');
    

    View the module on CPAN: File::Slurp

  • POE, this is of course more of a framework for creating applications than just one module. There is much to be found in POE namespace. When you're creating network applications are daemons this is the place where you should start your search. POE has many different modules that can be reused. Creating networking applications has never been easier.

    View the module on CPAN: POE

  • YAML, I think more programs should use yaml. It has many different uses: config file, data transfer between applications. You could use it where you would normally use XML. But yaml is more consise and humand readable and writable. I use it for config files and my blogging software uses it as an index to all the different blog entries.

    View the module on CPAN: YAML

  • svk, not an actual perl module, but still a great perl application.

    SVK homepage

  • plagger, the perl rss aggregator.

    Plagger homepage

Yesterday I received the Perl Testing Developers Notebook. It only took about six days to get here, but it still is three months ago, since I won. Of course this is no problem and I'm very happy with the new book.

I like to thank Josh, for the contest. Which was actually quite easy :).

I've started reading it and it looks like it contains real nice examples for perl testing. Some stuff that will be really useful, when I'm working on the webshop and the concert website.

I created a script that allows me to update a page on a media wiki. It uses WWW::Mechanize, IO::All and Config::Std. The last two modules aren't needed probably. But it makes it all a lot simpler.

#!/usr/bin/perl -w

use IO::All;
use WWW::Mechanize;
use Config::Std;

read_config "$ENV{HOME}/.mediawiki" => my %config;

my $username = $config{login}{username};
my $password = $config{login}{password};
my $hostname = $config{login}{hostname};

my $title = $ARGV[0] or die "usage:   mediawiki [page title with under scores]\n\n";
my $url = "http://$hostname/mediawiki/index.php?title=$title&action=edit";

my $ua = WWW::Mechanize->new;

my $login_url = "http://$hostname/mediawiki/index.php?title=Special:Userlogin&returnto=Main_Page";
$ua->get($login_url);

$ua->field('wpName', $username);
$ua->field('wpPassword', $password);
$ua->click('wpLoginattempt');

$ua->get($url);

$ua->form_number(1);

my $form = $ua->current_form();

my $tmpfile = io('/tmp/mediawiki-1');
$form->value('wpTextbox1') > $tmpfile;

system('vi /tmp/mediawiki-1');

$tmpfile->open('r');

my $content = $tmpfile->slurp;
$ua->field('wpTextbox1', $content);
$ua->click('wpSave');

print "Done!\n\n";

The configuration is read from /home/$USER/.mediawiki. It should look something like this.

[login]
username=your name
password=your password
hostname=your wiki hostname

Where hostname should be something like 'localhost' or 'www.somedomain.org'.

Oh, this code comes without any warranty. Use at your own risk.

This post is not about perl plugins (what would that be?). But about programs that use plugins. One of those programs is Qpsmtpd, a perl mail server. They use a special and very nice way to use plugins.

Moveable Type is another perl program, that uses plugins. These plugins are a little bit simpler than the qpsmtpd plugins, but just as powerful.

And now there is another perl program that uses plugins, my webshop. It is really nice to see how easy it is to create a new type of shop on top of the current code. I can't yet show it, but within a few days it will be online.

At the company, where I'm doing my internship, I needed a code generator for some C code. There was some C code needed. So I started of writing it. After some lines, there started to show some patterns, which could be easily handled by some description language.

The first description language was a line based format. First a line with the name of the object. And after that the names of the functions. This format is easy to parse and easy to use. The output was done with print statements.

For code generation I always used this method. But why should the webapplications have all the template fun? So I downloaded Template (which is the Template Toolkit), from the CPAN and I tried some things with it. It was really easy to generate code like this. The code was also much cleaner and shorter.

All nice and easy, but not good enough, because it should be possible to add C code to the functions. I almost started to try and create regular expressions for this task, but then the laziness started to kick in again. I remembered something about a Text::Balanced, which is written by Damian Conway. I can extract delimeted text from strings. Which was exactly what I wanted to do.

The code is again much smaller, and writing the regexen would be the wrong way to go. In this case Text::Balanced is the middle ground between regexen and some parser written with something like Parse::RecDescent. Laziness + CPAN is a good thing.

I won the Perl Testing Developer Notebook written by Ian Langworth and chromatic on perlcast.com. I'm curious about when the book will arrive.

It was hard work, but at last a new version of the hobby-koopjes.nl website is online. It has become a nice website, with an interface that's actually quite pretty.

View archived entries