Peter Stuifzand

Apache accesslog reporting tools

Today I tried to create a report of some basic statistics about Abacus downloads. Normally I would use grep, awk and a few other command-line tools to get a rough estimate of these numbers. However, this time I needed a bit more information than these tools could give me. A problem in need of a solution.

My first question was: how many people have downloaded Abacus? The answer is

grep '/abacus/files/Abacus' logs/access.log | grep -v '<localip>' \
    | grep -v 'somebots' | awk '{print $1}' \
    | sort | uniq | wc -l

The pattern here is the following. First find the lines you want. Then remove the lines you don’t want. Print the first field (the client) and make the list unique, because I don’t want to count multiple downloads from the same IP. Finally, count the lines that are left.
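The same find, exclude, extract, de-duplicate and count pattern could also be sketched as a single awk pass. The sample log lines, the Abacus-1.0.zip file name and the local address 192.0.2.1 below are made-up placeholders, not taken from the real log, so treat this as an illustration under those assumptions rather than the command actually used.

```shell
#!/bin/sh
# Hypothetical sample log in combined format; IPs, dates and paths are made up.
log=$(mktemp)
cat > "$log" <<'EOF'
203.0.113.5 - - [10/Oct/2000:13:55:36 +0200] "GET /abacus/files/Abacus-1.0.zip HTTP/1.1" 200 2326 "http://example.com/" "Mozilla/5.0"
203.0.113.5 - - [10/Oct/2000:13:56:01 +0200] "GET /abacus/files/Abacus-1.0.zip HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
198.51.100.7 - - [10/Oct/2000:14:02:11 +0200] "GET /abacus/files/Abacus-1.0.zip HTTP/1.1" 200 2326 "-" "somebots/1.0"
192.0.2.1 - - [10/Oct/2000:14:03:00 +0200] "GET /abacus/files/Abacus-1.0.zip HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
192.0.2.44 - - [10/Oct/2000:14:05:42 +0200] "GET /abacus/files/Abacus-1.0.zip HTTP/1.1" 200 2326 "http://example.org/" "Mozilla/5.0"
192.0.2.44 - - [10/Oct/2000:14:06:00 +0200] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"
EOF

# Keep download lines, drop the local IP and the bots, and count each
# client (first field) only the first time it is seen.
unique_clients=$(awk -v local="192.0.2.1" '
    $1 != local && /\/abacus\/files\/Abacus/ && !/somebots/ && !seen[$1]++ { n++ }
    END { print n + 0 }
' "$log")

echo "$unique_clients"
rm -f "$log"
```

On this sample the count is 2: one client downloaded twice, one line came from a bot and one from the local address.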

The next question was: where do people who download Abacus come from? For this I take the answer from the last question (without wc -l) and write it to a file. Now I can pass that file to grep as an extra argument, like this:

grep -f abacus-downloads.txt -F logs/access.log

This makes grep use the lines in abacus-downloads.txt as the patterns that it needs to find in logs/access.log; -F tells it to treat them as fixed strings instead of regular expressions. Now I need to find the first line where each client appears, which should contain the referrer the person came from. How to do that? I did the following:
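One thing to keep in mind with fixed-string patterns is that they match anywhere in the line, so an address such as 192.0.2.4 also matches a line from 192.0.2.44. The sketch below, on made-up sample data, compares grep -F -f with an awk alternative that compares only the first field; the awk variant is a substitute technique for illustration, not what was used for the report.

```shell
#!/bin/sh
# Made-up data: one wanted client, plus a different client whose
# address merely contains the wanted one as a substring.
ips=$(mktemp)
log=$(mktemp)
echo '192.0.2.4' > "$ips"
cat > "$log" <<'EOF'
192.0.2.4 - - [10/Oct/2000:13:50:00 +0200] "GET /blog/ HTTP/1.1" 200 512 "http://example.com/search" "Mozilla/5.0"
192.0.2.44 - - [10/Oct/2000:13:52:00 +0200] "GET /about HTTP/1.1" 200 512 "-" "Mozilla/5.0"
192.0.2.4 - - [10/Oct/2000:13:55:36 +0200] "GET /abacus/files/Abacus-1.0.zip HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
198.51.100.7 - - [10/Oct/2000:14:02:11 +0200] "GET /blog/ HTTP/1.1" 200 512 "-" "Mozilla/5.0"
EOF

# Fixed-string match anywhere in the line: also hits 192.0.2.44.
loose=$(grep -c -F -f "$ips" "$log")

# Exact match on the first field: load the wanted IPs from the first
# file, then count lines of the second file whose client is in that set.
exact=$(awk 'NR == FNR { want[$1]; next } $1 in want { n++ } END { print n + 0 }' "$ips" "$log")

echo "grep -F -f: $loose lines, exact first-field match: $exact lines"
rm -f "$ips" "$log"
```

For a rough estimate the extra matches may not matter, but they are worth knowing about when the numbers look too high.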

  1. Download Parse::AccessLogEntry from CPAN
  2. Write a little script

This script prints a line only if it’s the first line seen for that client.

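use strict;
use warnings;
use Parse::AccessLogEntry;

my $p = Parse::AccessLogEntry->new();

# Clients (hosts) whose first line has already been printed.
my %hosts;

while (<>) {
    my $line = $p->parse($_);
    # Print only the first log line seen for each client.
    if (!$hosts{$line->{host}}) {
        print;
        $hosts{$line->{host}} = 1;
    }
}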
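As an aside, for readers without the CPAN module, the same first-line-per-client filter could be approximated in awk by keying on the first whitespace-separated field. This assumes the client address is always the first field, and the sample lines are made up. The Perl script above remains the one used in the rest of this post.

```shell
#!/bin/sh
# Made-up sample: two clients, two requests each.
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.4 - - [10/Oct/2000:13:50:00 +0200] "GET /blog/ HTTP/1.1" 200 512 "http://example.com/search" "Mozilla/5.0"
192.0.2.4 - - [10/Oct/2000:13:55:36 +0200] "GET /abacus/files/Abacus-1.0.zip HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
203.0.113.5 - - [10/Oct/2000:14:01:00 +0200] "GET /abacus/ HTTP/1.1" 200 512 "http://example.org/links" "Mozilla/5.0"
203.0.113.5 - - [10/Oct/2000:14:02:11 +0200] "GET /abacus/files/Abacus-1.0.zip HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
EOF

# seen[$1]++ is 0 (false) the first time a client appears, so the
# negation is true exactly once per client and awk prints that line.
first_hits=$(awk '!seen[$1]++' "$log")

printf '%s\n' "$first_hits"
rm -f "$log"
```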

I pipe the output of the previous grep through the Perl script above, and now I have the lines with the referrers I’m looking for. A small improvement would be to filter out favicon requests, because in my case one browser downloaded the favicon before it requested the page itself.

Just add

next if $line->{file} =~ m{^/favicon};

right after the call to $p->parse, before the %hosts check. Now I need a list of the referrers from these lines. I could change the print statement in this program to do that for me, but that wouldn’t be the Unix way. So I wrote another small program that prints the fields of each log line that are named in its arguments.

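use strict;
use warnings;
use Parse::AccessLogEntry;
my $p = Parse::AccessLogEntry->new();

# Treat the command-line arguments as field names, then empty @ARGV
# so that <> reads the log from standard input.
my @args = @ARGV;
@ARGV = ();

while (<>) {
    my $line = $p->parse($_);
    # Print the requested fields, separated by tabs.
    print join("\t", map { $line->{$_} } @args) . "\n";
}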

This program can be called with one or more arguments. Each argument should be a key from the $line hashref, like host, user, date, time, diffgmt, rtype, file, proto, code, bytes, refer or agent.

Using refer as the argument, the program gave me a list of the referrers from the log file. Running sort | uniq -c | sort -rn on this gave me a top X list of the referrers that the people who downloaded Abacus came from.
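The counting step can be tried on its own with a few made-up lines. In this sketch awk stands in for the Perl field printer: it splits each line on double quotes and takes the fourth piece, which is the referrer in Apache’s combined log format. That shortcut assumes the combined format and breaks if the request itself contains a quote character, so it is only an approximation of what the module does.

```shell
#!/bin/sh
# Made-up first-hit lines in combined log format.
lines=$(mktemp)
cat > "$lines" <<'EOF'
192.0.2.4 - - [10/Oct/2000:13:50:00 +0200] "GET /blog/ HTTP/1.1" 200 512 "http://example.com/search" "Mozilla/5.0"
203.0.113.5 - - [10/Oct/2000:14:01:00 +0200] "GET /abacus/ HTTP/1.1" 200 512 "http://example.org/links" "Mozilla/5.0"
198.51.100.7 - - [10/Oct/2000:14:07:00 +0200] "GET /blog/ HTTP/1.1" 200 512 "http://example.com/search" "Mozilla/5.0"
EOF

# Split on double quotes: field 2 is the request, field 4 the referrer.
# Then count identical referrers and sort by count, highest first.
top_referrers=$(awk -F'"' '{ print $4 }' "$lines" | sort | uniq -c | sort -rn)

printf '%s\n' "$top_referrers"
rm -f "$lines"
```

Piping the result through head -n 10 would cut the list down to a top ten.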

© 2023 Peter Stuifzand