Today I tried to create a report of some basic statistics about Abacus downloads. Normally I would use grep, awk and a few other command-line tools to find a rough estimate of these numbers. However, this time I needed a bit more information than these tools could give me. A problem in need of a solution.
My first question was: how many people have downloaded Abacus? The answer is:
grep '/abacus/files/Abacus' logs/access.log | grep -v '<localip>' \
| grep -v 'somebots' | awk '{print $1}' \
| sort | uniq | wc -l
The pattern here is the following. First find the lines you want. Then remove the lines you don’t want. Print the first field, the client, and make this list unique, because I don’t want to count multiple downloads from the same IP. Finally, count the lines that remain.
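As a rough sketch of this pattern on a tiny log: everything in the sample below (the addresses, the file name Abacus-1.0.zip, the timestamps) is made up for illustration, and sort -u stands in for sort | uniq.

```shell
# Hypothetical sample log; every host, path and user agent here is invented.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [01/Jan/2010:10:00:00 +0100] "GET /abacus/files/Abacus-1.0.zip HTTP/1.1" 200 1024 "http://example.com/" "Mozilla"
10.0.0.1 - - [01/Jan/2010:10:05:00 +0100] "GET /abacus/files/Abacus-1.0.zip HTTP/1.1" 200 1024 "-" "Mozilla"
10.0.0.2 - - [01/Jan/2010:10:06:00 +0100] "GET /abacus/files/Abacus-1.0.zip HTTP/1.1" 200 1024 "-" "somebots/1.0"
10.0.0.3 - - [01/Jan/2010:10:07:00 +0100] "GET /abacus/files/Abacus-1.0.zip HTTP/1.1" 200 1024 "http://example.org/" "Mozilla"
10.0.0.4 - - [01/Jan/2010:10:08:00 +0100] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla"
EOF

# 1. keep the download lines, 2. drop the bot, 3. print the client,
# 4. make the list unique, 5. count (tr strips padding some wc versions add).
count=$(grep '/abacus/files/Abacus' "$log" \
  | grep -v 'somebots' \
  | awk '{print $1}' \
  | sort -u | wc -l | tr -d ' ')
echo "unique clients: $count"   # prints "unique clients: 2"
rm -f "$log"
```

The client 10.0.0.1 downloads twice but is counted once, and the bot at 10.0.0.2 is filtered out before the count.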
The next question was: where do people who download Abacus come from? For this I take the answer from the last question (without wc -l) and write it to a file. Now I can use the file as an extra argument for grep like this:
grep -f abacus-downloads.txt -F logs/access.log
This makes grep use the lines in abacus-downloads.txt as the patterns that it needs to find in logs/access.log; the -F flag makes it treat them as fixed strings rather than regular expressions. Now I need to find, for each client, the first line where a match appears, which should contain the referrer where the person comes from. How to do that? I did the following:
- Download Parse::AccessLogEntry from CPAN
- Write a little script
This script will only print a line if it’s the first line containing a client.
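use strict;
use warnings;
use Parse::AccessLogEntry;

my $p = Parse::AccessLogEntry->new();
my %hosts;    # clients that have already been printed

while (<>) {
    my $line = $p->parse($_);
    # Print a line only the first time its client (host) shows up.
    if (!$hosts{$line->{host}}) {
        print;
        $hosts{$line->{host}} = 1;
    }
}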
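If the Perl module is not at hand, roughly the same "first line per client" filter can be sketched with awk alone. This is a swapped-in technique, not the script above: it assumes the client is the first whitespace-separated field, and the sample log, the temporary directory and the file name Abacus-1.0.zip are all made up for illustration.

```shell
# Hypothetical sample data; all log lines and names under $tmp are invented.
tmp=$(mktemp -d)
cat > "$tmp/access.log" <<'EOF'
10.0.0.1 - - [01/Jan/2010:10:00:00 +0100] "GET /abacus/index.html HTTP/1.1" 200 512 "http://example.com/search" "Mozilla"
10.0.0.1 - - [01/Jan/2010:10:01:00 +0100] "GET /abacus/files/Abacus-1.0.zip HTTP/1.1" 200 1024 "http://mysite.example/abacus/index.html" "Mozilla"
10.0.0.9 - - [01/Jan/2010:10:02:00 +0100] "GET /other.html HTTP/1.1" 200 256 "-" "Mozilla"
10.0.0.3 - - [01/Jan/2010:10:03:00 +0100] "GET /abacus/index.html HTTP/1.1" 200 512 "http://example.org/links" "Mozilla"
10.0.0.3 - - [01/Jan/2010:10:04:00 +0100] "GET /abacus/files/Abacus-1.0.zip HTTP/1.1" 200 1024 "http://mysite.example/abacus/index.html" "Mozilla"
EOF

# The clients that downloaded Abacus, one per line.
printf '10.0.0.1\n10.0.0.3\n' > "$tmp/abacus-downloads.txt"

# grep keeps every line of those clients; awk prints a line only when its
# first field has not been seen before (seen[$1] is still 0 on first sight).
first_hits=$(grep -F -f "$tmp/abacus-downloads.txt" "$tmp/access.log" \
  | awk '!seen[$1]++')
printf '%s\n' "$first_hits"
rm -rf "$tmp"
```

This prints the two index.html requests, the ones carrying the external referrers. Note that fixed-string patterns match anywhere in a line, so an address such as 10.0.0.1 would also match a line from 10.0.0.10.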
I pipe the output of the previous grep through this Perl script and now I have the lines with the referrers I’m looking for. A small improvement could be to filter out favicons, because in my case one browser downloaded the favicon before it got the page itself. Just add
next if $line->{file} =~ m{^/favicon};
at the appropriate spot, that is, after the call to parse and before the check on %hosts. Now I need a list of the referrers from these lines. I could change the print statement in this program to do that for me. That wouldn’t be the Unix way. So I wrote another small program that prints the fields of each log line that are named in its arguments.
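use strict;
use warnings;
use Parse::AccessLogEntry;

my $p = Parse::AccessLogEntry->new();
my @args = @ARGV;    # names of the fields to print
@ARGV = ();          # empty @ARGV so that <> reads from STDIN

while (<>) {
    my $line = $p->parse($_);
    # Print the requested fields, separated by tabs.
    print join("\t", map { $line->{$_} } @args) . "\n";
}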
This program can be called with one or more arguments. Each argument should be a key from the $line hashref: host, user, date, time, diffgmt, rtype, file, proto, code, bytes, refer or agent. Using refer as the argument, the program gave me a list of the referrers from the log file. Using sort | uniq -c | sort -rn on this gave me a top X list of the referrers that the people who downloaded Abacus came from.
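A minimal sketch of that last ranking step, fed with a made-up list of referrers (one per line, as the field-printing program would emit them):

```shell
# Hypothetical referrer list; the URLs are invented for illustration.
top=$(printf '%s\n' \
  'http://example.com/' \
  'http://example.org/' \
  'http://example.com/' \
  '-' \
  'http://example.com/' \
  'http://example.org/' \
  | sort | uniq -c | sort -rn)
# sort groups equal lines, uniq -c prefixes each group with its count,
# and sort -rn puts the highest count first.
printf '%s\n' "$top"
```

Here http://example.com/ ends up on top with a count of 3. Piping the result through head -n 10 would cut the list down to a top ten.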