2011-06-16

Parsing Apache Access Logs with Mixed Formats

I recently changed the LogFormat for my Apache server from the default common log format (CLF) to the "vhost_common" format and couldn't find an easy way to handle access logs that mix the two formats. The easiest Perl module I found is "Parse::AccessLogEntry" (from CPAN), but it only works with the common log format. With a simple hack, however, I can now parse logs in either or both formats. The trick is that the "vhost_common" format has only one extra token compared to the CLF, and that token is the first one on the log line. It is easily detected, so it can be removed if present, and the rest of the line, which is then in CLF, can be handled normally.

The following fragment of code from the log-line parsing loop shows the hack:

# the incoming line may be in CLF or vhost_common format
# split the line on space to tokenize it
my @d = split(' ', $line);
next if !defined $d[0];
my $vhost = $d[0];
# the vhost token is in the format "servername:port"
# and the next token (the first in the CLF) is the
# remote host address in the format "xxx.xxx.xxx.xxx",
# so the presence of the ':' in the first token tells
# us which format we have
my $idx = index $vhost, ':';
if ($idx >= 0) {
  # we have detected the vhost info
  # I remove the port info, you may want it
  $vhost = substr $vhost, 0, $idx;
  # remove the vhost token from the list
  shift @d;
  # reconstitute the log line into the CLF
  $line = join(' ', @d);
}
else {
  # we didn't find a vhost so set it to zero
  $vhost = 0;
}
# parse the CLF
my $href = $p->parse($line);
if ($vhost) {
  # add the vhost to the hash
  $href->{vhost} = $vhost;
}

I'm sure the code can be improved, but it does work as is.
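
One possible improvement, offered only as a sketch: a regular expression can strip the leading token without splitting and rejoining the whole line, which also leaves the spacing inside the quoted request untouched. The helper name split_vhost is hypothetical (it is not part of Parse::AccessLogEntry), and the pattern assumes the vhost token is exactly "servername:port" with a numeric port.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Parse::AccessLogEntry): split an optional
# leading "servername:port" token off a log line.
# Returns ($vhost, $clf_line); $vhost is undef when no vhost token is found.
sub split_vhost {
    my ($line) = @_;
    # Assumption: a vhost token is "name:digits" followed by whitespace,
    # and the name itself contains no ':' characters.
    if ($line =~ /^([^\s:]+):(\d+)\s+(.*)$/) {
        return ($1, $3);    # port ($2) is dropped, as in the fragment above
    }
    return (undef, $line);
}

# Made-up sample lines, one per format
my @samples = (
    'example.com:80 192.0.2.1 - - [16/Jun/2011:10:00:00 -0500] "GET / HTTP/1.1" 200 512',
    '192.0.2.1 - - [16/Jun/2011:10:00:00 -0500] "GET / HTTP/1.1" 200 512',
);

for my $line (@samples) {
    my ($vhost, $clf) = split_vhost($line);
    printf "%s => %s\n", defined $vhost ? $vhost : '(none)', $clf;
}
```

The returned CLF line can then be handed to $p->parse() exactly as in the loop fragment, and the vhost added to the hash only when it is defined.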

2011-06-08

SSL Certificates, Virtual Web Sites, Perl for Apache AuthDigest

Note that I have changed two of my favorite links: (1) Debian is now my Linux distribution of choice and (2) I had to fall back to the more powerful Apache HTTPD server because of features I could not find elsewhere.

I have just finished two Perl scripts to aid in Apache htdigest handling for large numbers of users (a database solution would be better, but I don't have the time for that at the moment). The first uses the *nix program pwgen to generate reasonably secure clear-text passwords for an input list of user names. The second takes an input list of pairs of user names with clear-text passwords and produces an Apache htdigest file. I will post them if I get any interested responses.
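
For readers who want the gist of the second script, here is a minimal sketch (not the script itself). It relies on the htdigest file format, in which each line is "user:realm:hash" and the hash is the MD5 hex digest of "user:realm:password". The helper name htdigest_line, the realm, and the user/password pairs are all made up for illustration.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: build one htdigest line for a user.
# The hash is MD5("user:realm:password") in hex.
sub htdigest_line {
    my ($user, $realm, $password) = @_;
    my $hash = md5_hex(join ':', $user, $realm, $password);
    return join ':', $user, $realm, $hash;
}

# Assumed inputs: the realm name and the user/password pairs are invented
my $realm = 'Private Area';
my @pairs = (
    [ 'alice', 'secret1' ],
    [ 'bob',   'secret2' ],
);

# Print one htdigest line per user; redirect the output to create the file
print htdigest_line($_->[0], $realm, $_->[1]), "\n" for @pairs;
```

The realm must match the AuthName used in the Apache configuration, since it is part of the hashed string.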

I have been working part time on multiple virtual hosts on my single cloud httpd server and have learned much--with lots to go. The main sites are two for my college and high school graduating classes and one for my company (and I would be very interested in critiques of them):

In the process of building the web sites I found a source for a reasonably priced SSL certificate for multiple virtual hosts on a single server: StartSSL, which I highly recommend.