parsing large amounts of text

From: Andrew Gaffney (agaffney_at_skylineaero.com)
Date: 06/28/04


Date: Mon, 28 Jun 2004 14:18:19 -0500
To: beginners <beginners@perl.org>

I'm working on a custom Perl script to parse my Apache logs and report custom
information. When I run the following program, it ends up eating all available
RAM (the system has 1GB) and dying. My access_log is ~410MB. Am I doing
something wrong?

#!/usr/bin/perl

use strict;
use warnings;

use CGI();

my $months = { Jan => 1, Feb => 2, Mar => 3, Apr => 4, May => 5, Jun => 6,
               Jul => 7, Aug => 8, Sep => 9, Oct => 10, Nov => 11, Dec => 12 };
my @requests;
my $start = time;
open LOG, "< /var/log/apache/access_log" or die "Cannot open access_log: $!";

while(<LOG>) {
   my $line = $_;
   $line =~ /^(\d+\.\d+\.\d+\.\d+) (.+?) (.+?) \[(.+?)\] \"(?:(.+?) )?(.+)(?: (.+?))?\" (\d+) (.+?) \"(.+?)\" \"(.+?)\"$/;
   my ($ip, $date, $request, $requestcode, $bytesreturned, $browser) = ($1, $4, $6, $8, $9, $11);

   $request = CGI::unescape($request);
   push @requests, [$ip, $date, $request, $requestcode, $bytesreturned, $browser];
}
my $end = time;
my $elapsed = $end - $start;
close LOG;

print scalar(@requests) . " total records. $elapsed seconds elapsed\n";

-- 
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
636-357-1548
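
The script above keeps one array reference per log line in @requests, so memory use grows with the size of the log. One possible way around that, sketched here under assumptions, is to keep only running totals while reading the file line by line. The name summarize_log, the pattern in $LINE_RE, the choice of totals (per-status and per-IP counts, bytes returned), and the sample lines are all illustrative and not taken from the original post; the pattern assumes the common Apache "combined" log layout and counts anything else as skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed layout: ip ident user [date] "request" status bytes "referer" "agent"
my $LINE_RE = qr/^(\d+\.\d+\.\d+\.\d+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$/;

# Read a log from a filehandle one line at a time and keep only running
# totals. Memory use then depends on the number of distinct keys
# (status codes, IPs), not on the number of lines in the log.
sub summarize_log {
    my ($fh) = @_;
    my %summary = (
        records   => 0,
        skipped   => 0,
        bytes     => 0,
        by_status => {},
        by_ip     => {},
    );

    while (my $line = <$fh>) {
        chomp $line;

        # Use the capture variables only after a successful match.
        unless ($line =~ $LINE_RE) {
            $summary{skipped}++;
            next;
        }
        my ($ip, $status, $bytes) = ($1, $4, $5);

        $summary{records}++;
        $summary{by_status}{$status}++;
        $summary{by_ip}{$ip}++;

        # Apache logs "-" when no bytes were sent.
        $summary{bytes} += $bytes if $bytes =~ /^\d+$/;
    }
    return \%summary;
}

# Demo on an in-memory sample so the sketch runs without a real log file.
my $sample = join "\n",
    '127.0.0.1 - - [28/Jun/2004:14:18:19 -0500] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '127.0.0.1 - - [28/Jun/2004:14:18:20 -0500] "GET /missing HTTP/1.1" 404 - "-" "Mozilla/5.0"',
    'not a log line',
    '';

open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my $summary = summarize_log($fh);
close $fh;

# prints "2 records, 1 skipped, 2326 bytes"
printf "%d records, %d skipped, %d bytes\n",
    @{$summary}{qw(records skipped bytes)};
```

To try this against a real log, the in-memory handle could be replaced with a handle opened on the file, for example open my $fh, '<', '/var/log/apache/access_log' or die "...: $!", and passed to summarize_log in the same way.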

