counting matched lines in extremely large files.

From: mikester (submikester_at_yahoo.com)
Date: 12/18/03


Date: 18 Dec 2003 10:25:12 -0800

First off I'll say - I am a bad perl programmer.

I want to be better and with your help I'll get there and then be able
to contribute more here.

That being said, I have a simple problem compounded by file size.

I have a PIX firewall that logs a ton of events to my syslog server.
The logs get extremely large, about 13 gigabytes a day, and they are
rotated daily.

I'm trying to set up some intrusion detection, but with files that
big, just counting incidents to get a baseline becomes time-, CPU- and
memory-intensive using shell commands like grep. So I wanted to do
something in Perl, but I don't know whether I can, given the file size
and memory limitations.

Here's the Perl script, really just a wrapper around a shell command,
that I run to get a basic count of a given type of incident.

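#!/usr/bin/perl
use strict;
use warnings;
# Usage: script PATTERN LOGFILE
my ($pattern, $log) = @ARGV;
# List-form pipe open keeps the arguments away from the shell.
open(my $zgrep, '-|', 'zgrep', '-c', $pattern, $log) or die "Cannot run zgrep: $!";
my $count = <$zgrep>;
print $count if defined $count;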

I print out the number and another program uses that output to put the
number into a database.

How would I accomplish this simply in Perl?
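
For what it's worth, here is a rough sketch of what a pure-Perl count might look like if the log is read one line at a time, so that memory use stays flat no matter how big the file is. The sub name count_matches and the command-line layout are made up for illustration, a gzip binary is assumed to be on the PATH, and the pattern is treated as a Perl regex, which is not identical to grep's syntax.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the lines read from an already-open filehandle that match a regex.
# Only one line is held in memory at a time.
sub count_matches {
    my ($fh, $regex) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ $regex;
    }
    return $count;
}

# Hypothetical command line: count.pl PATTERN LOGFILE
if (@ARGV == 2) {
    my ($pattern, $log) = @ARGV;
    my $fh;
    if ($log =~ /\.gz$/) {
        # Assumption: gzip is installed. List-form open bypasses the shell.
        open($fh, '-|', 'gzip', '-dc', $log)
            or die "Cannot run gzip on $log: $!";
    } else {
        open($fh, '<', $log) or die "Cannot open $log: $!";
    }
    print count_matches($fh, qr/$pattern/), "\n";
    close($fh);
}
```

Whether this beats zgrep on a 13 GB file is something that would have to be measured; the sketch only shows that the file never needs to fit in memory.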

More complicated would be to match multiple patterns against the same
log in a single pass. I would just pull the log into memory if it were
a manageable size, but it is not...
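
A possible shape for the multiple-pattern case, again only as a sketch: keep one counter per pattern and test every pattern against each line as it streams past, so the log is read once and never held in memory. The sub name count_many and the idea of passing the patterns as command-line arguments are assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In a single pass over the filehandle, count how many lines match each
# pattern. $patterns is a hash ref mapping a label to a compiled regex.
sub count_many {
    my ($fh, $patterns) = @_;
    my %count = map { ($_ => 0) } keys %$patterns;
    while (my $line = <$fh>) {
        for my $name (keys %$patterns) {
            $count{$name}++ if $line =~ $patterns->{$name};
        }
    }
    return \%count;
}

# Hypothetical usage: the log arrives on standard input and each
# argument is one pattern, which also serves as its own label.
if (@ARGV) {
    my %patterns = map { ($_ => qr/$_/) } @ARGV;
    my $counts = count_many(\*STDIN, \%patterns);
    print "$_\t$counts->{$_}\n" for sort keys %$counts;
}
```

Something like `gzip -dc LOGFILE | perl count_many.pl PATTERN1 PATTERN2` would then print one tab-separated count per pattern. A line that matches several patterns is counted once for each of them.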

Anyway - your help is appreciated.

The Mikester


