Re: Need to pull matched string plus a few additional bytes



Phil Miller on Friday, 27 October 2006 15:36:
I am working on my very first program and have run into a bit of a
roadblock. I am trying to print a report of users who show up in an IIS
Log file. The good news is that the format of the userid is
WINDOWSDOMAIN\USERID. The bad news is that it is not always at the same
place in the IIS Log file due to some variable length fields that come
before it. Its location can vary left or right by about 10 bytes.



I read the IIS Log file in one line at a time. I have gotten far enough
that I can identify the lines with WINDOWSDOMAIN on it, but am stuck
there. The code $userid = substr($logfile_in, 33, 12); gets me close
but depending on the length of the date, the time or the IP address, it
is usually off by a few bytes. A sample of the input is below to
explain what I am talking about.



2006-10-23 12:08:47 24.32.35.123 WINDOWSDOMAIN\USERID 175.128.127.43 80
GET /itd/styles/main.css

2006-10-23 12:08:47 24.32.35.123 WINDOWSDOMAIN\USERID 175.128.127.43 80
GET /itd/styles/contents.aspx

2006-10-23 12:08:47 24.32.35.123 WINDOWSDOMAIN\USERID 175.128.127.43 80
GET /itd/styles/footer.aspx



Essentially what I need to do is find the WINDOWSDOMAIN on a line, and
write to a file the matched string plus \USERID data (up to the next
space). Does anyone have any suggestions? I'm thinking there must be
some very easy way to do it since Perl is made for this sort of thing.
I remember reading about some Perl built-in capability that would take a
scalar variable and parse it into an array based on a delimiter, but I
can't remember what it is. That would probably do it for me. But if
you know of a better way, I'm all ears.

Here's demonstration code showing how you can do it with a regex or with
split. The code assumes that the GET line and the line above it are on one
line in the log.

The two demonstration subs return 1 on match and 0 otherwise, so the counter
can be updated by the subs' return value.

The $miss_counter is calculated only once, from the hits and the number of
lines read.

The data after __DATA__ may be wrapped by your mail client (4 lines).

I'm not sure if "WINDOWSDOMAIN" is meant as a hardcoded constant.


#!/usr/bin/perl

use strict;
use warnings;


# see perldoc perlre
#
sub do_regex {
    my $line = shift;   # lexical copy, so the caller's $_ is left alone
    if ($line =~ m; \w+ \\ (\w+) .* \s/itd/ ;ix) { # NOT OPTIMAL!
        print "userid (regex): $1\n";
        return 1;
    }
    return 0;
}

# see perldoc -f split
#
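sub do_split {
    my $line  = shift;
    my @parts = split ' ', $line;
    # skip lines that are too short or whose 8th field is not an /itd/ path
    return 0 unless @parts > 7 && $parts[7] =~ m;/itd/;i;
    my ($domain, $userid) = split m;\\;, $parts[3];
    # no backslash in the 4th field means there is no userid
    return 0 unless defined $userid;
    print "userid (split): $userid\n";
    return 1;
}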

my $hit_counter=0;

while (<DATA>) {
    $hit_counter += do_regex($_);
    do_split($_);
}

my $miss_counter = $. - $hit_counter;

print "hits: $hit_counter / missed: $miss_counter / read: $. lines\n";

__DATA__
2006-10-23 12:08:47 24.32.35.123 WINDOWSDOMAIN\USERID 175.128.127.43 80 GET /itd/styles/main.css
blubb blubb foo bar dummy asdf 44 44 55 66
2006-10-23 12:08:47 24.32.35.123 WINDOWSDOMAIN\USERID 175.128.127.43 80 GET /itd/styles/contents.aspx
2006-10-23 12:08:47 24.32.35.123 WINDOWSDOMAIN\USERID 175.128.127.43 80 GET /itd/styles/footer.aspx
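
If "WINDOWSDOMAIN" really is a fixed string, the regex can be anchored on it
instead of relying on \w+ on both sides of the backslash. The following is
only a sketch under that assumption; the sub name extract_userid and its
arguments are made up for illustration and are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): returns the userid
# that follows "DOMAIN\" on a log line, or undef if the line has none.
sub extract_userid {
    my ($line, $domain) = @_;
    # \Q...\E quotes any regex metacharacters in the domain name;
    # (\S+) captures everything up to the next whitespace.
    return $line =~ m{\b\Q$domain\E\\(\S+)}i ? $1 : undef;
}

my $sample = '2006-10-23 12:08:47 24.32.35.123 WINDOWSDOMAIN\USERID 175.128.127.43 80 GET /itd/styles/main.css';
my $userid = extract_userid($sample, 'WINDOWSDOMAIN');
print defined $userid ? "userid: $userid\n" : "no match\n";
```

Because the match is tied to the domain name and the capture stops at the
next whitespace, it does not matter how far left or right the field moves
on the line.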


==============

Some random annotations to your code (there are others as well),
UNTESTED:

Below is the code I am using.

# with the following statements your life will be easier!
#
use strict; use warnings;

open USERIDOUT, ">userid.out.txt";

# perldoc -f open
# perldoc perlvar
#
open my $outf, '>', 'userid.out.txt' or die $!;

open IISLOG, "<ex061023.log";

open my $log, '<', 'ex061023.log' or die $!;

$ctr = 0;
$hit_counter = 0;
$miss_counter = 0;
$logfile_in;
$userid;

Put "my" in front of all these declarations/definitions.

while (<IISLOG>)

while (<$log>)

{
$logfile_in = $_;
if ( ($logfile_in =~ m/WINDOWSDOMAIN/i && $logfile_in =~
m/itd/i)

I think you can omit one () pair here.

)
{
print "\n** Found success\n";
$hit_counter += 1;

# same as
#
$hit_counter++;

$userid = substr($logfile_in, 33, 12);
# This is not correct but is somewhat close
print "\n", $userid;
}
else
{
print "Did not find success\n";
$miss_counter += 1;
}
}
print "\n Hit Counter = ", $hit_counter;
print "\n Miss Counter = ", $miss_counter;
print "\n Total Records Counter = ", $hit_counter + $miss_counter;

close USERIDOUT;

close $outf or die $!;

close IISLOG;

close $log or die $!;
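
Putting the annotations together, the loop could look roughly like the
sketch below. It is an illustration, not a tested replacement for your
program: the sub name report_users is made up, an in-memory string stands in
for ex061023.log so the example runs on its own, and the single regex
requires "itd" to appear after the userid, which is slightly stricter than
your two separate matches.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reads log lines from $in, writes DOMAIN\USERID for
# each matching line to $out, and returns the pair (hits, misses).
sub report_users {
    my ($in, $out) = @_;
    my ($hits, $misses) = (0, 0);
    while (my $line = <$in>) {
        # A hit needs the domain\userid field and "itd" later on the line.
        if ($line =~ m{(WINDOWSDOMAIN\\\S+).*itd}i) {
            print {$out} "$1\n";
            $hits++;
        }
        else {
            $misses++;
        }
    }
    return ($hits, $misses);
}

# In-memory sample standing in for ex061023.log (assumption for the demo).
my $sample = <<'END_LOG';
2006-10-23 12:08:47 24.32.35.123 WINDOWSDOMAIN\USERID 175.128.127.43 80 GET /itd/styles/main.css
blubb blubb foo bar dummy asdf 44 44 55 66
END_LOG

open my $log, '<', \$sample or die "Cannot open in-memory log: $!";
my ($hits, $misses) = report_users($log, \*STDOUT);
close $log or die $!;

print "Hit Counter = $hits\n";
print "Miss Counter = $misses\n";
print "Total Records Counter = ", $hits + $misses, "\n";
```

For the real run you would open 'ex061023.log' for reading and
'userid.out.txt' for writing, as in the open statements above, and pass
those two handles to the sub.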





Dani