Re: Please critique this short script that scans a log file

From: Brian McCauley (nobull_at_mail.com)
Date: 12/17/03


Date: 17 Dec 2003 18:18:14 +0000

J.W. <jwngaa@att.net> writes:

> I'm new to Perl and wrote a simple script to scan a database log file
> with embedded timestamps. This is similar to scanning other common
> text files with timestamped entries such web server utilization logs,
> transaction logs, etc.
>
> This script already works correctly and the -w doesn't report any
> warnings. However, I'd like to get any feedback on making it better.

Oooh.

First, "use warnings" rather than -w unless you need to run on Perl <5.6

> My experience has been with C++ and ksh/awk, so I'm not sure my brain
> is solving problems the "Perl" way yet.

> The example input file and resultant output file is at the bottom of
> this message.
>
> Notes:
> -- purposely didn't use Date::Calc (or any other non-standard module)
> -- could've used longer regex instead of nested if(regex) {if (regex)}
> but avoided it because of NFA regex engine
> -- the timestamp regex could've been simplified to /(...) (...) (..)
> (..):(..):(..) (....)/ but I deliberately wanted to use character
> classes to only extract valid date strings

Hmmm... OK I won't criticise that. (But it's a bad idea).

> -----
>
> #!/usr/bin/perl
> #
> # Read db2uext2.log.NODE0000 and calculate archive copy
> # durations (in seconds) between "Time Started" and
> # "Time Completed"
>
> use strict;

Add "use warnings"

> use Time::Local;
> use Time::Local 'timegm_nocheck';
>
> my $regex_ctime;
> my %MonthIndex;
> my $ts1;
> my $ts2;
> my $ts1_started;
> my $ts2_completed;
> my $rc;

Don't declare all your variables up-top, declare them in the smallest
applicable lexical scope as it defeats half the point of having
explicit declaration in the first place. 99% of the time this means
the declaration should be combined with the first assignment. This
has nothing particular to do with Perl, it applies equally in many
languages including C++.
 
> # $1 $2 $3 $4 $5 $6 $7
> #Time Started: Mon Dec 17 13:47:42 2001
> #Time Started: Wed Jul 23 17:35:35 2003 (20030723173535)
>
> $regex_ctime = qr/ (Sun|Mon|Tue|Wed|Thu|Fri|Sat)
> (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) ([ |0-3][0-9])
> ([0-2][0-9]):([0-5][0-9]):([0-5][0-9]) ([0-9][0-9][0-9][0-9])/o;

Never use the /o regex qualifier without understanding it. Since /o
does nothing in the line above I can reasonably assume you don't
understand it. :-)

> %MonthIndex = (
> "Jan", 0,
> "Feb", 1,
> "Mar", 2,
> "Apr", 3,
> "May", 4,
> "Jun", 5,
> "Jul", 6,
> "Aug", 7,
> "Sep", 8,
> "Oct", 9,
> "Nov", 10,
> "Dec", 11);

Consider using the "fat comma" => for readability.

> while (<>) {
> # chomp; # not needed for this type of processing
> if (/^Time Started:/) {
> if ($_ =~ $regex_ctime) {

IMHO this is better written as

                 if (/$regex_ctime/) {

> $ts1_started = timegm($6, $5, $4, $3,
> $MonthIndex{$2}, $7);
> $ts1 = "$1 $2 $3 $4:$5:$6 $7";
> }

You probably wanted an else clause to do something if Time Started:
wasn't followed by a valid time.

If not you may as well have combined the regex into

    if ( /^Time Started:\s*$regex_ctime/)

> if (/^RC\s+:\s+(\d+)/) { # Return Code
> $rc = $1;
> }

Consider using post-condition syntax

  $rc = $1 if /^RC\s+:\s+(\d+)/ # Return Code

> if (/^Time Completed:/ && defined($ts1_started)) {

Why are you checking defined($ts1_started)? Surely if the input was
invalid you'd _want_ your program to crash out or do something.

It is the bane of my life that inexperienced programmers often think
it is a good idea to make a program that continues without comment if
fed invalid data. Yes, there are times when this makes sense but in
general it is better that a program terminate or emit some sort of
error rather than do something undefined in the case that it is fed
invalid input. This has nothing to do with Perl.

> if ($_ =~ $regex_ctime) {
> $ts2_completed = timegm($6, $5, $4, $3,
> $MonthIndex{$2}, $7);
> $ts2 = "$1 $2 $3 $4:$5:$6 $7";

This code is a duplication of what you have above. Whenever you find
yourself writing the same thing twice you should always look to see
how it can be avoided. This has nothing to do with Perl, it applies
in all programming languages.

In this case you could have captured both started and completed in one
operation and stored the result in a hash.

  if ( /^Time (Started|Completed):\s*($regex_ctime)/ )
        $t{$1} = timegm($8, $7, $6, $5, $MonthIndex{$4}, $9);
        $ts{$1} = $2;
   )
> printf("$ts1 $ts2 %4d %3d\n", $rc ,

Do not compute the format string of printf() unless you
are really are computing a string of printf template elements.
Instead put %s in the format string and put the data to be
interpolated in the argument list. This has nothing to do with Perl,
it applies equally in all languages with printf().

> ($ts2_completed - $ts1_started));
> }
> }
> }
>
>
> -----
>
> #example log file excerpt used as input
> ********************************************************************************
> Time Started: Mon Dec 17 13:18:12 2001
>
> Parameters Passed:
> db2uext2 -OSAIX -RLSQL07010 -RQARCHIVE -DBKP1 -NNNODE0000
> -LP/db2/KP1/db2kp1/NODE0000/SQL00001/SQLOGDIR/ -LNS0007259.LOG
> System Action: ARCHIVE
> Target: DISK
> RC : 0
> Time Completed: Mon Dec 17 13:18:27 2001
>
> ********************************************************************************

You probably want a line of ******* to make your program clear all
its variables so that if one stanza is missing a particular data item
id does not carry forward from the previouis stanza.

-- 
     \\   ( )
  .  _\\__[oo
 .__/  \\ /\@
 .  l___\\
  # ll  l\\
 ###LL  LL\\


Relevant Pages

  • Re: how to use system call within a cgi script
    ... I missed to declare my $result after ... Thanks a lot for the tip of checking logs. ... I have to execute an executable from a CGI script written in perl. ...
    (comp.lang.perl.misc)
  • RE: helping writing a script
    ... - Find the portion which contains the "firewalls logs", ... Now it is time to learn some perl. ... For well commented perl script examples search Randal's site at ...
    (perl.beginners)
  • Re: speed test
    ... How about the files parsing with huge size? ... I'm afraid perl can't finish the daily job so I want to know the speed ... It sounds a lot like you are parsing logs from a web server or well actually ... size of the data to parse grows there will be a bigger and bigger desire for ...
    (perl.beginners)
  • Re: PERL cant open file for logging (world writable directory Windows XP Home/ Active Perl / Apache)
    ... Apache woudn't display .html files if they are in cgi-bin directory. ... I get an error "Cannot open file" in the error logs ... Neither of these things is Perl related. ...
    (comp.lang.perl.misc)
  • Re: the google keywords for debug...
    ... I can not find this message in the project private logs. ... I receive a project in perl. ... I am sure that sub A was executed. ...
    (perl.beginners)