Re: run script on multiple files




On Dec 23, 2006, at 9:25 PM, Chris Charley wrote:


----- Original Message ----- From: "Kirk Wythers" <kwythers@xxxxxxx>
Newsgroups: perl.beginners
To: "John W. Krahn" <krahnj@xxxxxxxxx>
Cc: "Perl Beginners" <beginners@xxxxxxxx>
Sent: Saturday, December 23, 2006 1:32 PM
Subject: Re: run script on multiple files


Thanks for the reply, John. I have a couple of questions inline.

On Dec 22, 2006, at 10:53 PM, John W. Krahn wrote:


#! /usr/bin/perl -w
use strict;
use Date::Calc qw(Day_of_Year);
use DBI;

#MICIS climate data munger. Required input argument is the file to process.
#Use > to redirect output to new file.

#Set the item delimiter to tabs instead of the default commas and the line

The Output Field Separator ($,) has the default value of undef.

I guess I'm too new at this. I don't understand your point.

Above, you said 'Set the item delimiter to tabs instead of the default commas'
and John is saying that the default of $, is undef, not a comma.
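
To make that concrete, here is a small sketch (the helper name joined_like_print is made up for illustration and is not part of either script). It shows that $, starts out undefined, so print puts nothing between its arguments, and that a tab only appears once $, is set explicitly:

```perl
use strict;
use warnings;

# Hypothetical helper: join a list the way print() would, using the
# current value of $, (print treats an undefined $, as "no separator").
sub joined_like_print {
    my @fields = @_;
    my $sep = $,;
    $sep = '' unless defined $sep;
    return join( $sep, @fields );
}

# $, has not been touched yet, so it is still undef here.
my $separator_state = defined($,) ? 'defined' : 'undef';
my $default         = joined_like_print( 'a', 'b', 'c' );

my $tabbed;
{
    local $, = "\t";    # tab-separated only inside this block
    $tabbed = joined_like_print( 'a', 'b', 'c' );
}

print "separator is $separator_state\n";
print "$default\n";
print "$tabbed\n";
```

Using local keeps the change to $, confined to one block, which is usually safer than setting it for the whole program.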


[snip]

#Part 1. Loop through the 11 header lines to identify the station id.
#The 7th line contains the station ID, and has the format of
#STATION: SOME_STATION, STATE (Station ID: ######)

for(my $i=1;$i<=6;$i++) {

Your comment says eleven lines but your code says six?

A mistake on my part not updating the comments. The earlier file format had 11 lines.

You need to correct this in John's code where it says:
if (1..11)
should then be
if (1..6)

[snip]

You are using the <> operator to read from the file(s) so this *will* read all
the lines from all the files listed on the command line. The only problem is
that you will not distinguish the headers from the second and subsequent files
listed on the command line.

That will not do. I need to start fresh on each file. Just as if I ran the program as:

./program.pl file1
./program.pl file2
./program.pl file3
etc.

And that's what John's solution does. In his code:

# At eof close the input filehandle to reset $.
if ( eof ) {
close ARGV;
next;
}

Closing ARGV resets $. so that the first line of the next file is line 1 again,
and the 'next' keyword jumps straight back to the while (<>) statement above, so
none of the statements below it are run for the last line of each file.
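
If it helps to see the reset in action, here is a minimal sketch that does not touch the database. It assumes two throwaway files with made-up contents (created with File::Temp) and uses the continue-block form of the same idiom, which closes ARGV at the end of each file:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write two small throwaway files (hypothetical data, for illustration only).
my @files;
for my $name (qw(first second)) {
    my ( $fh, $path ) = tempfile( UNLINK => 1 );
    print {$fh} "$name header\n", "$name data 1\n", "$name data 2\n";
    close $fh or die "Cannot close $path: $!";
    push @files, $path;
}

my @seen;    # records "line-number:text" for every line read
{
    local @ARGV = @files;    # pretend the files were given on the command line
    while (<>) {
        chomp;
        push @seen, "$.:$_";
    }
    continue {
        # eof without parentheses tests the file currently being read.
        # Closing ARGV here resets $., so the next file starts at line 1.
        close ARGV if eof;
    }
}

print "$_\n" for @seen;
```

Without the close ARGV, the line numbers would simply keep counting up across both files, and a test such as if ( 1 .. 6 ) would only ever match the header of the first file.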


[snip]

This may work better for you:

#!/usr/bin/perl -w
use strict;
use Date::Calc qw(Day_of_Year);
use DBI;


my $dbh = DBI->connect( 'DBI:Pg:dbname=met_data;host=localhost', 'pguser',
'pguser' )
or die "Couldn't connect to PostgreSQL: $DBI::errstr ($DBI::err) \n";

my $sth = $dbh->prepare( 'INSERT INTO weather (station_id, year, month, day,
doy, date, precip, tmin, tmax, snowfall, snowdepth, tmean) VALUES
(?,?,?,?,?,?,?,?,?,?,?,?)' );


my $station_id = '';

while ( <> ) {

# Part 1. Loop through the 11 header lines to identify the station id.
# The station ID has the format of:
# STATION: SOME_STATION, STATE (Station ID: ######)
if ( 1 .. 11 ) {
$station_id = $1 if /\(Station ID:\s*(\S+)\)/;

This regular expression wants to match:
\( - a literal left parenthesis
Station ID: - then this text
\s* - 0 or more spaces
(\S+) - 1 or more non-space characters (enclosed in capturing parentheses
whose value will be held in $1)
\) - a final literal right parenthesis
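
Here is a quick way to try the expression on its own. The header line and the sub name station_id_from are invented for this sketch, following the format given in the script's comments:

```perl
use strict;
use warnings;

# Return the station ID found in a header line, or undef if there is none.
sub station_id_from {
    my ($line) = @_;
    return $line =~ /\(Station ID:\s*(\S+)\)/ ? $1 : undef;
}

# Hypothetical header line in the documented format.
my $header = 'STATION: SOME_STATION, STATE (Station ID: 123456)';
my $id     = station_id_from($header);

print "station id: $id\n";
```

Because the pattern looks for the text rather than a line number, it finds the ID on whichever header line carries it.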



It seems that this is more flexible, i.e. not dependent upon a certain number of header lines. Can you translate the if /\(Station ID:\s*(\S+)\)/; part though?

next;
}

# At eof close the input filehandle to reset $.
if ( eof ) {
close ARGV;
next;
}


I think this is supposed to allow the script to jump to the next file. Right?

Right


However, this script also reads the first file into the database, then stops.

Don't know why - maybe someone else could say.

I think I see what is happening. John's script was crashing at the end of the first file with an error that I saw earlier when writing my script. There is a footer at the end of each file and as soon as the script hits the footer junk, it gives:

Use of uninitialized value in concatenation (.) or string at ./micis_final.pl line 38, <> line 37993.
Use of uninitialized value in concatenation (.) or string at ./micis_final.pl line 38, <> line 37993.

In my script I solved this problem by telling the program to exit when there is nothing in $year (essentially the first empty line, which always comes after the last line of data and before the footer). Since John's script was giving the same error, I added my two-bit solution. My guess is that I am telling the program to exit before the filehandle closes and $. is reset.

Below is John's code with my addition. Any ideas how to get out of each file when a blank line is hit, and still close the filehandle and reset $.?

#!/usr/bin/perl -w
use strict;
use Date::Calc qw(Day_of_Year);
use DBI;


my $dbh = DBI->connect( 'DBI:Pg:dbname=met_data;host=localhost', 'pguser', 'pguser' )
or die "Couldn't connect to PostgreSQL: $DBI::errstr ($DBI::err) \n";

my $sth = $dbh->prepare( 'INSERT INTO weather (station_id, year, month, day, doy, date, precip, tmin, tmax, snowfall, snowdepth, tmean) VALUES (?,?,?,?,?,?,?,?,?,?,?,?)' );


my $station_id = '';

while ( <> ) {

#Part 1. Loop through the header lines to identify the station id.
#The station ID has the format of:
#STATION: SOME_STATION, STATE (Station ID: ######)
if ( 1 .. 11 ) {
$station_id = $1 if /\(Station ID:\s*(\S+)\)/;
next;
}

#At eof close the input filehandle to reset $.
if ( eof ) {
close ARGV;
next;
}

#Part 2. Loop through the records and prepare SQL statement.
my ( $year, $month, $day, $precip, $tmin, $tmax, $snowfall, $snowdepth, $tmean, $obstime, $datasource ) = split;

#Stop reading data at the end of the file, when $year is empty. This
#gets you out of the datafile before the program chokes on the footer.
exit unless $year;

#Initialize and concatenate date as YYYYMMDD.
my $date = $year . $month . $day;

#Initialize and calculate day of the year (doy)
my $doy = Day_of_Year( $year, $month, $day );

#Switch T (trace) to 0.01 and M (missing) to -999
$precip = 0.01 if $precip eq 'T';
for ( $precip, $tmin, $tmax, $snowfall, $snowdepth, $tmean ) {
$_ = -999 if $_ eq 'M';
}

$sth->execute( $station_id, $year, $month, $day, $doy, $date, $precip, $tmin, $tmax, $snowfall, $snowdepth, $tmean );
#print join( "\t", $station_id, $year, $month, $day, $doy, $date, $precip, $tmin, $tmax, $snowfall, $snowdepth, $tmean ), "\n";
}

#$sth->finish();

# Disconnect from database
$dbh->disconnect();

__END__
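
One possible way around this, offered only as a sketch: exit ends the whole program, not just the current file, which would explain why only the first file is loaded. Instead of exiting, the loop could remember that it has reached the footer and keep reading (and ignoring) lines until eof, so that close ARGV still runs. The example below leaves out the database and Date::Calc parts and just collects the data fields. The names read_records and $in_footer, the header count of 2 and the sample data are all made up, and it assumes, as described above, that a blank line always separates the data from the footer.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read every file in @paths, skipping $header_lines header lines and
# everything from the first blank line onward (the footer) in each file.
# Returns the data lines as array refs of whitespace-separated fields.
sub read_records {
    my ( $header_lines, @paths ) = @_;
    return unless @paths;    # with an empty @ARGV, <> would read STDIN

    my @records;
    my $in_footer = 0;

    local @ARGV = @paths;
    while (<>) {
        if ( $. > $header_lines && !$in_footer ) {
            my @fields = split;
            if (@fields) { push @records, \@fields }
            else         { $in_footer = 1 }    # blank line: footer starts
        }
    }
    continue {
        # Runs for every line, including the last one of each file, so the
        # filehandle is always closed and $. reset before the next file.
        if (eof) {
            close ARGV;
            $in_footer = 0;
        }
    }
    return @records;
}

# Two throwaway files: 2 header lines, 2 data lines, a blank line, a footer.
my @files;
for my $year ( 2005, 2006 ) {
    my ( $fh, $path ) = tempfile( UNLINK => 1 );
    print {$fh} "Some header text\n",
        "STATION: SOME_STATION, STATE (Station ID: 123456)\n",
        "$year 12 30 0.10 -5 3\n",
        "$year 12 31 T -7 M\n",
        "\n",
        "Footer junk that should be ignored\n";
    close $fh or die "Cannot close $path: $!";
    push @files, $path;
}

my @records = read_records( 2, @files );
print scalar(@records), " data records read\n";
```

In the real script the push would be replaced by the date handling and the $sth->execute call, and the header test would stay as the 1 .. N range that picks up $station_id.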


