Re: run script on multiple files



Kirk Wythers wrote:

On Dec 22, 2006, at 8:33 PM, Chad Perrin wrote:

On Fri, Dec 22, 2006 at 08:04:39PM -0600, Kirk Wythers wrote:
I have written a short perl script that munges climate data and then
loads it into a postgres database. It works fine on one file at a
time... syntax is ./program.pl filename

I would like to run it in a directory with multiple files. I have
tried syntax ./program.pl file1 file2, but only the first file gets
processed. Can anyone help me figure out how to run this script on a
directory full of files that all need to be processed?

Yes, some of us probably can help. To give the most helpful responses,
though, we'll probably need to see what you've tried so far. Right off
the top of my head, without any other information, I'm just inclined to
say "Try using 'while (<>)' to access file contents." That may not suit
your needs at all, though, since I don't know exactly how your file
access needs to fit into the program.

Thanks for the reply, Chad. Here is my script (I'm not sure if I should
be modifying the script itself, or piping something on the command line):

#! /usr/bin/perl -w
use strict;
use Date::Calc qw(Day_of_Year);
use DBI;

#MICIS climate data munger. Required input argument is the file to
#process.
#Use > to redirect output to new file.

#Set the item delimiter to tabs instead of the default commas and the line

The Output Field Separator ($,) has the default value of undef.

#delimiter to a newline character
$, = "\t";
$\ = "\n";

#Instantiate the global station ID variable
my $station_id = "";
#Initialize I/O variables
my ($year,$month,$day,$doy,$date,$precip,$tmin,$tmax,$snowfall,
$snowdepth,$tmean,$obstime,$datasource);

You don't really need to declare these variables in file scope; you should
probably declare them inside the while loop.

#Part 1. Loop through the 11 header lines to identify the station id.
#The 7th line contains the station ID, and has the format of
#STATION: SOME_STATION, STATE (Station ID: ######)

for(my $i=1;$i<=6;$i++) {

Your comment says eleven lines but your code says six?

my $header = <>;
#Remove the newline character
chomp $header;
if ($i == 2) {
#Split the line into a 3-item array based on the 2 colons.
my @line = split(":", $header);
#Extract everything after the 2nd colon.
$station_id = $line[2];
#Remove leading white spaces.
$station_id =~ s/^\s+//;
#Remove ending bracket.
$station_id =~ s/\)//;
}
}

#Connect to PostgreSQL
my $dbh = DBI->connect( "DBI:Pg:dbname=met_data;host=localhost",
"pguser", "pguser" )
or die "Couldn't connect to PostgreSQL: $DBI::errstr ($DBI::err)\n";

#Part 2. Loop through the records and prepare SQL statement.
while (my $line=<>) {

You are using the <> operator to read from the file(s) so this *will* read all
the lines from all the files listed on the command line. The only problem is
that you will not distinguish the headers from the second and subsequent files
listed on the command line.

chomp $line;
#Split the line on white spaces.
($year,$month,$day,$precip,$tmin,$tmax,$snowfall,$snowdepth,$tmean,
$obstime,$datasource) = split(/\s+/, $line);
#Stop reading data at the end of the file, when $year is empty. This
#gets you out of the datafile before the program chokes on the footer.
exit unless $year;
# Initialize and concatenate date as YYYYMMDD.
$date = $year . $month . $day;
# Initialize and calculate day of the year (doy)
$doy = Day_of_Year($year, $month, $day);
#Switch T (trace) to 0.01 and M (missing) to -999
if ($precip eq "T") { $precip = 0.01; }
elsif ($precip eq "M") {$precip = -999; }
if ($tmin eq "M") { $tmin = -999; }
if ($tmax eq "M") { $tmax = -999; }
if ($snowfall eq "M") { $snowfall = -999; }
if ($snowdepth eq "M") { $snowdepth = -999 }
if ($tmean eq "M") { $tmean = -999 }

my $sth = $dbh->prepare("INSERT INTO weather (station_id, year, month,
day, doy, date, precip, tmin, tmax, snowfall, snowdepth, tmean) VALUES
(?,?,?,?,?,?,?,?,?,?,?,?)");

You shouldn't call $dbh->prepare() inside the while loop; you only need to
call it once, before the loop starts.

$sth->execute($station_id, $year, $month, $day, $doy, $date, $precip,
$tmin, $tmax, $snowfall, $snowdepth, $tmean);
#print $station_id, $year, $month, $day, $doy, $date, $precip, $tmin,
#$tmax, $snowfall, $snowdepth, $tmean;
}

#$sth->finish();

#Disconnect from database
$dbh->disconnect();

This may work better for you:

#!/usr/bin/perl -w
use strict;
use Date::Calc qw(Day_of_Year);
use DBI;


my $dbh = DBI->connect( 'DBI:Pg:dbname=met_data;host=localhost', 'pguser',
'pguser' )
or die "Couldn't connect to PostgreSQL: $DBI::errstr ($DBI::err)\n";

my $sth = $dbh->prepare( 'INSERT INTO weather (station_id, year, month, day,
doy, date, precip, tmin, tmax, snowfall, snowdepth, tmean) VALUES
(?,?,?,?,?,?,?,?,?,?,?,?)' );


my $station_id = '';

while ( <> ) {

    # Part 1. Loop through the 11 header lines to identify the station id.
    # The station ID has the format of:
    # STATION: SOME_STATION, STATE (Station ID: ######)
    if ( 1 .. 11 ) {
        $station_id = $1 if /\(Station ID:\s*(\S+)\)/;
        next;
    }

    # At eof close the input filehandle to reset $.
    if ( eof ) {
        close ARGV;
        next;
    }

    # Part 2. Loop through the records and execute the prepared SQL statement.
    my ( $year, $month, $day, $precip, $tmin, $tmax, $snowfall, $snowdepth,
        $tmean, $obstime, $datasource ) = split;

    # Initialize and concatenate date as YYYYMMDD.
    my $date = $year . $month . $day;

    # Initialize and calculate day of the year (doy)
    my $doy = Day_of_Year( $year, $month, $day );

    # Switch T (trace) to 0.01 and M (missing) to -999
    $precip = 0.01 if $precip eq 'T';
    for ( $precip, $tmin, $tmax, $snowfall, $snowdepth, $tmean ) {
        $_ = -999 if $_ eq 'M';
    }

    $sth->execute( $station_id, $year, $month, $day, $doy, $date, $precip,
        $tmin, $tmax, $snowfall, $snowdepth, $tmean );
    #print join( "\t", $station_id, $year, $month, $day, $doy, $date, $precip,
    #    $tmin, $tmax, $snowfall, $snowdepth, $tmean ), "\n";
}

#$sth->finish();

# Disconnect from database
$dbh->disconnect();

__END__
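
A minimal sketch of one possible variation on the rewrite above, not a
drop-in replacement. It resets the line counter in a continue block (the
idiom shown in perldoc -f eof), so the last line of each file is still
examined, and it skips blank and footer lines instead of passing them on
to Day_of_Year. In the original script, "exit unless $year" appears to end
the whole run at the first file's footer, which would explain why only the
first file was processed. The sub name parse_station_files and the rule
that a data row starts with a four-digit year are assumptions made for
illustration; the database calls are left out so the parsing can be tried
on its own.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): read the files
# named in @_ and return one array ref per data row, with the station ID
# in front of the whitespace-separated fields of that row.
sub parse_station_files {
    my @files = @_;
    my @rows;
    my $station_id = '';

    local @ARGV = @files;    # <> reads the files listed in @ARGV
    while ( my $line = <> ) {

        # The 11-line header length is taken from the script's comment.
        if ( $. <= 11 ) {
            $station_id = $1 if $line =~ /\(Station ID:\s*(\S+)\)/;
            next;
        }

        my @fields = split ' ', $line;

        # Assumption: a data row starts with a four-digit year. Blank
        # lines and footer text fail this check and are skipped.
        next unless @fields >= 3 && $fields[0] =~ /^\d{4}$/;

        push @rows, [ $station_id, @fields ];
    }
    continue {
        # Runs after every line, including those skipped with next.
        # Closing ARGV at the end of each file resets $. to 0, so the
        # header test above works again for the next file.
        close ARGV if eof;
    }
    return @rows;
}
```

When the files are given on the command line, the same loop body should
work directly on <> without the local @ARGV assignment.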



John
--
Perl isn't a toolbox, but a small machine shop where you can special-order
certain sorts of tools at low cost and in short order. -- Larry Wall