Re: run script on multiple files



Thanks for the reply, John. I have a couple of questions inline.

On Dec 22, 2006, at 10:53 PM, John W. Krahn wrote:


#! /usr/bin/perl -w
use strict;
use Date::Calc qw(Day_of_Year);
use DBI;

#MICIS climate data munger. Required input argument is the file to
#process.
#Use > to redirect output to new file.

#Set the item delimiter to tabs instead of the default commas and the line

The Output Field Separator ($,) has the default value of undef.

I guess I'm too new at this. I don't understand your point.
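One reading of John's remark: the comment in the script calls commas "the default" item delimiter, but by default $, is undef, so print puts nothing at all between the items. A minimal sketch of the difference follows; the helper name printed() is made up for illustration and is not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: capture what print() emits for a list under the
# given output field separator ($,) and output record separator ($\).
sub printed {
    my ( $ofs, $ors, @items ) = @_;
    my $out = '';
    open my $fh, '>', \$out or die "Cannot open in-memory handle: $!";
    {
        local $, = $ofs;    # undef by default: nothing between items
        local $\ = $ors;    # undef by default: nothing after the list
        print {$fh} @items;
    }
    close $fh;
    return $out;
}

# With both separators undef the items simply run together.
print printed( undef, undef, 'a', 'b', 'c' ), "\n";

# With the settings used in the script: tab-separated, newline-terminated.
print printed( "\t", "\n", 'a', 'b', 'c' );
```

So setting $, = "\t" does not replace a comma; it replaces "no separator at all".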


#delimiter to a newline character
$, = "\t";
$\ = "\n";

#Instantiate the global station ID variable
my $station_id = "";
#Initialize I/O variables
my ($year,$month,$day,$doy,$date,$precip,$tmin,$tmax,$snowfall,
$snowdepth,$tmean,$obstime,$datasource);

You don't really need to declare these variables in file scope; you should
probably declare them inside the while loop.


Understood

#Part 1. Loop through the 11 header lines to identify the station id.
#The 7th line contains the station ID, and has the format of
#STATION: SOME_STATION, STATE (Station ID: ######)

for(my $i=1;$i<=6;$i++) {

Your comment says eleven lines but your code says six?

A mistake on my part not updating the comments. The earlier file format had 11 lines.


my $header = <>;
#Remove the newline character
chomp $header;
if ($i == 2) {
#Split the line into a 3-item array based on the 2 colons.
my @line = split(":", $header);
#Extract everything after the 2nd colon.
$station_id = $line[2];
#Remove leading white spaces.
$station_id =~ s/^\s+//;
#Remove ending bracket.
$station_id =~ s/\)//;
}
}

#Connect to PostgreSQL
my $dbh = DBI->connect( "DBI:Pg:dbname=met_data;host=localhost",
"pguser", "pguser" )
or die "Couldn't connect to PostgreSQL: $DBI::errstr ($DBI::err) \n";

#Part 2. Loop through the records and prepare SQL statement.
while (my $line=<>) {

You are using the <> operator to read from the file(s) so this *will* read all
the lines from all the files listed on the command line. The only problem is
that you will not distinguish the headers from the second and subsequent files
listed on the command line.

That will not do. I need to start fresh on each file, just as if I ran the program as:

./program.pl file1
./program.pl file2
./program.pl file3
etc.
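One way to get the "fresh start per file" behaviour is to open each file explicitly instead of relying on <>. The sketch below is only an illustration under stated assumptions: the helper name read_station_file() is made up, the header length (6) is taken from the loop in the original script, and a data record is assumed to begin with a 4-digit year.

```perl
use strict;
use warnings;

# Hypothetical helper: read one file and return its station ID followed by
# its data lines. Assumes the ID appears within the first $header_lines
# lines in the form "(Station ID: ######)".
sub read_station_file {
    my ( $path, $header_lines ) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!";

    my $station_id = '';
    my @records;
    while ( my $line = <$in> ) {
        chomp $line;
        if ( $. <= $header_lines ) {    # $. counts from 1 for each newly opened handle
            $station_id = $1 if $line =~ /\(Station ID:\s*(\S+)\)/;
            next;
        }
        my ($year) = split ' ', $line;
        # Assumption: stop at the first blank line or footer line.
        last unless defined $year && $year =~ /^\d{4}$/;
        push @records, $line;
    }
    close $in;
    return ( $station_id, @records );
}

# Each file is handled on its own, as if the program had been run once per file.
for my $file (@ARGV) {
    my ( $id, @records ) = read_station_file( $file, 6 );
    print "$file: station $id, ", scalar(@records), " records\n";
}
```

It would be run as ./program.pl file1 file2 file3; the database insert would go where the records are pushed.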


chomp $line;
#Split the line on white spaces.
($year,$month,$day,$precip,$tmin,$tmax,$snowfall,$snowdepth,$tmean,
$obstime,$datasource) = split(/\s+/, $line);
#Stop reading data at the end of the file, when $year is empty. This
#gets you out of the datafile before the program chokes on the footer.
exit unless $year;
# Initialize and concatenate date as YYYYMMDD.
$date = $year . $month . $day;
# Initialize and calculate day of the year (doy)
$doy = Day_of_Year($year, $month, $day);
#Switch T (trace) to 0.01 and M (missing) to -999
if ($precip eq "T") { $precip = 0.01; }
elsif ($precip eq "M") {$precip = -999; }
if ($tmin eq "M") { $tmin = -999; }
if ($tmax eq "M") { $tmax = -999; }
if ($snowfall eq "M") { $snowfall = -999; }
if ($snowdepth eq "M") { $snowdepth = -999 }
if ($tmean eq "M") { $tmean = -999 }

my $sth = $dbh->prepare("INSERT INTO weather (station_id, year, month,
day, doy, date, precip, tmin, tmax, snowfall, snowdepth, tmean) VALUES
(?,?,?,?,?,?,?,?,?,?,?,?)");

You shouldn't call $dbh->prepare() inside the while loop; you only need to
call it once, before the loop starts.

I follow



$sth->execute($station_id, $year, $month, $day, $doy, $date, $precip,
$tmin, $tmax, $snowfall, $snowdepth, $tmean);
#print $station_id, $year, $month, $day, $doy, $date, $precip, $tmin,
#$tmax, $snowfall, $snowdepth, $tmean;
}

#$sth->finish();

#Disconnect from database
$dbh->disconnect();

This may work better for you:

#!/usr/bin/perl -w
use strict;
use Date::Calc qw(Day_of_Year);
use DBI;


my $dbh = DBI->connect( 'DBI:Pg:dbname=met_data;host=localhost', 'pguser',
'pguser' )
or die "Couldn't connect to PostgreSQL: $DBI::errstr ($DBI::err) \n";

my $sth = $dbh->prepare( 'INSERT INTO weather (station_id, year, month, day,
doy, date, precip, tmin, tmax, snowfall, snowdepth, tmean) VALUES
(?,?,?,?,?,?,?,?,?,?,?,?)' );


my $station_id = '';

while ( <> ) {

# Part 1. Loop through the 11 header lines to identify the station id.
# The station ID has the format of:
# STATION: SOME_STATION, STATE (Station ID: ######)
if ( 1 .. 11 ) {
$station_id = $1 if /\(Station ID:\s*(\S+)\)/;

It seems that this is more flexible, i.e. not dependent upon a certain number of header lines. Can you translate the $1 if /\(Station ID:\s*(\S+)\)/; part, though?
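For reference, here is the same pattern spelled out piece by piece using the /x modifier, which allows white space and comments inside a regular expression. The helper name station_id_from() is made up for this sketch; the pattern itself is the one from the line above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the station ID found in a header line,
# or undef when the line does not contain one.
sub station_id_from {
    my ($line) = @_;
    return $line =~ m{
        \( Station [ ] ID:   # the literal text "(Station ID:" (backslash makes the paren literal)
        \s*                  # optional white space
        (\S+)                # capture one or more non-space characters
        \)                   # a literal closing parenthesis
    }x ? $1 : undef;
}

my $id = station_id_from('STATION: SOME_STATION, STATE (Station ID: 123456)');
print "station id: $id\n";
```

The statement form "$station_id = $1 if /.../;" means: if the current line ($_) matches, copy whatever the first pair of capturing parentheses matched (available as $1) into $station_id.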

next;
}

# At eof close the input filehandle to reset $.
if ( eof ) {
close ARGV;
next;
}


I think this is supposed to allow the script to jump to the next file, right?
However, this script also reads the first file into the database and then stops.
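Without the data files this is only a guess, but one possible cause is that blank or footer lines at the end of the first file reach Day_of_Year(), which as far as I know dies when handed an invalid date, so the run ends before the second file is opened. The sketch below stays with <> and shows the two pieces that matter: closing ARGV at the end of each file so that $. restarts, and ignoring lines that do not look like data. The helper name collect_records(), the 6-line header and the "record starts with a 4-digit year" test are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read every file in @files with <>, treating the first
# $header_lines lines of EACH file as header and ignoring footer lines.
sub collect_records {
    my ( $header_lines, @files ) = @_;
    local @ARGV = @files;    # <> reads the files named in @ARGV
    my $station_id = '';
    my @rows;
    while ( my $line = <> ) {
        chomp $line;
        if ( $. <= $header_lines ) {
            $station_id = $1 if $line =~ /\(Station ID:\s*(\S+)\)/;
        }
        else {
            my @fields = split ' ', $line;
            # Assumption: a data record starts with a 4-digit year;
            # anything else (blank line, footer text) is ignored.
            push @rows, [ $station_id, @fields ]
                if @fields && $fields[0] =~ /^\d{4}$/;
        }
    }
    continue {
        # Runs after every iteration; closing ARGV at the end of each
        # file resets $. so the next file starts again at line 1.
        close ARGV if eof;
    }
    return @rows;
}

my @rows = collect_records( 6, @ARGV );
print scalar(@rows), " records read\n";
```

Each element of @rows is an array reference holding the station ID followed by the fields of one record, ready to be passed to $sth->execute() after the T/M substitutions.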

# Part 2. Loop through the records and prepare SQL statement.
my ( $year, $month, $day, $precip, $tmin, $tmax, $snowfall, $snowdepth,
$tmean, $obstime, $datasource ) = split;

# Initialize and concatenate date as YYYYMMDD.
my $date = $year . $month . $day;

# Initialize and calculate day of the year (doy)
my $doy = Day_of_Year( $year, $month, $day );

# Switch T (trace) to 0.01 and M (missing) to -999
$precip = 0.01 if $precip eq 'T';
for ( $precip, $tmin, $tmax, $snowfall, $snowdepth, $tmean ) {
$_ = -999 if $_ eq 'M';
}


Much more efficient. Thank you.
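The loop works because, inside a for/foreach, $_ is an alias for each listed variable rather than a copy, so assigning to $_ changes the variable itself. A small stand-alone sketch of the same idea; the helper name clean_values() and the sample values are made up, and the first value is assumed to be the precipitation field.

```perl
use strict;
use warnings;

# Hypothetical helper: takes (precip, tmin, tmax, ...) and returns a cleaned copy.
sub clean_values {
    my @values = @_;
    $values[0] = 0.01 if $values[0] eq 'T';    # trace applies to precipitation only
    for (@values) {
        # $_ aliases each element in turn, so this assignment modifies @values
        $_ = -999 if $_ eq 'M';
    }
    return @values;
}

my @cleaned = clean_values( 'T', 'M', 31, 0, 'M', 26 );
print join( "\t", @cleaned ), "\n";
```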

$sth->execute( $station_id, $year, $month, $day, $doy, $date, $precip,
$tmin, $tmax, $snowfall, $snowdepth, $tmean );
#print join( "\t", $station_id, $year, $month, $day, $doy, $date, $precip,
#$tmin, $tmax, $snowfall, $snowdepth, $tmean ), "\n";
}

#$sth->finish();

# Disconnect from database
$dbh->disconnect();

__END__



John
--
Perl isn't a toolbox, but a small machine shop where you can special-order
certain sorts of tools at low cost and in short order. -- Larry Wall



