Re: Filehandles Referenced with a Variable

From: Mike Flannigan (mikeflan_at_earthlink.net)
Date: 03/27/04


Date: Sat, 27 Mar 2004 03:00:30 GMT


Eric Schwartz wrote:

> > foreach $st (@states) {
> > open *$st, ">$st.txt" or die "Cannot open $st.txt: $!"; # THIS IS LINE 16
> > }
>
> @states is an array of strings. So when you try to treat $st as if
> it's a reference to a filehandle, when it's a string, naturally it
> says, "Can't use string as a symbol ref".

I got rid of the quotes:
my @states = (AK, AL, AR, AS, AZ, CA, etc );
but obviously that is not going to change anything. It still works
with 'use strict' commented out, but otherwise gives error:
Bareword "AK" not allowed while "strict subs" in use at line 10.
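
For reference, a minimal sketch of one way to keep 'use strict' switched on: build the list with qw(), which quotes each word so no barewords are involved. The list is shortened here only for illustration.

```perl
use strict;
use warnings;

# qw() returns a list of quoted words, so 'strict subs' has nothing to reject.
# Shortened list, for illustration only.
my @states = qw(AK AL AR AS AZ CA);

print scalar(@states), " states, first is $states[0]\n";
```

This only addresses the bareword error. The symbolic filehandle in open *$st is a separate 'strict refs' issue.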

> Furthermore, you open all
> these files, and then do nothing with them. I don't get it-- are you
> just trying to see if you can open the files without actually opening
> them? If so, then you can replace that with:
>
> map { -r "$_.txt" or die "Cannot open $_.txt: $!" } @states;

Yeah, that is just a script in development. The records contain a
field like this: GA, Georgia; FL, Florida

In that case I would write that record to the GA file and the FL
file. Most records only have one state listed in that field.
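
A rough sketch of pulling the two-letter codes out of a field like that, assuming entries are separated by semicolons and each starts with the code followed by a comma. The helper name state_codes is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the two-letter codes found in a state field
# such as "GA, Georgia; FL, Florida".
sub state_codes {
    my ($field) = @_;
    my @codes;
    foreach my $entry (split /\s*;\s*/, $field) {
        # Each entry is assumed to look like "GA, Georgia".
        push @codes, uc $1 if $entry =~ /^\s*([A-Za-z]{2}),/;
    }
    return @codes;
}

my @codes = state_codes("GA, Georgia; FL, Florida");
print "@codes\n";    # GA FL
```

Working on the field alone, rather than the whole line, avoids accidental matches elsewhere in the record.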

My script works now, so there isn't much need to spend any
time on it. Well, I say it works. Actually, the only time I have
run it, it got about 1/3 of the way through the 1.7 million
records and locked up my machine. But I expect it will
get all the way through - maybe the next time I run it.

I'm on Win2000.

>
> You can just use $. (perldoc perlvar)

Yeah. I haven't gotten used to using those special variables
much yet.
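
A small sketch of $. in use, reading from an in-memory filehandle so it needs no input file. The sample data is made up.

```perl
use strict;
use warnings;

# Read from an in-memory filehandle so the sketch needs no input file.
my $data = "first\nsecond\nthird\n";
open my $in, '<', \$data or die "Cannot open in-memory data: $!";

my @seen;
while (my $line = <$in>) {
    chomp $line;
    # $. holds the current line number of the last filehandle read.
    push @seen, sprintf("%d: %s", $., $line);
}
close $in;

print "$_\n" for @seen;
```

With that, a separate counter such as $num is not needed for reporting line numbers.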

> > print "Did not find good pattern at line # $num.\n$line\n\n" unless
> > ($line =~ m/\t([^\t]*)\t[^\t]*\n$/);
> > $stm = $1;
>
> This variable could stand to be named more verbosely-- I have no idea
> what a 'stm' is, and your code doesn't help me much.

Sorry, it's the state field I described above. I think I first
thought of state match.

> > print "problem with record # $num\n$line\n\n" unless (@sttemp = grep
> > $line =~ m/$_, /i, @states);
>
> FYI, this error message doesn't help much-- it says there's a problem,
> which is good, but not what the problem is, or how to fix it. I'm on
> a campaign recently to get better error messages, so you may take this
> with a grain or two of salt. Also, since @sttemp appears to be an
> array with the names of the states found in $line, perhaps a name like
> @statesfound might help a reader understand better what you're trying
> to do with it.
>
> > # print "Did not find 2 character state at line # $num.\n$line\n\n"
> > unless ($stm =~ m/^(\w{2}),[^;]*\n?$/);
>
> I think this unless should be commented out as well; it doesn't
> actually do anything as written.

Right, that is just a word wrap problem. Not sure why usenet
wraps when other e-mail lists don't.

>
> > foreach $st (@sttemp) {
> > $count{$st}++;
> > if ($st eq 'DC') {
> > print { *$st } "$line";
> > }
>
> This print line is probably going to cause you trouble as well; you're
> confusing symrefs and strings again. Also, don't quote $line there;
> it's unnecessary, and could cause trouble later.

Thanks. I didn't have $line quoted at first, but changed it to
quoted mainly out of habit. It's not quoted now.

>
> In this case, $st is a string, 'DC'. You're trying to print to a
> filehandle, which I imagine you thought you were opening on line 16
> above. If you're only printing to 'DC.txt', then maybe you should
> open it just before that loop, and close it afterwards:
>
> open my $dc_count, '>', "DC.txt" or die "bah: $!";
> foreach(...) {
> if(...) {
> print $dc_count $line;
> }
> }
> close $dc_count or die "grr: $!";
>
> If you're going to be doing this for more than one state later on,
> maybe your loop on line 16 should use a hash instead:
>
> my %state_files;
> foreach my $state (@states) {
> open $state_files{$state}, '>', "$state.txt" or die "bah: $!"
> }
>
> Then you can do something like:
>
> foreach my $state (@sttemp) {
> $count{$state}++;
> print $state_files{$state} $line; # or maybe
> print $state_files{$state} $line if $state eq 'DC';
> }
>
> and at the end of the program, of course:
>
> foreach my $state(keys %state_files) {
> close $state_files{$state} or die "bah: $!";
> }
>
> > }
> > print "$num\n" if $num % 50000 == 0;
> > }
>
> Again, you can use $. here.
>
> -=Eric
>

I write to all 60 files at once, basically, as you can see below.

Thanks for all your help.
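
For comparison, the hash-of-filehandles suggestion quoted above can be sketched end to end so that it runs under 'use strict'. This is only a sketch: the three-state list, the sample records, and the temporary directory are assumptions made so the example is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my @states = qw(GA FL DC);             # shortened list for illustration
my $dir    = tempdir(CLEANUP => 1);    # scratch directory, an assumption

# One lexical filehandle per state, stored in a hash keyed by state code.
my %state_files;
foreach my $state (@states) {
    open $state_files{$state}, '>', "$dir/$state.txt"
        or die "Cannot open $dir/$state.txt: $!";
}

# Made-up records; the state field is the second-to-last tab-separated column.
my @records = (
    "rec1\tGA, Georgia; FL, Florida\tx\n",
    "rec2\tDC, District of Columbia\ty\n",
);

my %count;
foreach my $line (@records) {
    # Write the record to the file of every state it mentions.
    foreach my $state (grep { $line =~ /\b$_, / } @states) {
        print { $state_files{$state} } $line;
        $count{$state}++;
    }
}

foreach my $state (keys %state_files) {
    close $state_files{$state} or die "Cannot close $state.txt: $!";
}

printf "%s: %d\n", $_, $count{$_} for sort keys %count;
```

The block form print { $state_files{$state} } is needed because print only accepts a simple scalar or a block as its filehandle argument.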

This is what I am running now. It works for 300,000+ record
files, but got hung up on my 1.7 MM file. If I really have to, I'll
break my file up and run append writes. BTW, this thing
is going to take almost an hour to run through all 1.7 MM
records, if it ever does :-)

#use strict;
use warnings;

my $infile = 'USAshorter.txt';

my $fileout = 'out.txt';
my ($line, $stm, $state, $st, @sttemp, %count, $k, $v);
my $num=0;
my $prn=0;
my @states = (AK, AL, AR, AS, AZ, CA, CO, CT, DC, DE, FL, FM, GA, GU, HI, IA,
ID, IL, IN, KS, KY, LA, MA, MD, ME, MH, MI, MN, MO, MP, MS, MT, NC, ND, NE,
NH, NJ, NM, NV, NY, OH, OK, OR, PA, PR, PW, RI, SC, SD, TN, TX, UM, UT, VA,
VI, VT, WA, WI, WV, WY);

chdir 'C:\Copy2';

foreach $st (@states) {
    open *$st, ">$st.txt" or die "Cannot open $st.txt: $!";
}

open OUT, ">", $fileout or die "Cannot open $fileout: $!";

open INPUT, "<", $infile or die "Cannot open $infile: $!";

while ($line = <INPUT>) {
    $num++;
    print OUT "Did not find good pattern at line # $num.\n$line\n\n"
        unless ($line =~ m/\t([^\t]*)\t[^\t]*\n$/);
    $stm = $1;

    print OUT "problem with record # $num\n$line\n\n"
        unless (@sttemp = grep $line =~ m/$_, /i, @states);

    foreach $st (@sttemp) {
        print { *$st } $line;
        $prn++;
    }
    print "$num\n" if $num % 50000 == 0;
}

foreach $st (@states) {
    close *$st;
}

print "\n\nProcessed $num lines.\nBut printed $prn times.\n";
print OUT "\n\nProcessed $num lines.\nBut printed $prn times.\n";

close INPUT;
close OUT;

__END__
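
The fallback mentioned above, splitting the input and appending on later runs, might look roughly like this. It is a sketch under assumptions: the helper name append_records, the sample records, and the temporary directory are all invented for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);    # scratch directory, an assumption

# Hypothetical helper: append one chunk of records to a state's file.
sub append_records {
    my ($path, @lines) = @_;
    # '>>' opens for append, creating the file if it does not exist yet.
    open my $out, '>>', $path or die "Cannot open $path for append: $!";
    print {$out} @lines;
    close $out or die "Cannot close $path: $!";
    return;
}

# Two "runs", as if the big input had been split into two pieces.
append_records("$dir/GA.txt", "record 1\n");
append_records("$dir/GA.txt", "record 2\n", "record 3\n");

open my $check, '<', "$dir/GA.txt" or die "Cannot reopen $dir/GA.txt: $!";
my @lines = <$check>;
close $check;
print scalar(@lines), " lines in GA.txt\n";
```

Because '>>' never truncates, output files left over from an earlier attempt would need to be deleted before a fresh run.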