File size too big for Perl processing



Hi, I posted this to perl.beginners as well and will make sure
comments go to both groups.

I have a big file of 16-letter words that I am using as "bait" to
capture larger words in a raw data file. For each word, I loop through
all of the raw data to 1) find the lines that contain it and 2)
associate those lines with the word. I then go to the next line in the
word list and repeat.

hashsequence16.txt is the 16-letter word file (203MB)
rawdata.txt is the raw data file (93MB)

I have a counter in the code to tell me how long it's taking to
process... 9500 hours or so to complete... I definitely have time to
pursue other alternatives.

Scripting with Perl is a hobby and not a vocation, so I apologize in
advance for ugly code. Any suggestions/comments would be greatly
appreciated.

Thanks,
Dan

========================

print "**fisher**";

$flatfile = "newrawdata.txt";
# 95MB in size

$datafile = "hashsequence16.txt";
# 203MB in size

my $filesize = -s $datafile;
# for use in processing time calculation

open(FILE, "<", $flatfile) || die "Can't open '$flatfile': $!\n";
open(FILE2, "<", $datafile) || die "Can't open '$datafile': $!\n";
open(SEQFILE, ">", "fishersearch.txt") || die "Can't open 'fishersearch.txt': $!\n";

@preparse = <FILE>;
@hashdata = <FILE2>;

close(FILE);
close(FILE2);


for my $list1 (@hashdata) {
# iterating through hash16 data

$finish++;

$marker += length($list1);
# bytes of hash16 data consumed so far, so it is comparable to $filesize

if ($finish == 10) {
# progress report every 10 lines

$finish = 0;

$left = $filesize - $marker;

print "$left/$filesize\n";
# this prints every 17 seconds
}

($line, $freq) = split(/\t/, $list1);

for my $rawdata (@preparse) {
# iterating through rawdata

chomp($rawdata);

my $first_pos = index($rawdata, $line);
# one literal substring search instead of a regex match followed by index

if ($first_pos >= 0) {
# matching hash16 word with rawdata line

print SEQFILE "$first_pos\t$rawdata\n";
# printing info to new file

}

}

print SEQFILE "PROCESS\t$line\n";
# printing hash16 word and "process"

}
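
One alternative that might be worth trying, sketched here under assumptions rather than tested on the real files: because every bait word has the same length, the words could be loaded into a hash once, and each raw data line scanned a single time by sliding a 16-character window across it and looking each window up in the hash. That replaces the "every word against every line" loop with one pass over the raw data. The sub name find_baits, its arguments, and the tiny 4-letter sample data are made up for illustration. In the real script the hash would presumably be filled from the first tab-separated column of hashsequence16.txt, and it will likely need considerably more memory than the 203MB file itself.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original script.
# $words: hash ref whose keys are the fixed-length bait words
# $lines: array ref of raw data lines
# $len:   length of every bait word (16 in the post)
# Returns a list of [word, position, line] array refs.
sub find_baits {
    my ($words, $lines, $len) = @_;
    my @hits;
    for my $raw (@$lines) {
        my $line = $raw;
        chomp $line;
        my %seen;    # report each word once per line, at its first position
        # Lines shorter than $len give an empty range and are skipped.
        for my $pos (0 .. length($line) - $len) {
            my $window = substr($line, $pos, $len);
            next unless exists $words->{$window};
            next if $seen{$window}++;
            push @hits, [$window, $pos, $line];
        }
    }
    return @hits;
}

# Made-up sample data: 4-letter baits instead of 16 to keep it readable.
my %words = map { $_ => 1 } qw(ACGT TTAA);
my @raw   = ("GGACGTCC\n", "TTAAACGT\n", "CCCCCCCC\n");

for my $hit (find_baits(\%words, \@raw, 4)) {
    my ($word, $pos, $line) = @$hit;
    print "$word\t$pos\t$line\n";
}
```

Note that the output here comes out in raw data order rather than grouped under a "PROCESS" line per word, so it would need sorting or per-word collection afterwards if the grouped layout matters.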