Re: File size too big for perl processing



Cheez <danieldharkness@xxxxxxxxx> writes:

print "**fisher**";

$flatfile = "newrawdata.txt";
# 95MB in size

$datafile = "hashsequence16.txt";
# 203MB in size

my $filesize = -s "hashsequence16.txt";
# for use in processing time calculation

open(FILE, "$flatfile") || die "Can't open '$flatfile': $!\n";
open(FILE2, "$datafile") || die "Can't open '$datafile': $!\n";
open (SEQFILE, ">fishersearch.txt") || die "Can't open '$seqparsed': $!\n";

@preparse = <FILE>;
@hashdata = <FILE2>;

close(FILE);
close(FILE2);


for my $list1 (@hashdata) {

If you're looping through $datafile one line at a time, there's no
need to read the whole thing into RAM at once. Just leave the file
open, and use a while() loop to read one line at a time instead:

while (my $list1 = <FILE2>) {
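To make that concrete, here is a minimal, self-contained sketch of the line-at-a-time pattern. The temporary file and its two sample rows are made up for illustration (they stand in for the real data file), and the lexical filehandle with three-argument open is my own choice here, not something the original script uses:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for the large data file: two tab-separated rows.
my ($tmp, $datafile) = tempfile(UNLINK => 1);
print {$tmp} "alpha\t3\nbeta\t7\n";
close($tmp) or die "Can't close '$datafile': $!\n";

open(my $fh, '<', $datafile) or die "Can't open '$datafile': $!\n";

my @words;
while (my $list1 = <$fh>) {    # only the current line is held in memory
    chomp($list1);
    my ($line, $freq) = split(/\t/, $list1);
    push @words, $line;        # collected only so the result can be inspected
}
close($fh);
```

Memory use stays roughly constant regardless of file size, because each pass through the loop replaces the previous line.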

# iterating through hash16 data

$finish++;

if ($finish ==10 ) {
# line counter

$marker = $marker + $finish;

$finish =0;

$left = $filesize - $marker;

printf "$left\/$filesize\n";
# this prints every 17 seconds
}

($line, $freq) = split(/\t/, $list1);

for my $rawdata (@preparse) {
# iterating through rawdata

$rawdata=~ s/\n//;

chomp() is a faster way to remove the trailing newline:

chomp($rawdata);
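A small sketch of how chomp() behaves, using made-up sample strings. It assumes $/ is left at its default of "\n". Since chomp() also accepts a list, the whole array could be stripped once up front instead of on every pass through the loop:

```perl
use strict;
use warnings;

my $rawdata = "some raw line\n";
my $removed = chomp($rawdata);   # strips the trailing newline, returns count removed
my $again   = chomp($rawdata);   # nothing left to strip this time

# chomp() on an array strips the trailing newline from every element.
my @preparse = ("first line\n", "second line\n");
chomp(@preparse);
```

Note that chomp() only touches the end of the string, whereas s/\n// removes the first newline wherever it appears. For lines read from a file, the two give the same result.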

if ($rawdata =~ m/$line/) {
# matching hash16 word with rawdata line

my $first_pos = index $rawdata,$line;

index() will scan the string a second time. There's no need to do
that, since the offset where the match starts is already stored in
@-:

my $first_pos = $-[0];
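Here is a short sketch showing that $-[0] agrees with index() after a successful match. The sample strings are made up, and the \Q...\E quoting is my addition: it makes the regex treat $line literally, the way index() does, which matters if the word could ever contain regex metacharacters:

```perl
use strict;
use warnings;

my $rawdata = "xxneedlexx";   # made-up sample line
my $line    = "needle";       # made-up search word

my ($first_pos, $end_pos);
if ($rawdata =~ m/\Q$line\E/) {
    $first_pos = $-[0];   # offset where the whole match starts
    $end_pos   = $+[0];   # offset just past the end of the match
}

my $via_index = index($rawdata, $line);   # same start offset, but a second scan
```

@- and @+ are only meaningful after a successful match, so read them inside the if block.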

sherm--

--
My blog: http://shermspace.blogspot.com
Cocoa programming in Perl: http://camelbones.sourceforge.net