Re: A loop to parse a large text file--output is empty!



Michael Oldham wrote on Friday, 23 June 2006, 18:20:
Hello again,

Hello Michael

Thanks to everyone for their helpful suggestions. I finally got it to
work, using the following script. However, it takes about 5 hours to
run on a fast computer. Using grep (in bash), on the other hand, takes
about 5 minutes (see below if you are interested). Thanks again!

SLOW perl script:

#!/usr/bin/perl -w

use strict;

my $IDs = 'ID_all_X';

unless (open(IDFILE, $IDs)) {
print "Could not open file $IDs!\n";
}

my $probes = 'HG_U95Av2_probe_fasta';

unless (open(PROBES, $probes)) {
print "Could not open file $probes!\n";
}

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";

There are at least two reasons why the following nested loop is slow:

- thousands of regexes (one per ID) are applied to every line
- once a line has been selected, the remaining regexes are still applied,
although that is no longer necessary

A faster strategy would be:

1. create a lookup hash with the IDs of IDFILE (IDs as keys)
2. in the while loop, first extract the string you want to test
for selection from the line.
Use split or a single capturing regex for this.
perldoc perlre
perldoc -f split
3. instead of the foreach loop below, simply test once whether
the extracted string is a key in the lookup hash.
( print OUT $line if exists $lookup_hash{$extracted_string} )
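
A minimal sketch of the three steps above, not tested against your data. The sub names load_ids and filter_probes are made up for illustration, and the header layout (>probe:<array>:<ID>:...) is only assumed from the regex in your script. Note that the hash lookup compares the ID as a plain string, whereas your regex interpolated it as a pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Step 1: build a lookup hash from an ID filehandle (one ID per line).
# (load_ids is a hypothetical helper name.)
sub load_ids {
    my ($fh) = @_;
    my %lookup;
    while (my $id = <$fh>) {
        chomp $id;
        next unless length $id;    # skip empty lines
        $lookup{$id} = 1;
    }
    return \%lookup;
}

# Steps 2 and 3: extract the ID from each header line with a single
# capturing regex, then do one hash lookup instead of looping over all IDs.
# (filter_probes is a hypothetical helper name.)
sub filter_probes {
    my ($in, $out, $lookup) = @_;
    my $count = 0;
    while (my $line = <$in>) {
        # Assumed header layout: >probe:<array>:<ID>:...
        next unless $line =~ /^>probe:\w+:([^:]+):/;
        next unless exists $lookup->{$1};
        print {$out} $line;
        my $seq = <$in>;           # the sequence line follows the header
        print {$out} $seq if defined $seq;
        $count++;
    }
    return $count;                 # number of probes written
}
```

To use it with the files from your script, open ID_all_X and HG_U95Av2_probe_fasta for reading and probe_subset.txt for writing (each with "or die"), pass the ID handle to load_ids, and then pass the probe handle, the output handle and the returned hash reference to filter_probes. Each probe line then costs one regex and one hash lookup, regardless of how many IDs there are.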

(sorry, not much time left...)

my @ID = <IDFILE>;
print @ID;
chomp @ID;

while (my $line = <PROBES>) {
foreach my $identifier (@ID) {
if($line=~/^>probe:\w+:$identifier:/) {
print OUT $line;
print OUT scalar(<PROBES>);
}
}
}
exit;
[...]

Hope this helps
Dani