Re: Searching large files with a regex and a list
- From: "Channing" <channing.clark@xxxxxxxxx>
- Date: 31 May 2006 13:26:17 -0700
Brian McCauley wrote:
Channing wrote:
I would like some (constructive) suggestions on some code I'm writing.
My Perl is rusty and that's reflected in the sample I'm posting. Here
is what I have to tackle. I have multi-gigabyte files to parse for two
different regexes. Each regex interpolates a variable that holds a list
of 18,000+ numbers. I'm looking for suggestions on what I can do to
speed things up, or at least make things better.
Thanks in advance for your time.
------- Code Begin ---------
#!/usr/bin/perl
my $match=0;
my $nonMatch=0;
open(DN_LIST, "<","big_list");
my @list = <DN_LIST>;
@list=sort(@list);
close(DN_LIST);
foreach (@list)
{
chomp;
s/ //g;
}
@list = join('|',@list);
Joining multiple RegEx into one like this is _less_ efficient than
simply looping over @list, which is why the answer given in the FAQ
(yes, your question is a FAQ) does not suggest doing so. (It does
suggest using qr// to precompile the RegEx, though.)
$_=qr/$_/; # Inside your loop
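For anyone following along, the qr// approach described above might look roughly like the sketch below. The sample numbers, the sample lines, and the helper name line_matches are made up for illustration and are not from the original script. The \Q...\E quoting is an extra precaution I added, not something the FAQ answer is quoted as requiring here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data; the real list would be read from "big_list".
my @numbers = ('5551230001', '5551230002', '5551230003');

# Compile each entry once, up front, instead of on every input line.
# \Q...\E quotes any regex metacharacters (an added precaution).
my @patterns = map { qr/\Q$_\E/ } @numbers;

# Return 1 if any precompiled pattern matches the line, else 0.
# (line_matches is a hypothetical helper name.)
sub line_matches {
    my ($line) = @_;
    for my $re (@patterns) {
        return 1 if $line =~ $re;
    }
    return 0;
}

# Made-up input lines: the first contains a listed number, the second does not.
my @lines = (
    'abc 5551230002 xyz',
    'abc 5559999999 xyz',
);

# grep in scalar context returns the number of matching lines.
my $matched = grep { line_matches($_) } @lines;
print "Matched: $matched\n";
```

This still tests up to one pattern per list entry for every line, so with 18,000+ entries it only removes the recompilation cost, not the looping cost.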
Well, I tried a number of the suggestions. The best combination (of
what I tried) is posted below. This took the runtime from 2 hours to
1.5 minutes! In a nutshell, the suggestion to use a hash in place of
the RegEx was the breakthrough. Thanks to all for their time and
contribution to the list!
Regards,
Channing
----- Code Begin -----
#!/usr/bin/perl
use strict;
use warnings;

my $nonMatched = 0;
my $matched    = 0;
my %dnList;
my $dnFile = "big_list";

# Load the list of numbers into a hash so each lookup is a single key test.
open(my $dn_fh, "<", $dnFile) or die "Cannot open $dnFile: $!";
while (<$dn_fh>)
{
    chomp;
    s/ //g;    # strip embedded spaces
    $dnList{$_} = 1;
}
close($dn_fh);

# Check the fixed-position number on each input line against the hash.
while (<>)
{
    if ( ( /^123456/ and exists $dnList{substr($_,14,10)} ) or
         ( /^9876(21|99)/ and exists $dnList{substr($_,29,10)} ) )
    {
        $matched++;
    }
    else
    {
        $nonMatched++;
    }
}
print "Matched:$matched\n";
print "Non-Matched:$nonMatched\n";
----- Code Ends -----
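To sanity-check the offsets before pointing the script at a multi-gigabyte file, a small self-contained sketch like the one below may help. The list entries, the records, and the helper name record_matches are all invented for illustration. The length checks are an extra guard I added so that short lines do not trigger substr warnings; they are not in the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical list entries standing in for the contents of "big_list".
my %dn_list = map { $_ => 1 } ('5551230001', '5551230002');

# Classify one record using the same prefixes and offsets as the script above.
# (record_matches is a hypothetical helper; the length checks are an added guard.)
sub record_matches {
    my ($line) = @_;
    return 1 if $line =~ /^123456/
             && length($line) >= 24
             && exists $dn_list{ substr($line, 14, 10) };
    return 1 if $line =~ /^9876(?:21|99)/
             && length($line) >= 39
             && exists $dn_list{ substr($line, 29, 10) };
    return 0;
}

# Made-up records: filler characters pad the number out to the expected offset.
my @records = (
    '123456' . ('x' x 8)  . '5551230001' . ' tail',   # offset 14, listed
    '123456' . ('x' x 8)  . '5559999999' . ' tail',   # offset 14, not listed
    '987699' . ('x' x 23) . '5551230002' . ' tail',   # offset 29, listed
    '555555' . ('x' x 8)  . '5551230001' . ' tail',   # wrong prefix
);

my ($matched, $non_matched) = (0, 0);
for my $record (@records) {
    if (record_matches($record)) { $matched++ } else { $non_matched++ }
}
print "Matched: $matched\n";
print "Non-Matched: $non_matched\n";
```

The hash works here because the number always sits at a known position, so each line costs one cheap prefix test plus one key lookup no matter how long the list is.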