Re: Searching large files with a regex and a list
- From: "John W. Krahn" <someone@xxxxxxxxxxx>
- Date: Wed, 31 May 2006 04:23:15 GMT
Channing wrote:
I would like some suggestions (constructive) on some code I'm writing.
My Perl is rusty and that's reflected in the sample I'm posting. Here
is what I have to tackle. I have Gig files to parse for two different
RegEx's. Within those RegEx's there is a variable that is a list of
18,000+ numbers. I'm looking for some suggestions on what I can do to
speed things up, or at least make things better.
------- Code Begin ---------
#!/usr/bin/perl
my $match=0;
my $nonMatch=0;
open(DN_LIST, "<","big_list");
my @list = <DN_LIST>;
@list=sort(@list);
close(DN_LIST);
foreach (@list)
{
    chomp;
    s/ //g;
}
@list = join('|',@list);
while (<>)
{
    if ( /^123456\d{8}($list[0])/o or /^9876(91|92)\d{24}($list[0])/o )
    {
        $match++;
    }
    else
    {
        $nonMatch++;
    }
}
print "Match Count:" . ${match} . "\n";
print "Non-Match Count:" . ${nonMatch} . "\n";
------- Code End ---------
According to the FAQ:
perldoc -q "How do I efficiently match many regular expressions at once"
You need to do something like this (UNTESTED):
#!/usr/bin/perl
use warnings;
use strict;

my $match = 0;
my $nonMatch = 0;

open DN_LIST, '<', 'big_list' or die "Cannot open 'big_list' $!";
my @list = map {
    chomp;
    tr/ //d;
    qr/^(?:123456\d{8}|98769[12]\d{24})$_/;
} <DN_LIST>;
close DN_LIST;

LINE:
while ( my $line = <> ) {
    for my $regex ( @list ) {
        if ( $line =~ /$regex/ ) {
            $match++;
            next LINE;
        }
    }
    $nonMatch++;
}

print "Match Count:$match\n";
print "Non-Match Count:$nonMatch\n";

__END__
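
One variation that may be worth timing against the per-number loop above is to fold the whole list into a single alternation and compile it once with qr//, so each input line is tested against one pattern instead of up to 18,000. On Perl 5.10 and later, an alternation of literal strings is generally turned into a trie by the regex engine, which tends to help with long lists. The following is only a sketch under assumptions: the list entries are taken to be plain digit strings, and build_matcher(), count_lines() and the sample data are hypothetical names and values made up for illustration, not taken from either post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build ONE compiled regex from a list of literal numbers.
# The two fixed prefixes are the ones used in the code above.
sub build_matcher {
    my @numbers = @_;
    # An empty alternation would match every line, so refuse an empty list.
    die "Number list is empty\n" unless @numbers;
    my $alternation = join '|', map { quotemeta($_) } @numbers;
    return qr/^(?:123456\d{8}|98769[12]\d{24})(?:$alternation)/;
}

# Hypothetical helper: count matching and non-matching lines.
sub count_lines {
    my ( $regex, @lines ) = @_;
    my ( $match, $non_match ) = ( 0, 0 );
    for my $line (@lines) {
        if ( $line =~ $regex ) { $match++ } else { $non_match++ }
    }
    return ( $match, $non_match );
}

# Made-up sample data standing in for 'big_list' (spaces stripped as above).
my @numbers = map { my $n = $_; $n =~ tr/ //d; $n } ( '555 1234', '5559876' );
my $regex = build_matcher(@numbers);

# Made-up input lines standing in for the large files.
my @lines = (
    '123456' . '00000000' . '5551234' . " rest\n",    # first prefix, listed number
    '987691' . ( '0' x 24 ) . '5559876' . "\n",       # second prefix, listed number
    '123456' . '00000000' . '1111111' . "\n",         # number not in the list
    "unrelated line\n",
);

my ( $match, $non_match ) = count_lines( $regex, @lines );
print "Match Count:$match\n";
print "Non-Match Count:$non_match\n";
```

For the real data, the sample arrays would be replaced by reading 'big_list' and by a while ( my $line = <> ) loop, so the large files are never held in memory. Whether this beats the loop depends on the Perl version and the data, so it should be benchmarked on a slice of the actual input.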
John
--
use Perl;
program
fulfillment
.
- References:
- Searching large files with a regex and a list
- From: Channing