Re: Searching large files with a regex and a list
- From: "attn.steven.kuo@xxxxxxxxx" <attn.steven.kuo@xxxxxxxxx>
- Date: 30 May 2006 21:53:08 -0700
Channing wrote:
Hello All -
I would like some suggestions (constructive) on some code I'm writing.
My Perl is rusty and that's reflected in the sample I'm posting. Here
is what I have to tackle. I have Gig files to parse for two different
RegEx's. Within those RegEx's there is a variable that is a list of
18,000+ numbers. I'm looking for some suggestions on what I can do to
speed things up, or at least make things better.
Thanks in advance for your time.
------- Code Begin ---------
#!/usr/bin/perl
my $match=0;
my $nonMatch=0;
open(DN_LIST, "<","big_list");
my @list = <DN_LIST>;
@list=sort(@list);
close(DN_LIST);
foreach (@list)
{
    chomp;
    s/ //g;
}
@list = join('|',@list);
while (<>)
{
    if ( /^123456\d{8}($list[0])/o or /^9876(91|92)\d{24}($list[0])/o )
    {
        $match++;
    }
    else
    {
        $nonMatch++;
    }
}
print "Match Count:" . ${match} . "\n";
print "Non-Match Count:" . ${nonMatch} . "\n";
------- Code End ---------
You may want to avoid the 18,000-way alternation in the regular
expression. Match only the fixed prefix with the regex, then check
the digits that follow it against a hash:
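use Inline::Files -backup;

# Load the list of wanted numbers into a hash for constant-time lookups.
my %wanted;
while (<DNLIST>)
{
    chomp;
    $wanted{$_} = 1;
}

while (<DATA>)
{
    my $found_match = 0;
    chomp;
    # Match only the fixed prefix. With /g in scalar context, pos() is
    # left at the end of the prefix; /c keeps a failed match from
    # resetting pos().
    if (/^123456\d{8}/gc || /^9876(91|92)\d{24}/gc)
    {
        # Package variable updated by the embedded (?{ ... }) block below.
        our $digits = '';
        # Starting at pos(), consume one digit at a time and test each
        # growing run of digits against the hash.
        while (/\G(\d)(?{ $digits .= $1})/g)
        {
            if (exists $wanted{$digits})
            {
                $found_match = 1;
                print $_, " Matched\n";
                last;
            }
        }
    }
    unless ($found_match)
    {
        print $_, " Not Matched\n";
    }
}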
__DNLIST__
12
345
6789
__DATA__
12345612345678345
00000000000000000
98769212345678901234567890123412
9876911234567890123456789012346789
9876911234567890123456789012340000
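
For counting matches over a large file, as in the original script, the same hash idea can be sketched without Inline::Files and without the embedded code block. This is only a sketch under some assumptions: the list entries are plain digit strings, and the helper names count_matches and tail_has_wanted_prefix are made up for illustration. It captures the digits after the fixed prefix once, then tests only prefixes whose length lies between the shortest and longest list entry.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if some key of %$wanted is a prefix of $tail.
# Only lengths between the shortest and longest key are tried.
sub tail_has_wanted_prefix {
    my ($wanted, $min_len, $max_len, $tail) = @_;
    my $limit = length($tail) < $max_len ? length($tail) : $max_len;
    for my $len ($min_len .. $limit) {
        return 1 if exists $wanted->{ substr($tail, 0, $len) };
    }
    return 0;
}

# Hypothetical helper: returns (match count, non-match count) for the
# given list of numbers and list of (already chomped) lines.
# Assumption: every entry in @$numbers consists of digits only.
sub count_matches {
    my ($numbers, $lines) = @_;
    my %wanted = map { $_ => 1 } @$numbers;

    # Find the shortest and longest entry so the lookups can be bounded.
    my ($min_len, $max_len);
    for my $n (@$numbers) {
        my $len = length $n;
        $min_len = $len if !defined $min_len || $len < $min_len;
        $max_len = $len if !defined $max_len || $len > $max_len;
    }

    my ($match, $non_match) = (0, 0);
    for my $line (@$lines) {
        # One regex covers both fixed prefixes and captures the digits
        # that follow; the hash lookup replaces the huge alternation.
        if (defined $min_len
            && $line =~ /^(?:123456\d{8}|9876(?:91|92)\d{24})(\d+)/
            && tail_has_wanted_prefix(\%wanted, $min_len, $max_len, $1))
        {
            $match++;
        }
        else
        {
            $non_match++;
        }
    }
    return ($match, $non_match);
}

# Same sample list and data as above.
my @numbers = qw(12 345 6789);
my @lines   = qw(
    12345612345678345
    00000000000000000
    98769212345678901234567890123412
    9876911234567890123456789012346789
    9876911234567890123456789012340000
);
my ($match, $non_match) = count_matches(\@numbers, \@lines);
print "Match Count: $match\n";
print "Non-Match Count: $non_match\n";
```

With a real input you would fill the number list from big_list and feed lines from the data file one at a time instead of holding them all in an array; the arrays here just keep the sketch self-contained.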
--
Hope this helps,
Steven