Re: Searching large files with a regex and a list



Channing wrote:
Hello All -

I would like some suggestions (constructive) on some code I'm writing.
My Perl is rusty and that's reflected in the sample I'm posting. Here
is what I have to tackle. I have Gig files to parse for two different
RegEx's. Within those RegEx's there is a variable that is a list of
18,000+ numbers. I'm looking for some suggestions on what I can do to
speed things up, or at least make things better.

Thanks in advance for your time.

------- Code Begin ---------
#!/usr/bin/perl

my $match=0;
my $nonMatch=0;

open(DN_LIST, "<","big_list");
my @list = <DN_LIST>;
@list=sort(@list);
close(DN_LIST);
foreach (@list)
{
chomp;
s/ //g;
}
@list = join('|',@list);


while (<>)
{
if ( /^123456\d{8}($list[0])/o or /^9876(91|92)\d{24}($list[0])/o )
{
$match++;
}
else
{
$nonMatch++;
}
}

print "Match Count:" . ${match} . "\n";
print "Non-Match Count:" . ${nonMatch} . "\n";

------- Code End ---------



You may want to avoid the 18,000-way alternation in
the regular expression. Match only the fixed prefix,
then check the digits that follow it against a hash:


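use strict;
use warnings;
use Inline::Files -backup;

# Load the wanted numbers into a hash for fast lookups.
my %wanted;
while (<DNLIST>)
{
    chomp;
    $wanted{$_} = 1;
}

while (<DATA>)
{
    my $found_match = 0;
    chomp;
    # Match only the fixed prefix; /gc leaves pos() just past it.
    if (/^123456\d{8}/gc || /^9876(91|92)\d{24}/gc)
    {
        # Extend the candidate one digit at a time, starting at pos(),
        # and look each candidate up in the hash.
        my $digits = '';
        while (/\G(\d)/g)
        {
            $digits .= $1;
            if (exists $wanted{$digits})
            {
                $found_match = 1;
                print $_, " Matched\n";
                last;
            }
        }
    }
    unless ($found_match)
    {
        print $_, " Not Matched\n";
    }
}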

__DNLIST__
12
345
6789
__DATA__
12345612345678345
00000000000000000
98769212345678901234567890123412
9876911234567890123456789012346789
9876911234567890123456789012340000
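
If you would rather not depend on Inline::Files, the same hash lookup
can be sketched with substr instead of walking the digits with \G.
This is only a rough sketch under a few assumptions: the list is
inlined here (in practice it would be read from your big_list file),
the sample lines are the ones above, and line_matches is a made-up
helper name. It probes only the lengths that actually occur in the
list, so each line costs a handful of hash lookups no matter how many
numbers the list holds.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: sample list from this post; the real one would be read
# from the big_list file, one number per line.
my @numbers = qw(12 345 6789);
my %wanted  = map { $_ => 1 } @numbers;

# The shortest and longest entries bound the lengths worth probing.
my $min_len = length $numbers[0];
my $max_len = $min_len;
for my $n (@numbers) {
    my $len = length $n;
    $min_len = $len if $len < $min_len;
    $max_len = $len if $len > $max_len;
}

# Hypothetical helper: true if the text right after one of the two
# fixed prefixes starts with a number from the list.
sub line_matches {
    my ($line) = @_;
    return 0 unless $line =~ /^(?:123456\d{8}|9876(?:91|92)\d{24})/;
    my $rest = substr($line, $+[0]);    # everything after the prefix
    for my $len ($min_len .. $max_len) {
        last if $len > length $rest;
        return 1 if exists $wanted{ substr($rest, 0, $len) };
    }
    return 0;
}

my ($match, $non_match) = (0, 0);
for my $line (qw(
    12345612345678345
    00000000000000000
    98769212345678901234567890123412
    9876911234567890123456789012346789
    9876911234567890123456789012340000
)) {
    if (line_matches($line)) { $match++ } else { $non_match++ }
}

print "Match Count: $match\n";
print "Non-Match Count: $non_match\n";
```

For the real job you would replace the inline sample lines with a
while (<>) loop, as in your original script, and chomp each line
before calling the helper.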

--
Hope this helps,
Steven
