Why is my regex so slow?
- From: carnildo@xxxxxxxxx (Mark Wagner)
- Date: Fri, 31 Oct 2008 12:54:05 -0700
I've got a script I'm using to search through a list of Wikipedia
article titles to find ones that match certain patterns.
As written, if you run it and supply '.*target.*' on standard input,
it processes my test file in 125 seconds. Make any one of the changes
mentioned in the comments, and the time drops to 1.8 seconds. Why the
difference? What is particularly interesting is that it seems to
matter where the regex pattern came from: if it is read from standard
input, matching is slow; if it is assigned in the script, matching is
fast.
If it matters, I'm using Perl 5.8.8.
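
One way to probe the "where the pattern came from" observation is to
check whether the pattern string carries Perl's internal UTF-8 flag.
The sketch below assumes that this flag is what differs between a
pattern read through a ":utf8" layer and one assigned in the script;
that is an assumption, not something established above. flag_state is
a hypothetical helper, and utf8::upgrade is used here only to stand in
for reading the pattern from a ":utf8" handle.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): reports whether
# a string carries Perl's internal UTF-8 flag.
sub flag_state {
    my ($s) = @_;
    return utf8::is_utf8($s) ? "flagged" : "unflagged";
}

# Pattern as it would look when assigned in the script.
my $literal = '.*target.*';

# Same characters, but with the internal UTF-8 flag switched on.
# Assumption: this approximates a pattern read through a ":utf8" layer.
my $upgraded = $literal;
utf8::upgrade($upgraded);

print "literal:  ", flag_state($literal),  "\n";
print "upgraded: ", flag_state($upgraded), "\n";
print "same characters: ", ($literal eq $upgraded ? "yes" : "no"), "\n";
```

If the two strings compare equal but report different flag states, the
flag is at least a candidate explanation worth timing separately.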
To see the problem I'm having, download
http://download.wikimedia.org/eswiki/20081018/eswiki-20081018-all-titles-in-ns0.gz
(a 4.1-MB file), decompress it, and run the program with the name of
the decompressed file as its first argument.
Thanks,
Mark Wagner
--------------
binmode STDIN, ":utf8"; # Comment this out to speed things up
while(<STDIN>)
{
    my $lines = 0;
    my $lines2 = 0;
    my $regex;
    $regex = $_;
    chomp $regex;
    #$regex = '.*target.*'; # Or uncomment this to speed things up
    open INFILE, "<", $ARGV[0];
    binmode INFILE, ":utf8"; # Or comment this out to speed things up
    while(<INFILE>)
    {
        my $target = $_;
        chomp $target;
        $target =~ s/_/ /g;
        # Or make this case-insensitive to speed things up, or remove
        # the start and end anchors to speed things up
        print "Match\n" if($target =~ /^$regex$/);
        $lines = $lines + 1;
        if($lines >= 10000)
        {
            $lines = 0;
            $lines2 += 10000;
            print STDERR "$lines2\r";
        }
    }
}
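
A reduced timing harness may make the comparison easier to repeat
without the 4.1-MB download. The sketch below is only an illustration
under assumptions: the title list is synthetic, time_matches is a
hypothetical helper, and utf8::upgrade stands in for reading the
pattern from a ":utf8" handle. It makes no claim about which variant
is slower on a given Perl version.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Synthetic stand-in for the title file (assumption, not the real data).
my @titles = map { "Some article title $_" } 1 .. 20000;
push @titles, "A target here";

# Hypothetical helper: runs the anchored match from the script above
# against every title and returns (match count, elapsed seconds).
sub time_matches {
    my ($pattern, $titles) = @_;
    my $start = time;
    my $count = 0;
    for my $t (@$titles) {
        $count++ if $t =~ /^$pattern$/;
    }
    return ($count, time - $start);
}

my $plain   = '.*target.*';   # pattern as assigned in the script
my $flagged = $plain;
utf8::upgrade($flagged);      # same characters, internal UTF-8 flag on

my ($n1, $t1) = time_matches($plain,   \@titles);
my ($n2, $t2) = time_matches($flagged, \@titles);

printf "unflagged pattern: %d matches in %.3fs\n", $n1, $t1;
printf "flagged pattern:   %d matches in %.3fs\n", $n2, $t2;
```

Both runs should report the same number of matches; only the elapsed
times are of interest.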