Speed comparison of regex versus index, lc, and / /i
- From: Ben Bullock <benkasminbullock@xxxxxxxxx>
- Date: Fri, 30 May 2008 13:04:19 +0000 (UTC)
In a recent discussion on this newsgroup, it was mentioned that
"index" is better for matching fixed strings than using regular
expressions. Coincidentally I've recently been setting up a search
system for a fairly large volume (about 30 megabytes) of text files,
and as a first approximation for the search system I made a simple
routine to open each file and search for the string in the file using
"index".
As a test of the proposition that index is better than regexes, I also
tried using a regex to do the same job. My results were that the
version using regexes was almost identical (within a few percent) in
speed to "index", leading me to think that under the bonnet "index" is
probably just using a regex anyway. Furthermore, the biggest
bottleneck in the code wasn't the pattern matching, but the use of
"lc" to convert the text into lower case for case insensitive
search. I found that saving all the text files as lower case before
doing the matching, rather than converting the strings using lc, saved
more than half of the total execution time, so the difference between
"index" and regexes was not even significant compared to the time
spent converting to lower case. I also found that the "i" option to
the regex similarly meant that the regex ran drastically
slower. Similarly another big bottleneck I identified was conversion
of the files - opening the (utf8 encoded) files with
open my $file, "<:utf8", $filename;
saved about 30% of the total execution time compared to converting the
text after reading it in.
So my conclusion is that "index" isn't necessary and one can always
use regexes - unless anyone can prove otherwise - and Perl's regexes
are so fast that they may not be much of a bottleneck. But why "lc"
should be so slow I don't know.
.
- Follow-Ups:
- Re: Speed comparison of regex versus index, lc, and / /i
- From: xhoster
- Re: Speed comparison of regex versus index, lc, and / /i
- From: John W. Krahn
- Re: Speed comparison of regex versus index, lc, and / /i
- Prev by Date: FAQ 5.4 How can I use Perl's "-i" option from within a program?
- Next by Date: Re: Need help with a simple (I think) Perl script
- Previous by thread: FAQ 5.4 How can I use Perl's "-i" option from within a program?
- Next by thread: Re: Speed comparison of regex versus index, lc, and / /i
- Index(es):
Relevant Pages
|