Re: Speed comparison of regex versus index, lc, and / /i



On Fri, 30 May 2008 14:55:04 +0000, xhoster wrote:

> Ben Bullock <benkasminbullock@xxxxxxxxx> wrote:
>> In a recent discussion on this newsgroup, it was mentioned that "index"
>> is better for matching fixed strings than using regular expressions.
>
> Yes, it is.

Proof?

> If using regex to match fixed strings, you need to worry
> about special characters or syntax errors in the regex, like the problem
> with the literal string like "[l-c]" which we recently witnessed here.

That was exactly the discussion I was referring to. I said to use a
regex with \Q and \E, and Jurgen Exner said that "index" was the
correct tool for the job. I'm continuing that discussion on a separate
thread, since some time has passed and this is a separate topic from
that. I think Mr Exner was incorrect.
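Here is a small sketch of the "[l-c]" problem, using a made-up sample string. Interpolated without quoting, "[l-c]" is compiled as a character class with the reversed range l-c. That is a fatal regex error at run time. With \Q...\E the string is matched literally, and index never treats its argument as a pattern.

```perl
use strict;
use warnings;

# Made-up sample data: a literal string containing regex metacharacters.
my $text = "match the literal [l-c] here";
my $ss   = "[l-c]";

# Unquoted interpolation: "[l-c]" becomes a character class whose range
# l-c is reversed, so the pattern fails to compile; eval catches the error.
my $unquoted_ok = eval { $text =~ /$ss/; 1 };
print "unquoted regex: ", ($unquoted_ok ? "compiled\n" : "died: $@");

# \Q...\E escapes the metacharacters, so the string is matched literally;
# $-[0] is the offset where the successful match started.
my $quoted_pos = $text =~ /\Q$ss\E/ ? $-[0] : -1;
print "quoted regex found it at $quoted_pos\n";

# index does no pattern interpretation at all.
my $index_pos = index($text, $ss);
print "index found it at $index_pos\n";
```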

> "Better" is a much bigger issue than merely faster.

OK, what is your definition of "better"?

> And of course, if you are interested in where the string matches (i.e.
> the return value of index, and not just whether or not it is -1) then it
> is simpler to get it from index than from a regex.

Really? Please edit the following to show me how:

#!/usr/local/bin/perl
use warnings;
use strict;

sub index_find
{
    my ($text, $ss) = @_;
    my @finds;
    my $found = 0;
    while (1) {
        $found = index ($text, $ss, $found);
        last if $found == -1;
        push @finds, $found;
        $found += length ($ss);
    }
    return \@finds;
}

sub regex_find
{
    my ($text, $ss) = @_;
    my @finds;
    while ($text =~ /\Q$ss\E/g) {
        push @finds, pos ($text) - length($ss);
    }
    return \@finds;
}

my $text = <<EOF;
xhoster is the coolest perl programmer ever. xhoster is the
greatest. xhoster is the champion. xhoster is a babe magnet.
EOF
my $ss = "xhoster";

for (\&index_find, \&regex_find) {
    print "String found at ", join(", ", @{ $_->($text, $ss) }), "\n";
}
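A regex can also report the start offset directly. After a successful match, $-[0] holds the offset where that match began, so the pos-minus-length arithmetic in regex_find is not strictly needed. The following is a minimal sketch. The sub name regex_find_start and the sample text are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical variant of regex_find: $-[0] is the offset at which the
# most recent successful match started, so no length() arithmetic is needed.
sub regex_find_start
{
    my ($text, $ss) = @_;
    my @finds;
    while ($text =~ /\Q$ss\E/g) {
        push @finds, $-[0];
    }
    return \@finds;
}

my $text = "abc xhoster def xhoster";
print "String found at ", join(", ", @{ regex_find_start($text, "xhoster") }), "\n";
```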

It's possible to reduce the while (1) in index_find to something like

while (($found = index ($text, $ss, $found)) != -1) {

of course, but that doesn't make it simpler.
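Since the thread is about speed, here is a sketch of how such a comparison might be set up with the core Benchmark module's cmpthese. It covers index, a \Q...\E regex, lc plus index, and /i. The sample text and the candidate names are made up for illustration. The timings it prints depend on the data, the perl version and the machine, so no particular result is claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up test data: a long run of filler text with one match near the end.
my $text = ("lorem ipsum dolor sit amet " x 200) . "xhoster was here";
my $ss   = "xhoster";

# Each candidate returns the offset of the first match, or -1 if none.
my %candidates = (
    index    => sub { index($text, $ss) },
    regex    => sub { $text =~ /\Q$ss\E/ ? $-[0] : -1 },
    lc_index => sub { index(lc $text, lc $ss) },
    regex_i  => sub { $text =~ /\Q$ss\E/i ? $-[0] : -1 },
);

# Sanity check: all candidates must agree before their speed is compared.
my %results;
$results{$_} = $candidates{$_}->() for keys %candidates;
my %distinct;
$distinct{$_} = 1 for values %results;
die "candidates disagree\n" unless scalar(keys %distinct) == 1;

# A negative count runs each candidate for at least that many CPU seconds.
cmpthese(-1, \%candidates);
```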
