Re: speed issues with pattern matching and substitution
- From: anno4000@xxxxxxxxxxxxxxxxxxxxxxx (Anno Siegel)
- Date: 5 Apr 2005 10:19:38 GMT
Brian McCauley <nobull@xxxxxxxx> wrote in comp.lang.perl.misc:
> Arturi wrote:
>
> > I wrote some perl lines that scans a file looking for some words and
> > substitute them for their corresponding pairs, which are previously
> > saved in a HASH.
[...]
> If you are concerned about speed you should take the comoposition and
> compilatiion of the regex outside the loop.
>
> my @patterns = map { qr/(\W)($_)(\W)/ } keys %HASH;
>
> local *_; # Wise not to stomp on someone else's $_
>
> while (<FILE>) {
> my @values = values %HASH;
> for my $pattern ( @patterns ) {
> my $value = shift @values;
> s/$pattern/$1$value$3/g;
> }
> # print or whatever
> }
>
> But my spider-sense is telling me that perhaps the keys of %HASH really
> arbirary regexes but are just words and all you want is word
> substitution. This is a classic bit of Perl.
>
> while (<FILE>) {
> s/(\w+)/$HASH{$1}||$1/eg;
> # print or whatever
> }
At first sight that looks rather inefficient because *every* word
is captured and replaced (often by itself). Out of interest I
benchmarked a few variants, and it turns out that it is just as
fast as more selective solutions.
I'm appending the benchmark script, in case anyone cares. The four
variants tested are (least efficient first, the difference between
the last two is insignificant):
direct -- every regex is compiled from the string at run time
regex -- all alternatives compiled into a big regex
brian -- replace every word, like the second solution above
precomp -- regular expressions precompiled via qr//, like the
first solution above
Anno
--
#!/usr/local/bin/perl
use strict; use warnings; $| = 1;
use Vi::QuickFix;
use Benchmark qw( cmpthese);
our @lines = <DATA>;
my %repla; # string replacements (master)
@repla{
qw( one two three four five six seven eight nine ten eleven twelve)
} = qw( ein zwei drei vier fuenf sechs sieben acht neun zehn elf zwoelf);
my @repla = map [ qr/\b$_\b/ => $repla{ $_}], keys %repla; # precompiled
my $repla = do { # single regex
my $re = join '|', map "\\b$_\\b", keys %repla;
qr/($re)/;
};
goto bench;
check:
print brian( @lines);
print "\n", @lines;
exit;
bench:
@lines = ( @lines) x 10; # vary to check linearity
cmpthese -1, {
direct => 'direct( @lines)',
precomp => 'precomp( @lines)',
brian => 'brian( @lines)',
regex => 'regex( @lines)',
};
#######################################################################
sub direct {
my @new = @_;
for ( @new ) {
my ( $str, $rpl);
s/\b$str\b/$rpl/g while ( $str, $rpl) = each %repla;
}
@new;
}
sub precomp {
my @new = @_;
for ( @new ) {
for my $r ( @repla ) {
s/$r->[ 0]/$r->[ 1]/g;
}
}
@new ;
}
sub brian {
my @new = @_;
s/(\w+)/$repla{ $1} || $1/eg for @new;
@new;
}
sub regex {
my @new = @_;
s/$repla/$repla{ $1}/g for @new;
@new;
}
__DATA__
one spot was detected on each of the two blades
three jolly coachmen, three jolly coachmen
twelve to a dozen
a display of oneupmanship
three o'clock four o'clock five o'clock ROCK
.
- References:
- speed issues with pattern matching and substitution
- From: Arturi
- Re: speed issues with pattern matching and substitution
- From: Brian McCauley
- speed issues with pattern matching and substitution
- Prev by Date: FAQ 1.6 What is perl6?
- Next by Date: Re: #!perl ??
- Previous by thread: Re: speed issues with pattern matching and substitution
- Next by thread: [BEGINNER} Question
- Index(es):
Relevant Pages
|