Pattern match over multiple files is slow - Help Needed!

From: RV (rvstore1_at_yahoo.com)
Date: 10/22/03


Date: 22 Oct 2003 12:53:10 -0700

Hi:

I am having a huge performance problem in one of my scripts.

I have an array containing some reference keys (about 1000 entries
or so).
I also have a list of files (about 100 or so), and I need to locate
every occurrence of these keys in all of the files and replace each with
its value (let's say the key-value hash is also given).

My code looks something like this:

#Note: %keyval --> holds the key-value mapping
# @keylist - is the array with the 1000 keys ( like keys %keyval )
# @files - holds the list of files ( about 100 or so ).

foreach $f ( @files )
{
    #open file - validate etc - assume it is opened as <FH>
    while(<FH>) #each line
    {
        $line=$_ ;
        foreach $k (@keylist)
        {
            $line =~ s/$k/$keyval{$k}/ig ; #replace key with value
        } #key loop
    }
    close(FH);
} #foreach

This code works, but it's too slow! Obviously, I run the inner loop
1000 times for each line in the file.
The constraints are that multiple keys may occur on the same line (and
even the same key may occur multiple times on the same line).

I tried slurping each file into a scalar (undefining $/), but it made
no big difference in timing.
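
A minimal sketch of what such a whole-file variant might look like, with
$/ undefined so that a single read returns the entire file. The contents
of %keyval and the helper names slurp_file and replace_keys are
hypothetical, chosen only for illustration; the per-key substitution loop
is the same as in the code above, so it still makes one pass over the
text per key.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real data described above.
my %keyval  = ( foo => 'X', bar => 'Y' );
my @keylist = keys %keyval;

# Read a whole file into one scalar by locally undefining $/.
sub slurp_file {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "Cannot open $path: $!";
    local $/;    # no record separator: <$fh> returns the whole file
    my $content = <$fh>;
    close($fh);
    return $content;
}

# Same per-key loop as before, but run once per file rather than once
# per line. Keys are still interpolated as regex patterns.
sub replace_keys {
    my ($text) = @_;
    foreach my $k (@keylist) {
        $text =~ s/$k/$keyval{$k}/ig;    # replace key with value
    }
    return $text;
}

# Assumption for this sketch: file names come from the command line.
my @files = @ARGV;
foreach my $f (@files) {
    my $new = replace_keys( slurp_file($f) );
    # ... write $new back out as needed
}
```

This cuts the number of s/// calls from (lines x keys) to (files x keys),
but every key is still scanned against the full text, which is consistent
with the timing barely changing.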

Can someone help me here? If you can give me some ideas that I can
look into, I'd greatly appreciate it.
Pseudocode is fine as well.

If you can include a courtesy CC: that would be great !

Thanks - hope I've conveyed my problem accurately (this is among my
first posts - I am a frequent "reader" though!).

-RV.


