Re: perl vs Unix grep

From: Giridhar Nandigam (nandi_giri_at_yahoo.com)
Date: 07/07/04


Date: 6 Jul 2004 22:58:51 -0700

Hello Al Belden,

I have had a similar problem getting the index number of the matching
element when searching for elements in an array. It was fruitless. I
used a hash map, but that was a burden on the system. In another
possible implementation I used a separate variable, indexCount, on the
array and reinitialized it every time.
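
The index bookkeeping described above can be sketched without a hash or a hand-maintained counter by running grep over the index range instead of over the elements. This is only a minimal illustration, not the code from the original scripts; @lines, $re and @hits are made-up names and the sample data is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample data; any list of strings would do.
my @lines = ('alpha', 'Beta', 'gamma', 'beta max');
my $re    = qr/beta/i;

# grep over the indices rather than the elements, so each hit keeps
# its position without a separate counter or a hash lookup.
my @hits = grep { $lines[$_] =~ $re } 0 .. $#lines;

print "match at index $_: $lines[$_]\n" for @hits;
```

The trade-off is that the whole array is scanned once per search, which is usually cheaper than building a hash just to recover positions.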

That's it.
Perl is a language to make things work at any cost. All the best.
Thanks.
Giridhar Nandigam

"Al Belden" <abelden@comcast.net> wrote in message news:<BvWdnau1FYhmh3vdRVn-gg@comcast.com>...
> Hi all,
> I've been working on a problem that I thought might be of interest: I'm
> trying to replace some korn shell scripts that search source code files with
> perl scripts to gain certain features such as:
>
> - More powerful regular expressions available in Perl
> - Ability to print out lines before and after matches (GNU grep supports
>   this but is not available on our Digital Unix and AIX platforms)
> - Make searches case insensitive by default (yes, I know this can be done
>   with grep but the shell scripts that use grep don't do this)
>
> We're talking about approx. 5000 files spread over 15 directories. To date
> it has proven quite difficult (for me) to match the performance of the Korn
> shell scripts using perl scripts and still obtain the line number and
> context information needed. The crux of the problem is that I have seen the
> best performance from perl when I match with the /g option on a string that
> represents the current slurped file:
>
> local $/;
> my $curStr = <FH>;
> while ($curStr =~ /$compiledRegex/g)
> {
> # write matches to file for eventual paging
> }
>
> This works well except that when each match is found I need the line number
> the match has been found in. As far as I can tell from reading and research
> there is no variable that holds this information as I am not reading from
> the file at this point. I can get the information in other ways such as:
>
> 1. Reading each file a line at a time, testing for a match and keeping a
>    line counter or using $NR.
> 2. Reading the file into an array and processing a line at a time
> 3. Creating index files for the source files that store line offsets and
>    using them with the slurp method in the paragraph above
> 4. Creating an in-memory index for each file that contains a match and
>    using it for subsequent matches in that file
>
> 1, 2 and 4 above suffer performance degradation relative to unix grep. #3
> provides good performance and is the method I am currently using but it
> requires creating and maintaining index files. I was wondering if I could
> tie a scalar to a file and use the slurping loop above. Then perhaps $NR and
> $. would contain the current line number as the file would be read as the
> loop is traversed. Any other ideas would be welcome.
>
> Al
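
One way to keep the slurp-and-/g loop from the quoted message and still recover line numbers is to count newlines incrementally between match offsets, so each part of the file is scanned for newlines only once. The following is a rough sketch under that assumption, not a measured answer to the performance question; the sub name matches_with_line_numbers and the sample text are hypothetical. It relies on the built-in @- and @+ arrays, which hold the start and end offsets of the last successful match.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [line_number, matched_text] pairs for
# every match of $re in the slurped file contents $text.
sub matches_with_line_numbers {
    my ($text, $re) = @_;
    my @found;
    my $line = 1;    # line number at offset $seen
    my $seen = 0;    # offset up to which newlines have been counted

    while ($text =~ /$re/g) {
        my ($start, $end) = ($-[0], $+[0]);    # offsets of this match

        # Count only the newlines between the previous match and this one.
        my $gap = substr($text, $seen, $start - $seen);
        $line += ($gap =~ tr/\n//);
        $seen = $start;

        push @found, [ $line, substr($text, $start, $end - $start) ];
    }
    return @found;
}

# Usage with made-up input standing in for a slurped source file.
my $text = "foo\nbar Foo\nbaz\nFOO\n";
my @m    = matches_with_line_numbers($text, qr/foo/i);
printf "%d: %s\n", @$_ for @m;
```

The reported line is the one on which a match starts, so a match that spans several lines is attributed to its first line. Files with no matches pay no newline-counting cost at all.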


