Re: perl vs Unix grep
From: Giridhar Nandigam (nandi_giri_at_yahoo.com)
Date: 07/07/04
- In reply to: Al Belden: "perl vs Unix grep"
Date: 6 Jul 2004 22:58:51 -0700
Hello Al Belden,
I have had a similar problem getting the index number of the matching
element when searching for elements in an array. It was fruitless. I
used a hash map, but that was a burden on the system. In another
possible implementation, I used a separate counter variable,
indexCount, over the array and reinitialized it every time.
That's it.
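The counter approach above might look something like this minimal sketch. The sub name matching_indices and the sample data are hypothetical; the grep-over-indices line is a common Perl idiom shown for comparison, since it avoids both the hash and the hand-maintained counter.

```perl
use strict;
use warnings;

# Hypothetical helper: return the indices of elements of @$items that
# match $re, using an explicit index counter reset on every call.
sub matching_indices {
    my ($items, $re) = @_;
    my @hits;
    my $indexCount = 0;
    for my $item (@$items) {
        push @hits, $indexCount if $item =~ $re;
        $indexCount++;
    }
    return @hits;
}

my @words = qw(apple banana cherry avocado);
my @idx   = matching_indices(\@words, qr/^a/);

# Equivalent idiom without a manual counter: grep over the index range.
my @idx2 = grep { $words[$_] =~ /^a/ } 0 .. $#words;

print "@idx\n";    # prints "0 3"
```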
Perl is a language for making things work at any cost. All the best.
Thanks.
Giridhar Nandigam
"Al Belden" <abelden@comcast.net> wrote in message news:<BvWdnau1FYhmh3vdRVn-gg@comcast.com>...
> Hi all,
> I've been working on a problem that I thought might be of interest: I'm
> trying to replace some Korn shell scripts that search source code files with
> perl scripts to gain certain features such as:
>
> More powerful regular expressions available in perl
> Ability to print out lines before and after matches (GNU grep supports this
> but is not available on our Digital Unix and AIX platforms)
> Make searches case insensitive by default (yes, I know this can be done with
> grep, but the shell scripts that use grep don't do this)
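For the context-line and case-insensitivity features listed above, one option is a line-at-a-time search that keeps a small buffer of preceding lines, with $. supplying the line number. This is a minimal sketch: context_grep is a hypothetical helper, and its output format is only loosely modeled on GNU grep (":" after the number for a match, "-" for a context line, no "--" group separators). The in-memory filehandle in the usage lines needs Perl 5.8 or later; for real files, open the path instead.

```perl
use strict;
use warnings;

# Hypothetical helper: grep-like search with $before/$after lines of
# context, case-insensitive by default. $. supplies the line number.
sub context_grep {
    my ($fh, $pattern, $before, $after) = @_;
    my $re = qr/$pattern/i;          # case-insensitive by default
    my @prev;                        # up to $before earlier lines: [lineno, text]
    my $after_left = 0;              # trailing context lines still to print
    my @out;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ $re) {
            push @out, map { sprintf("%d-%s", @$_) } @prev;   # leading context
            @prev = ();
            push @out, sprintf("%d:%s", $., $line);            # the matching line
            $after_left = $after;
        }
        elsif ($after_left > 0) {
            push @out, sprintf("%d-%s", $., $line);            # trailing context
            $after_left--;
        }
        else {
            push @prev, [$., $line];
            shift @prev while @prev > $before;                 # keep only $before lines
        }
    }
    return @out;
}

my $text = "a\nb\nMATCH\nc\nd\ne\nmatch\nf\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my @lines = context_grep($fh, 'match', 1, 1);
close $fh;
print "$_\n" for @lines;
```

With one line of context on each side, lines 1 and 5 are skipped and both "MATCH" and "match" are reported.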
>
> We're talking about approx. 5000 files spread over 15 directories. To date
> it has proven quite difficult (for me) to match the performance of the Korn
> shell scripts using perl scripts and still obtain the line number and
> context information needed. The crux of the problem is that I have seen the
> best performance from perl when I match with the /g option on a string that
> represents the current slurped file:
>
> local $/;                    # undefine the record separator: slurp mode
> my $curStr = <FH>;           # the whole file as one string
> while ($curStr =~ /$compiledRegex/g)
> {
>     # write matches to file for eventual paging
> }
>
> This works well, except that when each match is found I need the number of
> the line it was found in. As far as I can tell from reading and research,
> there is no variable that holds this information, since I am not reading from
> the file at this point. I can get the information in other ways, such as:
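One way to keep the slurp-and-/g loop and still get line numbers is to count newlines incrementally: $-[0] gives the offset where each match starts, and counting only the newlines between the previous match and the current one keeps a running line number, so each character is scanned for newlines at most once. This is a minimal sketch of that swapped-in technique; match_lines is a hypothetical name. It takes the matched text from @- and @+ rather than $&, because in Perls before 5.20 using $& anywhere slows every regex match in the program.

```perl
use strict;
use warnings;

# Hypothetical helper: return [line number, matched text] for every /g
# match in a slurped string, counting newlines incrementally.
sub match_lines {
    my ($str, $re) = @_;
    my @found;
    my $line = 1;    # line number at offset $last
    my $last = 0;    # offset up to which newlines have been counted
    while ($str =~ /$re/g) {
        my ($start, $end) = ($-[0], $+[0]);        # offsets of this match
        my $gap = substr($str, $last, $start - $last);
        $line += ($gap =~ tr/\n//);                # tr/// returns the count
        $last = $start;
        push @found, [$line, substr($str, $start, $end - $start)];
    }
    return @found;
}

my $text = "first line\nsecond foo\nthird\nfoo and foo\n";
my @hits = match_lines($text, qr/foo/);
printf "%d: %s\n", @$_ for @hits;    # prints 2: foo, 4: foo, 4: foo
```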
>
> 1. Reading each file a line at a time, testing for a match and keeping a
> line counter or using $NR.
> 2. Reading the file into an array and processing a line at a time
> 3. Creating index files for the source files that store line offsets and
> using them with the slurp method in the paragraph above
> 4. Creating an in-memory index for each file that contains a match and using
> it for subsequent matches in that file
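Options 3 and 4 both come down to a table of line-start offsets. A minimal in-memory sketch, with hypothetical names line_starts and line_of: build the table once per file, then binary-search it with each match's start offset from $-[0]. Writing the same array to disk would correspond to option 3. If the file ends in a newline, the table gets one extra entry at the end of the string, which does not affect offsets inside real lines.

```perl
use strict;
use warnings;

# Hypothetical helper: offsets at which each line begins (line 1 at 0).
sub line_starts {
    my ($str) = @_;
    my @starts = (0);
    push @starts, pos($str) while $str =~ /\n/g;
    return \@starts;
}

# Hypothetical helper: binary-search for the 1-based line containing $offset,
# i.e. the last line whose start offset is <= $offset.
sub line_of {
    my ($starts, $offset) = @_;
    my ($lo, $hi) = (0, $#$starts);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi + 1) / 2);    # round up so the loop progresses
        if ($starts->[$mid] <= $offset) { $lo = $mid }
        else                            { $hi = $mid - 1 }
    }
    return $lo + 1;
}

my $text = "first line\nsecond foo\nthird\nfoo and foo\n";
my $idx  = line_starts($text);
my @lines;
while ($text =~ /foo/g) {
    push @lines, line_of($idx, $-[0]);
}
print "@lines\n";    # prints "2 4 4"
```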
>
> 1, 2 and 4 above suffer performance degradation relative to Unix grep. #3
> provides good performance and is the method I am currently using, but it
> requires creating and maintaining index files. I was wondering if I could
> tie a scalar to a file and use the slurping loop above. Then perhaps $NR and
> $. would contain the current line number, as the file would be read as the
> loop is traversed. Any other ideas would be welcome.
>
> Al