Re: How to reinvent grep in perl?



siegfried wrote:

?????? wrote:

siegfried wrote:

I need to search large amounts of source code and grep is not doing the
job. The problem is that I keep matching stuff in the comments of the
C++/Java/Perl/Groovy/Javascript source code.

Can someone give me some hints on where I might start on rewriting grep
in perl so that it ignores the contents of /* and */ comments?

Instead of rewriting grep, consider writing a comment filter. Have it
read from standard input and write to standard output; pipe the file
that you want to grep into it, and pipe its output into grep.
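
A minimal sketch of such a comment filter, assuming only C-style /* */ and // comments matter. The helper name strip_comments and the sample text are made up for illustration; the patterns are adapted from the perlfaq entry that "perldoc -q comments" shows. Newlines inside comments are kept, so line numbers reported downstream still match the original file.

```perl
use strict;
use warnings;

# Patterns adapted from the perlfaq entry shown by "perldoc -q comments".
my $comment = qr{/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//[^\n]*};
my $other   = qr{"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|.[^/"'\\]*}s;

# strip_comments is a hypothetical helper name, not an existing module function.
# It deletes /* ... */ and // comments but keeps the newlines inside them,
# so line numbers in the output still match the input.
sub strip_comments {
    my ($src) = @_;
    $src =~ s{($comment)|($other)}{
        defined $2 ? $2 : do { ( my $c = $1 ) =~ tr/\n//cd; $c }
    }ge;
    return $src;
}

# Sample input standing in for a source file
my $sample = "int a; /* one\ntwo */ int b; // three\nchar *s = \"/* kept */\";\n";
print strip_comments($sample);
```

To use it as a real filter, the sample could be replaced with the whole of standard input, for example print strip_comments( do { local $/; <STDIN> } ); and the script piped into grep -n. In that setup grep reports correct line numbers, but it only sees standard input, so it cannot report the file name.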

Thanks, but if I am piping from stdin to stdout I see two problems:

(1) how do I implement the -n flag, which reports the line number and
file name of each match?

(2) how do I make two passes: one to strip out the comments (and
preserve the original line breaks so I don't screw up the line
numbers) and the other to actually search for what I am looking for?


The only way I can see to do this is to make three passes:

Pass #1: prepend the file name and current line number to the
beginning of each line (is there a way to interrogate stdin to get the
file name?). With a long path and file name, that could easily double
the memory requirement, since the same text is stored redundantly on
every line.

Pass #2: change all comments to spaces except new-lines
Pass #3: search for the pattern and print the line it is found on

Now I could do this with pipes and 3 different instances of perl
running at the same time. Is there a better way?

So should I be concerned about memory problems? The worst files are
16K lines long and consume a megabyte. I'm running Windows with 2GB
RAM. Is it a problem to make multiple in-memory passes over a 1MB
string (that becomes a 3MB string after I prepend the file name and
line number to the beginning of every line)? How can I write to a
string instead of stdout and make an additional pass using the
technique described in "perldoc -q comments"?
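
On writing to a string instead of stdout: Perl (5.8 and later) can open a filehandle on a scalar reference, so the output of one pass can be captured in memory and handed to the next. A small sketch, with made-up variable names and sample lines:

```perl
use strict;
use warnings;

# Open a write handle on a scalar: everything printed to $out lands in $buffer
my $buffer = '';
open my $out, '>', \$buffer or die "Cannot open in-memory handle: $!";
print {$out} "first pass, line $_\n" for 1 .. 3;
close $out;

# A later pass can now work on the captured text directly
my @lines = split /\n/, $buffer;
print scalar(@lines), " lines captured\n";
```

The same trick works for reading: open my $in, '<', \$buffer gives a handle that reads the string line by line.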

Now I have queried this mailing list previously when I had a scraper
that ran for six hours scraping web sites. If I recall correctly,
perl's memory management was a bit of a problem. Will perl recycle my
memory properly if I keep using the same 3MB string variables over and
over again?

How do I read an entire file into a string? I know how to do it record
by record. Is there a more efficient way?
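
One common way to read a whole file into a string is to undefine the input record separator $/ for the duration of the read. A sketch, assuming a hypothetical helper named slurp_file; the temporary file only exists to make the example self-contained:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# slurp_file is a hypothetical helper name.
# It returns the complete contents of $path as one string.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;              # undef the record separator inside this sub only
    my $text = <$fh>;      # a single read now returns the whole file
    close $fh;
    return $text;
}

# Create a small throwaway file to read back
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "line one\nline two\n";
close $tmp_fh;

my $text = slurp_file($tmp_name);
printf "%d characters, %d lines\n", length($text), ( $text =~ tr/\n// );
```

Using local inside the sub means $/ is restored on return, so line-by-line reads elsewhere in the program are not affected.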


Here is one way to do what you want:


local $/;    # undef the input record separator: each read slurps a whole file

while ( @ARGV ) {

    $_ = <>;    # read the next file in full - puts the file name in $ARGV

    # remove C/C++ comments (based on perlfaq), keeping only the
    # newlines inside them so that line numbers stay correct

    s!(/\*[^*]*\*+([^/*][^*]*\*+)*/|//[^\n]*)|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)! defined $3 ? $3 : do { ( my $x = $1 ) =~ y/\n//cd; $x } !gse;

    my $line_num;    # declared inside the loop, so it restarts for each file

    for my $line ( split /\n/ ) {

        ++$line_num;

        print "File name: $ARGV Line number: $line_num Line: $line\n"
            if $line =~ /grep pattern/;

    }

}
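
The numbering loop above could also be factored into a small function that returns matches in the file:line:text layout that grep -n prints when given several files. This is only a sketch: grep_lines is a hypothetical name, and the sample text stands in for a file whose comments were already blanked out.

```perl
use strict;
use warnings;

# grep_lines is a hypothetical helper name.
# It returns one "file:line:text" string for every line of $text matching $re.
sub grep_lines {
    my ( $file, $text, $re ) = @_;
    my @hits;
    my $line_num = 0;
    for my $line ( split /\n/, $text ) {
        ++$line_num;
        push @hits, join( ':', $file, $line_num, $line ) if $line =~ $re;
    }
    return @hits;
}

# Sample text: the two empty lines are where a comment used to be
my $text = "int total = 0;\n\n\ntotal += count;\n";
print "$_\n" for grep_lines( 'sample.c', $text, qr/total/ );
```

Passing the pattern as a qr// object keeps the search expression out of the function body, so it can come from the command line instead of being hard-coded.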




John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall