Serious Perl Regular Expression deficiency?



I don't see a solution to this problem: regular
expressions can't exclude a string while matching.
They can exclude individual characters fine. I
started using Perl 2 years ago and have run into
this nagging problem several times.
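
The character-versus-string distinction can be illustrated with a negated character class. This is a minimal sketch, and the sample string is hypothetical: [^...] removes individual characters from the match, so it cannot express "anything except the three-character string -->".

```perl
use strict;
use warnings;

# Hypothetical sample: a comment body containing a lone '-' before the real '-->'.
my $text = '<!-- a - b -->';

# A negated character class excludes single characters only.
# [^-]* stops at the FIRST '-', even though that dash is not part of '-->'.
my ($body) = $text =~ /^<!--([^-]*)/;
print "[^-]* captured: '$body'\n";

# Writing the string inside [...] does not help: [^->]* simply excludes the
# individual characters '-' and '>', so it stops at the same place.
my ($body2) = $text =~ /^<!--([^->]*)/;
print "[^->]* captured: '$body2'\n";
```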

After extensive reading of the Perl docs on regexes
(especially in the last 2 days), I have come to the
conclusion that regular expressions have a serious
deficiency. It is serious because "not string" is a
fundamental piece of search logic, and a touted
master search engine should provide it.
To a degree it works with a known subset of
characters, but it won't work in the case shown
below. This is a serious flaw in regular expressions!

I hope you masters can prove me wrong! I really do.
If not, I hope the Perl authors can provide some
insight into when this construct could be fixed,
i.e. implemented.

Beat this code if you can (you can't). Don't look
at the code in this example; look instead at the
output.
Don't comment on any code syntax, because that's
not welcome or the point.
Instead, refer your comments to the output IDs.

If you know of a way a Perl regex can do this,
please reply. I'm almost 99% sure a Perl regex
can't do this; the remaining 1% is thrown out here
to either verify that or prove otherwise.

Thanks for your help...



print <<EOM;
\n# Serious Regular Expression deficiency,
# "not string", shown by XML comments..
# ----------------------------------------
EOM

use strict;
use warnings;

my $gabage1 = '
<big name="asdf" date="33" >
asdf
<!-- howdy folks -->
<in2>jjjj</in2>
<!-- and still more -->
asdfb
</big>
';

my $gabage2 = '
<big name="asdf" date="33" >
asdf
<!-- howdy folks %SYSTEM is down <who cares?> -->
<in2>jjjj</in2>
<!-- and still more -->
asdfb
</big>
';

my @sarrys = ($gabage1, $gabage2);
my $cnt = 1;
foreach my $xml (@sarrys) {
print "\n\n","/"x40,"\nXML $cnt:\n$xml\n";
# --- id x1: greedy (.*), which with /s may span newlines up to the last '-->'
$_ = $xml;
print "="x40,
"\n** regex: s/<!--(.*)-->//s\n",
"-"x40,"\n";
print "id: $cnt","1\n";
while (s/<!--(.*)-->//s) { print "$1\n"; }
# --- id x2: negated class [^<>]*, which cannot cross any '<' or '>' character
$_ = $xml;
print "\n","="x40,
"\n** regex: s/<!--([^<>]*)-->//s\n",
"-"x40,"\n";
print "id: $cnt","2\n";
while (s/<!--([^<>]*)-->//s) { print "$1\n"; }
# --- id x3: [\w\s]* (word chars and whitespace only), then (?!<!--) just before '-->'
$_ = $xml;
print "\n","="x40,
"\n** regex: s/<!--([\\w\\s]*)(?!<!--)-->//s\n",
"-"x40,"\n";
print "id: $cnt","3\n";
while (s/<!--([\w\s]*)(?!<!--)-->//s) { print "$1\n"; }
# --- id x4: greedy (.*) followed by (?!<!--) just before '-->'
$_ = $xml;
print "\n","="x40,
"\n** regex: s/<!--(.*)(?!<!--)-->//s\n",
"-"x40,"\n";
print "id: $cnt","4\n";
while (s/<!--(.*)(?!<!--)-->//s) { print "$1\n"; }
$cnt++;
}
__END__

C:\Drvs14\PerlMiscTest\Eraser\ESP\XMLP>perl test.pl

# Serious Regular Expression deficiency,
# "not string", shown by XML comments..
# ----------------------------------------


////////////////////////////////////////
XML 1:

<big name="asdf" date="33" >
asdf
<!-- howdy folks -->
<in2>jjjj</in2>
<!-- and still more -->
asdfb
</big>

========================================
** regex: s/<!--(.*)-->//s
----------------------------------------
id: 11
howdy folks -->
<in2>jjjj</in2>
<!-- and still more

========================================
** regex: s/<!--([^<>]*)-->//s
----------------------------------------
id: 12
howdy folks
and still more

========================================
** regex: s/<!--([\w\s]*)(?!<!--)-->//s
----------------------------------------
id: 13
howdy folks
and still more

========================================
** regex: s/<!--(.*)(?!<!--)-->//s
----------------------------------------
id: 14
howdy folks -->
<in2>jjjj</in2>
<!-- and still more


////////////////////////////////////////
XML 2:

<big name="asdf" date="33" >
asdf
<!-- howdy folks %SYSTEM is down <who cares?> -->
<in2>jjjj</in2>
<!-- and still more -->
asdfb
</big>

========================================
** regex: s/<!--(.*)-->//s
----------------------------------------
id: 21
howdy folks %SYSTEM is down <who cares?> -->
<in2>jjjj</in2>
<!-- and still more

========================================
** regex: s/<!--([^<>]*)-->//s
----------------------------------------
id: 22
and still more

========================================
** regex: s/<!--([\w\s]*)(?!<!--)-->//s
----------------------------------------
id: 23
and still more

========================================
** regex: s/<!--(.*)(?!<!--)-->//s
----------------------------------------
id: 24
howdy folks %SYSTEM is down <who cares?> -->
<in2>jjjj</in2>
<!-- and still more
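
Two further constructs could be run through the same kind of test, sketched below against the same two samples: the non-greedy quantifier .*? and a "tempered" token (?:(?!-->).)*, which applies the negative lookahead (?!-->) before every character it consumes rather than once before the closing -->. The array names and the bracketed print format are illustrative and not part of the original test.

```perl
use strict;
use warnings;

# The two sample documents from the test above (same text as $gabage1/$gabage2).
my @samples = (
'
<big name="asdf" date="33" >
asdf
<!-- howdy folks -->
<in2>jjjj</in2>
<!-- and still more -->
asdfb
</big>
',
'
<big name="asdf" date="33" >
asdf
<!-- howdy folks %SYSTEM is down <who cares?> -->
<in2>jjjj</in2>
<!-- and still more -->
asdfb
</big>
',
);

my (@lazy_all, @tempered_all);
my $cnt = 1;
foreach my $xml (@samples) {
    # Non-greedy quantifier: .*? expands one character at a time, so each
    # match ends at the first '-->' after its '<!--'.
    my @lazy = $xml =~ /<!--(.*?)-->/gs;

    # Tempered token: a character is consumed only if the string '-->'
    # does not start at that position (negative lookahead before each '.').
    my @tempered = $xml =~ /<!--((?:(?!-->).)*)-->/gs;

    # Brackets make the spaces inside each captured comment visible.
    print "XML $cnt lazy:     [$_]\n" for @lazy;
    print "XML $cnt tempered: [$_]\n" for @tempered;
    push @lazy_all, \@lazy;
    push @tempered_all, \@tempered;
    $cnt++;
}
```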

