Re: Serious Perl Regular Expression deficiency?



robic0 wrote:
> I don't see a solution to this problem that
> regular expressions can't exclude a string when
> processing. It can exclude individual characters
> fine. I started doing Perl 2 years ago and have
> run into this nagging problem several times.
>
> After extensive read on the Perl docs on re's
> (especially in the last 2 days) I have come to the
> conclusion that regular expressions have a serious
> deficiency. This is serious because the not string
> is a fundimental basic logic idea in a search from
> a touted master search engine or should be.
> To a degree it works with a known subset, but it
> won't work to the degree shown below. This is a
> serious flaw in regualar expressions!
>
> I hope you masters can prove me wrong! I really do.
> If not I would hope that the Perl authors can provide
> some insight on when this construct can be fixed,
> aka implemented.
>
> Beat this code if you can (you can't). Don't look
> at the code in this example, look instead at the
> output.
> Don't comment on any code syntax because thats not
> welcome or the point.
> Instead, refer you comments to the output ID's.
>
> If you know of a way Perl regex can do this
> please reply. I'm almost %99 sure Perl regex
> can't do this. In fact the %1 is thrown out here
> to either verify that or prove otherwise.
>

It's not clear what "this" is. Are you asking whether Perl can do a
negative match on a string, pull out XML comments with a regex, or both?

If you are wondering about a negative string match, look at the perlre
documentation, specifically the negative lookahead (?!...) and negative
lookbehind (?<!...) assertions.
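
To make the lookahead idea concrete, here is a minimal sketch. The sample
lines and variable names are made up for illustration and are not from the
original post. It shows two common uses: rejecting a line that contains a
given string, and matching text up to (but not including) a given string.

```perl
use strict;
use warnings;

# Hypothetical sample data, not from the original post.
my @lines = (
    'error: disk full',
    'warning: disk almost full',
    'error: SYSTEM is down',
);

# Keep lines that start with "error:" and do NOT contain "SYSTEM" later on.
# (?!.*SYSTEM) is a negative lookahead: it succeeds only if ".*SYSTEM"
# cannot match at this position.
my @kept = grep { /^error:(?!.*SYSTEM)/ } @lines;
print "$_\n" for @kept;

# Consume one character at a time, as long as the string "-->" does not
# start at the current position. This is the "not string" analogue of a
# negated character class such as [^-].
my $text = 'abc-->def';
my ($before) = $text =~ /^((?:(?!-->).)*)/s;
print "$before\n";
```

The second pattern is the general form of "match anything except this
string"; for simple cases a non-greedy .*? followed by the delimiter is
usually easier to read.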

If you want to pull out the contents of XML comments, you could do this:


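use strict;
use warnings;

# Print the document, then extract and print the contents of each comment.
sub test_xml_comment_parse {
    my ($xml) = @_;
    print "XML\n", '-' x 40, "\n", $xml, "\n", '-' x 40, "\n";
    # Non-greedy match; /s lets "." cross newlines. Each comment is removed
    # from the local copy of $xml, so the loop terminates.
    while ($xml =~ s/<!--(.*?)-->//s) {
        print "Comment [$1]\n";
    }
    print "\n", '-' x 40, "\n\n\n";
}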

my $gabage1 = '
<big name="asdf" date="33" >
asdf
<!-- howdy folks -->
<in2>jjjj</in2>
<!-- and still more -->
asdfb
</big>
';

my $gabage2 = '
<big name="asdf" date="33" >
asdf
<!-- howdy folks %SYSTEM is down <who cares?> -->
<in2>jjjj</in2>
<!-- and still more -->
asdfb
</big>
';

test_xml_comment_parse($_) foreach ($gabage1,$gabage2);

output:

XML
----------------------------------------

<big name="asdf" date="33" >
asdf
<!-- howdy folks -->
<in2>jjjj</in2>
<!-- and still more -->
asdfb
</big>

----------------------------------------
Comment [ howdy folks ]
Comment [ and still more ]

----------------------------------------


XML
----------------------------------------

<big name="asdf" date="33" >
asdf
<!-- howdy folks %SYSTEM is down <who cares?> -->
<in2>jjjj</in2>
<!-- and still more -->
asdfb
</big>

----------------------------------------
Comment [ howdy folks %SYSTEM is down <who cares?> ]
Comment [ and still more ]

----------------------------------------


There is a problem, though. If you need to retrieve data from XML
documents, you should generally use an XML parser instead of writing
your own regular expressions.

Here is one case where the code I posted above would pull out the text
"not really a comment", even though that text sits inside a CDATA
section and is not a comment at all.

<test_xml>
<value>
<![CDATA[ <!-- not really a comment --> ]]>
</value>
</test_xml>
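
The false positive can be reproduced with the sketch below. It also shows
one possible regex workaround: match CDATA sections first so they are
consumed whole, and only record the capture when the comment branch
matched. The extra "a real comment" line and the variable names are
additions for illustration. This is an assumption-laden sketch, not a
substitute for a parser, and it does not handle other XML constructs.

```perl
use strict;
use warnings;

# The CDATA example from above, plus one genuine comment (added here
# for illustration only).
my $xml = <<'XML';
<test_xml>
<value>
<![CDATA[ <!-- not really a comment --> ]]>
</value>
<!-- a real comment -->
</test_xml>
XML

# Naive pattern: also reports the text inside the CDATA section.
my @naive = $xml =~ /<!--(.*?)-->/gs;

# Workaround: the first alternative swallows a whole CDATA section without
# capturing; $1 is defined only when the comment alternative matched.
my @careful;
while ($xml =~ /<!\[CDATA\[.*?\]\]>|<!--(.*?)-->/gs) {
    push @careful, $1 if defined $1;
}

print "naive:   [$_]\n" for @naive;
print "careful: [$_]\n" for @careful;
```

Every such special case needs another alternative in the pattern, which is
why a real XML parser is generally the safer choice.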
