a regex question (sample code with comments provided)

From: alex mikhail (alexm_at_factory7.com)
Date: 12/03/03


Date: 2 Dec 2003 20:52:39 -0800

I've been trying to parse blurbs of text formatted with HTML tags and
make them look pretty in plain text to the human eye. Yeah, it wasn't
my idea. Are there any people out there who muck with this? Do any of
you work with sophisticated search engines? Does anyone know where
I can get insight on this?

In this case I'd like to get the equivalent of (match A and not match
B), then replace A. However, I've got several of these and I need to
do quite a bit of processing, and I regret starting this in PHP.
I'm sure I could do the necessary work with a lot of regexes and some
conditional statements, but it would get quite ugly. I also got kind
of curious about some mathematical questions.

Pardon me if I'm rambling. However, I think "regular expressions" in
the theoretical computer science sense are nothing more than string
matchers, and they can't recognize strings like (a^n)(b^n). In short,
they can't count (cf. the pumping lemma). However, Perl's regexes are
more powerful. You can do (match (a OR b)), just not
(match (a AND (not b))). Also, you can't manipulate the latter into
the former via De Morgan's laws and get it to parse (i.e. compile)
properly. The latter bugs me a little.
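
For what it's worth, a minimal sketch of the "more powerful" point: PCRE patterns with recursion can recognize (a^n)(b^n) for n >= 1, which a strictly regular expression cannot. This assumes a PCRE build recent enough to support recursive subpattern calls such as `(?1)`; the function name `is_anbn` is hypothetical.

```php
<?php
// Sketch only: recognize a^n b^n (n >= 1) with a recursive PCRE pattern.
// (?1) re-enters capture group 1, so each 'a' must be paired with a 'b'.
// is_anbn is a made-up helper name, not part of PHP.
function is_anbn($s) {
    return preg_match('/^(a(?1)?b)$/', $s) === 1;
}

var_dump(is_anbn("aaabbb")); // bool(true)
var_dump(is_anbn("aabbb"));  // bool(false)
```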

Is this right or is there a way to (match A and not match B) in one
regex? Does Perl allow you to nest subexpressions () inside character
classes []? Any suggestions or links to info on the topic, via PHP or
otherwise, would be nice. I'm finding a lack of sophisticated material
on the topic, though I haven't researched it for long.
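
One possible way to get (match A and not match B) in a single pattern is a negative lookahead, `(?!...)`, assuming the PCRE build behind `preg_replace` supports lookahead assertions. The sketch below takes A to be "any HTML tag" and B to be "a br tag"; the helper name `strip_tags_except_br` and the sample input are made up for illustration.

```php
<?php
// Sketch only: remove every tag (A) unless it is a <br> tag (B).
// strip_tags_except_br is a hypothetical helper name.
function strip_tags_except_br($html) {
    // <         start of a tag
    // (?!br\b)  negative lookahead: refuse to match if the tag name is "br"
    // [^>]*>    the rest of the tag, up to and including '>'
    return preg_replace('/<(?!br\b)[^>]*>/i', '', $html);
}

echo strip_tags_except_br("<p>Hello<br><b>world</b></p>"), "\n";
// prints: Hello<br>world
```

The lookahead consumes no characters, so it acts as the "and not B" condition on whatever the rest of the pattern (A) goes on to match.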

I created a toy example below:

<html>
<body>

<?php
////////////////////////////////
//Alex Mikhail
//
//12/2/2003
//
//I'm attempting to test the limits of regex to solve an ugly parsing problem
//what can perl regex do?
//KEYWORDS: regex perl regular expression php preg_match preg_replace
// Warning: Compilation failed: unmatched parentheses at offset
//
//php 4.2.2
////////////////////////////////

$test="abcdefg";
echo("before test A or B: ".$test."<br>\n");

//regex A:
//$regex='/[^([^a])]/';
//this regex results in an error: php or PCRE doesn't like subexpressions
//(i.e. parentheses) inside character classes (i.e. brackets)

//results in Error
//$result=preg_replace($regex,"",$test);

//echo("after test A: ".$result."<br>\n");

//same as [^a] what kind of algebraic property is this??
$regex='/([^a])([^a])\1/';
//NOTE: the result is: abcdefg before and after test (like an identity)
//i.e. it's not demorgan's law and it's not recursive

//result is "abcdefg" (i.e. no change)
$result=preg_replace($regex,"",$test);

echo("after test B: ".$result."<br>\n");

?>
</body>
</html>
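
A short sketch of what the two patterns above appear to do under standard PCRE rules. Inside a character class, parentheses are literal characters rather than grouping, so the class in regex A ends at the first `]` and the following `)` is left unmatched, which would explain the compile warning. The pattern in test B matches three characters x y x where neither x nor y is "a". The variable names and the extra test strings are made up for illustration.

```php
<?php
// Inside [...] the characters ( and ) are literals, so this class means
// "any one character except '(', ')' or 'a'".
$r1 = preg_replace('/[^()a]/', '', "abc(def)g");
echo $r1, "\n"; // a()

// ([^a])([^a])\1 needs a run x y x with x, y != 'a'.
// "abcdefg" contains no such run, so the string comes back unchanged.
$r2 = preg_replace('/([^a])([^a])\1/', '', "abcdefg");
echo $r2, "\n"; // abcdefg

// "abcbdefg" contains "bcb", which is removed.
$r3 = preg_replace('/([^a])([^a])\1/', '', "abcbdefg");
echo $r3, "\n"; // adefg
```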


