RE: [PHP] Need help with RegEx





At 08:29 AM 12/11/2006 , Brad Fuller wrote:

The example provided didn't work for me. It gave me the same string without
anything modified.

You are absolutely correct; this is what I get for not testing it explicitly :( My most sincere apologies to the OP and the list. There is an error in my example (see below for the correction).

**** I have cut and pasted from further down in the quoted message, for convenience ****
Using the tags you describe here, and assuming the source html is in the
variable $source_html, try this:

$trans_text = preg_replace("/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)^/s","$3",$source_html);

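The ^ should not be included: it is the start-of-string anchor, not the end-of-string symbol (that is $), so placed at the end of the pattern it keeps the pattern from ever matching. I tested the above function without the ^ and it worked for me. Below is the TESTED version: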

$trans_text = preg_replace("/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)/s","$3",$source_html);
***** end of pasted section *****



I am also looking for this solution to strip out text from some XML response
I get from posting data to a remote server. I can do it using substring
functions but I'd like something more compact and portable. (A one-liner
that I could modify for other uses as well)

Example 1:
<someXMLtags>
<status>16664 Rejected: Invalid LTV</status>
</someXMLtags>

Example 2:
<someXMLtags>
<status>Unable to Post, Invalid Information</status>
</someXMLtags>

I want what is inside the <status> tags.

Does anyone have a working solution for getting the text from inside
these tags using regex?

Much appreciated,

B
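
For the <status> case above, a capture group with preg_match() may be enough. What follows is only a minimal sketch, not something posted in this thread: the helper name extract_tag_text() is hypothetical, and it assumes the tag is not nested inside another tag of the same name. For anything beyond simple responses, a real XML parser is probably the safer choice.

```php
<?php
// Hypothetical helper (not from the thread): returns the text inside the
// first <tag>...</tag> pair, or null when the tag is not found.
function extract_tag_text($xml, $tag)
{
    // preg_quote() escapes regex metacharacters in the tag name.
    // '#' is the delimiter, so the '/' in the closing tag needs no escaping.
    $name = preg_quote($tag, '#');

    // (?:\s[^>]*)? tolerates attributes on the opening tag,
    // (.*?) is non-greedy, and the s modifier lets . match newlines.
    $pattern = '#<' . $name . '(?:\s[^>]*)?>(.*?)</' . $name . '>#s';

    if (preg_match($pattern, $xml, $matches)) {
        return trim($matches[1]);
    }
    return null;
}

$response = "<someXMLtags>\n<status>16664 Rejected: Invalid LTV</status>\n</someXMLtags>";
echo extract_tag_text($response, 'status'), "\n"; // 16664 Rejected: Invalid LTV
```

Passing a different tag name as the second argument reuses the same one-liner pattern for other tags.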

-----Original Message-----
From: Michael [mailto:michael@xxxxxxxxxxxxxx]
Sent: Monday, December 11, 2006 6:59 AM
To: Anthony Papillion
Cc: php-general@xxxxxxxxxxxxx
Subject: Re: [PHP] Need help with RegEx

At 01:02 AM 12/11/2006 , Anthony Papillion wrote:
Hello Everyone,

I am having a bit of a problem wrapping my head around regular
expressions. I thought I had a good grip on them but, for some reason,
the expression I've created below simply doesn't work! Basically, I need
to retrieve all of the text between two unique and specific tags but I
don't need the tag text. So let's say that the tag is

<tag lang='ttt'>THIS IS A TEST</tag>

I would need to retrieve THIS IS A TEST only and nothing else.

Now, a bit more information: I am using cURL to retrieve the entire
contents of a webpage into a variable. I am then trying to perform the
following regular expression on the retrieved text:

$trans_text = preg_match("\/<div id=result_box dir=ltr>(.+?)<\/div>/");

Using the tags you describe here, and assuming the source html is in the
variable $source_html, try this:

$trans_text = preg_replace("/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)^/s","$3",$source_html);

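The ^ should not be included: it is the start-of-string anchor, not the end-of-string symbol (that is $), so placed at the end of the pattern it keeps the pattern from ever matching. I tested the above function without the ^ and it worked for me. Below is the TESTED version: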

$trans_text = preg_replace("/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)/s","$3",$source_html);


how this breaks down is:

opening quote for first parameter (your MATCH pattern).

open regex match pattern= /

first atom (.*?) = any or no leading text before
<div id=result_box dir=ltr>; the ? makes it non-greedy so that it stops
after finding the first match.

second atom (<div id=result_box dir=ltr>) = the opening tag you are looking for.

third atom (.*?) = the text you want to strip out, all text even if
nothing is there, between the 2nd and 4th atoms.

fourth atom (<\/div>) = the closing tag of the div tag pair.

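fifth atom (.*?) = the rest of the source html after the closing tag,
even if there is nothing there. Note that because this atom is non-greedy
and nothing follows it in the corrected pattern, it matches an empty
string, so any html after the closing tag is left in the result; use (.*)
here if that text should be consumed as well.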

close regex match pattern= /s

In order for this to work on html that may contain newlines, you must
specify that the . can represent newline characters. This is done by
adding the letter 's' after your regex closing /, so the last thing in
your regex match pattern would be /s.

end of string ^ (this matches the end of the string you are
matching/replacing, $source_html)

Ignore this part of the explanation: the ^ is not needed and in fact breaks the example given (^ actually anchors the start of the string, not the end).


closing quote for first parameter.
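
The pieces described in the breakdown above (non-greedy matching, the s modifier, and the anchors) can be tried in isolation. This is only an illustrative sketch; the sample string in $text is made up and is not from the original message.

```php
<?php
// Made-up sample with two tag pairs separated by a newline.
$text = "<b>one</b>\n<b>two</b>";

// Greedy .* runs to the LAST </b>; non-greedy .*? stops at the first one.
preg_match('/<b>(.*)<\/b>/s', $text, $greedy);
preg_match('/<b>(.*?)<\/b>/s', $text, $lazy);
echo $lazy[1], "\n"; // one

// Without the s modifier, . does not match "\n", so a match cannot
// span the line break between "one" and "two".
var_dump(preg_match('/one.*two/', $text));  // int(0)
var_dump(preg_match('/one.*two/s', $text)); // int(1)

// ^ anchors the START of the subject and $ the END (no m modifier here).
var_dump(preg_match('/^<b>/', $text));   // int(1)
var_dump(preg_match('/<\/b>$/', $text)); // int(1)
var_dump(preg_match('/<\/b>^/', $text)); // int(0): text cannot precede the start
```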

The second parameter of the preg_replace is the replacement: a reference
to the atom # which contains the text you want to put in place of the
text matched by the regex match pattern in the first parameter. In this
case the text we want is in the third atom, so this parameter would be $3
(this is the PHP way of back-referencing; if we wanted the text before
the tag we would use atom 1, or $1; if we wanted the tag itself we would
use $2, etc. Basically a $ followed by the atom # that holds what we want
to carry over from $source_html into $trans_text).

The third parameter of the preg_replace is the source you wish to match
and replace from, in this case your source html in $source_html.

After this executes, $trans_text should contain the innerText of the
<div id=result_box dir=ltr></div> tag pair from $source_html. If there is
nothing between the opening and closing tags, $trans_text will == ""; if
there is only a newline between the tags, $trans_text will == "\n".
IMPORTANT: if the text between the tags contains a newline, $trans_text
will also contain that newline character because we told . to match
newlines.
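
To see what ends up in $trans_text, here is a small sketch run against a made-up page (the markup around the div is an assumption for illustration only). It compares the pattern as given with two possible variants that are not from the original message: one anchored at both ends, and one using preg_match() with a single capture group.

```php
<?php
// Sample page; everything around the div is made up for illustration.
$source_html = "<html><body><p>intro</p>\n"
    . "<div id=result_box dir=ltr>THIS IS A TEST</div>\n"
    . "<p>footer</p></body></html>";

// Pattern as given: the trailing non-greedy (.*?) matches nothing,
// so everything after </div> stays in the result.
$partial = preg_replace(
    "/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)/s",
    "$3",
    $source_html
);

// Variant 1: anchor both ends so the whole page is replaced by group 1.
$anchored = preg_replace(
    '/^.*?<div id=result_box dir=ltr>(.*?)<\/div>.*$/s',
    '$1',
    $source_html
);

// Variant 2: preg_match() fills $m; $m[1] holds the captured text.
$matched = preg_match('/<div id=result_box dir=ltr>(.*?)<\/div>/s', $source_html, $m)
    ? $m[1]
    : null;

echo $anchored, "\n"; // THIS IS A TEST
echo $matched, "\n";  // THIS IS A TEST
```

With this sample, $partial still carries the footer markup that follows the closing tag, while the two variants return only the text between the tags.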

I am no regex expert by far, but this worked for me (assuming I copied it
correctly here heh).
There are doubtless many other ways to do this, and I am sure others on
the list here will correct me if my way is wrong or inefficient.

I hope this works for you and that I haven't horribly embarrassed myself
here.
Good luck :)


The problem is that when I echo the value of $trans_text variable, I end
up with the entire HTML of the page.

Can anyone clue me in to what I am doing wrong?

Thanks,
Anthony

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

.



Relevant Pages

  • RE: [PHP] Need help with RegEx
    ... I want what is inside the tags. ... fifth atom = all of the rest of the source html after the closing ... can represent newline characters, this is done by adding the ...
    (php.general)
  • RE: [PHP] Need help with RegEx
    ... Download and play with "The Regex Coach" ... I want what is inside the tags. ... fifth atom = all of the rest of the source html after the ...
    (php.general)
  • Re: S-expression I/O in Ada
    ... complexity ... Serialization isn't only about access values. ... into an atom by providing a pair of To_String, From_String operations, ... except I wouldn't use String as an intermediate value, ...
    (comp.lang.ada)
  • Re: Please help:need to read XML, edit, and write back to XML file
    ... I copied Astrid's code into a VBA module and ran it, ... give me any of the tags. ... string, as the scripting code gave me. ... reading, writing or appending. ...
    (microsoft.public.word.vba.general)
  • Re: Please help:need to read XML, edit, and write back to XML file
    ... And it's perfect - it puts everything in a string with all the tags, ... of the code can be used in a VBA routine. ... the information in the various forms and generates an XML file. ...
    (microsoft.public.word.vba.general)