Re: how to extract part of HTML page



Jerry Stuckle wrote:
Ecka wrote:
Hi everyone,

I'm trying to write a PHP script that connects to a bank's currency converter page using cURL, and that part works fine. The issue is that I end up with a page that includes a lot of information I don't need. Using the PHP function strip_tags, I've ended up with the text below, and from the remaining HTML code I'd like to extract the lines starting with "<TABLE BORDER="1" WIDTH="315">" up to its closing </TABLE> tag. How do I do this in PHP? I tried using preg_match and the like, but my regex skills are pretty bad and I'm not sure where to start. Could someone please give me some pointers?


=========================================================================================

<TABLE BORDER="0" WIDTH="600">
<tr>
<td width="148"></td>
<td width="448">some text some text some text some text some text</td>
</tr>
</TABLE>

<TABLE BORDER="0" WIDTH="600">
<TR><TD VALIGN="top" WIDTH="148">
</TD>
<TD WIDTH="448" VALIGN="top">
<TABLE BORDER="0" WIDTH="448">
<TR><TD>
some text some text some text some text some text some text some text some text some text some text.
some text some text some text some text some text
</TD></TR>
<TR><TD>
<TABLE BORDER="1" WIDTH="315"> <----- extract from here

<TR><TD>
some text some text some text some text some text
</TD>
<TD ALIGN="right">
some text some text some text some text some text
</TD></TR>
<TR><TD>
some text some text some text some text some text
</TD>

<TD ALIGN="right">
some text some text some text some text some text
</TD></TR>
<TR><TD>
some text some text some text some text some text
</TD>
<TD ALIGN="right">
some text some text some text some text some text
</TD></TR>
</TABLE> <--------- to here
</TD></TR>
<TR><TD>
a {
color:blue;
}
some text some text some text some text some text
some text some text some text some text some text some text some text some text some text some text
some text some text some text some text some text some text some text some text some text some text
</TD></TR>
<TR><TD>
some text some text some text some text some text
some text some text some text some text some text some text some text some text some text some text some text some
</TD></TR>
</TABLE>
</TD></TR>
</TABLE>
<br>
=========================================================================================



Thanks
Eric



Hmmm, to me a regex seems like overkill here. There's a lot of overhead with regexes.

How about something like:

$start = strpos($rawdata, '<TABLE BORDER="1" WIDTH="315">');
if ($start === false)
    echo 'Start not found';
else {
    $stop = strpos($text, '</TABLE>', $start);
    if ($stop === false)
        echo 'End not found';
    else {
        $text = substr($rawdata, $start + 31, $stop - $start - 31);
}

It's longer, but should have less overhead than a regex.
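
For comparison with the preg_match route mentioned in the question, here is a minimal sketch of what a regex version might look like. The sample $rawdata string is a made-up stand-in for the fetched page, not the real bank output. The lazy (.*?) stops at the first </TABLE> it finds, so this assumes the target table contains no nested tables.

```php
<?php
// Hypothetical sample input standing in for the page fetched with cURL.
$rawdata = '<TABLE BORDER="0" WIDTH="600"><TR><TD>'
         . '<TABLE BORDER="1" WIDTH="315"><TR><TD>some text</TD>'
         . '<TD ALIGN="right">more text</TD></TR></TABLE>'
         . '</TD></TR></TABLE>';

// preg_quote() escapes regex metacharacters in the literal opening tag;
// the second argument also escapes the "~" delimiter if it appears.
$open = preg_quote('<TABLE BORDER="1" WIDTH="315">', '~');

// (.*?) is lazy, so it stops at the first </TABLE>.
// "s" lets "." match newlines; "i" makes the match case-insensitive.
$pattern = '~' . $open . '(.*?)</TABLE>~si';

$text = '';
if (preg_match($pattern, $rawdata, $matches) === 1) {
    // $matches[1] holds the captured content between the two tags.
    $text = $matches[1];
} else {
    echo 'Table not found';
}
```

Whether the regex overhead matters for a single page fetch is something to measure rather than assume; the pattern is arguably harder to read than the strpos calls either way.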


Jerry, I couldn't get your code to work as posted, so I had to make a few changes:

$text = '';
$needle = '<TABLE BORDER="1" WIDTH="315">';
$start = strpos($rawdata, $needle);

if ($start === false)
    echo 'Start not found';
else {
    // first argument is $rawdata, not $text
    $stop = strpos($rawdata, '</TABLE>', $start);

    if ($stop === false)
        echo 'End not found';

    // removed extra { with no matching end }
    else
        $text = substr($rawdata, $start + strlen($needle), $stop - $start - strlen($needle));
}

Using strlen($needle) is clearer than hard-coding the needle's length, and it avoids an off-by-one: the needle is 30 characters long, not 31.
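
If the same extraction is needed in more than one place, the strpos/substr logic could be wrapped in a small function. The sketch below is one way to do that; extractBetween, its parameters, and the sample $rawdata are hypothetical names and data for illustration, and the type declarations assume PHP 7.1 or later. Like the code above, it is case-sensitive and stops at the first closing marker, so it does not handle tables nested inside the target table.

```php
<?php
/**
 * Return the text between the first $open marker and the next $close marker,
 * or null if either marker is missing.
 * Hypothetical helper: the name and signature are not from the thread.
 */
function extractBetween(string $haystack, string $open, string $close, bool $includeMarkers = false): ?string
{
    $start = strpos($haystack, $open);
    if ($start === false) {
        return null; // opening marker not found
    }

    // Begin searching for the closing marker just past the opening one.
    $contentStart = $start + strlen($open);
    $stop = strpos($haystack, $close, $contentStart);
    if ($stop === false) {
        return null; // closing marker not found
    }

    if ($includeMarkers) {
        // Keep the opening and closing markers in the result.
        return substr($haystack, $start, $stop + strlen($close) - $start);
    }
    return substr($haystack, $contentStart, $stop - $contentStart);
}

// Made-up sample input standing in for the fetched page.
$rawdata = '<TABLE BORDER="0" WIDTH="600"><TR><TD>'
         . '<TABLE BORDER="1" WIDTH="315"><TR><TD>some text</TD></TR></TABLE>'
         . '</TD></TR></TABLE>';

$needle = '<TABLE BORDER="1" WIDTH="315">';
$inner = extractBetween($rawdata, $needle, '</TABLE>');       // rows only
$whole = extractBetween($rawdata, $needle, '</TABLE>', true); // tags included
```

Returning null instead of echoing leaves the caller to decide how to report a missing table. If the page's tag case is not reliable, stripos could be substituted for strpos.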

--
Curtis


