HTML::Tree help
From: Ing. Branislav Gerzo (konfera_at_2ge.us)
Date: 11/30/04
- Next message: Randal L. Schwartz: "Re: sorting with Swartz Transform"
- Previous message: Jonathan Paton: "Re: Search Tab-delimited file for Null fields"
- Next in thread: Jonathan Paton: "Re: HTML::Tree help"
- Reply: Jonathan Paton: "Re: HTML::Tree help"
- Reply: Randy W. Sims: "Re: HTML::Tree help"
Date: Tue, 30 Nov 2004 16:22:42 +0100
To: beginners@perl.org
Hi all,
I have to parse several thousand HTML files, so I'd like to use an
HTML parser rather than my own regexes. The HTML I am parsing is quite
complex, so I need your help. First of all, is HTML::Tree a good and
fast module?
I ask because I am not sure whether searching for elements by some
criteria, for example
if( my $h = $tree->look_down('_tag', 'sometag') ) { }
is slow.
When I dumped the tree with Data::Dumper, a 300 KB HTML file produced
13 MB of dump output...
OK, now to the problem. The HTML looks like this:
<table width="600%" border="3" align="center" cellspacing="2" cellpadding="2" bgcolor='#eeffff'>
<tr>
<td align="left" valign="top" width="20%"> <span class="tl">TEST: </span></td>
<td align="left" width="80%"><table width="100%" border="0">
<tr>
<td width="67%"> <span class='ra'> Vysoká </span> <span class='ra'> 9 </span><br> <span class='ra'> Bratislava </span> <span class='ra'> 810 00 </span><br></td>
<td width="33%" valign='top'> <span class='ra'>something</span></td>
</tr>
</table><table width="100%" border="0">
<tr>
<td width="67%"> <span class='ro'> Nám. SNP </span> <span class='ro'> 15 </span><br> <span class='ro'> Bratislava </span> <span class='ro'> 810 00 </span><br></td>
<td width="33%" valign='top'> <span class='ro'>something</span></td>
</tr>
</table><table width="100%" border="0">
<tr>
<td width="67%"> <span class='ro'> Bratislava </span><br></td>
<td width="33%" valign='top'> <span class='ro'>something</span></td>
</tr>
</table></td>
</tr>
</table>
(I hope it displays OK; if not, see http://www.2ge.us/perl/html.txt ).
Nearly the whole HTML is full of tables of this kind. How do I extract
the values from them? I need to check whether there is a span with
class = "tl" whose text matches /TEST:/i and, if so, get all the values
up to the end of the whole table. Would someone be so kind as to give
me some help?
Hint: in the table there is always one class='ra' and, optionally, zero
or more class='ro'.
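To make the requirement concrete, here is a rough sketch of the kind of extraction I mean. It is not a tested solution: it assumes HTML::TreeBuilder (part of HTML::Tree), the sample is a shortened ASCII-only copy of the table above, extract_values is a hypothetical helper name, and it assumes "the whole table" means the nearest table enclosing the "tl" span.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Shortened, ASCII-only copy of the table from this message (assumption:
# the real files follow the same nesting).
my $html = <<'HTML';
<table width="600%" border="3">
<tr>
<td><span class="tl">TEST: </span></td>
<td><table width="100%" border="0">
<tr>
<td> <span class='ra'> Bratislava </span> <span class='ra'> 810 00 </span><br></td>
<td> <span class='ra'>something</span></td>
</tr>
</table><table width="100%" border="0">
<tr>
<td> <span class='ro'> Bratislava </span><br></td>
<td> <span class='ro'>something</span></td>
</tr>
</table></td>
</tr>
</table>
HTML

# Hypothetical helper: returns the trimmed text of every 'ra'/'ro' span
# found in a table whose 'tl' label matches /TEST:/i.
sub extract_values {
    my ($content) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($content);
    my @values;

    for my $label ( $tree->look_down( _tag => 'span', class => 'tl' ) ) {
        next unless $label->as_text =~ /TEST:/i;

        # Nearest enclosing table of the label (assumed to be "the whole table").
        my $table = $label->look_up( _tag => 'table' ) or next;

        # Collect the value spans in document order.
        push @values, map { $_->as_trimmed_text }
            $table->look_down( _tag => 'span', class => qr/^r[ao]$/ );
    }

    $tree->delete;    # free the tree explicitly; it contains circular references
    return @values;
}

my @values = extract_values($html);
print "$_\n" for @values;
```

Whether look_down is fast enough for a few thousand files of this size is exactly what I am unsure about, so treat this only as an illustration of the logic.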
thanks for any help!