Extracting nested tables from HTML

From: Terry (just_at_say.no)
Date: Fri, 31 Dec 2004 10:05:50 GMT

Hi!

I have several very large HTML files from which I'd like to extract only
tables nested at the deepest level. I thought this would be quite easy
by extracting something like (<table.*?</table>) where I'd alter the
'.*?' to test for and reject any new occurrences of a starting table
tag, but I can't seem to get it. Any pointers?
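
A minimal sketch of that idea, written in Python only so it can be run and checked quickly. The '.*?' is replaced by a group that consumes one character at a time and uses a negative lookahead to refuse to step onto a new opening table tag, so only tables that contain no other table can match. The names INNERMOST_TABLE and extract_innermost_tables are made up for this illustration. The sketch assumes the files hold no "<table" text inside comments, scripts or attribute values, since a text-level match cannot tell those apart from real tags.

```python
import re

# Hypothetical name, for illustration only.
# <table\b[^>]*>        an opening table tag with any attributes
# (?:(?!<table\b).)*?   any characters, as long as none of them starts
#                       another opening table tag
# </table\s*>           the closing tag
INNERMOST_TABLE = re.compile(
    r"<table\b[^>]*>(?:(?!<table\b).)*?</table\s*>",
    re.IGNORECASE | re.DOTALL,  # tags in any case; '.' also matches newlines
)


def extract_innermost_tables(html):
    """Return the text of every table that has no table nested inside it."""
    return [m.group(0) for m in INNERMOST_TABLE.finditer(html)]


if __name__ == "__main__":
    sample = (
        "<table id=a><tr><td>"
        "<table id=b><tr><td>x</td></tr></table>"
        "</td></tr></table>"
    )
    for table in extract_innermost_tables(sample):
        print(table)
```

An outer table fails to match because the lookahead stops at the inner opening tag before any closing tag is reached, so the search moves on and picks up the inner table instead. The pattern uses no Python-specific syntax, so it should carry over to a Perl match with the /gis modifiers, after which each extracted string can be handed to HTML::TableContentParser as planned.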

I want to deal with the file at a text level until the tables are
extracted, after which I plan to use HTML::TableContentParser to extract
the needed content.

Thanks for your help.

Terry.


