grab html table[s] from html to files

From: Ronald Rood (devnull_at_ronr.nl)
Date: 10/06/03


Date: 6 Oct 2003 04:59:31 -0700

Hi,

I have some generated HTML files with tables in them, and I need to
convert each table to its own csv file. It should be possible with a
few well-aimed regexps; the question is how?
What I tried was stripping everything around the tables and then
looping through the tables until done. The HTML is written with
mixed-case tAblE tags, and I have only a very basic tcl installation
available. It is in fact the oratclsh that comes with oracle 9.2 ...

This filters out all HTML tags, which is handy for the last part:
  regsub -all {<[^>]*>} $html {} text_only

How do I start with the first part?
Something like: regsub -all {.*(<TABLE.*/TABLE).*} $html {<\1>} tables
It just does not do what I want :-(
TABLE can also be table or TaBlE, and there is another error in it.

Any hint/tip is very welcome,
Ronald.
-----------------------
http://ronr.nl/unix-dba
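
One possible approach, offered as a sketch rather than a definitive answer: the attempt above probably fails because the leading `.*` is greedy (so only the last table survives), the pattern is case-sensitive, and `{<\1>}` puts a second `<` in front of the captured `<TABLE`. Instead of capturing, the sketch below marks the table, row and cell boundaries with control characters using `regsub -nocase` and then uses `split`, which should only need very basic regexp support. The proc names `extract_tables`, `table_to_csv` and `tables_to_files` are made up for illustration. It assumes that tables are not nested, that every row and cell has an explicit closing tag, and that the HTML contains no \001 or \002 bytes; HTML entities such as `&amp;` are left undecoded.

```tcl
# Hypothetical helper: return a list holding the inner HTML of each table.
# Assumes tables are not nested and the input has no \001 or \002 bytes.
proc extract_tables {html} {
    # -nocase matches <TABLE>, <table>, <tAblE>, with or without attributes.
    regsub -all -nocase {<table[^>]*>} $html "\001" tmp
    regsub -all -nocase {</table>} $tmp "\002" tmp
    set tables {}
    # The chunk before the first \001 marker is not part of any table.
    foreach chunk [lrange [split $tmp "\001"] 1 end] {
        # Keep only the text up to the closing marker.
        lappend tables [lindex [split $chunk "\002"] 0]
    }
    return $tables
}

# Hypothetical helper: turn one table's inner HTML into CSV text.
# Assumes every row ends with </tr> and every cell with </td> or </th>.
proc table_to_csv {table} {
    regsub -all -nocase {</tr>} $table "\001" tmp
    regsub -all -nocase {</t[dh]>} $tmp "\002" tmp
    set lines {}
    foreach row [split $tmp "\001"] {
        set fields [split $row "\002"]
        set cells {}
        # The piece after the last cell end is leftover markup; skip it.
        foreach cell [lrange $fields 0 [expr {[llength $fields] - 2}]] {
            regsub -all {<[^>]*>} $cell {} text        ;# strip remaining tags
            regsub -all "\[ \t\r\n\]+" $text { } text  ;# collapse whitespace
            set text [string trim $text]
            regsub -all {"} $text {""} text            ;# double embedded quotes
            lappend cells "\"$text\""
        }
        if {[llength $cells] > 0} {
            lappend lines [join $cells ,]
        }
    }
    return [join $lines "\n"]
}

# Hypothetical helper: write table N to <prefix>N.csv, return the count.
proc tables_to_files {html prefix} {
    set n 0
    foreach table [extract_tables $html] {
        incr n
        set out [open "$prefix$n.csv" w]
        puts $out [table_to_csv $table]
        close $out
    }
    return $n
}
```

To use it, read the generated file into a variable with `open`, `read` and `close`, then call `tables_to_files $html table` to get table1.csv, table2.csv and so on in the current directory. Every field is quoted, which keeps commas inside cells from breaking the csv.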


