grab html table[s] from html to files
From: Ronald Rood (devnull_at_ronr.nl)
Date: 10/06/03
Date: 6 Oct 2003 04:59:31 -0700
Hi,
I have some generated HTML files that contain tables, and I need to
convert each table to its own CSV file. It should be possible with a few
well-aimed regexps; the question is how?
What I tried was stripping everything around the tables and then looping
through the tables until done. The HTML is written with mixed-case
tAblE tags, and I have only a very basic Tcl installation available. It
is in fact the oratclsh that comes with Oracle 9.2 ...
This filters out all HTML tags, which is nice for the last part:
regsub -all {<[^>]*>} $html {} text_only
How do I start with the first part?
Something like: regsub -all {.*(<TABLE.*/TABLE).*} $html {<\1>} tables
It just does not do what I want :-(
TABLE can also be table or TaBlE, and there is another error in it.
Any hint or tip is very welcome,
Ronald.
-----------------------
http://ronr.nl/unix-dba
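
A minimal sketch of one possible approach to the question above, not a tested answer for oratclsh. It assumes Tcl 8.3 or later (non-greedy quantifiers and `regexp -all -inline`), which the oratclsh shipped with Oracle 9.2 may not provide. The attempted pattern most likely fails because its `.*` parts are greedy, so only the last table survives, and because it is case sensitive. The proc names `extract_tables`, `table_to_csv` and `write_tables` are made up for illustration. Nested tables, omitted closing tags and HTML entities are not handled.

```tcl
# Return a list of all <table>...</table> blocks in $html.
# -nocase covers TABLE, table and TaBlE. Every quantifier is non-greedy
# (*?), because in Tcl the first quantifier decides the preference of the
# whole expression; a greedy [^>]* in front would make .*? greedy as well.
proc extract_tables {html} {
    return [regexp -all -inline -nocase {<table[^>]*?>.*?</table>} $html]
}

# Convert one table block to CSV text, one line per <tr> row.
proc table_to_csv {table} {
    set lines {}
    # -inline returns pairs: the full match and the captured row body.
    foreach {row_match row_body} \
            [regexp -all -inline -nocase {<tr[^>]*?>(.*?)</tr>} $table] {
        set fields {}
        foreach {cell_match cell_body} \
                [regexp -all -inline -nocase {<t[dh][^>]*?>(.*?)</t[dh]>} $row_body] {
            # Strip any remaining tags inside the cell (same regsub as in the post).
            regsub -all {<[^>]*>} $cell_body {} text
            set text [string trim $text]
            # Double embedded quotes, then wrap the field in quotes.
            regsub -all {"} $text {""} text
            lappend fields "\"$text\""
        }
        lappend lines [join $fields ,]
    }
    return [join $lines \n]
}

# Write each table to its own file: ${prefix}1.csv, ${prefix}2.csv, ...
# Returns the number of files written.
proc write_tables {html prefix} {
    set n 0
    foreach table [extract_tables $html] {
        incr n
        set out [open "$prefix$n.csv" w]
        puts $out [table_to_csv $table]
        close $out
    }
    return $n
}
```

With the page contents read into a variable `html`, calling `write_tables $html table` would write table1.csv, table2.csv and so on into the current directory.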