Re: What kind of tcl tools would help me parse and use html info?



Larry W. Virden wrote:

I know others have replied, but...

I have a need to write a tool to do this:

fetch an html http URL

Use the http package.
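A minimal sketch of that fetch step, assuming Tcl 8.5 or later (for dict). The proc name fetchPage, the 10 second timeout and the example URL are made up for illustration; only the ::http::* calls come from the http package itself.

```tcl
package require http

# Hypothetical helper (not from the original post): fetch one URL and
# return a dict holding the transfer status, HTTP code, headers and body.
proc fetchPage {url} {
    # 10 s timeout is an arbitrary choice for this sketch.
    set tok [::http::geturl $url -timeout 10000]
    set page [dict create \
        status [::http::status $tok] \
        code   [::http::ncode  $tok] \
        meta   [::http::meta   $tok] \
        body   [::http::data   $tok]]
    # Release the state array that geturl allocated for this request.
    ::http::cleanup $tok
    return $page
}

# Usage (needs network access, so it is left commented out):
# set page [fetchPage http://www.example.com/index.html]
# puts [dict get $page code]
```

The meta entry is a key/value list of the response headers, which is where a Last-Modified value would be found if the server sends one. Header name capitalisation can vary between servers, so a case-insensitive lookup is safer.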

parse the html

I'd use the htmlparse package from tcllib with the -cmd option.

Look through the A tags for some specific phrases

The routine you pass to ::htmlparse::parse via the -cmd option is called for every tag; just check whether the tag is an A.
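Something along these lines might work as the callback. It is only a sketch: the proc name collectLinks, the global links list and the sample HTML are invented for the example, and the href regexp assumes double-quoted attribute values. The callback receives the tag name, a slash for closing tags, the raw attribute text, and the text that follows the tag.

```tcl
package require htmlparse

# Hypothetical callback: remember {url text} for every opening <a> tag
# whose following text contains the given phrase (case-insensitive).
proc collectLinks {phrase tag slash param text} {
    global links
    # Skip closing tags and anything that is not an anchor; the tag
    # name is compared without regard to case.
    if {$slash ne "" || ![string equal -nocase $tag a]} return
    # Assumption: href values are double-quoted.
    if {![regexp -nocase {href\s*=\s*"([^"]*)"} $param -> url]} return
    if {[string first [string tolower $phrase] [string tolower $text]] >= 0} {
        lappend links [list $url [string trim $text]]
    }
}

set links {}
set html {<html><body><a href="next.html">Next page</a> <a href="other.html">Other</a></body></html>}
# The -cmd value is a command prefix; the four tag arguments are appended.
::htmlparse::parse -cmd [list collectLinks "next"] $html
puts $links
```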

For each one found, check a file cache. If the URL associated with the
tag is in the cache, see if it has been modified since it was placed
into the cache.

Use file mtime, clock scan and string equal.
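One way those three commands could fit together, as a sketch. It assumes Tcl 8.5 or later (for clock scan -format), a Last-Modified header in the usual "Sun, 06 Nov 1994 08:49:37 GMT" form, and a cached copy whose mtime was set from that header when it was stored. The proc name needsRefresh is hypothetical.

```tcl
# Hypothetical helper: return 1 when the cached copy is missing or its
# mtime differs from the server's Last-Modified value, otherwise 0.
proc needsRefresh {cacheFile lastModified} {
    if {![file exists $cacheFile]} {
        return 1
    }
    # Convert the HTTP date string to seconds since the epoch (UTC).
    set webTime [clock scan $lastModified \
        -format {%a, %d %b %Y %H:%M:%S GMT} -gmt 1]
    # file mtime with one argument returns the stored time in seconds.
    return [expr {![string equal [file mtime $cacheFile] $webTime]}]
}
```

Comparing for equality rather than "newer than" matches the string equal suggestion: any difference between the two stamps triggers a refetch.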

If not, continue.
If it has been modified, or if it doesn't exist in the cache, then
fetch the URL,

Again, use the http package.

place into the cache, and touch to make the cache copy
have the date and time from the web site.

file mtime $filename $webDateTime
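Note that file mtime wants the time as integer seconds, so the header string has to go through clock scan first. A rough sketch, assuming Tcl 8.5 or later and the usual HTTP date layout; storeInCache and its arguments are names invented here.

```tcl
# Hypothetical helper: write the fetched body to the cache file and
# stamp the file with the web site's modification time.
proc storeInCache {filename body lastModified} {
    set fh [open $filename w]
    # Binary translation so the bytes are written unchanged.
    fconfigure $fh -translation binary
    puts -nonewline $fh $body
    close $fh
    # Header string -> seconds since the epoch (UTC).
    set webDateTime [clock scan $lastModified \
        -format {%a, %d %b %Y %H:%M:%S GMT} -gmt 1]
    # The two-argument form of file mtime sets the modification time.
    file mtime $filename $webDateTime
    return $webDateTime
}
```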

For one of the specific phrases, instead of caching the file, treat it
as the next html to parse and search.

Put the above in a proc and recursively call it.

When one specific term is no longer found, application is finished.

The stack unwinds and you exit.
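The recursion could be sketched roughly as below, assuming Tcl 8.5 or later (for {*}). To keep it runnable without a network, the fetch step is passed in as a command prefix and faked with an in-memory array; in real use it would wrap the http package. The names findNext, crawl, fakeFetch and pages, the depth limit of 50, and the double-quoted href assumption are all mine. Relative links are followed as-is; a real tool would need to resolve them against the page URL (tcllib's uri package may help there).

```tcl
package require htmlparse

# Hypothetical callback: keep the href of the first opening <a> tag
# whose following text contains the phrase.
proc findNext {phrase tag slash param text} {
    global nextUrl
    if {$nextUrl ne "" || $slash ne "" || ![string equal -nocase $tag a]} return
    if {[string first $phrase $text] < 0} return
    # Assumption: href values are double-quoted.
    regexp -nocase {href\s*=\s*"([^"]*)"} $param -> nextUrl
}

# Hypothetical driver: parse one page, then recurse into the link that
# carries the phrase. Stops when the phrase is no longer found.
proc crawl {fetchCmd url phrase {depth 0}} {
    global nextUrl visited
    lappend visited $url
    set nextUrl ""
    ::htmlparse::parse -cmd [list findNext $phrase] [{*}$fetchCmd $url]
    # Copy the result: the global is reset by the recursive call.
    set found $nextUrl
    # The depth limit is an arbitrary guard against endless chains.
    if {$found eq "" || $depth >= 50} return
    crawl $fetchCmd $found $phrase [expr {$depth + 1}]
}

# Stand-in for the real fetch: three tiny pages held in memory.
array set pages {
    a.html {<a href="b.html">Next</a>}
    b.html {<a href="c.html">Next</a>}
    c.html {<p>The end</p>}
}
proc fakeFetch {url} {
    global pages
    return $pages($url)
}

set visited {}
crawl fakeFetch a.html Next
puts $visited
```

The caching of the other A tags would go inside the same callback or a second one; it is left out here to keep the recursion visible.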

The only other possible thing for the algorithm above is that one of
the URLs is the URL of a CGI with values. The other URLs are just
static HTML pages.

What are some examples using some of the Tcl tools for parsing that
fetched file and searching the A tags for phrases?


Take a look at htmlparse.test in tcllib (tcllib.sf.net).

--
+--------------------------------+---------------------------------------+
| Gerald W. Lester |
|"The man who fights for his ideals is the man who is alive." - Cervantes|
+------------------------------------------------------------------------+