Re: What kind of tcl tools would help me parse and use html info?



Larry W. Virden wrote:
> I have a need to write a tool to do this:
>
> fetch an html http URL
> parse the html
> Look through the A tags for some specific phrases
> For each one found, check a file cache. If the URL associated with the
> tag is in the cache, see if it has been modified since it was placed
> into the cache. If not, continue.
> If it has been modified, or if it doesn't exist in the cache, then
> fetch the URL, place into the cache, and touch to make the cache copy
> have the date and time from the web site.
> For one of the specific phrases, instead of caching the file, treat it
> as the next html to parse and search.
> When one specific term is no longer found, application is finished.
>
> The only other possible thing for the algorithm above is that one of
> the URLs is the URL of a CGI with values. The other URLs are just
> static HTML pages.
>
> What are some examples using some of the Tcl tools for parsing that
> fetched file and searching the A tags for phrases?

You could use the htmlparse or tdom packages to do the HTML parsing, but
both of them like their HTML correct, so if you might get invalid HTML
files, they can and do fail (trash in -> trash out).

The tdom page on the wiki has an example of a tdom script that fetches
a URL and extracts all links, which would probably be a good start.
Using the htmlparse module from tcllib would work too.
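
As a rough idea of the link-extraction step, here is a minimal sketch
using tdom. It is not the wiki script; the proc name findLinks, the
sample HTML and the phrase are made up for illustration, and it assumes
tdom's -html mode gives lower-case element names so that //a[@href]
matches. Relative hrefs would still need to be resolved against the
page URL.

```tcl
package require tdom

# Return a list of {href text} pairs for every <a href=...> element
# whose link text contains $phrase (case-insensitive).
# findLinks is a hypothetical helper name, not part of tdom.
proc findLinks {html phrase} {
    set doc [dom parse -html $html]
    set result {}
    foreach node [[$doc documentElement] selectNodes {//a[@href]}] {
        set text [string trim [$node asText]]
        if {[string first [string tolower $phrase] [string tolower $text]] >= 0} {
            lappend result [list [$node getAttribute href] $text]
        }
    }
    $doc delete
    return $result
}

# Sample input, invented for the example.
set html {<html><body>
<a href="/docs/a.html">Release notes</a>
<a href="/next?page=2">Next page</a>
</body></html>}

foreach link [findLinks $html "release"] {
    lassign $link href text
    puts "$href -> $text"
}
```

The same proc could be called a second time with the "next page" phrase
to pick out the link that should be parsed next instead of cached.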

For the rest, a bit of http::geturl with the -command option, and
probably the -channel option, should work quite well.
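
The cache check could look something like the sketch below. It is only
one possible approach under some assumptions: the server sends a
Last-Modified header, a HEAD request (-validate 1) is acceptable, and
the fetch is done synchronously rather than with -command to keep the
example short. The proc names httpDate, needsFetch and updateCache are
invented for the example.

```tcl
package require http

# Parse an HTTP date such as "Sun, 06 Nov 1994 08:49:37 GMT"
# into epoch seconds.
proc httpDate {value} {
    return [clock scan $value -format {%a, %d %b %Y %H:%M:%S GMT} -gmt 1]
}

# Return 1 if the cached copy is missing or older than the server's
# Last-Modified value, 0 otherwise. With no usable date we refetch.
proc needsFetch {cacheFile lastModified} {
    if {![file exists $cacheFile]} { return 1 }
    if {[catch {httpDate $lastModified} remote]} { return 1 }
    return [expr {$remote > [file mtime $cacheFile]}]
}

# Refresh $cacheFile from $url if needed. Returns 1 if it was fetched.
proc updateCache {url cacheFile} {
    # HEAD request: headers only, no body.
    set tok [http::geturl $url -validate 1]
    set lastModified ""
    foreach {name value} [http::meta $tok] {
        if {[string equal -nocase $name "Last-Modified"]} {
            set lastModified $value
        }
    }
    http::cleanup $tok
    if {![needsFetch $cacheFile $lastModified]} { return 0 }

    # Let http write the body straight into the cache file.
    set out [open $cacheFile w]
    fconfigure $out -translation binary
    set tok [http::geturl $url -channel $out]
    close $out
    set code [http::ncode $tok]
    http::cleanup $tok
    if {$code != 200} {
        file delete $cacheFile
        return 0
    }
    # "touch" the cache copy so it carries the server's date and time.
    if {![catch {httpDate $lastModified} remote]} {
        file mtime $cacheFile $remote
    }
    return 1
}

# Example call (needs network access, so it is left commented out;
# the URL and file name are placeholders):
# updateCache http://www.example.com/page.html cache/page.html
```

The CGI URL would probably just be fetched every time, since such
pages often send no Last-Modified header; needsFetch already falls
back to refetching in that case.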

Michael
