Re: HTML tag Parsing and extracting data.
- From: schlenk@xxxxxxxxxxxxxxxx
- Date: Mon, 29 Oct 2007 06:45:03 -0700
majidkha...@xxxxxxxxx wrote:
Hi,
I am new to Tcl. Let me explain what I want to do, which I have tried
so far without success.
I would like to parse an HTML page, for example
http://www.cmcelectronics.ca/En/Careers/job_display_en.php?JOB_ID=511
or
http://www.sanjel.com/careers/jobDesc.cfm?numJobBoardID=440
and I want to extract the data/info under headings such as
"Duties & Responsibilities:", "Description:", "Summary" or
"Responsibilities", etc.
So I am looking for code that is generic enough that, if we pass
"descriptions" or "description" or any of the keywords I mentioned
above (or any other), it finds and extracts the information related
to that keyword or heading.
Basically, 'generic' is hard in this regard because real-world HTML has
turned into tag soup and isn't properly annotated at all (and it gets
even harder if JavaScript is involved).
A simple sledgehammer would be regexp; a little more sophisticated would
be something like tcllib's htmlparse or tDOM in HTML mode. Even
tclwebtest might be helpful.
See: http://wiki.tcl.tk/2204
http://wiki.tcl.tk/tdom
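To give an idea of the regexp sledgehammer, here is a minimal sketch in plain Tcl. It assumes the page source is already in a variable, that a section starts at the first line containing the keyword, and that it ends at the next short line ending in a colon. The proc names html2text and extractSection are made up for this example and are not part of any package; real pages will likely need a different "next heading" rule.

```tcl
# Hypothetical helper: crude HTML-to-text conversion using regsub only.
proc html2text {html} {
    # Drop script and style blocks, contents included.
    regsub -all -nocase {<script.*?</script>} $html {} html
    regsub -all -nocase {<style.*?</style>} $html {} html
    # Treat common block-level tags as line breaks.
    regsub -all -nocase {</?(br|p|div|tr|li|h[1-6])\M[^>]*>} $html "\n" html
    # Strip whatever tags are left.
    regsub -all {<[^>]+>} $html {} html
    # Decode a few common entities (far from complete).
    return [string map {&nbsp; " " &lt; < &gt; > &amp; &} $html]
}

# Hypothetical helper: return the text that follows the first line
# containing keyword (case-insensitive), up to the next line that looks
# like a heading. Assumption: a heading is a short line ending in a colon.
proc extractSection {html keyword} {
    set result {}
    set found 0
    set key [string tolower $keyword]
    foreach line [split [html2text $html] "\n"] {
        set line [string trim $line]
        if {$line eq ""} continue
        if {!$found} {
            set idx [string first $key [string tolower $line]]
            if {$idx >= 0} {
                set found 1
                # Keep any text that shares the line with the heading.
                set start [expr {$idx + [string length $key]}]
                set rest [string trimleft [string range $line $start end] ": \t"]
                if {$rest ne ""} { lappend result $rest }
            }
            continue
        }
        if {[regexp {^[^.]{1,40}:$} $line]} break
        lappend result $line
    }
    return [join $result "\n"]
}

# Usage with a made-up page fragment:
set page {<h2>Summary:</h2><p>Builds avionics software.</p><p>Works with <b>Tcl</b>.</p><h2>Requirements:</h2><p>Five years.</p>}
puts [extractSection $page "summary"]
```

This breaks as soon as the headings are marked up differently (tables, bold text without a colon, and so on), which is why a real parser such as htmlparse or tDOM is usually the better choice.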
Michael