http::geturl parse 1 line at a time after keyword?






Hi All

I'm trying to port a script that parses wunderground.com's html output. The
original script used tcl's "socket" code. I'm attempting to translate it to use
the tcl http package.

My question is this: How to step through several lines after finding a keyword.

For example, the original code used:

if {[regexp {Local Time:} $wzout]} {
for {set t 0} {$t <= 4} {incr t} {
set wzout [gets $wzsock]
regexp {<b>(.*?)</b>} $wzout match localdate
regexp {(.*?) on} $localdate match localtime
if {![regexp on $localdate]} {
set localtime $localdate
}
}
}

The original above basically grabs "Local Time:" and then pulls in up to 5 more
lines, one line at a time, looking for "<b>data</b>" to get the right data.

This is what the html looks like, for the example:

<td class="taR" style="white-space: nowrap;">Local Time:</td>
<td id="full" style="white-space: nowrap;">
<b>2:38 AM EDT</b>


I would prefer to not use the htmlParse package, since this will be
running in an eggdrop bot, and don't want to require other packages if it can
be avoided.

Since I'm going to be using the http package, the entire html source will be in
a var. So my question is, how could I step 1 line at a time, up to a specific
number of lines, after finding the keyword?

I've seen some examples of using multi-line regex's, and that may work. There
are some instances on wunderground where the format changes, however. For
example, in the 'Lunar phases' section:

The usual format:

<td style="padding: 3px; border-bottom: 1px solid #CCC;">Moon</td>
<td style="border-bottom: 1px solid #CCC;">11:19 AM EDT</td>
<td style="border-bottom: 1px solid #CCC;">No Moon Set</td>
</tr>

Unusual format:

<td style="padding: 3px; border-bottom: 1px solid #CCC;">Moon</td>
<td style="border-bottom: 1px solid #CCC;">
10:14 AM AKDT (6/30)
</td>
<td style="border-bottom: 1px solid #CCC;">
2:56 AM AKDT (6/30)
</td>
</tr>

This is the original code to handle the above:

# moonrise/set
set i 0
set wzout [gets $wzsock]
# Find the keyword "Moon"
while {([regexp {>Moon</td>} $wzout] == 0) && ($i < 40)} {
incr i
set wzout [gets $wzsock]
}
set mrset "0"
set msset "0"
set msfp "1"
# step 1 line at a time til we get the data:
for {set t 0} {$t <= 4} {incr t} {
set wzout [gets $wzsock]
if {$mrset == "0"} {
set mrset [regexp {>(.*?)</td>} $wzout match moonrise]
if {$mrset == 0} {
#hope this line doesn't word-wrap:
set mrset [regexp {[[:space:]]{2}([[:digit:]]{1,2}:[[:digit:]]{2} .*)} $wzout match moonrise]
# word-wrap?
}
}
if {($msset == "0") && ($mrset == 1)} {
if {$msfp == 1} {
set wzout [gets $wzsock]; incr t;incr msfp
}
set msset [regexp {>(.*?)</td>} $wzout match moonset]
if {$msset == 0} {
#same here, word-wrap possible:
set msset [regexp {[[:space:]]{2}([[:digit:]]{1,2}:[[:digit:]]{2} .*)} $wzout match moonset]
# word-wrap?
}
}
}


Hoping that someone can show me an example to step 1 line at a time through the
http::geturl variable/data, after finding the keywords, similar to how the original worked.

I've also thought about using the 'foreach line $html' type code, but I'm
trying to minimize how many times the $html has to be parsed, to make this faster. Plus
I would think that using "foreach line" will give me too many bad matches. I need to be able
to limit the matches to within a specific number of lines after finding the keyword.

Thanks in advance for any ideas/clues. :)

-C.L.



.



Relevant Pages

  • regex for replacing plain text within html string...
    ... i have a tricky problem and my regex expertise has reached its limit. ... preserve the html, and replace some of the plain text. ... problems because the keyword may appear in markup tags or attribute ... it essentially matches the keyword inside the inner text of a html tag ...
    (microsoft.public.dotnet.framework.aspnet)
  • Re: traversing a variable with regex instead of a file
    ... wouldn't surprise me if one of the Regex Gurus could do it even better ... > # no keyword links to be made inside valid HTML ...
    (perl.beginners)
  • Search Keyword in HTML page
    ... Now I could highlight text using your code. ... checking whether any specific keyword is present in HTML page,no need ... But my further requirement is to exclude search for some HTML ... scrollbar to move down page. ...
    (microsoft.public.inetsdk.programming.webbrowser_ctl)
  • Search Keyword in HTML Page
    ... Now I could highlight text using your code. ... But my further requirement is to exclude search for some HTML elements ... return TRUE for keyword "Email" though it is not present in content of real ...
    (microsoft.public.vc.mfc)
  • Re: forms
    ... Do you just want to use JavaScript to do a calculation? ... Watch for word-wrap. ...
    (comp.lang.javascript)