HTMLParser question
From: Rajarshi Guha (rajarshi_at_presidency.com)
Date: 08/19/04
- Next message: Dave Benjamin: "Re: Jython and super_reload?"
- Previous message: Richard Hanson: "Re: age of Python programmers"
- Next in thread: Benjamin Niemann: "Re: HTMLParser question"
- Reply: Benjamin Niemann: "Re: HTMLParser question"
Date: Thu, 19 Aug 2004 11:27:24 -0400
Hi,
I have some HTML that essentially consists of a series of <div>s, each
having one of two classes (tnt-question or tnt-answer).
I'm using HTMLParser to handle the tags as follows:
# Python 2 (2004-era) code
import HTMLParser

class MyHTMLParser(HTMLParser.HTMLParser):
    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if len(attrs) == 1:
            cls, whichcls = attrs[0]
            if whichcls == 'tnt-question':
                print self.get_starttag_text(), self.getpos()

    def handle_endtag(self, tag):
        pass

    def handle_data(self, data):
        # called for the text inside *every* element
        print data

if __name__ == '__main__':
    htmldata = open('tt.html', 'r').read()
    parser = MyHTMLParser()
    parser.feed(htmldata)
    parser.close()
What I would like, however, is that when the parser reaches HTML like
this:
<div class="tnt-question">
How do I add a user to a MySQL system?
</div>
I should get back the text between the opening and closing tags. However,
the above code prints the text contained in all tags, not just in the
<div> tags with class="tnt-question".
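The behaviour described above is expected: HTMLParser calls handle_data() for every run of text it encounters, regardless of which element encloses it. A minimal demonstration in Python 3, where the HTMLParser module was renamed html.parser; the class name TextLogger and the sample markup are made up for illustration:

```python
from html.parser import HTMLParser


class TextLogger(HTMLParser):
    """Records every non-blank text chunk that handle_data() receives."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for all character data, whatever tag (if any) encloses it.
        if data.strip():
            self.chunks.append(data.strip())


sample = (
    '<div class="tnt-question">Q text</div>'
    '<div class="tnt-answer">A text</div>'
)
logger = TextLogger()
logger.feed(sample)
logger.close()
print(logger.chunks)  # ['Q text', 'A text']
```

Both the question text and the answer text arrive through the same callback, so handle_data() alone cannot tell them apart.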
Is there a way to call handle_data() only while a specific tag is being
handled? Placing a call to handle_data() in handle_starttag() seems to be
the way, but I'm not sure how to actually do it: what data should I pass
to the call?
Any pointers would be appreciated.
Thanks,
Rajarshi
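One common approach, sketched here in Python 3 (module html.parser), is not to call handle_data() yourself but to let the start- and end-tag handlers record whether the parser is currently inside a matching <div>, and have handle_data() keep text only while that is true. The class name QuestionParser and its attributes are hypothetical; looking the class up with dict(attrs) replaces the len(attrs) == 1 check, and the depth counter is an assumption added so that a <div> nested inside a question does not end it early.

```python
from html.parser import HTMLParser


class QuestionParser(HTMLParser):
    """Hypothetical sketch: collects the text of each <div class="tnt-question">."""

    def __init__(self, wanted_class="tnt-question"):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0       # div nesting depth inside a matching div (0 = outside)
        self.current = []    # text chunks of the question being read
        self.questions = []  # finished questions, whitespace-normalised

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            # Nested div inside a question: count it so its </div>
            # does not close the question early.
            self.depth += 1
        else:
            # attrs is a list of (name, value) pairs; value may be None.
            classes = (dict(attrs).get("class") or "").split()
            if self.wanted_class in classes:
                self.depth = 1
                self.current = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                text = "".join(self.current)
                self.questions.append(" ".join(text.split()))

    def handle_data(self, data):
        # Keep text only while inside a matching div.
        if self.depth:
            self.current.append(data)


html = """
<div class="tnt-question">
How do I add a user to a MySQL system?
</div>
<div class="tnt-answer">Answer text here</div>
"""
parser = QuestionParser()
parser.feed(html)
parser.close()
print(parser.questions)  # ['How do I add a user to a MySQL system?']
```

Text is accumulated in a list rather than returned from a single handle_data() call because the parser may deliver one run of text in several chunks.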