Re: SGMLParser eats &auml; etc
From: John J. Lee (jjl_at_pobox.com)
Date: 11/30/03
- In reply to: Anders Eriksson: "SGMLParser eats &auml; etc"
Date: 30 Nov 2003 00:53:28 +0000
Anders Eriksson <ameLista@telia.com> writes:
> I'm using sgmllib (ActivePython 2.3.2, build 230) and I have some trouble
> with letters that have been coded, e.g. the letter å is coded &aring;, ä is
> coded &auml; and ö is coded &ouml;, all according to the HTML standard.
>
> I use the SGMLParser, and when I call the feed method all the coded letters
> are stripped/eaten.
>
> Why?
> How do I fix this?
You probably want to use HTMLParser.HTMLParser instead (note: NOT the same
thing as htmllib.HTMLParser). It knows about XHTML; sgmllib and
htmllib don't. If you really want sgmllib, though (untested):
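    import sgmllib
    import htmlentitydefs

    class MyParser(sgmllib.SGMLParser):
        # Map entity names (e.g. "auml") to their replacement characters
        entitydefs = htmlentitydefs.entitydefs

        def unknown_entityref(self, ref):
            # Called for entity names that are not in entitydefs
            ...
        ...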
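For comparison, here is a minimal sketch of the HTMLParser route. It is written against a current Python 3, where the module is html.parser and the entity table is html.entities.name2codepoint; that is an assumption relative to the Python 2.3 setup discussed here. The names TextCollector and extract_text are made up for illustration. Passing convert_charrefs=False asks the parser to report entities through handle_entityref and handle_charref instead of decoding them silently, so the handling is explicit:

```python
from html.parser import HTMLParser
from html.entities import name2codepoint


class TextCollector(HTMLParser):
    """Collect document text, decoding entity and character references."""

    def __init__(self):
        # convert_charrefs=False: entities are passed to the handle_*ref
        # methods below rather than being decoded inside handle_data.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        # name is e.g. "auml" for "&auml;"
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            # Unknown entity: keep it verbatim instead of dropping it.
            self.parts.append("&%s;" % name)
        else:
            self.parts.append(chr(codepoint))

    def handle_charref(self, name):
        # name is "229" for "&#229;" or "xe5" for "&#xe5;"
        if name[0] in "xX":
            self.parts.append(chr(int(name[1:], 16)))
        else:
            self.parts.append(chr(int(name)))

    def text(self):
        return "".join(self.parts)


def extract_text(markup):
    """Hypothetical helper: feed markup and return the decoded text."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return parser.text()
```

Calling extract_text("<p>&aring; &auml; &ouml;</p>") should then give the three letters back rather than losing them. Unknown entity names are kept as written, which is one possible choice; raising an error or logging them would be equally reasonable.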
John