UnicodeDecodeError help please?
- From: Robin Haswell <rob@xxxxxxxxxxxxxxxxxx>
- Date: Fri, 07 Apr 2006 17:27:24 +0100
Okay, I'm getting really frustrated with Python's Unicode handling. I'm
trying everything I can think of and I can't escape Unicode(En|De)codeError
no matter what I try.
Could someone explain to me what I'm doing wrong here, so I can hope to
throw light on the myriad of similar problems I'm having? Thanks :-)
Python 2.4.1 (#2, May 6 2005, 11:22:24)
[GCC 3.3.6 (Debian 1:3.3.6-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
>>> import htmlentitydefs
>>> char = htmlentitydefs.entitydefs["copy"] # this is an HTML &copy; - a copyright symbol
>>> print char
©
>>> str = u"Apple"
>>> print str
Apple
>>> str + char
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 0: unexpected code byte
>>> a = str+char
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 0: unexpected code byte
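The same failure can be reproduced in isolation. This is only a minimal sketch, assuming the entity table hands back the copyright sign as the single latin-1 byte 0xa9 (which is what the traceback reports). It is written in current Python syntax with an explicit bytes literal, so it will not run unchanged on 2.4, and the variable names are illustrative:

```python
# Assumption: the entity value is the single latin-1 byte 0xa9,
# as reported in the traceback.
char = b"\xa9"
text = u"Apple"

# Joining a Unicode string with a byte string forces the bytes to be
# decoded first. A lone 0xa9 is not valid UTF-8, so a utf-8 decode fails.
try:
    char.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding explicitly as latin-1 yields the copyright sign, and the
# concatenation then works because both operands are Unicode.
joined = text + char.decode("latin-1")
```

If that assumption holds, the error comes from the implicit decode with the default encoding rather than from the concatenation itself.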
Basically my app is a search engine - I'm grabbing content from pages
using HTMLParser and storing it in a database but I'm running into these
problems all over the shop (from decoding the entities to calling
str.lower()) - I don't know what encoding my pages are coming in as, I'm
just happy enough to accept that they're either UTF-8 or latin-1 with
entities.
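One way to act on the "either UTF-8 or latin-1" assumption is to decode every incoming byte string once, at the boundary, and work with Unicode from then on. The following is a rough sketch rather than a definitive fix. The helper name `to_text` is hypothetical, and it is written in current Python syntax (bytes literals), not 2.4:

```python
def to_text(raw):
    """Decode raw page bytes, trying UTF-8 first and latin-1 second.

    Hypothetical helper: it assumes the input is one of those two
    encodings, as stated above. latin-1 maps every byte to a character,
    so the fallback never raises, but it can silently produce the wrong
    characters if the page was in some other encoding.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


# Valid UTF-8 input is decoded as UTF-8.
utf8_page = to_text(b"caf\xc3\xa9")

# A lone 0xa9 is not valid UTF-8, so it falls back to latin-1.
latin1_page = to_text(b"APPLE \xa9")

# Once everything is Unicode, operations such as lower() no longer
# trigger implicit decodes.
lowered = latin1_page.lower()
```

Trying UTF-8 first matters here: latin-1 accepts any byte sequence, so the reverse order would never reach the UTF-8 branch.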
Any help would be great, I just hope that I have a brainwave over the
weekend because I've lost two days to Unicode errors now. It's even worse
that I've written the same app in PHP before with none of these problems -
and PHP4 doesn't even support Unicode.
Cheers
-Rob