Re: UnicodeDecodeError help please?



Robin Haswell wrote:
Okay I'm getting really frustrated with Python's Unicode handling, I'm
trying everything I can think of an I can't escape Unicode(En|De)codeError
no matter what I try.

Have you read any of the documentation about Python's Unicode support? E.g.,

http://effbot.org/zone/unicode-objects.htm

Could someone explain to me what I'm doing wrong here, so I can hope to
throw light on the myriad of similar problems I'm having? Thanks :-)

Python 2.4.1 (#2, May 6 2005, 11:22:24)
[GCC 3.3.6 (Debian 1:3.3.6-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

import sys
sys.getdefaultencoding()

'utf-8'

How did this happen? It's supposed to be 'ascii' and not user-settable.

import htmlentitydefs
char = htmlentitydefs.entitydefs["copy"] # this is an HTML © - a copyright symbol
print char

©

str = u"Apple"
print str

Apple

str + char

Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 0: unexpected code byte

a = str+char

Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 0: unexpected code byte

The values in htmlentitydefs.entitydefs are encoded in latin-1 (or are numeric
entities which you still have to parse). So decode using the latin-1 codec.

--
Robert Kern
robert.kern@xxxxxxxxx

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

.



Relevant Pages