Re: UnicodeDecodeError help please?
- From: Robert Kern <robert.kern@xxxxxxxxx>
- Date: Fri, 07 Apr 2006 11:50:43 -0500
Robin Haswell wrote:
> Okay I'm getting really frustrated with Python's Unicode handling, I'm
> trying everything I can think of and I can't escape Unicode(En|De)codeError
> no matter what I try.

Have you read any of the documentation about Python's Unicode support? E.g.,
http://effbot.org/zone/unicode-objects.htm

> Could someone explain to me what I'm doing wrong here, so I can hope to
> throw light on the myriad of similar problems I'm having? Thanks :-)
>
> Python 2.4.1 (#2, May 6 2005, 11:22:24)
> [GCC 3.3.6 (Debian 1:3.3.6-2)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import sys
> >>> sys.getdefaultencoding()
> 'utf-8'

How did this happen? It's supposed to be 'ascii' and not user-settable.

> >>> import htmlentitydefs
> >>> char = htmlentitydefs.entitydefs["copy"] # this is an HTML &copy; - a copyright symbol
> >>> print char
> ©
> >>> str = u"Apple"
> >>> print str
> Apple
> >>> str + char
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 0: unexpected code byte
> >>> a = str+char
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 0: unexpected code byte

The values in htmlentitydefs.entitydefs are encoded in latin-1 (or are numeric
entities which you still have to parse). So decode using the latin-1 codec.
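
A minimal sketch of that fix, under one assumption: the byte 0xa9 is hard-coded here to stand in for what htmlentitydefs.entitydefs["copy"] returned in the session above (on Python 3 the module is html.entities and its values are already text, so the raw byte would not appear there). The variable names are only illustrative.

```python
# Assumption: this byte stands in for htmlentitydefs.entitydefs["copy"]
# as Python 2 returned it, i.e. the copyright sign encoded in Latin-1.
raw = b"\xa9"

# Decoding as UTF-8 fails, because a lone 0xA9 is not a valid UTF-8 sequence.
# This is the same failure the implicit conversion in `str + char` hit.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the codec the data was actually encoded in succeeds.
char = raw.decode("latin-1")

# Both operands are now Unicode text, so concatenation needs no implicit decode.
result = u"Apple" + char
print(repr(result))
```

The point is to decode explicitly at the boundary, with the codec the bytes were really written in, rather than relying on the default encoding when mixing byte strings and Unicode strings.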
--
Robert Kern
robert.kern@xxxxxxxxx
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco