UnicodeDecodeError help please?



Okay, I'm getting really frustrated with Python's Unicode handling. I've
tried everything I can think of and I can't escape Unicode(En|De)codeError
no matter what I do.

Could someone explain to me what I'm doing wrong here, so that I can start
to make sense of the many similar problems I'm having? Thanks :-)

Python 2.4.1 (#2, May 6 2005, 11:22:24)
[GCC 3.3.6 (Debian 1:3.3.6-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
>>> import htmlentitydefs
>>> char = htmlentitydefs.entitydefs["copy"] # this is an HTML © - a copyright symbol
>>> print char
©
>>> str = u"Apple"
>>> print str
Apple
>>> str + char
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 0: unexpected code byte
>>> a = str+char
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 0: unexpected code byte
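What the session shows is a byte string meeting a Unicode string. `htmlentitydefs.entitydefs["copy"]` is the single Latin-1 byte 0xa9, not the character ©, and `str + char` makes Python 2 implicitly decode that byte with the default encoding (here 'utf-8'). That fails because 0xa9 is a UTF-8 continuation byte and can never start a character. The sketch below is written for Python 3 (an assumption, since the post uses 2.4), where the bytes/text split is explicit. It shows decoding with the right codec and, alternatively, building the character from the entity's code point via `html.entities.name2codepoint`.

```python
import html.entities

# The raw byte the Python 2 session got back for "&copy;": 0xa9 in Latin-1.
raw = b"\xa9"

# Decoding with the wrong codec reproduces the error from the post.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 failed:", exc.reason)

# Decoding with the codec the byte was actually written in works.
char = raw.decode("latin-1")

# Alternatively, skip bytes entirely: map the entity name to a code point.
char_from_name = chr(html.entities.name2codepoint["copy"])

title = "Apple"
print(title + char)            # text + text: no implicit decoding involved
print(char == char_from_name)

# Mixing text and bytes is refused outright in Python 3.
try:
    title + raw
except TypeError as exc:
    print("TypeError:", exc)
```

In Python 2.4 the same two fixes would be `char.decode("latin-1")` or `unichr(htmlentitydefs.name2codepoint["copy"])`, either of which gives a `unicode` object that concatenates with `u"Apple"` cleanly. The variable is named `title` rather than `str` only to avoid shadowing the built-in `str` type.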


Basically, my app is a search engine: I'm grabbing content from pages
using HTMLParser and storing it in a database, but I'm running into these
problems all over the shop (from decoding the entities to calling
str.lower()). I don't know what encoding my pages are coming in as; I'm
happy enough to accept that they're either UTF-8 or Latin-1 with
entities.
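If the pages really are either UTF-8 or Latin-1, one common approach is to decode each page to Unicode as soon as it is fetched, trying UTF-8 first and falling back to Latin-1, and then do everything else (entity conversion, `lower()`, storage) on text only. A minimal Python 3 sketch under that assumption follows; `decode_page`, `TextCollector` and `page_text` are hypothetical names, while `html.parser.HTMLParser` with `convert_charrefs=True` is the Python 3 standard library's way of turning entities like `&copy;` into characters.

```python
from html.parser import HTMLParser


def decode_page(raw: bytes) -> str:
    """Decode fetched page bytes, assuming they are UTF-8 or Latin-1."""
    try:
        # UTF-8 is strict, so Latin-1 text with accented bytes usually fails here.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never raises.
        return raw.decode("latin-1")


class TextCollector(HTMLParser):
    """Collect the text content of a page, with entities already resolved."""

    def __init__(self) -> None:
        # convert_charrefs=True makes the parser turn &copy; and &#169; into "©".
        super().__init__(convert_charrefs=True)
        self.chunks = []  # list of text fragments seen between tags

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def page_text(raw: bytes) -> str:
    parser = TextCollector()
    parser.feed(decode_page(raw))
    parser.close()  # flush any text still buffered by the parser
    # Lower-casing is safe now: this is text, not bytes of unknown encoding.
    return "".join(parser.chunks).lower()


if __name__ == "__main__":
    utf8_page = "<p>Apple &copy; caf\u00e9</p>".encode("utf-8")
    latin1_page = "<p>Apple &copy; caf\u00e9</p>".encode("latin-1")
    print(page_text(utf8_page))
    print(page_text(latin1_page))
```

The fallback order matters: Latin-1 accepts any byte sequence, so trying it first would "succeed" on UTF-8 pages and silently produce garbled text. The heuristic can still misread a Latin-1 page whose bytes happen to form valid UTF-8, which is rare for ordinary text.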

Any help would be great. I just hope I have a brainwave over the
weekend, because I've lost two days to Unicode errors now. It's even worse
that I've written the same app in PHP before with none of these problems,
and PHP4 doesn't even support Unicode.

Cheers

-Rob


