Re: Unicode is driving me nuts!
From: Anthony Liu (antonyliu2002_at_yahoo.com)
Date: Sat, 13 Mar 2004 00:35:45 -0800 (PST)
To: py <email@example.com>
Thank you, Skip. You know what, I guess I'll give up
using Unicode, since you mentioned that it used to give
you headaches too.
I'll probably just read the file byte by byte and check
whether each byte is the first byte of a Chinese
character. If it is, I'll read 2 bytes instead. What do
you think? This way, I hopefully won't end up with a lot
of unreadable characters.
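
The byte-by-byte idea could look roughly like the sketch below. It is only an illustration: it is written for Python 3 (not the Python 2 `unicode()` call quoted further down), it assumes the corpus is GB2312-encoded, which the thread never states, and `split_chars` is a made-up helper name. Any byte with the high bit set is treated as the first half of a two-byte character.

```python
def split_chars(raw, encoding="gb2312"):
    """Split a bytes object into characters.

    Assumption: a GB2312-style encoding, where ASCII takes one byte and
    every Chinese character takes two bytes, the first one >= 0x80.
    """
    chars = []
    i = 0
    while i < len(raw):
        # A high-bit byte starts a two-byte character; ASCII is one byte.
        width = 2 if raw[i] >= 0x80 else 1
        chunk = raw[i:i + width]
        # errors="replace" keeps a damaged or truncated pair from raising.
        chars.append(chunk.decode(encoding, errors="replace"))
        i += width
    return chars


sample = "Python 中文".encode("gb2312")
print(split_chars(sample))
```

Note that this only works for fixed one-or-two-byte encodings; for anything else, decoding the whole buffer with the right codec is the safer route.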
--- Skip Montanaro <firstname.lastname@example.org> wrote:
> Anthony> str = unicode(raw_str, myencoding)
> Anthony> This works just fine with a small sample Chinese document.
> Anthony> But when I attempted to run the script on the entire corpus, I
> Anthony> get the typical "incomplete multibyte sequence error" or
> Anthony> "UnicodeEncodeError: 'ascii' codec can't encode characters in
> Anthony> position 0-23: ordinal not in
> Can you craft a small example which demonstrates the error but which you
> think is correctly encoded?
> Anthony> I am at my wit's end, so frustrated at
> Anthony> non-ascii texts.
> Unicode creates lots of problems for the uninitiated. I pulled my hair out
> for a long time. It took me a couple tries to get my system to work
> (more-or-less) with Unicode. It's still got the occasional problem.
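
For reference, a small example of the kind asked for above might look like the following sketch. It is a guess at the causes, not a diagnosis of the actual corpus: it uses Python 3, assumes GB2312 data, and assumes the input is read in fixed-size chunks that can cut a two-byte character in half. The variable names are invented for illustration.

```python
import codecs

data = "中文文本".encode("gb2312")  # 8 bytes, 2 per character

# Decoding a buffer that ends in the middle of a character fails.
try:
    data[:3].decode("gb2312")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# An incremental decoder keeps the dangling byte until the next chunk arrives.
decoder = codecs.getincrementaldecoder("gb2312")()
pieces = [decoder.decode(data[i:i + 3]) for i in range(0, len(data), 3)]
pieces.append(decoder.decode(b"", final=True))
text = "".join(pieces)

# The 'ascii' codec error comes from encoding, not decoding: it appears
# when decoded text is pushed through ASCII, for example on output.
try:
    text.encode("ascii")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

# Choosing the output encoding explicitly avoids the implicit ASCII step.
utf8_bytes = text.encode("utf-8")
print(decode_failed, encode_failed, text == "中文文本")
```

If the real data is well formed, the first failure points at how the bytes are chunked and the second at how the decoded text is written out, rather than at the corpus itself.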