Re: Unicode is driving me nuts!

From: Anthony Liu (antonyliu2002_at_yahoo.com)
Date: 03/13/04


Date: Sat, 13 Mar 2004 00:35:45 -0800 (PST)
To: py <python-list@python.org>

Thank you, Skip. You know what, I guess I'll give up
using Unicode, since you also mentioned that you used
to have headaches with it.

I'll probably just read the file byte by byte and check
whether each byte starts a Chinese character. If it
does, I'll read 2 bytes instead of 1. What do you think?
This way, I hopefully won't end up with a lot of
unreadable characters.

--- Skip Montanaro <skip@pobox.com> wrote:
>
> Anthony> str = unicode(raw_str, myencoding)
>
> Anthony> This works just fine with a small sample Chinese document.
>
> Anthony> But when I attempted to run the script on the entire corpus, I
> Anthony> get the typical "incomplete multibyte sequence error" or
> Anthony> "UnicodeEncodeError: 'ascii' codec can't encode characters in
> Anthony> position 0-23: ordinal not in range(128)"
>
> Can you craft a small example which demonstrates the error but which you
> think is correctly encoded?
>
> Anthony> I am at my wit's end, so frustrated at handling
> Anthony> non-ascii texts.
>
> Unicode creates lots of problems for the uninitiated. I pulled my hair out
> for a long time. It took me a couple tries to get my system to work
> (more-or-less) with Unicode. It's still got the occasional problem.
>
> Skip
>

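For reference, the two errors quoted above can be reproduced in a few lines. This is only a sketch in present-day Python 3, and it shows one common way each error arises, not necessarily what happened with the corpus in this thread. The helper name decode_chunks and the gb2312 encoding are assumptions. The decode error shows up when the input is cut in the middle of a 2-byte character (for example, when reading fixed-size blocks), and an incremental decoder avoids it by carrying the dangling byte over to the next block. The encode error shows up when decoded text is pushed through the ascii codec.

```python
import codecs


def decode_chunks(chunks, encoding="gb2312"):
    """Decode byte chunks that may split a multibyte character between them."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True makes the decoder raise if bytes are still pending at the end.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


raw = "中文文本".encode("gb2312")   # 4 characters, 8 bytes
chunks = [raw[:3], raw[3:]]         # the cut falls inside the second character

try:
    chunks[0].decode("gb2312")      # decoding one chunk on its own fails
except UnicodeDecodeError as exc:
    print("decode error:", exc.reason)

text = decode_chunks(chunks)        # the incremental decoder handles the cut

try:
    text.encode("ascii")            # same failure as in the quoted traceback
except UnicodeEncodeError as exc:
    print("encode error:", exc.reason)

output_bytes = text.encode("utf-8") # choose an explicit output encoding instead
```
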