Re: Unicode and Zipfile problems

From: Gerson Kurz (gerson.kurz_at_t-online.de)
Date: 11/06/03


Date: Thu, 06 Nov 2003 18:01:28 GMT


>but you HAVE TO care, since on MS Windows, if a filename is unicode,
>it is UTF-16 and you just cannot convert it into stream of bytes
>without messing up with encodings. UTF-16 is not even ASCII compatible,
>after all.

Well, you know, the strange thing is: I have written C and C++
programs on Win32 ever since WinNT 3.x (Unicode was not part of
16-bit Windows) and a few of these actually work. Sort of. Even though
they cannot work because they don't support UTF-16, right?

And the same goes for many commercial or well-known applications.
Antivir Professional? Kerio Personal Firewall? Free Agent? Paint Shop
Pro? Winzip? Winamp? Of course I don't have the source for these, but
the Dependency Viewer (from the Microsoft SDK) will show you that all
of these link with the ASCII versions of the Windows API. Seems like
there are a lot of broken apps out there! And the most shocking of all
- this holds true even for python23.dll: ShellExecuteA,
RegQueryValueExA, LoadStringA, LoadLibraryExA - it's all ASCII!
Somebody better call for a major unicode cleanup!
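
For anyone who wants to see what the two sides of this argument are actually about, here is a minimal sketch in present-day Python 3 (not the Python 2.3 of this thread). It only compares the bytes a wide ("W") call and a narrow ("A") call would be handed for the same name. The filename is made up, and cp1252 is an assumption standing in for whatever code page the system is really configured with; no Windows API is called.

```python
# Hypothetical filename, chosen only for illustration.
name = "r\u00e9sum\u00e9_\u65e5\u672c.zip"

# Roughly what a "W" (wide) API call is handed: UTF-16 code units,
# little-endian, two bytes per character for everything in this name.
wide = name.encode("utf-16-le")

# Roughly what an "A" API call is handed: bytes in the system code page.
# ASSUMPTION: cp1252; the real code page depends on the system locale.
# Characters outside that code page cannot be represented and become "?".
narrow = name.encode("cp1252", errors="replace")

# Decoding the narrow bytes again shows what survived the trip.
round_trip = narrow.decode("cp1252")

# A plain ASCII name survives the narrow path unchanged, which is why
# programs linked against the "A" functions work fine for such names.
plain = "readme.txt"
plain_survives = plain.encode("cp1252").decode("cp1252") == plain

print(len(name), len(wide), len(narrow))
print(narrow)
print(round_trip == name, plain_survives)
```

So both observations hold at once: programs built on the "A" functions work for every name the code page can express, and they lose information for names it cannot.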

But OK, I agree, the subject is somewhat boring - even though every
week somebody else runs into problems with this (see the thread
"Strange problem with encoding" from today) there will probably be no
change introduced in Python at this point on this subject.


