Re: LANG, locale, unicode, setup.py and Debian packaging



2. If this returns "C" or anything without 'utf8' in it, then things start
to go downhill:
2a. The app assumes unicode objects internally, i.e. whenever there is
a "string like this" in a var, it's supposed to be unicode. Whenever
something comes into the app (from a filename, a file's contents, the
command-line), it's assumed to be a byte string that I call
decode("utf8") on before placing it into my objects etc.

That's a bug in the app. It shouldn't assume that environment variables
are UTF-8. Instead, it should assume that they are in the locale's
encoding, and compute that encoding with locale.getpreferredencoding.
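
A minimal sketch of that advice, written in Python 3 syntax. The helper
name decode_external and its optional encoding argument are hypothetical
and only there for illustration; locale.getpreferredencoding is the
function named above.

```python
import locale


def decode_external(raw, encoding=None):
    """Decode a byte string that came from outside the program.

    If no encoding is given, fall back to the locale's preferred
    encoding instead of hard-coding UTF-8.
    """
    if encoding is None:
        # Asks the C library which encoding the user's locale implies.
        encoding = locale.getpreferredencoding()
    return raw.decode(encoding)


# Pure-ASCII input decodes the same way under any ASCII-compatible locale.
name = decode_external(b"report.txt")

# Passing the encoding explicitly is handy in tests.
cafe = decode_external(b"caf\xe9", "latin-1")
```

Input that is not valid in the locale's encoding still raises
UnicodeDecodeError here; whether to catch it or pass an error handler
is a separate decision.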

2b. Because of 2a and if the locale is not 'utf8 aware' (i.e. "C") I start
getting all the old 'ascii' unicode decode errors. This happens at every
string operation, at every print command and is almost impossible to fix.

If you print non-ASCII strings to the terminal, and you can't be certain
that the terminal supports the characters in the string, and you can't
reasonably deal with the exceptions, you should accept mojibake by
specifying the "replace" error handler when converting strings to the
terminal's encoding.
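
As a rough illustration (Python 3 syntax; the helper name
to_terminal_bytes is made up for this sketch, and falling back to
"ascii" when the stream reports no encoding is an assumption, not
something stated above):

```python
import sys


def to_terminal_bytes(text, encoding=None):
    """Encode text for the terminal, never raising on unsupported characters."""
    if encoding is None:
        # sys.stdout.encoding can be None or missing when output is redirected.
        encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    # "replace" substitutes "?" for characters the encoding cannot represent.
    return text.encode(encoding, "replace")


# With an ASCII-only terminal, the accented letter becomes "?".
degraded = to_terminal_bytes("caf\u00e9", "ascii")
```

The resulting bytes could then be written with
sys.stdout.buffer.write(degraded). The output is lossy, but the program
keeps running instead of dying on a UnicodeEncodeError.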

3. I made the decision to check the locale and stop the app if the return
from getlocale is (None,None).

I would avoid locale.getlocale. It's a pointless function (IMO).

Also, what's the purpose of this test?

Does anyone have some ideas? Is there a universal "proper" locale that we
could set a system to *before* the Debian build stuff starts? What would
that be - en_US.utf8?

Your program definitely, absolutely must work in the C locale. Of
course, you cannot have any non-ASCII characters in that locale, so
deal with it.

If you have solved that, chances are high that it will work in other
locales as well (but be sure to try Turkish, as that gives a
surprising meaning to "I".lower()).

Regards,
Martin