Re: Unicode & Pythonwin / win32 / console?



Robert wrote:
> * Webbrowsers for example have to display defective HTML as good as
> possible, unknown unicode chars as "?" and so on... Users got very
> angry in the beginning of browsers when 'strict' programmers displayed
> their exception error boxes ...

Right. If you were developing a web browser in Python, you should do
the same.
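
As a minimal sketch, an application that wants tolerant display can ask
for it explicitly instead of relying on a changed default. The helper
name display_text and the choice of ASCII as the target encoding are
assumptions made only for illustration:

```python
def display_text(text, encoding="ascii"):
    """Return text with unencodable characters replaced by '?'.

    Hypothetical helper: the lenient behaviour is requested by the
    caller, it is not the default of .encode().
    """
    return text.encode(encoding, errors="replace").decode(encoding)


sample = "caf\u00e9 \u6587\u5b57"  # 'cafe' with an accent, plus two CJK characters
lenient = display_text(sample)

# The default error handler is 'strict', which raises instead of guessing.
try:
    sample.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

Here the display code opts into 'replace', while every other caller of
.encode() still gets an exception for data it cannot represent.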

> No one is really angry when
> occasionally chinese chars are displayed cryptically on non-chinese
> computers.

That is not true. Japanese users are *frequently* upset when their
characters don't render correctly. They even have a word for that:
moji-bake. I assume it is similar for Chinese.

> * anything is nice-printable in python by default, why not
> unicode-strings!? If the decision for default 'strict' encoding on
> stdout stands, we have at least to discuss about print-repr for
> unicode.

If you want to see this change really badly, you need to write a PEP.

> * on Windows for example the (good) mbcs_encode is anyway tolerant as
> it: unknown chars are mapped to '?'. I never had any objection to this.

Apparently, you haven't been dealing with character sets long enough.
I have seen *a lot* of objections to the way the CP_ACP encoding
deals with errors, e.g.

http://groups.google.com/group/comp.lang.python/msg/dea84298cb2673ef?dmode=source&hl=en

When Windows converts these file names to CP_ACP, the
file names in a directory are not round-trippable. This is
a source of permanent pain.
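
The loss can be sketched in a few lines. This is only an illustration:
cp1252 stands in for CP_ACP (the real "mbcs" codec exists only on
Windows), and the two file names and the helper lossy_roundtrip are
made up:

```python
# Two distinct, hypothetical file names that cp1252 cannot represent.
names = ["\u65e5\u672c.txt", "\u4e2d\u6587.txt"]


def lossy_roundtrip(name, encoding="cp1252"):
    """Encode with 'replace' and decode again, as a tolerant codec would."""
    return name.encode(encoding, errors="replace").decode(encoding)


converted = [lossy_roundtrip(n) for n in names]

round_trippable = converted == names          # did we get the names back?
collided = len(set(converted)) < len(names)   # did distinct names merge?
```

Both names come back as the same string of question marks, so the
original names cannot be recovered and can no longer be told apart.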

> * I would also live perfectly with .encode(enc) to run 'replace' by
> default, and 'strict' on demand. None of my apps and scripts would
> break because of this, but win. A programmer is naturally very aware
> when he wants 'strict'. Can you name realistic cases where 'replace'
> behavior would be so critical that a program damages something?

File names. Replace an unencodable filename with a question mark,
and you get a pattern that matches multiple files. For example, do

get_deletable_files.py | xargs rm

and you remove many more files than you want to. Pretty catastrophic.
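
A rough sketch of the pattern problem, using the standard fnmatch
module to stand in for whatever later interprets the printed name as a
shell-style pattern. The directory listing is hypothetical:

```python
import fnmatch

# Hypothetical directory contents; only the third file was meant for deletion.
directory = ["a.txt", "b.txt", "\u00e9.txt", "notes.md"]
unencodable = "\u00e9.txt"

# What a 'replace'-by-default encoder would print for that name on an
# ASCII-only stdout.
printed = unencodable.encode("ascii", errors="replace").decode("ascii")

# '?' is a wildcard for any single character, so the printed name now
# matches every one-character .txt file, not just the intended one.
matches = fnmatch.filter(directory, printed)
```

With 'strict', the script would have stopped with an exception before
printing anything that could be mistaken for a different file.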

Regards,
Martin