Re: the stupid encoding problem to stdout



2011/6/11 Sérgio Monteiro Basto <sergiomb@xxxxxxx>:
> ok after thinking about this, this problem exist because Python want be
> smart with ttys

The *anomaly* (not problem) exists because Python can be told which
encoding its output target expects. If two parties agree on an encoding, they can
send characters to each other. I had this discussion at work a while
ago; my boss was talking about being "binary-safe" (which really meant
"8-bit safe"), while I was saying that we should support, verify, and
demand properly-formed UTF-8. The main significance is that agreeing
on an encoding means we can change the encoding any time it's
convenient, without having to document that we've changed the data -
because we haven't. I can take the number "twelve thousand three
hundred and forty-five" and render that as a string of decimal digits
as "12345", or as hexadecimal digits as "3039", but I haven't changed
the number. If you know that I'm giving you a string of decimal
digits, and I give you "12345", you will get the same number at the
far side.
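
Here's a rough sketch of that point in Python 3 (the variable names
are just mine, for illustration). The number survives a round trip
through either notation, and a string survives a round trip through
any encoding that can represent it, as long as both ends use the same
one:

```python
n = 12345

# Two renderings of the same number.
as_decimal = str(n)         # decimal digits
as_hex = format(n, "x")     # hexadecimal digits

# Whoever knows which notation was used gets the same number back.
assert int(as_decimal, 10) == n
assert int(as_hex, 16) == n

# Same idea with text: the bytes differ per encoding, the characters don't.
text = "Sérgio"
for encoding in ("utf-8", "utf-16", "latin-1"):
    data = text.encode(encoding)
    assert data.decode(encoding) == text
```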

Python has agreed with stdout that it will send it characters encoded
in UTF-8. Having made that agreement, Python and stdout can happily
communicate in characters, not bytes. You don't need to explicitly
encode your characters into bytes - and in fact, this would be a very
bad thing to do, because you don't know _what_ encoding stdout is
using. If it's expecting UTF-16, you'll get a whole lot of rubbish if
you send it UTF-8 bytes - but it'll look fine if you send it Unicode
characters and let Python do the encoding.
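
A small sketch of the difference, assuming Python 3. I'm faking
stdout with an in-memory stream so the example doesn't depend on
anyone's terminal; fake_stdout is a made-up helper, not anything from
the standard library:

```python
import io

def fake_stdout(encoding):
    """Return (raw byte buffer, text stream) standing in for stdout."""
    raw = io.BytesIO()
    return raw, io.TextIOWrapper(raw, encoding=encoding, newline="\n")

text = "Sérgio"

# Right way: hand over characters, let the stream encode them.
raw, out = fake_stdout("utf-16")
out.write(text)
out.flush()
good = raw.getvalue().decode("utf-16")

# Wrong way: encode to UTF-8 yourself, while the far side reads UTF-16.
wrong_bytes = text.encode("utf-8")
garbled = wrong_bytes.decode("utf-16-le", errors="replace")

assert good == text
assert garbled != text
```

On a real interpreter you can check what was agreed by looking at
sys.stdout.encoding.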

Chris Angelico