Re: WTF? Printing unicode strings



Ron Garret wrote:

I'm using an OS X terminal to ssh to a Linux machine.

Click on the "Terminal" menu, then "Window Settings...". Choose "Display" from
the combobox. At the bottom you will see a combobox titled "Character Set
Encoding". Choose "Unicode (UTF-8)".

But what about this:

f2=open('foo','w')
f2.write(u'\xFF')

Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in
position 0: ordinal not in range(128)

That should have nothing to do with my terminal, right?

Correct, that is a different problem. f.write() expects a string of bytes, not a
unicode string. To convert a unicode string to a byte string without an explicit
.encode() method call, Python falls back on the default encoding, which is
'ascii'. It's not easily changeable, and for a good reason: your modules won't
work on anyone else's machine if you hack that setting.
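
For illustration, here is a minimal sketch of the explicit route: encode the
unicode string yourself and write the resulting bytes. It assumes UTF-8 is the
encoding you want, and it writes to a scratch file in a temporary directory
(that path is my own placeholder, not something from the original session).

```python
import os
import tempfile

# Hypothetical scratch location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), 'foo')

text = u'\xff'                 # one character, U+00FF
data = text.encode('utf-8')    # explicit unicode -> bytes conversion

f2 = open(path, 'wb')          # binary mode: write() is handed bytes
try:
    f2.write(data)
finally:
    f2.close()

# Read the raw bytes back to see what actually landed on disk.
f2 = open(path, 'rb')
try:
    raw = f2.read()
finally:
    f2.close()
```

Because the conversion is spelled out, the default 'ascii' codec never gets
involved, and the choice of encoding is visible at the call site.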

I just found http://www.amk.ca/python/howto/unicode, which seems to be
enlightening. The answer seems to be something like:

import codecs
f = codecs.open('foo','w','utf-8')

but that seems pretty awkward.

<shrug> About as clean as it gets when dealing with text encodings.
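
A slightly fuller sketch of the codecs.open() approach, showing a write
followed by a read, might look like this. The temporary path is an assumption
for the sake of a self-contained example; 'utf-8' is the encoding from the
snippet above.

```python
import codecs
import os
import tempfile

# Hypothetical scratch location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), 'foo')

# The wrapper encodes unicode to UTF-8 bytes on every write().
f = codecs.open(path, 'w', 'utf-8')
try:
    f.write(u'\xff')
finally:
    f.close()

# Reading through the same kind of wrapper decodes the bytes again.
f = codecs.open(path, 'r', 'utf-8')
try:
    roundtrip = f.read()
finally:
    f.close()
```

The file object does the encoding and decoding, so the rest of the program can
deal in unicode strings only.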

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
