Printing UTF-8



I am new to Unicode, so please bear with my stupidity.

I am doing the following in Wing, a Python IDE, with Python 2.3.

>>> s = "äöü"
>>> print s
äöü
>>> print s
äöü
>>> s
'\xc3\xa4\xc3\xb6\xc3\xbc'
>>> s.decode('utf-8')
u'\xe4\xf6\xfc'
>>> u = s.decode('utf-8')
>>> u
u'\xe4\xf6\xfc'
>>> print u.encode('utf-8')
äöü
>>> print u.encode('latin1')
äöü

Why can't I get äöü printed from UTF-8 when I can from latin1? How
can I use UTF-8 exclusively and still be able to print the characters?
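
One possible explanation, sketched below in Python 3 syntax (an assumption; the session above is Python 2.3): if the output window interprets every byte it receives as Latin-1, then Latin-1 bytes display correctly, while each two-byte UTF-8 sequence shows up as two unrelated characters. Whether Wing's window really behaves this way is a guess, and the variable names are made up for illustration.

```python
# Assumption: the display decodes incoming bytes as Latin-1.
# This is a guess about the IDE, not something confirmed by the post.
text = "\u00e4\u00f6\u00fc"  # the characters ä, ö, ü

utf8_bytes = text.encode("utf-8")      # two bytes per character
latin1_bytes = text.encode("latin-1")  # one byte per character

# Simulate what a Latin-1 display would show for each byte string.
shown_from_utf8 = utf8_bytes.decode("latin-1")
shown_from_latin1 = latin1_bytes.decode("latin-1")

print(repr(utf8_bytes))           # b'\xc3\xa4\xc3\xb6\xc3\xbc'
print(repr(latin1_bytes))         # b'\xe4\xf6\xfc'
print(shown_from_latin1 == text)  # True: Latin-1 bytes survive intact
print(shown_from_utf8 == text)    # False: UTF-8 bytes are misread
```

Under that assumption, printing UTF-8 only works if the output device itself expects UTF-8; otherwise the text has to be encoded to whatever the device expects at the moment it is printed.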

I also did the same thing on the same machine in a command window...
ActivePython 2.3.2 Build 230 (ActiveState Corp.) based on
Python 2.3.2 (#49, Oct 24 2003, 13:37:57) [MSC v.1200 32 bit (Intel)]
on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s = "äöü"
>>> print s
äöü
>>> s
'\x84\x94\x81'
>>> s.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'utf8' codec can't decode byte 0x84 in position 0: unexpected code byte
>>> u = s.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'utf8' codec can't decode byte 0x84 in position 0: unexpected code byte


Why is there such a difference between the IDE and the command window,
both in what they can do and in the internal representation of the string?
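
The two byte strings may simply be the same text in two different encodings. The sketch below (Python 3 syntax, an assumption, since the sessions above are Python 2.3) treats the command-window bytes as code page 437 and the IDE bytes as UTF-8. That the console really uses cp437 is a guess; cp850 maps these three bytes to the same characters.

```python
# Bytes copied from the two sessions in this post.
console_bytes = b"\x84\x94\x81"          # from the command window
ide_bytes = b"\xc3\xa4\xc3\xb6\xc3\xbc"  # from the IDE

# Assumption: the console uses code page 437, the IDE uses UTF-8.
from_console = console_bytes.decode("cp437")
from_ide = ide_bytes.decode("utf-8")

print(from_console == from_ide)  # True: both are the same three characters

# The console bytes are not valid UTF-8, so decoding them as UTF-8
# fails, much like the traceback in the command-window session.
try:
    console_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(utf8_ok)  # False
```

If that guess is right, the difference lies in how each environment encodes typed input, and decoding with the matching codec yields the same Unicode string in both places.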

Thanks,
Shel
