Re: encode() question



On Tue, 31 Jul 2007 13:53:11 -0300, 7stud <bbxx789_05ss@xxxxxxxxx> wrote:

s1 = "hello"
s2 = s1.encode("utf-8")

s1 = "an accented 'e': \xc3\xa9"
s2 = s1.encode("utf-8")

The last line produces the error:

---
Traceback (most recent call last):
  File "test1.py", line 6, in ?
    s2 = s1.encode("utf-8")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 17: ordinal not in range(128)
---

The error is a "decode" error, and as far as I can tell, decoding
happens when you convert a regular string to a unicode string. So, is
there an implicit conversion taking place from s1 to a unicode string
before encode() is called? By what mechanism?

Converting from unicode characters into a string of bytes is the "encode" operation: unicode.encode() -> str
Converting from a string of bytes into unicode characters is the "decode" operation: str.decode() -> unicode
str.encode and unicode.decode should NOT exist, or at least issue a warning (IMHO).
When you call str.encode, the encode operation requires a unicode source, so the byte string is first decoded using the default encoding (ASCII, as the traceback shows) - and that implicit decode fails on the byte 0xc3.
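
Here is a rough sketch of that mechanism, in case it helps. It is written for Python 3, where bytes plays the role of the old str and str plays the role of the old unicode, so the implicit step has to be spelled out by hand. The helper py2_style_encode and its default_encoding parameter are made up for illustration; they only imitate the behaviour described above and are not part of any library.

```python
# Python 3 sketch: bytes ~ Python 2 str, str ~ Python 2 unicode.
raw = b"an accented 'e': \xc3\xa9"  # UTF-8 bytes, same content as s1 above


def py2_style_encode(data, encoding, default_encoding="ascii"):
    """Hypothetical helper imitating what Python 2's str.encode() did:
    decode with the default encoding first, then encode the result."""
    text = data.decode(default_encoding)  # the implicit step; fails on non-ASCII bytes
    return text.encode(encoding)


# Pure ASCII input survives the implicit decode, like the "hello" case.
ascii_result = py2_style_encode(b"hello", "utf-8")

# Non-ASCII input fails in the decode step, before any encoding happens.
try:
    py2_style_encode(raw, "utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Decoding explicitly with the codec the bytes were written in works,
# and encoding the resulting text gives the original bytes back.
text = raw.decode("utf-8")
round_trip = text.encode("utf-8")
```

In this sketch error.encoding is 'ascii' and error.start is 17, which lines up with the codec and position reported in the traceback. The way out is the explicit raw.decode("utf-8") step: say which encoding the bytes are in rather than relying on the default.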

--
Gabriel Genellina
