Re: encode() question
- From: "Gabriel Genellina" <gagsl-py2@xxxxxxxxxxxx>
- Date: Tue, 31 Jul 2007 14:18:59 -0300
On Tue, 31 Jul 2007 13:53:11 -0300, 7stud <bbxx789_05ss@xxxxxxxxx> wrote:
s1 = "hello"
s2 = s1.encode("utf-8")
s1 = "an accented 'e': \xc3\xa9"
s2 = s1.encode("utf-8")
The last line produces the error:
---
Traceback (most recent call last):
  File "test1.py", line 6, in ?
    s2 = s1.encode("utf-8")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 17: ordinal not in range(128)
---
The error is a "decode" error, and as far as I can tell, decoding
happens when you convert a regular string to a unicode string. So, is
there an implicit conversion taking place from s1 to a unicode string
before encode() is called? By what mechanism?
Converting from unicode characters into a string of bytes is the "encode" operation: unicode.encode() -> str
Converting from a string of bytes to unicode characters is the "decode" operation: str.decode() -> unicode
The reverse methods, str.encode and unicode.decode, should NOT exist, or at least should issue a warning (IMHO).
When you call str.encode, the encode operation requires a unicode source, so the string is first decoded implicitly using the default encoding (ASCII here), and that implicit decode is what fails on the non-ASCII byte 0xc3.
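The two-step behaviour described above can be sketched roughly as follows. This is only an illustration, written for Python 3, where bytes stands in for the Python 2 str type. The helper name py2_style_encode and the DEFAULT_ENCODING constant are made up for this example; they imitate the implicit decode and are not what the interpreter literally runs.

```python
# Python 3 sketch; bytes plays the role of the Python 2 str type.
# Assumption: the default encoding is ASCII, as in a stock Python 2 install.
DEFAULT_ENCODING = "ascii"


def py2_style_encode(byte_string, encoding):
    """Hypothetical helper imitating Python 2's str.encode()."""
    # Step 1: the implicit decode with the default codec (this is what fails).
    text = byte_string.decode(DEFAULT_ENCODING)
    # Step 2: the encode that was actually requested.
    return text.encode(encoding)


# Pure ASCII input survives the implicit decode, so no error appears.
plain = py2_style_encode(b"hello", "utf-8")
print(plain)

# Bytes outside ASCII make the implicit decode raise UnicodeDecodeError.
s1 = b"an accented 'e': \xc3\xa9"
try:
    py2_style_encode(s1, "utf-8")
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)

# Decoding explicitly with the codec the bytes were written in avoids it.
text = s1.decode("utf-8")
round_trip = text.encode("utf-8")
print(round_trip == s1)
```

In the original Python 2 script, the equivalent fix would be to decode explicitly first, e.g. s1.decode("utf-8"), and only then call encode() on the resulting unicode object.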
--
Gabriel Genellina