strange behaviour of str()
>
> Hello,
>
> I'm wondering about the following behaviour of str() with strings
> containing non-ASCII characters:
>
> str(u'foo') returns 'foo' as expected.
>
> str('lää') returns 'lää' as expected.
>
> str(u'lää') raises UnicodeEncodeError
>
This does not work because you need an encoder to convert
unicode to str. There are many ways to encode a unicode string
into a classic byte-stream based string, and str() does not know
a priori which encoder to use, so it falls back to the default
encoding, ASCII, which cannot represent 'ä'.
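For readers on Python 3, here is a minimal sketch of the same failure. It assumes Python 3 semantics, where `str` plays the role of Python 2's `unicode` and `bytes` plays the role of Python 2's `str`. The helper `encode_or_error` is hypothetical and exists only for illustration:

```python
# Python 3 sketch (assumption): `str` here is text, like Python 2's `unicode`,
# and `bytes` is a byte string, like Python 2's `str`.
text = "lää"

def encode_or_error(s, codec):
    """Hypothetical helper: return the encoded bytes, or the exception name."""
    try:
        return s.encode(codec)
    except UnicodeEncodeError as exc:
        return type(exc).__name__

# ASCII only covers code points 0-127, so 'ä' (U+00E4) cannot be encoded.
# This mirrors Python 2, where str(u'lää') implicitly used the ASCII codec.
ascii_result = encode_or_error(text, "ascii")
print(ascii_result)  # UnicodeEncodeError

# Pure-ASCII text encodes fine, which is why str(u'foo') worked.
print(encode_or_error("foo", "ascii"))  # b'foo'
```

Naming a codec that can represent 'ä', such as "latin-1", makes the same call succeed.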
You have to proceed as follows:
>>> s=u"äää"
>>> print s.encode("latin-1")
äää
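Try "utf-8" and "utf-16" instead of "latin-1" as well. Which one is
right depends on what the receiver of the bytes (your terminal, a
file format, another program) expects.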
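To see why the choice of encoder matters, here is a short Python 3 sketch (an assumption: Python 3, where `str.encode` returns `bytes`). It encodes the same text with the three codecs mentioned above and shows that each produces a different byte string. Each byte string only decodes back correctly with the codec that produced it:

```python
# Python 3 sketch (assumption): same text, three codecs, three byte strings.
text = "äää"

encodings = {}
for codec in ("latin-1", "utf-8", "utf-16"):
    encodings[codec] = text.encode(codec)
    # latin-1 uses 1 byte per 'ä', utf-8 uses 2, and utf-16 uses 2 plus a
    # 2-byte byte-order mark at the start.
    print(codec, len(encodings[codec]), encodings[codec])

# Round trip: decoding with the matching codec restores the original text.
assert all(data.decode(codec) == text for codec, data in encodings.items())
```

The lengths come out as 3, 6 and 8 bytes respectively. The exact utf-16 bytes depend on the platform's byte order, which the leading byte-order mark records.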
Greetings, Uwe.