Re: str() should convert ANY object to a string without EXCEPTIONS !



On Sep 28, 12:37 pm, est <electronix...@xxxxxxxxx> wrote:
From python manual

str( [object])

Return a string containing a nicely printable representation of an
object. For strings, this returns the string itself. The difference
with repr(object) is that str(object) does not always attempt to
return a string that is acceptable to eval(); its goal is to return a
printable string. If no argument is given, returns the empty string,
''.

now we try this under windows:

str(u'\ue863')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in
position 0
: ordinal not in range(128)

FAIL.

And it is correct to fail, ASCII is only defined within range(128),
the rest (i.e. range(128, 256)) is not defined in ASCII. The
range(128, 256) are extension slots, with many conflicting meanings.


also almighty Linux

Python 2.3.4 (#1, Feb  6 2006, 10:38:46)
[GCC 3.4.5 20051201 (Red Hat 3.4.5-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.>>> str(u'\ue863')

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in
position 0: ordinal not in range(128)

Python 2.4.4 (#2, Apr  5 2007, 20:11:18)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.>>> str(u'\ue863')

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in
position 0: ordinal not in range(128)

Python 2.5 (release25-maint, Jul 20 2008, 20:47:25)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.>>> str(u'\ue863')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in
position 0: ordinal not in range(128)

If that str() function has returned anything but error on this, I'd
file a bug report.

The problem is, why the f**k set ASCII encoding to range(128) ????????
while str() is internally byte array it should be handled in
range(256) !!!!!!!!!!

string is a byte array, but unicode and ASCII is NOT. Unicode string
is a character array defined up to range(65535). Each character in
unicode may be one or two bytes long. ASCII string is a character
array defined up to range(127). Other than Unicode (actually utf-8,
utf-16, and utf-32) and ASCII, there are many other encodings (ECBDIC,
iso-8859-1', ..., 'iso-8859-16', 'KOI8', 'GB18030', 'Shift-JIS', etc,
etc, etc) each with conflicting byte to characters mappings.
Fortunately, most of these encodings do share a common ground: ASCII.

Actually, when a strictly stupid str() receives a Unicode string (i.e.
character array), it should return a <unicode s at
0x423549af813e4954>, but it doesn't, str() is smarter than that, it
tries to convert whatever fits into ASCII, i.e. characters lower than
128. Why ASCII? Because character from range(128, 256) varies widely
and it doesn't know which encoding you want to use, so if you don't
tell me what encoding to use it'd not guess (Python Zen: In the face
of ambiguity, refuse the temptation to guess).

If you're trying to convert a character array (Unicode) into a byte
string, it's done by specifying which codec you want to use. str()
tries to convert your character array (Unicode) to byte string using
ASCII codec. s.encode(codec) would convert a given character array
into byte string using codec.

http://bugs.python.org/issue3648

One possible solution(Windows Only)

str(u'\ue863'.encode('mbcs'))
'\xfe\x9f'

actually str() is not needed, you need only: u'\ue863'.encode('mbcs')

print u'\ue863'.encode('mbcs')


I now spending 60% of my developing time dealing with ASCII range(128)
errors. It was PAIN!!!!!!

Despair not, there is a quick hack:
# but only use it as temporary solution, FIX YOUR CODE PROPERLY
str_ = str
str = lambda s = '': s.encode('mbcs') if isinstance(s, basestring)
else str_(s)

Please fix this issue.

http://bugs.python.org/issue3648

Please.

.



Relevant Pages

  • Re: RfD: Escaped Strings
    ... the S" string can only contain printable characters, ... the S" string cannot contain the '"' character, ... \b BS (backspace, ASCII 8) ... \ ** escapes to characters much as C does. ...
    (comp.lang.forth)
  • RfD: Escaped Strings version 4
    ... the S" string can only contain printable characters, ... the S" string cannot contain the '"' character, ... as an escape character for the entry of characters that cannot be ... \b BS (backspace, ASCII 8) ...
    (comp.lang.forth)
  • RfD: Escaped Strings version 4
    ... the S" string can only contain printable characters, ... the S" string cannot contain the '"' character, ... as an escape character for the entry of characters that cannot be ... \b BS (backspace, ASCII 8) ...
    (comp.lang.forth)
  • RfD: Escaped Strings
    ... the S" string can only contain printable characters, ... the S" string cannot contain the '"' character, ... \b BS (backspace, ASCII 8) ... chars + swap move ...
    (comp.lang.forth)
  • RfD - Escaped Strings (long)
    ... the S" string can only contain printable characters, ... the S" string cannot contain the '"' character, ... \b BS (backspace, ASCII 8) ... \ ** escapes to characters much as C does. ...
    (comp.lang.forth)