Re: str() should convert ANY object to a string without EXCEPTIONS !



On Sep 28, 4:38 pm, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
On Sat, 27 Sep 2008 22:37:09 -0700, est wrote:
str(u'\ue863')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in position 0: ordinal not in range(128)

FAIL.

What result did you expect?

[...]

The problem is, why the f**k set ASCII encoding to range(128) ????????
while str() is internally byte array it should be handled in range(256)
!!!!!!!!!!

To quote Terry Pratchett:

    "What sort of person," said Salzella patiently, "sits down and
    *writes* a maniacal laugh? And all those exclamation marks, you
    notice? Five? A sure sign of someone who wears his underpants
    on his head." -- (Terry Pratchett, Maskerade)

In any case, even if the ASCII encoding used all 256 possible bytes, you
still have a problem. Your unicode string is a single character with
ordinal value 59491:

ord(u'\ue863')

59491

You can't fit 59491 (or more) distinct characters into 256 byte values,
so obviously some unicode chars aren't going to fit into a single byte
without some sort of multi-byte encoding. You show that yourself:

u'\ue863'.encode('mbcs')  # Windows only

But of course 'mbcs' is only one possible encoding. There are others.
Python refuses to guess which encoding you want. Here's another:

u'\ue863'.encode('utf-8')
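
To make the "Python refuses to guess" point concrete, here is a minimal sketch of the same character run through several codecs. It assumes Python 3 syntax (where every str is already unicode), and try_encode is a hypothetical helper written for this illustration, not anything from the standard library:

```python
# Python 3 sketch: every str is Unicode, so no u'' prefix is needed.
ch = '\ue863'


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent text.

    Hypothetical helper for illustration only.
    """
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# The same single character becomes different byte sequences
# depending on which encoding is requested.
as_utf8 = try_encode(ch, 'utf-8')        # three bytes
as_utf16 = try_encode(ch, 'utf-16-le')   # two bytes

# Single-byte codecs have no slot for code point 59491, so these fail.
as_ascii = try_encode(ch, 'ascii')       # None
as_latin1 = try_encode(ch, 'latin-1')    # None

print(ord(ch), as_utf8, as_utf16, as_ascii, as_latin1)
```

Since the byte sequences differ from codec to codec, there is no single "obvious" result for str() to return, which is why the encoding has to be named explicitly.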

--
Steven

OK, I am tired of arguing about these things, since Python 3.0 fixed it
somehow.

Can anyone tell me how to customize the default encoding, let's say to
'ansi', which handles range(256)?
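
For reference, this is not a way to change the default, only a sketch of the explicit alternative, again assuming Python 3 syntax. The 'latin-1' codec maps code points 0..255 one-to-one onto byte values 0..255, which is the closest standard match to "handles range(256)", and the errors argument of str.encode controls what happens to anything outside the codec's range:

```python
# 'latin-1' covers exactly code points 0..255, one byte each.
all_bytes = bytes(range(256))
text = all_bytes.decode('latin-1')
round_trip = text.encode('latin-1')

# Anything above 255 still cannot be stored in one byte; the errors
# argument chooses a fallback instead of raising UnicodeEncodeError.
ch = '\ue863'
replaced = ch.encode('ascii', 'replace')          # substitutes '?'
escaped = ch.encode('ascii', 'backslashreplace')  # keeps an escape sequence

print(len(text), round_trip == all_bytes, replaced, escaped)
```

Note that 'replace' loses information, while 'backslashreplace' keeps the code point visible in the output.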