unicode by default



Hi folks,
I am puzzled by Unicode in general, and within the context of Python specifically. For one thing, what do we mean when we say that Python 3.x uses Unicode by default? (I know what "default" means; I mean, what actually changed?)
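One concrete way to see the change is the split between text and binary data. The following is a minimal Python 3 sketch (the variable names are illustrative only). A plain literal is a `str` of Unicode code points, `b"..."` is a `bytes` object, and converting between them takes an explicit `encode()`/`decode()` with a named encoding.

```python
# Python 3: text (str) and binary data (bytes) are distinct types.
text = "caf\u00e9"            # str: 4 Unicode code points
data = text.encode("utf-8")   # bytes: 5 bytes, because U+00E9 needs 2 bytes in UTF-8

print(type(text).__name__, len(text))   # str 4
print(type(data).__name__, len(data))   # bytes 5
print(data)                              # b'caf\xc3\xa9'

# Going back to text requires naming the encoding explicitly.
assert data.decode("utf-8") == text

# Python 3 refuses to mix the two types; Python 2 would have attempted
# an implicit ASCII conversion instead.
try:
    text + data
except TypeError as exc:
    print("TypeError:", exc)
```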

I think part of my problem is that I'm spoiled (American, ASCII heritage): I have either been stuck in ASCII knowingly, or using UTF-8 without knowing it (just because the code points lined up). I am confused about the implications of moving to 3.x, because I keep reading that there are significant things to be aware of... but what are they?
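The "code points lined up" observation can be checked directly. This is a small sketch, assuming Python 3: every one of the 128 ASCII code points encodes to the same single byte in ASCII and in UTF-8, and the first character outside that range is where the two diverge.

```python
# Every ASCII code point (0x00-0x7F) encodes to the identical single byte in UTF-8.
ascii_chars = "".join(chr(cp) for cp in range(0x80))
assert ascii_chars.encode("ascii") == ascii_chars.encode("utf-8")
print(len(ascii_chars.encode("utf-8")))   # 128: one byte per character

# One character past ASCII and the encodings no longer agree.
s = "na\u00efve"                           # contains U+00EF, LATIN SMALL LETTER I WITH DIAERESIS
print(s.encode("utf-8"))                   # b'na\xc3\xafve'
try:
    s.encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii cannot encode it:", exc.reason)
```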

On my installation, 2.6 reports sys.maxunicode as 1114111, while my 2.7 and 3.2 installs each report 65535. So I am assuming that 2.6 was compiled with the UCS-4 (UTF-32) option, i.e. 4-byte Unicode(?), and that the default compile option for 2.7 and 3.2 (I didn't change anything) is UCS-2 (UTF-16), i.e. 2-byte Unicode(?). Do I understand this much correctly?
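The sys.maxunicode check can be made a little more explicit. The sketch below assumes Python 3, and the helper name `describe_unicode_build` is hypothetical. It interprets the two values mentioned above and shows where the difference becomes visible: a character outside the Basic Multilingual Plane, which a narrow build stores as a UTF-16 surrogate pair. Note that from Python 3.3 on (PEP 393) sys.maxunicode is always 1114111 and the narrow/wide build distinction no longer exists.

```python
import sys

def describe_unicode_build():
    """Hypothetical helper: interpret sys.maxunicode as discussed above."""
    if sys.maxunicode == 0xFFFF:        # 65535
        return "narrow build: non-BMP characters stored as UTF-16 surrogate pairs"
    if sys.maxunicode == 0x10FFFF:      # 1114111
        return "wide build (or Python 3.3+): one unit per code point"
    return "unexpected value"

print(sys.maxunicode, describe_unicode_build())

# A character outside the BMP: U+1D11E MUSICAL SYMBOL G CLEF.
clef = "\U0001D11E"
print(len(clef))                        # 2 on a narrow build, 1 on a wide build or 3.3+
print(clef.encode("utf-16-be"))         # a surrogate pair: 0xD834 0xDD1E
print(len(clef.encode("utf-32-le")))    # 4: a single 32-bit code unit
```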

The books say that .py source files are UTF-8 by default... and that 3.x builds store strings internally as either UCS-2 or UCS-4. If I use Python 3.x's file handling with the default settings, what encoding will be used, and how will that affect the output?
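To see which encoding a Python 3 text-mode `open()` typically falls back to when no `encoding=` argument is given, one can ask `locale.getpreferredencoding(False)`; the answer depends on the platform and locale. The minimal sketch below (the helper `write_and_read_back` and the file name are hypothetical) sidesteps the question by passing the encoding explicitly, then reads the file in binary mode to show the bytes actually written. `newline=""` disables newline translation so the byte counts are the same on every platform.

```python
import locale
import os
import tempfile

# The locale-dependent encoding text-mode open() typically uses when encoding= is omitted.
print("default text encoding:", locale.getpreferredencoding(False))

def write_and_read_back(text, encoding):
    """Hypothetical helper: write text with an explicit encoding, return (raw bytes, decoded text)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "w", encoding=encoding, newline="") as f:
            f.write(text)
        with open(path, "rb") as f:          # binary mode: bytes exactly as stored
            raw = f.read()
        with open(path, "r", encoding=encoding, newline="") as f:
            decoded = f.read()
    return raw, decoded

sample = "caf\u00e9\n"
for enc in ("utf-8", "latin-1", "utf-16"):
    raw, decoded = write_and_read_back(sample, enc)
    print(enc, len(raw), raw)                # same text, different bytes on disk
```

The same five characters become 6 bytes in UTF-8, 5 in Latin-1, and 12 in UTF-16 (a 2-byte byte-order mark plus 2 bytes per character), which is why the encoding chosen for a file affects its output.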

If I never use code points beyond ASCII (nothing above 0x7F), does any of this matter anyway?
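Staying at or below 0xFF is not quite the same as staying within ASCII. This small Python 3 sketch shows that code points 0x80-0xFF take one byte in Latin-1 but two bytes in UTF-8, so decoding with the wrong one of the two quietly produces garbled text, while pure-ASCII data is byte-for-byte identical in both. (The UCS-2 versus UCS-4 build difference only shows up for code points above U+FFFF.)

```python
e_acute = "\u00e9"                      # U+00E9, inside the 0x80-0xFF range

print(e_acute.encode("latin-1"))        # b'\xe9'      (one byte)
print(e_acute.encode("utf-8"))          # b'\xc3\xa9'  (two bytes)

# Decoding UTF-8 bytes as Latin-1 never raises; it silently yields mojibake.
garbled = e_acute.encode("utf-8").decode("latin-1")
assert garbled == "\u00c3\u00a9"        # i.e. two characters, A-tilde and the copyright sign

# Pure ASCII text encodes identically under either encoding.
plain = "hello"
assert plain.encode("latin-1") == plain.encode("utf-8") == b"hello"
```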



Thanks.

kind regards,
m harris
