unicode by default
- From: harrismh777 <harrismh777@xxxxxxxxxxx>
- Date: Wed, 11 May 2011 16:37:49 -0500
I am puzzled by Unicode generally, and within the context of Python specifically. For one thing, what do we mean when we say that Unicode is used in Python 3.x by default? (I know what default means; I mean, what changed?)
I think part of my problem is that I'm spoiled (American, ASCII heritage) and have been either stuck in ASCII knowingly, or in UTF-8 without knowing it (just because the code points lined up). I am confused by the implications of using 3.x, because I keep reading that there are significant things to be aware of... but what are they?
On my 2.6 installation, sys.maxunicode comes up as 1114111, and my 2.7 and 3.2 installs each come up as 65535. So I am assuming that 2.6 was compiled with the UCS-4 (UTF-32) option for 4-byte Unicode, and that the default compile option for 2.7 and 3.2 (I didn't change anything) is UCS-2 (UTF-16), i.e. 2-byte Unicode. Do I understand this much correctly?
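For reference, this is roughly the check I am running, as a minimal sketch. The helper name build_width is my own invention, not anything from the standard library, and the narrow/wide labels are just my reading of the two values.

```python
import sys

def build_width(maxunicode=sys.maxunicode):
    """Classify an interpreter build from its sys.maxunicode value.

    build_width is a hypothetical helper written for this post.
    """
    if maxunicode == 0xFFFF:        # 65535
        return "narrow (UCS-2, 2-byte code units)"
    if maxunicode == 0x10FFFF:      # 1114111
        return "wide (UCS-4, 4-byte code units)"
    return "unknown"

print(sys.maxunicode, build_width())
```

As far as I can tell, the narrow/wide distinction only applies to builds up through 3.2; from 3.3 on, sys.maxunicode is always 1114111.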
The books say that .py sources are UTF-8 by default... and that 3.x strings are stored as either UCS-2 or UCS-4. If I use Python's file handling in 3.x with its defaults, what encoding will be used, and how will that affect the output?
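To make the file question concrete, here is a small sketch of what I understand so far. I am assuming that text-mode open() without an encoding argument falls back to whatever locale.getpreferredencoding(False) reports on the platform, and that passing encoding explicitly overrides it. The temporary file name is arbitrary.

```python
import locale
import os
import tempfile

# What text-mode open() would use if no encoding argument is given
# (platform dependent, so no expected value is shown here).
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)

text = "caf\u00e9"  # 'café', one code point above the ASCII range
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Write with an explicit encoding so the bytes on disk are predictable.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read the raw bytes back to see what actually landed in the file.
with open(path, "rb") as f:
    raw = f.read()
print(raw)  # b'caf\xc3\xa9'

# Decode with the same encoding to recover the original string.
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
print(round_trip == text)  # True
```

If that assumption is right, leaving the encoding off means the same script could write different bytes on different machines, which is presumably one of the "things to be aware of".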
If I do not use any code points above 0xFF (ASCII proper stops at 0x7F), does any of this matter anyway?
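A quick experiment on that last point, again only a sketch: pure ASCII text seems to come out as the same bytes in ASCII, Latin-1, and UTF-8, but a character between 0x80 and 0xFF does not.

```python
# Code points up to 0x7F: the encodings agree byte for byte.
ascii_text = "hello"
same = (ascii_text.encode("ascii")
        == ascii_text.encode("latin-1")
        == ascii_text.encode("utf-8"))
print(same)  # True

# A code point between 0x80 and 0xFF: the encodings diverge.
e_acute = "\u00e9"                        # U+00E9, 'é'
latin1_bytes = e_acute.encode("latin-1")  # one byte
utf8_bytes = e_acute.encode("utf-8")      # two bytes
print(latin1_bytes, utf8_bytes)  # b'\xe9' b'\xc3\xa9'
```

So staying at or below 0x7F looks safe across these encodings, while anything from 0x80 up to 0xFF already depends on which encoding is in effect.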