Re: Python usage numbers

On Sun, 12 Feb 2012 17:27:34 -0500, Roy Smith wrote:

In article <mailman.5739.1329084873.27778.python-list@xxxxxxxxxx>,
Chris Angelico <rosuav@xxxxxxxxx> wrote:

On Mon, Feb 13, 2012 at 9:07 AM, Terry Reedy <tjreedy@xxxxxxxx> wrote:
The situation before ascii is like where we ended up *before*
unicode. Unicode aims to replace all those byte encodings and
character sets with *one* byte encoding for *one* character set,
which will be a great simplification. It is the idea of ascii applied
on a global rather than local basis.
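To make the "before Unicode" situation concrete, here is a minimal sketch using only Python 3's built-in codecs. The same single byte decodes to a different character depending on which legacy 8-bit encoding you assume. The three codecs chosen are just illustrative examples.

```python
# One byte value, read through three different legacy 8-bit codecs.
raw = b"\xe9"

readings = {codec: raw.decode(codec) for codec in ("latin-1", "koi8_r", "cp437")}

for codec, text in readings.items():
    # Show the decoded character and its Unicode code point.
    print(f"{codec:8} -> {text!r} (U+{ord(text):04X})")
```

Without out-of-band knowledge of the encoding, the byte alone cannot tell you which character was meant.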

Unicode doesn't deal with byte encodings; UTF-8 is an encoding, but so
are UTF-16, UTF-32, and as many more as you could hope for. But broadly
yes, Unicode IS the solution.
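The distinction between the character set and its byte encodings can be shown in a few lines of Python 3. This is a sketch: one Unicode string is encoded three ways, each with a different byte length, and each round-trips back to the identical string. The `-le` codec variants are used so that no byte-order mark is counted.

```python
# 12 code points, all in the Basic Multilingual Plane (escapes avoid source-encoding issues).
text = "na\u00efve caf\u00e9 \u2615"

sizes = {}
for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    data = text.encode(codec)
    assert data.decode(codec) == text  # every encoding round-trips the same string
    sizes[codec] = len(data)
    print(f"{codec:10} {len(data):3} bytes")
```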

I could hope for one and only one, but I know I'm just going to be
disappointed. The last project I worked on used UTF-8 in most places,
but also used some C and Java libraries which were only available for
UTF-16. So it was transcoding hell all over the place.

Um, surely the solution to that is to always call a simple wrapper
function around the UTF-16 code to handle the transcoding? What do the
Design Patterns people call it, a facade? No, an adapter. (I never remember the

Instead of calling foo(), which only outputs UTF-16, write a
wrapper myfoo() which calls foo(), captures its output and transcodes it to
UTF-8. You have to do that once (per function), but now it works from
everywhere, so long as you remember to always call myfoo() instead of foo().
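A minimal sketch of that adapter in Python 3. Here `foo` is a hypothetical stand-in for a library call that returns UTF-16 bytes; it is not a real API. The `myfoo` wrapper decodes the UTF-16 (the plain "utf-16" codec consumes the byte-order mark) and re-encodes the result as UTF-8.

```python
def foo(*args, **kwargs) -> bytes:
    """Hypothetical library call that can only produce UTF-16 output."""
    return "na\u00efve caf\u00e9".encode("utf-16")  # includes a byte-order mark


def myfoo(*args, **kwargs) -> bytes:
    """Adapter: call foo() and transcode its UTF-16 output to UTF-8."""
    return foo(*args, **kwargs).decode("utf-16").encode("utf-8")


print(myfoo())
```

The rest of the program only ever calls `myfoo()`, so the transcoding lives in exactly one place per wrapped function.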

Hopefully, we will eventually reach the point where storage is so cheap
that nobody minds how inefficient UTF-32 is and we all just start using
that. Life will be a lot simpler then. No more transcoding, a string
will just be as many bytes as it is characters, and everybody will be happy.

I think you mean 4 times as many bytes as characters. Unless you have 32-bit
bytes :)
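A quick Python 3 check of the "4 times" arithmetic, as a sketch. "utf-32-le" is used because the plain "utf-32" codec prepends a 4-byte byte-order mark.

```python
def utf32_size(s: str) -> int:
    # "utf-32-le" writes no byte-order mark; plain "utf-32" would add 4 BOM bytes.
    return len(s.encode("utf-32-le"))


for s in ("abc", "caf\u00e9", "\U0001F600"):
    print(repr(s), len(s), "code points ->", utf32_size(s), "bytes")
```

Strictly, UTF-32 spends 4 bytes per code point, not per user-visible character: a combining sequence such as "e" followed by U+0301 is two code points (8 bytes) even though it displays as one "é".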
