Re: If you could add anything you want



Oliver Wong wrote:

I like the flexibility of adding new characters.

I presume you mean that you like the flexibility the ISO and Unicode consortium
have to add new characters, rather than that you would like to be free to define
your own (if you do mean the latter then there is always the private-use area to
play in).


If we define a size
on char, then we either have a finite number of characters we can define,
or we have something like surrogate pairs (or triplets, or quadruplets,
etc.) where you don't have a 1 to 1 correspondence between the "concept
of a character" and the "char data type in Java".
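
To make the surrogate-pair point concrete, here is a minimal Java sketch. The class and method names are made up for illustration; the only library calls used are the standard `Character.toChars`, `String.codePointCount`, and `Character.isHighSurrogate` / `Character.isLowSurrogate`. It takes U+1D11E (MUSICAL SYMBOL G CLEF, a character outside the Basic Multilingual Plane) and compares the number of `char` values with the number of code points.

```java
class SurrogateDemo {
    // Build a String holding exactly one code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Number of 16-bit char units the code point occupies in a Java String.
    static int charUnits(int codePoint) {
        return fromCodePoint(codePoint).length();
    }

    // Number of code points ("characters" in the Unicode sense) in the String.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        int gClef = 0x1D11E; // outside the BMP, so UTF-16 needs a surrogate pair
        String s = fromCodePoint(gClef);
        System.out.println("char units:  " + s.length());
        System.out.println("code points: " + codePoints(s));
        System.out.println("high surrogate first: " + Character.isHighSurrogate(s.charAt(0)));
        System.out.println("low surrogate second: " + Character.isLowSurrogate(s.charAt(1)));
    }
}
```

For a BMP character such as 'A' the two counts agree; they only diverge for supplementary characters like the one above.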

Me, I prefer to be able to manipulate characters as integers. Which requires
(for sanity) knowing how wide the integer is. Unicode isn't going to add
characters which don't fit into UTF-16, so there's a definite limit to how wide
the integer needs to be. Even if they /did/ scrap UTF-16 (hardly likely when
it would break Windows, .NET, /and/ Java ;-) there is still an unimaginably huge
amount of space available in the 31 bits that ISO limits itself to. It would
need several thousand "alphabets" the size of the unified Han stuff to exhaust
that (and where are those writing systems lurking ?).
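
A rough Java sketch of the arithmetic behind that paragraph. The class name, method names, and the 70,000 figure for the unified Han repertoire are assumptions made for illustration (the real count depends on the Unicode version); `Character.MAX_CODE_POINT` is the standard constant for the top of the UTF-16-reachable range, U+10FFFF.

```java
class CodePointRoom {
    // Assumed round figure for the unified Han repertoire; not an exact count.
    static final long ASSUMED_HAN_SIZE = 70_000L;

    // Number of code points reachable through UTF-16 (U+0000..U+10FFFF).
    static long utf16Range() {
        return Character.MAX_CODE_POINT + 1L;
    }

    // How many Han-sized scripts would fit in a 31-bit code space.
    static long hanSizedScriptsIn31Bits() {
        return (1L << 31) / ASSUMED_HAN_SIZE;
    }

    public static void main(String[] args) {
        int next = 'A' + 1; // plain integer arithmetic on a character
        System.out.println((char) next);
        System.out.println("UTF-16 range:            " + utf16Range());
        System.out.println("31-bit range:            " + (1L << 31));
        System.out.println("Han-sized scripts (31b): " + hanSizedScriptsIn31Bits());
    }
}
```

With that assumed Han size the 31-bit figure comes out in the tens of thousands, so "several thousand" is, if anything, on the cautious side.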


Potential sources for new characters (in approximate order of
probability):

* More domain-specific characters. E.g. musical notation for
percussive instruments, symbols for obscure operators in math, physics,
etc.

/Plenty/ of space is already available for that.


* Integrating more popular, though "fictional" character sets,
into Unicode e.g. Klingon.

Ugh! Bloody sci-fi soap opera. (I /do/ like SF, I just don't like Star
Trek -- in any of its manifestations). IMO, adding that kind of thing (say
Tolkien's scripts) to Unicode would be a pathetic abuse of power.

And there's plenty of space anyway.


* Invention of a new language like Esperanto.

But would any sane new language use a writing system like Chinese ? And, if it
did, why would anyone want to take it seriously enough to add it to Unicode ?
Let's say I design a language which, by definition, uses /all/ the Unicode
glyphs, in pairs, to denote a fixed but large set of words. That /can't/ fit
into any possible Unicode-like scheme since it has been deliberately designed
to break any finite scheme. So why should the scheme be extended to support
it ?


* New discovery by archeologists of ancient writing systems.

Certainly possible, and I would even call it probable. But why should that
require more space than is already available ?


* Contact with an alien civilization which uses a different character set.

Since Unicode is designed around /human/ writing schemes, reflecting /human/
perceptual processes and /human/ cultural history(ies), I don't think it would
be legitimate (and almost certainly impossible) to use Unicode to represent
another species' communication systems. Far better to adopt /their/ version of
Unicode for representing their communications.

And anyway, I sort of doubt whether there are any alien civilisations -- the
universe is far too big.

-- chris

