Re: How to check variables for uniqueness ?



Chris Uppal wrote:
It depends on what you mean. String.length() returns, correctly, the number of
Java "char"s in the String. No problem there. What /is/ a problem is that
that is not the same as the number of characters in the Unicode text. That's a
problem caused by the mis-specification of Java's chars to be 16-bit
quantities. It is highly unfortunate, but there is very little that can be
done about it now. It means that correct programming is more difficult than it

When Unicode was first proposed, 16 bits were expected to be enough, and the designers of Java took that at face value. I'm not sure of the exact timing of Unicode's extension beyond 16 bits relative to Java's development. Even if Gosling et al. had known that Unicode would grow beyond 16 bits, it might still have been the right call to use 16 bits for Java characters. Even as it was, there was a fair bit of muttering about the space used by these wide characters.
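To make the difference Chris describes concrete, here is a minimal sketch. It assumes Java 5 or later, where String.codePointCount is available; the class and method names are made up for illustration. Note that a code point count is still not necessarily what a reader would call the number of characters (combining marks count separately), so treat it as an illustration of the char/code point gap only.

```java
public class CharCount {
    // Number of UTF-16 code units (Java "char"s) in s.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in s; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF lies outside the 16-bit range,
        // so it is written here as a surrogate pair.
        String clef = "\uD834\uDD1E";
        System.out.println(codeUnits(clef));  // prints 2
        System.out.println(codePoints(clef)); // prints 1
    }
}
```

For text that stays within the original 16-bit range the two counts agree, which is why the problem tends to go unnoticed.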

As for case mapping, it is worth noting that the Windows NTFS file system uses a special case mapping which doesn't correspond to that of any known locale. I wonder how much software exists which compares file names using regular string comparison.
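A rough sketch of why ordinary string comparison is a shaky stand-in for a file system's notion of "same name". This does not reproduce the NTFS mapping (that table is not something the Java library exposes); it only shows that Java's own case operations disagree with each other and with locale-specific rules. The class and method names are hypothetical.

```java
import java.util.Locale;

public class CaseMapping {
    // Locale-neutral upper-casing, including Unicode special casings.
    static String upperRoot(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    // Lower-casing under Turkish rules, where 'I' maps to dotless i (U+0131).
    static String lowerTurkish(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr"));
    }

    // The "regular" case-insensitive comparison: char by char, no locale.
    static boolean sameIgnoringCase(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        String street = "stra\u00DFe"; // German sharp s in the middle

        // Upper-casing expands sharp s to "SS", so the length changes.
        System.out.println(upperRoot(street));                    // prints STRASSE

        // equalsIgnoreCase never sees that expansion.
        System.out.println(sameIgnoringCase(street, "STRASSE"));  // prints false

        // The same input lower-cases differently under Turkish rules.
        System.out.println(lowerTurkish("TITLE").equals("title")); // prints false
    }
}
```

Any of these three can be the "wrong" answer for file names, depending on which mapping the file system actually applies.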

Mark Thornton