Re: How to check variables for uniqueness ?



Chris Uppal wrote:
It depends on what you mean. String.length() returns, correctly, the number of
Java "char"s in the String. No problem there. What /is/ a problem is that
that is not the same as the number of characters in the Unicode text. That's a
problem caused by the mis-specification of Java's chars to be 16-bit
quantities. It is highly unfortunate, but there is very little that can be
done about it now. It means that correct programming is more difficult than it
looks, and also more difficult than it /should/ be. There is nothing in the
problem space that makes this difficult (well, actually there is, but we'll
pretend there isn't for now[*]), it's not an /inherently/ complex problem, but
historical mistakes in Java's design mean that the API mostly works in terms of
UTF-16 encoding (sequences of 16-bit values) rather than in terms of real
Unicode characters.
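
To make the distinction concrete, here is a minimal sketch of the difference between counting Java chars and counting Unicode code points. The class and method names (CharCountDemo, codeUnits, codePoints) are made up for illustration; String.length() and String.codePointCount() are the standard library calls. Note that code points are still not always what a reader would call a "character", so treat this only as an illustration of the 16-bit issue.

```java
final class CharCountDemo {
    // Number of UTF-16 code units (Java "char"s); this is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // 'a' followed by U+1D538, which UTF-16 stores as the surrogate pair D835 DD38.
        String s = "a\uD835\uDD38";
        System.out.println(codeUnits(s));  // prints 3
        System.out.println(codePoints(s)); // prints 2
    }
}
```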

Returning to the original context of this discussion, I repeat my assertion that it also /should/ be simple to use strings as map keys, and to do so case-insensitively if you so desire.

AFAICT, in fact, toUpperCase() might be the way to do it, avoiding equalsIgnoreCase() and toLowerCase(). The weird German word with the untypable character becomes "BEISSEN", and presumably so does "beissen", so all variations on it collapse to one key value. I guess x.toUpperCase().equals(y.toUpperCase()) then has to be used as the "real" equalsIgnoreCase(). And to get a canonical lower-case form, use x.toUpperCase().toLowerCase(), which turns any spelling of that same word into "beissen". That becomes the "real" toLowerCase().
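
A rough sketch of that idea, for anyone who wants to try it. The names CaseKeys, canonicalKey and sameIgnoringCase are invented here, and passing Locale.ROOT is my own assumption (so the result does not depend on the machine's default locale); the plain no-argument toUpperCase()/toLowerCase() calls would be the literal version. It is not a substitute for full Unicode case folding, just the upper-then-lower trick.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class CaseKeys {
    // Canonical lower-case form: upper-case first (so the sharp s becomes "SS"), then lower-case.
    static String canonicalKey(String s) {
        return s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }

    // Case-insensitive comparison done via the upper-case forms.
    static boolean sameIgnoringCase(String x, String y) {
        return x.toUpperCase(Locale.ROOT).equals(y.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String sharp = "bei\u00DFen"; // the German word spelled with the sharp s (U+00DF)

        Map<String, Integer> counts = new HashMap<>();
        for (String word : new String[] { sharp, "BEISSEN", "Beissen" }) {
            counts.merge(canonicalKey(word), 1, Integer::sum);
        }

        System.out.println(counts);                             // prints {beissen=3}
        System.out.println(sameIgnoringCase(sharp, "beissen")); // prints true
        System.out.println(sharp.equalsIgnoreCase("BEISSEN"));  // prints false
    }
}
```

The last line shows why the built-in equalsIgnoreCase() is not enough here: it compares char by char, and the two spellings do not even have the same length.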

:P