Re: How to check variables for uniqueness ?
- From: "Chris Uppal" <chris.uppal@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Mon, 15 Jan 2007 18:48:30 -0000
John Ersatznom wrote:
> > That is how equalsIgnoreCase() works:
> > "beißen".equalsIgnoreCase("BEISSEN"): false
>
> Well, then, either Wong is completely nuts, or we're using different JDK
> versions (1.6 here),
You mean you've tried this and found that your version gives different
results? I find that hard to believe unless it's a side effect of attempting
to use non-ASCII characters in the input to javac. Try being explicit about
using the Unicode character (well, UTF-16 value).
public class Test
{
    public static void
    main(String[] args)
    {
        System.out.println("bei\u00DFen -> " +
            "bei\u00DFen".toUpperCase());
        System.out.println("BEISSEN".equalsIgnoreCase("bei\u00DFen"));
        System.out.println("BEISSEN".equals("bei\u00DFen".toUpperCase()));
        // or equivalently, but using octal string escapes
        System.out.println("bei\337en -> " + "bei\337en".toUpperCase());
        System.out.println("BEISSEN".equalsIgnoreCase("bei\337en"));
        System.out.println("BEISSEN".equals("bei\337en".toUpperCase()));
    }
}
(Tested on 1.4.2, 1.5.0, and 1.6.0)
> or (seems least likely) toUpperCase actually alters
> the spelling of some words(!) rather than just changing a-z to A-Z
> (likewise accented equivalents) while leaving the rest alone.
That sounds as if you /haven't/ actually tried it. (Nor read the documentation
for String.toUpperCase() which expounds on this subject).
String.toUpperCase() does /not/ change the spelling of words (how could it? It
doesn't know anything about words). What it does follow are the correct
(insofar as the Unicode spec is correct) rules for mapping lowercase to
uppercase. It produces the /same/ word with the /same/ spelling[*], but
(naturally) a different representation. In this case the number of visually
separable glyphs changes because the U+00DF character (LATIN SMALL LETTER SHARP
S) is a ligature of two logical characters, long s and short s (U+017F and
U+0073). There is no upper case ligature for that combination (compare fi and
FI in English typography), so the correct uppercase version of those (logical)
characters is the sequence SS. (At least that's the theory the Unicode people
seem to be operating on -- they know more about it than me so I'm willing to
believe them).
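A minimal sketch of the length change described above. The class and method names are made up for illustration, and passing Locale.ROOT (Java 6 or later) is an assumption added here so that the result does not depend on the default locale; the test program earlier in this post uses the no-argument toUpperCase().

```java
import java.util.Locale;

class SharpSUpperCase {
    // Hypothetical helper: uppercase with a fixed locale so the result
    // does not vary with the machine's default locale.
    static String upper(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String lower = "bei\u00DFen";   // six chars, one of them U+00DF
        String upper = upper(lower);    // U+00DF becomes the two chars "SS"

        // The uppercase form is one char longer than the lowercase form,
        // so the mapping cannot be one-to-one.
        System.out.println(lower.length() + " -> " + upper.length());
        System.out.println(upper);
    }
}
```

The point is only that the two strings differ in length, so any code that assumes toUpperCase() preserves length or character positions is on shaky ground.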
It is simply erroneous to expect String.toUpperCase() to map characters
one-to-one in the way that English case mapping works. It can't, it isn't
supposed to, and it doesn't...
String.equalsIgnoreCase(), on the other hand, is badly broken in that it does
/not/ follow those rules. Or, since its behaviour is clearly documented,
perhaps "broken" is too strong a term -- "badly misleading" might be preferred.
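To make the contrast concrete, here is a rough sketch of a comparison that goes through the full uppercase mapping instead of equalsIgnoreCase(). The class and the equalsFullCase() helper are hypothetical names, and the fixed Locale.ROOT is an assumption. Comparing uppercased strings is not a complete implementation of Unicode case folding; it is only enough to show the difference for this example.

```java
import java.util.Locale;

class FullCaseCompare {
    // Hypothetical helper: compare two strings after applying the full
    // uppercase mapping to both, so U+00DF and "SS" end up equal.
    static boolean equalsFullCase(String a, String b) {
        return a.toUpperCase(Locale.ROOT).equals(b.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String upper = "BEISSEN";
        String lower = "bei\u00DFen";

        // equalsIgnoreCase() compares char by char, and the two strings
        // have different lengths, so it reports false.
        System.out.println(upper.equalsIgnoreCase(lower));

        // Going through toUpperCase() on both sides reports true.
        System.out.println(equalsFullCase(upper, lower));
    }
}
```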
-- chris
[*] Arguably the concept "same spelling" is flawed in the context of Unicode
case mapping.