Re: How to check variables for uniqueness ?



Oliver Wong wrote:
I don't see "colour" (with a U) in there anywhere, Oliver.

You weren't intended to.

Then you're missing the point entirely.

Must be, because I was under the impression I was making a point to you, as opposed to the other way around. I thought you were curious as to how manually doing case-insensitive conversions could fail, as opposed to using the built-in equalsIgnoreCase().

Both will fail when you want words spelled differently to compare equal,
though Collator may have more smarts in that area.
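
For anyone who wants to try the comparison, here is a minimal sketch. The class and method names (CompareDemo, equalByLowerCase, equalByCollator) are made up for illustration and are not from Oliver's code, and the choice of PRIMARY strength and Locale.GERMAN is an assumption. What Collator reports for the "ß" pair depends on the JDK's collation rules, so no output is claimed for it here.

```java
import java.text.Collator;
import java.util.Locale;

final class CompareDemo {
    // Manual case folding: lowercase both sides with a fixed locale, then compare.
    static boolean equalByLowerCase(String a, String b) {
        return a.toLowerCase(Locale.ROOT).equals(b.toLowerCase(Locale.ROOT));
    }

    // Collator at PRIMARY strength ignores case differences;
    // it still treats an extra letter as a real difference.
    static boolean equalByCollator(String a, String b, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // \u00DF is the German sharp s, written as an escape to avoid source-encoding trouble.
        String[][] pairs = {
            {"COLOR", "color"},
            {"color", "colour"},
            {"beissen", "bei\u00DFen"}
        };
        for (String[] p : pairs) {
            System.out.println(p[0] + " / " + p[1]
                + ": equalsIgnoreCase=" + p[0].equalsIgnoreCase(p[1])
                + ", toLowerCase=" + equalByLowerCase(p[0], p[1])
                + ", Collator=" + equalByCollator(p[0], p[1], Locale.GERMAN));
        }
    }
}
```

Both String-based checks accept the case-only pair and reject the two spelling-variant pairs; the Collator column is the one worth looking at on your own JDK.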

"COLOR" and "colour" differ only by capitalization while "beissen" and "beißen" differ by spelling in a manner similar to "color" vs. "colour".

I disagree.

On what basis? The typo I made? It was meant to say "COLOR" and "color"
differ only by capitalization while "beissen" and "beißen" differ by
spelling in a manner similar to "color" vs. "colour".

In fact the analogy goes so far as for the number of letters in the
latter two examples to differ by one in both cases, and for a two letter
region in one to correspond to a single letter at the same place in the
other in particular. And (presumably -- I don't know the German word(s))
they are in both cases variant spellings of the same word --
differing in more than just capitalization, but used interchangeably or
as regional variants rather than having distinct meanings.

You should take the code I posted and put it in your favorite IDE, fix the compile errors (apparently, it's toLowerCase, not toLowercase), and run it.

It would have been nice if Sun had been consistent about their own
capitalization. There's also Character.isWhitespace (in the same class!
Note lowercase s) and System.arraycopy (note lowercase c), at minimum.
:P Maybe they need to implement an isCamelCase method (note second
capital C)... :)
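
A small sketch that compiles with the spellings as they actually are in the JDK. The wrapper class and method names (NamingDemo, lower, isBlankChar, copyOf) are hypothetical; only toLowerCase, Character.isWhitespace and System.arraycopy are the real API names.

```java
import java.util.Locale;

final class NamingDemo {
    // String.toLowerCase: capital C.
    static String lower(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Character.isWhitespace: lowercase s.
    static boolean isBlankChar(char c) {
        return Character.isWhitespace(c);
    }

    // System.arraycopy: lowercase c.
    static int[] copyOf(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    public static void main(String[] args) {
        System.out.println(lower("COLOR"));
        System.out.println(isBlankChar(' '));
        System.out.println(copyOf(new int[] {1, 2, 3}).length);
    }
}
```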

In any event, I suppose the real lesson here is that String (and
friends) get you primitive ordering and comparisons, perhaps somewhat
Anglocentric, and you need to use Collator and relatives for serious
language-and-locale-sensitive comparisons. I don't know the extent to
which even the latter will cope with variant spellings, mind you. There
is also a where-do-you-draw-the-line issue -- from case to slight
variations in the actual sequence of letters used on to more overt
differences, as between "huge" and "giant" -- when should those be
considered synonyms, and when different? -- and on until, if you broaden
your requirements enough, solving the NLP problem seems to be a required
component of any conforming implementation. :) Language has a fuzziness
in it in actual human usage that computers have trouble with. It's
curiously not unlike the problems that arose elsewhere here today with
float and double comparisons. You can't rely usefully on == for the most
part, and using Math.abs(x - y) < someThreshold gives an "equality" test
that's more meaningful in some ways but is no longer transitive.
Eventually linguistic equality loses transitivity too -- you can play
all kinds of games of picking close synonyms of the previous word to
grow a chain that can end in a fairly good approximation to an antonym
for your starting word, in most any language, using either phonemic
proximity or lexical proximity, and get different results with each besides.
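
The loss of transitivity in the threshold test is easy to see with three numbers. This is only a sketch: the class name, method name, values and threshold are all picked for illustration.

```java
final class ApproxEquals {
    // Tolerance-based "equality": true when x and y are closer than threshold.
    static boolean approxEquals(double x, double y, double threshold) {
        return Math.abs(x - y) < threshold;
    }

    public static void main(String[] args) {
        double a = 0.0, b = 0.75, c = 1.5, threshold = 1.0;
        System.out.println(approxEquals(a, b, threshold)); // true:  a "equals" b
        System.out.println(approxEquals(b, c, threshold)); // true:  b "equals" c
        System.out.println(approxEquals(a, c, threshold)); // false: yet a does not "equal" c
    }
}
```

Each neighbouring pair is within the threshold, but the two ends of the chain are not, which is the same effect as the chain of near-synonyms drifting toward an antonym.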

The real upshot is simply "computers, at present, don't have the ability
to really model things in linguistics". But they know about abstract
sequences of discrete, wholly-distinct characters that happen to stand
for graphical squiggles meaningful to humans.

Play to their strengths -- the computers' *and* the humans'. :)