Re: collection framework: using the good interface



I need to be able to retrieve an object very quickly according to some significant properties: its orthographic form and its part of speech, for instance. A hash seems suited, but I cannot use a hash where the key is the orthographic form alone, for instance:
{
"plane" -> ref_to_object_holding_this_word
}
because several properties are significant for word identity: there is a "plane" noun and a "plane" verb.

If you want to retrieve by several different properties, and do it efficiently for each of them, you need several different index structures.
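As a rough illustration of the "several index structures" idea, here is a minimal sketch that keeps one hash index per property. The `Word` and `Lexicon` classes and their fields are assumptions made up for this example; they are not taken from the original code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Word class, for illustration only.
final class Word {
    final String form; // orthographic form, e.g. "plane"
    final String pos;  // part of speech, e.g. "noun"

    Word(String form, String pos) {
        this.form = form;
        this.pos = pos;
    }
}

// Hypothetical container with one index per property, so that each
// kind of lookup is a hash access instead of a scan over all words.
final class Lexicon {
    private final Map<String, List<Word>> byForm = new HashMap<>();
    private final Map<String, List<Word>> byPos = new HashMap<>();

    void add(Word w) {
        // Every word is registered in both indexes.
        byForm.computeIfAbsent(w.form, k -> new ArrayList<>()).add(w);
        byPos.computeIfAbsent(w.pos, k -> new ArrayList<>()).add(w);
    }

    List<Word> withForm(String form) {
        return byForm.getOrDefault(form, Collections.<Word>emptyList());
    }

    List<Word> withPos(String pos) {
        return byPos.getOrDefault(pos, Collections.<Word>emptyList());
    }
}
```

The cost is that every insertion (and removal, if there is one) has to update all the indexes.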

In fact I would like to retrieve objects using several properties always together: for instance "plane"+verb, not "plane" alone or "verb" alone. Two objects may be defined as "equal" if they share these properties.
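When the properties are always used together, one possible approach is a small key object that combines them, used as the key of a single HashMap. The sketch below assumes hypothetical `WordKey`, `Word` and `KeyedLexicon` classes; only the key defines equals() and hashCode(), so the Word class itself is left alone.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key type: holds only the identifying properties.
final class WordKey {
    final String form;
    final String pos;

    WordKey(String form, String pos) {
        this.form = form;
        this.pos = pos;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof WordKey)) return false;
        WordKey other = (WordKey) o;
        return form.equals(other.form) && pos.equals(other.pos);
    }

    @Override
    public int hashCode() {
        return Objects.hash(form, pos);
    }
}

// Hypothetical Word class: identity properties plus other data.
final class Word {
    final WordKey key;
    final int occurrences;

    Word(String form, String pos, int occurrences) {
        this.key = new WordKey(form, pos);
        this.occurrences = occurrences;
    }
}

// Hypothetical lexicon indexed by the combined (form, POS) key.
final class KeyedLexicon {
    private final Map<WordKey, Word> words = new HashMap<>();

    void add(Word w) {
        words.put(w.key, w);
    }

    // Returns null when no word has this form and part of speech.
    Word get(String form, String pos) {
        return words.get(new WordKey(form, pos));
    }
}
```

With this layout, "plane"+noun and "plane"+verb are two distinct entries.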


If you don't care about efficiency for some or all of the properties, you can just iterate over a Set of all the objects asking each of them "are you a noun?".

Efficiency is indeed an issue. I have two lexicons (two "collections" (Set or Map?) of Word objects); each Word is defined by lexicographic properties (form + POS) and holds other properties (the number of occurrences in a sub-corpus, for instance). For each object in one collection, I need to look up the "same" object in the other collection ("same" = same form and same POS), compare the two frequencies and compute a probability distribution function. Iterating over all the objects of the second collection for each object of the first, with an equality check that does not use the hash code, would take far too long, I suppose. That's why implementing my collections as Maps would allow something like:


Word wordInCollection1 = collection1.get(wordInCollection2);

if I compute hashCode() and equals() using the form and POS properties. But this implies a Map in which the key and the value hold the same reference, which looks strange.
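To make the idea concrete, here is a minimal sketch of such a Map, in which each Word is both key and value. It assumes a hypothetical `Word` class whose equals() and hashCode() use only the form and POS, and a hypothetical `LexiconComparison` helper. The difference in occurrences is only a placeholder for the real computation. A Map is used rather than a Set because `java.util.Set` has no get() method that returns the stored element.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical Word class: equality is based on form and POS only.
final class Word {
    final String form;
    final String pos;
    final int occurrences;

    Word(String form, String pos, int occurrences) {
        this.form = form;
        this.pos = pos;
        this.occurrences = occurrences;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Word)) return false;
        Word other = (Word) o;
        return form.equals(other.form) && pos.equals(other.pos);
    }

    @Override
    public int hashCode() {
        return Objects.hash(form, pos);
    }
}

final class LexiconComparison {
    // Builds a Map in which each Word is both the key and the value.
    static Map<Word, Word> index(Iterable<Word> words) {
        Map<Word, Word> map = new HashMap<>();
        for (Word w : words) {
            map.put(w, w);
        }
        return map;
    }

    // For each word of the first lexicon, looks up its counterpart in the
    // second one by hash access. The difference in occurrences stands in
    // for whatever statistic is really needed.
    static Map<Word, Integer> occurrenceDifferences(Map<Word, Word> first,
                                                    Map<Word, Word> second) {
        Map<Word, Integer> diffs = new HashMap<>();
        for (Word w1 : first.values()) {
            Word w2 = second.get(w1);
            if (w2 != null) {
                diffs.put(w1, w1.occurrences - w2.occurrences);
            }
        }
        return diffs;
    }
}
```

Each lookup is then a single hash access on average, instead of a scan over the whole second collection.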

I would not override hashCode and equals for this, because there isn't a single, fixed definition of equality between two of your word-representing objects other than identity.

I was thinking that "lexicographic equality" could be an equality between the two objects. Or should the definition of equality between two objects always involve all of the properties of the objects? In that case, is it good practice to create a "customised equality method" (say: lexicographicalyEquals() and lexicographicHashcode()), and implement a Map (LexicographicMap) using these methods instead of equals()?
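One way to get a customised notion of equality without writing a new Map class might be a TreeMap built with a Comparator: a TreeMap decides whether two keys are the same using the comparator, not equals(). This is a different technique from the hash-based LexicographicMap described above, and lookups cost O(log n) rather than a hash access. The `Word` and `LexicographicMaps` names below are assumptions made up for this sketch.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical Word class: equals() and hashCode() are NOT overridden,
// so two distinct Word objects are never equal to each other.
final class Word {
    private final String form;
    private final String pos;
    final int occurrences;

    Word(String form, String pos, int occurrences) {
        this.form = form;
        this.pos = pos;
        this.occurrences = occurrences;
    }

    String form() { return form; }
    String pos() { return pos; }
}

final class LexicographicMaps {
    // Orders words by form, then by part of speech. Two words that
    // compare as 0 are treated as the same key by a TreeMap.
    static final Comparator<Word> LEXICOGRAPHIC =
            Comparator.comparing(Word::form).thenComparing(Word::pos);

    // Creates an empty map whose key identity is "same form and same POS".
    static Map<Word, Word> newMap() {
        return new TreeMap<>(LEXICOGRAPHIC);
    }
}
```

Such a map is "inconsistent with equals" in the sense of the TreeMap documentation, which is acceptable as long as this is known to everyone who uses it.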


Patricia

Many thanks for your insights!

Sylvain