Re: collection framework: using the good interface
- From: Patricia Shanahan <pats@xxxxxxx>
- Date: Tue, 19 Jul 2005 12:49:02 GMT
SL wrote:
I need to be able to retreive very quickly an objet according to some signifiant properties: its orthographic form and its part of speech for instance. A hash seems suited, but I cannot use a hash where the key is the orthographic form, for instance:
{
"plane" -> ref_to_object_holding_this_word
}
because several properties are signifiant for word identity: there is a plane "noun" and a plane "verbe".
If you want to retrieve by several different properties, and do it efficiently for each of them, you need several different index structures.
In fact I would like to retreive objects using several properties always together: for instance "plane"+verb, not "plane" alone or "verb" alone. Two objects may be defined as "equal" if they share this properties.
If you don't care about efficiency for some or all of the properties, you can just iterate over a Set of all the objects asking each of them "are you a noun?".
Efficiency is an issue indeed: the situation is that I have two lexicons (two "collections" (Set or Map?) of Word objects); each Word is defined by lexicographic properties (form + POS) and hold other properties (number of occurrences in a sub-corpus for instance); for each object in one collection, I need to look at the "same" object in the other collection ("same" = same forme and same POS), compare the two frequency and compute a probability distribution function. An iteration over all the objects of the second collection for all the objects of the first collection + a check for equality without using hashcode are far too long I suppose.
I wouldn't suppose, I would start with the simplest implementation and check whether it is fast enough.
If necessary, consider creating a fixed order among your word objects. It can be fairly arbitrary, such as String sort order as primary key, followed by part of speech for two words with the same spelling. The word object's class would implement Comparable, and should have equals and hashCode consistent with compareTo, but I would not make two words equal unless they are, in all respects, the same word.
Comparing two ordered lists, with the same sort key, to find the intersection is linear time.
If you can't make this work reasonably efficiently with a fairly simple approach, I would consider putting the words in a database with a column for each property. Database managers are designed to do things like selects and joins on multiple attributes very efficiently.
That's why implementing my collections as Map would allow something like:
Word wordInCollection1 = collection1.get(wordInCollection2);
If I compute hashcode and equality using the form and POS properties. But this imply Map where both key and value hold the same pointer, which looks strange.
I would not override hashCode and equals for this, because there isn't a single, fixed definition of equality between two of your word-representing objects other than identity.
I was thinking that "lexicographic equality" could be an equality between the two objects, or should the definition of equality between two objects always involve all of the properties of the objects? In that case, is it a good practice to create a "customised equality method" (say: lexicographicalyEquals() and lexicographicHashcode()), and implement a Map (LexicographicMap) using this methods insteed of equals() ?)
In programming, honesty is usually the best policy. If you override equals, you are saying "These two distinct objects have the same value". If you use equals to mean something else, it will probably give you trouble later.
I would deal with your properties at the word object level by having set and get methods for each property, such as spelling. The set method is not needed if you always set all properties in the constructor. Things like part of speech seem to me to be attributes of a word.
For generalization of search code, it may also be useful to have a generalized get method, where a parameter indicates which property you are asking about.
Patricia
Many thanks for your insights!
Sylvain
.
- Follow-Ups:
- References:
- collection framework: using the good interface
- From: SL
- Re: collection framework: using the good interface
- From: Patricia Shanahan
- Re: collection framework: using the good interface
- From: SL
- collection framework: using the good interface
- Prev by Date: Re: collection framework: using the good interface
- Next by Date: Re: collection framework: using the good interface
- Previous by thread: Re: collection framework: using the good interface
- Next by thread: Re: collection framework: using the good interface
- Index(es):
Relevant Pages
|