Re: hashCode() for Custom classes
- From: Logan Shaw <lshaw-usenet@xxxxxxxxxxxxx>
- Date: Sat, 19 Apr 2008 17:31:05 -0500
Mark Space wrote:
Logan Shaw wrote:
One thing nobody else has mentioned so far (at least I think not -- this
thread is long and I may have missed it) is that when combining, sometimes
it may be best to leave out the hash code of some of the members.
This could be particularly true if, say, instances of D tend to
contain a really long string.
I'm pretty sure this is false. Early versions of Java's hashcode() for Strings didn't use the whole string because of performance concerns. It turns out that this leads to lots of collisions and poor hashing when the parts you chose to leave out are the parts that actually differ.
So the learning was to eat the performance and hash the whole object.
Well yeah, it's very true that if you don't have any knowledge about
what parts may be expected to remain the same across a lot of objects
(or what parts might together be expected to be different), then you
shouldn't leave anything out. But if you do have that knowledge, it
could be worth it. Of course, you should be judicious about it.
One example is this: let's suppose I'm writing a TCP server and I
get a lot of connections from various clients. For a connection
object, I might have username, password, source IP address, and
source port number as fields. In this case, it might be totally
sufficient to consider only the IP address and port number when
implementing hashCode() for the connection object. They should be
pretty close to unique by themselves[1].
You can almost think of this in terms of database design and
normalization if you want to: could some subset of your fields
together be reasonably used as a primary key? Then those fields
by themselves *could* be sufficient for hashCode().
Another example: suppose I have a PostalAddress object whose
members are name, streetAddress, city, state, and zipCode. If
I have name, streetAddress, and zipCode (and if they're mandatory
so that I always have values for them), there is probably not any
benefit from including city and state in hashCode(), because what
are the chances that for any two objects, name, streetAddress, and
zipCode will be the same but city or state will be different?
Of course you can make the argument that I'm assuming my objects
model real-world things and relying on those real-world semantics;
I could have in some context, you might object, a whole bunch of
PostalAddress objects that are intended to model bogus addresses,
and then I might have bad performance. Maybe I'm making a class
called BadAddressCollector to contain and report on all the unique
bad addresses in a database. But I would argue that in such a
case, the number of collisions is still probably not that high,
and when it comes to performance, you should place emphasis on
the common case.
On the other hand, there's nothing wrong with being conservative
and avoiding assumptions even if they are only very slightly
questionable. But the subject is how best to implement hashCode(),
and it seems like this type of implementation could be appropriate,
sometimes.
- Logan
[1] Yes, it's true that the only thing that absolutely has to be
unique is the combination of source IP, source port, dest IP,
and dest port (and time), but in practice AFAIK TCP
implementations tend to just pick a new ephemeral port number
for every connection they initiate. Even if they don't, then
I have to be listening on more than one port or IP address --
and the TCP client has to have more than one connection
*to me* -- for there to be a collision.
.
- Follow-Ups:
- Re: hashCode() for Custom classes
- From: Mark Space
- Re: hashCode() for Custom classes
- From: Lew
- Re: hashCode() for Custom classes
- References:
- hashCode() for Custom classes
- From: sasuke
- Re: hashCode() for Custom classes
- From: Logan Shaw
- Re: hashCode() for Custom classes
- From: Mark Space
- hashCode() for Custom classes
- Prev by Date: Re: enumerate the consumers of foo.toString() within an application
- Next by Date: Re: Memory leak
- Previous by thread: Re: hashCode() for Custom classes
- Next by thread: Re: hashCode() for Custom classes
- Index(es):
Relevant Pages
|