Re: hashCode() for Custom classes



Mark Space wrote:
Logan Shaw wrote:

One thing nobody else has mentioned so far (at least I think not -- this
thread is long and I may have missed it) is that when combining, sometimes
it may be best to leave out the hash code of some of the members.

This could be particularly true if, say, instances of D tend to
contain a really long string.

I'm pretty sure this is false. Early versions of Java's hashcode() for Strings didn't use the whole string because of performance concerns. It turns out that this leads to lots of collisions and poor hashing when the parts you chose to leave out are the parts that actually differ.

So the learning was to eat the performance and hash the whole object.

Well yeah, it's very true that if you don't have any knowledge about
what parts may be expected to remain the same across a lot of objects
(or what parts might together be expected to be different), then you
shouldn't leave anything out. But if you do have that knowledge, it
could be worth it. Of course, you should be judicious about it.

One example is this: let's suppose I'm writing a TCP server and I
get a lot of connections from various clients. For a connection
object, I might have username, password, source IP address, and
source port number as fields. In this case, it might be totally
sufficient to consider only the IP address and port number when
implementing hashCode() for the connection object. They should be
pretty close to unique by themselves[1].

You can almost think of this in terms of database design and
normalization if you want to: could some subset of your fields
together be reasonably used as a primary key? Then those fields
by themselves *could* be sufficient for hashCode().

Another example: suppose I have a PostalAddress object whose
members are name, streetAddress, city, state, and zipCode. If
I have name, streetAddress, and zipCode (and if they're mandatory
so that I always have values for them), there is probably not any
benefit from including city and state in hashCode(), because what
are the chances that for any two objects, name, streetAddress, and
zipCode will be the same but city or state will be different?

Of course you can make the argument that I'm assuming my objects
model real-world things and relying on those real-world semantics;
I could have in some context, you might object, a whole bunch of
PostalAddress objects that are intended to model bogus addresses,
and then I might have bad performance. Maybe I'm making a class
called BadAddressCollector to contain and report on all the unique
bad addresses in a database. But I would argue that in such a
case, the number of collisions is still probably not that high,
and when it comes to performance, you should place emphasis on
the common case.

On the other hand, there's nothing wrong with being conservative
and avoiding assumptions even if they are only very slightly
questionable. But the subject is how best to implement hashCode(),
and it seems like this type of implementation could be appropriate,
sometimes.

- Logan

[1] Yes, it's true that the only thing that absolutely has to be
unique is the combination of source IP, source port, dest IP,
and dest port (and time), but in practice AFAIK TCP
implementations tend to just pick a new ephemeral port number
for every connection they initiate. Even if they don't, then
I have to be listening on more than one port or IP address --
and the TCP client has to have more than one connection
*to me* -- for there to be a collision.
.



Relevant Pages

  • Re: Looking for Firewall Port Info.
    ... >make the connection firewall friendly, and use the least amount of ports ... >the connection from, will binding and initiating a connection to a remote IP ... >source port is dynamic and not predictable? ... In most cases, the source port doesn't matter, only the destination ...
    (comp.security.firewalls)
  • RE: Is it possible to derease gradually the number of Client port (add up time table) ?
    ... client were to use the same source port every time, ... this - lots of outbound connections from the same source port. ... With a TCP connection, remote address, remote port and local port ... Security Linux, the comprehensive security solution that combines six ...
    (Incidents)
  • Re: more than 65535 outbound connections
    ... what must be unique is the four-parts of a connection ... source IP, source port, destination IP, destination port ... re-use the same source port for multiple connections, ...
    (Linux-Kernel)
  • Re: What is the NUTS Daemon?
    ... >> destination port, unless the source is under 1024. ... >> request being made to the HTTP port on your machine from a port I would ... If the source port was fixed then you'd be restricted to 1 connection as ... that connection would stop any other connections using the same source port. ...
    (comp.security.firewalls)
  • Re: Correction
    ... Normally to physically disconnect is just a matter of reaching for the ... >> I have an ADSL connection which polls my computer from time to time, ... > disallow each and every port with Windows Firewall? ...
    (microsoft.public.windowsxp.messenger)