Re: Best way to hash a set of integers?



Digital Puer wrote:
Ben Pfaff wrote:
"Digital Puer" <digital_puer@xxxxxxxxxxx> writes:

However, I'm still looking for info on how to best hash a *set*
of integers;

Can you
tell us anything about the sets?

I guess I'm looking for a way to compute what could be
called a multi-dimensional key. Suppose I have three integers
representing age, height (in cm), and weight (in lbs). I want
to somehow combine those together into a single key that will be
my index into a hash table (prior to doing mod hash_table_size).

That's not a set! You could have a person who is 95 years old
and 95 lbs. So, "95" can occur twice, so it's not a set. Another
reason it's not a set is that the order matters.

So, it's actually a list of integers. Or maybe a record of
integers would be a better way to describe it.

Just how to hash these depends on several things. How big will
your hash table be? What is the distribution of your numbers?
Is it a random or is there some pattern to it? In most of these
cases, there will be a pattern. Weight and height will be pretty
close to a normal distribution. Age will be different.

You do have one good thing going for you with these kinds of
values: most of the low-order bits will be very evenly
distributed. It's extremely unlikely that even ages will
greatly outnumber odd ages, or that even weights will greatly
outnumber odd ones, etc. So that means that at least the least
significant bit in each is close to evenly distributed.

With weight, the 5 least significant bits are likely to be
pretty evenly distributed, maybe even 5 or 6 bits. With height,
most people will be between 160 cm to 180 cm and evenly spread
across that interval, so you probably only have something like
the 4 least significant bits that are evenly distributed. With
age, things are a little trickier, but you should be able to
use the 3 or 4 least significant bits.

So, one possible hash function is this:

int hash (int ageyears, int heightcm, int weightlbs)
{
int h;

h = ageyears & (1 << 3 - 1);

h = (h << 4) | (heightcm & (1 << 4 - 1));

h = (h << 5) | (weightlbs & (1 << 5 - 1));

return h;
}

That gives you a 12-bit hash code that should be pretty evenly
distributed across the 4096 possible hash values. If you need
something better, you'll have to be more clever.

- Logan
.



Relevant Pages

  • Re: Cluster analysis for beginners
    ... weight from 1000 Daltons to 100000 Daltons. ... If you now find that a certain motif (e.g. a specific phosphorylation ... "Compared to a uniform distribution of proteins" or "Compared to a unimodal distribution of proteins with a peak of XXX Daltons..." ...
    (sci.stat.math)
  • Sampling Threshold for Distributions
    ... chart and those that are not? ... 100 tree branches, I have measured their length and their weight, both ... variability in my population then there is a decent chance that these ... if I include them in my distribution chart w/o including ...
    (sci.stat.math)
  • Re: ObSki: another run with flatboarding
    ... weight distribution. ... No ski control is needed if you have the balance. ... That's "part of weight ... you need to learn the language of skiing. ...
    (rec.skiing.alpine)
  • Re: Organic milk / hormone free milk
    ... minor difference in milk levels can be tweaked accordingly. ... weight is over the 95%ile no matter how old he is exactly. ... checks starting at 2 years of age but many are not yet doing so. ... when their plotted charts reflect the consistency and all milestones ...
    (misc.kids)
  • calling methods in a string
    ... //Prompt user to enter age ... String ageString = JOptionPane.showInputDialog(null, "Enter Age:", ... int age = Integer.parseInt; ... //Prompt user to enter weight ...
    (comp.lang.java.help)