Re: Suggestions for double-hashing scheme
- From: Tim Rentsch <txr@xxxxxxxxxxxxxxxxxxx>
- Date: 08 Jun 2005 03:07:03 -0700
CBFalconer <cbfalconer@xxxxxxxxx> writes:
> Tim Rentsch wrote:
> ... snip ...
> > Incidentally, the example table above has full 64-bit values for
> > each entry, but a large table need not take 64 bits for each prime
> > in the table. Storing the entries as deltas off of a computable
> > "base" value for each entry reduces the amount of storage needed,
> > and means the table source code can be written without having to
> > assume the existence of 64 bit data types. All the deltas for the
> > more complete table mentioned (with 16 base values per power of
> > two) are less than 256; the entire table of approximately 16 * 64
> > * 2 primes can be stored in about 2K bytes.
> Which is exactly what is done in hashlib.
Yes, I saw that hashlib did this; my apologies, I should have
given an attribution.
> Making all the deltas
> negative from a power of two is helpful, because malloc systems
> tend to allocate space + some overhead, and that is most efficient
> if it all fits into a power of two.
I think it's more likely that the interaction with malloc() is
minimal, since other calls to malloc() will make the memory not be
page aligned anyway. On the other hand, choosing a power of two plus
some small delta means that choosing a rehash value can be done by
rehash = (hash & largest_ones_mask_less_than_table_size) + 1;
which can be done very quickly and provides the vast majority of the
set of rehash values available.