Re: Suggestions for double-hashing scheme



CBFalconer <cbfalconer@xxxxxxxxx> writes:

> Tim Rentsch wrote:
> >
> ... snip ...
> >
> > Incidentally, the example table above has full 64-bit values for
> > each entry, but a large table need not take 64 bits for each prime
> > in the table. Storing the entries as deltas off of a computable
> > "base" value for each entry reduces the amount of storage needed,
> > and means the table source code can be written without having to
> > assume the existence of 64 bit data types. All the deltas for the
> > more complete table mentioned (with 16 base values per power of
> > two) are less than 256; the entire table of approximately 16 * 64
> > * 2 primes can be stored in about 2K bytes.
>
> Which is exactly what is done in hashlib.

Yes, I saw that hashlib did this; my apologies, I should have
given an attribution.


> Making all the deltas
> negative from a power of two is helpful, because malloc systems
> tend to allocate space + some overhead, and that is most efficient
> if it all fits into a power of two.

I think it's more likely that the interaction with malloc() is
minimal, since other calls to malloc() will make the memory not be
page aligned anyway. On the other hand, choosing a power of two plus
some small delta means that choosing a rehash value can be done by

rehash = (hash & largest_ones_mask_less_than_table_size) + 1;

which can be done very quickly and provides the vast majority of the
set of rehash values available.
.



Relevant Pages

  • Re: Suggestions for double-hashing scheme
    ... > each entry, but a large table need not take 64 bits for each prime ... Storing the entries as deltas off of a computable ... negative from a power of two is helpful, because malloc systems ...
    (comp.programming)
  • SWL 1.01 released
    ... I've released a new version of the SWL libraries for Power C. ... adds a more efficient replacement for Power C's malloc() and free ...
    (comp.sys.cbm)