Re: Suggestions for double-hashing scheme
- From: CBFalconer <cbfalconer@xxxxxxxxx>
- Date: Wed, 25 May 2005 13:15:45 GMT
websnarf@xxxxxxxxx wrote:
>
.... snip ...
>
> Another important note is about the size of the hash table itself.
> A common mistake in the literature and elsewhere is to pick the
> table size as a prime number. They do this in order to relax the
> constraints on h1() in order to guarantee that it traverses the
> table with a mere != 0 criteria. Hopefully my explanation above
> shows how naive and pedestrian such reasoning is. Worse yet is
> that the -> index = hashfunction(entry) % tablesize; line all of
> a sudden has a new bottleneck, namely the "%" function. As you
> well know, "%" (or equivalently division) by a non-constant
> number is basically one of the slowest operations you can do on
> any CPU outside of transcendentals or IO.
>
> The more sensible approach, of course, is to pick a table size
> that is a power of 2. Then you have index = hashfunction(entry)
> & (tablesize-1); which removes the "%" bottleneck ("&", as you
> know, basically costs nothing.) The earlier analysis of h1()
> above makes it clear that traversing the table will be a non-
> issue, if you simply choose them from a small table of primes.
Your argument against the use of primes for table size revolves
purely around the expense of a modulo operation, as I read it. The
substitutes require some sort of table search, possibly the small
primes table. Even if that is limited to the first 20 primes, say,
I maintain that that effort is going to be at least comparable, if
not greater, than the modulo calculation. Nothing except profiling
on an actual installation can answer that.
A further point is whether or not such modulo effort is significant
in the overall probe operation, which can only be answered after
the significance of a probe effort in the overall application is
evaluated.
All of these considerations probably pale in comparison with the
effect of cache hits on modern machinery. Again, profiling is the
only possible answer, and not very likely to give a definative
result. In practice the effectiveness of the cache will be highly
affected by the job mix on a particular machine.
Meanwhile the use of a prime table size has the advantage of
simplicity and robustness.
--
Some useful references about C:
<http://www.ungerhu.com/jxh/clc.welcome.txt>
<http://www.eskimo.com/~scs/C-faq/top.html>
<http://benpfaff.org/writings/clc/off-topic.html>
<http://anubis.dkuug.dk/jtc1/sc22/wg14/www/docs/n869/> (C99)
<http://www.dinkumware.com/refxc.html> (C-library}
<http://gcc.gnu.org/onlinedocs/> (GNU docs)
.
- Follow-Ups:
- Re: Suggestions for double-hashing scheme
- From: websnarf
- Re: Suggestions for double-hashing scheme
- References:
- Suggestions for double-hashing scheme
- From: Clint Olsen
- Re: Suggestions for double-hashing scheme
- From: websnarf
- Suggestions for double-hashing scheme
- Prev by Date: Re: How to implement the arithmetic operations(addition, substraction, multiplication, division) in the fail safe ssystem
- Next by Date: Re: FNV(Fowler/Novell/Vo) hashing algorithm
- Previous by thread: Re: Suggestions for double-hashing scheme
- Next by thread: Re: Suggestions for double-hashing scheme
- Index(es):
Relevant Pages
|
|