Re: maintenance of hash tables.
- From: "cr88192" <cr88192@xxxxxxxxxxx>
- Date: Fri, 3 Jul 2009 08:18:41 -0700
"Pascal J. Bourguignon" <pjb@xxxxxxxxxxxxxxxxx> wrote in message
news:87ws6qmcw1.fsf@xxxxxxxxxxxxxxxx
"cr88192" <cr88192@xxxxxxxxxxx> writes:
"Andrew Tomazos" <andrew@xxxxxxxxxxx> wrote in message
news:e2a2fff7-b993-4ddc-a955-1336961b10ed@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
On Jun 30, 5:10 pm, "cr88192" <cr88...@xxxxxxxxxxx> wrote:
in practice, although divides or modulo operators are a bit more
expensive
than masks, typically they are not a significant portion of the overall
performance, so the cost is often reasonable.
<
If we use x86 as an example than a binary and (AND) is approximately
50-100 times faster than a modulus (IDIV). [*]
If the keys are easily hashed (such as numeric types) than a modulus
will be a significant cost associated with the hash function.
-Andrew.
[*] See Intel 64 and IA-32 Architectures Optimization Reference
Manual, Appendix C
ok...
well, this operation is weighted against:
hashing the string (if a string is used);
various other costs (checking if the spot is free, calculating the next
permutation, ...);
...
so, mod is slow, but in many cases, this can be glossed over some (which
is
why I said 'typically'...).
in some cases, there are "faster" ways to approximate division and
modulo,
but I am not sure of any that are particularly appropriate in this case.
I
may have to check compiler output, and see if it finds some way to
optimize
away the idiv...
but, yes, if speed is critical (and/or one is using a highly optimized
hash
table), then a power-of-2 size is better...
In lisp, the default for hash-table is to use the identity of the key
[:test (function eql)]. In this situation, there's no lengthy hash to
compute, just the modulo on the address (or object in case of fixnums
and characters). The way they're specified, it is intended that CL
hash-table do not use collision lists, but have always more slots in a
vector than entries in the hash, so collisions are handled by storing
the colliding entries in adjacent slots. This means that in the worse
case, there's a small number of additionnal EQL comparisons made. As
soon as the hash-table becomes too filled, the vector is extended to
render it sparse again.
<snip>
I am not sure how this is directly related, or what exactly is trying to be
argued here...
but, I guess it can be noted that LISP is its own set of issues...
but, I guess the core issue is that, yes, mod (and div) are a bit slower
than shifts or masks, but in many cases, it is not noticable (the weight of
everything else often compensates enough that there is not as much worry
over the number of clock cycles taken for particular instructions...).
of course, a mod is often slower than a conditional, which says something...
however, given a typical hash operation may involve several conditionals in
a row (is spot empty? is spot the wanted value? is the loop finished? ...),
the mod is no longer the big killer (especially if the value check is a
'strcmp'...).
but, alas, if we are looking up based on a pointer-address or integer (no
strcmp, ...), this mod may be significant, and it may well be worthwhile to
use a power-of 2 table...
of course, as a matter of practice I tend to keep all fixed-size tables as
powers of 2, as after all, it is not useful to waste time unnecessarily...
or such...
.
- References:
- Re: maintenance of hash tables.
- From: Andrew Tomazos
- Re: maintenance of hash tables.
- From: cr88192
- Re: maintenance of hash tables.
- From: Pascal J. Bourguignon
- Re: maintenance of hash tables.
- Prev by Date: Hotel Salcia Danube Delta
- Next by Date: Re: maintenance of hash tables.
- Previous by thread: Re: maintenance of hash tables.
- Next by thread: Re: maintenance of hash tables.
- Index(es):
Relevant Pages
|