Re: is it true that hash-tables increase their capacity to the next prime number?
- From: Kaz Kylheku <kkylheku@xxxxxxxxx>
- Date: Wed, 30 Jan 2008 16:06:53 -0800 (PST)
On Jan 30, 3:32 pm, Slobodan Blazeski <slobodan.blaze...@xxxxxxxxx>
wrote:
Thanks to everybody for their answers I will keep both versions, and
after I finish my implementation will bencmark with real data and
implementations.
One more question is it true that hash-tables increase their capacity
to the next prime number?
No. Some algorithms use power-of-two sizes.
The prime number size is associated with the open addressing technique
of handling collisions. With open addressing, the table does not
contain pointers to chains of entries. It contains the entries
themselves. When two or more keys hash to the same table entry, the
conflict is not resolved by starting a linked list, but by allocating
another entry at some displaced location within the table. The probing
search continues around the table until a free entry is found, or
nothing is found. Of course, the same search is used when finding an
existing key. When keys are deleted, they must be replaced by special
``tombstone'' markers; they can't be marked as empty entries, because
that would falsely terminate searches for valid keys that require
further probing.
If the table has a prime number P for its size, you can use an
increment other than 1 to search for empty slots. All integers in the
range 2..P-1 are relatively prime to P, since it is prime. This means
that by searching around the table by any of these increments (modulo
P) will guarantee that you will eventually visit all entries in the
table. For instance suppose the table has size 7 and you use an
increment of 3. You will hit all of the residues modulo 7: 0, 3, 6,
1, 4, 7, 2, 5. Suppose the table had size 8 and you used increment 3.
That would still work since 8 and 3 are relatively prime. But an
increment of 2 would only hit half the entries! If the hash table has
size 7, any increment (modulo 7) will work, as long as it's not
congruent to 0 (modulo 7). This is linear probing. There is also
quadratic probing, where a quadratic function is used to scan through
the entries. For instance hash(key) + i^2, for i = 0, 1, 2, ... A
prime table size allows quadratic probing to also visit all of the
nodes.
When collisions are resolved by chaining, using powers of two for the
table sizes has some overwhelming advantages. For instance, reducing a
key to modulo any given table size is just a bitwise AND with the
current bitmask. When a binary table has to grow, each chain's items
will split into exactly two chains, so relocating the items is easy. A
new bit is exposed in each of their keys, and it can be 0 or 1, which
determines where the item will move. You don't have to recompute the
hashing function for anything in the table. Just look at their keys
through a bit mask, and do some list operations.
.
- Follow-Ups:
- Re: is it true that hash-tables increase their capacity to the next prime number?
- From: Harald Hanche-Olsen
- Re: is it true that hash-tables increase their capacity to the next prime number?
- References:
- adjustable array vs hash-table
- From: Slobodan Blazeski
- Re: adjustable array vs hash-table
- From: Xah Lee
- Re: adjustable array vs hash-table
- From: Slobodan Blazeski
- is it true that hash-tables increase their capacity to the next prime number?
- From: Slobodan Blazeski
- adjustable array vs hash-table
- Prev by Date: Re: adjustable array vs hash-table
- Next by Date: Re: When is it good to use #+implementation ?
- Previous by thread: is it true that hash-tables increase their capacity to the next prime number?
- Next by thread: Re: is it true that hash-tables increase their capacity to the next prime number?
- Index(es):
Relevant Pages
|
|