Re: is it true that hash-tables increase their capacity to the next prime number?



On Jan 30, 3:32 pm, Slobodan Blazeski <slobodan.blaze...@xxxxxxxxx>
wrote:
Thanks to everybody for their answers I will keep both versions, and
after I finish my implementation will bencmark with real data and
implementations.
One more question is it true that hash-tables increase their capacity
to the next prime number?

No. Some algorithms use power-of-two sizes.

The prime number size is associated with the open addressing technique
of handling collisions. With open addressing, the table does not
contain pointers to chains of entries. It contains the entries
themselves. When two or more keys hash to the same table entry, the
conflict is not resolved by starting a linked list, but by allocating
another entry at some displaced location within the table. The probing
search continues around the table until a free entry is found, or
nothing is found. Of course, the same search is used when finding an
existing key. When keys are deleted, they must be replaced by special
``tombstone'' markers; they can't be marked as empty entries, because
that would falsely terminate searches for valid keys that require
further probing.

If the table has a prime number P for its size, you can use an
increment other than 1 to search for empty slots. All integers in the
range 2..P-1 are relatively prime to P, since it is prime. This means
that by searching around the table by any of these increments (modulo
P) will guarantee that you will eventually visit all entries in the
table. For instance suppose the table has size 7 and you use an
increment of 3. You will hit all of the residues modulo 7: 0, 3, 6,
1, 4, 7, 2, 5. Suppose the table had size 8 and you used increment 3.
That would still work since 8 and 3 are relatively prime. But an
increment of 2 would only hit half the entries! If the hash table has
size 7, any increment (modulo 7) will work, as long as it's not
congruent to 0 (modulo 7). This is linear probing. There is also
quadratic probing, where a quadratic function is used to scan through
the entries. For instance hash(key) + i^2, for i = 0, 1, 2, ... A
prime table size allows quadratic probing to also visit all of the
nodes.

When collisions are resolved by chaining, using powers of two for the
table sizes has some overwhelming advantages. For instance, reducing a
key to modulo any given table size is just a bitwise AND with the
current bitmask. When a binary table has to grow, each chain's items
will split into exactly two chains, so relocating the items is easy. A
new bit is exposed in each of their keys, and it can be 0 or 1, which
determines where the item will move. You don't have to recompute the
hashing function for anything in the table. Just look at their keys
through a bit mask, and do some list operations.
.



Relevant Pages

  • Re: Purpose of "UserAssist" registry keys?
    ... Deleting the entries should certainly be alright. ... As with all MRU keys they ... The entries in those two keys seem to be encrypted using ... cipher, also known as a Caesar shift cipher or shift cipher, is one of the ...
    (microsoft.public.windowsxp.general)
  • Re: UP -- patent restricted?
    ... some poor unsuspecting schmuck. ... Maybe there are capabilities in SQL that I'm totally ignorant of. ... After doing that I'd like to "ride the B-tree index" to visit and check all the entries for this client. ... When you need so many keys, i.e. you have so many functions, so much functionality, you don't have a lot of freedom to conform with what others have done. ...
    (comp.databases.pick)
  • Re: I18N management tool?
    ... Entries are updated ... Use your version control system to monitor file changes. ... and the keys are left in there... ... Add a check-in trigger to check the ...
    (comp.lang.java.programmer)
  • Re: Gathering Workstation IP Address
    ... I began to look at the entries you specify prior to generating my code. ... using these keys would have complicated the task for me. ... >> 'Launch IPCONFIG ...
    (microsoft.public.scripting.vbscript)
  • Re: word shifts
    ... tuple of ord differences (modulo 26) of consecutive characters. ... Very nice solution, it uses the same strategy used to find anagrams, ... where keys are ...
    (comp.lang.python)