Re: hash table idea good or no good?

From: Christian Gollwitzer (auriocus_at_btcips73x1.cip.uni-bayreuth.de)
Date: 11/19/03


Date: Wed, 19 Nov 2003 18:06:47 +0100


>>If your key-space is larger than the available cells
>
>
> If? It unsigned long is 32 bits and the primes I listed are much smaller
> than that then indeed the key-space is clearly much larger than the
> number of cells in each level.
OK, I'm a bit confused what your actual key is. I'll try to describe
three different uses of your function, for which I'll construct the
worst case storage.

Case a)
You want to use strings as the key. Your hash=hash_fn(key) returns some
hash (say hash_fn(key)=MD5(key)) which than serves as the primary key.
To compute the index for table[i], you use

table[i][hash % prime[i]]

The flaw here is, that different keys may hash to the same hash_fn(key).
If you take 17 strings that have the same hash=hash_fn(key), then you
will end up with the table full with 17 entries. That's what I ment with
  my first message

Case b)
You want to store strings, but you use the entire string as one big
number = hash. Then your function accesses the values as

table[i][key % prime[i]]

Because in this case the key is unlimited in size, it is easy to find a
series with the same property as above:
key1=prime1 --> hashes to 0 in table 1
key2=prime1*prime2 --> hashes to 0 in table1, than to 0 in table2
.
.
.
key[n]=product(prime[k], k=1..n)

If the keys are inserted in this order (Note: The performance of your
scheme depends on the insertion order - this asymmetry is one of its
flaws), then again you will have obsessed each cell 0 in each table
causing O(exp(n)) storage

Case c)
Your keys are really only 32 bit integers. In this case the series above
won't work, because if the products are truncated, this will corrupt the
least common multiple scheme used above. An alternate scheme I tried was
much better in storage (O(2^n)), but may be it can be improved to
something polynomial:
Store 0.
Store p1. This will alloc table2, since it is hashed to zero in table 1
store p2%p1 //goes to table1
store p2 // goes to table3, because p2%p1 in t1 and 0 in t2 are used
store p3%p2%p1 //goes to table 1
store p3%p2 // goes to table 2
store p3%p1 // goes to table 1
store p3 // to table 4 , because hashes to zero in t3

As can be derived easily, to alloc level n, I need 2^(n-1) values at
maximum. There is a quite good chance that I don't need them all. This
scheme can be improved with the idea from b) given that the product of
two primes always hashes to zero for these two primes. May be with
number theory magic one can find even worse cases.

So in all cases you gave bad storage performance guaranteed for one
particular insertion order of keys. (OK, to be honest, I could not proof
it in the last case, but I suspect one can get it with O(n^2)).

>>(that's the basics
>>of hashing), you will always have collisions (simple set theory). Maybe
>>you can avoid this one extreme case, but I think it is basically no good
>>idea.
>
>
> Well your right; by itself this is not a good idea. I think it the worst
> case can be avoided however if it is combined with another technique. For
> example you could just maintain a list to catch all collisions until
> the list reaches some fraction of the size of the current level at
> which point you allocate the next level and re-insert all items in the
> collision list. That way you remove the chance for pathologically poor
> memory utilization but when you resize you only reinsert a fraction
> (e.g. 1/8th) of the number of items in the current level.

If you want to write optimize the hash table, *and* combine with another
technique, it may be successful. However, for read optimization it is no
good idea because you have approx. O(log n) steps for reading if the
table is full. (Considering your primes are approx. c^n, which is sensible)

-- 
Vale !
		Christianus Auriocus


Relevant Pages

  • RE: HeapFree() Failure
    ... HeapAllocand HeapFreecalls with GlobalAlloc() and GlobalFree ... globally defined strings it declares like so: ... Sixteen bytes are successfully allocated to store pointers to ...
    (microsoft.public.windowsce.embedded)
  • Re: RGB LEDs
    ... or ask for 'pale goldenrod' and the LEDs are set that way. ... those strings... ... You don't need to store the entire table as raw ASCII. ...
    (sci.electronics.design)
  • HeapFree() Failure
    ... globally defined strings it declares like so: ... Sixteen bytes are successfully allocated to store pointers to ... All this stuff it does in WndProc_OnCreate(lpWEA wea), ...
    (microsoft.public.windowsce.embedded)
  • Re: QC Report - FastMM4 is Slow!!
    ... In it I store around 5 string pointers, a stringlist with loth of strings and poiners to objects containing data about events in a persons life, like birth etc. and some other component. ... Some silent windows are also sending continually messages, I don't know why (Must recompolie again to see if D5 version do the same, might also be part of the slowdown so I need to rewrite the initial part, not creating any other windows and the internal FF2 database engine than the mainmenu, creating the other when needed, both to save space used in RAM but also to decrease messages sent to the listboxes etc. (Could that be another bug??)) ...
    (borland.public.delphi.language.basm)
  • Re: Is the following little function UNICODE-safe? ...
    ... static DWORD dwDefID; ... convert narrow strings to wide strings and vice versa using functions like ... extensions and assumes it can store three characters in a DWORD, ... How long do you plan to store this ...
    (microsoft.public.vc.mfc)