# Re: Hash Function problem

*From*: websnarf@xxxxxxxxx*Date*: 1 Mar 2006 20:57:41 -0800

Hugo wrote:

I need to generate a hash key for (at max) 50 millions different entries (An

entry is a 200 characters length string). Since this key will be used as an

unique id, I want as few collisions as possible (ideally none).

For design purpose, I wish to keep my id on 8 bytes (64 bit) if possible.

I request your help on this 2 questions:

- Do you think a hash function with a 64 bit output will be enough to

guarantee almost no collision on 50 millions entries ?

If its a good hash function, it needs to output to at least about 52

bits. So 64 bits is fine.

- Which hash function do you suggest me to use ?

For 64 bits? I am not aware of a really good one taylored for 64 bits,

but obviously you can use SHA-2 (or even MD5 or SHA-1 if you don't

actually want security.) You might be able to modify other functions

like RC4.

I currently using MD5, truncating the output from 128 to 64 bit which is

quite a waste of CPU time.

160 to 64 you mean.

Until now I get no collision on tiny test corpus (500k entries), but I need

some advice to get further.

The Birthday paradox tells you that you need about twice as many bits

as the bits in the quantity of your entries in order to make the

expected number of collisions in the whole system to be 1. I.e., one

collision in a universe of 50 million entries is not bad.

Thanks in advance,

PS: (Feel free to point me out a more appropriate newsgroup, if any)

Even though its not actually crypto related, the folks in sci.crypt can

set you straight on this.

--

Paul Hsieh

http://www.pobox.com/~qed/

http://bstring.sf.net/

.

**Follow-Ups**:**Re: Hash Function problem***From:*websnarf

**References**:**Hash Function problem***From:*Hugo

- Prev by Date:
**Re: Best copying stable sort algorithm?** - Next by Date:
**Re: Best copying stable sort algorithm?** - Previous by thread:
**Re: Hash Function problem** - Next by thread:
**Re: Hash Function problem** - Index(es):