Re: About Hsieh's hash: initial value? 64 bit?
- From: websnarf@xxxxxxxxx
- Date: 12 Apr 2006 17:57:35 -0700
schoedl@xxxxxxxxx wrote:
are there any news out there about which value to use as initial one
when implementing Paul Hsieh's hash for incremental updates?
You can initialize it with *any* value, however a non-zero one usually
has slightly better start up properties (if you start with 0, and you
prefix the data with any number of 4-byte blocks of 0, then you will
"0-sink" and will collide on any common tails after any number of such
4 byte blocks of 0s). If you know the length in bytes of the whole
thing a priori, then using the length itself is a good choice because
the 0-sinking scenarios will decrease with different lengthed objects.
[...] Also, has anyone come up with a 64-bit version?
If anyone has, let me know. :) One thing you can do to create a
64-bit hash function is to use two different 32-bit hash functions and
simply concatenate the two functions. Bob Jenkins' function and mine
are certainly independent, so you could do that.
Unfortunately, my function very densely uses all the functional units
of an Athlon CPU, and therefore trying to loop jam my function with,
say, Bob Jenkins' one-at-a-time function will not actually lead to
improved performance versus just running one after the other -- however
I have not actually tested this (I could be wrong.)
[...] I want to use the hash as a
checksum (rather than a hash-table index) and want to avoid collisions
as much as I can.
Well if you just want to use it as a checksum, it is not optimal. In
particular look at the final "tempering code". It is totally
irrelevant to the purposes of being a checksum -- it further mixes a
static 32 bit value, to increase the avalanching, but not increasing
the input data content. This is clearly not needed for a checksum.
OTOH, experimentally, I have *seen* that my hash function has
checksum-like properties on ordinary text (I have not found a pair of
english words in ASCII which will hash to the same value with my
function).
CRCs of course, have a much longer history as checksums. My function
was only "discovered" two years ago. The 0-sinking issue in my
function is kind of obvious. On the other hand we *know* that CRCs
have an analogue to the 0-sink issue; its just harder to find. The
main advantage of using my function, of course, is that its much faster
(so is Bob Jenkins' function, BTW.) So you can try to use it
experimentally to see how it goes.
If you expect to be actively attacked on the checksum (meaning that
input might be carefully constructed to create false positives)
actually neither CRCs nor my function are satisfactory, and you should
use something like SHA-2.
--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
.
- References:
- About Hsieh's hash: initial value? 64 bit?
- From: schoedl
- About Hsieh's hash: initial value? 64 bit?
- Prev by Date: Re: Univ C Question
- Next by Date: Re: Univ C Question
- Previous by thread: Re: About Hsieh's hash: initial value? 64 bit?
- Next by thread: usb and vc++
- Index(es):
Relevant Pages
|