Re: Efficient Text File Copy

From: Keith Bostic (bostic_at_sleepycat.com)
Date: 02/03/04


Date: 2 Feb 2004 19:00:19 -0800

CBFalconer <cbfalconer@yahoo.com> wrote in message news:<401C3C49.38F12EE7@yahoo.com>...
>>> Why do I recommend Chris Torek's hash? Because Chris says it
>>> works well and I had no reason to disbelieve him; in my
>>> experience, he's right - it does work well.

While the function works well for strings, it doesn't work well
for numbers (for example, indexing on record numbers), in my
experience. Berkeley DB ended up using this:

/*
 * Fowler/Noll/Vo hash
 *
 * The basis of the hash algorithm was taken from an idea sent by
email to the
 * IEEE Posix P1003.2 mailing list from Phong Vo
(kpv@research.att.com) and
 * Glenn Fowler (gsf@research.att.com). Landon Curt Noll
(chongo@toad.com)
 * later improved on their algorithm.
 *
 * The magic is in the interesting relationship between the special
prime
 * 16777619 (2^24 + 403) and 2^32 and 2^8.
 *
 * This hash produces the fewest collisions of any function that we've
seen so
 * far, and works well on both numbers and strings.
 *
 * PUBLIC: u_int32_t __ham_func5 __P((DB *, const void *, u_int32_t));
 */
u_int32_t
__ham_func5(dbp, key, len)
        DB *dbp;
        const void *key;
        u_int32_t len;
{
        const u_int8_t *k, *e;
        u_int32_t h;

        if (dbp != NULL)
                COMPQUIET(dbp, NULL);

        k = key;
        e = k + len;
        for (h = 0; k < e; ++k) {
                h *= 16777619;
                h ^= *k;
        }
        return (h);
}

There are a variety of hash functions Berkeley DB has tried
at various times -- the ones we've liked best we've kept:

        http://www.opensource.apple.com/darwinsource/10.3/BerkeleyDB-6/db/hash/hash_func.c

Regards,
--keith
 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Keith Bostic bostic@sleepycat.com
Sleepycat Software Inc. keithbosticim (ymsgid)
118 Tower Rd. +1-781-259-3139
Lincoln, MA 01773 http://www.sleepycat.com



Relevant Pages

  • Re: Efficient Text File Copy
    ... While the function works well for strings, ... * The basis of the hash algorithm was taken from an idea sent by ... There are a variety of hash functions Berkeley DB has tried ...
    (comp.lang.c)
  • Re: Crack in Computer Security Code Raises Red Flag
    ... >same hash when team members used a hash algorithm called SHA-1 ... it's fairly simple to generate two colliding bit strings; ... problem with your hash function when, given an original bit string, ...
    (sci.crypt)
  • Re: Crack in Computer Security Code Raises Red Flag
    ... >same hash when team members used a hash algorithm called SHA-1 ... it's fairly simple to generate two colliding bit strings; ... problem with your hash function when, given an original bit string, ...
    (alt.computer.security)
  • The certification password of Internet Explorer 7 and operation of auto complete
    ... About the certification password of Internet Explorer and operation ... By remembering the strings that are input in the following text ... In this registry, there are values whose name is a string of 42 bytes ... We cannot guess the original strings from the hash value, ...
    (Bugtraq)
  • Re: Maximum String size in Java?
    ... > for long strings, so on average, SFH bakes it in the performance ... >> distribution over the hash table size. ... > you need to be concerned about Unicode strings. ... construct a hash function that does appreciably better than the one ...
    (comp.programming)