Re: Hash table in C++? STL?

From: Siemel Naran (SiemelNaran_at_REMOVE.att.net)
Date: 04/15/04


Date: Thu, 15 Apr 2004 05:31:12 GMT


"Claudio Puviani" <puviani@hotmail.com> wrote in message news:MMhfc.43722
> "Kevin Goodsell" <usenet2.spamfree.fusion@neverbox.com> wrote

> > Does anyone know what will need to be provided in
> > order to hash on a particular type? std::map requires
> > a less-than comparison operation -- what operations
> > would a hash-based map need?

> A hashing function and an equality operation are the minimum requirements,
but
> requiring a relational comparison (such as less_than) instead of equality
would
> allow for certain optimizations on the part of the implementers (such as
using
> a tree-like structure instead of lists for the buckets). I don't know
whether
> the standard will require equality or a relational comparator.
>
> Sadly, the hashing function is where all the complexity lies and I'd be
> pleasantly surprised if more than 1% of programmers knew what constitutes
a
> good hashing function. With a bad hashing function (and a naive bucket
> strategy), performance can quickly degenerate to much worse than a
balanced
> tree on large data sets.

I don't think I'm in that 1%. A naive guess is to add up all the letters
and modulo by the bucket size, which they say should be prime, but why is
this?

size_t hash(const char * s, int buckets) {
   size_t h = 0;
   for ( ; *s; ++s) h += *s;
   return h%bucket.
}

But we need hash only the first maxchars characters.

To answer Kevin's question about the hash function, it should be like this
in general

class hash {
public:
   typedef char char_type;
   typedef typename std::vector<T>::size_type hash_type;
   explicit hash(size_t maxchars);
   hash_type operator()(char_type begin, char_type end, size_t numbuckets)
const;
private:
   size_t maxchars;
};

The hash may increase numbuckets dynamically as it grows, so numbuckets is a
parameter to operator(). But the number of chars to hash is a constant, and
should not change as we increase or decrease the number of buckets, so it is
argument to the constructor and gets stored in the class.

In our code above we rely on an implicit conversion from char_type to
hash_type (which seems not reasonable), and hash_type::operator+= (which
seems reasonable as hash_type is just an unsigned integer).



Relevant Pages

  • Re: Perl hashing speedup ?
    ... I feel that disbalanced bucket usage / and less number of hash buckets ... Now I have some queries regarding way perl hashing mechanism maps keys ... about how perl hashes work internally; but I guess we will be needing ...
    (comp.lang.perl.moderated)
  • Re: Hashing
    ... What you have to keep in mind is that a hash table for words in by ... linear testing of buckets. ... What is important is to ensure that you have an effective collision ... memory for the table, you need two arrays, one is the pointers to the ...
    (alt.lang.asm)
  • Re: SGI hash_map
    ... >> studied hash functions you'll find on the web. ... > And before inserting into the map, each hash value is calculated with: ... a uniform distribution. ... a program to count the number of buckets that contain multiple items. ...
    (comp.lang.cpp)
  • Re: Hash-table versus B-Tree
    ... > Hash collision is handled thru a linked list. ... Need to provide a hashing function that hashes record keys ... > with tree. ...
    (comp.programming)
  • Re: Hash table in C++? STL?
    ... I wouldn't dare claim to be either, without doing some research first. ... by the number of items that hash to that index. ... perhaps you meant "the size measured in number of buckets" and I was ... second hash function is used to determine the sequence of slots to try. ...
    (comp.lang.cpp)