Re: SGI hash_map

From: Ivan Vecerina (INVALID_use_webform_instead_at_vecerina.com)
Date: 03/22/05


Date: Tue, 22 Mar 2005 07:28:57 +0100


"Christian Meier" <chris@gmx.ch> wrote in message
news:d1mo02$3t7$1@news.hispeed.ch...
> "Ivan Vecerina" <NOT_VALID_please_use_contact_webform@vecerina.com>
> schrieb
>> It is essential that the function returns a relatively uniformly
>> distributed
>> random value. A minimalistic way to achieve this is to multiply an input
>> value by some large prime number, better is to use one of the many well-
>> studied hash functions you'll find on the web.
>> For an intro, see for example:
>> http://www.concentric.net/~Ttwang/tech/inthash.htm
...
> Because there is no hash<uint64_t> function, I wrote my own. As the
> hash<int> value for an int of the value 5435438 is 5435438 and for 123456
> is
> 123456, I just return the uint64_t value:
> return static_cast<size_t>(ujHash);
>
> My values do not have to be multiplied by a prime number because I get
> different values with little difference (not 1000000, 2000000 and
> 3000000).
> And before inserting into the map, each hash value is calculated with:
> hash_val %= bucket_count();
> And the number of buckets is always a prime number in the SGI
> implementation.
Depending on how the values are distributed, you may or may not have
a uniform distribution. If you care to check, you could probably write
a program to count the number of buckets that contain multiple items.

> In the meantime I looked up the source code of the SGI library. And there
> is
> the function insert_unique which is called by the hash map function
> ::insert():
>
> pair<iterator, bool> insert_unique(const value_type& __obj)
> {
> resize(_M_num_elements + 1);
> return insert_unique_noresize(__obj);
> }
>
> This means: Each time an element is inserted into the hash map, it will be
> checked for resizing depending on _M_num_elements. _M_num_elements is the
> number of ALL elements in the map. If I have all elements in the same
> bucket, the map will be resized after reaching the number of buckets
> altough
> they are all in the same bucket...
> I don't know why this is written like this.
This ensures that item search is always as efficient as possible (if
this doesn't matter to a program, then std::map may be a better candiadate).
Like for the resizing of std::vector, the number of 'rehasings' in hashmap
is amortized constant relative to the number of contained item. So
this is normally not a problem. (NB: there are some sophisticated hash
table algorithms to dynamically 'redistribute' items, but they only make
sense in specific implementations).

> This implementation is written for a hash codes which are unique.
Yes, this is what they are supposed to be !

> Well, this is no problem for numeric data
> types of smaller size than std::size_t. But this implementation of the
> hash
> map would be quite ugly if I wanted to insert large strings for
> example.....
Again, not really a problem because the number of hasch code computations
is amortized constant (~2) per item inserted.

> Well, I could answer my question by myself. But I do not really understand
> why the SGI people want to have as many buckets as elements in every
> case....
In non-pathological cases (proper hashing) this is what allows hash_map
to perform queries at optimal speed - this is the only benefit of
hash_map. Searching (linearily) through multiple items in the same
bucket can be quite expensive.

Cheers,
Ivan

-- 
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form


Relevant Pages

  • Re: [Ocfs2-devel] Ocfs2 performance bugs of doom
    ... With 64K buckets the hash, a typical region of the table looks like: ... A poor distribution as you already noticed. ...
    (Linux-Kernel)
  • Re: How best to tabulate frequencies of strings
    ... Jon Harrop wrote: ... your hash function is actually far worse than that. ... map onto the same tiny set of integers anyway. ... the distribution of your hashes is massively biased towards the center of ...
    (comp.programming)
  • Re: Perl hashing speedup ?
    ... I feel that disbalanced bucket usage / and less number of hash buckets ... Now I have some queries regarding way perl hashing mechanism maps keys ... about how perl hashes work internally; but I guess we will be needing ...
    (comp.lang.perl.moderated)
  • Re: Hashing
    ... What you have to keep in mind is that a hash table for words in by ... linear testing of buckets. ... What is important is to ensure that you have an effective collision ... memory for the table, you need two arrays, one is the pointers to the ...
    (alt.lang.asm)
  • Re: Hash table in C++? STL?
    ... > a tree-like structure instead of lists for the buckets). ... > the standard will require equality or a relational comparator. ... the hashing function is where all the complexity lies and I'd be ... But we need hash only the first maxchars characters. ...
    (comp.lang.cpp)