Re: hash table idea good or no good?
From: Christian Gollwitzer (auriocus_at_btcips73x1.cip.uni-bayreuth.de)
Date: 11/21/03
- Previous message: Corey Murtagh: "Re: how smart is the registry api?"
- In reply to: Michael B Allen: "Re: hash table idea good or no good?"
- Next in thread: Michael B Allen: "Re: hash table idea good or no good?"
- Reply: Michael B Allen: "Re: hash table idea good or no good?"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Fri, 21 Nov 2003 11:59:44 +0100
Michael B Allen wrote:
> On Thu, 20 Nov 2003 07:58:11 -0500, Christian Gollwitzer wrote:
>
>
>>2. In the *best case*, your storage is O(n). But the time required for
>>storing *one entry* is not O(1), but *O(log(n))*, where n is the number
>>of entries.
>
> I don't understand what you're saying. Clearly the time required to store
> one entry is better than O(log(n)). A linked list would be O(log(n)).
Nope. A linked list has O(n) because you cannot perform a binary search.
A binary tree has O(log(n)) in the average case and O(n) in the worst case.
> The
> technique I described would be the number of levels (nl) plus the log
> of the number of elements in the collision list (nc).
And the number of levels is approximately c*log(n), as I worked out
above. The constant c may be very small, in this case you can forget
about it for only few levels.
> The size of the
> collision list is the number of collisions occuring as the load factor
> grows from ~0.37 to 0.75 before the next level is allocated and the
> load factor drops to 0.37 (1/2 of 0.75 because the total cumulative
> size of the map (n) doubles). I don't know what that size is yet but
> it's certainly not n.
>
>
>>Another try:
>>
>>
>>int
>>hashmap_put(struct hashmap *h, void *key, void *data) {
>> const double loadfactor=0.75;
>> unsigned long hash;
>> // level is an item of struct hashmap, initially = 0 // also
>> items_in_this_level, initially = 0
>
>
> I don't think this is a very good idea either. It will only fill 75%
> of the table at each level which is a pretty bad waste of space in itself.
Yes, you are right with it. It is no good idea.
>>read performance is really bad. You'll get O(log n) again for searching
>>one item.
>
>
> Again, I don't see where you get linked list asymtotic growth from this.
> void *
> hashmap_get(const struct hashmap *h, const void *key)
> {
> unsigned long hash;
> int level;
>
> if (h->tables[0] == NULL) {
> return NULL;
> }
>
> hash = h->hash ? h->hash(key) : hash_ptr(key);
>
> for (level = 0; level < 16; level++) {
> int table_size = table_sizes[level];
> struct entry *table = h->tables[level];
> struct entry *e;
>
> e = table + hash % table_size;
>
> if (e->key && hash == e->hash && h->cmp(key, e->key) == 0) {
> return e->data;
> }
>
> if (h->tables[level + 1] == NULL) {
> iter_t iter;
> struct entry *e;
> struct linkedlist *col = (struct linkedlist *)&h->col[h->whichcol];
>
> /* search the collision list */
> linkedlist_iterate(col, &iter);
> while ((e = linkedlist_next(col, &iter))) {
> if (hash == e->hash && h->cmp(key, e->key) == 0) {
> return e->data;
> }
> }
>
> break;
> }
> }
>
> return NULL;
> }
Assume I use the hashmap once for inserting a dictionary and then for
looking up values many times. Assume I've inserted roughly two million
items and there were no or only few collision. Then about half of the
items, i.e. one million, is in table[15]. Because you look first in
table[0], then table[1] etc. you need 16 accesses until the item is
found for 50% of the items. 15 accesses for 25% and so on.
So the average number of accesses for an item is:
sum(k*2^(k-17), k=1..16) = 15.00001525....
So the average number of levels consulted is approximately fifteen. If
you'd used rehashing, than all elements not in the collision list would
be found after only one access, which is clearly faster. So for this
kind of use your approach is not suitable. It maybe for others. Try
running it on a large example (canonical one is counting the frequency
of all words in a large textfile) and compare with other
implementations, like the hashmap from STL or CBFalconers
implementation. Hard to say from theory, which one is the best in these
cases. Of course you need to use the same hash function for such a test.
Measure time and storage used. And try with one particularly bad "hash
function" that simply always returns one to see what happens in case of
many collisions.
-- Vale ! Christianus Auriocus
- Previous message: Corey Murtagh: "Re: how smart is the registry api?"
- In reply to: Michael B Allen: "Re: hash table idea good or no good?"
- Next in thread: Michael B Allen: "Re: hash table idea good or no good?"
- Reply: Michael B Allen: "Re: hash table idea good or no good?"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|
|