Re: maximum size of a hash table

From: John Bokma (postmaster_at_castleamber.com)
Date: 02/28/05


Date: 28 Feb 2005 18:32:49 GMT

Anno Siegel wrote:

> John Bokma <postmaster@castleamber.com> wrote in comp.lang.perl.misc:
>> Anno Siegel wrote:
>>
>> > John Bokma <postmaster@castleamber.com> wrote in
>> > comp.lang.perl.misc:
>> >> Anno Siegel wrote:
>> >>
>> >> > John Bokma <postmaster@castleamber.com> wrote in
>> >> > comp.lang.perl.misc:
>> >> >> Anno Siegel wrote:
>> >> >>
>> >> >> > John Bokma <postmaster@castleamber.com> wrote in
>> >> >> > comp.lang.perl.misc:
>> >
>> > [hashes]
>> >
>> >> >> I am happy with constant avg length of c, and hence O(1) look
>> >> >> up.
>> >> >
>> >> > Fine. My point is that there are useful applications of hash
>> >> > tables also in the O(n) range.
>> >>
>> >> Uhm, like? (Ok, memory stress testing is one).
>> >
>> > It's a simple space/time tradeoff. You only get constant access
>> > time when collisions are rare, so the number of buckets must be
>> > larger than the number of keys.
>>
>> All depends on the hash function of course, but if the size is big
>> (n) and the number of collisions is small (constant), you still have
>> O(1) access.
>
> Collisions *can't* be rare when the number of keys exceeds the number
> of buckets, no matter what the hash function.

You wrote (see above): that "the number of buckets must be larger than
the number of keys" to make collisions rare. With perfect hashing you
can hash n elements in n slots without having any collision.

Moreover, it is possible to have relatively a lot of collision and still
have O(1) look up. As long as the average number of collisions is a
(small) constant wrt n, the look up is O(n).

>> > Say you expect a million keys and linear search
>> > is too slow by a factor of 100. Use a hash with 100 buckets (some
>> > more to compensate overhead).
>>
>> It all depends on what you want. If linear search is slow, and you
>> need a lot of access it might be way smarter to sort ( O(n log n) )
>> and use binary search for the look ups ( O( log n ) ).
>>
>> With 100 buckets you get 100 lists of 10,000 items each. So it takes
>> average 5,000 steps to find something.
>>
>> Compare that with binary search :-D.
>
> You must keep the list sorted. When insertions and/or deletions are
> frequent, you lose the advantage.

search, insertion and deletion are all O(log n). ( at most 20 steps )

Your proposal has O(n) look up (5,000 steps avg), O(1) insertion, and
O(n) deletion (of a specific element, since you have to look it up
first, 5,000 steps avg, otherwise O(1)).

Technically you are using a hash table with 100 buckets, each bucket
containing a list. Implementing it that way, and sorting each list would
give a major speed up (max 14 steps to find an item with binary search,
max 14 steps for insert/delete). [1]

> What's the point? I'm not saying that hashes with large load factors
> are a super trick that solves everything. They can be, and have been
> used that way. You asked for an example.

I can't think of any advantage of an O(n) look up hash table, it works
like an unsorted list, see above.

-- 
John                   Small Perl scripts: http://johnbokma.com/perl/
               Perl programmer available:     http://castleamber.com/
            Happy Customers: http://castleamber.com/testimonials.html
                        


Relevant Pages

  • Re: Perl hashing speedup ?
    ... I feel that disbalanced bucket usage / and less number of hash buckets ... Now I have some queries regarding way perl hashing mechanism maps keys ... about how perl hashes work internally; but I guess we will be needing ...
    (comp.lang.perl.moderated)
  • Re: Simple Algorithm (Perfect Hashing?)
    ... a hash algorithm does something similar but in a one-way fashion and ... Collisions are naturally not permitted! ... In fixed hash tables you use a table whose length is a prime number. ... the hash function looks like ...
    (comp.programming)
  • Re: [PATCH] dentry and inode cache hash algorithm performance changes.
    ... Did u try the new hashing algorithm with dcache bench? ... number of clock cycles spent by the hash and lookup. ... that most buckets are ... With the new hash function, ...
    (Linux-Kernel)
  • Re: Hashing
    ... A few of the texts indicate that certain hash ... > capacity to avoid hashing collisions within the application domain. ... an entry as "empty" because it might be along a rehash path for some ... for a creation of a "general hash function". ...
    (alt.lang.asm)
  • Re: Hashing
    ... What you have to keep in mind is that a hash table for words in by ... linear testing of buckets. ... What is important is to ensure that you have an effective collision ... memory for the table, you need two arrays, one is the pointers to the ...
    (alt.lang.asm)