Re: maintenance of hash tables.
 From: pjb@xxxxxxxxxxxxxxxxx (Pascal J. Bourguignon)
 Date: Mon, 29 Jun 2009 12:40:16 +0200
mohangupta13 <mohangupta13@xxxxxxxxx> writes:
hello everyone,
This may sound pretty basic but still I have this doubt.
It is know that to maintain the constant time operations in a hash
table the number of elements in the table should be proportional to
the slots in the table.
But I have seen in a lot many places that to maintain this, whenever
the number of elements in a hash table reach half the table size ,the
size of the table is doubled.
Why this factor of 2 (only) is chosen most of the time??
The numerical value of the factor doesn't import much. What import is
that it be a _factor_. That is that the increasing or decreasing of
the size of the hash table be geometrical. This way, the _amortized_
complexity can remain proportional to the number of elements.
I believe it must have to do something relating to statistics but can
I get a solid theoretical explanation for the same.
Imagine you have a vector, and you keep inserting elements, from slot
0 up. When you reach the end of the vector, you allocate a new vector
twice the size, therefore you have to copy all the old elements before
inserting the new element. Count the number of copies you have to do
to fill a vector of N elements when you start from a vector of size 1:
size insert allocate new vector copies total copies
1 slot 0 no 0 0
1 slot 1 yes 1 1
2 slot 2 yes 2 3
4 slot 3 no 0 3
4 slot 4 yes 4 7
8 slot 5 no 0 7
8 slot 6 no 0 7
8 slot 7 no 0 7
8 slot 8 yes 8 15
16 slot 9 no 0 15
16 slot 10 no 0 15
16 slot 11 no 0 15
16 slot 12 no 0 15
16 slot 13 no 0 15
16 slot 14 no 0 15
16 slot 15 no 0 15
16 slot 16 yes 16 31
32 slot 17 no 0 31
32 slot 18 no 0 31
....
32 slot 31 no 0 31
32 slot 32 yes 32 63
64 slot 33 no 0 63
....
64 slot 63 no 0 63
So you can see that to insert 64 elements, from slot 0 to slot 63, you
need to make only 63 copies (and of course, 64 initial insertions):
inserting is O(N).
You can get the same result if you _multiply_ the size by 1.5, or 3 or
any other factor.
On the other hand, notice how much copying is done while the size of
the vector is less than 8. That's the reason why it's often take as
the minimum size for extensible vectors or hashtables.

__Pascal Bourguignon__
.
 FollowUps:
 Re: maintenance of hash tables.
 From: mohangupta13
 Re: maintenance of hash tables.
 From: Andrew Tomazos
 Re: maintenance of hash tables.
 References:
 maintenance of hash tables.
 From: mohangupta13
 maintenance of hash tables.
 Prev by Date: Re: Generate a danom bumber that follow a certain probability mass distribribution
 Next by Date: Re: maintenance of hash tables.
 Previous by thread: maintenance of hash tables.
 Next by thread: Re: maintenance of hash tables.
 Index(es):
Relevant Pages
