# Re: maintenance of hash tables.

*From*: pjb@xxxxxxxxxxxxxxxxx (Pascal J. Bourguignon)*Date*: Mon, 29 Jun 2009 12:40:16 +0200

mohangupta13 <mohangupta13@xxxxxxxxx> writes:

hello everyone,

This may sound pretty basic but still I have this doubt.

It is know that to maintain the constant time operations in a hash

table the number of elements in the table should be proportional to

the slots in the table.

But I have seen in a lot many places that to maintain this, whenever

the number of elements in a hash table reach half the table size ,the

size of the table is doubled.

Why this factor of 2 (only) is chosen most of the time??

The numerical value of the factor doesn't import much. What import is

that it be a _factor_. That is that the increasing or decreasing of

the size of the hash table be geometrical. This way, the _amortized_

complexity can remain proportional to the number of elements.

I believe it must have to do something relating to statistics but can

I get a solid theoretical explanation for the same.

Imagine you have a vector, and you keep inserting elements, from slot

0 up. When you reach the end of the vector, you allocate a new vector

twice the size, therefore you have to copy all the old elements before

inserting the new element. Count the number of copies you have to do

to fill a vector of N elements when you start from a vector of size 1:

size insert allocate new vector copies total copies

1 slot 0 no 0 0

1 slot 1 yes 1 1

2 slot 2 yes 2 3

4 slot 3 no 0 3

4 slot 4 yes 4 7

8 slot 5 no 0 7

8 slot 6 no 0 7

8 slot 7 no 0 7

8 slot 8 yes 8 15

16 slot 9 no 0 15

16 slot 10 no 0 15

16 slot 11 no 0 15

16 slot 12 no 0 15

16 slot 13 no 0 15

16 slot 14 no 0 15

16 slot 15 no 0 15

16 slot 16 yes 16 31

32 slot 17 no 0 31

32 slot 18 no 0 31

....

32 slot 31 no 0 31

32 slot 32 yes 32 63

64 slot 33 no 0 63

....

64 slot 63 no 0 63

So you can see that to insert 64 elements, from slot 0 to slot 63, you

need to make only 63 copies (and of course, 64 initial insertions):

inserting is O(N).

You can get the same result if you _multiply_ the size by 1.5, or 3 or

any other factor.

On the other hand, notice how much copying is done while the size of

the vector is less than 8. That's the reason why it's often take as

the minimum size for extensible vectors or hash-tables.

--

__Pascal Bourguignon__

.

**Follow-Ups**:**Re: maintenance of hash tables.***From:*mohangupta13

**Re: maintenance of hash tables.***From:*Andrew Tomazos

**References**:**maintenance of hash tables.***From:*mohangupta13

- Prev by Date:
**Re: Generate a danom bumber that follow a certain probability mass distribribution** - Next by Date:
**Re: maintenance of hash tables.** - Previous by thread:
**maintenance of hash tables.** - Next by thread:
**Re: maintenance of hash tables.** - Index(es):