Re: Red-black trees?

Jon Harrop <jon@xxxxxxxxxxxxxxxxx> wrote:
Mark Wooding wrote:
robin <robin_v@xxxxxxxxxxx> wrote:
A hash table is a table, and therefore is of fixed size, and would be
unsuitable for such a task.

You've missed the memo about extending hash tables. If you get it
right, the work done to grow the table leaves you with O(1) insertion
times (amortized).

Or O(n) insertion times, which is worse than a balanced binary tree!

The problems with hash tables that I know of are as follows.

* You sometimes need to wait for hash table growth. If you're
sensible, you'll grow the table by a constant factor each time. If
the table contains n objects, then it'll have been resized O(log n)
times, and you'll have had to move O(n) things in the process.
Since you've inserted O(n) things, that works out constant overall.
So if good-on-average is OK (and it usually is) then you have
nothing to worry about.

* If an adversary can choose the items you'll be inserting into your
table, he might be able to arrange for hash collisions to occur.
The solution is to use an (almost) 2-universal hashing family, and
don't tell the adversary which function you chose.

* The table doesn't store its items in any useful order. Sorry: if
this is a problem, use a tree.

Hash tables have more important problems, e.g. GCs often don't like them.

Certainly if you're using object addresses to identify objects, and
you're using a GC which moves objects about, then you're going to have
to arrange for the hash table to be rehashed during scavenging. (Either
that or you can try to find something else to hash on, and use object
address to disambiguate. But that sounds messy.)

I don't recall reading anything especial in Jones and Lins about hash
tables being problematic. Would you like to explain further?

-- [mdw]