Re: Red-black trees?

Mark Wooding wrote:
Jon Harrop <jon@xxxxxxxxxxxxxxxxx> wrote:
Mark Wooding wrote:
robin <robin_v@xxxxxxxxxxx> wrote:
A hash table is a table, and therefore is of fixed size, and would be
unsuitable for such a task.

You've missed the memo about extending hash tables. If you get it
right, the work done to grow the table leaves you with O(1) insertion
times (amortized).

Or O(n) insertion times, which is worse than a balanced binary tree!

The problems with hash tables that I know of are as follows.

* You sometimes need to wait for hash table growth. If you're
sensible, you'll grow the table by a constant factor each time. If
the table contains n objects, then it'll have been resized O(log n)
times, and you'll have had to move O(n) things in the process.
Since you've inserted O(n) things, that works out constant overall.
So if good-on-average is OK (and it usually is) then you have
nothing to worry about.

* If an adversary can choose the items you'll be inserting into your
table, he might be able to arrange for hash collisions to occur.
The solution is to use an (almost) 2-universal hashing family, and
don't tell the adversary which function you chose.

* The table doesn't store its items in any useful order. Sorry: if
this is a problem, use a tree.

Also: hash tables cannot support persistence efficiently (because they rely
upon mutation) whereas trees can (if you have a GC and use immutable

Hash tables have more important problems, e.g. GCs often don't like them.

Certainly if you're using object addresses to identify objects, and
you're using a GC which moves objects about, then you're going to have
to arrange for the hash table to be rehashed during scavenging. (Either
that or you can try to find something else to hash on, and use object
address to disambiguate. But that sounds messy.)

I don't recall reading anything especial in Jones and Lins about hash
tables being problematic. Would you like to explain further?

If all allocated blocks are small then the GC handles them incrementally.
However, arrays of pointers are typically handled atomically. This
introduces an arbitrarily-long stall into minor collections, undermining
pause times and potentially bogging down the whole system.

So arbitrarily-large hash tables are really bad all-round for soft real-time
apps in managed languages.

Dr Jon D Harrop, Flying Frog Consultancy Ltd.