Re: Red-black trees?

Mark Wooding wrote:
Jon Harrop <jon@xxxxxxxxxxxxxxxxx> wrote:
Mark Wooding wrote:
robin <robin_v@xxxxxxxxxxx> wrote:
A hash table is a table, and therefore is of fixed size, and would be
unsuitable for such a task.

You've missed the memo about extending hash tables. If you get it
right, the work done to grow the table leaves you with O(1) insertion
times (amortized).

Or O(n) insertion times, which is worse than a balanced binary tree!

The problems with hash tables that I know of are as follows.

* You sometimes need to wait for hash table growth. If you're
sensible, you'll grow the table by a constant factor each time. If
the table contains n objects, then it'll have been resized O(log n)
times, and you'll have had to move O(n) things in the process.
Since you've inserted O(n) things, that works out constant overall.
So if good-on-average is OK (and it usually is) then you have
nothing to worry about.

* If an adversary can choose the items you'll be inserting into your
table, he might be able to arrange for hash collisions to occur.
The solution is to use an (almost) 2-universal hashing family, and
don't tell the adversary which function you chose.

* The table doesn't store its items in any useful order. Sorry: if
this is a problem, use a tree.

Also: hash tables cannot support persistence efficiently (because they rely
upon mutation) whereas trees can (if you have a GC and use immutable

Hash tables have more important problems, e.g. GCs often don't like them.

Certainly if you're using object addresses to identify objects, and
you're using a GC which moves objects about, then you're going to have
to arrange for the hash table to be rehashed during scavenging. (Either
that or you can try to find something else to hash on, and use object
address to disambiguate. But that sounds messy.)

I don't recall reading anything especial in Jones and Lins about hash
tables being problematic. Would you like to explain further?

If all allocated blocks are small then the GC handles them incrementally.
However, arrays of pointers are typically handled atomically. This
introduces an arbitrarily-long stall into minor collections, undermining
pause times and potentially bogging down the whole system.

So arbitrarily-large hash tables are really bad all-round for soft real-time
apps in managed languages.

Dr Jon D Harrop, Flying Frog Consultancy Ltd.

Relevant Pages

  • [PATCH RFC] device-mapper: verity, an integrity checking target
    ... The root node of that tree is set at mapped drive ... The hash tree implementation, dm-bht.c, provides a means to verify only ... +of cryptographic digests with any algorithm supported by the kernel crypto API. ... config DM_SNAPSHOT ...
  • Re: Composite hash table/linked list
    ... > table combined with a balanced or self-adjusting tree. ... insertion makes it more than tolerable, ... frequent that deletion. ... searching provided by the hash table. ...
  • Re: Locality of allocations
    ... Actually I only needed 4 bytes of hash value, ... It does so by running through all sectors in the input ... tree structure with pointers to the sectors so I can ... I have a storage which already contains a lot of sectors. ...
  • Re: data structure for network protocol
    ... As a first thought, I was considering linked lists, but they are probably ... The second option is tree. ... hash tables. ... in an ordinary array (the main cost is that, if the number of items gets ...
  • Re: Hash-table versus B-Tree
    ... > Hash collision is handled thru a linked list. ... Need to provide a hashing function that hashes record keys ... > with tree. ...