Re: Hash-table versus B-Tree

From: CBFalconer
Date: Tue, 07 Dec 2004 13:00:47 GMT

"" wrote:
> KP Bhat wrote:
> I assume by B-Tree we also include balanced trees like AVL.
> Hash collision is handled thru a linked list.
>> I am trying to compare and contrast the use of Hash-table and
>> B-Trees as indexing mechanisms for record-based files. Here's
>> what I have got so far. Kindly suggest additions/modifications.
>> Hash-table
>> 1. Simple indexing scheme
>> 2. Insert and delete operations very efficient
> Delete is not efficient. A potentially long linked list needs to
> be traversed in case of collisions.

Not necessarily so. See hashlib, reference later. No linked list
is involved, and the operation is O(1).

>> 3. Retrieve operation very efficient
>> 4. Need to provide a hashing function that hashes record keys
>> into some number

This is usually a trivial operation. The trick is to supply a hash
function suitable for the data involved, and good systems can be
quite forgiving. The penalty for a bad function is poorer
performance.

>> 5. Supporting GetNext and GetBulk operations quite expensive

Depends on your definition of these. I have no idea what you have
in mind.

>> 6. Does not scale that well as the size of the data increases

Scales just fine until the database no longer fits in virtual
memory. Then the B-tree is far superior.

> 7. Collision handling can lead to poor searches ("retrieve
> operation"). The complexity is O(n) for search and delete.

Not so. Again, see the hashlib reference at end. Search and
delete times are O(1).

>> B-Tree
>> 1. Complex indexing scheme
>> 2. Insert and delete operations slightly expensive due to need
>> to retain the B-Tree property (i.e. splitting/joining nodes)
>> 3. Retrieve operation very efficient
>> 4. No need to provide a hashing function
>> 5. Supporting the GetNext and GetBulk operations very efficient
>> 6. Scales extremely well as the size of the data increases
> 7. search, insert, delete always O(log n) (thus supports #6 above)
> Once a solid library is implemented for the tree, it's good to go
> with the tree. You never know; you may have to handle large data
> in the future.

You can conveniently examine the characteristics of hash systems
via hashlib, which is available under GPL at:


Chuck F
   Available for consulting/temporary embedded and systems.
