Re: Hash-table versus B-Tree

From: CBFalconer (cbfalconer_at_yahoo.com)
Date: 12/07/04


Date: Tue, 07 Dec 2004 13:00:47 GMT


"kar1107@gmail.com" wrote:
> KP Bhat wrote:
>
> I assume by B-Tree we also include balanced trees like AVL.
> Hash collision is handled thru a linked list.
>
>> I am trying to compare and contrast the use of Hash-table and
>> B-Trees as indexing mechanisms for record-based files. Here's
>> what I have got so far. Kindly suggest additions/modifications.
>>
>> Hash-table
>> 1. Simple indexing scheme
>> 2. Insert and delete operations very efficient
>
> Delete is not efficient. A potentially long linked list need to
> be traversed in case of collisions.

Not necessarily so. See hashlib, reference later. No linked list
is involved, and the operation is O(1).

>
>> 3. Retrieve operation very efficient
>> 4. Need to provide a hashing function that hashes record keys
>> into some number

This is usually a trivial operation. The trick is to supply a hash
function suitable for the data involved, and good systems can be
quite forgiving. The penalty for a bad function is poorer
performance.

>> 5. Supporting GetNext and GetBulk operations quite expensive

Depends on your definition of these. I have no idea what you have
in mind.

>> 6. Does not scale that well as the size of the data increases

Scales just fine until the data base no longer fits in virtual
memory. Then the B-tree is far superior.

>
> 7. Collision handling can lead to poor searches ("retrieve
> operation"). The complexity is O(n) for search and delete.

Not so. Again, see the hashlib reference at end. Search and
delete times are O(1).

>
>>
>> B-Tree
>> 1. Complex indexing scheme
>> 2. Insert and delete operations slightly expensive due to need
>> to retain the B-Tree property (i.e. splitting/joining nodes)
>> 3. Retrieve operation very efficient
>> 4. No need to provide a hashing function
>> 5. Supporting the GetNext and GetBulk operations very efficient
>> 6. Scales extremely well as the size of the data increases
>
> 7. search, insert, delete always O(log n) (thus supports #6 above)
>
> Once a solid library is implemented for the tree, its good to go
> with tree. You never know you may have to handle large data in
> future.

You can conveniently examine the characteristics of hash systems
via hashlib, which is available under GPL at:

  <http://cbfalconer.home.att.net/download/hashlib.zip>

-- 
Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net)
   Available for consulting/temporary embedded and systems.
   <http://cbfalconer.home.att.net>  USE worldnet address!