Re: Extreamly large Hashtable



David McDivitt wrote:
>
> A binary search is the fastest and most easily implemented scheme, when it
> can be used. This is because binary searches are at the core of any other
> lookup scheme, including databases going through an index. By laying out a
> sorted flat file and doing a low level binary search directly on that flat
> file, much overhead is eliminated. If records are large, then a key and
> offset value can be obtained from the flat file, and the offset used to
> fetch actual data from another file. For variable length records obtain
> offset and length from the first file and use to fetch from the second file.
> Binary searches in this manner are greatly enhanced by the base operation
> system which caches disk info, removing that load from the application
> completely.

Databases on disk going through an index (rather than a hash) will generally use a
b-tree scheme these days. B-tree is not really a binary search and is much faster for
searching disk, even more with a good cache. For example, maintaining the top of the
tree in memory yields significant improvements because they tend to be balanced (a big
issue with binary tree). They also don't normally sort the data records thus allowing
multiple indexes for the same data. Generally, b-tree is preferred because it allows
range searching and ordered sub-sets which hashing doesn't. All of this is more
appropriate for database purposes.

--
Lee Fesperman, FFE Software, Inc. (http://www.firstsql.com)
==============================================================
* The Ultimate DBMS is here!
* FirstSQL/J Object/Relational DBMS (http://www.firstsql.com)
.



Relevant Pages

  • Re: Extreamly large Hashtable
    ... > A binary search is the fastest and most easily implemented scheme, ... > lookup scheme, including databases going through an index. ... > sorted flat file and doing a low level binary search directly on that flat ... the hashtable will require 1 disk seek to do the same. ...
    (comp.lang.java.programmer)
  • Re: Compatible fstat()
    ... it doubles the guessed size of the disk until ... and then uses binary search to figure out ... send the line "unsubscribe linux-kernel" in ...
    (Linux-Kernel)
  • Re: GEOM is too verbose
    ... In message, Maxim Sobolev writes: ... >Hi Poul, ... >I use application that detects size of disk using binary search. ...
    (freebsd-current)