Re: fast dictionary search algorithm

From: CBFalconer (cbfalconer_at_yahoo.com)
Date: 02/19/04


Date: Thu, 19 Feb 2004 06:03:35 GMT

Paul Hsieh wrote:
> CBFalconer <cbfalconer@yahoo.com> wrote:
>
> > ... snip ...
> >
> > Using hashlib doesn't involve "designing an auto-resizing hash
> > table" - it is already done.
>
> Well you have to do define "done". For example, it looks like in your
> design its easy to fill your hash table almost entirely with "DELETED"
> (just insert a lot, then delete most of it.) That has no affect on
> insertion speed, but it will degrade your scan performance. I.e.,
> even an almost empty hashtable can perform as badly as if its almost
> completely filled.

No it won't. If the fill (including deletions) exceeds 88% it is
reorganized. If the count of deleted items is above a threshold
percentage, the table is not expanded, but simply reorganized.
The thing that counts is the number of probes to find an entry (or
establish it is not present).

... snip ...
>
> Any open hash table would not suffer from any of these problems. I
> think its kind of hard to argue that hashlib represents the end-all
> be-all of hash table implementations.

Of course it isn't. No generalized system can maximize
performance. But I maintain the odds are very high that it
suffices.

>
> > ... snip ...
> >
> > The way to improve cache performance is to improve locality of
> > reference. One pointer per entry, stored in one contiguous array,
> > seems to have fairly local references.
>
> Not for a hashtable it won't. That's not what locality of reference
> means. It means that adjacent memory is likely to be accessed soon
> after the original. There's no way to get locality of reference in a
> hash table unless the whole thing fits into your L1 cache (or L2 if
> that's good enough for you.) This is the problem with closed hash
> tables in general. The very design specifically works against caches.
> You're intentionally trying to spread out entries as much as
> possible.

The table proper is as compact as possible. The thing that harms
cache performance is reaching out to other areas, which will
normally only be once or twice during any access.

-- 
Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net)
   Available for consulting/temporary embedded and systems.
   <http://cbfalconer.home.att.net>  USE worldnet address!


Relevant Pages

  • Re: fast dictionary search algorithm
    ... Any open hash table would not suffer from any of these problems. ... That's not what locality of reference ... Paul Hsieh ...
    (comp.programming)
  • Re: hash two keys to one index
    ... What goes into the map are pairs of (reference to key, ... When I insert an object into the hash table, I pass in ... void insert(Object obj, int hash) throws HashTableFull ... int probe = 0; ...
    (comp.lang.java.programmer)
  • Re: Net::SFTP::Attributes
    ... when I ask a simple question on the correct usage of a module I would ... If $subref is specified, for each entry in the directory, $subref will ... be called and given a reference to a hash with three keys: ...
    (comp.lang.perl.modules)
  • Re: Is the teaching of non-reentrant HLASM coding practices ever defensible?
    ... Locality of data reference is good, and locality of instruction reference is good. ... This is not really an issue for compromise or for statesmanlike fatuities of the form 'Reentrant code has important uses; on the other hand there is scope for non-reentrant constructions'. ...
    (bit.listserv.ibm-main)
  • Re: hash two keys to one index
    ... What goes into the map are pairs of (reference to key, ... When I insert an object into the hash table, I pass in ... void insert(Object obj, int hash) throws HashTableFull ... int probe = 0; ...
    (comp.lang.java.programmer)