Re: fast dictionary search algorithm
From: CBFalconer (cbfalconer_at_yahoo.com)
Date: 02/19/04
- Next message: CBFalconer: "Re: Algorithm ideas"
- Previous message: Thad Smith: "Re: Fast pancake flipping"
- In reply to: Paul Hsieh: "Re: fast dictionary search algorithm"
- Next in thread: CBFalconer: "Re: fast dictionary search algorithm"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Thu, 19 Feb 2004 06:03:35 GMT
Paul Hsieh wrote:
> CBFalconer <cbfalconer@yahoo.com> wrote:
>
> > ... snip ...
> >
> > Using hashlib doesn't involve "designing an auto-resizing hash
> > table" - it is already done.
>
> Well you have to do define "done". For example, it looks like in your
> design its easy to fill your hash table almost entirely with "DELETED"
> (just insert a lot, then delete most of it.) That has no affect on
> insertion speed, but it will degrade your scan performance. I.e.,
> even an almost empty hashtable can perform as badly as if its almost
> completely filled.
No it won't. If the fill (including deletions) exceeds 88% it is
reorganized. If the count of deleted items is above a threshold
percentage, the table is not expanded, but simply reorganized.
The thing that counts is the number of probes to find an entry (or
establish it is not present).
... snip ...
>
> Any open hash table would not suffer from any of these problems. I
> think its kind of hard to argue that hashlib represents the end-all
> be-all of hash table implementations.
Of course it isn't. No generalized system can maximize
performance. But I maintain the odds are very high that it
suffices.
>
> > ... snip ...
> >
> > The way to improve cache performance is to improve locality of
> > reference. One pointer per entry, stored in one contiguous array,
> > seems to have fairly local references.
>
> Not for a hashtable it won't. That's not what locality of reference
> means. It means that adjacent memory is likely to be accessed soon
> after the original. There's no way to get locality of reference in a
> hash table unless the whole thing fits into your L1 cache (or L2 if
> that's good enough for you.) This is the problem with closed hash
> tables in general. The very design specifically works against caches.
> You're intentionally trying to spread out entries as much as
> possible.
The table proper is as compact as possible. The thing that harms
cache performance is reaching out to other areas, which will
normally only be once or twice during any access.
-- Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net) Available for consulting/temporary embedded and systems. <http://cbfalconer.home.att.net> USE worldnet address!
- Next message: CBFalconer: "Re: Algorithm ideas"
- Previous message: Thad Smith: "Re: Fast pancake flipping"
- In reply to: Paul Hsieh: "Re: fast dictionary search algorithm"
- Next in thread: CBFalconer: "Re: fast dictionary search algorithm"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|