Re: my paper: A cascade hash algorithm with O(1) worst case lookup time
- From: "jeff lee" <shaohua@xxxxxxxxx>
- Date: 14 Aug 2006 00:07:29 -0700
Thank you for your reply. sorry for the long delay since your reply.
I'm reading another paper these days.
Chris Uppal 写道:
jeff lee wrote:to the above three questions, I don't know what the answers are.
I don't know whether such idea is new. so I need experienced
researchers here to help me review it.
I'm not an "experienced researcher" but here are some of the questions
that I'd like to see answered.
Is there an upper-bound to the space needed to hold a cascade hash
table for fixed amount of data, given that it's response (as I
understand it) to collisions in the last space result in gtowing and
rehashing rather than falling back to a linear-probe algorithm ?
How does the average (not worst) case compare with a linear probing
implementation using the same total number of slots ?
Your paper assumes randomly distributed hashes throughout (which is
fair enough), but it would be interesting to know how the algorithm
would respond to difficult (skewed) loads -- it is better or worse than
linear probing ? Perhaps it would maintain speed, unlike linear
probing, but at the cost of using more space.
I think a small deliberate "bad" data set could make collision happen
in the last hash table, then the hash tables have to be enlarged. but I
haven't tried that. I just have tests on random numbers and consecutive
numbers. Cascade hash has much better performance on consecutive
numbers. but such set is not skewed.
I don't know how the change of the relative sizes will affect the
Is there any significance to the relative sizes of the sub-hash tables
? Is powers-of-two just a convenient implementation detail, or does it
relate in some way to the design load of each sub-table ? My guess
would be that there is, or should be, a relationship, and that if each
space is half the size of its predecessor, then it's because the design
assumes that the previous space will run at 50% load -- but that would
imply an overal design-load of 50% and you assert decent performance at
higher loads than that. (Also linear probing would work well at such a
low load, anyway...)
performance. but such a relative size is not because the design assumes
that the previous space will run at 50% load. Actually, from the
experiment results, you can see the load of the last table is very
low(less than 10%). A low load factor in some tables is essential for
new items to find their places.
I think linear probing will have bad performance when the load is high.
If my understanding is correct, the lookup is troublesome... I mean you
don't know when to stop searching an item
cache is not a big problem. because hops in the same table is short,
On a more practical level, how does all the jumping around in memory
affect performance ? If reading from another place in memory places an
undue load on the system's memory cache hierarchy then it may perform
much worse than a purely abstract analysis would suggest. Data point:
I once has a linear-probing hash table implementation where a typical
first probe would involve a disk seek + read, any subsequent sequential
probes into the same part of the hash table were -- to a very good
approximation -- free, whereas a second proble to an unrelated part of
the same would have incurred another seek. That table could be
operated at over 90% load without measurable impact on its performace.
Obviously, disk seek time is a pretty high penalty, but even for a pure
RAM implementation, filling a cache line, only to discard it and fill a
second (or third) unrelated cache line, might be significantly slower
than a linear probing implementation using the same total space and
making better use of the caches.
and cache will not miss. but in different tables, cache will surely
miss. fortunately most items are stored in the first two levels.
I don't know if that's the kind of feedback you wanted, but nobody else
has replied, so...
-- chris
.
- Follow-Ups:
- References:
- Prev by Date: This is about computing
- Next by Date: Re: This is about computing
- Previous by thread: Re: my paper: A cascade hash algorithm with O(1) worst case lookup time
- Next by thread: Re: my paper: A cascade hash algorithm with O(1) worst case lookup time
- Index(es):
Relevant Pages
|