Re: Suggestions for double-hashing scheme
- From: websnarf@xxxxxxxxx
- Date: 13 Jun 2005 02:04:11 -0700
Tim Rentsch wrote:
> websnarf@xxxxxxxxx writes:
> > Tim Rentsch wrote:
> > > websnarf@xxxxxxxxx writes:
> > > > Clint Olsen wrote:
> > > > > I'm interested in coding up a double-hash implementation (rather than
> > > > > chaining). Why? Well, because it is an interesting problem, and I'm
> > > > > trying to see if an implmentation can work just as good as chaining to
> > > > > handle collisions.
> > > >
> > > > [...snip...]
> > > >
> > > > Once you sufficiently optimize a hash table, the big cost in hashing
> > > > turns out to be the cost of *COMPARING* the entries (*copying* entries
> > > > (if you aren't using just refs) is the next secondary effect, followed
> > > > by computing of the hash function). So don't obsess too much over
> > > > "clever" techniques for reducing delete impact. You should think of
> > > > "Delete" as adding design complications, not performance problems.
> > >
> > > An over-generalization. Comparing can be relatively expensive,
> > > but it often (or even mostly) isn't. Besides the case where
> > > elements in the hash table are pointers, which is a common case
> > > in its own right, [...]
> >
> > This is true, but you are not carrying my statement to its logical
> > conclusion. The point is that, once optimized, the entire overhead of
> > managing the hash table is very low. If nothing costs anything in your
> > hash table, then indeed, your performance bottleneck may end up going
> > anywhere, but more than likely in these situations your performance hit
> > isn't going to be hash table access at all.
>
> Irrelevant to the point about the relative costs of comparing, hashing,
> etc. Sometimes comparing is the most expensive part of hash table
> lookup, sometimes it isn't.
If you manage to make the comparison costs that are lower than length
prefixed string compare, then the whole thing has such low overhead, as
to not be worthy of focus. Same is true of quicksort/heapsort, or
other sorting algorithms. Yet, you always see # comparisons quoted for
the cost of these sorts.
I've made these statements after looking at performance profiles. I
don't know what you are looking at.
> > > often compares of, for example, structures, differ in the first
> > > few bytes. Doing a trivial reject by doing a 4-byte compare on
> > > the first word of the two elements often lets structure
> > > comparison run at almost the full speed of pointer comparison.
> >
> > Ok, but we can see that from a O(*) point of view this is all
> > equivalent. The problem with any variable length structures, or
> > entries where you are hoping the first 4 bytes to be a sufficient
> > differentiator is that you are at the mercy of the internal structural
> > bit determinability. And you'll eat an additional branch miss penalty
> > when you are wrong.
> >
> > By comparison, any good hash function basically always has a minimal
> > probability of "false positives" regardless of internal structure. It
> > also doesn't add anything to the raw cost, because you have to
> > calculate the hash value for every entry you search for or insert
> > anyway.
> >
> > So you may have to do some analysis to *know* whether or not raw
> > comparison is good enough. Simply using prescreening removes this
> > uncertainty uniformly. This more closely follows my personal focus of
> > making universal libraries that *always* work well regardless of the
> > situation, as opposed to libraries that only work under narrow
> > situations that you try to justify/impose upon your consumers.
>
> You seem to be assuming that the structures are variable length, which
> they may not be.
No, I am not. Even complex comparison typically boils down to lexical
comparisons. That means the number of raw comparisons are variable,
even if the structures are fixed in size.
> > > > Anyhow, coming back to what *IS* important. Since comparing is where
> > > > the real bottleneck is, you have to do anything possible to reduce this
> > > > impact. How can you do this? First of all, the "swap to front"
> > > > heuristic can help, but more importantly you need to do hash-value
> > > > *PRE-SCREENING*. That is to say, entries should retain their full
> > > > hash-value along with them in the table, and as you scan, you first
> > > > check that the hashes match (between the one in the table and the entry
> > > > you are searching for) before incurring the expensive cost of comparing
> > > > entries (if hash(e1) != hash(e2), which is fast to compute, then you
> > > > already know that e1 != e2). In this way, so long as your hash
> > > > function is reasonable, the comparison cost is absolutely minimized
> > > > (you basically only perform comparisons when its practically a forgone
> > > > conclusion that the entry is a match). (Remember, you keep the entire
> > > > *original* hash function along with each entry, not just the index for
> > > > h0() or h1().)
> > >
> > > Keeping the hash value is a good technique to know, but
> > > whether it's a good idea in any particular instance depends
> > > on other things, including the size of the elements being
> > > hashed. If the elements being hashed are 8 bytes and a hash
> > > value is 4 bytes, it's probably better to keep just the
> > > element being hashed, which means for the same space the
> > > load factor will be 33% less. The smaller load factor will
> > > mean fewer collisions and almost certainly translate into
> > > better performance.
> >
> > This is important for the cases you describe if you are memory
> > constrained, or if you expect the size to be right on the edge of a
> > cache size.
>
> Unless you want to do an apples and oranges comparison, the different
> approaches should be compared using the same amount of memory for the
> hash table. For any *given* amount of memory, not storing the hash
> value will result in a better load factor than storing it.
True, but "load factor" is an abstract thing somewhat related to
performance. False positive comparisons have a direct performance
impact; it'll just end up being a percentage of your costs (it
increases the # of comparisons by about 6.5% in a search limited test I
ran).
> > > > > In double hashing, the overall hash is calculated via:
> > > > >
> > > > > H(k) = h0(k) + n * h1(k)
> > > > >
> > > > > Where h0 and h1 are the hash functions and n is the probe attempt (0, 1, 2,
> > > > > ...).
> > > >
> > > > Yes. But its important to note that the total number of bits required
> > > > to produce h0 and h1 is almost certainly < 32. This requires some
> > > > analysis:
> > > >
> > > > Computing h1() is interesting because on the one hand it must not be 0
> > > > and each possibility must be coprime with the current table size and
> > > > other possible values for it. And on the other hand the range of
> > > > possible values of h1() does not need to be all that great.
> > > >
> > > > First of all, if you don't already know about it, familliarize yourself
> > > > with the "Birthday Paradox":
> > > > http://en.wikipedia.org/wiki/Birthday_Paradox . Understanding this is
> > > > key to understanding how to properly design your reprobe strategy. If
> > > > you don't understand the Birthday Paradox, then you are going to have a
> > > > hard time designing a truly sound and efficient hash table.
> > > >
> > > > Basically in a hash table that's starting to fill up, the worst
> > > > colliding initial h0() positions might have around 10 entries aliasing.
> > > > What you want is for each of these entries to have a different reprobe
> > > > skip value. The Birthday Paradox basically says that if you pick
> > > > random choices amoungst a palette of about 100 entries, then you can
> > > > expect about "1" additional collision in skip values. So this is quite
> > > > tolerable.
> > > >
> > > > I also said above that the possible h1() choices should be coprime with
> > > > each other. Hopefully this is easy to understand -- if h1() were to
> > > > output 3 then 6, for example, then chains that followed those probe
> > > > values from the same h0() would intersect quit a bit! Whereas, if it
> > > > were to output 5 and 7, then their common intersections are a lot less
> > > > minimal. So being coprime with each other makes a big difference.
> > > >
> > > > Ok, so what does this all mean? It means that h1() essentially doesn't
> > > > need more than about 7 bits worth of degrees of freedom, and should be
> > > > taken from a table of numbers which are non-zero and coprime which each
> > > > other, as well as the size of the hash table itself. An array of the
> > > > first 128 primes starting from, say 11 (to set a lowest expected
> > > > interesection length that is greater than the greatest expected reprobe
> > > > list), indexed from a uniform 7 bit hash function is sufficient. There
> > > > are other choices, but this is clearly the most obvious one that fits
> > > > the above described criteria.
> > >
> > > The idea that the choices for h1() should be restricted to
> > > sets that have only prime values is flawed. There will be fewer
> > > collisions if *all* values that are relatively prime to the table
> > > size (but not necessarily to each other) are in the set of
> > > choices for h1(). For example, assuming a table size of 512
> > > entries, and looking at the first 5 re-probes, the two cases
> > > calculate out as follows:
> > >
> > > 256 relatively prime values => collision density of 0.234%
> > > 93 primes >= 11 and < 512 => collision density of 0.355%
> >
> > What the hell is this measuring? Explain the scenario. Have you
> > entered more than 5 entries off the same initial offset and are you
> > measuring *average* number of collisions or something like that?
> >
> > You've also made a critical analytical mistake. The range of the
> > primes must also satisfy (in this case) the restriction that p*5 < sz.
> > Otherwise it will just wrap around the table and actually lose
> > coprimality property it was designed for. I.e., for tables of size
> > 512, you should pick primes between 11 and 102, which is actually far
> > fewer than 93 choices, but will actually have fewer collisions.
>
> Consider the case where two entries hash to the same slot, and their
> rehash values are chosen independently and randomly from sets of
> rehash values r0, r1, r2. Using a table size of 512, and considering
> "landing spots" chosen randomly among the first 5 reprobe slots:
>
> if r0 is the set of primes between 11 and 102, and
> if r1 is the set of primes between 11 and 512, then
>
> r1 is better than r0; there is a smaller chance of interference
> (common landings) in the first 5 reprobe slots using r1 than r0;
>
> and
>
> if r1 is the set of primes between 11 and 512, and
> if r2 is the set of odd numbers between 0 and 512, then
>
> r2 is better than r1; there is a smaller chance of interference
> in the first 5 reprobe slots using r2 than r1 (or any proper
> subset of r1)
>
> Although the description here is informal, [...]
Very. You've ignored the whole reasoning behind my design. You only
look at two probes coming off the same offset. Firstly, the birthday
paradox does not apply, and secondly this doesn't measure the real
problem my design addresses which are worst case scenarios.
> [...] I think it's complete enough so anyone with a basic
> background in statistics should be able to see what is meant.
Maybe, but they have no idea if what you are saying is sound. (Its
not.)
> These numbers are easy to calculate; try calculating actual numbers
> rather than just reasoning about what the numbers should be.
Ok, stop waving your hands and describe exactly what these numbers
*ARE*, and what relation they have to a real hash table. I'm not going
to calculate nonsense for no reason.
> The statements above also hold for 10 reprobe slots.
Well your statements about 5 and 10 reprobe slots have equal merit.
> > Look. 1) Practical testing indicates that reprobe length should not
> > exceed about 10. 2) Average case is not so important because the
> > effectiveness of the initial offset dominates -- the average case has,
> > in fact, a reprobe length < 2.
>
> True but irrelevant.
Its not. The point is that what you are focussing on as the problem is
already addressed by the primary hash function. Average cases don't
matter, since they are, in essense, already solved.
What matters is worst cases. This is why people consider
double-hashing instead of fixed-offset/linear hashing in the first
place. That your alternatives seems to cause initial intersections to
miss other entries off the same initial offset is interesting, but says
nothing about hitting other entries.
The real value of the secondary hash function is how it performs under
the stress of a filling hash table. You see my design requires
intersections to either always happen with different reprobe sequences
in at most one position each, or the uncommon case of picking the
idenitcal secondary hash function. So long accumulations are just less
common.
> > The value of the secondary hash value is that it should be effective on
> > average given the < 10 collision assumption, as well as being as
> > effective as possible in the *worst* cases. The fact is that 2 and 4
> > are going to be relatively prime with a prime sized table -- but those
> > choices have a 50%/100% intersection rate, which is a horrible worst
> > case. I have explained this in another post on this subject.
>
> I read the earlier explanation. It was wrong before; repeating it
> doesn't make it right.
I have only direct analysis (which I've posted and you have not
addressed beyond claiming its wrong), design (its meant avoid worst
case scenarios in ways that other solutions simply don't address) and
practical data (around a 3% average probe length advantage versus "all
possible odd numbers", for example) to back up my assertion. You've
got hand waving.
> > What the hell is this measuring? [...repeated from above...]
> >
> > > Incidentally, the above are theoretical calculations, not
> > > experimental measurements; they don't depend on specific input
> > > values. The numbers above should hold across large numbers of
> > > hash table usage, if the hash functions being used are good.
> >
> > But the numbers are also wrong, as explained above.
>
> If you don't know what it what I was calculating, how can you say
> it produced the wrong numbers? :)
Because you messed up the prime table choice limits, you are looking at
probe lengths only up to 5, and you are only looking at two probes.
That's how I knew what you were doing was wrong, without knowing for
sure what it is you are measuring.
> > > > So why go into all that much detail just on h1()? The key thing is to
> > > > realize that it uses so few bits (only 7) from a hash function which is
> > > > likely to output so many more bits. Furthermore the h0() function
> > > > itself is generally going to use far less than 32 bits. So hopefully
> > > > this makes it clear that the right design is to use a *SINGLE* hash
> > > > function that outputs, say, 32 bits, and split off as many bits as is
> > > > required for h0(), and use the remaining bits (up to 7 of them) as the
> > > > index for h1(). If you look at Bob Jenkin's page, or my page on hash
> > > > functions you will see that basically all high performance hash table
> > > > hash functions output 32 bits (or more.) So this seems to be the
> > > > precise sweet spot for hash function/table design.
> > > >
> > > > Although this seems to limit the hash table to 33 million entries,
> > > > actually, the "7 bits for h1()" is overspecified. That is to say, as
> > > > your hash table grows beyond 33 million entries, you can simply
> > > > cannibalize "h1 bits" for use in h0. My own testing indicates that
> > > > h1() is still effective with as low as 3(!) bits for indexing (for half
> > > > a billion entries) and of course it still technically functions even if
> > > > there no bits available for h1() (at which point you should be thinking
> > > > about 64 bit solutions anyways -- this is the one redeeming aspect of
> > > > the FNV hash function, since 64 a bit version of it exists.)
> > >
> > > If N bits are needed to choose among the complete set of h1()
> > > values for this table size, we could choose h1() based on "low
> > > N bits of hash" and h0() based on "hash ^ hash>>N".
> >
> > Its the same difference, except you may not need h1(). Just as a
> > matter of micro-optimization, you would rather make sure h0() gets all
> > the performance attention, while h1() can eat a minor penalty as a
> > result. The balance is a net win.
>
> I prefer to focus on quality of implementation first, and worry about
> performance later (and only if necessary). It's far more likely that
> the performance bottleneck is somewhere other than the few operations
> needed here.
This is why its called a micro-optimization. Nevertheless, its
completely correct as I stated it.
> [...] And, if I recall, you advocated shifting the hash value
> and indexing a table of primes to get the rehash value; loading a
> word from memory is more likely to noticeably affect performance than
> a shift and an xor.
The table is extremely small. The load is likely to come from the L1
cache, regardless of how the hash table as a whole grows.
Reducing the intersection costs of the worst cases is where I
concentrated my efforts wrt the "quality" of my hash table analysis.
> > > > Another important note is about the size of the hash table itself. A
> > > > common mistake in the literature and elsewhere is to pick the table
> > > > size as a prime number. They do this in order to relax the constraints
> > > > on h1() in order to guarantee that it traverses the table with a mere
> > > > != 0 criteria. Hopefully my explanation above shows how naive and
> > > > pedestrian such reasoning is. Worse yet is that the -> index =
> > > > hashfunction(entry) % tablesize; line all of a sudden has a new
> > > > bottleneck, namely the "%" function. As you well know, "%" (or
> > > > equivalently division) by a non-constant number is basically one of the
> > > > slowest operations you can do on any CPU outside of transcendentals or
> > > > IO.
> > >
> > > This analysis is superficial. First, the cost needs to be
> > > paid only on the initial calculation
> >
> > The majority of entries have a probe length of exactly 1. The initial
> > cost is the whole kit and kabootle. Who is being superficial here?
>
> First, if you read what I said, it says the analysis is superficial,
> not the person writing it.
Fine, change that to "Which is superficial here?"
> Second, the average number of probes is *always* greater than 1, even
> if 75% of the lookups hit on the first try. An average probe length
> of 2 or slightly higher is usually a better estimate.
My examination shows that average probe length (for a scenario
dominated by lookups of already present entries) is something like 1.4.
So the first entry cost still represents about 70% of the performance
hit no matter what.
BTW, did you also know that 99.9% of all people have greater than the
average number of arms?
> Third, there seems to be a built-in assumption that the load factor
> must be low (I remember seeing a number of 70% somewhere). In some
> applications it makes sense for the load factor to be higher, 90% or
> 95%.
This is if you decide to kill yourself on the design in pursuit of that
extra 20-25%. Your balancing re-insert idea is going to radically
increase insert costs as you near your load factor saturation point,
because a growing percentage of probe sequences will be exactly the
length of the maximum allowable.
> [...] To repeat my basic point, choosing a prime table size makes
> sense *in some applications*. The analysis above seems to ignore such
> possibilities, hence it is "superficial" (superficial isn't the same
> as wrong, remember, just superficial).
You haven't substantiated this claim that its actually good for some
applications. I think you just lose because of the enormous cost of
modulo no matter what.
> > > -- reprobes can be done using
> > >
> > > hash = hash + h1;
> > > if( hash >= table_size ) hash -= table_size;
> >
> > Because branches never cost anything right? You simply aren't looking
> > at this correctly. I don't know if you've ever done hand-optimization
> > or micro-optimizations before, but you are really missing it here.
> > Once you tweak the hash table design, the only thing you are left to
> > focus your attention on, wrt to performance, are this
> > micro-optimization issues.
>
> My statement was that reprobes can be done using this technique,
> which is correct; and that the cost of doing the comparison and
> subtract is less than the cost of doing a modulo, which is true
> on most platforms.
But with your design, you need one modulo for each of h0 *and* h1 (if
necessary), and given an average probe length of < 2, you are dominated
by these costs. Trust me, I've measured this, you just are.
> > > Second, the benefits of lower collision density for reprobes
> > > when using prime table sizes is ignored.
> >
> > Only if we ignore the fact that you've messed up the prime table range.
>
> Choosing only primes for the rehash value set will result in worse
> reprobe collision performance.
An unsubstantiated assertion. Also just wrong as far as my data shows.
> [...] If you go through the calculation
> for choosing two random independent values from different sets of
> rehash values you should see what I mean.
No I won't, because that's not how a hash table behaves. On the other
hand, if I were to, say, actually build multiple kinds of demo hash
tables and examined this for myself, I just might see the truth. Take
a wild guess as to what I did to see the truth of this?
> > > Third, just because the table size is prime doesn't mean that
> > > a runtime modulo operation must be done. Chuck Falconer's
> > > hashlib, for example, uses just a small number of prime table
> > > sizes (about 20, but it could just as easily be 50 or 100) --
> > > it would be easy to produce a routine with a 'switch()'
> > > statement where each of the modulo operations were done using
> > > a compile time constant.
> >
> > The performance of "switch()" is equivalent to a branch miss, or a full
> > pipeline flush. So its a little faster than divide, but not much. Add
> > to that the cost of the simulated divide, and you are not going to come
> > out ahead. Interesting that you are willing to mismeasure "collison
> > densities" but unwilling to measure this. Trust me, the results will
> > not inspire you.
>
> The measured cost of a branch miss on my platform is only 20% the cost
> of a (non-constant) modulo operation, or in other words about faster
> by about 5 times. Just FYI.
Ok, so you have a crummy divide. Now you have to add in the cost of
the simulated modulo, then the rejoining jump, (not to mention the
registers you spilled and refilled for this all to happen if you are on
an x86.) How's the performance looking now? (Remember, my solution
does single masks -- depending on how the pipelining works out, it
should end up costing about 1 clock along the critical path.)
> > > [...] Depending on the architecture, this improvement might be
> > > better accomplished using a pointer to function (one function
> > > for each of the fixed choices of table size).
> >
> > You, of course, are referring to some architecture will a really fast
> > indirect/indexed jump? Can you name one? Otherwise I'm going to have
> > to dispute your use of the word "improvement".
>
> I've compared switch statements to calls through pointer-to-function
> on several platforms (x86, some Sun processors). In some cases
> calling through pointer-to-function was faster than using a switch
> statement.
If calling through pointer-to-function is faster than switch, then that
is a statement about the poor quality of your compiler. Because guess
what one alternative to implementing switch is?
> > > Another advantage to using prime table sizes is that they naturally
> > > use all bits of the hash value.
> >
> > How can you, on the one hand, claim to expect arbitrary bit masking
> > versus first mixing then masking to perform equivalently (true for any
> > good hash function), then turn around and claim here that using all
> > bits is some advantage?
>
> I never claimed that masking before mixing would perform equivalently
> to mixing and then masking; if you think I did then you misunderstood
> what I was saying.
Oh. This is a well known property of good hash functions. I just
assumed you would endorse it as you posted in another thread. I may
have misunderstood you on this. I won't bother to follow this up.
We'll just leave this here on the record.
> > > [...] Hashing a pointer 'p', for example,
> > > might be as simple as '(uint32_t) p'. This approach could easily
> > > perform poorly in a power-of-two-sized hash table, but would very
> > > likely do just fine in a hash table with a prime number of slots.
> >
> > This is going to do poorly no matter what.
>
> An unsupportable statement, because it is false. I'm willing to bet
> any amount of money that I could find an application of hashing
> where hashing pointer values in this way would work just fine.
You could find a special scenario. That seems to be the basis of your
entire arguments throughout. In any event, how do you think the cost
of modulo would compare to an inlined hash function on a 4 or 8 byte
input?
> > Remember, my design
> > recommendation is to use one good hash function, then extract bits for
> > both h0 and h1 from it. So using '(uint32_t) p' is not an acceptable
> > hash function at all -- certainly you can't use it for h0(), where most
> > of the performance costs are focused. Since you have to pay the price
> > to get a good h0() anyways, and since a good h1() basically pops out
> > almost for free, this idea just doesn't apply at all.
>
> I tend not to give weight to statements of opinion or unsupported
> conclusions; what I hope to see is some kind of evidence, or clear,
> cogent reasoning (and hopefully both). Repeating unsupported or false
> statements doesn't make them true.
If you generate a hash *ONCE* then its faster than generating a hash
*TWICE*. If you use (uint32_t) p % sz as one hash function, then what
are you planning to do about the second one? If you can't see this
simple reasoning, then it has nothing to do with my ability to give
evidence or state clearly what the issue is.
> > > If your hash tables are mostly empty (load factor of less than 50%)
> > > then the cost of the initial probe is likely to dominate, and that
> > > first modulo operation is going to look pretty big. But some
> > > applications do better with large hash tables that are almost full
> > > (eg, where the hash table is approximately the size of the L2 cache),
> > > with load factors of more than 90%; in these cases, choosing a table
> > > size that is prime may very well pay off.
> >
> > Fits in the L2 cache ... what's with you and CBFalconer? Why are such
> > tiny hash tables so important to you? You're letting these very
> > particular scenarios affect the design decisions for something that is
> > supposed to be a scalable ADT. How can you, on the one hand here, be
> > concerned about on-chip performance which is a concern *SOMETIMES*, and
> > yet be totally oblivious to the modulo cost for a prime sized table
> > which is a concern *ALL THE TIME*?
>
> Let me repeat my basic point, because you seem to have missed it -
> using tables of prime size can be helpful *in some applications*.
So name one. (It would actually support your point if you named one
where choosing power of 2 really doesn't work well at all.)
> General routines with generic calling sequences are just fine in
> lots of situations; there are other situations where making
> different design decisions yields a better solution *for that case*.
>
> In any case, cache-thrashing because a program is growing beyond
> the bounds of the L2 cache usually has a significant impact on
> performance, and is quite reasonable as a real-world consideration;
It is, if you have a way of staying substantially inside of the L2
cache as an alternative. Hash tables, by their very nature simply
cannot expose any natural locality as they grow beyond the size of the
L2 cache. The very mechanisms that make hash tables work well, work
against any locality. You have to use linear probing with an very
small offsets to play these games. I would assume that linear probing
is just unacceptable on its face, but given that we don't seem to have
any common ground, perhaps I should not assume this.
> the cost of cache-thrashing is very likely to swamp other things
> like whether modulo or ANDing is used to calculate hash table
> indices.
So, if memory hits / locality are what you are worried about most, why
not instead focus on the "move-to-front" heuristic? It would seem
there would be ways of combining it with your "re-insert" idea, thus
decreasing average scan length.
> > > > The more sensible approach, of course, is to pick a table size that is
> > > > a power of 2. Then you have index = hashfunction(entry) &
> > > > (tablesize-1); which removes the "%" bottleneck ("&", as you know,
> > > > basically costs nothing.) The earlier analysis of h1() above makes it
> > > > clear that traversing the table will be a non-issue, if you simply
> > > > choose them from a small table of primes.
> > >
> > > Certainly there are advantages of using a table size that is a
> > > power of two. But there also are drawbacks, and these haven't
> > > been mentioned. Using power of two size tables, the set of
> > > choices is limited: when growing a 128 MB table, for example,
> > > the smallest next size is 256 MB. Using prime size tables, on
> > > the other hand, allows more economical growth, because the
> > > density of primes is so much higher than the density of powers
> > > of two - we can easily find a prime 25%, 10%, or even only 2%
> > > larger. Obviously, whether this economy is needed depends on
> > > the application, but it's nice to have it available when we
> > > need it; limiting hash table code to use only table sizes
> > > that are powers of two removes that choice.
> >
> > Talk about not mentioning the drawbacks -- if you use a "25%", "10%" or
> > "2%" rehash criteria, then guess where all your performance goes? It
> > goes into *REBUILDING THE TABLE*. I've posted elsewhere, that in fact,
> > serious consideration should be put into just doing *quadruplings* of
> > the hash size (obviously you would fall back to doubling if the
> > quadrupling malloc failed), since this will greatly reduce the number
> > of times an entry is re-inserted due to a rebuild.
>
> There isn't always the luxury of doubling, let alone quadrupling. If
> I have a 2 gigabyte hash table on my x86 box, doubling the size isn't
> an option; growing by 10% or 25% probably is.
Uhh ... think about it. How do you rebuild the hash table without
having the previous hash table around? Assuming you need them both in
memory at the same time at the time of build, you are actually limited
to ~1.9GB for a 10% growth and ~1.77GB for the 25% growth scenario. If
you have some way of making this work with realloc() in a way that
makes sense, I'd like to know about it.
If you are quadrupling, and falling back to doubling, then it means
that your previous size before hitting the max size is a lot less than
with your slow-crawl method -- so it is at least possible to use half
of the addressable memory.
In any event, if you are pushing your system this hard, then system
limitations will enter the picture much sooner than software limits
anyhow.
> [...] The point is that
> having the flexibility has value - it doesn't matter how fast a
> program is if it can't do the computation it's needed to do.
You haven't demonstrated this. This scenario is marginal, and is not
supported by your solution anyhow.
> (In fact most systems run into memory limitations before the hash
> table gets to be 2 gigabytes, but the basic point remains - there
> are sizes where a large array simple can't be grown by a factor
> of two.)
But my claim is that the memory pressure is not worse using the
quadrupling method.
--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
.
- References:
- Re: Suggestions for double-hashing scheme
- From: Tim Rentsch
- Re: Suggestions for double-hashing scheme
- From: websnarf
- Re: Suggestions for double-hashing scheme
- From: Tim Rentsch
- Re: Suggestions for double-hashing scheme
- Prev by Date: Re: puzzle
- Next by Date: Re: puzzle
- Previous by thread: Re: Suggestions for double-hashing scheme
- Next by thread: Re: Suggestions for double-hashing scheme
- Index(es):
Relevant Pages
|