Re: Making a hash of things...
From: C (blackmarlin_at_asean-mail.com)
Date: 07/06/04
- Next message: Betov: "Re: A Symbol Table Benchmark"
- Previous message: Randall Hyde: "Re: A Symbol Table Benchmark"
- In reply to: Randall Hyde: "Re: Making a hash of things..."
- Next in thread: Alex McDonald: "Re: Making a hash of things..."
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: 6 Jul 2004 06:39:51 -0700
"Randall Hyde" <randyhyde@earthlink.net> wrote in message news:<L4gGc.5698$R36.2326@newsread2.news.pas.earthlink.net>...
> "C" <blackmarlin@asean-mail.com> wrote in message
> news:33d97ee5.0407050559.6d4e1e32@posting.google.com...
> > Been busy writing examples to test Luxasm's macro subsystem
> > under 'normal' usage conditions. But I though (some of) you
> > may be interested in the routines anyway, so here they are.
> > Any optimisation suggestions would be welcome.
> >
> > C
> > 2004-07-05
(I do not thing I worded that well -- I ment that the functions
where part of an -input- file to test Luxasm, not part of Luxasm
itself.)
> An interesting point to note is that some of the hash functions
> require that you first collect the string and operate on the string
> as a whole rather than being able to compute the hash value
> while scanning the characters in the identifier. Keep in mind that
> having to touch all the characters in the identifier twice can be
> expensive.
Yup, though I tend to store the length along with the value
of a string in most of my programmes -- therefore you already
have the length computed when you want to apply a hash.
Having said that, I was thinking the same thing with respect
to the Luxasm scanner -- there is no reason for me not to just
calculate a hash as I read in the label, then return that from
the tokeniser, so the parser can find/allocate a label without
the labels module needing to calculate the hash again.
> You don't want the function to be *too* expensive, and you
> want to avoid table lookups (as the table gets larger, you tend
> to blow the cache away, and that's far worse than the number
> of instructions you spend on each character in the string).
Yup, I just stick to a modified rotating hash (one a little
more complex than the one given above). Though I may try to
knock together a hash with better mixing -- the one-at-a-time
hash looks good on the surface, but is very difficult to
schedule the instruction -- I thing modifying that could work
give the best speed / distribution tradeoff.
> And given the lengths of typical identifiers, worrying about all
> of this is probably a moot point anyway. For example, my
> original code for HLA v2.0 used a repe.cmpsb instruction
> to compare strings when a collision occured. Processing
> 100,000 symbols (in equate statements) took about 0.75
> seconds. I replaced the *slow* repe.cmpsb instruction
> with a very fancy string compare algorithm optimized for
> comparing identifiers. New time? 0.75 seconds. Needless to
> say, I went back to the repe.cmpsb method (as it's a lot
> easier to understand and maintain).
:-) Yes, I use rep/cmpsb for collision resolution too, though
I do check the length before applying a string compare and the
binary tree does work as an excellent complement to the hash
table -- been using them that way for years.
> Because of the small number of symbols in a typical identifier,
> and the fact that programmers tend to use a lot of the same
> characters in their identifiers, and also the fact that programmers
> tend to use a lot of the same prefixes in their symbols,
> no matter how fancy your hashing function is, the distribution
> through the hash table is going to wind up being based
> on just a few characters in a typical identifier.
Yes, that is why I think I need better mixing than the current
function I am using. (Though with a 16384 entry table, plus
seperate tables for keywords and locals names, the chances of a
collision are remote anyway.) Also I may get around to adding
a perfect hash function for the keywords table -- though that
will have to wait until the keywords list stabalises.
> One thing that *did* make a small difference in HLA v2.0
> was the use of a binary search to handle collisions.
> In certain degenerate cases, where you choose symbols
> specifically to thwart the hashing function, using the binary
> search help overcome the degenerate operation of the hash table.
Yes -- but you could choose symbols to thwart both, and then it
would have the performance of a linked list :-), but I think
we can safely say that the chances of that happening by accedent
are approaching zero. What would the complexity function of a
hash-table plus binary-tree be anyway, my guess would be:
O( ( n ln n ) / n ).
C
2004-07-06
- Next message: Betov: "Re: A Symbol Table Benchmark"
- Previous message: Randall Hyde: "Re: A Symbol Table Benchmark"
- In reply to: Randall Hyde: "Re: Making a hash of things..."
- Next in thread: Alex McDonald: "Re: Making a hash of things..."
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|