Re: Making a hash of things...

From: C (blackmarlin_at_asean-mail.com)
Date: 07/06/04


Date: 6 Jul 2004 06:39:51 -0700


"Randall Hyde" <randyhyde@earthlink.net> wrote in message news:<L4gGc.5698$R36.2326@newsread2.news.pas.earthlink.net>...
> "C" <blackmarlin@asean-mail.com> wrote in message
> news:33d97ee5.0407050559.6d4e1e32@posting.google.com...
> > Been busy writing examples to test Luxasm's macro subsystem
> > under 'normal' usage conditions. But I though (some of) you
> > may be interested in the routines anyway, so here they are.
> > Any optimisation suggestions would be welcome.
> >
> > C
> > 2004-07-05

(I do not thing I worded that well -- I ment that the functions
where part of an -input- file to test Luxasm, not part of Luxasm
itself.)

> An interesting point to note is that some of the hash functions
> require that you first collect the string and operate on the string
> as a whole rather than being able to compute the hash value
> while scanning the characters in the identifier. Keep in mind that
> having to touch all the characters in the identifier twice can be
> expensive.

Yup, though I tend to store the length along with the value
of a string in most of my programmes -- therefore you already
have the length computed when you want to apply a hash.

Having said that, I was thinking the same thing with respect
to the Luxasm scanner -- there is no reason for me not to just
calculate a hash as I read in the label, then return that from
the tokeniser, so the parser can find/allocate a label without
the labels module needing to calculate the hash again.

> You don't want the function to be *too* expensive, and you
> want to avoid table lookups (as the table gets larger, you tend
> to blow the cache away, and that's far worse than the number
> of instructions you spend on each character in the string).

Yup, I just stick to a modified rotating hash (one a little
more complex than the one given above). Though I may try to
knock together a hash with better mixing -- the one-at-a-time
hash looks good on the surface, but is very difficult to
schedule the instruction -- I thing modifying that could work
give the best speed / distribution tradeoff.

> And given the lengths of typical identifiers, worrying about all
> of this is probably a moot point anyway. For example, my
> original code for HLA v2.0 used a repe.cmpsb instruction
> to compare strings when a collision occured. Processing
> 100,000 symbols (in equate statements) took about 0.75
> seconds. I replaced the *slow* repe.cmpsb instruction
> with a very fancy string compare algorithm optimized for
> comparing identifiers. New time? 0.75 seconds. Needless to
> say, I went back to the repe.cmpsb method (as it's a lot
> easier to understand and maintain).

:-) Yes, I use rep/cmpsb for collision resolution too, though
I do check the length before applying a string compare and the
binary tree does work as an excellent complement to the hash
table -- been using them that way for years.

> Because of the small number of symbols in a typical identifier,
> and the fact that programmers tend to use a lot of the same
> characters in their identifiers, and also the fact that programmers
> tend to use a lot of the same prefixes in their symbols,
> no matter how fancy your hashing function is, the distribution
> through the hash table is going to wind up being based
> on just a few characters in a typical identifier.

Yes, that is why I think I need better mixing than the current
function I am using. (Though with a 16384 entry table, plus
seperate tables for keywords and locals names, the chances of a
collision are remote anyway.) Also I may get around to adding
a perfect hash function for the keywords table -- though that
will have to wait until the keywords list stabalises.

> One thing that *did* make a small difference in HLA v2.0
> was the use of a binary search to handle collisions.
> In certain degenerate cases, where you choose symbols
> specifically to thwart the hashing function, using the binary
> search help overcome the degenerate operation of the hash table.

Yes -- but you could choose symbols to thwart both, and then it
would have the performance of a linked list :-), but I think
we can safely say that the chances of that happening by accedent
are approaching zero. What would the complexity function of a
hash-table plus binary-tree be anyway, my guess would be:
O( ( n ln n ) / n ).

C
2004-07-06



Relevant Pages

  • Re: How to omit blank spaces in the text?
    ... Private Sub Command1_Click ... Dim ssql As String, ssql2 As String, ssql3 As String, trimname As String ... character set from &H21 to &H7E provides for ASCI alpha numeric characters ... >> When the password is first created you calculate the hash and store>> that, ...
    (microsoft.public.vb.general.discussion)
  • Reducing the chance of collisions in known encryption systems
    ... is needed is any password that will result in the captured hash. ... A very long string of random characters ... Firstly a rule is decided upon so that we dont make this terminally ...
    (sci.crypt)
  • Re: How to write a diff in VB6 for comparing two xml files?
    ... No, the best you could do is to read both into string and use StrCompbut it's inefficient and, but using the hash ... Private Declare Function CryptAcquireContext Lib "AdvAPI32.dll" Alias _ ... Dim HashAAs Byte, HashLenA As Long ...
    (microsoft.public.vb.general.discussion)
  • Re: something like switch in c
    ... >> straightforward string comparisions. ... > inner table size and/or add symbols to expand the hash. ... It all depends on the empirical pattern of the actual keys. ... The value of the random number generator is UNCHANGED on ...
    (comp.programming)
  • Re: How to make PKCS#7 signature using CryptoAPI?
    ... Those MSDN samples hash a string PLUS the null byte (so that it ... I tried your sample and had no problem verifying with openssl (after I added ... functions (including CryptSignMessage). ...
    (microsoft.public.platformsdk.security)