Hashing

From: Chewy509 (chewy509.doesnt.like.spam_at_austarnet.com.au)
Date: 03/02/05


Date: Wed, 2 Mar 2005 21:15:42 +1000

Hi Everyone,

I've been looking into several hashing algorithms, in particular for use in
a compiler/assembler.

While the tens of texts I have read, all discuss the differences between
them, and appropriate ways to implement them (mainly discribed in C). A few
of the texts indicate that certain hash algorithm's are better suited to
particular domain due to inherited capacity to avoid hashing collisions
within the application domain. This makes quite a bit of sense, as some
would be suitable for authentication (where rare occurances of collisions
can occur without too much grief), while others are more suited for hashing
tokens/labels within a source file (where a single collision can mean the
difference between an assembler/compiler not working). However they don't
indicate what algorithm is most suited to what application.

So, since we have a few people here that have implemented assemblers and/or
compilers (notably Randy and Rene), what hashing algorithms are best suited
for hashing tokens within a source file, that essentially will only contain
7bit ASCII, with tokens (or labels) no larger than 32 characters?

At this stage speed is NOT a requirement, but the inherit ability of the
algorithm to avoid collisions within the domain is extremely important. I
would prefer that the target hash size to be 64bit, but larger would be fine
if I could guarantee that with the algorithm would generate no collisions
within the above domain, eg 7bit ASCII (really just 'a..z', 'A..Z', '0..9',
'_' characters only), with token length no larger than 32 characters.

PS. I still haven't finalised what the largest token/label length should be,
is 32 characters a good size, or should I make this smaller or larger? How
often would people use labels larger than say 16 or 32 characters?
PPS. This is for a mini project of mine to build my own HLA. (don't worry
Randy, I have no desire to rule the HLA niche of assemblers, just to learn
about lexical scanning and parsing in a lot more detail, and obviously learn
the basics of language development).

-- 
Darran (aka Chewy509)... 


Relevant Pages

  • Re: Encryption ??
    ... algorithm itself, at least not in this area. ... I stated, not very clearly, that the algorithm as implemented had a potential buffer overflow problem when it was used correctly i.e. when the text was an exact multiple of 8. ... Notice now that not only don't we have buffer overflow, but it still seems like we encrypt and decrypt the entire string even though I removed the memcpy function that was supposed to copy these extra characters. ...
    (comp.lang.clipper)
  • Re: Simple Algorithm (Perfect Hashing?)
    ... I'd like to convert an ascending sequence of variable length numbers ... This could be then used to populate the database. ... that there must be an algorithm that does something like the ... Collisions are naturally not permitted! ...
    (comp.programming)
  • Re: Beginners Algorithm
    ... > For fun I decided to whip up an encryption algorithm using Java. ... > character map technique so I could choose the valid input/output characters. ... > characters in the encrypted string ... If you want to encrypt in Java, implement any of the five AES finalists. ...
    (sci.crypt)
  • Re: Web Site Hackers
    ... method other than a bug in the password algorithm. ... I wonder what determines an 'easy to crack PW'! ... If you have a password of "kcifix", 6 characters long all small letters. ... to the power of 6 combinations. ...
    (rec.outdoors.rv-travel)
  • Re: best language for text string search
    ... That is a small data set. ... characters and then searching for a numeric string within as it was ... are often best done using trees and Fortran has very poor support for ... I suspect such an algorithm will literally build the ...
    (comp.programming)