Re: Hash function for int-aligned text (was: accessing char as intthrough union)



Hallvard B Furuseth wrote:

Thanks to several people for answers and explanations, in particular
websnarf. A few details:

websnarf@xxxxxxxxx writes:
Hallvard B Furuseth wrote:

.... snip ...

The keys are words in a dictionary or text file, average size
just 5-6 characters + the padding zeroes. (And most searches
find nothing.)

Oh well, these are extremely short strings.

Actually that was quite wrong, I seem to have counted a test sample
instead or something at a late night. 5-6 letters is more like it
for some inputs, ~10 for others.

What you're going to what to do is special case them by *length*,
not alignment. Then just completely unroll the hash computation
for all the short cases (say up to 16 characters or so) and bail
out with a more general hasher (like mine, or Bob Jenkins').

Aah, of course.

Of course the overhead of splitting off the long strings, and thus
of calling strlen on them, will probably overshadow any gains,
especially when the expected lengths are short. You can't really
say anything without making measurements on the actual data.

--
Some informative links:
<news:news.announce.newusers
<http://www.geocities.com/nnqweb/>
<http://www.catb.org/~esr/faqs/smart-questions.html>
<http://www.caliburn.nl/topposting.html>
<http://www.netmeister.org/news/learn2quote.html>
<http://cfaj.freeshell.org/google/>


.



Relevant Pages

  • Re: Great SWT Program
    ... "control-Z to undo" is as natural as breathing. ... obtuse and difficult to learn when the arrow keys are ... characters and then type in the resulting number before hitting my ...
    (comp.lang.java.programmer)
  • Re: transforming german characters
    ... Thank you John, this is really useful. ... [snip - my statement of problem] ... ># Use a character class because all keys are single characters ...
    (comp.lang.perl.misc)
  • Re: general design issue
    ... i'd like to know if there are keys starting with given prefix + 'a', ... Different dialects of SQL use diferent characters for ... "key" is a reserved word in a number of SQL dialects. ...
    (comp.databases)
  • Re: perl style: can I combine two steps into one?
    ... > I have a hash where the keys are strings and the values are integers. ... > I wanted to produce a report of the name/value pairs, ...
    (comp.lang.perl.misc)
  • Re: store collections of strings in tdb or gdbm
    ... >> - Each record consists of a fixed number of strings. ... I know nothing about tdb so feel free to ... string_type contains no special characters. ...
    (comp.programming)