Re: something like switch in c

From: Edward G. Nilges (spinoza1111_at_yahoo.com)
Date: 07/09/04


Date: 9 Jul 2004 01:59:48 -0700

Programmer Dude <Chris@Sonnack.com> wrote in message news:<mdrre097oio77qk1fae9n1k69a5ckm90ua@4ax.com>...
> Edward G. Nilges writes:
>
> > [snip]
> >
> > [The above is extempore C: collegial correction is always appreciated:
>
> None is required. I've written similar several times.
>
> > Of course, this assumes that there's a large performance benefit to
> > doing switches on single characters as opposed to a series of more
> > straightforward string comparisions.
>
> The trade off is likely the need for great speed and/or a large number
> of strings to compare. The above technique gives you a fast first sort
> and can, in theory, reduce your total comparisons by 1/26. Or 1/52 if
> case matters, and maybe 1/52+ if you throw in some symbols.
>
> Another version of this makes searching a linked list much faster. It's
> similar to a skip list, but in this case maintain a separate fix-size
> table of, say, 28 slots. The middle 26 slots correspond to "A" thru "Z"
> and contain links to the first nodes in the list with a key that starts
> with their character. If the list has no matching nodes, the link is
> to the next node following (or null if no more nodes). The first table
> slot points to the very first node (or null), and the last slot points
> to the node(s or null) following the "Z" nodes. As above, double the
> inner table size and/or add symbols to expand the hash.

These techniques can indeed be very fast but may be superseded by
their (almost as fast) generalization, which is Knuth's classic
hashing algorithm: for arbitrary string x, compute its hash generally
as a function of the table size (in the simplest case mod).

Visit the hash location. If it is occupied scan forward to the end of
the table and, on reaching the end, scan starting at the beginning
until you get back to the starting point or find a matching, empty or
deleted entry. If you find an empty entry or a match, Miller Time. If
you find a delete you must note this place (as a good place to store
the new entry) but continue the search for a match.

Many ingenious "hashing" techniques of yore are but specializations of
this more general method.

In C you can elegantly compute the hash function.

I prefer this more general method simply because it is more flexible.
Of course the choice of hash function is critical.

If you are writing a compiler, for example, and some clown is using
Hungarian, you can't expect to use the first three characters of the
identifiers since they will hash to the same location.

It all depends on the empirical pattern of the actual keys. Of course,
this means it's feasible to write a self-tuning hash function. But
probably not necessary since there are functions that guarantee a
completely random hash for a given set of keys even if the keys have
patterns.

In fact their existence is theoretically provable if you consider the
worst case: each key is the same, as all the other keys. Some clown
has written some program that consists, let us say, of the statement
i=0 replicated 1000 times (seen worse).

The variable i would always hash to the same place if your hash
function is mod the size of the table, therefore your performance will
be O(n*n). But all you have to do is call your friend, the random
number generator, multiply his value times the value of i (whether its
complete numeric value or the value of some part), and mod this to the
table size to get a complete "scattering".

Paradoxically, i being always the SAME in this thought-experiment
makes rnd*value(i) mod tableSize completely random which is surprising
until you think about it, and reflect on what happens when value(i) is
unity. The value of the random number generator is UNCHANGED on
multiplication by unity or any unchanged value.

If on the other hand, the values of the keys change in a patterned
fashion this could I suppose bias the random number generator.

All this is probably somewhere in Knuth. It may sound patronizing but
is not intended, it is merely me thinking without bringing up French
theory.
>
> (That's what all this is, a super-fast hash into a list with very deep
> buckets. (And in this case, buckets with tails that link to the heads
> of the next bucket.) But it can still vastly increase performance if
> performance is an issue.)
>
Sometime in the 1980s I decided the hell with microefficiency and
realized that Knuth was generalizing cruder approaches better used in
many cases than a highly specialized appoach. YMMV: but usually when I
invent a new way of searching it sucks.

In C I generally took the value of the last character mod the size of
the table.

Using nothing more than the classic Knuth hash, a few years ago I
developed a reinvention of the Visual Basic collection that ran in 85%
of the time of the Microsoft collection. However my boss was kind
enough to remind me that Francesco Balena had beaten me by five years.
> > ...programmers in languages where the string appears to be a primitive
> > data type encounter performance hits because the string is composed
> > of many characters.
>
> All strings are comprised of many characters. (Where many is defined
> as "zero or more". :-)
>
> Now,... you can't possibly really think that, for example, VB compares
> strings in some fast atomic fashion that transcends, for example, C's
> character-by-character analysis.... Therefore, I conclude you're just
> trying to start a fight.

Who, me?

My claim was in fact that languages wherein "string" is a primitive
data type including Visual Basic, perl and Java, might be less
"efficient" than languages where the programmer has to invent special
purpose methods for strings such as C.

Therefore I don't claim that VB transcends C in this regard.

In fact, posters here have shown that using complex string expressions
in tight loops is bad in VB.

No reason therefore to argue.

However, I do feel that the micro-inefficiency is outweighed by the
advantage of having strings as a primitive data type.

In fact and in .Net VB string implementation is very fast. When the
string is modified in any way it is copied to a different location in
the heap (fast) and the old version is marked for deletion. Creates
some clutter and gc work but if you need frequent changes there are
string "objects" which support this.

I think the consensus is that strings, albeit complexes made of other
things, should be primitive in most languages. This is because the
great but secret scandal of programming is its fundamentally literary
nature, but I digress.
>
> Sorry. Not interested. Will talk programming. Nothing else.



Relevant Pages

  • Re: Hash keys
    ... hash with a symbol is not the same as using a string. ... symbols for my keys. ... YAML object into a hash, and didn't realize it was keyed with strings so ... What is the preferred method of keying hashes? ...
    (comp.lang.ruby)
  • Re: How to write a diff in VB6 for comparing two xml files?
    ... No, the best you could do is to read both into string and use StrCompbut it's inefficient and, but using the hash ... Private Declare Function CryptAcquireContext Lib "AdvAPI32.dll" Alias _ ... Dim HashAAs Byte, HashLenA As Long ...
    (microsoft.public.vb.general.discussion)
  • Re: How to make PKCS#7 signature using CryptoAPI?
    ... Those MSDN samples hash a string PLUS the null byte (so that it ... I tried your sample and had no problem verifying with openssl (after I added ... functions (including CryptSignMessage). ...
    (microsoft.public.platformsdk.security)
  • Re: How to make PKCS#7 signature using CryptoAPI?
    ... "Mitch Gallant" wrote: ... Those MSDN samples hash a string PLUS the null byte (so that it ... functions (including CryptSignMessage). ...
    (microsoft.public.platformsdk.security)
  • Re: Base36
    ... static string tokens = ... But - I don't think you want all those silly characters in the product key. ... I should be able to recalc the hash at the client ... > conversion to long so I can pass each long to the BaseXX converter to get ...
    (microsoft.public.dotnet.languages.csharp)