Re: Fastest DFA recognizer




"Torben Ægidius Mogensen" <torbenm@xxxxxxx> wrote in message
news:7z7hmygb05.fsf@xxxxxxxxxxxxxx
"Peter Olcott" <NoSpam@xxxxxxxxxxxxxx> writes:

I was wondering if there is a generally faster way to
implement a DFA
recognizer than by using a state transition matrix.

Not asymptotically faster, since it is linear (assuming
matrix lookup is
constant time). You can sometimes get faster code by
translating the
matrix into code (essentially a switch statement per
state) instead of
using an array, but it depends on the compiler which is
faster. Flex
does this.

I use a state transition matrix to look up the switch case
value. I can not imagine how to (for example) match 50
different keywords without the table and only use just the
switch statement. How is this done?


You can also halve the number of table lookups if you
match two
characters at a time. This makes the table enormously
larger, though.
If you use 8-bit characters, the table size is 256 x
states. If you
match two characters at a time, it is 65536 x states,
i.e., 256 times
larger.

Since matrix lookup is not constant time on modern
machines, this might
actually make your code slower. So you can do the
converse: Match half
a character (4 bits) at a time, which makes your table
roughly 32 x
states (since you need to insert extra states). This
reduces your table
by 8, which will more likely make it fit your cache.
Obviously, if
there are only a few states, it will anyway.

Torben


.