Re: the lowest number of comparisons in string matching
- From: "Tim Peters" <tim.one@xxxxxxxxxxx>
- Date: Fri, 22 Sep 2006 14:08:05 -0400
[laura]
I'm looking for the algorithm which does the lowest number of
comparison for the string matching problem (in the worst case).
I know that the complexity is liniar in the length of the strings, but
I'm interested in finding the actual number of comparisons.
The lowest number (theoretical) is m-n+1, but as far as I know no
algorithm has achieved this limit.
(where m is the lenght of the text and n is the length of the pattern).
Well, m-n+1 can't be right -- for example, when m=n the worst case is that
the pattern and the text are the same, in which case we obviously need n
comparisons to determine that they /are/ the same. For example, find "dog"
in "dog". Unless you make at least m=n=3 comparisons, you can't be sure it
matches.
If you allow for preprocessing of the pattern, then there are ways to build
searchers for general regular expressions (of which a fixed string is a
simple special case) that look at each character of the text through the
first match exactly once. How you count comparisons is a bit fuzzy then,
since a recognizing automaton generally /branches/ on the current text
character via some form of indexed lookup table. For the "dog" example:
state := S0
for i := 1 through m:
switch on state:
case S0: /* looking for initial 'd' */
if text[i] = 'd':
state := S1
continue looping
case S1: /* have 'd', looking for 'o' */
if text[i] = 'o':
state := S2
else:
state := S0
continue looping
case S2: /* have 'do', looking for 'g' */
if text[i] := 'g':
return success /* leftmost 'dog' ends at text[i] */
else:
state := S0
continue looping
return failure /* 'dog' doesn't appear in the text */
That's so simple I didn't need a lookup table for the text characters, and
it obviously does no more than m comparisons (exactly one comparison is done
per trip through the loop, and the loop executes no more than m times). You
could complicate it a bit more to return failure at the start of state S0 if
i > m-2, and at the start of S1 if i > m-1 (in which cases, not enough
characters remain in the text for it to be possible to reach state S2). You
can also replace the loop and the switch on the state with a pile of gotos
coupled with a bunch of tedious code duplication.
While it may not be immediately obvious, it's possible to construct an
essentially similar searcher for any pattern. Such constructions can be
found in any text covering finite automata and regular languages.
.
- Follow-Ups:
- Re: the lowest number of comparisons in string matching
- From: Tor Myklebust
- Re: the lowest number of comparisons in string matching
- References:
- Prev by Date: Re: (Probably flawed) Polynomial time Graph Isomorphism
- Next by Date: Re: (Probably flawed) Polynomial time Graph Isomorphism
- Previous by thread: the lowest number of comparisons in string matching
- Next by thread: Re: the lowest number of comparisons in string matching
- Index(es):
Relevant Pages
|