Re: divide & conquer algorithm to implement string matching



misterio <giagimari@xxxxxxxxx> writes:

On 29 Mag, 14:58, Richard Heathfield <r...@xxxxxxxxxxxxxxx> wrote:
Maybe I'm missing something, but I really don't see how d&c works for
you in this situation.
I'm not trying to use divide & conquer to find any real benefit from
it in this case,I'm just trying to understand how to properly use it
for this problem,because I need to understand this to prepare an
exam.Usually I can use divide & conquer to solve problems,but this
particular one is giving me some headaches.If I would have wanted to
use an effective algorithm to implement string matching,I would have
used the KMP algorithm.
Thank you for any kind of advice that you can give me.

Of course, to apply any rule of thumb, you have to adapt it to the
current circumstances.

Instead of "dividing" by cutting the search string exactly in two, you
can cut it so as to avoid missing any occurences.

pattern: 1011

string: 10100101110001010110

cut into:
string 1: 10100101110
string 2: 110001010110

Actually, you don't really need to cut the string itself, only the
range you search.

string 1: 10100101110
^ ^
| |
start1 end1

string 2: 110001010110
^ ^
| |
start2 end2



(defun d&c-search (pattern string start end)
(if (= start end)
(and (string= pattern string :start2 start :end2 (+ start (length pattern)))
start)
(or (d&c-search pattern string start (truncate (+ start end) 2))
(d&c-search pattern string (1+ (truncate (+ start end) 2)) end))))

(defun my-search (pattern string)
(d&c-search pattern string 0 (- (length string) (length pattern) -1)))



C/USER[15]> (trace d&c-search)
;; Tracing function D&C-SEARCH.
(D&C-SEARCH)
C/USER[16]> (my-search "abc" "xyzxyzabcxyyzya")
1. Trace: (D&C-SEARCH '"abc" '"xyzxyzabcxyyzya" '0 '13)
2. Trace: (D&C-SEARCH '"abc" '"xyzxyzabcxyyzya" '0 '6)
3. Trace: (D&C-SEARCH '"abc" '"xyzxyzabcxyyzya" '0 '3)
4. Trace: (D&C-SEARCH '"abc" '"xyzxyzabcxyyzya" '0 '1)
5. Trace: (D&C-SEARCH '"abc" '"xyzxyzabcxyyzya" '0 '0)
5. Trace: D&C-SEARCH ==> NIL
5. Trace: (D&C-SEARCH '"abc" '"xyzxyzabcxyyzya" '1 '1)
5. Trace: D&C-SEARCH ==> NIL
4. Trace: D&C-SEARCH ==> NIL
4. Trace: (D&C-SEARCH '"abc" '"xyzxyzabcxyyzya" '2 '3)
5. Trace: (D&C-SEARCH '"abc" '"xyzxyzabcxyyzya" '2 '2)
5. Trace: D&C-SEARCH ==> NIL
5. Trace: (D&C-SEARCH '"abc" '"xyzxyzabcxyyzya" '3 '3)
5. Trace: D&C-SEARCH ==> NIL
4. Trace: D&C-SEARCH ==> NIL
3. Trace: D&C-SEARCH ==> NIL
3. Trace: (D&C-SEARCH '"abc" '"xyzxyzabcxyyzya" '4 '6)
4. Trace: (D&C-SEARCH '"abc" '"xyzxyzabcxyyzya" '4 '5)
5. Trace: (D&C-SEARCH '"abc" '"xyzxyzabcxyyzya" '4 '4)
5. Trace: D&C-SEARCH ==> NIL
5. Trace: (D&C-SEARCH '"abc" '"xyzxyzabcxyyzya" '5 '5)
5. Trace: D&C-SEARCH ==> NIL
4. Trace: D&C-SEARCH ==> NIL
4. Trace: (D&C-SEARCH '"abc" '"xyzxyzabcxyyzya" '6 '6)
4. Trace: D&C-SEARCH ==> 6
3. Trace: D&C-SEARCH ==> 6
2. Trace: D&C-SEARCH ==> 6
1. Trace: D&C-SEARCH ==> 6
6
C/USER[17]>



But of course, this is taking the "divide" part much too literally! :-)

You don't need to cut the problem in two equal parts. You can just
divide the problem in two smaller problems, and one may be exactly the
same as the original problem but for some monotonous progression.

So if the problem is to search a substring in a string, you can divide
it in these two simplier problems:

- test if a string is a prefix of another string,
- search a substring in a _smaller_ string.

(defun my-search (pattern string)
(cond
((< (length string) (length pattern)) ; part of testing pattern as a prefix.
nil) ; If string is shorter, no.
((string= pattern string :end2 (length pattern)) ; prefix? position = 0
0)
(t ; now, search the pattern in a smaller string, starting from 1:
(let ((pos (my-search pattern (subseq string 1))))
(when pos (1+ pos))))))


Of course, here we copy a lot of tail substrings, so we can apply the
same trick as above, and only keep an index to the start of the
"smaller" string:


(defun d&c-search (pattern string start)
(cond
((< (- (length string) start) (length pattern))
nil)
((string= pattern string :start2 start :end2 (+ start (length pattern)))
start)
(t
(d&c-search pattern string (1+ start)))))

(defun my-search (pattern string)
(d&c-search pattern string 0))


--
__Pascal Bourguignon__ http://www.informatimago.com/

NOTE: The most fundamental particles in this product are held
together by a "gluing" force about which little is currently known
and whose adhesive power can therefore not be permanently
guaranteed.
.



Relevant Pages

  • Re: Fastcode Ansi StringReplace B&V 0.2
    ... the possibility of a stray trim function). ... The new pattern string includes leading and/or trailing blanks (eliminate ...
    (borland.public.delphi.language.basm)
  • Re: Regular expression parsers
    ... These could be changed to a single whole-match string and a separate capture-buffer array. ... Another possible feature would be not having to explicitly call regcomp, but allow a derived type to be initialized with just the pattern, then do the actual regcompupon it's first use. ... I had one version that parses a string argument using perl-style character flags. ... Alternatively, the flags could be included directly in the pattern string, but that means including a delimiter that has to be escaped elsewhere. ...
    (comp.lang.fortran)
  • Re: [PATCH 2/4] tracing/filters: Fix MATCH_FULL filter matching for PTR_STRING
    ... of the source string instead of the length of the pattern string. ... matching ... I changed what value ptr_string passes the length param to match, ...
    (Linux-Kernel)
  • Re: re beginner
    ... I have an oldscript with the following task - takes a string I copy-pasted and wich always has the same format: ... the format of the items you are looking for seems to be "some text" followed by a tab character followed by an integer. ... now, to turn this into a dictionary, you could split the returned strings on a tab character, but RE provides a better mechanism: ... by adding to the pattern string, you can mark the sections you want returned: ...
    (comp.lang.python)
  • Re: [QUIZ] Longest Repeated Substring (#153)
    ... My hack at the substring problem is based on suffix ... in the original string. ... def initialize ...
    (comp.lang.ruby)