An O(NP) Sequence Comparison Algorithm
From: Michael B Allen (mba2000_at_ioplex.com)
Date: 04/14/04
- Next message: Michael Wojcik: "Re: future of programming languages"
- Previous message: Andy B: "Seeking text editor to provide text hiding/gathering.. and on reopening..."
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Wed, 14 Apr 2004 15:49:05 -0400
The code below implements the algorithm described in the following paper:
S. Wu, E. Myers, U. Manber, and W. Miller,
``An O(NP) Sequence Comparison Algorithm,''
Information Processing Letters 35, 6 (1990), 317-323.
http://www.cs.arizona.edu/people/gene/vita.html
This is more commonly known as the shortest edit script, edit distance,
and is a dual of the longest common subsequence problem and perhaps most
commonly known as the algorithm used to implement the diff(1) program [1].
Q: How can the below code be modified to permit scoring such as described
in 11.5.2 of Gusfield's Algorithms on Strings, Trees, and Sequences? I am
concerned that because this algorithm uses the compressed edit distance
that the calculation may not be as simple as modifying the fp function.
Thanks,
Mike
[1] Actually I think GNU diff implements the O(ND) algorithm in which
case this would actually be faster.
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#define MAX(a,b) ((a) > (b) ? (a) : (b))
struct ses {
int fp[4096];
const char *s1;
const char *s2;
int n;
int m;
int delta;
};
int
snake(struct ses *ses, int k, int y)
{
int x = y - k;
while (x < ses->m && y < ses->n && ses->s1[y + 1] == ses->s2[x + 1]) {
x++;
y++;
}
return y;
}
int
fp(struct ses *ses, int k, int p)
{
if (p < 0 || k < -p || k > (ses->delta + p)) {
return -1;
}
return ses->fp[ses->m + k];
}
int
compare(const char *s1, int sn1, const char *s2, int sn2)
{
struct ses ses;
int p, k, y;
ses.s1 = s1;
ses.s2 = s2;
ses.n = sn1;
ses.m = sn2;
ses.delta = sn1 - sn2;
memset(ses.fp, -1, 4096);
p = -1;
do {
p++;
for (k = -p; k < ses.delta; k++) {
ses.fp[ses.m + k] = snake(&ses, k, MAX(fp(&ses, k - 1, p) + 1, fp(&ses, k + 1, p - 1)));
}
for (k = ses.delta + p; k > ses.delta; k--) {
ses.fp[ses.m + k] = snake(&ses, k, MAX(fp(&ses, k - 1, p - 1) + 1, fp(&ses, k + 1, p)));
}
k = ses.delta;
y = ses.fp[ses.m + k] = snake(&ses, k, MAX(fp(&ses, k - 1, p) + 1, fp(&ses, k + 1, p)));
} while (y != ses.n);
return ses.delta + 2 * p;
}
int
main(void)
{
const char *s1 = " acebdabbabed";
const char *s2 = " acbdeacbed";
int d;
d = compare(s1, strlen(s1) - 1, s2, strlen(s2) - 1);
printf("%d\n", d);
return EXIT_SUCCESS;
}
Permission is granted to all to use this code. This work in whatever
form will ultimately be freely available in my libmba library.
- Next message: Michael Wojcik: "Re: future of programming languages"
- Previous message: Andy B: "Seeking text editor to provide text hiding/gathering.. and on reopening..."
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|