Algorithm which returns rank of how similar 2 strings are



From: "Stephen Howe" <sjhoweATdialDOTpipexDOTcom>
I believe there are algorithms which return a rank for how
similar 2 strings are.

It all depends on what *definition* you use for a measure of similarity.
One possible measure is the cosine of the angle between two
vectors, where each string maps to such a vector.

Here's a simple math exercise for you: If you have two unit
vectors, and you know the length of the difference vector d between
them, what is the cosine of the angle C, expressed as a function of d?
Hint: Use either similar triangles or trigonometry to work out the
answer; it comes out as a really simple quadratic.
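
For anyone who wants to check their work, here is one way to work the
exercise, using the dot product rather than triangles. The names u and
v for the two unit vectors are my own labels, not anything standard.

```latex
d^2 = \lVert u - v \rVert^2
    = u \cdot u - 2\, u \cdot v + v \cdot v
    = 2 - 2\cos C
\quad\Longrightarrow\quad
\cos C = 1 - \frac{d^2}{2}
```

So d = 0 gives a cosine of 1 (identical directions), d = sqrt(2) gives
0 (perpendicular), and d = 2 gives -1 (opposite directions).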

So my proposed answer is first to clearly define a vector space
model for your space of all possible strings, with a metric that
tells how *different* two vectors are, computed simply as the
Euclidean length of the difference vector. Then, to compute
similarity, first divide each vector by its length to get a pair of
unit vectors, compute the length d of the difference between those
unit vectors, and plug d into that formula to get the cosine of C.
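
A minimal sketch of that recipe in Python follows. The vector space
model is entirely my assumption for illustration: each string maps to
counts of its character bigrams, and you may well want a different
model. The function names are hypothetical too. The last step uses
cos C = 1 - d^2/2, which falls out of expanding the squared length of
the difference of two unit vectors.

```python
import math
from collections import Counter


def bigram_vector(s):
    """Map a string to a sparse vector of character-bigram counts.

    This choice of model is an assumption made for the example only.
    """
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def normalize(vec):
    """Divide a sparse vector by its Euclidean length to get a unit vector."""
    length = math.sqrt(sum(c * c for c in vec.values()))
    if length == 0:
        raise ValueError("string too short to give a nonzero vector")
    return {key: c / length for key, c in vec.items()}


def similarity(a, b):
    """Cosine of the angle between the two strings' vectors, computed via d."""
    u = normalize(bigram_vector(a))
    v = normalize(bigram_vector(b))
    # Squared length of the difference between the two unit vectors.
    keys = set(u) | set(v)
    d_squared = sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys)
    # For unit vectors, cos C = 1 - d^2 / 2.
    return 1.0 - d_squared / 2.0


if __name__ == "__main__":
    print(similarity("night", "nacht"))
```

With this particular model, identical strings score 1 and strings
sharing no bigrams score 0. Strings shorter than two characters have no
bigrams, so the sketch rejects them rather than dividing by zero.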