Re: Similarity searching
- From: "Gary L. Scott" <garyscott@xxxxxxx>
- Date: Sun, 03 Apr 2005 16:38:14 -0500
glen herrmannsfeldt wrote:
justabeginner wrote:
If the number of arrays is N, array cardinality is C (would be 10 for normal decimal arrays) and the array length is L, the number of evaluations is proportional to (N)(N-1)(C**2)(L**2).
Where do you get this formula from? Sorting the set introduces a factor of N*log(N), if the arrays are big you won't in general have to compare to the last element so I don't see the factor of L**2, and I don't see where that C**2 factor is coming from at all.
(snip)
Not quite sure where the log might come from, wouldn't that introduce an irrational number into an integer set, therefore it is not an operation which is likely to be applicable here? Seems like introducing pi into integer arithmetic.
The log comes when you analyze the behavior for large N, and ignore smaller terms, and also ignore constant factors. There are sorting algorithms that compare each element to each other element, called O(n**2) (The leading term of N*(N-1)/2, ignoring the 1/2).
There are also sorting algorithms which don't compare each item to every other item, yet still generate a sorted result. Many of them are O(N*logN) for large N, though the low terms will be significant for very small N.
When you're comparing 2 arrays A & B of length L, A(1) needs to be compared with B(1) only. So it should be proportional to L.
Array cardinality probably isn't a factor because you're just comparing if an entry is the same.
Similarity searching is still an active science, but there are a lot of useful algorithms out there now. Approximate matching, or similarity searching as the title says, is one. Dynamic programming algorithms are popular if you need the best approximate match. Many other algorithms are faster but don't guarantee the best, but just a high probability of finding a good match.
-- glen
Anything to do with "fuzzy" logic (similarity searching)?
--
Gary Scott mailto:garyscott@xxxxxxx
Fortran Library: http://www.fortranlib.com
Support the Original G95 Project: http://www.g95.org -OR- Support the GNU GFortran Project: http://gcc.gnu.org/fortran/index.html
Why are there two? God only knows.
If electricity comes from electrons, does morality come from morons? .
- Follow-Ups:
- Re: Similarity searching
- From: Gordon Sande
- Re: Similarity searching
- References:
- Similarity searching
- From: justabeginner
- Re: Similarity searching
- From: James Van Buskirk
- Re: Similarity searching
- From: justabeginner
- Re: Similarity searching
- From: glen herrmannsfeldt
- Similarity searching
- Prev by Date: Re: Free Fortran compiler that handles Cray-style pointers available?
- Next by Date: Re: INTERFACE problem
- Previous by thread: Re: Similarity searching
- Next by thread: Re: Similarity searching
- Index(es):
Relevant Pages
|