Re: Similarity searching



glen herrmannsfeldt wrote:
justabeginner wrote:

If the number of arrays is N, array cardinality is C (would be 10 for
normal decimal arrays) and the array length is L, the number of
evaluations is proportional to (N)(N-1)(C**2)(L**2).


Where do you get this formula from?  Sorting the set introduces
a factor of N*log(N), if the arrays are big you won't in general
have to compare to the last element so I don't see the factor
of L**2, and I don't see where that C**2 factor is coming from
at all.



(snip)

Not quite sure where the log might come from, wouldn't that introduce
an irrational number into an integer set, therefore it is not an
operation which is likely to be applicable here? Seems like
introducing pi into integer arithmetic.


The log comes when you analyze the behavior for large N, and ignore smaller terms, and also ignore constant factors. There are sorting algorithms that compare each element to each other element, called O(n**2) (The leading term of N*(N-1)/2, ignoring the 1/2).

There are also sorting algorithms which don't compare each item to every other item, yet still generate a sorted result. Many of them are O(N*logN) for large N, though the low terms will be significant for very small N.

When you're comparing 2 arrays A & B of length L, A(1) needs to be
compared with B(1) only. So it should be proportional to L.


Array cardinality probably isn't a factor because you're just
comparing if an entry is the same.


Similarity searching is still an active science, but there are a lot of useful algorithms out there now. Approximate matching, or similarity searching as the title says, is one. Dynamic programming algorithms are popular if you need the best approximate match. Many other algorithms are faster but don't guarantee the best, but just a high probability of finding a good match.

-- glen

Anything to do with "fuzzy" logic (similarity searching)?

--

Gary Scott
mailto:garyscott@xxxxxxx

Fortran Library:  http://www.fortranlib.com

Support the Original G95 Project:  http://www.g95.org
-OR-
Support the GNU GFortran Project:  http://gcc.gnu.org/fortran/index.html

Why are there two?  God only knows.

If electricity comes from electrons, does morality come from morons?
.



Relevant Pages

  • Re: Similarity searching
    ... a factor of N*log, if the arrays are big you won't in general have to compare to the last element so I don't see the factor of L**2, and I don't see where that C**2 factor is coming from at all. ... There are also sorting algorithms which don't compare each item to every other item, yet still generate a sorted result. ... Similarity searching is still an active science, but there are a lot of useful algorithms out there now. ...
    (comp.lang.fortran)
  • Re: Similarity searching
    ... a factor of N*log, if the arrays are big you won't in general have to compare to the last element so I don't see the factor of L**2, and I don't see where that C**2 factor is coming from at all. ... There are also sorting algorithms which don't compare each item to every other item, yet still generate a sorted result. ... Dynamic programming algorithms are popular if you need the best approximate match. ...
    (comp.lang.fortran)
  • Re: Performance tuning
    ... implemented heuristics on algorithms to speed things up; ... on arrays - like sorting, but also like filtering, array ... I find it funny that switch uses a dictionary, ... then performance tuning for the sake of performance tuning ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: Array comparison
    ... And how do you compare two classes? ... does not explain why it can't compare two like-typed arrays. ... Free Pascal supports operator overloading. ... more inclined to say the reason is lack of customer demand for the feature. ...
    (alt.comp.lang.borland-delphi)
  • Re: Option Compare Statement
    ... both arrays are always the same type. ... compare text,. ... Your first post regarding this in the vb.controls newsgroup was, ... strings, let's say A and B. Then I want to compare their values. ...
    (microsoft.public.vb.general.discussion)