Re: I need some guidance regarding parallel processing

On 2007-02-03 13:39:33 -0400, glen herrmannsfeldt <gah@xxxxxxxxxxxxxxxx> said:

Gordon Sande wrote:


You provide no hint of why you need to compare ALL pairs.

If you are doing metric comparisons then a typical algorithm of this
class will have an n log ( n ) setup cost with a log ( n ) cost for
each query. That makes finding the nearest neighbours of all elements
only n log ( n ). Even lousy code will run circles around a highly
polished n^2 code at your sizes.

In at least one post he mentions Euclidean distance. I am pretty
darn sure that there is an O(n log n) algorithm for that, even in
six dimensions. I am not sure if it gets faster or slower in more
dimensions.

The original paper on k-d trees, "Friedman, J. H., Bentley, J. L., and
Finkel, R. A. 1977. An Algorithm for Finding Best Matches in Logarithmic
Expected Time. ACM Trans. Math. Softw. 3, 3 (Sep. 1977), 209-226", covered
the Euclidean as well as all the Minkowski metrics. There is lots of
literature since then. I have used k-d trees on ranks under the max
norm (L_infinity) in more than six dimensions, so I know it works. The
coefficients grow a bit.
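To make the idea concrete, here is a minimal pure-Python sketch of a k-d
tree with a k-nearest-neighbour query under the squared Euclidean metric.
All names here are my own, and in practice a library implementation such
as scipy.spatial.cKDTree would be the sensible choice; this is only to
show the median-split build and the pruned search that give the
logarithmic expected query cost.

```python
import heapq
import random

def build_kdtree(points, depth=0):
    """Recursively build a k-d tree; each node is (point, left, right).

    Sorting at every level makes this an O(n log^2 n) build, which is
    fine for a sketch."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def knn(tree, target, k, depth=0, heap=None):
    """Return a heap of (-squared_distance, point) for the k points
    nearest to target."""
    if heap is None:
        heap = []
    if tree is None:
        return heap
    point, left, right = tree
    dist = sum((a - b) ** 2 for a, b in zip(point, target))
    # Max-heap via negated distances: keep the k best points seen so far.
    if len(heap) < k:
        heapq.heappush(heap, (-dist, point))
    elif dist < -heap[0][0]:
        heapq.heapreplace(heap, (-dist, point))
    axis = depth % len(target)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    knn(near, target, k, depth + 1, heap)
    # Only descend the far side if the splitting plane is closer than
    # the current k-th best distance -- this pruning is where the
    # logarithmic expected query time comes from.
    if len(heap) < k or diff * diff < -heap[0][0]:
        knn(far, target, k, depth + 1, heap)
    return heap

if __name__ == "__main__":
    random.seed(0)
    pts = [tuple(random.random() for _ in range(6)) for _ in range(1000)]
    tree = build_kdtree(pts)
    best = knn(tree, pts[0], 7)  # the point itself plus its 6 neighbours
    print(sorted(-d for d, _ in best)[0])  # the query point, at 0.0
```

Querying with k=7 and discarding the zero-distance hit gives the six
nearest neighbours of a point that is itself in the data set.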

Any effort spent optimizing code before you work on choosing a better
algorithm is a complete waste of time, unless you are a salesman
selling machine time. It is hard to be too emphatic on the issue of
getting good algorithms before you try to optimize things.

My quick calculation is that n/log2(n) is about 1,000,000. The actual
gain might be a factor of two or three less than that, but going from
centuries down to minutes or hours is pretty good.
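The arithmetic behind that estimate can be checked directly (counting one
distance comparison as one unit of work, and using the n = 3.6e7 figure
from later in the thread):

```python
import math

n = 3.6e7                   # 36 million elements
speedup = n / math.log2(n)  # n^2 work divided by n*log2(n) work
print(f"{speedup:.2e}")     # about 1.4e+06
```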

An algorithm change often buys much more than a few orders of
magnitude. It is a story that deserves to be repeated often and loudly.

At 1 microsecond each, n log2 n for n=3.6e7 is 15 minutes. Maybe
another factor of six, since he wants the six closest points for each.
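The 15-minute figure works out as follows (assuming, as stated, 1
microsecond per operation):

```python
import math

n = 3.6e7
ops = n * math.log2(n)     # about 9.0e8 operations
minutes = ops * 1e-6 / 60  # at 1 microsecond per operation
print(round(minutes))      # roughly 15
```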

-- glen