Re: I need some guideance regarding parallel processing



On 2007-02-03 13:39:33 -0400, glen herrmannsfeldt <gah@xxxxxxxxxxxxxxxx> said:

Gordon Sande wrote:

(snip)

You provide no hint of why you need to compare ALL pairs.
(snip)

If you are doing metric comparisons then a typical algorithm of this
class will have an n log ( n ) setup cost with a log ( n ) cost for
each query. That makes finding the nearest neighbours of all elements
only n log ( n ). Even lousy code will run circles around a highly
polished n^2 code at your sizes.

In at least one post he mentions Euclidean distance. I am pretty
darn sure that there is an O(n log n) algorithm for that, even in
six dimensions. I am not sure if it gets faster or slower in more
dimensions.


The original paper on k-d trees by "Freidman, J. H., Bentley, J. L., and
Finkel, R. A. 1977. An Algorithm for Finding Best Matches in Logarithmic
Expected Time. ACM Trans. Math. Softw. 3, 3 (Sep. 1977), 209-226" was
for Euclidean as well as all the Minkowski metrics. There is lots of
literature since then. I have used k-d trees on ranks under the max
norm (L_infinity) n more than six dimensions so I know it works. The
coefficients grow a bit.


Any effort on optimizing code done before you work on choosing better
algorithms is a total complete waste of time unless you are a salesman
selling machine time. It is hard to be more emphatic on the issue of
getting good algorithms before you try to optimize things.

My quick calculation is that n/log2(n) is about 1,000,000, though it
might be a factor of two or three slower than that, but from centuries
down to minutes or hours is pretty good.

Algorithm change is often much better than just a few orders of
magnitude. It is a story that deserves to be repeated often and loudly.

At 1 microsecond each, n log2 n for n=3.6e7 is 15 minutes. Maybe
another factor of six since he wants the six closest point for each.

-- glen


.



Relevant Pages

  • Re: Surrogate factoring mysteries resolved
    ... > Now combinations of factors are important, as the proper algorithm ... does the number of divisors increase in a cubic fashion? ... No one doubts that factoring is hard. ...
    (sci.crypt)
  • Re: Surrogate factoring mysteries resolved
    ... > Now combinations of factors are important, as the proper algorithm ... does the number of divisors increase in a cubic fashion? ... No one doubts that factoring is hard. ...
    (sci.math)
  • Re: Bug/Gross InEfficiency in HeathFields fgetline program
    ... But exactly the opposite is true - clarity is lost in *your* version, ... If you think that's a complex algorithm, ... accordance with the Two Rules of Optimisation. ...
    (comp.lang.c)
  • Re: can this code be improved
    ... Why "numberOf" AND "Count"? ... Now we can see what the algorithm is within a few lines of code. ... refactor it further or rewrite it entirely. ...
    (comp.lang.java.programmer)
  • Re: fmincon and simulink
    ... That's okay, users can *snip* out unnecessary sections when they reply. ... This warning just means that FMINCON is going to use a different algorithm ... since the error message indicated that your nonlinear constraint ...
    (comp.soft-sys.matlab)