Re: Computational complexity of cluster detection



vincent64@xxxxxxxxx <datashaping@xxxxxxxxx> wrote:
It also suggests that the
computational complexity of computing whether a data set contains one
of more clusters is well below O(n), possibly O(n0.5), if one uses
sampling techniques.

My first inclination is to respond that it's misleading to claim an
asymptotic complexity for solving a problem before the problem has been
sufficiently specified. There are many, many problems for which solving
something close to the problem is much easier than solving the problem
itself. That's only valuable if one can say more about in what ways the
answer is "close", and what that means in terms of the problem one
intends to solve.

If I were you and wanted to proceed with your idea, I'd do the
following:

* Survey how statisticians define "clusters" of data points and find a
reasonable, common, and formalized mathematical characterization of
clusters.

* Investigate how your sample-based algorithms relate to the probability
of having identified a cluster for various kinds of random data
distributions for the complete data set.

* Be sure to pay attention to worst-case data; how would one design
a data set to fool your algorithm, and why is data that looks like
that going to be likely or unlikely in practical situations?

* State the results in a way that involves specific verifiable or
provable conclusions about well-defined situations. If you can make
a valid statement about even a very limited but well-specified model,
then you have something. If you can only say things that start with
"In general, ..." or similar, then you don't.

--
Chris Smith
.



Relevant Pages

  • Re: spatial autocorelation methods
    ... > a 'detect' in the other data set). ... If the real potential sum is ... Developing clusters is a different thing than correlation I should think. ... As far as lateral movement I meant if the satellite images are of the same ...
    (sci.image.processing)
  • Re: Reconstruction of a continuous function starting from a multidimensional data set
    ... > The region is connected and I do have a data set of points which lie ... nearby internal points into a hypersphere until such a hypersphere ... Then trying to find an overlapping ... eventually this is going to run into hypersphere clusters ...
    (sci.math)
  • R clustering using diana and Calinsky and Harabasz Index
    ... managed to create dendrogram for this data set using diana() in R, ... however this only gives me the tree and not the clusters themselves. ...
    (sci.stat.edu)
  • K-MEANS CLUSTER ANALYSIS
    ... taking among students. ... I am trying to create groups of drug users ... Do I choose the final clusters of these results? ... this column of clusters will appear to the sample of the data set. ...
    (sci.stat.math)

Loading