Re: On the complexity of determining whether n numbers are distinct




Googmeister wrote:
Here's my attempt.

I'll assume that we can only access the data via pairwise comparisons.
Since the algorithm must work if all of the elements are distinct,
let's assume a[1] < a[2] < ... < a[n]. Any correct algorithm for
element distinctness must compare a[i] with a[i+1] for each i. If not,
then the algorithm would produce the same output if we changed a[i+1]
to a[i] (but this would change the answer from no duplicates to has
duplicates). The set of comparisons the algorithm uses forms a DAG,
with an edge from the smaller element in the comparison to the larger
element. Finding the total order (in linear time) yields the elements
in sorted order. Thus, any comparison-based algorithm for element
distinctness can be used for sorting n distinct elements, and the
standard n log n comparison-based sorting lower bound applies.

The most suspicious step of the proof is the one that begins "Any
correct algorithm".

This is virtually the same proof I read which I state is bogus. Please
explain how "The set of comparisons the algorithm uses forms a DAG,
with an edge from the smaller element in the comparison to the larger
element."

The proof I read says that whenever a[i] != a[j], add an edge from a[i]
to a[j] in the digraph if a[i] < a[j]. (Note that I'm doing another
comparison here.) Once I'm finished, I look for the vertex in the
digraph with no incomming edges (which will require more comparisons).
This is the least element in the list. I remove this vertex and its
outgoing edges and again look for a vertex with no incomming edges.
This will be the second largest element. Etc..

I don't see how "Finding the total order (in linear time)" is possible
with so many comparisons all over the place.

.


Quantcast