Re: On the complexity of determining whether n numbers are distinct




Googmeister wrote:
eKo1 wrote:
Googmeister wrote:
eKo1 wrote:
Googmeister wrote:
Here's my attempt.

I'll assume that we can only access the data via pairwise comparisons.
Since the algorithm must work if all of the elements are distinct,
let's assume a[1] < a[2] < ... < a[n]. Any correct algorithm for
element distinctness must compare a[i] with a[i+1] for each i. If not,
then the algorithm would produce the same output if we changed a[i+1]
to a[i] (but this would change the answer from no duplicates to has
duplicates). The set of comparisons the algorithm uses forms a DAG,
with an edge from the smaller element in the comparison to the larger
element. Finding the total order (in linear time) yields the elements
in sorted order. Thus, any comparison-based algorithm for element
distinctness can be used for sorting n distinct elements, and the
standard n log n comparison-based sorting lower bound applies.

The most suspicious step of the proof is the one that begins "Any
correct algorithm".

This is virtually the same proof I read which I state is bogus. Please
explain how "The set of comparisons the algorithm uses forms a DAG,
with an edge from the smaller element in the comparison to the larger
element."

The proof I read says that whenever a[i] != a[j], add an edge from a[i]
to a[j] in the digraph if a[i] < a[j]. (Note that I'm doing another
comparison here.) Once I'm finished, I look for the vertex in the
digraph with no incomming edges (which will require more comparisons).
This is the least element in the list. I remove this vertex and its
outgoing edges and again look for a vertex with no incomming edges.
This will be the second largest element. Etc..

I don't see how "Finding the total order (in linear time)" is possible
with so many comparisons all over the place.

First, once you have the DAG, you don't need to make any more
comparisons between original elements.

You can find the total order in linear time by a linear time
topological sort routine, e.g., DFS.

Yes, you're right. Very good. However, I'm a little skeptical about the
argument in the proof. You're basically saying that I can take an
algorithm to solve problem A, change it a bit to solve problem B and
since problem B has lower-bound k, then A is bounded by this
lower-bound as well regardless if the change in the algorithm added
extra operations.

Am I missing something here?

The argument boils down to:

1. Suppose, for the sake of contradiction, that there is an o(n log n)
comparison-based algorithm A for element distinctness.

2. Then we could modify A to make an o(n log n) comparison-based
algorithm B for sorting.

3. Since B cannot exist, neither can A.

How is this the same as my comment? And how are you establishing that B
does not exists? I'm getting confused.

.


Quantcast