Re: On the complexity of determining whether n numbers are distinct




Googmeister wrote:
eKo1 wrote:
Googmeister wrote:
Here's my attempt.

I'll assume that we can only access the data via pairwise comparisons.
Since the algorithm must work if all of the elements are distinct,
let's assume a[1] < a[2] < ... < a[n]. Any correct algorithm for
element distinctness must compare a[i] with a[i+1] for each i. If not,
then the algorithm would produce the same output if we changed a[i+1]
to a[i] (but this would change the answer from no duplicates to has
duplicates). The set of comparisons the algorithm uses forms a DAG,
with an edge from the smaller element in the comparison to the larger
element. Finding the total order (in linear time) yields the elements
in sorted order. Thus, any comparison-based algorithm for element
distinctness can be used for sorting n distinct elements, and the
standard n log n comparison-based sorting lower bound applies.

The most suspicious step of the proof is the one that begins "Any
correct algorithm".

This is virtually the same proof I read which I state is bogus. Please
explain how "The set of comparisons the algorithm uses forms a DAG,
with an edge from the smaller element in the comparison to the larger
element."

The proof I read says that whenever a[i] != a[j], add an edge from a[i]
to a[j] in the digraph if a[i] < a[j]. (Note that I'm doing another
comparison here.) Once I'm finished, I look for the vertex in the
digraph with no incomming edges (which will require more comparisons).
This is the least element in the list. I remove this vertex and its
outgoing edges and again look for a vertex with no incomming edges.
This will be the second largest element. Etc..

I don't see how "Finding the total order (in linear time)" is possible
with so many comparisons all over the place.

First, once you have the DAG, you don't need to make any more
comparisons between original elements.

You can find the total order in linear time by a linear time
topological sort routine, e.g., DFS.

I may have spoken too soon. How would DFS "find the total order in
linear time?" In the worst case, you have to run DFS from every vertex
because if you don't pick the vertex with no incomming edges to run DFS
from, then the order will always be incomplete.

Also, the time complexity of DFS is proportional to n plus the number e
of edges and e is proportional to n^2 since there is a directed edge
from a[i] to all other vertices > a[i].

.


Quantcast