Re: eliminate duplicate rows in 2D array





zyf wrote:
Hi

I have a 2D real array A[1e5,2] where some rows are same like the following

| .... |
| x  y |
| .... |
| x  y |
| .... |
| x  y |
| .... |

the number of duplicate rows is not known in advance.
And the number of rows that is duplicated isn't known in advance too.

Are there any way to eliminate the duplicate rows very fast?

Seems like a rerun of a moderately recent question.

The usual answer is to sort and then compare the adjacent entries.
Sorting is n log ( n ).

(There is a (foolish) complexity result that if arithmetic is
free then the existence of duplicates can be determined in
constant time by evaluating a suitable interpolating polynomial.
No information on which are the duplicates. This result mostly
just shows that the complexity model matters!)

Since you do not specify what you intend to do once the duplicates
are removed one can guess that sorting will not upset the intended
uses. If you need to keep the current order then do a "formal" sort
where you set up an array of pointers (subscripts in practice) and
just move the pointers around. The sorting will have to be a two
field sort in which you sort on x and break ties by using y, or the
other way around.

But then when n is small, most things a reasonable modern computer
will do will be very fast so just comparing all pairs will get the job
done.
.