Re: eliminate duplicate rows in 2D array





zyf wrote:
Hi

I have a 2D real array A[1e5,2] where some rows are same like the following

| .... |
| x  y |
| .... |
| x  y |
| .... |
| x  y |
| .... |

the number of duplicate rows is not known in advance.
And the number of rows that is duplicated isn't known in advance too.

Are there any way to eliminate the duplicate rows very fast?

Seems like a rerun of a moderately recent question.

The usual answer is to sort and then compare the adjacent entries.
Sorting is n log ( n ).

(There is a (foolish) complexity result that if arithmetic is
free then the existence of duplicates can be determined in
constant time by evaluating a suitable interpolating polynomial.
No information on which are the duplicates. This result mostly
just shows that the complexity model matters!)

Since you do not specify what you intend to do once the duplicates
are removed one can guess that sorting will not upset the intended
uses. If you need to keep the current order then do a "formal" sort
where you set up an array of pointers (subscripts in practice) and
just move the pointers around. The sorting will have to be a two
field sort in which you sort on x and break ties by using y, or the
other way around.

But then when n is small, most things a reasonable modern computer
will do will be very fast so just comparing all pairs will get the job
done.
.



Relevant Pages

  • Re: Is there a method to quickly delete duplicate emails from a folder
    ... Is there a method to sort and delete the duplicates? ... using Outlook 2000. ... Maybe sorting by modified date would group together the items that got moved together by the auto-archive operation, then you could highlight them all and delete. ...
    (microsoft.public.outlook)
  • Re: updating filtered duplicates
    ... I do not want to delete the duplicates. ... I am putting the circnum and circdescr fields together into the circuit field ... He then changes either the circnum or circdescr field. ... > example deleting duplicate rows in a table. ...
    (microsoft.public.access.formscoding)
  • Re: Enforcing referential integrity in high freq RT table inserts
    ... "Erland Sommarskog" wrote in message ... I don't understand whether the duplicates ... resolution of the datetime data type, or are older rows somehow coming back? ... As you say we still need to know whether he can get duplicate rows from the same source, but if duplicate rows are coming in from different sources differentiating sources will help. ...
    (microsoft.public.sqlserver.programming)
  • Re: Merging Lists
    ... But if you're asking how to check for duplicates in a list and delete these, ... Select all these duplicate rows ... > zero value, ... > column, zero, would be value imported into the cell) ...
    (microsoft.public.excel.misc)
  • Delete Duplicates
    ... a query that will find duplicate rows in a main ... my duplicates query to find duplicates, I am able to identify the ... procedure on live data of 120,000 records, from the main table, I can find ...
    (microsoft.public.access.queries)