Re: Efficiently Extracting Identical Values From A List/Array

From: Karl Heinz Buchegger (kbuchegg_at_gascad.at)
Date: 02/21/05


Date: Mon, 21 Feb 2005 17:56:17 +0100

Adam Hartshorne wrote:
>
> I think you may have misunderstood,

Maybe

> there are no actual point
> coordinates. Simply a list of points, a list of lines and a list that is
> been used to link lines to the points.
>
> What I am concerned with is the linking list. So say the following
>
> I = {10,10,4,6,5,5}
>
> That says lines 1 and 2 are linked to node 10, line 3 to node 4 etc etc
>
> What I want is a result of the search that gives me
>
> O = {10{1,2}, 5{5,6}}

Same strategy.
Set up a helper datastructure

  struct SortHelper
  {
     int NodeIndex;
     int OriginalPosition;
  }

and create an array (or whatever) of that:

I = { 10, 4, 8, 10, 4, 5 }

becomes

  { 10, 1 }
  { 4, 2 }
  { 8, 3 }
  { 10, 4 }
  { 4, 5 }
  { 5, 6 }

Now sort that array according to NodeIndex:

  { 4, 2 }
  { 4, 5 }
  { 5, 6 }
  { 8, 3 }
  { 10, 1 }
  { 10, 4 }

and scan through it: there are 2 consecutive '4' Nodes in the list and they
appeared in the original I at positions 2 and 5. '5' is single and thus
of no interest to you (if I understand correctly), same for '8'. But then
there is 10 which occours 2 times in I at positions 1 and 4.

The strategy is always the same. If you need to compare each element with each
other element in a datastructure, you have a potential O(n^2) algorithm. If
possible (and often it is), sort that thing such that equal elements get
consecutive. Sorting is of order O(n*log(n)), plus an additional O(n) for
running through the data structure and sorting things out. Much better
then O(n^2) for large values of n.

-- 
Karl Heinz Buchegger
kbuchegg@gascad.at