Re: Deleting elements of a list which are in it more than once

From: bill hogan (sequitur_at_sonic.net)


Date: Sat, 19 Feb 2005 21:15:04 GMT

Salvador Fandiño wrote:
> W wrote:
>
>> This algorithm is O(N**2) -- for each element in the list, you go through
>> the remainder of the list removing it.
>> Suppose the list were [a,b,a,b,a,b,....,a,b] where .... stands for a lot
>> of a's and b's. Then you would go through the list once, removing a, and
>> then go through the list again (although 1/2 as big) removing b.
>> An alternative would be to keep track of what you have removed (e.g. a
>> and b) and check that list as you go through the original list.
>> In other words, check each new element against the accumulating list
>> rather than the existing list.
>
>
> You are still on the O(N**2) side.
>
> A better algorithm is:
>
> - conveniently prepare the list for sorting [O(N)]
> - sort the list [O(N log N)]
> - mark duplicates [O(N)] and
> - unsort the list while ignoring dups [O(N)]
>
> That is O(N log N) globally:
>
>
> prepare([], []).
> prepare([H|T], [(H,_)|T1]) :-
>     prepare(T, T1).
>
> mark_dups([]).
> mark_dups([(Key, no_del)|More]) :-
>     mark_dups(Key, More).
>
> mark_dups(Key, [(Ele, del)|More]) :-
>     Key == Ele, !,
>     mark_dups(Key, More).
> mark_dups(_, L) :-
>     mark_dups(L).
>
> clean([], []).
> clean([(_, del)|M], M1) :-
>     !,
>     clean(M, M1).
> clean([(H, no_del)|M], [H|M1]) :-
>     clean(M, M1).
>
> rm_dups(L, O) :-
>     prepare(L, L1),
>     sort(L1, L2),   /* In SWI-Prolog 'sort/2' removes duplicates; 'msort/2' does not. */
>     mark_dups(L2),
>     clean(L1, L3),  /* Shouldn't this be 'clean(L2,L3)' ? */
>     L3 = O.
>
> :- I = [a,a,b,h,a,j,b,y,a,h,h], rm_dups(I, O), writeln(O).
>
>
> Cheers,
>
> - Salvador
>

   Thank you!

   But why not simply

  rm_dups(L, O) :-
      msort(L, M),   /* N.B. In SWI-Prolog 'sort/2' removes duplicates;
                        'msort/2' does not. */
      clean(M, O).

  clean([], []).
  clean([Keep|Terms], [Keep|Keepers]) :-
      del_if_same(Keep, Terms, Keepers).

  del_if_same(Keep, [Next|Terms], Keepers) :-
      Keep == Next,
      !,
      del_if_same(Keep, Terms, Keepers).
  del_if_same(_, Terms, Keepers) :-
      clean(Terms, Keepers).

?
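  (On your sample list [a,a,b,h,a,j,b,y,a,h,h], this version yields
O = [a, b, h, j, y]: the duplicates are gone, but the elements come out in
the standard order of terms rather than in their original order.)

  The 'make_list/2' used in the timings below is not shown here. Judging
from the output, a definition along these lines -- each K from N down to 1
contributing K-1 and K+1 -- would reproduce it (this is only a guess, not
necessarily the predicate actually used):

  /* Hypothetical reconstruction of make_list/2: builds a list of length 2*N
     with many duplicates, whose distinct elements are 0 .. N+1. */
  make_list(0, []).
  make_list(K, [A,B|Rest]) :-
      K > 0,
      A is K - 1,           % e.g. 9999  for K = 10000
      B is K + 1,           % e.g. 10001 for K = 10000
      K1 is K - 1,
      make_list(K1, Rest).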

Timing 'rm_dups' gives

  ?- time((make_list(10000,I),rm_dups(I,O),length(O,N))).
  % 110,008 inferences, 0.31 CPU in 0.33 seconds (93% CPU, 354865 Lips)

  I = [9999, 10001, 9998, 10000, 9997, 9999, 9996, 9998, 9995|...]
  O = [0, 1, 2, 3, 4, 5, 6, 7, 8|...]
  N = 10002

By contrast, my

  unique2(L1,L2) :-
     unique_members2(L1,[],L2).

  unique_members2([],In,In).
  unique_members2([X|Rest],In,Out) :-
     member(X,In),!,
     unique_members2(Rest,In,Out).
  unique_members2([X|Rest],In,Out) :-
     unique_members2(Rest,[X|In],Out).

gives

?- time((make_list(10000,I),unique2(I,O),length(O,N))).
% 50,125,005 inferences, 87.93 CPU in 88.12 seconds (100% CPU, 570056 Lips)

I = [9999, 10001, 9998, 10000, 9997, 9999, 9996, 9998, 9995|...]
O = [0, 1, 2, 3, 4, 5, 6, 7, 8|...]
N = 10002

  In an attempt to explain this huge difference to myself (the member/2 check
against the ever-growing accumulator is what makes 'unique2' O(N**2)), I was
led to reformulate 'unique2' (above) as follows:

  unique3(L1,L2) :-
     msort(L1,M),
     unique_members3(M,[],L2).

  unique_members3([],In,In).
  unique_members3([X|Rest],In,Out) :-
     In=[X|_],
     !,
     unique_members3(Rest,In,Out).
  unique_members3([X|Rest],In,Out) :-
     unique_members3(Rest,[X|In],Out).

  This gives

?- time((make_list(10000,I),unique3(I,O),length(O,N))).
% 90,006 inferences, 0.34 CPU in 0.35 seconds (98% CPU, 264724 Lips)

I = [9999, 10001, 9998, 10000, 9997, 9999, 9996, 9998, 9995|...]
O = [10001, 10000, 9999, 9998, 9997, 9996, 9995, 9994, 9993|...]
N = 10002

which I think compares favorably to the results I get from your 'rm_dups'
(replacing 'sort' with 'msort' and modified per my comment above):

  ?- time((make_list(10000,I),rm_dups(I,O),length(O,N))).
  % 150,011 inferences, 0.65 CPU in 0.67 seconds (97% CPU, 230786 Lips)

  I = [9999, 10001, 9998, 10000, 9997, 9999, 9996, 9998, 9995|...]
  O = [0, 1, 2, 3, 4, 5, 6, 7, 8|...]
  N = 10002
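
  (One more option, if your SWI-Prolog has 'list_to_set/2' from
library(lists): it removes duplicates while keeping the first occurrence of
each element in its original position, e.g.

    ?- list_to_set([a,a,b,h,a,j,b,y,a,h,h], S).
    S = [a, b, h, j, y]

No timing for it here, though.)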

  Altogether very interesting and not a lesson I am likely to soon forget.

--