# Re: Deleting elments of a list which are in it more than once

From: bill hogan (sequitur_at_sonic.net)
Date: 02/19/05

```Date: Sat, 19 Feb 2005 21:15:04 GMT

```

> W wrote:
>
>> This algorithm is O(N**2) -- for each element in the list, you go through
>> the remainder of the list removing it.
>> suppose the list were [a,b,a,b,a,b,....,a,b] where .... stands for a
>> lot of
>> a's and b's.
>> then you would go throught the list once, removing a, and then go through
>> the list again (although 1/2 as big) removing b.
>> An alternative would be to keep track of what you have removed (eg a
>> and b)
>> and check that list as you go through the original list.
>> In other words, check each new element with the accumulating list rather
>> than the existing list.
>
>
> You are still on the O(N**2) side.
>
> A better algorithm is:
>
> - conveniently prepare the list for sorting [O(n)]
> - sort the list [O(NlogN)]
> - mark duplicates [O(N)] and
> - unsort the list while ignoring dups [O(N)]
>
> that is O(NlogN) globally:
>
>
> prepare([], []).
> prepare([H|T], [(H,_)|T1]) :-
> prepare(T, T1).
>
> mark_dups([]).
> mark_dups([(Key, no_del)|More]) :-
> mark_dups(Key, More).
>
> mark_dups(Key, [(Ele, del)|More]) :-
> Key == Ele, !,
> mark_dups(Key, More).
> mark_dups(_, L) :-
> mark_dups(L).
>
> clean([], []).
> clean([(_, del)|M], M1) :-
> !,
> clean(M, M1).
> clean([(H, no_del)|M], [H|M1]) :-
> clean(M, M1).
>
> rm_dups(L, O) :-
> prepare(L, L1),
> sort(L1, L2), /* In SWI-Prolog 'sort/2' removes duplicates; 'msort/2' does not. */
> mark_dups(L2),
> clean(L1, L3), /* Shouldn't this be 'clean(L2,L3)' ? */
> L3 = O.
>
> :- I = [a,a,b,h,a,j,b,y,a,h,h], rm_dups(I, O), writeln(O).
>
>
> Cheers,
>
>

Thank you!

But why not simply

rm_dups(L, O) :-
msort(L, M), /* N.B. In SWI-Prolog 'sort/2' removes duplicates;
'msort/2' does not. */
clean(M, O).

clean([],[]).
clean([Keep|Terms],[Keep|Keepers]) :-
del_if_same(Keep, Terms, Keepers).

del_if_same(Keep,[Next|Terms], Keepers) :-
Keep == Next,
!,
del_if_same(Keep,Terms,Keepers).
del_if_same(_,Terms,Keepers) :-
clean(Terms, Keepers).

?

This gives

?- time((make_list(10000,I),rm_dups(I,O),length(O,N))).
% 110,008 inferences, 0.31 CPU in 0.33 seconds (93% CPU, 354865 Lips)

I = [9999, 10001, 9998, 10000, 9997, 9999, 9996, 9998, 9995|...]
O = [0, 1, 2, 3, 4, 5, 6, 7, 8|...]
N = 10002

By contrast, my

unique2(L1,L2) :-
unique_members2(L1,[],L2).

unique_members2([],In,In).
unique_members2([X|Rest],In,Out) :-
member(X,In),!,
unique_members2(Rest,In,Out).
unique_members2([X|Rest],In,Out) :-
unique_members2(Rest,[X|In],Out).

gives

?- time((make_list(10000,I),unique2(I,O),length(O,N))).
% 50,125,005 inferences, 87.93 CPU in 88.12 seconds (100% CPU, 570056 Lips)

I = [9999, 10001, 9998, 10000, 9997, 9999, 9996, 9998, 9995|...]
O = [0, 1, 2, 3, 4, 5, 6, 7, 8|...]
N = 10002

In an attempt to explain this huge difference to myself,
I was led to reformulate 'unique2' (above) as follows:

unique3(L1,L2) :-
msort(L1,M),
unique_members3(M,[],L2).

unique_members3([],In,In).
unique_members3([X|Rest],In,Out) :-
In=[X|_],
!,
unique_members3(Rest,In,Out).
unique_members3([X|Rest],In,Out) :-
unique_members3(Rest,[X|In],Out).

This gives

?- time((make_list(10000,I),unique3(I,O),length(O,N))).
% 90,006 inferences, 0.34 CPU in 0.35 seconds (98% CPU, 264724 Lips)

I = [9999, 10001, 9998, 10000, 9997, 9999, 9996, 9998, 9995|...]
O = [10001, 10000, 9999, 9998, 9997, 9996, 9995, 9994, 9993|...]
N = 10002

which I think compares favorably to the results I get from your 'rm_dups'
(replacing 'sort' with 'msort' and modified per my comment above):

?- time((make_list(10000,I),rm_dups(I,O),length(O,N))).
% 150,011 inferences, 0.65 CPU in 0.67 seconds (97% CPU, 230786 Lips)

I = [9999, 10001, 9998, 10000, 9997, 9999, 9996, 9998, 9995|...]
O = [0, 1, 2, 3, 4, 5, 6, 7, 8|...]
N = 10002

Altogether very interesting and not a lesson I am likely to soon forget.

```--
```