Re: Comparing lists



On Sat, 15 Oct 2005 06:31:53 +0200, Christian Stapfer wrote:

> "jon" <jon_usenet@xxxxxxxxxxx> wrote in message
> news:1129216693.445956.155490@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>
>> To take the heat out of the discussion:
>>
>> sets are blazingly fast.
>
> I'd prefer a (however) rough characterization
> of computational complexity in terms of Big-Oh
> (or Big-whatever) *anytime* to marketing-type
> characterizations like this one...

Oh how naive.

The marketing department says: "It's O(N), so it is blindingly fast."

Translation: the amount of computation it does is linearly proportional
to N. The constant of proportionality is 1e10.

The marketing department says: "Our competitor's product is O(N**2), so it
runs like a three-legged dog."

Translation: the amount of computation it does is linearly proportional to
N squared. The constant of proportionality is 1e-100.

You do the maths.

Big O notation is practically useless for judging how fast a single
algorithm will be, or how one algorithm compares to another. It is only
useful for telling you how a single algorithm will scale as the input
increases.

It is very common for sensible programmers to fall back on a "less
efficient" O(N**2) or even O(2**N) algorithm for small amounts of data, if
that algorithm runs faster than the "more efficient" O(N) or O(log N)
algorithm. In fact, that's exactly what the sort() method does in Python:
for small enough lists, say, under 100 elements, it is quicker to run an
O(N**2) algorithm (shell sort I believe) than it is to perform the
complex set up for the merge-sort variant used for larger lists.

As for sets, they are based on dicts, which are effectively hash tables.
Hash tables are O(1), unless there are collisions, in which case the more
common algorithms degenerate to O(N). So, a very rough and ready estimate
of the complexity of the algorithm using sets would be somewhere between
O(1) and O(N) depending on the details of Python's algorithms.

So, let's do an experiment, shall we?

from sets import Set
import time

def compare_and_separate_with_sets(A, B):
AB = Set(A+B)
A = Set(A)
B = Set(B)
only_A = list(A-B)
only_B = list(B-A)
both_AB = list(AB - Set(only_A+only_B))
return (only_A, only_B, both_AB)

def timeit(f, args, n):
"""Time function f when called with *args. For timing purposes,
does n calls of f. Returns the average time used per call in seconds.
"""
loopit = range(n)
t = time.time()
for i in loopit:
results = f(*args)
t = time.time() - t
return t/n

def single_test(N):
print ("N = %-8d" % N),
A = range(N)
B = range(N/2, 3*N/2)
return timeit(compare_and_separate_with_sets, (A, B), 20)

def test_Order():
# test how compare_and_separate_with_sets scales with size
for i in range(7):
print single_test(10**i)


Now run the test code:

py> test_Order()
N = 1 0.000219106674194
N = 10 0.000135183334351
N = 100 0.000481128692627
N = 1000 0.0173740386963
N = 10000 0.103679180145
N = 100000 0.655336141586
N = 1000000 8.12827801704

In my humble opinion, that's not bad behaviour. It looks O(log N) to me,
and quite fast too: about 8 seconds to compare and separate two lists of
one million items each.

The craziest thing is, the amount of time it took to write and test two
different algorithms was probably 1% of the time it would take to hunt up
theoretical discussions of what the big O behaviour of the algorithms
would be.



--
Steven.

.



Relevant Pages

  • Re: Removing None objects from a sequence
    ... necessary and is algorithmically cleaner. ... def deNone2: ... lists to worry about. ... Here's another low-level algorithm, the classical delete items in place ...
    (comp.lang.python)
  • Re: Educated guesses for efficiency?
    ... it is comparatively rare for a program to run too slowly because an algorithm with the wrong big-O analysis was chosen. ... of things tended to be stored in lists). ... built-in sort function. ... Programmers *will* use algorithms with high complexities. ...
    (comp.lang.c)
  • Re: Cantor For Dummies ...
    ... algorithms that output a committee can be listed. ... committee output by my algorithm depends on its input. ... If it is an actual algorithm and uses valid input. ... other than lists made from the output. ...
    (sci.math)
  • Re: Comparing lists
    ... > of your faith that Big O is the be all and end all of algorithm planning. ... complexity measures, like Big-Oh. ... If experimental data and theory *disagree*: ... the conversion of lists to sets needs the insertion of N ...
    (comp.lang.python)
  • Re: coerce for arbitrary types
    ... the algorithm itself is essentially last-in first-out, ... via some sort of recursion? ... WaitStack <= new stack containing topnode ... As nested lists, there's only one number in a proper position. ...
    (comp.lang.lisp)