Re: Efficient Rank Ordering of Nested Lists



On Aug 3, 8:38 am, al...@xxxxxxx (Alex Martelli) wrote:
pablo.mitch...@xxxxxxxxx <pablo.mitch...@xxxxxxxxx> wrote:
A naive approach to rank ordering (handling ties as well) of nested
lists may be accomplished via:

def rankLists(nestedList):
def rankList(singleList):
sortedList = list(singleList)
sortedList.sort()
return map(sortedList.index, singleList)
return map(rankList, nestedList)

>>> unranked = [ [ 1, 2, 3, 4, 5 ], [ 3, 1, 5, 2, 4 ], [ -1.1, 2.2,
0, -1.1, 13 ] ]
>>> print rankLists(unranked)

[[0, 1, 2, 3, 4], [2, 0, 4, 1, 3], [0, 3, 2, 0, 4]]

This works nicely when the dimensions of the nested list are small.
It is slow when they are big. Can someone suggest a clever way to
speed it up?

Each use of sortedList.index is O(N) [where N is len(singleList)], and
you have N such uses in the map in the inner function, so this approach
is O(N squared). Neil's suggestion to use bisect replaces the O(N)
.index with an O(log N) search, so the overall performance is O(N log N)
[[and you can't do better than that, big-O wise, because the sort step
is also O(N log N)]].

"beginner"'s advice to use a dictionary is also good and may turn out to
be faster, just because dicts are SO fast in Python -- but you need to
try and measure both alternatives. One way to use a dict (warning,
untested code):

def rankList(singleList):
d = {}
for i, item in reversed(enumerate(sorted(singleList))):
d[item] = i
return [d[item] for item in singleList]

If you find the for-clause too rich in functionality, you can of course
split it up a bit; but note that you do need the 'reversed' to deal with
the corner case of duplicate items (without it, you'd end up with 1s
instead of 0s for the third one of the sublists in your example). If
speed is of the essence you might try to measure what happens if you
replace the returned expression with map(d.__getitem__, singleList), but
I suspect the list comprehension is faster as well as clearer.

Another potential small speedup is to replace the first 3 statements
with just one:

d = dict((item,i) for i,item in reversed(enumerate(sorted(singleList))))

but THIS density of functionality is a bit above my personal threshold
of comfort ("sparse is better than dense":-).

Alex

Thanks for breaking it down so thoroughly. I try the different
recipes and see which gives the best trade offs between readability
and performance. Agreed that dict() approach looks promising.

.



Relevant Pages

  • Re: Efficient Rank Ordering of Nested Lists
    ... lists may be accomplished via: ... def rankLists: ... just because dicts are SO fast in Python -- but you need to ... If you find the for-clause too rich in functionality, ...
    (comp.lang.python)
  • Re: subclassing Python types
    ... list, dict. ... def MyList: ... The same way that you access plain strings, lists or dicts: ...
    (comp.lang.python)
  • Re: General question about Python design goals
    ... It only works on lists and strings, ... def iswhite: ... lists to tuples or vice versa, to get the functionality ... I also find the use of C-structs and tuples completly different. ...
    (comp.lang.python)
  • Re: subclassing Python types
    ... list, dict. ... def MyString: ... The same way that you access plain strings, lists or dicts: ... methods of the super class. ...
    (comp.lang.python)
  • Re: Clustering the keys of a dict according to its values
    ... dict a list of the keys of the old dict that have that very value. ... Another requirement is that it should also work on lists, ... def invert: ...
    (comp.lang.python)