Re: Yet another unique() function...



bearophileHUGS@xxxxxxxxx writes:
It's more terse, but my version is built to be faster in the more
common cases of all-hashable and/or all-sortable items (while still
working in other cases too).
Try your unique on a unicode string; there's probably a bug (keepstr
is being ignored).
The version by Paul Rubin is very short, but rather unreadable too.

Bye,
bearophile

Unicode fix (untested):

def unique(seq, keepstr=True):
    t = type(seq)
    if t in (unicode, str):
        t = (list, t('').join)[bool(keepstr)]
    seen = []
    return t(c for c in seq if not (c in seen or seen.append(c)))
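
For anyone reading this on Python 3, here is a minimal sketch of the same brute-force idea. It assumes Python 3, where there is no separate unicode type, and the name unique_bruteforce is made up for illustration. It relies on list.append returning None, so the `or` records each new item while keeping the condition false for first occurrences.

```python
def unique_bruteforce(seq, keepstr=True):
    """Order-preserving de-duplication using only equality tests (O(n**2))."""
    t = type(seq)
    if t is str:
        # Assumption: Python 3, so str is the only text type to special-case.
        # Join back into a string only when keepstr is true.
        t = "".join if keepstr else list
    seen = []
    # seen.append(c) returns None, so for a first occurrence the `or`
    # evaluates to None (falsy) and the item is kept.
    return t(c for c in seq if not (c in seen or seen.append(c)))
```

Only == is used here, so unhashable and unorderable items are fine, at the price of quadratic time.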

Case by case optimization (untested):

def unique(seq, keepstr=True):
    t = type(seq)
    if t in (unicode, str):
        t = (list, t('').join)[bool(keepstr)]
    try:
        remaining = set(seq)
        # remove() returns None, so the first occurrence of each item is kept
        return t(c for c in seq if (c in remaining and
                                    not remaining.remove(c)))
    except TypeError: # hashing didn't work, see if seq is sortable
        try:
            from itertools import groupby
            s = sorted(enumerate(seq),key=lambda (i,v):(v,i))
            # keep the first (index, value) pair of each group, then
            # restore the original order and drop the indices
            firsts = sorted(g.next() for k,g in groupby(s, lambda (i,v): v))
            return t(v for i,v in firsts)
        except: # not sortable, use brute force
            seen = []
            return t(c for c in seq if not (c in seen or seen.append(c)))

I don't have Python 2.4 available right now to try either of the two
unique() versions above.
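
A rough Python 3 sketch of the same hash, then sort, then brute-force fallback follows. It is not the Python 2.4 code above: it assumes Python 3 (no tuple-unpacking lambdas, next(g) instead of g.next()), the name unique_tiered is hypothetical, and the sort tier assumes the items have a sensible total ordering.

```python
from itertools import groupby


def unique_tiered(seq, keepstr=True):
    """Order-preserving unique: try hashing, then sorting, then brute force."""
    t = type(seq)
    if t is str:
        # Assumption: Python 3, so str is the only text type to special-case.
        t = "".join if keepstr else list
    try:
        remaining = set(seq)  # raises TypeError if any item is unhashable
    except TypeError:
        pass
    else:
        # set.remove returns None, so a first occurrence passes the test
        # and later duplicates fail the membership check.
        return t(c for c in seq if c in remaining and not remaining.remove(c))
    try:
        # Sort (index, value) pairs by value, then index, so equal values
        # end up adjacent with the earliest occurrence first.
        s = sorted(enumerate(seq), key=lambda iv: (iv[1], iv[0]))
    except TypeError:
        # Not sortable either: fall back to equality tests only.
        seen = []
        return t(c for c in seq if not (c in seen or seen.append(c)))
    # Take the first pair of each group, then re-sort by index to restore
    # the original order before dropping the indices.
    firsts = sorted(next(g) for _, g in groupby(s, key=lambda iv: iv[1]))
    return t(v for _, v in firsts)
```

The hash tier is roughly linear, the sort tier O(n log n), and the last tier quadratic, so the cost only grows when the items force it.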

Note that all these schemes fail if seq is some arbitrary iterable
rather than one of the built-in sequence types.
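
The failure has two causes: type(seq) of a generator cannot be called to rebuild a result, and the optimized version walks seq twice, which a one-shot iterator does not allow. A minimal single-pass sketch that sidesteps both by yielding items instead of rebuilding the container could look like this (assuming Python 3; iter_unique is a hypothetical name, not something from this thread):

```python
def iter_unique(iterable):
    """Yield the first occurrence of each item from any iterable, in one pass."""
    seen_hashable = set()  # fast path for hashable items
    seen_other = []        # equality-only fallback for unhashable items
    for item in iterable:
        try:
            if item in seen_hashable:
                continue
            seen_hashable.add(item)
        except TypeError:
            # Unhashable item: compare against earlier unhashable items.
            if item in seen_other:
                continue
            seen_other.append(item)
        yield item
```

The caller picks the result type, for example list(iter_unique(gen)) or "".join(iter_unique(text)).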

I think these iterator approaches get more readable as one becomes
used to them.