Re: Flexable Collating (feedback please)





This is how I changed it...

(I edited out the test and imports for posting here.)



locale.setlocale(locale.LC_ALL, '') # use current locale settings

class Collate(object):
""" A general purpose and configurable collator class.
"""
options = [ 'CAPS_FIRST', 'NUMERICAL', 'HYPHEN_AS_SPACE',
'UNDERSCORE_AS_SPACE', 'IGNORE_LEADING_WS',
'IGNORE_COMMAS', 'PERIOD_AS_COMMAS' ]

def __init__(self, flags=""):
if flags:
flags = flags.upper().split()
for value in flags:
if value not in self.options:
raise ValueError, 'Invalid option: %s' % value
self.txtable = []
if 'HYPHEN_AS_SPACE' in flags:
self.txtable.append(('-', ' '))
if 'UNDERSCORE_AS_SPACE' in flags:
self.txtable.append(('_', ' '))
if 'PERIOD_AS_COMMAS' in flags:
self.txtable.append(('.', ','))
if 'IGNORE_COMMAS' in flags:
self.txtable.append((',', ''))
self.flags = flags
self.numrex = re.compile(r'([\d\.]*|\D*)', re.LOCALE)

def transform(self, s):
""" Transform a string for collating.
"""
if not self.flags:
return locale.strxfrm(s)
for a, b in self.txtable:
s = s.replace(a, b)
if 'IGNORE_LEADING_WS' in self.flags:
s = s.strip()
if 'CAPS_FIRST' in self.flags:
s = s.swapcase()
if 'NUMERICAL' in self.flags:
slist = self.numrex.split(s)
for i, x in enumerate(slist):
try:
slist[i] = float(x)
except:
slist[i] = locale.strxfrm(x)
return slist
return locale.strxfrm(s)

def __call__(self, a):
""" This allows the Collate class to be used as a sort key.

USE: list.sort(key=Collate(flags))
"""
return self.transform(a)

def collate(slist, flags=[]):
""" Collate list of strings in place.
"""
slist.sort(key=Collate(flags).transform)

def collated(slist, flags=[]):
""" Return a collated list of strings.
"""
return sorted(slist, key=Collate(flags).transform)

.



Relevant Pages

  • Re: Flexible Collating (feedback please)
    ... Although it is still quite a bit slower than a bare list.sort, that is to be expected as collate is locale aware and does additional transformations on the data which you would need to do anyways. ... Changed the flag types from integer values to a list of named strings. ... The reason for this is it makes finding errors easier and you can examine the flags attribute and get a readable list of flags. ... It now separates numerals in the middle of the string. ...
    (comp.lang.python)
  • Collate Module
    ... I've made a few more changes to my little collate module. ... Collate.py - Sorts lists of strings in various ways depending ... To use collate with your user locale you need to call setlocale ... flags = flags.upper.split ...
    (comp.lang.python)
  • Re: Flexible Collating (feedback please)
    ... if HYPHEN_AS_SPACE in flags: ... I also want it to issue exceptions when the Collate object is created if invalid options are specified. ... So you can set an option strings as... ...
    (comp.lang.python)
  • Re: tracking down disk spinups.
    ... I implemented the BusyBox mount command last year. ... (The new flags _replace_ the old flags, ... of course, not all of them are flags, some remain strings, but you can't keep ... (because otherwise the kernel got log message diarrhea). ...
    (Linux-Kernel)
  • Re: Boost process and C
    ... For programs that use extensively strings, 32 bits multiplied by several thousand small strings can make a big difference in RAM used, specially for the more common short strings. ... I've yet to see software where short strings made up a significant portion of the memory footprint and saving the memory that avoiding the flags would be of real use. ... Personally I would say that using negative lengths was asking for problems because at some point a negative length will be checked without first changing it to positive. ...
    (comp.lang.c)