Re: Flexible Collating (feedback please)



bearophileHUGS@xxxxxxxxx wrote:
Ron Adam:

Instead of:

    def __init__(self, flags=[]):
        self.flags = flags
        self.numrex = re.compile(r'([\d\.]*|\D*)', re.LOCALE)
        self.txtable = []
        if HYPHEN_AS_SPACE in flags:
            self.txtable.append(('-', ' '))
        if UNDERSCORE_AS_SPACE in flags:
            self.txtable.append(('_', ' '))
        if PERIOD_AS_COMMAS in flags:
            self.txtable.append(('.', ','))
        if IGNORE_COMMAS in flags:
            self.txtable.append((',', ''))
        self.flags = flags

I think using an immutable default for flags is safer. This is an
alternative (NOT tested!):

    numrex = re.compile(r'[\d\.]* | \D*', re.LOCALE|re.VERBOSE)
    dflags = {"hyphen_as_space": ('-', ' '),
              "underscore_as_space": ('_', ' '),
              "period_as_commas": ('.', ','),
              "ignore_commas": (',', ''),
              ...
             }

    def __init__(self, flags=()):
        self.flags = [fl.strip().lower() for fl in flags]
        self.txtable = []
        df = self.__class__.dflags
        for flag in self.flags:
            if flag in df:
                self.txtable.append(df[flag])
        ...

This is just an idea, it surely has some problems that have to be
fixed.
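
The concern about a mutable default can be shown with a small sketch. The class names below are hypothetical and exist only for this illustration; they are not part of the collate module.

```python
# Hypothetical classes, used only to illustrate the mutable-default issue.
class SharedDefault:
    def __init__(self, flags=[]):      # the same list object is reused on every call
        self.flags = flags

class SafeDefault:
    def __init__(self, flags=()):      # a tuple cannot be modified in place
        self.flags = list(flags)       # each instance gets its own list

a = SharedDefault()
b = SharedDefault()
a.flags.append('HYPHEN_AS_SPACE')      # mutates the shared default list
print(b.flags)                         # b sees the change too

c = SafeDefault()
d = SafeDefault()
c.flags.append('HYPHEN_AS_SPACE')
print(d.flags)                         # d is unaffected
```

The problem only shows up if something modifies the flags list in place, but an immutable default removes the risk entirely.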

I think the 'if's are ok since there are only a few options that need to be handled by them.

I'm still trying to determine what options are really needed. I can get the thousands separator and decimal character from the locale.localeconv() function, so I think ignore_commas isn't needed. And maybe change period_as_commas to period_as_sep and then split on periods before comparing.
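
A rough sketch of reading those two values from the locale follows. The normalize_number helper is an assumption made for illustration, not something in the collate module, and the values returned depend on the locale in effect.

```python
import locale

# Use the user's default locale; keep the current one if it is unavailable.
try:
    locale.setlocale(locale.LC_ALL, '')
except locale.Error:
    pass

conv = locale.localeconv()
decimal_point = conv['decimal_point']   # '.' in the C locale
thousands_sep = conv['thousands_sep']   # may be an empty string

def normalize_number(text):
    # Hypothetical helper: drop the thousands separator and
    # use '.' as the decimal point so the text can be compared numerically.
    if thousands_sep:
        text = text.replace(thousands_sep, '')
    if decimal_point and decimal_point != '.':
        text = text.replace(decimal_point, '.')
    return text
```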

I also want it to raise an exception when the Collate object is created if invalid options are specified. That makes finding problems much easier. The example above doesn't do that; it accepts them silently. That was one of the reasons I went to named constants at first.

How does this look?

    numrex = re.compile(r'([\d\.]* | \D*)', re.LOCALE|re.VERBOSE)
    options = ( 'CAPS_FIRST', 'NUMERICAL', 'HYPHEN_AS_SPACE',
                'UNDERSCORE_AS_SPACE', 'IGNORE_LEADING_WS',
                'IGNORE_COMMAS', 'PERIOD_AS_COMMAS' )
    def __init__(self, flags=""):
        if flags:
            flags = flags.upper().split()
            for value in flags:
                if value not in self.options:
                    raise ValueError('Invalid option: %s' % value)
        self.txtable = []
        if 'HYPHEN_AS_SPACE' in flags:
            self.txtable.append(('-', ' '))
        if 'UNDERSCORE_AS_SPACE' in flags:
            self.txtable.append(('_', ' '))
        if 'PERIOD_AS_COMMAS' in flags:
            self.txtable.append(('.', ','))
        if 'IGNORE_COMMAS' in flags:
            self.txtable.append((',', ''))
        self.flags = flags



So you can set an option string as...


import collate as C

collateopts = \
""" caps_first
hyphen_as_space
numerical
ignore_commas
"""
collatedlist = C.collated(somelist, collateopts)


A nice advantage of an option string is that you don't have to prefix all your options with the module name. But you do have to validate it.
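
The validation step on its own could look roughly like this. OPTIONS and parse_options are hypothetical standalone names for this sketch; in the class the same check lives in __init__.

```python
# Option names taken from the class above; the names OPTIONS and
# parse_options are assumptions for this standalone sketch.
OPTIONS = ('CAPS_FIRST', 'NUMERICAL', 'HYPHEN_AS_SPACE',
           'UNDERSCORE_AS_SPACE', 'IGNORE_LEADING_WS',
           'IGNORE_COMMAS', 'PERIOD_AS_COMMAS')

def parse_options(optstring):
    # Split on any whitespace, so one option per line works too.
    flags = optstring.upper().split()
    for value in flags:
        if value not in OPTIONS:
            raise ValueError('Invalid option: %s' % value)
    return flags

print(parse_options("""caps_first
    hyphen_as_space
    numerical"""))

# A misspelled option is reported instead of being silently ignored.
try:
    parse_options('caps_first hyphen_as_spaces')
except ValueError as err:
    print(err)
```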

Cheers,
Ron