Flexible Collating (feedback please)

Ron Adam rrr at ronadam.com
Thu Oct 19 10:33:55 EDT 2006


bearophileHUGS at lycos.com wrote:
> Ron Adam:
> 
> Instead of:
> 
>       def __init__(self, flags=[]):
>          self.flags = flags
>          self.numrex = re.compile(r'([\d\.]*|\D*)', re.LOCALE)
>          self.txtable = []
>          if HYPHEN_AS_SPACE in flags:
>              self.txtable.append(('-', ' '))
>          if UNDERSCORE_AS_SPACE in flags:
>              self.txtable.append(('_', ' '))
>          if PERIOD_AS_COMMAS in flags:
>              self.txtable.append(('.', ','))
>          if IGNORE_COMMAS in flags:
>              self.txtable.append((',', ''))
>          self.flags = flags
> 
> I think using an immutable flags default is safer; here is an
> alternative (NOT tested!):
> 
>       numrex = re.compile(r'[\d\.]*  |  \D*', re.LOCALE|re.VERBOSE)
>       dflags = {"hyphen_as_space": ('-', ' '),
>                 "underscore_as_space": ('_', ' '),
>                 "period_as_commas": ('.', ','),
>                 "ignore_commas": (',', ''),
>                 ...
>                }
> 
>       def __init__(self, flags=()):
>          self.flags = [fl.strip().lower() for fl in flags]
>          self.txtable = []
>          df = self.__class__.dflags
>          for flag in self.flags:
>              if flag in df:
>                 self.txtable.append(df[flag])
>          ...
> 
> This is just an idea, it surely has some problems that have to be
> fixed.
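
A short illustration of why the immutable default is safer: a default list is created once, when the def statement runs, so every instance built without an argument shares it. In the quoted __init__ the list is only read, so the sharing only matters if something later mutates self.flags; the tuple default avoids the question entirely. The class names Shared and Safe are hypothetical, and the sketch assumes Python 3.

```python
class Shared:
    def __init__(self, flags=[]):
        # The default list is built once, when "def" runs, so every
        # instance created without an argument stores the same list.
        self.flags = flags

class Safe:
    def __init__(self, flags=()):
        # A tuple default cannot be mutated, and list() gives each
        # instance its own copy.
        self.flags = list(flags)

a, b = Shared(), Shared()
a.flags.append('NUMERICAL')
print(b.flags)  # ['NUMERICAL']: b sees the change made through a

c, d = Safe(), Safe()
c.flags.append('NUMERICAL')
print(d.flags)  # []
```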

I think the 'if's are ok since there are only a few options that need to be 
handled by them.

I'm still trying to determine which options are really needed.  I can get the
thousands separator and decimal point from the locale.localeconv() function, so
I don't think ignore_commas is needed.  And maybe change period_as_commas to
period_as_sep and then split on periods before comparing.
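
A minimal sketch of both ideas, assuming Python 3. normalize_number reads the separators from locale.localeconv(), whose 'thousands_sep' and 'decimal_point' keys are standard; period_sep_key is a hypothetical key for the proposed period_as_sep option. Both function names are illustrative, not part of the collate module. Note that localeconv() reflects the current LC_NUMERIC setting, which stays the "C" locale until locale.setlocale(locale.LC_ALL, '') is called.

```python
import locale

def normalize_number(text, conv=None):
    """Remove thousands separators and map the locale's decimal point to '.'."""
    if conv is None:
        # Reflects the current LC_NUMERIC setting; call
        # locale.setlocale(locale.LC_ALL, '') first for the user's locale.
        conv = locale.localeconv()
    sep = conv['thousands_sep']
    point = conv['decimal_point']
    if sep:
        text = text.replace(sep, '')
    if point != '.':
        text = text.replace(point, '.')
    return text

def period_sep_key(text):
    """Hypothetical PERIOD_AS_SEP key: compare dot-separated fields."""
    key = []
    for part in text.split('.'):
        # Tag numeric fields with 0 and text fields with 1, so two keys
        # never compare an int against a str at the same position.
        if part.isdecimal():
            key.append((0, int(part), ''))
        else:
            key.append((1, 0, part))
    return tuple(key)

# Explicit conv dicts keep the example independent of the current locale.
print(normalize_number('1.234,5', {'thousands_sep': '.', 'decimal_point': ','}))
# 1234.5
print(sorted(['1.10', '1.2', '1.9'], key=period_sep_key))
# ['1.2', '1.9', '1.10']
```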

I also want it to raise an exception when the Collate object is created if
invalid options are specified.  That makes finding problems much easier.  The
example above doesn't do that; it silently ignores them.  That was one of the
reasons I went with named constants at first.

How does this look?

     import re

     class Collate(object):
         numrex = re.compile(r'([\d\.]* | \D*)', re.LOCALE|re.VERBOSE)
         options = ( 'CAPS_FIRST', 'NUMERICAL', 'HYPHEN_AS_SPACE',
                     'UNDERSCORE_AS_SPACE', 'IGNORE_LEADING_WS',
                     'IGNORE_COMMAS', 'PERIOD_AS_COMMAS' )

         def __init__(self, flags=""):
             # "".split() is [], so an empty string needs no special case
             # and self.txtable is always defined.
             flags = flags.upper().split()
             for value in flags:
                 if value not in self.options:
                     # Fail when the object is created, not while sorting.
                     raise ValueError('Invalid option: %s' % value)
             self.txtable = []
             if 'HYPHEN_AS_SPACE' in flags:
                 self.txtable.append(('-', ' '))
             if 'UNDERSCORE_AS_SPACE' in flags:
                 self.txtable.append(('_', ' '))
             if 'PERIOD_AS_COMMAS' in flags:
                 self.txtable.append(('.', ','))
             if 'IGNORE_COMMAS' in flags:
                 self.txtable.append((',', ''))
             self.flags = flags
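
The snippet above builds txtable but does not show where it is consumed. One plausible use, sketched here as an assumption, is to apply each (old, new) pair in order when building a sort key; apply_txtable is a hypothetical helper name, and the sketch assumes Python 3.

```python
def apply_txtable(text, txtable):
    """Apply each (old, new) replacement pair in order and return the result."""
    for old, new in txtable:
        text = text.replace(old, new)
    return text

# Pairs that HYPHEN_AS_SPACE and UNDERSCORE_AS_SPACE add to txtable.
table = [('-', ' '), ('_', ' ')]
words = ['file_b', 'file-a', 'file c']

print(sorted(words))  # ['file c', 'file-a', 'file_b']: raw code-point order
print(sorted(words, key=lambda w: apply_txtable(w, table)))
# ['file-a', 'file_b', 'file c']: separators now compare as spaces
```

Because the translation only affects the key, the sorted list still contains the original, untranslated strings.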



So you can set the options with a string like this:


import collate as C

collateopts = \
     """ caps_first
         hyphen_as_space
         numerical
         ignore_commas
     """
collatedlist = C.collated(somelist, collateopts)


A nice advantage of an option string is that you don't have to prefix all your
options with the module name.  But you do have to validate it.
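
One way to get both the short option names and early validation, loosely combining the dflags table quoted at the top with the ValueError check, is sketched here under the assumption of Python 3.7+ (for ordered dicts). TRANSLATIONS, OTHER_OPTIONS and parse_options are hypothetical names, not the collate module's API.

```python
# Hypothetical option table in the spirit of the quoted dflags dict.
# Dict order (guaranteed since Python 3.7) fixes the order pairs are applied.
TRANSLATIONS = {
    'HYPHEN_AS_SPACE': ('-', ' '),
    'UNDERSCORE_AS_SPACE': ('_', ' '),
    'PERIOD_AS_COMMAS': ('.', ','),
    'IGNORE_COMMAS': (',', ''),
}
# Options that affect comparison but need no translation pair.
OTHER_OPTIONS = {'CAPS_FIRST', 'NUMERICAL', 'IGNORE_LEADING_WS'}
VALID_OPTIONS = OTHER_OPTIONS | set(TRANSLATIONS)

def parse_options(options=''):
    """Return (flags, txtable) for an option string, rejecting unknown names."""
    flags = options.upper().split()
    unknown = [flag for flag in flags if flag not in VALID_OPTIONS]
    if unknown:
        # Report every bad name at once instead of silently skipping them.
        raise ValueError('Invalid option(s): %s' % ', '.join(unknown))
    # Walk the table, not the user's string, so the pairs are applied in a
    # fixed order no matter how the options were typed.
    txtable = [pair for name, pair in TRANSLATIONS.items() if name in flags]
    return flags, txtable

print(parse_options('ignore_commas  period_as_commas'))
# (['IGNORE_COMMAS', 'PERIOD_AS_COMMAS'], [('.', ','), (',', '')])
```

The fixed order matters: applying PERIOD_AS_COMMAS before IGNORE_COMMAS turns '1.5' into '1,5' and then '15', whereas the reverse order would leave '1,5'. Walking the table mirrors the fixed order of the if-chain in the Collate sketch.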

Cheers,
    Ron


