regexps to objects

Peter Otten __peter__ at web.de
Fri Jul 27 06:24:41 EDT 2012


andrea crotti wrote:

> I have some complex input to parse (with regexps), and I would like to
> create nice objects directly from them.
> The re module of course doesn't try to convert to any type, so I was
> playing around to see if it's worth doing something like the code below,
> where I assign a constructor to every regexp and build an object from
> the result.
> 
> Do you think it makes sense in general or how do you cope with this
> problem?
> 
> import re
> from time import strptime
> TIME_FORMAT_INPUT = '%m/%d/%Y %H:%M:%S'
> 
> def time_string_to_obj(timestring):
>     return strptime(timestring, TIME_FORMAT_INPUT)
> 
> 
> REGEXPS = {
>     'num': ('\d+', int),
>     'date': ('[0-9/]+ [0-9:]+', time_string_to_obj),
> }
> 
> 
> def reg_to_obj(reg, st):
>     reg, constr = reg
>     found = re.match(reg, st)
>     return constr(found.group())
> 
> 
> if __name__ == '__main__':
>     print reg_to_obj(REGEXPS['num'], '100')
>     print reg_to_obj(REGEXPS['date'], '07/24/2012 06:23:13')

There is an undocumented Scanner class in the re module:

>>> from datetime import datetime
>>> from re import Scanner
>>> sc = Scanner([
... ("[0-9/]+ [0-9:]+",
...  lambda self, s: datetime.strptime(s, "%m/%d/%Y %H:%M:%S")),
... (r"\d+", lambda self, s: int(s)),
... (r"\s+", lambda self, s: None)])

>>> sc.scan("07/24/2012 06:23:13")
([datetime.datetime(2012, 7, 24, 6, 23, 13)], '')
>>> sc.scan("07/24/2012 06:23:13 123")
([datetime.datetime(2012, 7, 24, 6, 23, 13), 123], '')

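However, the date pattern is loose enough to match the prefix "456 07"
(digits, a space, more digits), so the date rule fires first and strptime()
rejects the text: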

>>> sc.scan("456 07/24/2012 06:23:13 123")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/re.py", line 322, in scan
    action = action(self, m.group())
  File "<stdin>", line 2, in <lambda>
  File "/usr/lib/python2.7/_strptime.py", line 325, in _strptime
    (data_string, format))
ValueError: time data '456 07' does not match format '%m/%d/%Y %H:%M:%S'
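One possible way around this, sketched here as an assumption rather than a tested recipe for the original poster's real input, is to tighten the date pattern so that it only matches the exact digit counts of the format. The name DATE_FORMAT is made up for the example, and since Scanner is undocumented its behaviour may differ between Python versions:

```python
from datetime import datetime
from re import Scanner  # undocumented class; interface may change

# Hypothetical constant name, same format string as in the session above.
DATE_FORMAT = "%m/%d/%Y %H:%M:%S"

scanner = Scanner([
    # Stricter date rule: fixed digit counts and literal separators, so a
    # bare number followed by a space no longer looks like a date.
    (r"\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}",
     lambda self, s: datetime.strptime(s, DATE_FORMAT)),
    # Plain integers are tried only after the date rule has failed.
    (r"\d+", lambda self, s: int(s)),
    # An action of None makes the scanner drop the matched whitespace.
    (r"\s+", None),
])

tokens, rest = scanner.scan("456 07/24/2012 06:23:13 123")
print(tokens)
print(repr(rest))
```

The rules are tried in order at each position, so the date rule still has to come before the integer rule; otherwise "07" would be consumed as a number. Text that matches no rule is returned unparsed as the second element of the result.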




