RFC: Assignment as expression (pre-PEP)

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Thu Apr 5 19:25:04 EDT 2007


On Thu, 05 Apr 2007 18:08:46 -0300, darklord at timehorse.com  
<TimeHorse at gmail.com> wrote:

> I am trying to write a parser for a text string.  Specifically, I am
> trying to take a filename that contains meta-data about the content of
> the A/V file (mpg, mp3, etc.).
>
> I first split the filename into fields separated by spaces and dots.
>
> Then I have a series of regular expression matches.  I like
> Cartesian's 'event-based' parser approach though the event table gets a
> bit unwieldy as it grows.  Also, I would prefer to have the 'action'
> result in a variable assignment specific to the test.  E.g.
>
> def parseName(name):
>     fields = sd.split(name)
>     fields, ext = fields[:-1], fields[-1]
>     year = ''
>     capper = ''
>     series = None
>     episodeNum = None
>     programme = ''
>     episodeName = ''
>     past_title = False
>     for f in fields:
>         if year_re.match(f):
>             year = f
>             past_title = True
>         else:
>             my_match = capper_re.match(f)
>             if my_match:
>                 capper = my_match.group(1)
>                 if capper == 'JJ' or capper == 'JeffreyJacobs':
>                     capper = 'Jeffrey C. Jacobs'
>                 past_title = True
>             else:
>                 my_match = epnum_re.match(f)
>                 if my_match:
>                     series, episodeNum = my_match.group('series', 'episode')
>                     past_title = True
>                 else:
>                     # If I think of other parse elements, they go here.
>                     # Otherwise, name is part of a title; check for capitalization
>                     if f[0] >= 'a' and f[0] <= 'z' and f not in do_not_capitalize:
>                         f = f.capitalize()
>                     if past_title:
>                         if episodeName: episodeName += ' '
>                         episodeName += f
>                     else:
>                         if programme: programme += ' '
>                         programme += f
>
>     return programme, series, episodeName, episodeNum, year, capper, ext
>
> Now, the problem with this code is that it assumes only 2 pieces of
> free-form meta-data in the name (i.e. Programme Name and Episode
> Name).  Also, although this is not directly adaptable to Cartesian's
> approach, you COULD rewrite it using a dictionary in the place of
> local variable names so that the event lookup could consist of 3
> properties per event: compiled_re, action_method, dictionary_string.
> But even with that, the epnum match requires two assignments.  Perhaps
> dictionary_string could be a list: each value returned by
> action_method would then be bound to the dictionary element named at
> the same position in dictionary_string, which seems a bit convoluted.  And
> the fall-through case is state-dependent since the 'unrecognized
> field' should be shuffled into a different variable dependent on
> state.  Still, if there is a better approach I am certainly up for
> it.  I love event-based parsers so I have no problem with that
> approach in general.

Maybe it's worth using a class instance. Define methods to handle each  
matching regex, and keep state in the instance.

class NameParser:

     def handle_year(self, field, match):
         self.year = field
         self.past_title = True

     def handle_capper(self, field, match):
         capper = match.group(1)
         if capper == 'JJ' or capper == 'JeffreyJacobs':
             capper = 'Jeffrey C. Jacobs'
         self.capper = capper
         self.past_title = True

     def parse(self, name):
         fields = sd.split(name)
         for field in fields:
             for regex,handler in self.handlers:
                 match = regex.match(field)
                 if match:
                     handler(self, field, match)
                     break

You have to build the handlers list, containing (regex, handler) items;  
the "unknown" case might be a match-all expression at the end.
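
For instance, the list could be built by hand right after the class body. This is only a sketch: the patterns assigned to sd, year_re and capper_re are invented for illustration (the thread does not show the real ones), and handle_unknown is a simplified catch-all that just collects words in a hypothetical title_words list.

```python
import re

# Assumed patterns; the real ones are not shown in the thread.
sd = re.compile(r"[ .]+")                 # split on spaces and dots
year_re = re.compile(r"\(?\d{4}\)?$")     # e.g. 2007 or (2007)
capper_re = re.compile(r"\[(\w+)\]$")     # e.g. [JJ]

class NameParser:
    def __init__(self):
        self.year = ''
        self.capper = ''
        self.title_words = []             # hypothetical, simplified state
        self.past_title = False

    def handle_year(self, field, match):
        self.year = field
        self.past_title = True

    def handle_capper(self, field, match):
        capper = match.group(1)
        if capper in ('JJ', 'JeffreyJacobs'):
            capper = 'Jeffrey C. Jacobs'
        self.capper = capper
        self.past_title = True

    def handle_unknown(self, field, match):
        # Simplified fall-through: just collect the word.
        self.title_words.append(field)

    def parse(self, name):
        for field in sd.split(name):
            for regex, handler in self.handlers:
                match = regex.match(field)
                if match:
                    handler(self, field, match)
                    break

# (regex, handler) pairs, tried in order; the match-all entry goes last.
NameParser.handlers = [
    (year_re, NameParser.handle_year),
    (capper_re, NameParser.handle_capper),
    (re.compile(r".*"), NameParser.handle_unknown),
]

p = NameParser()
p.parse("Some Show 2007 [JJ]")
```

The order of the list matters, since the inner loop stops at the first pattern that matches.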
Well, after playing a bit with decorators I got this:

import re   # sd and do_not_capitalize are assumed to come from your module

class NameParser:

     year = ''
     capper = ''
     series = None
     episodeNum = None
     programme = ''
     episodeName = ''
     past_title = False
     handlers = []

     def __init__(self, name):
         self.name = name
         self.parse()

     def handle_this(regex, handlers=handlers):
         # A decorator; associates the function to the regex
         # (Not intended to be used as a normal method, not even as a
         # static method!)
         def register(function, regex=regex):
             handlers.append((re.compile(regex), function))
             return function
         return register

     @handle_this(r"\(?\d+\)?")
     def handle_year(self, field, match):
         self.year = field
         self.past_title = True

     @handle_this(r"(expression)")
     def handle_capper(self, field, match):
         capper = match.group(1)
         if capper == 'JJ' or capper == 'JeffreyJacobs':
             capper = 'Jeffrey C. Jacobs'
         self.capper = capper
         self.past_title = True

     @handle_this(r".*")
     def handle_unknown(self, field, match):
         if field[0] >= 'a' and field[0] <= 'z' and field not in do_not_capitalize:
             field = field.capitalize()
         if self.past_title:
             if self.episodeName: self.episodeName += ' '
             self.episodeName += field
         else:
             if self.programme: self.programme += ' '
             self.programme += field

     def parse(self):
         fields = sd.split(self.name)
         for field in fields:
             for regex,handler in self.handlers:
                 match = regex.match(field)
                 if match:
                     handler(self, field, match)
                     break
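
The registration trick may be easier to see in isolation, outside any class. The following is a minimal sketch of the same idea; on_number, on_anything and dispatch are hypothetical names made up for the example, and the patterns are placeholders.

```python
import re

handlers = []   # (compiled_regex, function) pairs, in registration order

def handle_this(regex):
    # Decorator factory: compile the pattern and remember the function.
    def register(function):
        handlers.append((re.compile(regex), function))
        return function          # the function itself is left unchanged
    return register

@handle_this(r"\d+$")
def on_number(field, match):
    return ('number', field)

@handle_this(r".*")
def on_anything(field, match):
    return ('other', field)

def dispatch(field):
    # First matching pattern wins, so the catch-all must be registered last.
    for regex, handler in handlers:
        match = regex.match(field)
        if match:
            return handler(field, match)

results = [dispatch(f) for f in ("2007", "Lazarus")]
```

Since the decorators run while the definitions are being executed, the handlers end up in the list in source order, which is why the match-all handler is defined last.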


-- 
Gabriel Genellina



