template strings for matching?

MRAB google at mrabarnett.plus.com
Thu Oct 9 17:53:12 EDT 2008


On Oct 9, 5:20 pm, Joe Strout <j... at strout.net> wrote:
> Wow, this was harder than I thought (at least for a rusty Pythoneer  
> like myself).  Here's my stab at an implementation.  Remember, the  
> goal is to add a "match" method to Template which works like  
> Template.substitute, but in reverse: given a string, if that string  
> matches the template, then it should return a dictionary mapping each  
> template field to the corresponding value in the given string.
>
> Oh, and as one extra feature, I want to support a ".greedy" attribute  
> on the Template object, which determines whether the matching of  
> fields should be done in a greedy or non-greedy manner.
>
> ------------------------------------------------------------
> #!/usr/bin/python
>
> from string import Template
> import re
>
> def templateMatch(self, s):
>         # start by finding the fields in our template, and building a map
>         # from field position (index) to field name.
>         posToName = {}
>         pos = 1
>         for item in self.pattern.findall(self.template):
>                 # each item is a tuple where item 1 is the field name
>                 posToName[pos] = item[1]
>                 pos += 1
>
>         # determine if we should match greedy or non-greedy
>         greedy = False
>         if self.__dict__.has_key('greedy'):
>                 greedy = self.greedy
>
>         # now, build a regex pattern to compare against s
>         # (taking care to escape any characters in our template that
>         # would have special meaning in regex)
>         pat = self.template.replace('.', '\\.')
>         pat = pat.replace('(', '\\(')
>         pat = pat.replace(')', '\\)') # there must be a better way...
>
>         if greedy:
>                 pat = self.pattern.sub('(.*)', pat)
>         else:
>                 pat = self.pattern.sub('(.*?)', pat)
>         p = re.compile(pat)
>
>         # try to match this to the given string
>         match = p.match(s)
>         if match is None: return None
>         out = {}
>         for i in posToName.keys():
>                 out[posToName[i]] = match.group(i)
>         return out
>
> Template.match = templateMatch
>
> t = Template("The $object in $location falls mainly in the $subloc.")
> print t.match( "The rain in Spain falls mainly in the train." )
> ------------------------------------------------------------
>
> This sort-of works, but it won't properly handle $$ in the template,  
> and I'm not too sure whether it handles the ${fieldname} form,  
> either.  Also, it only escapes '.', '(', and ')' in the template...  
> there must be a better way of escaping all characters that have  
> special meaning to RegEx, except for '$' (which is why I can't use  
> re.escape).
>
> Probably the rest of the code could be improved too.  I'm eager to  
> hear your feedback.
>
> Thanks,
> - Joe

How about something like:

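import re

def placeholder(m):
    if m.group(1):
        # $name becomes a named group that captures the field's value
        return "(?P<%s>.+)" % m.group(1)
    elif m.group(2):
        # $$ stands for a literal dollar sign
        return "\\$"
    else:
        # anything else is literal text, so escape it for the regex
        return re.escape(m.group(3))

# group 1: $name, group 2: $$, group 3: a run of literal text
regex = re.compile(r"\$(\w+)|(\$\$)|([^$]+)")

t = "The $object in $location falls mainly in the $subloc."
print(regex.sub(placeholder, t))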
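
Taking the same idea a step further, here is a rough sketch of the whole
"substitute in reverse" operation. It is only one way to do it, and it rests
on a few assumptions of mine: the function names template_to_regex and
match_template are made up for illustration, the template is tokenised with
Template.pattern from the string module so that $name, ${name} and $$ are all
recognised, a field that appears twice is required to have the same value
both times, and the whole string has to match (not just a prefix). The greedy
argument plays the role of the ".greedy" attribute described above.

```python
import re
from string import Template


def template_to_regex(template, greedy=False):
    """Build a compiled regex matching strings the template could produce."""
    field = "(?P<%s>.+)" if greedy else "(?P<%s>.+?)"
    parts = []
    seen = set()
    last = 0
    for m in Template.pattern.finditer(template):
        # Literal text between placeholders is escaped for the regex.
        parts.append(re.escape(template[last:m.start()]))
        last = m.end()
        name = m.group("named") or m.group("braced")
        if name:
            if name in seen:
                # Assumption: a repeated field must match the same text,
                # so later occurrences become backreferences.
                parts.append("(?P=%s)" % name)
            else:
                seen.add(name)
                parts.append(field % name)
        elif m.group("escaped") is not None:
            # $$ in the template is a literal dollar sign.
            parts.append(re.escape("$"))
        else:
            # A lone or malformed "$", which Template.substitute also rejects.
            raise ValueError("invalid placeholder at offset %d" % m.start())
    parts.append(re.escape(template[last:]))
    # \Z anchors the end so that the whole string must match.
    return re.compile("".join(parts) + r"\Z")


def match_template(template, s, greedy=False):
    """Return a dict mapping field names to values, or None if s does not fit."""
    m = template_to_regex(template, greedy).match(s)
    return m.groupdict() if m else None


t = "The $object in $location falls mainly in the $subloc."
print(match_template(t, "The rain in Spain falls mainly in the train."))
```

For the example template this should map object to "rain", location to
"Spain" and subloc to "train". The greedy flag only makes a difference when
the literal text between fields also occurs inside a field value, as in
"The cat in the hat in Spain falls mainly in the train."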


