template strings for matching?

Joe Strout joe at strout.net
Thu Oct 9 12:20:19 EDT 2008


Wow, this was harder than I thought (at least for a rusty Pythoneer  
like myself).  Here's my stab at an implementation.  Remember, the  
goal is to add a "match" method to Template which works like  
Template.substitute, but in reverse: given a string, if that string  
matches the template, then it should return a dictionary mapping each  
template field to the corresponding value in the given string.
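To make the goal concrete, here is a small sketch of the round trip. The regex is written by hand for this one template and only stands in for what a "match" method would have to build automatically; the variable names are illustrative.

```python
from string import Template
import re

t = Template("The $object in $location")

# Forward direction: fill the fields in.
filled = t.substitute(object="rain", location="Spain")

# Reverse direction, written by hand for this one template:
# each field becomes a named group, and groupdict() gives the mapping.
m = re.match(r"The (?P<object>.*?) in (?P<location>.*?)$", filled)
recovered = m.groupdict()
```

Here "recovered" holds the same field-to-value mapping that was passed to substitute.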

Oh, and as one extra feature, I want to support a ".greedy" attribute  
on the Template object, which determines whether the matching of  
fields should be done in a greedy or non-greedy manner.
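As a quick illustration of why the switch matters, the two styles split an ambiguous string differently. The sample string below is made up for the purpose; only the standard re module is used.

```python
import re

s = "The rain in Spain in winter"

# Greedy: the first field grabs as much as it can.
greedy = re.match(r"The (.*) in (.*)$", s).groups()

# Non-greedy: the first field stops at the first " in ".
lazy = re.match(r"The (.*?) in (.*?)$", s).groups()
```

With greedy matching the first field comes back as "rain in Spain"; with non-greedy matching it is just "rain" and the remainder lands in the second field.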

------------------------------------------------------------
#!/usr/bin/python

from string import Template
import re

def templateMatch(self, s):
	# start by finding the fields in our template, and building a map
	# from field position (index) to field name.
	posToName = {}
	pos = 1
	for item in self.pattern.findall(self.template):
		# each item is a tuple where item 1 is the field name
		posToName[pos] = item[1]
		pos += 1
	
	# determine if we should match greedy or non-greedy
	# (defaults to non-greedy when no .greedy attribute has been set)
	greedy = getattr(self, 'greedy', False)

	# now, build a regex pattern to compare against s
	# (taking care to escape any characters in our template that
	# would have special meaning in regex)
	pat = self.template.replace('.', '\\.')
	pat = pat.replace('(', '\\(')
	pat = pat.replace(')', '\\)') # there must be a better way...
	
	if greedy:
		pat = self.pattern.sub('(.*)', pat)
	else:
		pat = self.pattern.sub('(.*?)', pat)
	p = re.compile(pat)
	
	# try to match this to the given string
	match = p.match(s)
	if match is None: return None
	out = {}
	for i in posToName.keys():
		out[posToName[i]] = match.group(i)
	return out


Template.match = templateMatch

t = Template("The $object in $location falls mainly in the $subloc.")
print(t.match( "The rain in Spain falls mainly in the train." ))
------------------------------------------------------------

This sort-of works, but it won't properly handle $$ in the template,  
and I'm not too sure whether it handles the ${fieldname} form,  
either.  Also, it only escapes '.', '(', and ')' in the template...  
there must be a better way of escaping all characters that have  
special meaning to RegEx, except for '$' (which is why I can't use  
re.escape).
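One possible way around the escaping problem, offered as a sketch rather than a tested replacement: walk the template with Template.pattern.finditer, run only the literal text between placeholders through re.escape, and emit a named group for each field. Template.pattern exposes the groups "escaped", "named", "braced" and "invalid", so $$ and ${fieldname} can be told apart from $fieldname. The helper names template_to_regex and match_template are hypothetical. Anchoring at the end of the string with \Z, and treating a repeated field as a backreference, are assumptions about the desired behaviour.

```python
import re
from string import Template

def template_to_regex(template, greedy=False):
    # Hypothetical helper: turn a Template into a compiled regex.
    quantifier = ".*" if greedy else ".*?"
    parts = []
    seen = set()
    last_end = 0
    for m in template.pattern.finditer(template.template):
        # Literal text before this placeholder: safe to escape wholesale.
        parts.append(re.escape(template.template[last_end:m.start()]))
        name = m.group("named") or m.group("braced")
        if name is not None:
            if name in seen:
                # Assumption: a repeated field must repeat the same value.
                parts.append("(?P=%s)" % name)
            else:
                seen.add(name)
                parts.append("(?P<%s>%s)" % (name, quantifier))
        elif m.group("escaped") is not None:
            # "$$" in the template stands for one literal delimiter.
            parts.append(re.escape(template.delimiter))
        else:
            raise ValueError("invalid placeholder at offset %d" % m.start())
        last_end = m.end()
    # Trailing literal text after the last placeholder.
    parts.append(re.escape(template.template[last_end:]))
    # Assumption: the whole string must match, hence the \Z anchor.
    return re.compile("".join(parts) + r"\Z")

def match_template(template, s, greedy=False):
    # Return a dict of field values, or None if s does not fit.
    m = template_to_regex(template, greedy).match(s)
    return None if m is None else m.groupdict()

t = Template("Cost: $$${amount} (for $item).")
result = match_template(t, "Cost: $5 (for tea).")
```

Because the fields are named groups, the position-to-name map is no longer needed; groupdict() returns the field names directly.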

Probably the rest of the code could be improved too.  I'm eager to  
hear your feedback.

Thanks,
- Joe




