Reverse string-formatting (maybe?)

Tim Chase python.list at tim.thechases.com
Sun Oct 15 09:37:33 EDT 2006


>>>  >>> template = '%s, %s, %s'
>>>  >>> values = ('Tom', 'Dick', 'Harry')
>>>  >>> formatted = template % values
>>>  >>> import re
>>>  >>> unformat_string = template.replace('%s', '([^, ]+)')
>>>  >>> unformatter = re.compile(unformat_string)
>>>  >>> extracted_values = unformatter.search(formatted).groups()
>>>
>>> using '[^, ]+' to mean "one or more characters that aren't a
>>> comma or a space".
>>
>> One more thing (I forgot to mention this other situation earlier)
>> The %s characters are ints, and outside can be anything except int
>> characters. I do have one situation of '%s%s%s', but I can change it to
>> '%s', and change the output into the needed output, so that's not
>> important. Think something along the lines of "abckdaldj iweo%s
>> qwierxcnv !%sjd".
> 
> That was written in haste. All the information is true. The question:
> I've already created a function to do this, using your original
> deformat function. Is there any way in which it might go wrong?

Only you know what anomalies will turn up in your data-sets.  If 
you know/assert

-that the literal text in your format string comes from one 
known set of characters

-that the replacement-values can never include any of those 
format-string characters

-that you're not using funky characters/formatting in your format 
string (such as "%%" possibly followed by an "s" to get the 
resulting text of "%s" after formatting, or trying to use other 
formatters such as the aforementioned "%f" or possibly "%i")

then you should be safe.  It could also work with my original 
replacement of "(.*)", provided your values never include any 
piece of the literal text in your format string.  If you can't 
guarantee these conditions, you're trying to make a cow out of 
hamburger.  Or a pig out of sausage.  Or a whatever out of a 
hotdog. :)
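
For the situation you describe (values are ints, the surrounding 
text never contains digits), a minimal sketch might look like the 
following.  The names make_unformatter and unformat are made up 
for illustration, and it assumes plain "%s" placeholders only and 
a Python with re.fullmatch available:

```python
import re

def make_unformatter(template, value_pattern=r'(\d+)'):
    # Escape the literal text around each %s so characters such as
    # "!" or "." are matched literally rather than as regex syntax.
    literal_parts = [re.escape(part) for part in template.split('%s')]
    # Put one capturing group where each %s used to be.
    return re.compile(value_pattern.join(literal_parts))

def unformat(template, formatted):
    match = make_unformatter(template).fullmatch(formatted)
    if match is None:
        raise ValueError('%r does not fit template %r' % (formatted, template))
    return tuple(int(group) for group in match.groups())

template = 'abckdaldj iweo%s qwierxcnv !%sjd'
values = (12, 345)
assert unformat(template, template % values) == values

# Adjacent placeholders are ambiguous: '1234' could have been split
# several ways, and the regex returns (12, 3, 4), not what went in.
assert unformat('%s%s%s', '%s%s%s' % (1, 23, 4)) != (1, 23, 4)
```

The last assertion is the '%s%s%s' case: once the digits run 
together, nothing in the formatted string says where one value 
stopped and the next began.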

Conventional wisdom would tell you to create a test-suite of 
format-strings and sample values (preferably worst-case funkiness 
in your expected format-strings/values), and then have a test 
function that will assert that the unformatting of every 
formatted string in the set returns the same set of values that 
went in.  Something like

tests = {
	'I was %s but now I am %s' : [
		('hot', 'cold'),
		('young', 'old'),
		],
	'He has 3 %s and 2 %s' : [
		('brothers', 'sisters'),
		('cats', 'dogs')
		]
	}

import re

for format_string, values in tests.items():
	# turn each %s into a capturing group and compile the result
	unformatter = re.compile(format_string.replace('%s', '(.*)'))
	for value_tuple in values:
		formatted = format_string % value_tuple
		unformatted = unformatter.search(formatted).groups()
		# a round trip should give back exactly what went in
		if unformatted != value_tuple:
			print("%s doesn't match %s when unformatting %s" % (
				unformatted,
				value_tuple,
				format_string))
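
As an example of the "worst-case funkiness" worth putting in such 
a test-suite, here is a hedged sketch of one entry that breaks 
the "(.*)" approach.  The sample values are invented for 
illustration; the point is that one value contains literal text 
from the format string:

```python
import re

format_string = 'I was %s but now I am %s'
unformatter = re.compile(format_string.replace('%s', '(.*)'))

# The second value repeats the literal text ' but now I am '.
value_tuple = ('hot', 'cold but now I am old')
formatted = format_string % value_tuple
unformatted = unformatter.search(formatted).groups()

# The greedy first group runs up to the LAST ' but now I am ',
# so the values come back split in the wrong place.
assert unformatted == ('hot but now I am cold', 'old')
assert unformatted != value_tuple
```

A non-greedy "(.*?)" would fix this particular case but break the 
mirror-image one, where the first value is the one that contains 
the literal text.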

-tkc










