Reverse string-formatting (maybe?)
Tim Chase
python.list at tim.thechases.com
Sun Oct 15 09:37:33 EDT 2006
>>> >>> template = '%s, %s, %s'
>>> >>> values = ('Tom', 'Dick', 'Harry')
>>> >>> formatted = template % values
>>> >>> import re
>>> >>> unformat_string = template.replace('%s', '([^, ]+)')
>>> >>> unformatter = re.compile(unformat_string)
>>> >>> extracted_values = unformatter.search(formatted).groups()
>>>
>>> using '[^, ]+' to mean "one or more characters that aren't a
>>> comma or a space".
>>
>> One more thing (I forgot to mention this other situation earlier)
>> The %s characters are ints, and outside can be anything except int
>> characters. I do have one situation of '%s%s%s', but I can change it to
>> '%s', and change the output into the needed output, so that's not
>> important. Think something along the lines of "abckdaldj iweo%s
>> qwierxcnv !%sjd".
>
> That was written in haste. All the information is true. The question:
> I've already created a function to do this, using your original
> deformat function. Is there any way in which it might go wrong?
Only you know what anomalies will be found in your data-sets. If
you know/assert that

- the only stuff in the formatting string is one set of characters
- stuff in the replacement-values can never include any of
  your format-string characters
- you're not using funky characters/formatting in your format
  string (such as "%%" possibly followed by an "s" to get the
  resulting text of "%s" after formatting, or trying to use other
  formatters such as the aforementioned "%f" or possibly "%i")

then you should be safe. It could also be possible (with my
original replacement of "(.*)") if your values will never include
any substring of your format string. If you can't guarantee
these conditions, you're trying to make a cow out of hamburger.
Or a pig out of sausage. Or a whatever out of a hotdog. :)
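
To make the hamburger problem concrete, here is a small sketch of what happens when a value contains a piece of the format string. The names and the 'Dick, Jr.' value are made up for illustration; the template is the one from earlier in the thread.

```python
import re

template = '%s, %s, %s'
# Same idea as before: one greedy capture group per %s placeholder.
unformatter = re.compile(template.replace('%s', '(.*)'))

# Safe case: no value contains the ", " separator.
safe = ('Tom', 'Dick', 'Harry')
safe_result = unformatter.search(template % safe).groups()

# Unsafe case (hypothetical data): the second value contains ", " itself,
# so the formatted string no longer says where one value ends.
unsafe = ('Tom', 'Dick, Jr.', 'Harry')
unsafe_result = unformatter.search(template % unsafe).groups()

print(safe_result)    # the original values come back
print(unsafe_result)  # the comma is attributed to the wrong value
```

The regex still matches in the second case, it just splits the text in the wrong place, which is why a silent wrong answer is the thing to test for.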
Conventional wisdom would tell you to create a test-suite of
format-strings and sample values (preferably worst-case funkiness
in your expected format-strings/values), and then have a test
function that will assert that the unformatting of every
formatted string in the set returns the same set of values that
went in. Something like

    import re

    tests = {
        'I was %s but now I am %s': [
            ('hot', 'cold'),
            ('young', 'old'),
        ],
        'He has 3 %s and 2 %s': [
            ('brothers', 'sisters'),
            ('cats', 'dogs'),
        ],
    }
    for format_string, values in tests.items():
        # build a regex with one capture group per %s placeholder
        unformatter = re.compile(format_string.replace('%s', '(.*)'))
        for value_tuple in values:
            formatted = format_string % value_tuple
            unformatted = unformatter.search(formatted).groups()
            if unformatted != value_tuple:
                print("%s doesn't match %s when unformatting %s" % (
                    unformatted,
                    value_tuple,
                    format_string))

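
One more way it can go wrong: if the literal part of a format string contains regex metacharacters (parentheses, "?", "." and so on), a plain replace() hands them to the regex engine as syntax. A minimal sketch of a guard, assuming only "%s" placeholders are used (no "%%", "%f" or "%i"); make_unformatter and the sample format string are hypothetical names of my own:

```python
import re

def make_unformatter(format_string, value_pattern='(.*)'):
    """Build a compiled regex that reverses a '%s'-only format string.

    Assumption: the format string contains no '%%' or other
    conversion types; only plain '%s' placeholders are handled.
    """
    # Split on the placeholders and escape the literal text in between,
    # so characters like "(" or "?" are matched literally.
    literal_parts = format_string.split('%s')
    pattern = value_pattern.join(re.escape(part) for part in literal_parts)
    # Anchor both ends so the whole formatted string must match.
    return re.compile('^' + pattern + '$')

fmt = 'Cost (total): %s? Paid by %s.'
values = ('$5', 'Tim')
result = make_unformatter(fmt).search(fmt % values).groups()
print(result)
```

Without the re.escape() step, the "(total)" in this format string would be read as an extra capture group and the "?" as a quantifier, so the group count would no longer line up with the values.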
-tkc