substitution

Wilbert Berendsen wbsoft at xs4all.nl
Thu Jan 21 10:08:21 EST 2010


On Thursday 21 January 2010, MRAB wrote:

> For longest first you need:
> 
>      keys = sorted(mapping.keys(), key=len, reverse=True)

Oh yes, I cut/pasted the wrong line :-)
Just for clarity:

import re

mapping = {
        "foo" : "bar",
        "baz" : "quux",
        "quuux" : "foo"
}

# sort the keys, longest first, so 'aa' gets matched before 'a', because
# in Python regexps the first match (going from left to right) in a
# |-separated group is taken
keys = sorted(mapping.keys(), key=len, reverse=True)

rx = re.compile("|".join(keys))
repl = lambda x: mapping[x.group()]
s = "fooxxxbazyyyquuux"
rx.sub(repl, s)   # returns 'barxxxquuxyyyfoo'
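
To see why the longest-first sort matters, here is a minimal sketch. The keys 'a' and 'aa' and the helper name substitute are made up for illustration and are not part of the mapping above. With the short key first, the alternation never gets as far as 'aa':

```python
import re

# Hypothetical mapping, chosen only to show the effect of key order.
demo = {"a": "1", "aa": "2"}

def substitute(keys, text):
    # Build the alternation in exactly the order given.
    rx = re.compile("|".join(keys))
    return rx.sub(lambda m: demo[m.group()], text)

# Short key first: 'a' wins at every position, 'aa' is never tried.
shortest_first = substitute(["a", "aa"], "aa-a")

# Long key first: 'aa' is tried before 'a'.
longest_first = substitute(sorted(demo, key=len, reverse=True), "aa-a")

print(shortest_first)  # 11-1
print(longest_first)   # 2-1
```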

>> One thing remaining: if the replacement keys could contain non-alphanumeric 
>> characters, they should be escaped using re.escape:
>> rx = re.compile("|".join(re.escape(key) for key in keys))
>> 
> Strictly speaking, not all non-alphanumeric characters, but only the
> special ones.

True, although the re.escape function simply escapes all non-alphanumeric 
characters :)
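
A small sketch of what goes wrong without escaping, assuming a made-up key 'a.b' that is not in the mapping above. Unescaped, the '.' matches any character, so the pattern also matches 'axb' and the lookup in the mapping fails. (In Python 3.7 and later, re.escape only escapes characters that can be special in a regexp, which does not change the result here.)

```python
import re

# Hypothetical mapping with a key that contains a regexp metacharacter.
demo = {"a.b": "X"}
text = "a.b axb"

# Unescaped: '.' matches any character, so 'axb' matches too,
# and looking up 'axb' in the mapping raises KeyError.
rx_raw = re.compile("|".join(demo))
try:
    rx_raw.sub(lambda m: demo[m.group()], text)
    raw_failed = False
except KeyError:
    raw_failed = True

# Escaped: only the literal text 'a.b' matches.
rx_safe = re.compile("|".join(re.escape(key) for key in demo))
result = rx_safe.sub(lambda m: demo[m.group()], text)

print(raw_failed)  # True
print(result)      # X axb
```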

And here is a factory function that returns a translator given a mapping. The 
translator can be called to perform replacements in a string:

import re

def translator(mapping):
	keys = sorted(mapping.keys(), key=len, reverse=True)
	rx = re.compile("|".join(keys))
	repl = lambda m: mapping[m.group()]
	return lambda s: rx.sub(repl, s)

#Usage:
>>> t = translator(mapping)
>>> t('fooxxxbazyyyquuux')
'barxxxquuxyyyfoo'
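
A possible variant that combines the factory with the re.escape suggestion from earlier in the thread. This is only a sketch: the name safe_translator, the extra 'a.b' key and the guard for an empty mapping are assumptions added here, not part of the function above.

```python
import re

def safe_translator(mapping):
    # Hypothetical variant of translator(): escapes the keys and
    # handles an empty mapping.
    if not mapping:
        # An empty pattern would match the empty string everywhere,
        # and the lookup of '' in the mapping would raise KeyError.
        return lambda s: s
    # Longest keys first, so 'quuux' is tried before shorter keys.
    keys = sorted(mapping, key=len, reverse=True)
    rx = re.compile("|".join(re.escape(key) for key in keys))
    return lambda s: rx.sub(lambda m: mapping[m.group()], s)

t = safe_translator({"foo": "bar", "baz": "quux", "quuux": "foo", "a.b": "X"})
print(t("fooxxxbazyyyquuux a.b axb"))    # barxxxquuxyyyfoo X axb
print(safe_translator({})("unchanged"))  # unchanged
```

Note that the substitution is a single pass over the input: the 'foo' produced by replacing 'quuux' is not replaced again by 'bar'.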


With best regards,
Wilbert Berendsen

-- 
http://www.wilbertberendsen.nl/


