substitution

Wilbert Berendsen wbsoft at xs4all.nl
Thu Jan 21 09:18:54 EST 2010


On Monday 18 January 2010, Adi wrote:
> keys = [(len(key), key) for key in mapping.keys()]
> keys.sort(reverse=True)
> keys = [key for (_, key) in keys]
> 
> pattern = "(%s)" % "|".join(keys)
> repl = lambda x : mapping[x.group(1)]
> s = "fooxxxbazyyyquuux"
> 
> re.subn(pattern, repl, s)

I managed to make it even shorter by using the key argument of sorted, not 
wrapping the whole regexp in parentheses, and pre-compiling the regular 
expression:

import re

mapping = {
        "foo" : "bar",
        "baz" : "quux",
        "quuux" : "foo"
}

# sort the keys, longest first, so 'aa' gets matched before 'a', because
# in Python regexps the first match (going from left to right) in a
# |-separated group is taken
keys = sorted(mapping, key=len, reverse=True)

rx = re.compile("|".join(keys))
repl = lambda x: mapping[x.group()]
s = "fooxxxbazyyyquuux"
rx.sub(repl, s)
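
The ordering remark in the comment above can be checked with a small sketch.
The mapping here is a made-up one in which one key is a prefix of another,
and substitute() is only a helper name chosen for this illustration. Note
that sorted(..., key=len) on its own sorts shortest first, so reverse=True
is what gives the longest-first order:

```python
import re

# Hypothetical mapping in which "a" is a prefix of "aa".
mapping = {"a": "1", "aa": "2"}

def substitute(keys, text):
    # Alternatives are tried left to right, so the order of keys matters.
    rx = re.compile("|".join(keys))
    return rx.sub(lambda m: mapping[m.group()], text)

# Shortest first: "a" wins at every position, "aa" is never matched.
shortest_first = substitute(sorted(mapping, key=len), "aa a")

# Longest first: "aa" is tried before "a".
longest_first = substitute(sorted(mapping, key=len, reverse=True), "aa a")

print(shortest_first)  # 11 1
print(longest_first)   # 2 1
```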

One thing remaining: if the keys to be replaced could contain characters that 
are special in regular expressions, they should be escaped using re.escape:

rx = re.compile("|".join(re.escape(key) for key in keys))
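
Putting the pieces together, the whole thing could be wrapped in a small
function along these lines. This is only a sketch: the name multi_replace
and the sample mapping are my own choices for illustration, and the guard
for an empty mapping is an addition (an empty pattern would otherwise match
the empty string everywhere and the lookup would fail):

```python
import re

def multi_replace(mapping, text):
    """Replace every key of mapping found in text with its value, in one pass."""
    if not mapping:
        # Nothing to replace; avoid compiling an empty pattern.
        return text
    # Longest keys first, so a key is tried before any of its prefixes.
    keys = sorted(mapping, key=len, reverse=True)
    # Escape each key so characters like "." or "(" are matched literally.
    rx = re.compile("|".join(re.escape(key) for key in keys))
    return rx.sub(lambda m: mapping[m.group()], text)

# Hypothetical keys containing regex metacharacters.
result = multi_replace({"a.b": "X", "a": "Y", "(c)": "Z"}, "a.b a (c) axb")
print(result)  # X Y Z Yxb
```

Without re.escape, the "." in "a.b" would match any character, so "axb"
would have been replaced as a whole as well.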


Kind regards,
Wilbert Berendsen

-- 
http://www.wilbertberendsen.nl/
"You must be the change you wish to see in the world."
        -- Mahatma Gandhi


