substitution

MRAB python at mrabarnett.plus.com
Thu Jan 21 09:42:02 EST 2010


Wilbert Berendsen wrote:
> On Monday 18 January 2010, Adi wrote:
>> keys = [(len(key), key) for key in mapping.keys()]
>> keys.sort(reverse=True)
>> keys = [key for (_, key) in keys]
>>
>> pattern = "(%s)" % "|".join(keys)
>> repl = lambda x : mapping[x.group(1)]
>> s = "fooxxxbazyyyquuux"
>>
>> re.subn(pattern, repl, s)
> 
> I managed to make it even shorter, using the key argument for sorted, not 
> putting the whole regexp inside parentheses and pre-compiling the regular 
> expression:
> 
> import re
> 
> mapping = {
>         "foo" : "bar",
>         "baz" : "quux",
>         "quuux" : "foo"
> }
> 
> # sort the keys, longest first, so 'aa' gets matched before 'a', because
> # in Python regexps the first match (going from left to right) in a
> # |-separated group is taken
> keys = sorted(mapping.keys(), key=len)
> 
For longest first you need:

     keys = sorted(mapping.keys(), key=len, reverse=True)
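
To see why the order matters, here is a minimal sketch. The keys 'a' and
'aa' and the helper name substitute are made up for illustration; they are
not from the mapping in the original post:

```python
import re

# Hypothetical mapping in which one key is a prefix of another.
mapping = {"a": "1", "aa": "2"}

def substitute(keys, s):
    # Alternatives are tried left to right, so the order of keys matters.
    rx = re.compile("|".join(re.escape(k) for k in keys))
    return rx.sub(lambda m: mapping[m.group()], s)

shortest_first = sorted(mapping, key=len)
longest_first = sorted(mapping, key=len, reverse=True)

print(substitute(shortest_first, "aa"))  # 'a' matches twice: 11
print(substitute(longest_first, "aa"))   # 'aa' matches once: 2
```

With the keys in the original post no key is a prefix of another, so the
order happens not to change the result there.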

> rx = re.compile("|".join(keys))
> repl = lambda x: mapping[x.group()]
> s = "fooxxxbazyyyquuux"
> rx.sub(repl, s)
> 
> One thing remaining: if the replacement keys could contain non-alphanumeric 
> characters, they should be escaped using re.escape:
> 
Strictly speaking, not all non-alphanumeric characters need escaping, only
the ones that are special in a regex (such as '.', '*' and '|').

> rx = re.compile("|".join(re.escape(key) for key in keys))
> 
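
Putting the pieces together, something like the following should work. It
is only a sketch: the function name multi_replace is my own choice, and it
assumes the mapping is non-empty (an empty mapping would produce an empty
pattern, which matches everywhere and then fails the dictionary lookup):

```python
import re

def multi_replace(mapping, s):
    """Replace every key of mapping found in s with its value, in one pass."""
    # Longest keys first, so that 'aa' is tried before 'a'.
    keys = sorted(mapping, key=len, reverse=True)
    # re.escape makes each key match literally, even if it contains
    # characters such as '.', '*' or '|'.
    rx = re.compile("|".join(re.escape(key) for key in keys))
    return rx.sub(lambda m: mapping[m.group()], s)

mapping = {"foo": "bar", "baz": "quux", "quuux": "foo"}
print(multi_replace(mapping, "fooxxxbazyyyquuux"))  # barxxxquuxyyyfoo
```

Because everything is done in a single pass over the string, a replacement
value is never itself replaced again: "quuux" becomes "foo" and stays
"foo", rather than being turned into "bar".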



