substitution

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Mon Jan 18 11:26:48 EST 2010


On Mon, 18 Jan 2010 06:23:44 -0800, Iain King wrote:

> On Jan 18, 2:17 pm, Adi Eyal <a... at digitaltrowel.com> wrote:
[...]
>> Using regular expressions the answer is short (and sweet)
>>
>> mapping = {
>>         "foo" : "bar",
>>         "baz" : "quux",
>>         "quuux" : "foo"
>>
>> }
>>
>> pattern = "(%s)" % "|".join(mapping.keys())
>> repl = lambda x : mapping.get(x.group(1), x.group(1)) 
>> s = "fooxxxbazyyyquuux"
>> re.subn(pattern, repl, s)
> 
> Winner! :)

What are the rules for being declared "Winner"? For the simple case 
given, calling s.replace three times is much faster: more than twice as 
fast.
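
For the simple case, both approaches give the same answer, which is easy to check. This is only a sketch for the sample string from the quoted post; the names regex_result and chained_result are mine, not from the original code.

```python
import re

mapping = {"foo": "bar", "baz": "quux", "quuux": "foo"}
s = "fooxxxbazyyyquuux"

# Regex approach: a single pass over s, each match looked up in mapping.
pattern = "(%s)" % "|".join(mapping.keys())
regex_result, count = re.subn(pattern, lambda m: mapping[m.group(1)], s)

# Chained approach: three passes, each working on the previous result.
chained_result = (
    s.replace('foo', 'bar').replace('baz', 'quux').replace('quuux', 'foo')
)

print(regex_result, count)
print(chained_result)
```

Note that the chained calls run one after another, so a later call sees the output of an earlier one. Here 'quuux' -> 'foo' is done last, so the newly created 'foo' is not replaced again.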

But a bigger problem is that the above "winner" may not work correctly if 
there are conflicts between the target strings (e.g. 'a'->'X', 
'aa'->'Y'). The regex engine tries the alternatives in the pattern from 
left to right and takes the first one that matches, so the result depends 
on the order of the keys in the pattern, BUT as given, that order is 
arbitrary. dict.keys() returns the keys in an arbitrary order, which 
means the caller can't specify the order except by accident. For example:

>>> from re import subn
>>> repl = lambda x : m[x.group(1)]
>>> m = {'aa': 'Y', 'a': 'X'}
>>> pattern = "(%s)" % "|".join(m.keys())
>>> subn(pattern, repl, 'aaa')  # expecting 'YX'
('XXX', 3)

The result that you get using this method will be consistent but 
arbitrary and unpredictable.
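
One way to make the result predictable is to sort the keys by length, longest first, before joining them. This is a sketch under the assumption that longer targets should win over their own prefixes. The helper name multi_replace is hypothetical. re.escape is added so that keys containing regex metacharacters are matched literally, and keys are assumed to be non-empty.

```python
import re

def multi_replace(s, mapping):
    """Replace every key of mapping found in s with its value, in one pass."""
    # Longest keys first, so that 'aa' is tried before 'a'.
    keys = sorted(mapping, key=len, reverse=True)
    # Escape each key so it is matched as a literal string.
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], s)

print(multi_replace('aaa', {'aa': 'Y', 'a': 'X'}))
print(multi_replace("fooxxxbazyyyquuux",
                    {"foo": "bar", "baz": "quux", "quuux": "foo"}))
```

With this ordering the first call gives 'YX' regardless of how the dict happens to order its keys.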




For those who care, here's my timing code:

from timeit import Timer

setup = """
mapping = {"foo" : "bar", "baz" : "quux", "quuux" : "foo"}
pattern = "(%s)" % "|".join(mapping.keys())
# A plain lookup is enough: the pattern can only match keys of mapping.
repl = lambda x : mapping[x.group(1)]
s = "fooxxxbazyyyquuux"
from re import subn
"""

t1 = Timer("subn(pattern, repl, s)", setup)
t2 = Timer(
"s.replace('foo', 'bar').replace('baz', 'quux').replace('quuux', 'foo')",
"s = 'fooxxxbazyyyquuux'")


And the results on my PC:

>>> min(t1.repeat(number=100000))
1.1273870468139648
>>> min(t2.repeat(number=100000))
0.49491715431213379
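
The "more than twice as fast" figure follows directly from these two numbers. A trivial check, with variable names of my own choosing:

```python
regex_time = 1.1273870468139648     # best of t1.repeat(number=100000)
replace_time = 0.49491715431213379  # best of t2.repeat(number=100000)

# How many times longer the regex version takes than chained str.replace.
ratio = regex_time / replace_time
print(round(ratio, 2))
```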



-- 
Steven


