any chance regular expressions are cached?

Arnaud Delobelle arnodel at googlemail.com
Mon Mar 10 17:52:45 EDT 2008


On Mar 10, 3:39 am, Steven D'Aprano <st... at REMOVE-THIS-cybersource.com.au> wrote:
[...]
> Having said that, at least four out of the five examples you give are
> good examples of when you SHOULDN'T use regexes.
>
> re.sub(r'\n','\n'+spaces,s)
>
> is better written as s.replace('\n', '\n'+spaces). Don't believe me?
> Check this out:
>
> >>> s = 'hello\nworld'
> >>> spaces = "   "
> >>> from timeit import Timer
> >>> Timer("re.sub('\\n', '\\n'+spaces, s)",
> ... "import re;from __main__ import s, spaces").timeit()
> 7.4031901359558105
> >>> Timer("s.replace('\\n', '\\n'+spaces)",
> ... "import re;from __main__ import s, spaces").timeit()
> 1.6208670139312744
>
> The regex is nearly five times slower than the simple string replacement.

I agree that the second version is better, but most of the time in the
first one is spent getting hold of the compiled regexp on every call
(re.sub has to compile the pattern or look it up in its cache), so the
comparison is not really fair:

>>> s = 'hello\nworld'
>>> spaces = "   "
>>> import re
>>> r = re.compile('\\n')
>>> from timeit import Timer
>>> Timer("r.sub('\\n'+spaces, s)", "from __main__ import r,spaces,s").timeit()
1.7726190090179443
>>> Timer("s.replace('\\n', '\\n'+spaces)", "from __main__ import s, spaces").timeit()
0.76739501953125
>>> Timer("re.sub('\\n', '\\n'+spaces, s)", "from __main__ import re, s, spaces").timeit()
4.3669700622558594
>>>

Even with the pattern precompiled, the regexp is still more than twice
as slow as str.replace (about 1.77 s against 0.77 s).
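
For anyone who wants to rerun the comparison outside the interactive
prompt, here is a rough sketch as a standalone script. The names
STATEMENTS, outputs and bench are mine, and it uses timeit's globals=
argument (Python 3.5 and later) instead of "from __main__ import ...",
so it is not the exact session above. Absolute timings will differ from
machine to machine; only the relative ordering is of interest.

```python
import re
import timeit

s = 'hello\nworld'
spaces = "   "
r = re.compile('\n')  # pattern compiled once, up front

# The three statements timed in this thread, keyed by a short label.
STATEMENTS = {
    "str.replace": "s.replace('\\n', '\\n' + spaces)",
    "compiled r.sub": "r.sub('\\n' + spaces, s)",
    "module re.sub": "re.sub('\\n', '\\n' + spaces, s)",
}

# Names made visible to the timed statements.
NAMESPACE = {"re": re, "r": r, "s": s, "spaces": spaces}


def outputs():
    """Evaluate each statement once so the results can be compared."""
    return {name: eval(stmt, NAMESPACE) for name, stmt in STATEMENTS.items()}


def bench(number=100000):
    """Return total seconds for `number` runs of each statement."""
    return {
        name: timeit.timeit(stmt, globals=NAMESPACE, number=number)
        for name, stmt in STATEMENTS.items()
    }


if __name__ == "__main__":
    # All three approaches must produce the same string.
    assert len(set(outputs().values())) == 1
    for name, seconds in bench().items():
        print("%-15s %.3f s" % (name, seconds))
```

The default here is 100000 runs per statement, not the 1000000 that
Timer.timeit() uses by default, so the totals are not directly
comparable with the figures above.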

--
Arnaud



