Article on the future of Python

Steven D'Aprano steve+comp.lang.python at pearwood.info
Thu Sep 27 05:33:34 EDT 2012


On Wed, 26 Sep 2012 08:45:30 -0700, wxjmfauth wrote:

> Sorry guys, I'm "only" able to see this (with the Python versions an end
> user can download):

[snip timeit results]

While you have all been doom, gloom and negativity about Python having 
"destroyed" Unicode, I've actually done some testing. It seems that, 
possibly, there is a performance regression in the "replace" method.

This is on Debian squeeze, using the latest rc version of 3.3, 3.3.0rc3:

py> timeit.repeat("('b'*1000).replace('b', 'a')")
[28.308280900120735, 29.012173799797893, 28.834429003298283]

Notice that Unicode doesn't come into it: these are pure ASCII strings. 
Here's the same thing using 3.2.2:

py> timeit.repeat("('b'*1000).replace('b', 'a')")
[3.4444618225097656, 3.147739887237549, 3.132185935974121]
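For anyone reading the raw numbers: timeit.repeat returns *total* seconds 
for a fixed number of executions (1,000,000 by default), not per-call 
times, so the 28-second figures above work out to roughly 28 usec per 
call. Here is a minimal sketch of normalizing that to a per-call figure 
(per_call_us is my own throwaway helper name, nothing standard):

```python
# timeit.repeat() returns the *total* seconds for `number` executions
# of the statement (1,000,000 by default), once per repeat.  Dividing
# the best total by `number` gives a per-call time in microseconds.
import timeit

def per_call_us(stmt, number=100_000, repeat=3):
    # Taking the minimum of the repeats is the conventional way to
    # read timeit results: it is the least disturbed by other load.
    totals = timeit.repeat(stmt, number=number, repeat=repeat)
    return min(totals) / number * 1e6

print(per_call_us("('b'*1000).replace('b', 'a')"))
```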

That's a factor of 9 slowdown in 3.3, and no Unicode. Obviously Python 
has "destroyed ASCII".

(I get similar slowdowns for Unicode strings too, so clearly Python hates 
all strings, not just ASCII.)

Now, for unrelated reasons, I swapped over to CentOS.

[steve@ando ~]$ python2.7 -m timeit "'b'*1000"
1000000 loops, best of 3: 0.48 usec per loop
[steve@ando ~]$ python3.2 -m timeit "'b'*1000"
1000000 loops, best of 3: 1.3 usec per loop
[steve@ando ~]$ python3.3 -m timeit "'b'*1000"
1000000 loops, best of 3: 0.397 usec per loop

Clearly 3.3 is the fastest at string multiplication, at least for this 
trivial example. Just to prove that the result also applies to Unicode:

[steve@ando ~]$ python3.3 -m timeit "('你'*1000)"
1000000 loops, best of 3: 1.38 usec per loop

Almost identical to 3.2. And the reason it is slower than the 3.3 test 
using 'b' above is almost certainly memory: under 3.3's new flexible 
string representation, '你' is stored as two bytes per code point, 
versus one byte for pure ASCII. For comparison, here is a pure-ASCII 
string with an even larger buffer:

[steve@ando ~]$ python3.3 -m timeit "('abcd'*1000)"
1000000 loops, best of 3: 0.919 usec per loop

So the '你' version is a little slower than a pure-ASCII string taking 
up even more memory, but not significantly so.
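The memory argument is easy to cross-check with sys.getsizeof. A quick 
sketch (exact byte counts vary by CPython version and build, so take 
the numbers as approximate):

```python
# Under CPython 3.3+'s flexible string representation (PEP 393), a
# string is stored with 1, 2 or 4 bytes per code point, depending on
# the widest code point it contains.
import sys

ascii_s = 'b' * 1000   # all code points below U+0080 -> 1 byte each
cjk_s = '你' * 1000    # U+4F60 fits in 16 bits       -> 2 bytes each

print(sys.getsizeof(ascii_s))  # roughly 1000 bytes plus object header
print(sys.getsizeof(cjk_s))    # roughly 2000 bytes plus object header
```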

But add a call to replace, and things are very different:

[steve@ando ~]$ python2.7 -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"
100000 loops, best of 3: 9.3 usec per loop
[steve@ando ~]$ python3.2 -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"
100000 loops, best of 3: 5.43 usec per loop
[steve@ando ~]$ python3.3 -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"
100000 loops, best of 3: 18.3 usec per loop


Over three times slower than 3.2, even for pure-ASCII strings. I get 
comparable results for Unicode. Notice how slow Python 2.7 is:

[steve@ando ~]$ python2.7 -m timeit -s "s = u'你'*1000" "s.replace(u'你', u'a')"
10000 loops, best of 3: 65.6 usec per loop
[steve@ando ~]$ python3.2 -m timeit -s "s = '你'*1000" "s.replace('你', 'a')"
100000 loops, best of 3: 2.79 usec per loop
[steve@ando ~]$ python3.3 -m timeit -s "s = '你'*1000" "s.replace('你', 'a')"
10000 loops, best of 3: 23.7 usec per loop

Even with the performance regression, it is still over twice as fast as 
Python 2.7.

Nevertheless, I think there is something here. The consequences are nowhere
near as dramatic as jmf claims, but it does seem that replace() has taken a
serious performance hit. Perhaps it is unavoidable, but perhaps not.

If anyone else can confirm similar results, I think this should be raised as
a performance regression.
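To make confirming easier, here is a small self-contained script (my 
own throwaway helper, nothing official) that prints best-of-three 
per-call replace() times on whichever interpreter runs it:

```python
# Reproduction script for the replace() timings: times str.replace on
# an ASCII string and on a CJK string, printing per-call microseconds.
import sys
import timeit

CASES = [
    ("ascii", "s = 'b' * 1000", "s.replace('b', 'a')"),
    ("cjk",   "s = '你' * 1000", "s.replace('你', 'a')"),
]

def best_us(setup, stmt, number=10_000):
    # timeit.repeat returns total seconds per run of `number` calls;
    # take the best run and divide to get a per-call figure.
    totals = timeit.repeat(stmt, setup=setup, number=number, repeat=3)
    return min(totals) / number * 1e6

if __name__ == "__main__":
    print(sys.version)
    for name, setup, stmt in CASES:
        print("%-5s %.2f usec per call" % (name, best_us(setup, stmt)))
```

Run it under each interpreter you have installed and compare the output.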



-- 
Steven


