[Python-Dev] 3.3 str timings
Terry Reedy
tjreedy at udel.edu
Wed Aug 22 00:08:57 CEST 2012
On 8/21/2012 9:04 AM, Victor Stinner wrote:
> 2012/8/18 Terry Reedy <tjreedy at udel.edu>:
>> The issue came up in python-list about string operations being slower in
>> 3.3. (The categorical claim is false as some things are actually faster.)
>
> Yes, some operations are slower, but others are faster :-)
Yes, that is what I wrote, showed, and posted to python-list :-)
I was and am posting here in response to a certain French writer who
dislikes the fact that 3.3 unicode favors text written with the first
256 code points, which do not include all the characters needed for
French, and do not include the euro symbol invented years after that set
was established. His opinion aside, his search for 'evidence' did turn
up a version of the example below.
> an important effort to limit the overhead of the PEP 393 (when the
> branch was merged, most operations were slower). I tried to fix all
> performance regressions.
Yes, I read and appreciated the speed-up patches by you and others.
> If you find cases where Python 3.3 is slower,
> I can investigate and try to optimize it (in Python 3.4) or at least
> explain why it is slower :-)
Replacement appears to be as much as 6.5 times slower on some Win 7
machines. (I factored out the setup part, which increased the ratio,
since the setup takes the same time in both versions.)
>>> import timeit
>>> ttr = timeit.repeat
# 3.2.3
>>> ttr("euroreplace('€', 'œ')", "euroreplace = ('€'*100).replace")
[0.385043233078477, 0.35294282203631155, 0.3468394370770511]
# 3.3.0b2
>>> ttr("euroreplace('€', 'œ')", "euroreplace = ('€'*100).replace")
[2.2624885911213823, 2.245330314124203, 2.2531118686461014]
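For anyone who wants to reproduce this, the session above can be packaged as a self-contained script (a sketch; the escape sequences stand for the euro sign and the oe ligature used above, and the absolute times will of course differ per machine):

```python
import timeit

# Bind the method once in the setup so that only the replacement itself
# is timed, not the attribute lookup or the construction of the string.
setup = "euroreplace = ('\u20ac' * 100).replace"  # 100 euro signs
stmt = "euroreplace('\u20ac', '\u0153')"          # replace each with oe

times = timeit.repeat(stmt, setup, repeat=3)
print(min(times))  # report the lowest of the three runs
```

Reporting the minimum of the repeats, as done throughout this thread, discounts transient system noise.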
How does this compare on *nix?
> As said by Antoine, use the stringbench tool if you would like to get
> a first overview of string performances.
I found it, ran it on 3.2 and 3.3, and posted to python-list that 3.3
unicode looks quite good. It is overall comparable to both byte
operations and 3.2 unicode operations. Replace operations were
relatively the slowest, though I do not remember any as bad as the
example above.
>> Some things I understand, this one I do not.
>>
>> Win7-64, 3.3.0b2 versus 3.2.3
>> print(timeit("c in a", "c = '…'; a = 'a'*1000+c")) # ord(c) = 8230
>> # .6 in 3.2, 1.2 in 3.3
>
> On Linux with narrow build (UTF-16), I get:
>
> $ python3.2 -m timeit -s "c=chr(8230); a='a'*1000+c" "c in a"
> 100000 loops, best of 3: 4.25 usec per loop
> $ python3.3 -m timeit -s "c=chr(8230); a='a'*1000+c" "c in a"
> 100000 loops, best of 3: 3.21 usec per loop
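For reference, the membership benchmark can be written the same way as a script (a sketch of the command lines above; chr(8230) is the horizontal ellipsis, appended last so the search has to walk the whole string):

```python
import timeit

# Worst case for 'in': the sought character is the final one, so the
# scan must pass over all 1000 'a' characters before finding it.
setup = "c = chr(8230); a = 'a' * 1000 + c"
times = timeit.repeat("c in a", setup, repeat=3)
print(min(times))

# Sanity check, outside the timed statement.
assert chr(8230) in 'a' * 1000 + chr(8230)
```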
The slowdown seems to be specific to (some?) Windows systems. Perhaps we
are hitting a difference between the VC2008 and VC2010 compilers or
runtimes.
Someone on python-list wondered whether the 3.3.0 betas have the same
compile optimization settings as 3.2.3 final. Martin?
> Can you reproduce your benchmark on other Windows platforms? Do you
> run the benchmark more than once? I always run a benchmark 3 times.
Always, and now I see that timeit.repeat does this for me.
> I don't like the timeit module for micro benchmarks, it is really
> unstable (default settings are not written for micro benchmarks).
I am reporting rounded lowest times. As others said, make timeit better
if you can.
>> print(timeit("a.encode()", "a = 'a'*1000"))
>> # 1.5 in 3.2, .26 in 3.3
>>
>> print(timeit("a.encode(encoding='utf-8')", "a = 'a'*1000"))
>> # 1.7 in 3.2, .51 in 3.3
>
> This test doesn't compare performances of the UTF-8 encoder: "encode"
> an ASCII string to UTF-8 in Python 3.3 is a no-op, it just duplicates
> the memory (ASCII is compatible with UTF-8)...
That is what I thought, and why I was puzzled, ...
> So your benchmark just measures the performances of
> PyArg_ParseTupleAndKeywords()...,
having forgotten about arg processing. I should have factored out the
.encode lookup (as I did with .replace). The following suggests that you
are correct. The difference, about .3, is independent of the length of
string being copied.
>>> ttr("aenc()", "aenc = ('a'*10000).encode")
[0.588499543029684, 0.5760222493490801, 0.5757037691037112]
>>> ttr("aenc(encoding='utf-8')", "aenc = ('a'*10000).encode")
[0.8973955632254729, 0.887000380270365, 0.884113153942053]
>>> ttr("aenc()", "aenc = ('a'*50000).encode")
[3.6618914099180984, 3.650091040467487, 3.6542183723140624]
>>> ttr("aenc(encoding='utf-8')", "aenc = ('a'*50000).encode")
[3.964849740958016, 3.9363826484832316, 3.937290440151628]
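Two quick checks along these lines (a sketch, not the original session): first, that encoding a pure-ASCII string to UTF-8 really yields the same bytes either way, and second, that the extra cost of passing encoding='utf-8' stays roughly constant as the string grows, as per-call argument processing would predict:

```python
import timeit

# For pure ASCII, the UTF-8 encoding is byte-for-byte the ASCII bytes,
# so the encoder can simply copy the PEP 393 ASCII representation.
a = 'a' * 10000
assert a.encode() == a.encode(encoding='utf-8') == b'a' * 10000

# Compare the bound call with and without the keyword argument.  If the
# gap is per-call overhead, it should not grow with the string length.
for n in (10000, 50000):
    aenc = ('a' * n).encode
    plain = min(timeit.repeat(lambda: aenc(), number=10000, repeat=3))
    kwarg = min(timeit.repeat(lambda: aenc(encoding='utf-8'),
                              number=10000, repeat=3))
    print(n, round(kwarg - plain, 4))
```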
--
Terry Jan Reedy