[issue13624] UTF-8 encoder performance regression in Python 3.3

Martin v. Löwis report at bugs.python.org
Sat Dec 17 20:50:12 CET 2011


Martin v. Löwis <martin at v.loewis.de> added the comment:

Can you please provide your exact testing procedure? Standard iobench.py doesn't support testing ASCII, UCS-1 and UCS-2 data separately, so you must have used some other tool. Exact code, command-line parameters, a hardware description and timing results would be appreciated.
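For reference, even a measurement along the following lines would tell us a lot; the string contents and sizes below are only placeholders, not your actual procedure:

    import timeit

    samples = [
        ("ASCII", "a" * 10000),       # code points < 0x80
        ("UCS-1", "\xe9" * 10000),    # Latin-1 range, code points < 0x100
        ("UCS-2", "\u20ac" * 10000),  # BMP range, code points < 0x10000
    ]

    for kind, text in samples:
        t = timeit.timeit(lambda: text.encode("utf-8"), number=10000)
        print("%s: %.3fs for 10000 encodings of %d chars" % (kind, t, len(text)))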

Looking at the encoder, I think the first thing to change is to reduce the over-allocation for UCS-1 and UCS-2 strings. This may or may not help the run-time, but should reduce memory consumption.
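The worst-case expansion is 2 bytes per character for UCS-1 and 3 bytes per character for UCS-2, so the allocation can be capped accordingly. This is easy to verify from the Python level (only an illustration, not the encoder's allocation code):

    # Worst-case UTF-8 size per character, by string kind.
    ucs1_worst = max(len(chr(c).encode("utf-8")) for c in range(0x100))
    ucs2_worst = max(len(chr(c).encode("utf-8"))
                     for c in range(0x10000)
                     if not 0xD800 <= c <= 0xDFFF)  # lone surrogates can't be encoded
    print(ucs1_worst)  # 2
    print(ucs2_worst)  # 3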

I wonder whether making two passes over the string (one to compute the exact output size, and a second to fill a result buffer allocated at that size) could improve performance.
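A rough sketch of the idea for the UCS-1 case, in pure Python only to illustrate the two passes; the real code would of course do this on the raw buffer in C:

    def encode_latin1_two_pass(s):
        # First pass: code points < 0x80 take 1 byte, the rest take 2.
        size = sum(1 if ord(ch) < 0x80 else 2 for ch in s)
        out = bytearray(size)  # exact allocation, no over-allocation or resize

        # Second pass: write the UTF-8 bytes into the pre-sized buffer.
        i = 0
        for ch in s:
            c = ord(ch)
            if c < 0x80:
                out[i] = c
                i += 1
            else:
                out[i] = 0xC0 | (c >> 6)
                out[i + 1] = 0x80 | (c & 0x3F)
                i += 2
        return bytes(out)

    assert encode_latin1_two_pass("h\xe9llo") == "h\xe9llo".encode("utf-8")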

If there is to be further special-casing, I'd only special-case UCS-1. I doubt that the _READ() macro really is the bottleneck; I would rather expect loop unrolling to help. Because of disallowed surrogates, unrolling is not practical for UCS-2.
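(The surrogate problem is visible even from the Python level: every UCS-2 code point has to be range-checked, because a lone surrogate must raise rather than be emitted as three bytes.)

    try:
        "\ud800".encode("utf-8")
    except UnicodeEncodeError as exc:
        print(exc)  # strict UTF-8 refuses lone surrogates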

----------
nosy: +loewis

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13624>
_______________________________________

