[issue13624] UTF-8 encoder performance regression in python3.3

Sat Dec 17 19:49:12 CET 2011

New submission from STINNER Victor <victor.stinner at haypocalc.com>:

The iobench benchmarking tool shows that the UTF-8 encoder is slower in Python 3.3 than in Python 3.2 for non-ASCII input. The change in performance depends on the characters in the input string:

 * 8x faster (!) for a string of 50,000 ASCII characters
 * 1.5x slower for a string of 50,000 UCS-1 characters
 * 2.5x slower for a string of 50,000 UCS-2 characters

The bottleneck appears to be the PyUnicode_READ() macro, which replaced direct indexing in the encoder's inner loop:

 * Python 3.2: s[i++]
 * Python 3.3: PyUnicode_READ(kind, data, i++)
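
The difference between the two access patterns can be sketched in standalone C. The names below (sketch_kind, sketch_read, count_non_ascii_fixed, count_non_ascii_generic) are hypothetical stand-ins and not CPython's definitions; the real PyUnicode_READ() macro may differ in detail. The sketch only illustrates that a generic reader selects the element width on every access, while a 3.2-style loop indexes a fixed-width array directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the Unicode "kind" (bytes per character). */
enum sketch_kind { SKETCH_1BYTE = 1, SKETCH_2BYTE = 2, SKETCH_4BYTE = 4 };

/* Hypothetical stand-in for a kind-dispatching read: the element width
   is chosen again on every call. */
static inline uint32_t
sketch_read(enum sketch_kind kind, const void *data, size_t i)
{
    switch (kind) {
    case SKETCH_1BYTE: return ((const uint8_t *)data)[i];
    case SKETCH_2BYTE: return ((const uint16_t *)data)[i];
    default:           return ((const uint32_t *)data)[i];
    }
}

/* 3.2-style loop: one fixed character width, plain s[i++]. */
static size_t
count_non_ascii_fixed(const uint16_t *s, size_t n)
{
    size_t i = 0, count = 0;
    while (i < n) {
        if (s[i++] >= 0x80)
            count++;
    }
    return count;
}

/* 3.3-style loop: every read goes through the kind dispatch. */
static size_t
count_non_ascii_generic(enum sketch_kind kind, const void *data, size_t n)
{
    size_t i = 0, count = 0;
    while (i < n) {
        if (sketch_read(kind, data, i++) >= 0x80)
            count++;
    }
    return count;
}
```

Both loops compute the same result; whether the compiler can hoist the dispatch out of the generic loop depends on the compiler and optimization level, so any timing difference should be measured rather than assumed.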

Because encoding a string to UTF-8 is a very common operation, performance matters here. Antoine suggests having a separate version of the function for each Unicode kind (1, 2, 4).
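
One way to read Antoine's suggestion is sketched below in standalone C: a macro stamps out one encoder per character width, and the kind is examined once per string instead of once per character. All names (sketch_put_utf8, SKETCH_DEFINE_ENCODER, sketch_encode_ucs1/2/4, sketch_encode_utf8) are hypothetical, and the sketch ignores surrogates, error handlers and buffer management, so it is not the actual CPython encoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write the UTF-8 form of ch to out and return the
   number of bytes written. No surrogate or range checks (sketch only). */
static size_t
sketch_put_utf8(uint32_t ch, unsigned char *out)
{
    if (ch < 0x80) {
        out[0] = (unsigned char)ch;
        return 1;
    }
    if (ch < 0x800) {
        out[0] = (unsigned char)(0xC0 | (ch >> 6));
        out[1] = (unsigned char)(0x80 | (ch & 0x3F));
        return 2;
    }
    if (ch < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (ch >> 12));
        out[1] = (unsigned char)(0x80 | ((ch >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (ch & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (ch >> 18));
    out[1] = (unsigned char)(0x80 | ((ch >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((ch >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (ch & 0x3F));
    return 4;
}

/* Generate one encoder per character type; the inner loop uses s[i++]
   on a pointer of known width, as in the 3.2 code. */
#define SKETCH_DEFINE_ENCODER(NAME, CHAR_T)                     \
    static size_t                                               \
    NAME(const CHAR_T *s, size_t n, unsigned char *out)         \
    {                                                           \
        size_t i = 0, len = 0;                                  \
        while (i < n)                                           \
            len += sketch_put_utf8(s[i++], out + len);          \
        return len;                                             \
    }

SKETCH_DEFINE_ENCODER(sketch_encode_ucs1, uint8_t)
SKETCH_DEFINE_ENCODER(sketch_encode_ucs2, uint16_t)
SKETCH_DEFINE_ENCODER(sketch_encode_ucs4, uint32_t)

/* Dispatch on the kind once per string. The caller must provide an
   output buffer with room for at least 4 * n bytes. */
static size_t
sketch_encode_utf8(int kind, const void *data, size_t n, unsigned char *out)
{
    switch (kind) {
    case 1:  return sketch_encode_ucs1((const uint8_t *)data, n, out);
    case 2:  return sketch_encode_ucs2((const uint16_t *)data, n, out);
    default: return sketch_encode_ucs4((const uint32_t *)data, n, out);
    }
}
```

The trade-off in this design is code size: three copies of the loop exist in the binary, in exchange for an inner loop that no longer depends on the kind.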

----------
components: Unicode
messages: 149695
nosy: ezio.melotti, haypo, pitrou
priority: normal
severity: normal
status: open
title: UTF-8 encoder performance regression in python3.3
type: performance
versions: Python 3.3

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13624>
_______________________________________