String performance regression from python 3.2 to 3.3

Sat Mar 16 18:00:32 EDT 2013

Steven D'Aprano:

> So while you might save memory by using "UTF-24" instead of UTF-32, it
> would probably be slower because you would have to grab three bytes at a
> time instead of four, and the hardware probably does not directly support
> that.

     Low-level string manipulation often deals with blocks larger than 
an individual character for speed: generally 32 or 64 bits at a time 
using the CPU, or 128 or 256 bits using the vector unit. There may then 
be entry/exit code to handle initial alignment to a block boundary and 
to deal with a tail that is smaller than the block size.

    For an example of this kind of thing, see find_max_char in 
python\Objects\stringlib\find_max_char.h which can examine a char* 32 or 
64 bits at a time.
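
    As a rough illustration of the block-at-a-time idea (this is not the 
CPython code, just a Python sketch of the same pattern; the names 
has_non_ascii, BLOCK and HIGH_BITS are made up here), the loop below 
tests eight bytes per step with a single mask and then handles the short 
tail one byte at a time. Alignment entry code has no equivalent in pure 
Python, so only the tail handling is shown.

```python
BLOCK = 8                        # bytes per block, standing in for a 64-bit word
HIGH_BITS = 0x8080808080808080   # top bit of every byte in the block


def has_non_ascii(data: bytes) -> bool:
    """Return True if any byte in data is >= 0x80."""
    n = len(data)
    whole = n - n % BLOCK        # length of the part made of full blocks

    # Main loop: one mask test covers eight bytes.
    for i in range(0, whole, BLOCK):
        word = int.from_bytes(data[i:i + BLOCK], "little")
        if word & HIGH_BITS:
            return True

    # Tail: fewer than BLOCK bytes remain, so check them individually.
    return any(b & 0x80 for b in data[whole:])
```

    In C the main loop would load an aligned machine word directly rather 
than building an integer from a slice, which is where the speed comes 
from.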

    24-bit is likely to be a win in many circumstances due to decreased 
memory traffic. A 12-bit implementation may also be worthwhile, as the 
low 0x1000 characters of Unicode contain Latin (with extensions), 
Greek, Cyrillic, Arabic, Hebrew, and most Indic scripts.
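
    To put numbers on the memory side of this, here is a small sketch of 
both layouts. Neither format exists in Python; pack24, char_at24, pack12 
and char_at12 are hypothetical names, and the byte order and the 
two-characters-per-three-bytes layout for 12-bit are my own assumptions.

```python
def pack24(text: str) -> bytes:
    """Store each code point in 3 little-endian bytes."""
    return b"".join(ord(ch).to_bytes(3, "little") for ch in text)


def char_at24(buf: bytes, i: int) -> str:
    """Fetch character i by reading its 3 bytes."""
    return chr(int.from_bytes(buf[3 * i:3 * i + 3], "little"))


def pack12(text: str) -> bytes:
    """Store code points below 0x1000 in 12 bits each, two per 3 bytes."""
    codes = [ord(ch) for ch in text]
    if any(c >= 0x1000 for c in codes):
        raise ValueError("code point does not fit in 12 bits")
    if len(codes) % 2:
        codes.append(0)          # pad so the last pair is complete
    out = bytearray()
    for low, high in zip(codes[0::2], codes[1::2]):
        out += (low | (high << 12)).to_bytes(3, "little")
    return bytes(out)


def char_at12(buf: bytes, i: int) -> str:
    """Fetch character i from the 3-byte pair that holds it."""
    start = 3 * (i // 2)
    pair = int.from_bytes(buf[start:start + 3], "little")
    return chr((pair >> (12 * (i % 2))) & 0xFFF)
```

    For a six-character Cyrillic string this gives 24 bytes as UTF-32, 
18 bytes at 24 bits per character and 9 bytes at 12 bits per character. 
The cost is the extra shifting and masking on every access, which is the 
trade-off the quoted text is worried about.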

    Neil