Unicode 7

Tue Apr 29 13:59:23 EDT 2014

On 2014-04-29 10:37, wxjmfauth at gmail.com wrote:
> >>> timeit.repeat("(x*1000 + y)[:-1]", setup="x = 'abc'; y = 'z'")  
> [1.4027834829454946, 1.38714224331963, 1.3822586635296261]
> >>> timeit.repeat("(x*1000 + y)[:-1]", setup="x = 'abc'; y = '\u0fce'")
> [5.462776291480395, 5.4479432055423445, 5.447874284053398]
> >>> 
> >>> 
> >>> # more interesting
> >>> timeit.repeat("(x*1000 + y)[:-1]",\
> ...     setup="x = 'abc'.encode('utf-8'); y = '\u0fce'.encode('utf-8')")
> [1.3496489533188765, 1.328654286266783, 1.3300913977710707]
> >>>   

While I dislike feeding the troll, what I see here is: on your
machine, all Unicode manipulations in the test should take ~5.4
seconds.  But Python notices that some of your strings *don't*
require a full 32 bits and thus optimizes those operations, cutting
about 75% of the processing time (wow...4-bytes-per-char to
1-byte-per-char, I wonder where that 75% savings comes from).
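
For anyone who wants to see which width the interpreter actually picks,
here is a rough sketch, assuming CPython 3.3 or later (the flexible
string representation from PEP 393).  The helper name bytes_per_char is
made up for illustration; it estimates per-character storage by
comparing sys.getsizeof for two lengths of the same repeated character.
Strictly speaking, U+0FCE fits in 16 bits, so I would expect that
string to be stored at 2 bytes per character rather than 4; only
characters outside the BMP should need the full 4.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate storage per character for a string made only of `ch`.

    Hypothetical helper: subtracting the sizes of two strings of
    different lengths cancels out the fixed object header.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n

# An ASCII character, a BMP character (the one from the benchmark),
# and a character outside the BMP.
for ch in ("z", "\u0fce", "\U0001f600"):
    print("U+%04X: %d byte(s) per character" % (ord(ch), bytes_per_char(ch)))
```

The numbers are specific to CPython's implementation; other
interpreters may store strings differently.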

So rather than highlight any *problem* with Python, your [mostly
worthless, non-real-world microbenchmark] tests show that Python's
Unicode implementation is awesome.

Still waiting to see an actual bug report as mentioned on the other
thread.

-tkc