[issue16334] Faster unicode-escape and raw-unicode-escape codecs

Serhiy Storchaka report at bugs.python.org
Fri Sep 2 10:27:36 EDT 2016


Serhiy Storchaka added the comment:

> Unicode escape encoders were modified by the issue #25353 to use the
> _PyBytesWriter API. Sadly, I didn't benchmark my change before pushing it
> :-/

You can benchmark it now by checking out the revision with your patch and the
one just before it. But AFAIK the performance had not changed since 3.3, so the
effect of your patch is the difference between columns 3.3 and 3.6 (very good).

I used the scripts from
https://bitbucket.org/storchaka/cpython-stuff/src/default/bench/ .
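
Those scripts are not reproduced in this message. For a quick check without
them, a minimal timing sketch could look like the one below. It uses only the
standard timeit module; the bench() helper and its parameters are made up for
this illustration and are not taken from the scripts above. Run it under each
revision and compare the numbers.

```python
import timeit

# Same input as in the quoted table: 10000 ASCII characters.
data = 'A' * 10000

def bench(encoding, number=1000):
    """Return the best time in seconds for one data.encode(encoding) call.

    Hypothetical helper for this sketch, not part of the linked scripts.
    """
    timer = timeit.Timer(lambda: data.encode(encoding))
    # Take the minimum of several repeats to reduce noise.
    return min(timer.repeat(repeat=5, number=number)) / number

for encoding in ('unicode-escape', 'raw-unicode-escape'):
    seconds = bench(encoding)
    print('%-20s %.2f us per call' % (encoding, seconds * 1e6))
```

The absolute numbers depend on the machine and the build, so only the ratio
between two revisions is meaningful.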

> Your patch basically reverts my change.
> 
> > Py3.2        Py3.3        Py3.6        Py3.6+patch
> > 195 (+136%)  109 (+323%)  258 (+79%)   461    encode  unicode-escape      'A'*10000
> > 391 (+1310%) 333 (+1556%) 575 (+859%)  5514   encode  raw-unicode-escape  'A'*10000

> I'm surprised that the revert makes raw-unicode-escape encoder so much
> faster. Does it mean that the _PyBytesWriter API is so inefficient?

I don't remember all the details, but it seems that after applying all the
optimizations _PyBytesWriter simply became unnecessary (unlike
_PyUnicodeWriter, which is used for widening a buffer).

The huge difference in encoding speed for ASCII-only data is not related to the
use of _PyBytesWriter. It is caused by reordering the checks in the inner loop.
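
To illustrate what reordering checks in an inner loop can mean, here is a toy
pure-Python encoder. It is only a sketch of the general idea, not the C code
from the patch: the function name is hypothetical, and the assumption is that
the common case (a code point that fits in one byte) is tested first so that
ASCII-only input never reaches the rarer branches.

```python
def raw_unicode_escape_encode(text):
    """Toy raw-unicode-escape encoder (hypothetical, not the CPython code)."""
    out = bytearray()
    for ch in map(ord, text):
        if ch < 0x100:
            # Common case first: copy the code point as a single byte.
            out.append(ch)
        elif ch < 0x10000:
            # Rarer case: BMP code point, written as \uXXXX.
            out += b'\\u%04x' % ch
        else:
            # Rarest case: non-BMP code point, written as \UXXXXXXXX.
            out += b'\\U%08x' % ch
    return bytes(out)
```

For ASCII-only data every iteration takes the first branch, so the cost of the
remaining checks is paid only by strings that actually need escaping.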

> * Rename WRITECHAR macro to WRITE_ASCII_CHAR()

This is not a correct name. This macro is used for writing non-ASCII
characters too.

> * Add WRITE_CHAR() macro to avoid "goto writechar;"
> * Drop the "store" label: use WRITE_CHAR() macro instead,

Did you benchmark this change? I am afraid that this inflates the size of the
executable code and can have a negative impact on performance.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue16334>
_______________________________________