[pypy-dev] Unicode encode/decode speed
Eleytherios Stamatogiannakis
estama at gmail.com
Mon Feb 11 16:48:22 CET 2013
Hi,
We have been following the nightly builds of PyPy, with our testing
workload (first described in the "CFFI speed results" thread).
The news is very good. The performance of PyPy + CFFI has gone up
considerably (~30% faster) since the last time we wrote about it!
Adding to that speedup our own optimizations of the CFFI-based
SQLite3 wrapper (MSPW) that we are developing, the end result is that
most of our test queries now run at the same speed as, or faster than,
CPython + APSW.
Unfortunately, one of the queries where PyPy is slower [*] than CPython
+ APSW is very central to all of our workflows, which means that we
cannot fully convert to using PyPy.
The main culprit of PyPy's slowness is the conversion (encoding,
decoding) between PyPy's unicodes and UTF-8. It is the only thing
remaining at the top of our performance profiles, accounting for a
large share (~48%) of the time.
Right now we are using PyPy's "codecs.utf_8_encode" and
"codecs.utf_8_decode" to do this conversion.
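For reference, this is a minimal sketch of the conversion calls we mean (the string and the round-trip check are illustrative, not from our actual wrapper code):

```python
import codecs

# codecs.utf_8_encode(text) returns a (bytes, chars_consumed) tuple;
# codecs.utf_8_decode(data) returns a (unicode, bytes_consumed) tuple.
text = u"\u03b3\u03b5\u03b9\u03b1"          # sample non-ASCII text
encoded, enc_len = codecs.utf_8_encode(text)
decoded, dec_len = codecs.utf_8_decode(encoded)

# Round trip should be lossless.
assert decoded == text
assert encoded == text.encode("utf-8")
```

These calls sit on the hot path between the wrapper and SQLite3, which is why their cost dominates our profiles.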
Is there a faster way to do these conversions (encoding, decoding) in
PyPy? Does CPython do something more clever than PyPy, like storing
unicodes with fully ASCII char content in an ASCII representation?
Thank you very much,
lefteris.
[*]
For 1M rows:
CPython + APSW: 10.5 sec
PyPy + MSPW: 15.5 sec