[issue19646] Use PyUnicodeWriter in repr(dict)

STINNER Victor report at bugs.python.org
Mon Nov 18 22:15:06 CET 2013


New submission from STINNER Victor:

The attached patch modifies the dict_repr() function to use the _PyUnicodeWriter API instead of building a list of short strings with PyUnicode_AppendAndDel() and calling PyUnicode_Join() at the end to join the list. PyUnicode_Append() is inefficient because it has to allocate a new string instead of reusing the same buffer.

The _PyUnicodeWriter API has a different design: it overallocates a buffer to write Unicode characters into and shrinks the buffer at the end. It is faster according to my micro-benchmark.
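
The overallocate-then-shrink idea can be illustrated with a small pure-Python model. This is only a sketch, not the C code from the patch: the class OverallocatingWriter, the function dict_repr_sketch() and the 25% growth factor are assumptions made up for illustration, and the real _PyUnicodeWriter works on a raw character buffer rather than a Python list.

```python
class OverallocatingWriter:
    """Toy model of a text writer that overallocates its buffer.

    Hypothetical illustration only; not the _PyUnicodeWriter API.
    """

    def __init__(self):
        self.buffer = []          # preallocated slots, one per character
        self.length = 0           # number of characters actually written
        self.reallocations = 0    # how many times the buffer had to grow

    def write(self, text):
        needed = self.length + len(text)
        if needed > len(self.buffer):
            # Grow past what is needed so that the next writes fit
            # without another reallocation (25% is an assumed factor).
            new_size = needed + needed // 4
            self.buffer.extend([""] * (new_size - len(self.buffer)))
            self.reallocations += 1
        # Copy the characters into the already allocated slots.
        self.buffer[self.length:needed] = text
        self.length = needed

    def finish(self):
        # Shrink the buffer to the exact size, then build the result once.
        del self.buffer[self.length:]
        return "".join(self.buffer)


def dict_repr_sketch(d):
    """Build repr(d) for a dict by writing every piece into one writer."""
    writer = OverallocatingWriter()
    writer.write("{")
    for index, (key, value) in enumerate(d.items()):
        if index:
            writer.write(", ")
        writer.write(repr(key))
        writer.write(": ")
        writer.write(repr(value))
    writer.write("}")
    return writer.finish()


if __name__ == "__main__":
    print(dict_repr_sketch({"a": 1}))
```

In this model the number of reallocations grows roughly with the logarithm of the output size, whereas appending to an immutable string may allocate a new string on each append.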


$ ./python ~/prog/HG/misc/python/benchmark.py compare_to pyaccu writer
Common platform:
CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Python unicode implementation: PEP 393
CFLAGS: -Wno-unused-result -Werror=declaration-after-statement -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
Timer precision: 40 ns
Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
Platform: Linux-3.9.4-200.fc18.x86_64-x86_64-with-fedora-18-Spherical_Cow
Bits: int=32, long=64, long long=64, size_t=64, void*=64
Timer: time.perf_counter

Platform of campaign pyaccu:
Date: 2013-11-18 21:37:44
Python version: 3.4.0a4+ (default:fc7ceb001eec, Nov 18 2013, 21:29:41) [GCC 4.7.2 20121109 (Red Hat 4.7.2-8)]
SCM: hg revision=fc7ceb001eec tag=tip branch=default date="2013-11-18 21:11 +0100"

Platform of campaign writer:
Date: 2013-11-18 22:10:40
Python version: 3.4.0a4+ (default:fc7ceb001eec+, Nov 18 2013, 22:10:12) [GCC 4.7.2 20121109 (Red Hat 4.7.2-8)]
SCM: hg revision=fc7ceb001eec+ tag=tip branch=default date="2013-11-18 21:11 +0100"

--------------------------------------+-------------+--------------
Tests                                 |      pyaccu |        writer
--------------------------------------+-------------+--------------
{"a": 1}                              |  603 ns (*) | 496 ns (-18%)
dict(zip("abc", range(3)))            | 1.05 us (*) | 904 ns (-14%)
{"%03d":"abc" for k in range(10)}     |  631 ns (*) | 501 ns (-21%)
{"%100d":"abc" for k in range(10)}    |  660 ns (*) | 484 ns (-27%)
{k:"a" for k in range(10**3)}         |  235 us (*) | 166 us (-30%)
{k:"abc" for k in range(10**3)}       |  245 us (*) | 177 us (-28%)
{"%100d":"abc" for k in range(10**3)} |  668 ns (*) | 478 ns (-28%)
{k:"a" for k in range(10**6)}         |  258 ms (*) | 186 ms (-28%)
{k:"abc" for k in range(10**6)}       |  265 ms (*) | 184 ms (-31%)
{"%100d":"abc" for k in range(10**6)} |  652 ns (*) | 489 ns (-25%)
--------------------------------------+-------------+--------------
Total                                 |  523 ms (*) | 369 ms (-29%)
--------------------------------------+-------------+--------------
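
The benchmark.py script used above is not part of this message. A rough way to time repr() on similar dictionaries with the standard timeit module could look like the sketch below. The helper time_repr() and the selection of cases are assumptions for illustration, and absolute numbers will differ from the table depending on machine and build.

```python
import timeit

# A few of the expressions from the table above, evaluated once each.
CASES = [
    '{"a": 1}',
    'dict(zip("abc", range(3)))',
    '{k: "a" for k in range(10**3)}',
    '{k: "abc" for k in range(10**3)}',
]


def time_repr(expr, repeat=5, number=200):
    """Return the best observed time of one repr() call, in seconds.

    Hypothetical helper: builds the dict from `expr` once, then times
    only the repr() call on it.
    """
    d = eval(expr)
    timer = timeit.Timer("repr(d)", globals={"d": d})
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number


if __name__ == "__main__":
    for expr in CASES:
        print(f"{expr:<40} {time_repr(expr) * 1e6:10.3f} us")
```

Running the same script with an unpatched and a patched interpreter build gives two columns comparable in spirit to "pyaccu" and "writer".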

----------
components: Unicode
files: dict_repr_writer.patch
keywords: patch
messages: 203322
nosy: ezio.melotti, haypo, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Use PyUnicodeWriter in repr(dict)
type: enhancement
versions: Python 3.4
Added file: http://bugs.python.org/file32694/dict_repr_writer.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue19646>
_______________________________________

