[issue19087] bytearray front-slicing not optimized
STINNER Victor
report at bugs.python.org
Mon Sep 30 23:49:07 CEST 2013
STINNER Victor added the comment:
I adapted my micro-benchmark to measure the speedup: bench_bytearray2.py. Result on bytea_slice2.patch:
Common platform:
CFLAGS: -Wno-unused-result -Werror=declaration-after-statement -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
Platform: Linux-3.9.4-200.fc18.x86_64-x86_64-with-fedora-18-Spherical_Cow
Python unicode implementation: PEP 393
Timer: time.perf_counter
Bits: int=32, long=64, long long=64, size_t=64, void*=64
Timer precision: 40 ns
Platform of campaign original:
Date: 2013-09-30 23:39:31
Python version: 3.4.0a2+ (default:687dd81cee3b, Sep 30 2013, 23:39:27) [GCC 4.7.2 20121109 (Red Hat 4.7.2-8)]
SCM: hg revision=687dd81cee3b tag=tip branch=default date="2013-09-29 22:18 +0200"
Platform of campaign patched:
Date: 2013-09-30 23:38:55
Python version: 3.4.0a2+ (default:687dd81cee3b+, Sep 30 2013, 23:30:35) [GCC 4.7.2 20121109 (Red Hat 4.7.2-8)]
SCM: hg revision=687dd81cee3b+ tag=tip branch=default date="2013-09-29 22:18 +0200"
------------------------+-------------+------------
non regression          |    original |     patched
------------------------+-------------+------------
concatenate 10**1 bytes |  1.1 us (*) |     1.14 us
concatenate 10**3 bytes |     46.9 us | 46.8 us (*)
concatenate 10**5 bytes | 4.66 ms (*) |     4.71 ms
concatenate 10**7 bytes |  478 ms (*) |      483 ms
------------------------+-------------+------------
Total                   |  482 ms (*) |      488 ms
------------------------+-------------+------------
----------------------------+-------------------+-------------
deleting front, append tail |          original |      patched
----------------------------+-------------------+-------------
buffer 10**1 bytes          |        639 ns (*) | 689 ns (+8%)
buffer 10**3 bytes          |        682 ns (*) | 723 ns (+6%)
buffer 10**5 bytes          |   3.54 us (+428%) |   671 ns (*)
buffer 10**7 bytes          | 900 us (+107128%) |   840 ns (*)
----------------------------+-------------------+-------------
Total                       |  905 us (+30877%) |  2.92 us (*)
----------------------------+-------------------+-------------
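
For readers without the attachment at hand, the "deleting front, append tail" pattern can be sketched roughly as below. This is not the content of bench_bytearray2.py; the function names, the 10-byte chunk and the iteration count are assumptions chosen for illustration only.

```python
import timeit


def consume_front(buf: bytearray, chunk: bytes) -> None:
    """Drop len(chunk) bytes from the front of buf, then append chunk."""
    # Front deletion: the operation this issue is about.
    del buf[:len(chunk)]
    # Tail append, so the buffer keeps a constant size across iterations.
    buf += chunk


def bench(size: int, chunk: bytes = b"x" * 10, number: int = 1000) -> float:
    """Return the average time in seconds of one delete-front/append-tail step."""
    buf = bytearray(size)
    total = timeit.timeit(lambda: consume_front(buf, chunk), number=number)
    return total / number


if __name__ == "__main__":
    for exponent in (1, 3, 5, 7):
        seconds = bench(10 ** exponent)
        print("buffer 10**%d bytes: %.0f ns" % (exponent, seconds * 1e9))
```

If front deletion has to move the rest of the buffer, the time per step grows with the buffer size, which is what the "original" column shows; a roughly constant time per step matches the "patched" column.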
----------------------------+------------------+------------
Summary                     |         original |     patched
----------------------------+------------------+------------
non regression              |       482 ms (*) |      488 ms
deleting front, append tail | 905 us (+30877%) | 2.92 us (*)
----------------------------+------------------+------------
Total                       |       483 ms (*) |      488 ms
----------------------------+------------------+------------
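
The percentages in the tables appear to be the slowdown relative to the fastest column, which is marked with (*). A minimal sketch of that calculation, assuming this reading is correct (the helper name is hypothetical and not taken from the benchmark script):

```python
def slowdown_percent(slow_seconds: float, fast_seconds: float) -> float:
    """Return how much slower slow_seconds is than fast_seconds, in percent."""
    return (slow_seconds / fast_seconds - 1.0) * 100.0


if __name__ == "__main__":
    # Timings copied from the "deleting front, append tail" table (rounded values).
    print(round(slowdown_percent(3.54e-6, 671e-9)))  # buffer 10**5 bytes
    print(round(slowdown_percent(689e-9, 639e-9)))   # buffer 10**1 bytes
    print(round(slowdown_percent(905e-6, 2.92e-6)))  # Total
```

Because the table prints rounded timings, recomputing from them only approximates the reported figures: the Total row gives a ratio of about 310, close to the reported +30877%.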
@Serhiy: I see "zero" difference in the append loop micro-benchmark. I added the final conversion to bytes().
@Antoine: Your patch rocks, about 300x faster on the total (905 us vs. 2.92 us)! (I don't care about the 8% slowdown in the nanosecond timings.)
----------
Added file: http://bugs.python.org/file31929/bench_bytearray2.py
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue19087>
_______________________________________