[issue19087] bytearray front-slicing not optimized

Mon Sep 30 23:49:07 CEST 2013

STINNER Victor added the comment:

I adapted my micro-benchmark to measure the speedup: bench_bytearray2.py. Results with bytea_slice2.patch:

Common platform:
CFLAGS: -Wno-unused-result -Werror=declaration-after-statement -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
Platform: Linux-3.9.4-200.fc18.x86_64-x86_64-with-fedora-18-Spherical_Cow
Python unicode implementation: PEP 393
Timer: time.perf_counter
Bits: int=32, long=64, long long=64, size_t=64, void*=64
Timer precision: 40 ns

Platform of campaign original:
Date: 2013-09-30 23:39:31
Python version: 3.4.0a2+ (default:687dd81cee3b, Sep 30 2013, 23:39:27) [GCC 4.7.2 20121109 (Red Hat 4.7.2-8)]
SCM: hg revision=687dd81cee3b tag=tip branch=default date="2013-09-29 22:18 +0200"

Platform of campaign patched:
Date: 2013-09-30 23:38:55
Python version: 3.4.0a2+ (default:687dd81cee3b+, Sep 30 2013, 23:30:35) [GCC 4.7.2 20121109 (Red Hat 4.7.2-8)]
SCM: hg revision=687dd81cee3b+ tag=tip branch=default date="2013-09-29 22:18 +0200"

------------------------+-------------+------------
non regression          |    original |     patched
------------------------+-------------+------------
concatenate 10**1 bytes |  1.1 us (*) |     1.14 us
concatenate 10**3 bytes |     46.9 us | 46.8 us (*)
concatenate 10**5 bytes | 4.66 ms (*) |     4.71 ms
concatenate 10**7 bytes |  478 ms (*) |      483 ms
------------------------+-------------+------------
Total                   |  482 ms (*) |      488 ms
------------------------+-------------+------------

----------------------------+-------------------+-------------
deleting front, append tail |          original |      patched
----------------------------+-------------------+-------------
buffer 10**1 bytes          |        639 ns (*) | 689 ns (+8%)
buffer 10**3 bytes          |        682 ns (*) | 723 ns (+6%)
buffer 10**5 bytes          |   3.54 us (+428%) |   671 ns (*)
buffer 10**7 bytes          | 900 us (+107128%) |   840 ns (*)
----------------------------+-------------------+-------------
Total                       |  905 us (+30877%) |  2.92 us (*)
----------------------------+-------------------+-------------

----------------------------+------------------+------------
Summary                     |         original |     patched
----------------------------+------------------+------------
non regression              |       482 ms (*) |      488 ms
deleting front, append tail | 905 us (+30877%) | 2.92 us (*)
----------------------------+------------------+------------
Total                       |       483 ms (*) |      488 ms
----------------------------+------------------+------------

@Serhiy: I see "zero" difference in the append loop micro-benchmark. I added the final cast to bytes().
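
For readers without the attachment, the append loop with the final cast could look roughly like the sketch below. This is not the content of bench_bytearray2.py: the function name concatenate(), the chunk size and the loop structure are assumptions made for illustration only.

```python
def concatenate(total_size, chunk_size=10):
    """Build total_size bytes by appending fixed-size chunks to a bytearray.

    Hypothetical helper: the name and the chunk size are illustrative.
    """
    chunk = b"x" * chunk_size
    buf = bytearray()
    for _ in range(total_size // chunk_size):
        buf += chunk
    # Final cast to bytes(), so the copy out of the bytearray is part of the work.
    return bytes(buf)
```

Calling concatenate(10**3) returns a bytes object of 1000 bytes; timing that call on both builds gives a comparison in the spirit of the "non regression" rows.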

@Antoine: Your patch rocks, roughly 300x faster overall on "deleting front, append tail" (905 us vs 2.92 us)! (I don't care about the 6-8% slowdown in the nanosecond timings.)
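
The "deleting front, append tail" pattern can be sketched as follows. Again, this is only an approximation of what the benchmark measures, not the attached script: the function name, the chunk size and the loop count are assumptions.

```python
import time


def delete_front_append_tail(buf_size, chunk_size=10, loops=1000):
    """Time the FIFO-like pattern: drop bytes at the front, append at the tail.

    Hypothetical helper; returns (seconds per iteration, final buffer length).
    """
    buf = bytearray(b"x" * buf_size)
    tail = b"y" * chunk_size
    start = time.perf_counter()
    for _ in range(loops):
        # Front deletion: the operation whose cost the issue is about.
        del buf[:chunk_size]
        # Append at the tail, keeping the buffer size constant.
        buf += tail
    elapsed = time.perf_counter() - start
    return elapsed / loops, len(buf)
```

Running it with buf_size set to 10**1, 10**3, 10**5 and 10**7 on an unpatched and a patched build should show how the per-iteration cost depends on the buffer size; absolute numbers will differ from the table above.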

----------
Added file: http://bugs.python.org/file31929/bench_bytearray2.py

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue19087>
_______________________________________