[Python-Dev] Inplace operations for PyLong objects

Manciu, Catalin Gabriel catalin.gabriel.manciu at intel.com
Thu Aug 31 19:35:34 EDT 2017


>On my machine, the more realistic code, with an implicit C loop,
>the_value = sum(the_increment for i in range(total_iters))
>gives the same value twice as fast as your explicit Python loop.
>(I cut total_iters down to 10**7).

Your code is faster for several reasons:
    - range in Python 3 is implemented in C, so it is considerably faster.
      Because your range only goes up to 10 ** 7, the fastest iterator
      is used: rangeiterobject, whose 'next' function is implemented
      using native longs instead of CPython PyLongs:
      rangeiter_next(rangeiterobject *r) from rangeobject.c
    - my code also does some extra work to output a progress indicator

>You might check whether sum uses an in-place accumulator for ints.

You're right, sum actually works with native longs until it overflows or
you stop adding PyLongs, and then it falls back to PyNumber_Add. See:
    static PyObject * builtin_sum_impl(PyObject *module, PyObject *iterable,
                                       PyObject *start)
from bltinmodule.c
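
For anyone who wants to reproduce the comparison, here is a minimal sketch of
the two loop shapes discussed above. The function names and the iteration
count are made up for illustration; this is not the original benchmark, and
the timings will vary by machine and CPython version.

```python
import timeit


def explicit_loop(increment, total_iters):
    """Accumulate with an in-place add inside a Python-level loop."""
    value = 0
    for _ in range(total_iters):
        value += increment
    return value


def implicit_loop(increment, total_iters):
    """Let sum() drive the iteration from C."""
    return sum(increment for _ in range(total_iters))


if __name__ == "__main__":
    n = 10 ** 6  # assumption: smaller than the 10**7 used in the thread
    for fn in (explicit_loop, implicit_loop):
        seconds = timeit.timeit(lambda: fn(1, n), number=3)
        print(f"{fn.__name__}: {seconds:.3f}s for 3 runs")
```

Both functions return the same value; only the place where the loop runs
differs.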

The focus of this experiment was inplace adds in general. While, as you've
shown, there are ways to write the loop optimally, the benchmark was written
as a huge loop just to showcase that there is an improvement using this
approach. The performance improvement is a result of not having to
allocate/deallocate a PyLong per iteration.
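
The per-iteration allocation is visible from Python itself. The sketch below
(the helper name is hypothetical) keeps a reference to the old value across
each += and counts how often the result is a different object. It only
illustrates the behaviour of an unpatched interpreter; it does not measure
the cost.

```python
def count_new_objects(start, increment, iterations):
    """Return (final_value, number of += steps that produced a new object)."""
    value = start
    fresh = 0
    for _ in range(iterations):
        previous = value      # keep the old int alive for the identity check
        value += increment    # ints are immutable, so this rebinds 'value'
        if value is not previous:
            fresh += 1
    return value, fresh


if __name__ == "__main__":
    final, fresh = count_new_objects(10 ** 20, 1, 100)
    print(final, fresh)
```

With a non-zero increment every step yields a new int object, which is the
allocate/deallocate pair the experiment tries to avoid.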

A huge Python program with lots of PyLong inplace operations (not just
adds, this can be applied to all PyLong inplace operations), whether or not
they are in a loop, might benefit from such an optimization.
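
As a quick check that this applies beyond adds: in CPython, int defines none
of the in-place special methods, so every augmented assignment on an int
falls back to the ordinary binary operation and returns a new object. A small
sketch, with list shown only for contrast:

```python
import operator

INPLACE_NAMES = [
    "__iadd__", "__isub__", "__imul__", "__ifloordiv__", "__imod__",
    "__ipow__", "__ilshift__", "__irshift__", "__iand__", "__ixor__",
    "__ior__",
]

# Names from the list above that int actually defines (expected: none).
int_inplace = [name for name in INPLACE_NAMES if hasattr(int, name)]

x = 10 ** 20
y = operator.iadd(x, 1)      # same as 'y = x; y += 1'; x is left untouched

items = [1]
same = operator.iadd(items, [2])   # list does define __iadd__

if __name__ == "__main__":
    print(int_inplace)
    print(y is x, same is items)
```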

Thank you,
Catalin


