[Python-Dev] [Python-checkins] cpython: In-line the append operations inside deque_inplace_repeat().

Brett Cannon bcannon at gmail.com
Tue Sep 15 01:36:09 CEST 2015


On Mon, 14 Sep 2015 at 15:37 Raymond Hettinger <raymond.hettinger at gmail.com>
wrote:

>
> > On Sep 14, 2015, at 12:49 PM, Brett Cannon <bcannon at gmail.com> wrote:
> >
> > Would it be worth adding a comment that the block of code is an inlined
> > copy of deque_append()?
> > Or maybe even turn the append() function into a macro so you minimize
> > code duplication?
>
> I don't think either would be helpful.  The point of the inlining was to
> let the code evolve independently from deque_append().
>

OK, the commit message just didn't point that out as the reason for the
inlining (I guess in the future, call it a fork of the code so readers know
it is meant to evolve independently?).

-Brett


>
> Once separated from the mother ship, the code in deque_inplace_repeat()
> could now shed the unnecessary work.  The state variable is updated once.
> The updates within a single block are now in their own inner loop. The deque
> size is updated outside of that loop, etc.   In other words, they are no
> longer the same code.
>
> The original append-in-a-loop version was already being in-lined by the
> compiler but was doing way too much work.  For each item written in the
> original, there were 7 memory reads, 5 writes, 6 predictable
> compare-and-branches, and 5 add/sub operations.  In the current form, there
> are 0 reads, 1 write, 2 predictable compare-and-branches, and 3 add/sub
> operations.
>
> FWIW, my work flow is that periodically I expand the code with new
> features (the upcoming work is to add slicing support
> http://bugs.python.org/issue17394), then once it is correct and tested, I
> make a series of optimization passes (such as the work I just described
> above).  After that, I come along and factor out common code, usually with
> clean, in-lineable functions rather than macros (such as the recent
> check-in replacing redundant code in deque_repeat with a call to the common
> code in deque_inplace_repeat).
>
> My schedule lately hasn't given me any big blocks of time to work with, so
> I do the steps piecemeal as I get snippets of development time.
>
>
> Raymond
>
>
> P.S. For those who are interested, here is the before and after:
>
> ---- before ---------------------------------
> L1152:
>     movq    __Py_NoneStruct@GOTPCREL(%rip), %rdi
>     cmpq    $0, (%rdi)                                   <
>     je  L1257
> L1159:
>     addq    $1, %r13
>     cmpq    %r14, %r13
>     je  L1141
>     movq    16(%rbx), %rsi                               <
> L1142:
>     movq    48(%rbx), %rdx                               <
>     addq    $1, 56(%rbx)                                 <>
>     cmpq    $63, %rdx
>     je  L1143
>     movq    32(%rbx), %rax                               <
>     addq    $1, %rdx
> L1144:
>     addq    $1, 0(%rbp)                                  <>
>     leaq    1(%rsi), %rcx
>     movq    %rdx, 48(%rbx)                                >
>     movq    %rcx, 16(%rbx)                                >
>     movq    %rbp, 8(%rax,%rdx,8)                          >
>     movq    64(%rbx), %rax                               <
>     cmpq    %rax, %rcx
>     jle L1152
>     cmpq    $-1, %rax
>     je  L1152
>
>
> ---- after ------------------------------------
> L777:
>     cmpq    $63, %rdx
>     je  L816
> L779:
>     addq    $1, %rdx
>     movq    %rbp, 16(%rsi,%rbx,8)                >
>     addq    $1, %rbx
>     leaq    (%rdx,%r9), %rcx
>     subq    %r8, %rcx
>     cmpq    %r12, %rbx
>     jl  L777
>
>     # outside the inner-loop
>     movq    %rdx, 48(%r13)
>     movq    %rcx, 0(%rbp)
>     cmpq    %r12, %rbx
>     jl  L780
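
For readers who would rather see the transformation in C than in assembly,
here is a minimal sketch of the idea described above. It is not the CPython
source: the minideque type, its fields, and every function name below are
hypothetical stand-ins, items carry a bare refcnt field instead of being real
PyObjects, and n copies are simply written to an empty container. It only
illustrates the difference between appending in a loop and a separated loop
that updates the bookkeeping once and keeps a single store per item in the
inner loop.

```c
#include <assert.h>
#include <stdlib.h>

#define BLOCKLEN 64

typedef struct { long refcnt; } item;      /* hypothetical stand-in for a PyObject */

typedef struct block {
    struct block *next;
    item *data[BLOCKLEN];
} block;

typedef struct {
    block *first;        /* head of the block chain */
    block *rightblock;   /* block holding the last item */
    long rightindex;     /* slot of the last item, -1 if the block is empty */
    long size;           /* number of stored items */
    long state;          /* mutation counter */
} minideque;

block *new_block(void) {
    block *b = calloc(1, sizeof *b);
    assert(b != NULL);   /* sketch only: real code would report the error */
    return b;
}

void md_init(minideque *d) {
    d->first = d->rightblock = new_block();
    d->rightindex = -1;
    d->size = 0;
    d->state = 0;
}

void md_free(minideque *d) {
    for (block *b = d->first; b != NULL; ) {
        block *next = b->next;
        free(b);
        b = next;
    }
}

/* Make room on the right by linking a fresh, empty block. */
void md_grow(minideque *d) {
    block *b = new_block();
    d->rightblock->next = b;
    d->rightblock = b;
    d->rightindex = -1;
}

/* General-purpose append: every call touches refcnt, size, state and index. */
void md_append(minideque *d, item *it) {
    if (d->rightindex == BLOCKLEN - 1)
        md_grow(d);
    it->refcnt++;
    d->size++;
    d->state++;
    d->rightindex++;
    d->rightblock->data[d->rightindex] = it;
}

/* "Before": repeat by calling the general-purpose append in a loop. */
void repeat_by_append(minideque *d, item *it, long n) {
    for (long i = 0; i < n; i++)
        md_append(d, it);
}

/* "After": bookkeeping hoisted out, one store per item in the inner loop. */
void repeat_inlined(minideque *d, item *it, long n) {
    if (n <= 0)
        return;
    d->state++;                              /* updated once, not per item */
    for (long i = 0; i < n; ) {
        if (d->rightindex == BLOCKLEN - 1)
            md_grow(d);
        long m = n - i;                      /* items still to write */
        long room = BLOCKLEN - 1 - d->rightindex;
        if (m > room)
            m = room;
        item **slot = &d->rightblock->data[d->rightindex + 1];
        for (long k = 0; k < m; k++)         /* fill within a single block */
            slot[k] = it;
        d->rightindex += m;
        i += m;
    }
    d->size += n;                            /* updated outside the loops */
    it->refcnt += n;
}

/* Returns 1 if both variants leave size, refcount, index and block count equal. */
int variants_agree(long n) {
    item a = {0}, b = {0};
    minideque x, y;
    md_init(&x);
    md_init(&y);
    repeat_by_append(&x, &a, n);
    repeat_inlined(&y, &b, n);
    int same = x.size == y.size && a.refcnt == b.refcnt
            && x.rightindex == y.rightindex;
    block *p = x.first, *q = y.first;
    for (; p != NULL && q != NULL; p = p->next, q = q->next)
        ;
    same = same && p == NULL && q == NULL;
    md_free(&x);
    md_free(&y);
    return same;
}
```

Calling variants_agree(n) for an n that crosses a block boundary (say 65 or
200) exercises the block-growing path in both versions. The one field the two
variants deliberately leave different is state: the append loop bumps it n
times, the separated version once.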