[Python-Dev] [Python-checkins] cpython: In-line the append operations inside deque_inplace_repeat().

Brett Cannon bcannon at gmail.com
Tue Sep 15 01:36:09 CEST 2015


On Mon, 14 Sep 2015 at 15:37 Raymond Hettinger <raymond.hettinger at gmail.com> wrote:

> On Sep 14, 2015, at 12:49 PM, Brett Cannon <bcannon at gmail.com> wrote:
>
> Would it be worth adding a comment that the block of code is an inlined copy of deque_append()?
> Or maybe even turn the append() function into a macro so you minimize code duplication?

I don't think either would be helpful. The point of the inlining was to let the code evolve independently from deque_append().

OK, commit message just didn't point that out as the reason for the inlining (I guess in the future call it a fork of the code to know it is meant to evolve independently?).

-Brett

Once separated from the mother ship, the code in deque_inplace_repeat() could now shed the unnecessary work. The state variable is updated once. The updates within a single block are now in their own inner loop. The deque size is updated outside of that loop, etc. In other words, they are no longer the same code.

The original append-in-a-loop version was already being in-lined by the compiler but was doing way too much work. For each item written in the original, there were 7 memory reads, 5 writes, 6 predictable compare-and-branches, and 5 add/sub operations. In the current form, there are 0 reads, 1 write, 2 predictable compare-and-branches, and 3 add/sub operations.

FWIW, my work flow is that periodically I expand the code with new features (the upcoming work is to add slicing support http://bugs.python.org/issue17394), then once it is correct and tested, I make a series of optimization passes (such as the work I just described above). After that, I come along and factor-out common code, usually with clean, in-lineable functions rather than macros (such as the recent check-in replacing redundant code in deque_repeat with a call to the common code in deque_inplace_repeat).

My schedule lately hasn't given me any big blocks of time to work with, so I do the steps piecemeal as I get snippets of development time.
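As an aside on the "in-lineable functions rather than macros" preference: the message does not spell out the reasoning, but one commonly cited difference is that a function-like macro pastes its argument text wherever it appears, while a static inline function evaluates each argument exactly once and type-checks it. The sketch below illustrates that with hypothetical helper names (NEXT_INDEX_MACRO, next_index, read_index); none of them come from the CPython source.

```c
#include <assert.h>

/* Hypothetical names for illustration only; not taken from CPython. */

/* Macro version: the text of `i` is pasted in twice, so an argument
 * with side effects may be evaluated more than once. */
#define NEXT_INDEX_MACRO(i, len) (((i) + 1 == (len)) ? 0 : (i) + 1)

/* Function version: each argument is evaluated exactly once, and the
 * compiler is still free to inline the call. */
static inline long next_index(long i, long len)
{
    return (i + 1 == len) ? 0 : i + 1;
}

/* Helper with a visible side effect, used to count evaluations. */
static long calls = 0;

static long read_index(long i)
{
    calls++;
    return i;
}
```

Passing read_index(5) to each version gives the same result, but the counter shows that the macro evaluated its argument twice while the function evaluated it once.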

Raymond

P.S. For those who are interested, here is the before and after:

---- before ---------------------------------
L1152:
    movq    __Py_NoneStruct@GOTPCREL(%rip), %rdi
    cmpq    $0, (%rdi)                     <
    je      L1257
L1159:
    addq    $1, %r13
    cmpq    %r14, %r13
    je      L1141
    movq    16(%rbx), %rsi                 <
L1142:
    movq    48(%rbx), %rdx                 <
    addq    $1, 56(%rbx)                   <>
    cmpq    $63, %rdx
    je      L1143
    movq    32(%rbx), %rax                 <
    addq    $1, %rdx
L1144:
    addq    $1, 0(%rbp)                    <>
    leaq    1(%rsi), %rcx
    movq    %rdx, 48(%rbx)                 >
    movq    %rcx, 16(%rbx)                 >
    movq    %rbp, 8(%rax,%rdx,8)           >
    movq    64(%rbx), %rax                 <
    cmpq    %rax, %rcx
    jle     L1152
    cmpq    $-1, %rax
    je      L1152

---- after ------------------------------------
L777:
    cmpq    $63, %rdx
    je      L816
L779:
    addq    $1, %rdx
    movq    %rbp, 16(%rsi,%rbx,8)          <
    addq    $1, %rbx
    leaq    (%rdx,%r9), %rcx
    subq    %r8, %rcx
    cmpq    %r12, %rbx
    jl      L777

    # outside the inner-loop
    movq    %rdx, 48(%r13)
    movq    %rcx, 0(%rbp)
    cmpq    %r12, %rbx
    jl      L780
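For readers who prefer C to assembly, the restructuring described in the message can be sketched roughly as follows. This is a minimal toy model, not the CPython code: the toy_deque type, its fields, and every function name are assumptions made for illustration, items are plain ints rather than object pointers, and reference counting and the maxlen check are left out. The block length of 64 is inferred from the comparison against 63 in the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCKLEN 64   /* assumed; the listing compares the index with 63 */

/* Hypothetical, simplified stand-ins for the real deque structures. */
typedef struct block {
    struct block *next;
    int data[BLOCKLEN];
} block;

typedef struct {
    block *leftblock;      /* first block, kept for traversal */
    block *rightblock;     /* block currently being filled */
    long rightindex;       /* last used slot in rightblock, -1 if none */
    size_t size;           /* number of stored items */
    unsigned long state;   /* mutation counter */
} toy_deque;

static int toy_init(toy_deque *d)
{
    block *b = calloc(1, sizeof(block));
    if (b == NULL)
        return -1;
    d->leftblock = d->rightblock = b;
    d->rightindex = -1;
    d->size = 0;
    d->state = 0;
    return 0;
}

static int toy_grow(toy_deque *d)   /* link a fresh block on the right */
{
    block *b = calloc(1, sizeof(block));
    if (b == NULL)
        return -1;
    d->rightblock->next = b;
    d->rightblock = b;
    d->rightindex = -1;
    return 0;
}

/* "Before" shape: a general-purpose append, called once per item.
 * Size, index, and state are read and written back on every call. */
static int toy_append(toy_deque *d, int item)
{
    if (d->rightindex == BLOCKLEN - 1 && toy_grow(d) < 0)
        return -1;
    d->size++;
    d->rightindex++;
    d->rightblock->data[d->rightindex] = item;
    d->state++;
    return 0;
}

static int repeat_by_append(toy_deque *d, int item, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (toy_append(d, item) < 0)
            return -1;
    return 0;
}

/* "After" shape: state is bumped once, each block is filled by a tight
 * inner loop, and size is written back outside that loop. */
static int repeat_inlined(toy_deque *d, int item, size_t n)
{
    size_t i = 0;
    d->state++;
    while (i < n) {
        if (d->rightindex == BLOCKLEN - 1 && toy_grow(d) < 0) {
            d->size += i;               /* account for items already stored */
            return -1;
        }
        size_t room = (size_t)(BLOCKLEN - 1 - d->rightindex);
        size_t m = (n - i < room) ? n - i : room;
        int *dst = &d->rightblock->data[d->rightindex + 1];
        for (size_t j = 0; j < m; j++)  /* one store per item */
            dst[j] = item;
        d->rightindex += (long)m;
        i += m;
    }
    d->size += n;
    return 0;
}

static int toy_get(const toy_deque *d, size_t idx)
{
    const block *b = d->leftblock;
    for (; idx >= BLOCKLEN; idx -= BLOCKLEN)
        b = b->next;
    return b->data[idx];
}

static void toy_free(toy_deque *d)
{
    for (block *b = d->leftblock; b != NULL; ) {
        block *next = b->next;
        free(b);
        b = next;
    }
}
```

Both functions leave the toy deque with the same contents, size, and right index. They differ in how often the bookkeeping fields are touched, which is the kind of per-item work the operation counts in the message are tallying.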


