Issue 910929: Optimize list comprehensions

Created on 2004-03-06 13:22 by rhettinger, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name       Uploaded                       Description
listcomp.diff   rhettinger, 2004-03-06 13:22   Py 2.4 patch
Messages (9)
msg45467 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-03-06 13:22
Save about 35% of the per-pass overhead of list comprehensions. Adds a new opcode, LIST_APPEND, which is faster than the current sequence of a CALL_FUNCTION 1 on the bound method list.append() followed by a POP_TOP to discard the returned None. The resulting disassembled code is surprisingly light and concise.
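As a rough illustration of the overhead being removed (a sketch of my own, not the benchmark used for the quoted 35% figure), the pre-patch comprehension behaved like the explicit bound-method loop below, while the patched comprehension appends via the dedicated opcode:

```python
from timeit import timeit

def with_comprehension(n):
    # Post-patch: the comprehension appends via the LIST_APPEND opcode.
    return [x * 2 for x in range(n)]

def with_bound_append(n):
    # Pre-patch pattern: CALL_FUNCTION on the bound method, then POP_TOP
    # to discard the None that list.append() returns.
    result = []
    append = result.append
    for x in range(n):
        append(x * 2)
    return result

if __name__ == "__main__":
    n = 10_000
    assert with_comprehension(n) == with_bound_append(n)
    t_comp = timeit(lambda: with_comprehension(n), number=50)
    t_append = timeit(lambda: with_bound_append(n), number=50)
    print(f"comprehension: {t_comp:.4f}s  bound append: {t_append:.4f}s")
```

The exact ratio depends on the interpreter version; the point is only that the comprehension path avoids a method-call round trip per element.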
msg45468 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-03-07 07:41
Applied to: Python/ceval.c 2.379, Python/compile.c 2.299, Include/opcode.h 2.44, Lib/opcode.py 1.5
msg45469 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2004-03-08 10:58
The patch also removes the final DELETE_FAST. Why? I think deleting this old reference to the list is a good idea.

Another, faster (but slightly less clear) approach would be to change the behavior of LIST_APPEND so that the '_' variable can be avoided entirely. In stack-manipulation notation:

LIST_APPEND: [list, ignored, value] -> [list, ignored]

where 'ignored' is the iterator still on the stack. Admittedly this is quite unexpected behavior, so maybe LIST_APPEND should be called LIST_COMPREHEND or some such.

An intermediate solution with no change to LIST_APPEND would introduce a DUP_OVER stack-manipulation opcode:

DUP_OVER: [a, b] -> [a, b, a]

which could replace the LOAD_FAST '_' at the beginning of the loop, getting the list object from under the iterator.
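The stack notation above can be made concrete with a small sketch (the helper names are mine, not CPython's; a plain Python list stands in for the value stack, top at the end):

```python
def list_append_variant(stack):
    # Proposed LIST_APPEND: [list, ignored, value] -> [list, ignored]
    # Pops the value and appends it to the list two slots down,
    # leaving the iterator ('ignored') untouched on top of the list.
    value = stack.pop()
    stack[-2].append(value)

def dup_over(stack):
    # Proposed DUP_OVER: [a, b] -> [a, b, a]
    # Re-pushes the item sitting under the top of the stack.
    stack.append(stack[-2])

if __name__ == "__main__":
    stack = [[], "iterator-placeholder"]
    stack.append(42)                 # value produced by the loop body
    list_append_variant(stack)
    print(stack)                     # [[42], 'iterator-placeholder']
```

This models only the single-loop case; as the follow-up messages note, nested comprehensions put more than one iterator above the list.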
msg45470 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-03-08 17:31
The actual checkin kept the final DELETE_FAST so that all references to the list can be removed. Will take a look at the proposed variants. Off-hand, they do not have a clean feel to them, but they do save a trip around the eval loop.
msg45471 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2004-03-08 17:36
You may have problems with inner list comps: [x+y for x in list1 for y in list2]. If you only have a single list comp, and LIST_APPEND didn't pop the list off the stack, I think you could do:

      0 BUILD_LIST       0
      3 LOAD_FAST        0 (iter)
      6 GET_ITER
>>    7 FOR_ITER         7 (to 17)
     10 STORE_FAST       2 (elem)
     13 LIST_APPEND
     14 JUMP_ABSOLUTE    7
>>   17 RETURN_VALUE
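The nesting issue can be seen by writing out the loops the comprehension expands to: each for clause holds its own live iterator while the loop body runs, so in the nested case two iterators sit on the stack above the list at once.

```python
list1 = [1, 2]
list2 = [10, 20]

nested = [x + y for x in list1 for y in list2]

# Equivalent explicit loops: one FOR_ITER (and one live iterator) per level.
manual = []
for x in list1:
    for y in list2:
        manual.append(x + y)

assert nested == manual == [11, 21, 12, 22]
```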
msg45472 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2004-03-08 17:38
My last comment about "problems" referred to Armin's proposal to remove the DELETE_FAST.
msg45473 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-03-08 17:51
Armin, do you have a working patch (one that handles nested list comps) implementing LIST_APPEND: [list, ignored, value] -> [list, ignored]? If you can get it to work, it would simplify and speed up the code a bit.
msg45474 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2004-03-08 20:35
No, Neal's comments made me realize I hadn't thought about nested loops. In those, you have several iterators sitting on the stack on top of the list. Definitely not clean. I guess your patch is the cleanest so far. More subtle optimizations should be left to a bytecode optimizer. (By the way, I wonder why we don't consider these more.)
msg45475 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-03-08 22:48
I am currently working on building out the bytecode optimizer. Let me know if you would like to participate. The three infrastructure components needed for further buildouts are:

* Check a range to see if it is a basic block.
* Allow a temporary NOP code to be eliminated on a final pass which makes jump-target fixups.
* Fix up all of the line number references.

I made the first two but not the last.

For example, a basic block containing [BUILD_TUPLE 2, UNPACK_SEQUENCE 2] or [BUILD_LIST 2, UNPACK_SEQUENCE 2] should be directly replaced with [ROT_TWO, NOP, NOP, NOP, NOP, NOP] and be compressed on a final pass to [ROT_TWO]. This gives a threefold speedup for: a,b = b,a

The other candidate transformations are:

* [BUILD_TUPLE 3, UNPACK_SEQUENCE 3] --> [ROT_THREE, ROT_TWO]
* [LOAD_CONST 2, BINARY_MULTIPLY] --> [DUP_TOP, BINARY_ADD]
* [RETURN_VALUE, anyLoadInstruction, RETURN_VALUE] --> [RETURN_VALUE]
* [STORE_FAST n, LOAD_FAST n] --> [DUP_TOP, STORE_FAST n]
* [UNARY_NOT, JUMP_IF_FALSE tgt, POP_TOP, ..., tgt: POP_TOP] --> [JUMP_IF_TRUE tgt, POP_TOP, ..., tgt: POP_TOP]
* [UNARY_NOT, JUMP_IF_TRUE tgt, POP_TOP, ..., tgt: POP_TOP] --> [JUMP_IF_FALSE tgt, POP_TOP, ..., tgt: POP_TOP]
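The swap transformations can be sanity-checked with a toy stack machine (a sketch of my own; plain Python lists stand in for the value stack, top at the end, and the helper functions only model the documented stack effects of the named opcodes):

```python
def build_tuple(stack, n):
    # BUILD_TUPLE n: pops n items, pushes them as one tuple.
    items = tuple(stack[-n:])
    del stack[-n:]
    stack.append(items)

def unpack_sequence(stack, n):
    # UNPACK_SEQUENCE n: pops a sequence, pushes its items right-to-left.
    seq = stack.pop()
    stack.extend(reversed(seq))

def rot_two(stack):
    # ROT_TWO: swaps the two topmost items.
    stack[-1], stack[-2] = stack[-2], stack[-1]

def rot_three(stack):
    # ROT_THREE: moves the top item to third place, lifting the other two.
    stack.insert(-2, stack.pop())

if __name__ == "__main__":
    # [BUILD_TUPLE 2, UNPACK_SEQUENCE 2] has the same effect as [ROT_TWO].
    s1, s2 = ["a", "b"], ["a", "b"]
    build_tuple(s1, 2); unpack_sequence(s1, 2)
    rot_two(s2)
    assert s1 == s2 == ["b", "a"]

    # [BUILD_TUPLE 3, UNPACK_SEQUENCE 3] matches [ROT_THREE, ROT_TWO].
    s3, s4 = ["a", "b", "c"], ["a", "b", "c"]
    build_tuple(s3, 3); unpack_sequence(s3, 3)
    rot_three(s4); rot_two(s4)
    assert s3 == s4 == ["c", "b", "a"]
    print("stack effects match")
```

This only verifies that the replacement leaves the stack in the same state; a real peephole pass would also have to confirm the span is a basic block and patch jump targets, as described above.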
History
Date User Action Args
2022-04-11 14:56:03 admin set github: 40007
2004-03-06 13:22:45 rhettinger create