[Python-Dev] Store x Load x --> DupStore
Phillip J. Eby pje at telecommunity.com
Sun Feb 20 21:22:00 CET 2005
At 06:38 PM 2/20/05 +0000, Michael Hudson wrote:
>> It folds the two steps into a new opcode. In the case of
>> storename/loadname, it saves one three-byte instruction, a trip around
>> the eval-loop, two stack mutations, an incref/decref pair, a dictionary
>> lookup, and an error check (for the lookup). While it acts like a dup
>> followed by a store, it is implemented more simply as a store that
>> doesn't pop the stack. The transformation is broadly applicable and
>> occurs thousands of times in the standard library and test suite.
>
> I'm still a little curious as to what code creates such opcodes...
A simple STORE+LOAD case:
    dis.dis(compile("x=1; y=x*2","?","exec"))
      1           0 LOAD_CONST               0 (1)
                  3 STORE_NAME               0 (x)
                  6 LOAD_NAME                0 (x)
                  9 LOAD_CONST               1 (2)
                 12 BINARY_MULTIPLY
                 13 STORE_NAME               1 (y)
                 16 LOAD_CONST               2 (None)
                 19 RETURN_VALUE
And a simple DUP+STORE case:
    dis.dis(compile("x=y=1","?","exec"))
      1           0 LOAD_CONST               0 (1)
                  3 DUP_TOP
                  4 STORE_NAME               0 (x)
                  7 STORE_NAME               1 (y)
                 10 LOAD_CONST               1 (None)
                 13 RETURN_VALUE
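A toy Python model of the rewrite under discussion may make the two patterns concrete. `peephole` folds `STORE_NAME x; LOAD_NAME x` and `DUP_TOP; STORE_NAME x` into a single `DUP_STORE x`, and `run` is a tiny stack machine used to check that both listings above produce the same result and name bindings before and after folding. This is a sketch under stated assumptions: `DUP_STORE` is the opcode proposed in this thread and is modeled here only as "a store that doesn't pop"; `peephole`, `run` and the instruction lists are illustrative names, not CPython code; the listings are hand-transcribed with constant values and names in place of `co_consts`/`co_names` indices; and the pass assumes straight-line code, since a real peephole optimizer must not fold away a `LOAD_NAME` that is a jump target.

```python
# Toy model: opcode names follow the dis listings, but DUP_STORE is the
# proposed opcode and this VM is a simplification, not the real eval loop.

def peephole(code):
    """Fold STORE_NAME x; LOAD_NAME x and DUP_TOP; STORE_NAME x into DUP_STORE x.

    Assumes straight-line code: a real pass must not fold across jump targets.
    """
    out = []
    i = 0
    while i < len(code):
        op, arg = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        if op == "STORE_NAME" and nxt == ("LOAD_NAME", arg):
            out.append(("DUP_STORE", arg))  # store, keep value on the stack
            i += 2
        elif op == "DUP_TOP" and nxt is not None and nxt[0] == "STORE_NAME":
            out.append(("DUP_STORE", nxt[1]))
            i += 2
        else:
            out.append((op, arg))
            i += 1
    return out


def run(code):
    """Execute a list of (opcode, arg) pairs; return (result, names)."""
    stack, names = [], {}
    for op, arg in code:
        if op == "LOAD_CONST":
            stack.append(arg)
        elif op == "LOAD_NAME":
            stack.append(names[arg])
        elif op == "STORE_NAME":
            names[arg] = stack.pop()
        elif op == "DUP_TOP":
            stack.append(stack[-1])
        elif op == "DUP_STORE":
            names[arg] = stack[-1]  # a store that doesn't pop the stack
        elif op == "BINARY_MULTIPLY":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif op == "RETURN_VALUE":
            return stack.pop(), names
        else:
            raise ValueError(f"unknown opcode {op}")
    raise ValueError("no RETURN_VALUE")


# Hand-transcribed from the two listings, with values instead of indices.
STORE_LOAD = [
    ("LOAD_CONST", 1), ("STORE_NAME", "x"), ("LOAD_NAME", "x"),
    ("LOAD_CONST", 2), ("BINARY_MULTIPLY", None), ("STORE_NAME", "y"),
    ("LOAD_CONST", None), ("RETURN_VALUE", None),
]
DUP_THEN_STORE = [
    ("LOAD_CONST", 1), ("DUP_TOP", None), ("STORE_NAME", "x"),
    ("STORE_NAME", "y"), ("LOAD_CONST", None), ("RETURN_VALUE", None),
]

for program in (STORE_LOAD, DUP_THEN_STORE):
    folded = peephole(program)
    assert run(folded) == run(program)  # same result, same name bindings
    print(len(program), "->", len(folded), "instructions")
```

Each listing loses one instruction (8 to 7 and 6 to 5); in the real interpreter each fold would also mean one fewer trip around the eval loop, which is where the quoted savings come from.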
Of course, I'm not sure how commonly this sort of code occurs in places where it makes a difference to anything. Function call overhead continues to be Python's most damaging performance issue, because it makes it expensive to use abstraction.
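A rough way to check the function-call claim above on a present-day interpreter is to time the same multiply with and without a call. The helper names `double`, `inline_cost` and `call_cost` are illustrative only, and absolute timings depend on the machine and Python version, so no numbers are given here; only the ratio between the two is of interest.

```python
import timeit


def double(v):
    """A trivial function: the work is one multiply, the rest is call overhead."""
    return v * 2


def inline_cost(number=200_000):
    """Seconds to run the multiply inline, `number` times."""
    return timeit.timeit("y = x * 2", setup="x = 1", number=number)


def call_cost(number=200_000):
    """Seconds to run the same multiply through a function call."""
    return timeit.timeit(
        "y = double(x)", setup="x = 1", globals={"double": double}, number=number
    )


if __name__ == "__main__":
    inline, called = inline_cost(), call_cost()
    print(f"inline: {inline:.4f}s  call: {called:.4f}s  ratio: {called / inline:.1f}x")
```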
Here's a thought. Suppose we split frames into an "object" part and a "struct" part, with the object part holding just a pointer to the struct part and a flag indicating whether the struct part is stack-allocated or malloc'ed. This would let us stack-allocate the bulk of the frame structure while still having a frame "object" to pass around. On exit from the C routine that stack-allocated the frame struct, we check whether the frame object's refcount is greater than 1, and if so, malloc a permanent home for the frame struct and update the frame object's struct pointer and flag.
In this way, frame allocation overhead could be reduced to the cost of an alloca, or just incorporated into the stack frame setup of the C routine itself, allowing the entire struct to be treated as "local variables" from a C perspective (which might benefit performance on architectures that reserve a register for local variable access).
Of course, this would slow down exception handling and other scenarios that result in extra references to a frame object, but if the OS malloc is the slow part of frame allocation (frame objects are too large for pymalloc), then perhaps it would be a net win. On the other hand, this approach would definitely use more stack space per calling level.
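The decision made on exit from the C routine can be sketched as a toy Python model, in which Python objects stand in for C memory and the reference count is tracked by hand rather than with Py_INCREF/Py_DECREF. `FrameStruct`, `FrameObject`, `call_with_frame`, `copy_to_heap` and `stats` are hypothetical names for illustration; `copy_to_heap` plays the role of malloc plus memcpy, and poisoning the original struct stands in for the C stack space being reclaimed.

```python
class FrameStruct:
    """The bulky part of a frame: locals, value stack, and so on."""

    def __init__(self, nlocals):
        self.locals = [None] * nlocals
        self.valuestack = []

    def copy_to_heap(self):
        """Shallow copy, standing in for malloc() plus memcpy() in C."""
        heap = FrameStruct(len(self.locals))
        heap.locals[:] = self.locals
        heap.valuestack[:] = self.valuestack
        return heap


class FrameObject:
    """The small object part: a pointer to the struct plus a flag."""

    def __init__(self, struct):
        self.refcount = 1     # the reference held by the calling C routine
        self.struct = struct
        self.on_stack = True  # True while the struct lives in the C frame

    def incref(self):
        self.refcount += 1

    def decref(self):
        self.refcount -= 1


stats = {"promotions": 0}  # frames that outlived their C routine


def call_with_frame(body, nlocals):
    """Run body(frame) with a frame struct 'allocated' on the C stack."""
    struct = FrameStruct(nlocals)  # in C: a local variable or alloca()
    frame = FrameObject(struct)
    try:
        return body(frame)
    finally:
        if frame.refcount > 1:
            # Someone (e.g. a traceback) still holds the frame object, so
            # give the struct a permanent home before the C stack unwinds.
            frame.struct = struct.copy_to_heap()
            frame.on_stack = False
            stats["promotions"] += 1
        # The C stack space is reclaimed; poison it to expose dangling uses.
        struct.locals = struct.valuestack = None
        frame.decref()


kept = []


def ordinary(frame):
    frame.struct.locals[0] = 42
    return frame.struct.locals[0]


def escaping(frame):
    frame.struct.locals[0] = "still here"
    frame.incref()
    kept.append(frame)  # e.g. an exception traceback keeping the frame alive


print(call_with_frame(ordinary, 1), stats["promotions"])  # 42 0
call_with_frame(escaping, 1)
print(kept[0].on_stack, kept[0].struct.locals, stats["promotions"])  # False ['still here'] 1
```

The first call is the common case: nothing else references the frame, so no heap copy is made. The second models an escaping frame, such as one captured by a traceback, which pays for the copy; that is the slowdown the paragraph above trades against cheaper allocation on every call.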