msg45295 - (view) |
Author: Michael Hudson (mwh)  |
Date: 2004-01-13 16:49 |
In ceval.c we find /* XXX Perhaps we should create a specialized PyFrame_New() that doesn't take locals, but does take builtins without sanity checking them. */ This patch takes that idea rather further than you might have expected... it creates a "light" subtype of frame that assumes certain things about the frame, gives this type its own free list (so it can assume more about objects on the freelist) and converts light frames into "heavy" frames when assumptions stop being true. Good for a ~5% improvement on "./python -s 'def f(): pass' 'f()'"; a bit less on pystone. It also conflicts slightly with my function reorg patch -- apply that first, apply this, ignore the reject and edit func_caller_nofrees in funcobject.c to call PyFrame_NewLight. All three patches I just submitted together get ~6% on pystone. |
|
|
msg45296 - (view) |
Author: Jeremy Hylton (jhylton)  |
Date: 2004-01-13 18:20 |
Logged In: YES user_id=31392 I don't see any files attached. |
|
|
msg45297 - (view) |
Author: Michael Hudson (mwh)  |
Date: 2004-01-13 18:23 |
Logged In: YES user_id=6656 sigh |
|
|
msg45298 - (view) |
Author: Armin Rigo (arigo) *  |
Date: 2004-01-14 17:35 |
Logged In: YES user_id=4771 (Side note first: I'm not sure 'builtins = back->f_builtins' is right.) Is the whole subclassing complexity worth the effort, given that the invariants of light frames only seem to be that four specific fields are null? Changing the type of an object under Python code's feet is calling for troubles. Moreover it is bound to break code that expect 'type(frame)==FrameType', even if such code can be considered bad style. Moreover it requires a number of hacks here and there -- e.g. you turn a light frame into a "heavy" frame when f_trace is set; is it on purpose that you don't do it when f_locals is set? I cannot seem to get reliable performance results on my machine, but maybe you want to compare with the attached patch which speeds up the regular PyFrame_New by putting stronger invariants on all the frames in the free_list. |
|
|
msg45299 - (view) |
Author: Michael Hudson (mwh)  |
Date: 2004-01-15 11:02 |
Logged In: YES user_id=6656 I'm fairly sure this made a difference on my iBook; haven't tried on x86. It's possible that the correct response to this patch is to add "... nah, not worth it" to the XXX comment in ceval.c... |
|
|
msg45300 - (view) |
Author: Armin Rigo (arigo) *  |
Date: 2004-01-27 17:03 |
Logged In: YES user_id=4771 Here is yet another try, which seems to perform better on my PentiumIII. I get the following speed improvements for this patch alone, for a loop calling an empty function: zombie-frames.diff: 11.4% (PyStone 3.8%) scary-frame-hacks.diff: 6.4% (PyStone 0.85%) The idea is to get rid of the free_list and instead store the most recently finished ("zombie") frame in an internal field of the code object. This saves half of the frame creation overhead because half of the fields are already correct when the frame is reused, e.g. f_code, f_nlocals, f_stacksize, f_valuestack... (you might need to cvs up frameobject.c before you can apply the patch) |
|
|
msg45301 - (view) |
Author: Michael Hudson (mwh)  |
Date: 2004-02-02 11:20 |
Logged In: YES user_id=6656 Did I mention this idea to you or did you come up with it independently? I forget... I'll try to time stuff on my iBook tomorrow. |
|
|
msg45302 - (view) |
Author: Armin Rigo (arigo) *  |
Date: 2004-02-02 13:04 |
Logged In: YES user_id=4771 I guess the idea was just in the air, after your published attempts. Ideally I'd have liked to have the cached frame depend on the globals as well as the code object itself; I considered moving the cache field to function objects. This way you also save the f_globals and f_builtins initialization. There were problems but maybe we should try harder. |
|
|
msg45303 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2004-03-01 12:20 |
Logged In: YES user_id=80475 Armin's second patch gives gives the expected speedups on a Pentium3 running WinME, and the test suite runs without exception. I recommend accepting and applying this patch as is. Further improvements can be considered separately. |
|
|
msg45304 - (view) |
Author: Michael Hudson (mwh)  |
Date: 2004-03-01 12:23 |
Logged In: YES user_id=6656 It slows recursive functions down a noticeable amount (did this get noted anywhere? Maybe Armin & I just talked about it on IRC), so that should be considered before this patch is applied. But I think it's probably worth it, FWIW. |
|
|
msg45305 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2004-03-01 13:25 |
Logged In: YES user_id=80475 The effect on recursive functions could be mitigated by restoring the freelist and falling back to it when code->co_zombieframe == NULL. I don't know if that is worth it. The current patch is a code simplication as well as an optimization. Adding back the freelist, adds a lot of clutter. Python is not especially friendly to recursive functions anyway. |
|
|
msg45306 - (view) |
Author: Armin Rigo (arigo) *  |
Date: 2004-03-01 13:59 |
Logged In: YES user_id=4771 On a small recursive example it slows down from 2.64s to 3.26s. This is a serious difference (20%). Is it bad enough to keep the freelist ? |
|
|
msg45307 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2004-03-01 15:50 |
Logged In: YES user_id=80475 I think that's a question for python-dev. |
|
|
msg45308 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2006-02-20 11:44 |
Logged In: YES user_id=849994 Can this be reviewed for 2.5? The relevant discussion on python-dev is at http://mail.python.org/pipermail/python-dev/2004-March/042871.html. |
|
|
msg45309 - (view) |
Author: Richard Jones (richard) *  |
Date: 2006-05-22 17:25 |
Logged In: YES user_id=6405 Patch modified and applied to python2.5. Mods: 1. updated to python2.5 2. reinstated use of free list See the "rjones-funccall" branch in SVN. |
|
|
msg45310 - (view) |
Author: Tim Peters (tim.peters) *  |
Date: 2006-05-23 11:16 |
Logged In: YES user_id=31435 Closed as Accepted. While re-adding the free list removed the code simplification benefit, the measurable x-platform speedup is well worth getting. |
|
|