Issue 10576: Add a progress callback to gcmodule (original) (raw)

Created on 2010-11-29 12:33 by kristjan.jonsson, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (31)

msg122795 - (view)

Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer)

Date: 2010-11-29 12:33

As discussed here: http://mail.python.org/pipermail/python-ideas/2010-November/008813.html:

Adding the ability to register callbacks to be invoked before and after garbage collection runs. This can be used to gather run-time statistics such as timing information and frequency of garbage collection runs, and to perform application-specific cleanup of uncollecatable objects from gc.garbage.

The first patch is the code as currently in use in our codebase at CCP (ported from 2.7). There is only one callback registered and the callback signature is perhaps a bit lame. Also, no error checking. But it is shown here for reference and as a basis for discussion.

msg122840 - (view)

Author: Daniel Stutzbach (stutzbach) (Python committer)

Date: 2010-11-29 18:32

These functions will be very useful for any long-running program. Thank you for the patch.

Would you be willing to write tests and documentation?

Would it make more sense for the callback to take a boolean instead of an integer as the first argument?

msg122844 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-11-29 18:40

Well, as you point out, it would make more sense to have two separate callbacks. Also, PyErr_WriteUnraisable() is better than PyErr_Clear().

Finally, you accidentally recoded the file; it should be kept utf-8, not latin-whatever.

msg122869 - (view)

Author: Jim Jewett (jimjjewett)

Date: 2010-11-29 20:47

I like the idea, but I do quibble with the signature. As nearly as I can tell, you're assuming

(a) Only one callback. I would prefer a sequence of callbacks, to make cooperation easier. (This does mean you would need a callback removal, instead of just setting it to None.)

(b) The callback will be called once before collecting generations, and once after (with the number of objects that weren't collected). Should these be separate callbacks, rather than the same one with a boolean? And why does it need the number of uncollected objects? (This might be a case where Practicality Beats Purity, but it is worth documenting.)

msg122901 - (view)

Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer)

Date: 2010-11-30 13:16

Hi, as I stated, the original patch was simply our original implementation. Here is a new patch. It is simpler:

  1. it exposes a gc.callbacks list where users can register themselves, in the spirit of sys.meta_path
  2. One can have multiple callbacks
  3. Improve error handling
  4. The callback is called with a "phase" argument, currently 0 for start, and 1 for the end.

Let's start bikeshedding the calling signature. I like having a single callback, since multiple callables are a nuisance to manage.

Once we agree, I'll post a patch for the documentation, and unittests.

msg122902 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-11-30 13:24

Let's start bikeshedding the calling signature. I like having a single callback, since multiple callables are a nuisance to manage.

IMO the callback should have a second argument as a dict containing various statistics that we can expand over time. The generation number, for example, should be present.

As for the phase number, if 0 means start and 1 means end, you can't decently add another phase anyway (having 2 mean "somewhere between 0 and 1" would be completely confusing).

PS : please don't use C++-style comments in your patch

msg122903 - (view)

Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer)

Date: 2010-11-30 13:27

You are right, Antoine. How about a string and a dict? the string can be "start" and "stop" and we can add interesting information to the dict as you suggest.

msg122913 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-11-30 16:10

You are right, Antoine. How about a string and a dict? the string can be "start" and "stop" and we can add interesting information to the dict as you suggest.

Looks good to me.

msg122914 - (view)

Author: Daniel Stutzbach (stutzbach) (Python committer)

Date: 2010-11-30 16:32

How about a string and a dict? the string can be "start" and "stop" and we can add interesting information to the dict as you suggest.

I like where this is headed. How about putting the string in the dict, too?

d['phase'] = 'start'

msg123224 - (view)

Author: Robert Lehmann (lehmannro) *

Date: 2010-12-03 10:06

A few issues I'd like to raise:

(1) Multiple callback chains. Is there any code in your existing use case of GC callbacks where you don't check for the phase argument and follow different code paths depending on it? If not, having two callback chains should be fine and takes the burden from the programmer to the implementors. (This is feasible if we only ever have two values for the phase.)

(2) Single collections. Currently, neither PyGC_Collect nor gc.collect() invoke the callbacks (since they do not call collect_generations). Is this an oversight or intentional?

(3) Error checking. What about callbacks which are bound to fail on each and every invocation, ie. because of wrong signatures. Should these be flat-out rejected in some way on registration, automagically removed when first encountered, or are we okay with errors slammed into the user's face every so often because he should REALLY fix them?

(4) Interop. Can this be supported as easily on other VMs? (That's perhaps a good reason for the statistics to be a dict, for GCs providing vastly different amounts of information.)

msg123225 - (view)

Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer)

Date: 2010-12-03 10:13

  1. I'm not sure what you are asking. Does anyone think that it is simpler to register two different callbacks than one? IMHO it complicates the interface and creates so many oppertunities to do things incorrectly.

2)No, it is an oversight, let me verify the code.

3)I don't think we ought to be smart and try to remove callbacks. I don't think that is common practice, the developer will know soon enough if things don't work. Just keep it simple.

4)Interop, I think, is not an issue. the gc module is a C-Python only implementation detail. Every implementation has the freedom to collect garbage in its own way.

I'll post an updated version soon.

msg123253 - (view)

Author: Jim Jewett (jimjjewett)

Date: 2010-12-03 15:29

Does anyone think that it is simpler to register two different callbacks than one?

Moderately, yes.

Functions that actually help with cleanup should normally be run only in one phase; it is just stats-gathering and logging functions that might run both times, and I don't mind registering those twice.

For functions that are run only once (which I personally think is the more normal case), the choices are between

@register_gc
def my_callback(actually_run_flag, mydict):
    if not actually_run_flag:
        return
    ...

vs

@register_gc_before
def my_callback(mydict):
    ...

msg123415 - (view)

Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer)

Date: 2010-12-05 08:45

Here is a third patch. The callback now gets two argument, phase and info. I've added documentation and unittests.

msg124508 - (view)

Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer)

Date: 2010-12-22 13:56

Uh oh. I forgot about this and there now we have passed beta 2. Didn't anyone want to review the patch?

msg124570 - (view)

Author: Lukas Lueg (ebfe)

Date: 2010-12-23 21:26

Why not make the start-callback be able to return a boolean value to the gcmodule that indicates if garbage collection should take place or not.

For example, any value returned from the callback that evaluates to False (like null) will cause the module to evaluate any other callback and possibly collect garbage objects. Any value that evaluates to True (like True) returned from any callback causes all further callbacks to not be called and garbage collection not to take place now.

msg124631 - (view)

Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer)

Date: 2010-12-25 06:52

Well, the idea is good and it did cross my mind. Particularly it could be useful for performance sensitive applications. However it complicates things.

  1. If a GC is rejected, when do we make the next attempt?
  2. If a callback cancels GC, then what about the other callbacks that have already been called? They would need to get some "canceled" call. But this is ok, I suppose.

Since we have already passed the beta 2, I'll see if I can come up with a simple change, particularly if we don't descend into some infinite loop wrt. 1) above.

msg124666 - (view)

Author: Lukas Lueg (ebfe)

Date: 2010-12-26 17:05

Collection may re-occur at any time, there is no promise to the callback code. However, the callback can disable the gc, preventing further collection.

I don't think we need the other callbacks to be informed. As the callbacks are worked down in the order they registered, whoever comes first is served first. Returning True from the callback is mereley a "I dont mind if gc happens now..."

msg124698 - (view)

Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer)

Date: 2010-12-27 02:35

  1. what I mean is that if a callback rejects GC, the GC algorithm may find its condition for invoking GC in the first place to be still valid immediately afterwards, so doing a GC will be immediately retried. I have to check, but it could mean that more changes would be required.
  2. Of course callbacks have to know, e.g. those that intend to gather statisctic or measure the time of GC. They have started a timer on the "start" opcode, and expect a "stop" code to follow. They have to get some "canceled" code for their bookkeeping to work.

Then additionally we have the question: Should you be able to cancel a direct gc request (like calling gc.collect()) or just the automatic one?

This then starts to be a much more complicated change, perhaps one that requires a PEP so I don't think we should do all of that in one gulp. Once the callback mechanism is in, there is every oppertunity to extend it.

msg124705 - (view)

Author: Lukas Lueg (ebfe)

Date: 2010-12-27 07:25

Agreed, let's have the simple callback first.

To solve 2) later on, we could have the callback proposed here be the 'execution'-callback. It neither has nor will have the capability to prevent garbage-collection. We can introduce another 'prepare'-callback later which is called when the gc-modules decides that it is time for collection. Callbacks may react with a negative value so execution does not happen and the execution-callbacks are also never called.

msg156450 - (view)

Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer)

Date: 2012-03-20 19:26

Hi! Michael Foord reminded me of this little gem. I'm getting this started again, hopefully for inclusion in 3.3

msg156453 - (view)

Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer)

Date: 2012-03-20 20:05

Hm, actually it wasn't Michael, but Martin. No matter! Here is a proposed patch, as promised without all the bells and whistles, in particular, there is no feature to cancel the garbage collection.

Can we get this into 3.3?

msg156471 - (view)

Author: Jim Jewett (Jim.Jewett) * (Python triager)

Date: 2012-03-21 01:31

gccallback4a.patch is a few changes to gccallback4.patch. Most are just spelling or grammar in comments, but I did modify the test case so that the Uncollectable loop had multiple objects with del methods.

msg157172 - (view)

Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer)

Date: 2012-03-31 11:36

Thanks, Jim. Unless anyone objects, I'll commit this then.

msg157176 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2012-03-31 11:57

Comments:

msg157187 - (view)

Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer)

Date: 2012-03-31 13:27

Thank you Antoine. Your points:

msg157188 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2012-03-31 13:35

The way I read it, you don't fetch it from the module dictionary, though, you just use the static C variable, which shouldn't change when the dict is mutated.

msg157660 - (view)

Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer)

Date: 2012-04-06 11:46

You are right, I was thinking more of pyobject attributes. I'll fix this then.

msg157733 - (view)

Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer)

Date: 2012-04-07 13:00

Here is an updated patch, taking Jim's and Antoine's comments into account.

Jim, I´d like to comment that I think the reason del objects are uncollectable is more subtle than there being no defined order of calling the del functions. More significantly, no python code may be executed during an implicit garbage collection. Now, it is possible that one could clean up cycles containing only one del method during expcicit collections (calling gc.collect()) but it hardly seems worth the effort.

msg157734 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2012-04-07 13:48

Uploaded another review. I also notice you didn't really address my point, since self.visit is still initialized too early. IMO it should be initialized after the first gc.collect() at the beginning of each test (not in setUp()).

msg157788 - (view)

Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer)

Date: 2012-04-08 13:45

A new patch, taking Antoine's review and comments into account.

msg158320 - (view)

Author: Roundup Robot (python-dev) (Python triager)

Date: 2012-04-15 11:42

New changeset 88f8ef5785d7 by Kristján Valur Jónsson in branch 'default': Issue #10576: Add a progress callback to gcmodule http://hg.python.org/cpython/rev/88f8ef5785d7

History

Date

User

Action

Args

2022-04-11 14:57:09

admin

set

github: 54785

2012-04-15 11:43:36

kristjan.jonsson

set

status: open -> closed
resolution: fixed

2012-04-15 11:42:22

python-dev

set

nosy: + python-dev
messages: +

2012-04-08 13:45:19

kristjan.jonsson

set

files: + gccallback.patch

messages: +

2012-04-07 13:48:35

pitrou

set

messages: +

2012-04-07 13:00:09

kristjan.jonsson

set

files: + gccallback.patch

messages: +

2012-04-06 11:46:29

kristjan.jonsson

set

messages: +

2012-03-31 13:35:27

pitrou

set

messages: +

2012-03-31 13:27:42

kristjan.jonsson

set

messages: +

2012-03-31 11:57:56

pitrou

set

messages: +

2012-03-31 11:36:55

kristjan.jonsson

set

messages: +

2012-03-21 01:31:45

Jim.Jewett

set

files: + gccallback4a.patch
nosy: + Jim.Jewett
messages: +

2012-03-20 20:05:01

kristjan.jonsson

set

files: + gccallback4.patch

messages: +

2012-03-20 19:26:57

kristjan.jonsson

set

nosy: + michael.foord
messages: +

2010-12-27 07:25:26

ebfe

set

nosy:jimjjewett, pitrou, kristjan.jonsson, lehmannro, stutzbach, ebfe, asvetlov, Yury.Selivanov
messages: +

2010-12-27 02:35:59

kristjan.jonsson

set

nosy:jimjjewett, pitrou, kristjan.jonsson, lehmannro, stutzbach, ebfe, asvetlov, Yury.Selivanov
messages: +

2010-12-26 17:05:06

ebfe

set

nosy:jimjjewett, pitrou, kristjan.jonsson, lehmannro, stutzbach, ebfe, asvetlov, Yury.Selivanov
messages: +

2010-12-25 06:52:05

kristjan.jonsson

set

nosy:jimjjewett, pitrou, kristjan.jonsson, lehmannro, stutzbach, ebfe, asvetlov, Yury.Selivanov
messages: +

2010-12-23 21:26:22

ebfe

set

nosy: + ebfe
messages: +

2010-12-22 13:56:14

kristjan.jonsson

set

nosy:jimjjewett, pitrou, kristjan.jonsson, lehmannro, stutzbach, asvetlov, Yury.Selivanov
messages: +

2010-12-05 14:09:46

pitrou

set

stage: patch review
versions: + Python 3.3, - Python 3.2

2010-12-05 08:45:03

kristjan.jonsson

set

files: + gccallback3.patch

messages: +

2010-12-03 20:01:02

Yury.Selivanov

set

nosy: + Yury.Selivanov

2010-12-03 15:29:11

jimjjewett

set

messages: +

2010-12-03 10:13:52

kristjan.jonsson

set

messages: +

2010-12-03 10:06:58

lehmannro

set

nosy: + lehmannro
messages: +

2010-11-30 16:32:54

stutzbach

set

messages: +

2010-11-30 16:10:46

pitrou

set

messages: +

2010-11-30 13:27:40

kristjan.jonsson

set

messages: +

2010-11-30 13:24:19

pitrou

set

messages: +

2010-11-30 13:16:22

kristjan.jonsson

set

files: + gccallback2.patch

messages: +

2010-11-29 20:47:00

jimjjewett

set

nosy: + jimjjewett
messages: +

2010-11-29 18:40:19

pitrou

set

nosy: + pitrou
messages: +

2010-11-29 18:32:48

stutzbach

set

nosy: + stutzbach
messages: +

2010-11-29 14:06:36

asvetlov

set

nosy: + asvetlov

2010-11-29 12:33:06

kristjan.jonsson

create