Issue 3710: Reference leak in thread._local (original) (raw)

Created on 2008-08-28 00:09 by tamino, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
threadmodule.c.diff	tamino,2008-08-28 00:08	Possible patch?

Messages (9)
msg72052 - (view)	Author: Ben Cottrell (tamino)	Date: 2008-08-28 00:08
This is a copy of a message I sent to the python-dev mailing list; it was suggested in a reply that I file a bug for this issue. I'm filing it against Python 2.5 because that's where I noticed it, but it doesn't look like this code has changed much in trunk. I noticed that thread._local can leak references if objects are being stored inside the thread._local object whose destructors might release the GIL. The way this happens is that in Modules/threadmodule.c, in the _ldict() function, it does things like this: Py_CLEAR(self->dict); Py_INCREF(ldict); self->dict = ldict; If the Py_CLEAR ends up taking away the last reference to an object contained in the dict, and a thread context switch occurs during that object's deallocation, then self->dict might not be NULL on return from Py_CLEAR; another thread might have run, accessed something in the same thread._local object, and caused self->dict to be set to something else (and Py_INCREF'ed). So when we blindly do the assignment into self->dict, we may be overwriting a valid reference, and not properly Py_DECREFing it. The recent change (revision 64601 to threadmodule.c) did not address context switches during the Py_CLEAR call; only context switches during tp_init. The attached patch (against trunk) is my first attempt at fixing this. It detects if self->dict has been set to something else after the Py_CLEAR, and retries the Py_CLEAR (because _ldict really only cares about installing the proper value of self->dict for the currently running thread). However, I am still uncomfortable about the fact that local_getattro and local_setattro discard the value returned from _ldict, and instead hand off control to the PyObject_Generic layer and trust that by the time self->dict is actually used, it still has the correct value for the current thread. Would it be better to, say, inline a copy of the PyObject_Generic* functions inside local_getattro/local_setattro, and force the operations to be done on the actual dict returned by _ldict()?
msg72053 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2008-08-28 00:14
Hmm, rather than the while loop in your proposal, the proper idiom would be: PyObject *olddict = self->dict; Py_INCREF(ldict); self->dict = ldict; Py_XDECREF(olddict);
msg72054 - (view)	Author: Ben Cottrell (tamino)	Date: 2008-08-28 00:18
But then if there is a context switch during the last Py_XDECREF, then it could be the case that self->dict is not set properly on return from _ldict(). Functions like local_setattro() use _ldict() more for its side effect (setting self->dict) than for its return value. It's possible that this should be changed; see the last paragraph in my original report.
msg72055 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2008-08-28 00:37
> But then if there is a context switch during the last Py_XDECREF, then > it could be the case that self->dict is not set properly on return from > _ldict(). Well, C code is effectively locked in a single thread until the GIL is released. It means that a piece of C code which doesn't release the GIL behaves as a critical section. So the only thing to consider, IMO, is whether the GIL can still be released between "if (ldict == NULL)" and "self->dict = ldict".
msg72056 - (view)	Author: Ben Cottrell (tamino)	Date: 2008-08-28 00:51
The specific thing that was happening for me is that an _sqlite3.Connection object was in the dictionary. In Modules/_sqlite/connection.c, in pysqlite_connection_dealloc(), it uses Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS. So it's the call to Py_DECREF that's interesting from my point of view. I believe that if _ldict() sets self->dict to what it should be for the current thread, and then calls Py_DECREF on the old value, and then returns, then _ldict() is no longer able to guarantee that self->dict will be set to the right thing for the current thread after it returns (because if the Py_DECREF ended up deallocating something like an sqlite3 connection, then it'd have released and reacquired the GIL). Hence, in the patch I attached, the assignment into self->dict is kept as the last thing that happens before _ldict() returns, and I believe this means _ldict() can still make that guarantee. Of course, I'd be all for changing local_getattro/local_setattro to not need _ldict to make that guarantee! _ldict always returns the correct pointer; it would be nice to make use of that somehow.
msg72061 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2008-08-28 04:44
fyi - This bug and #1868 (http://bugs.python.org/issue1868) seem related.
msg114563 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-08-21 20:49
> Of course, I'd be all for changing local_getattro/local_setattro > to not need _ldict to make that guarantee! _ldict always returns > the correct pointer; it would be nice to make use of that somehow. Indeed. Therefore, perhaps we could break the problem in two: - fix the reference leak by using the correct assignment idiom outlined earlier - fix local_getattro / local_setattro to always use the proper dict, regardless of thread switches
msg114564 - (view)	Author: Ben Cottrell (tamino)	Date: 2010-08-21 21:06
The latest patch over in #1868 is working fine for my company in production, and solves #3710 as well. I think the only thing left to do on that patch is to make it special case "__dict__".
msg114575 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-08-21 22:12
Ok, then let's continue in #1868.

History
Date	User	Action	Args
2022-04-11 14:56:38	admin	set	github: 47960
2011-02-14 06:58:55	ysj.ray	set	nosy: + ysj.ray
2010-08-21 22:12:46	pitrou	set	status: open -> closedresolution: duplicatesuperseder: threading.local doesn't free attrs when assigning thread exitsmessages: +
2010-08-21 21:06:26	tamino	set	messages: +
2010-08-21 20:49:23	pitrou	set	messages: +
2010-06-09 21:29:09	terry.reedy	set	versions: - Python 2.5
2010-05-11 20:39:48	terry.reedy	set	versions: + Python 3.1, Python 2.7, Python 3.2, - Python 3.0
2008-08-28 04:44:08	gregory.p.smith	set	messages: +
2008-08-28 04:36:06	gregory.p.smith	set	nosy: + gregory.p.smith
2008-08-28 00:51:20	tamino	set	messages: +
2008-08-28 00:37:59	pitrou	set	messages: +
2008-08-28 00:19:00	tamino	set	messages: +
2008-08-28 00:14:14	pitrou	set	priority: highnosy: + pitroumessages: + versions: + Python 2.6, Python 3.0
2008-08-28 00:09:03	tamino	create