[Python-Dev] Thread-safe file objects, the return
Guido van Rossum guido at python.org
Wed Apr 2 02:09:24 CEST 2008
This is not something that keeps me awake at night, but I am aware of it. Your solution (a counter) seems fine, except I think perhaps the close() call should not raise IOError -- instead, it should set a flag so that the thread that makes the counter go to zero can close the file (after all, the file got closed while it was being used).
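As a rough illustration of the deferred-close idea, here is a minimal Python sketch. It is a toy model, not the actual C patch: the class name `DeferredCloseFile`, the `_gil` lock (a stand-in for the real GIL), and the choice to reject new reads once a close is pending are all assumptions made for the example.

```python
import threading


class DeferredCloseFile:
    """Toy model: close() never raises; the last user closes the file."""

    def __init__(self, raw):
        self._raw = raw                  # plays the role of the FILE pointer
        self._gil = threading.Lock()     # hypothetical stand-in for the GIL
        self._in_use = 0                 # running "unlocked" sections
        self._close_pending = False      # set by close() while the file is busy

    def read(self, size=-1):
        with self._gil:
            # Assumption: once a close is pending, new I/O is refused.
            if self._raw is None or self._close_pending:
                raise ValueError("I/O operation on closed file")
            raw = self._raw
            self._in_use += 1
        try:
            return raw.read(size)        # blocking call, "GIL" released
        finally:
            with self._gil:
                self._in_use -= 1
                if self._in_use == 0 and self._close_pending:
                    self._really_close()

    def close(self):
        with self._gil:
            if self._raw is None:
                return
            if self._in_use > 0:
                self._close_pending = True   # defer instead of raising IOError
            else:
                self._really_close()

    def _really_close(self):
        # Caller must hold self._gil.
        raw, self._raw = self._raw, None
        self._close_pending = False
        raw.close()
```

With this shape, a close() issued during a blocked read returns immediately, and the underlying object is closed by whichever thread brings the counter back to zero.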
There are of course other concurrency issues besides close -- what if two threads both try to do I/O on the file? What will the C stdio library do in that case? Are stdio files thread-safe at the C level? So (classically contradicting myself while I think the problem over more) perhaps any I/O operation should be disallowed while the file is in use by another thread?
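The stricter alternative (refusing any operation while another thread is inside one) could be sketched as follows. Again this is only a toy model under stated assumptions: `ExclusiveFile`, `_gil`, and `_busy` are hypothetical names, a plain lock stands in for the GIL, and, unlike real file objects, a second close() raises ValueError here to keep the example short.

```python
import threading


class ExclusiveFile:
    """Toy model: at most one thread may be inside an I/O call at a time."""

    def __init__(self, raw):
        self._raw = raw                  # plays the role of the FILE pointer
        self._gil = threading.Lock()     # hypothetical stand-in for the GIL
        self._busy = False               # True while some thread is doing I/O

    def _begin(self):
        with self._gil:
            if self._raw is None:
                raise ValueError("I/O operation on closed file")
            if self._busy:
                raise IOError("file object is in use by another thread")
            self._busy = True
            return self._raw

    def _end(self, closed=False):
        with self._gil:
            self._busy = False
            if closed:
                self._raw = None

    def read(self, size=-1):
        raw = self._begin()
        try:
            return raw.read(size)        # blocking call, "GIL" released
        finally:
            self._end()

    def close(self):
        raw = self._begin()              # close is treated like any other operation
        try:
            raw.close()
        finally:
            self._end(closed=True)
```

The trade-off is that legitimate patterns, such as one thread reading while another writes to the same object, would start failing under this rule.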
--Guido
On Mon, Mar 31, 2008 at 1:09 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
Hello,

It seems this subject has had quite a bit of history. Tim Peters demonstrated the problem in 2003 in this message:
http://mail.python.org/pipermail/python-dev/2003-June/036537.html

In short, Python file objects release the GIL before calling any C stdlib function on their embedded FILE pointer. Unfortunately, if another thread calls fclose on the FILE pointer concurrently, the contents pointed to can become garbage and the interpreter process crashes. Just by using the same file object in two threads running pure Python code, you can crash the interpreter.

(Another, easier-to-solve problem is that the FILE pointer stored in the file object could become NULL at the point it is used by another thread. If that were the only problem, you could just store the FILE pointer in a local variable before releasing the GIL et voilà.)

There was some discussion at the time about the possible resolution. I've tried to fix the problem, and I've come to what I think is a satisfying solution, which I can sum up as the following bullet points:

* Each file object gets a dedicated counter, which is incremented before the object releases the GIL and decremented after the GIL is taken again; thus this counter keeps track of how many running "unlocked" sections of code are using that particular file object. (Please note the counter doesn't need its own lock, since it is only modified in GIL-protected sections.)
* In the close() method, if the aforementioned counter is greater than 0, we refuse to call fclose and instead raise an IOError.

This may seem like a worrying semantic change, but I don't think it is, for the following reasons:

1) if we closed the FILE pointer anyway, the interpreter would likely crash because another thread would be using garbage data (that's what we are trying to fix after all!)
2) if close() raises an IOError, it can be called again later, or at worst fclose will be called when the file object is garbage collected
3) close() can already raise an IOError if fclose fails for whatever reason (although for sure it's probably very rare)
4) it doesn't seem wrong to notify the programmer that his code is very unsafe

The patch is attached at http://bugs.python.org/issue815646 . It addresses (or at least I hope it does) all potential problems with pure Python code, threads, and the file object. It doesn't try to fix C extensions using the PyFile_AsFile API and doing their own dirty things with the FILE pointer. That could be a second step if the approach is accepted, but as noted in the 2003 discussions it would probably involve a new API. Whether we want to introduce such an API in Python 2.x while Python 3.0 has a different IO model anyway is left open to discussion :)

Regards

Antoine.
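The counter scheme described in the quoted message can be modelled in a few lines of Python. This is a sketch only, not the C patch attached to the issue: `CountedFile`, `_gil`, and `_unlocked_count` are hypothetical names, and an ordinary lock stands in for the GIL.

```python
import threading


class CountedFile:
    """Toy model: close() refuses to run while unlocked sections are active."""

    def __init__(self, raw):
        self._raw = raw                  # plays the role of the FILE pointer
        self._gil = threading.Lock()     # hypothetical stand-in for the GIL
        self._unlocked_count = 0         # running "unlocked" sections

    def read(self, size=-1):
        with self._gil:
            if self._raw is None:
                raise ValueError("I/O operation on closed file")
            raw = self._raw              # local copy of the "FILE pointer"
            self._unlocked_count += 1    # only touched while "holding the GIL"
        try:
            return raw.read(size)        # blocking call, "GIL" released
        finally:
            with self._gil:
                self._unlocked_count -= 1

    def close(self):
        with self._gil:
            if self._raw is None:
                return
            if self._unlocked_count > 0:
                raise IOError("close() called during concurrent "
                              "operation on the same file object")
            raw, self._raw = self._raw, None
        raw.close()                      # safe: no reader can still hold `raw`
```

A close() that raises can simply be retried once the other thread has finished its operation, which matches reason 2) in the list above.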
--
--Guido van Rossum (home page: http://www.python.org/~guido/)