Issue 815646: thread unsafe file objects cause crash (original) (raw)

Issue815646

process

Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: anthonybaxter, christian.heimes, georg.brandl, gregory.p.smith, jafo, janixia, jhylton, jyasskin, kevinwatters, loewis, ned.deily, pebolle, pitrou, tim.peters, trent
Priority: critical Keywords: patch

Created on 2003-10-01 06:22 by janixia, last changed 2022-04-10 16:11 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
filecrash.py anthonybaxter,2003-10-01 07:03 filecrash.py
filethread1.patch pitrou,2008-03-28 18:09
filethread2.patch pitrou,2008-03-29 00:30
filethread3.patch pitrou,2008-03-29 19:39
filethread4.patch pitrou,2008-03-29 20:39
filethread4-gps01.patch gregory.p.smith,2008-04-06 06:25
python-2.7-Doc-gil.patch pebolle,2010-10-14 20:48 minor documentation fixes
Messages (28)
msg18469 - (view) Author: Jan Olderdissen (janixia) Date: 2003-10-01 06:22
Note: This may be a dupe or a generalization of 595601. Running below code snippet on 2.3.1 release and debug build on Windows 2000/XP a few times will inevitably lead to a crash. 2.2.2 also exhibits this behavior. The crashing code: ------------ import thread f=open("tmp1", "w") def worker(): global f while 1: f.close() f=open("tmp1", "w") f.seek (0, 0) thread.start_new_thread(worker, ()) thread.start_new_thread(worker, ()) while 1: pass ------------- The issue appears to be this (and similar) code sections from fileobject.c: Py_BEGIN_ALLOW_THREADS errno = 0; ret = _portable_fseek(f->f_fp, offset, whence); Py_END_ALLOW_THREADS Note that due to Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS, f->f_fp can be set to NULL by file_close prior to entering _portable_fseek. Similar crashes can be observed with read, write and close itself when they are mixed with a concurrent close. Obviously, the offending python code is buggy and a lock should be used to prevent concurrent access to f. However, it seems unreasonable for Python to crash because of it. A mutex preventing concurrent access to the file object seems like a reasonable way to fix this.
msg18470 - (view) Author: Anthony Baxter (anthonybaxter) (Python triager) Date: 2003-10-01 07:03
Logged In: YES user_id=29957 Attaching the failing code as an attachment, because SF mangles source code.
msg18471 - (view) Author: Jan Olderdissen (janixia) Date: 2003-10-01 07:08
Logged In: YES user_id=877914 My apologies for the mangled source. I suppose there isn't a way for me to remedy the situation.
msg18472 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2003-10-04 08:01
Logged In: YES user_id=21627 janixia, don't worry about the formatting - this is partially SF's fault, too. Would you be interested in developing and testing a patch? I think it would be sufficient to move the f->f_fp access out of the GIL releasage.
msg18473 - (view) Author: Jeremy Hylton (jhylton) (Python triager) Date: 2003-10-06 03:48
Logged In: YES user_id=31392 Patch 595601 is an attempt to address this problem, but it's incomplete. The file object API allows an extension to extract to FILE * and squirrel it away. That's clearly unsafe, because it can't participate in a locking scheme without re-writing extensions. Shane Hathaway proposed another solution here: http://mail.python.org/pipermail/python-dev/2003-June/036543.html The problem in this case is that we cause the call to close() to raise an exception. I'd prefer to see the exception raised elsewhere, because close() seldom fails and is often closed from routines that are cleaning up at the end. On the other hand, this solution would be easier to implementation, so I'm at least +0 on it. Let's do one or the other.
msg18474 - (view) Author: Jan Olderdissen (janixia) Date: 2003-10-06 23:02
Logged In: YES user_id=877914 I'm inclined to go with Shane's suggested solution of reference counting when file access is in progress. It requires no synchronisation (increment, decrement and check are outside global lock release) and should have the smallest performance impact. I don't think the FILE * extraction problem can be solved at all. Once the horse it out of the barn... However, for the "standard" case Shane's suggestion provides a neat and clean solution for the problem. If the community can agree on this solution, I could be talked into implementing it.
msg55960 - (view) Author: Sean Reifschneider (jafo) * (Python committer) Date: 2007-09-17 10:15
If I read the 2003 python-dev thready correctly, there isn't a solution to this. Does this need to go back to python-dev, or do we just call it "wont fix"? Or...?
msg57338 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-11-09 22:14
I'm still able to reproduce the bug in Python 2.5 (svn) and 2.6 (trunk). import thread f=open("tmp1", "w") def worker(): global f while 1: f.close() f=open("tmp1", "w") f.seek(0,0) thread.start_new_thread(worker, ()) thread.start_new_thread(worker, ()) Unhandled exception in thread started by <function worker at 0xb7d01aac> Traceback (most recent call last): *** glibc detected *** ./python: malloc(): memory corruption: 0xb7efc008 *** ======= Backtrace: ========= /lib/tls/i686/cmov/libc.so.6[0xb7dbe636] /lib/tls/i686/cmov/libc.so.6(__libc_malloc+0x90)[0xb7dbffc0] /lib/tls/i686/cmov/libc.so.6[0xb7dad03f] /lib/tls/i686/cmov/libc.so.6(fopen64+0x2c)[0xb7daf61c] ./python(PyTraceBack_Print+0x1a4)[0x80ef0f4] ./python(PyErr_Display+0x76)[0x80e73a6] ./python[0x80ed80d] ./python(PyObject_Call+0x27)[0x805c927] ./python(PyEval_CallObjectWithKeywords+0x6c)[0x80c151c] ./python(PyErr_PrintEx+0xbe)[0x80e7e9e] ./python[0x80f37b1] /lib/tls/i686/cmov/libpthread.so.0[0xb7ed146b] /lib/tls/i686/cmov/libc.so.6(clone+0x5e)[0xb7e276de] ======= Memory map: ======== 08048000-0813d000 r-xp 00000000 fe:01 10586072 /home/heimes/dev/python/release25-maint/python 0813d000-08162000 rw-p 000f4000 fe:01 10586072 /home/heimes/dev/python/release25-maint/python 08162000-081fe000 rw-p 08162000 00:00 0 [heap] b6a00000-b6a21000 rw-p b6a00000 00:00 0 b6a21000-b6b00000 ---p b6a21000 00:00 0 b6bc1000-b6bc2000 ---p b6bc1000 00:00 0 b6bc2000-b73c2000 rw-p b6bc2000 00:00 0 b73c2000-b73c3000 ---p b73c2000 00:00 0 b73c3000-b7bc3000 rw-p b73c3000 00:00 0 b7bc3000-b7bff000 r-xp 00000000 08:05 325941 /lib/libncurses.so.5.6 b7bff000-b7c07000 rw-p 0003b000 08:05 325941 /lib/libncurses.so.5.6 b7c07000-b7c4e000 r-xp 00000000 08:05 325837 /lib/libncursesw.so.5.6 b7c4e000-b7c56000 rw-p 00046000 08:05 325837 /lib/libncursesw.so.5.6 b7c56000-b7c82000 r-xp 00000000 08:05 325955 /lib/libreadline.so.5.2 b7c82000-b7c86000 rw-p 0002c000 08:05 325955 /lib/libreadline.so.5.2 b7c86000-b7c87000 rw-p b7c86000 00:00 0 b7c87000-b7c8a000 r-xp 00000000 fe:01 10716611 /home/heimes/dev/python/release25-maint/build/lib.linux-i686-2.5/readline.so b7c8a000-b7c8b000 rw-p 00003000 fe:01 10716611 /home/heimes/dev/python/release25-maint/build/lib.linux-i686-2.5/readline.so b7c8b000-b7c92000 r--s 00000000 08:05 557857 /usr/lib/gconv/gconv-modules.cache b7c92000-b7cd1000 r--p 00000000 08:05 570306 /usr/lib/locale/de_DE.utf8/LC_CTYPE b7cd1000-b7d54000 rw-p b7cd1000 00:00 0 b7d54000-b7e98000 r-xp 00000000 08:05 326311 /lib/tls/i686/cmov/libc-2.6.1.so b7e98000-b7e99000 r--p 00143000 08:05 326311 /lib/tls/i686/cmov/libc-2.6.1.so b7e99000-b7e9b000 rw-p 00144000 08:05 326311 /lib/tls/i686/cmov/libc-2.6.1.so b7e9b000-b7e9e000 rw-p b7e9b000 00:00 0 b7e9e000-b7ec1000 r-xp 00000000 08:05 326315 /lib/tls/i686/cmov/libm-2.6.1.so b7ec1000-b7ec3000 rw-p 00023000 08:05 326315 /lib/tls/i686/cmov/libm-2.6.1.so b7ec3000-b7ec5000 r-xp 00000000 08:05 326330 /lib/tls/i686/cmov/libutil-2.6.1.so b7ec5000-b7ec7000 rw-p 00001000 08:05 326330 /lib/tls/i686/cmov/libutil-2.6.1.so b7ec7000-b7ec8000 rw-p b7ec7000 00:00 0 b7ec8000-b7eca000 r-xp 00000000 08:05 326314 /lib/tls/i686/cmov/libdl-2.6.1.so b7eca000-b7ecc000 rw-p 00001000 08:05 326314 /lib/tls/i686/cmov/libdl-2.6.1.so b7ecc000-b7ee0000 r-xp 00000000 08:05 326325 /lib/tls/i686/cmov/libpthread-2.6.1.so b7ee0000-b7ee2000 rw-p 00013000 08:05 326325 /lib/tls/i686/cmov/libpthread-2.6.1.so b7ee2000-b7ee4000 rw-p b7ee2000 00:00 0 b7ef1000-b7efb000 r-xp 00000000 08:05 325908 /lib/libgcc_s.so.1 b7efb000-b7efc000 rw-p 0000a000 08:05 325908 /lib/libgcc_s.so.1 b7efc000-b7f01000 rw-p b7efc000 00:00 0 b7f01000-b7f1b000 r-xp 00000000 08:05 326530 /lib/ld-2.6.1.so b7f1b000-b7f1d000 rw-p 00019000 08:05 326530 /lib/ld-2.6.1.so bfcd2000-bfcee000 rw-p bfcd2000 00:00 0 [stack] ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso] However Python 3.0 doesn't crash: Unhandled exception in thread started by <function worker at 0x840860c> Traceback (most recent call last): File "", line 6, in worker File "/home/heimes/dev/python/py3k/Lib/io.py", line 1234, in seek self.buffer.seek(pos) File "/home/heimes/dev/python/py3k/Lib/io.py", line 877, in seek return self.raw.seek(pos, whence) IOError: [Errno 9] Bad file descriptor
msg64579 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-03-27 11:00
A small addition to Christian's code snippet allows me to reproduce the problem as well: import thread f=open("tmp1", "w") def worker(): global f while 1: f.close() f = open("tmp1", "w") f.seek(0,0) thread.start_new_thread(worker, ()) thread.start_new_thread(worker, ()) while 1: pass
msg64642 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-03-28 18:09
This is a preliminary patch which shows how things might be done better. It only addresses close(), seek() and dealloc right now. However, as mentioned in test_close_open_seek, if I raise the number of workers, I get crashes (while test_close_open is fine). Perhaps fseek() in the glibc is thread unsafe when operating on the same file descriptor?
msg64643 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-03-28 18:20
Another approach would be to add a dedicated lock for each PyFileObject. This sounds a bit bad performance-wise but after all the GIL itself is a lock, and we release and re-acquire it on each file operation, so why not?
msg64645 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-03-28 18:47
On the other hand, surprisingly enough, the flockfile/funlockfile manpage tells me that: The stdio functions are thread-safe. This is achieved by assigning to each FILE object a lockcount and (if the lockcount is nonzero) an owning thread. For each library call, these functions wait until the FILE object is no longer locked by a different thread, then lock it, do the requested I/O, and unlock the object again. This leaves me wondering what is happening in the above-mentioned test.
msg64654 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-03-28 22:45
Why hadn't I read #595601 in detail, it has an explanation: [quoting Jeremy Hylton] The universal newline code is squirrels the FILE * in a local variable, which is worse. If it happens that another thread closes the file, at best the local points to a closed FILE *. But that memory could get recycled and then there's no way to know what it points to. [/quoting] Even with careful coding, there's a small window between releasing the GIL on our side, and acquiring the FILE-specific lock in the glibc, during which the fclose() function can be invoked and release the FILE just before we invoke another function (e.g. fseek()) on it.
msg64661 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-03-29 00:30
Some good news: I've found a way to continue with my approach and make it working. The close() method now raises an appropriate IOError when another method is being called from another thread. Attaching a patch, which protects a bunch of file object methods (together with tests). Now the bad news: the protection logic in fileobject.c is, hmm, a bit contrived (and I'm not even sure it's 100% correct). If someone wants to read it and put his veto before I go further...
msg64667 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-03-29 01:27
Closed #595601 as a duplicate.
msg64719 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-03-29 19:39
Actually, my approach was not 100% correct, it failed in some rare cases. I've come to the conclusion that using an unlock count on the PyFileObject is the correct way to proceed. I'm now attaching a working and complete patch which protects all methods of the PyFileObject. The original test suite runs fine, as well as the added test cases and Tim Peters' crasher here: http://mail.python.org/pipermail/python-dev/2003-June/036537.html To sum up the changes brought by this patch: - no supplementary locking - but each time we release the GIL to do an operation on a FILE, we increase a dedicated counter on the PyFileObject - when close()ing a PyFileObject, if the aforementioned counter is non-zero, we throw an IOError rather than risking calling fclose(). By construction this cannot happen in the PyFileObject destructor, but if ever it happens (for example if a C extension decides to put its hands in the PyFileObject struct), we throw a SystemError instead.
msg64724 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-03-29 20:39
Ah, I had forgotten to protect the print statement as well. Here is a new patch :-)
msg65017 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-04-06 03:07
I'm reviewing this patch now and plan to commit it after some testing. A couple comments: I'd rename your sts variables to status. Also FYI: Your use of volatile on the int unlocked_count member of PyFileObject does not do what you think it does and isn't needed here anyways. Access to the variable is always protected by the GIL unlocking and locking of which should cause an implicit memory barrier guaranteeing that all other CPUs in the system will see the same value stored in the structure in memory. The C volatile keyword on the other hand does not guarantee this. volatile is useful for memory mapped IO but it makes no guarantees about cache coherent access between multiple CPUs. (the atomic types in the recent C++ standards are meant for that) Both of the above are trivial changes, no need for another patch.
msg65022 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-04-06 06:25
I've attached my patch that I want to commit. The main change from filethread4 is some cleanup in file_test to make it run a lot faster and add verbose mode output to indicate how well it is actually testing the problem (counting the times that close raises IOError). One concern holding up my commit: Will this test pass on windows? It is opening and closing the same file in 'w+' mode from multiple threads of the same process at once. Can someone with a windows dev environment please apply this patch and test it. If it dislikes the above file behavior, can you propose a fix for it (set windows file non-exclusive flags or whatever you're supposed to do... the worse alternative would be to use a new filename on each open but that could cause a nightmare of thousands of new files being created by the test which then have to be cleaned up)? thanks, -gps
msg65029 - (view) Author: Trent Nelson (trent) * (Python committer) Date: 2008-04-06 10:40
Patched and tested on one of my buildbots, test_file passes without error with your latest Patch Greg.
msg65031 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-04-06 11:09
Ok Greg, I wasn't sure locking/unlocking the GIL would create a memory barrier but it sounds logical after all. Your patch looks fine to me.
msg65056 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-04-06 23:14
Committed to trunk in revision 62195. Misc/NEWS entry added. I also added two new C API functions: PyFile_IncUseCount and PyFile_DecUseCount along with documentation. They should be used by any C extension code that uses PyFile_AsFile and wants to make use of the returned FILE* with the GIL released. The net effect of not using them is no change from the existing behavior (crashes would be possible) for those C extension modules.
msg71388 - (view) Author: Kevin Watters (kevinwatters) Date: 2008-08-18 21:48
I know this is long closed, but no one on the nosy list happens to have this fix backported to 2.5, do they? :) If not, I'll attach one here eventually...
msg71389 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-08-18 21:52
> I know this is long closed, but no one on the nosy list happens to have > this fix backported to 2.5, do they? :) I think that at the time no one was sure the patch was 100% harmless. It also subtly changes the behaviour of close() when called while another IO operation is in progress in another thread, which is arguably a bug fix but can still raise an exception it wouldn't have raised in 2.5. So all in all I'm not sure this should be backported, although it would probably be an improvement in most cases. I'll let someone else take the decision.
msg71474 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-08-19 19:57
The fix can not be committed to Python 2.5 because it breaks compatibility by adding another field to the PyFileObject struct and adding two new C API functions.
msg118715 - (view) Author: Paul Bolle (pebolle) Date: 2010-10-14 20:48
0) I ran into some (small) problems with the documentation added by revision 62195. It seems more efficient to reuse this issue to report these. Feel free to ask me to open another issue if that's not appreciated. 1) A small patch that addresses two problems with the current (ie, 2.7) documentation should be attached: - link three occurrences of "GIL" to the GIL entry in the glossary; and - add some example code to clarify the usage of PyFile_IncUseCount() andPyFile_DecUseCount(). 2) That patch also adds a link to the "Thread State and the Global Interpreter Lock" section to the GIL entry in the Documentation index. That is a separate problem. But fixing that minor problem could also be of help to people (like me) that need to better understand the GIL aspects of those two functions.
msg118723 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2010-10-14 22:09
Paul, please open a new issue and attach your patch(s) there; it should address issues in 2.7 and 3.2 only.
msg118744 - (view) Author: Paul Bolle (pebolle) Date: 2010-10-15 06:54
> please open a new issue and attach your patch(s) there Issue 10111 now tracks the documentation problems.
History
Date User Action Args
2022-04-10 16:11:30 admin set github: 39342
2010-10-15 06:54:52 pebolle set messages: +
2010-10-14 22:09:45 ned.deily set nosy: + ned.deilymessages: +
2010-10-14 20:48:58 pebolle set files: + python-2.7-Doc-gil.patchnosy: + pebollemessages: +
2008-08-19 19:57:27 gregory.p.smith set messages: +
2008-08-18 21:52:47 pitrou set messages: +
2008-08-18 21:48:43 kevinwatters set nosy: + kevinwattersmessages: +
2008-04-06 23:14:16 gregory.p.smith set status: open -> closedresolution: acceptedmessages: + versions: - Python 3.0
2008-04-06 11:09:26 pitrou set messages: +
2008-04-06 10:40:51 trent set nosy: + trentmessages: +
2008-04-06 06:25:10 gregory.p.smith set files: + filethread4-gps01.patchmessages: +
2008-04-06 03:07:15 gregory.p.smith set assignee: tim.peters -> gregory.p.smithversions: + Python 3.0messages: + nosy: + gregory.p.smith
2008-03-31 20:48:34 jyasskin set nosy: + jyasskin
2008-03-29 20:39:27 pitrou set files: + filethread4.patchmessages: +
2008-03-29 19:39:33 pitrou set files: + filethread3.patchmessages: +
2008-03-29 01:27:31 georg.brandl link issue595601 superseder
2008-03-29 01:27:19 georg.brandl set nosy: + georg.brandlmessages: +
2008-03-29 00:30:35 pitrou set files: + filethread2.patchmessages: +
2008-03-28 22:45:07 pitrou set messages: +
2008-03-28 18:47:37 pitrou set messages: +
2008-03-28 18:20:44 pitrou set messages: +
2008-03-28 18:09:52 pitrou set files: + filethread1.patchkeywords: + patchmessages: +
2008-03-27 11:00:07 pitrou set nosy: + pitroumessages: +
2008-01-04 00:06:38 christian.heimes set versions: - Python 2.3
2007-11-09 22:14:34 christian.heimes set nosy: + christian.heimesmessages: + versions: + Python 2.6, Python 2.5
2007-09-17 10:16:00 jafo set assignee: tim.petersmessages: + nosy: + jafo, tim.peters
2007-08-28 18:34:36 georg.brandl set priority: normal -> criticaltype: crashseverity: normal -> major
2007-08-28 18:33:45 georg.brandl link issue1778376 superseder
2003-10-01 06:22:02 janixia create