Issue 10207: test_file2k crashes under Windows (original) (raw)
Issue10207
Created on 2010-10-26 22:20 by pitrou, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Messages (6) | ||
---|---|---|
msg119655 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2010-10-26 22:20 |
This recurrent crash appeared recently under the XP-4 buildbot, between r85816 and r85819. But none of these revisions looks related to potential file issues. Each time the test just finishes with: [...] test_file2k program finished with exit code -1073741819 (which according to Google means access violation) See http://www.python.org/dev/buildbot/builders/x86%20XP-4%202.7/ for test logs. | ||
msg119661 - (view) | Author: David Bolen (db3l) * | Date: 2010-10-27 00:19 |
Can't really say why it's just hitting now more consistently but the failure is an internal CRT exception during a file close, when it is handed what appears to be an invalid FILE * (the internal structure has bad data). I think it's more chance that it appears to be in the 86819-85819 change. I've been able to generate the same error in 85816 with 4-5 test runs. In all cases it's sporadic, but can trigger just by running test_file2k so doesn't require any other tests to have run. With a local build and the VS debugger, the failure point is the CRT during an internal flush as part of fclose(). The FILE * structure looks like it became corrupted prior to that point. The tail end of the failure traceback is: msvcr90d.dll!_write_nolock(int fh=3, const void * buf=0x00f4e008, unsigned int cnt=1) Line 322 + 0x6 bytes msvcr90d.dll!_write(int fh=3, const void * buf=0x00f4e008, unsigned int cnt=1) Line 75 + 0x11 bytes msvcr90d.dll!_flush(_iobuf * str=0x10311448) Line 154 + 0x1d bytes msvcr90d.dll!_fclose_nolock(_iobuf * str=0x10311448) Line 105 + 0x9 bytes msvcr90d.dll!fclose(_iobuf * stream=0x10311448) Line 57 + 0x9 bytes python27_d.dll!close_the_file(PyFileObject * f=0x00bef5d8) Line 451 + 0x7 bytes python27_d.dll!file_close(PyFileObject * f=0x00bef5d8) Line 651 + 0x9 bytes python27_d.dll!call_function(_object * * * pp_stack=0x0145f228, int oparg=12514776) Line 3996 + 0x10 bytes The last Python runtime code is in the close_the_file frame (fileobject.c, line 451), it's the line: sts = (*local_close)(local_fp) and local_fp has bad pointers inside its structure: local_fp 0x10311448 _iobuf * _ptr 0x00f4e009 char * _cnt 16383 int _base 0x00f4e008 char * _flag 34178 int _file 3 int _charbuf 0 int _bufsiz 16384 int _tmpfname 0x00000000 char * I've also seem some cases where _ptr isn't explicitly flagged as a Bad Ptr (e.g., value 0x00e56031 with r85816) but is still invalid to access. The local_close code runs outside the GIL, so maybe some sort of race condition with the parent Python file object, though the file object internal file point is NULL'd before releasing the GIL so I don't know if that seems plausible. I'm not sure of the best way to trace back across the C/Python interpreter boundary - the trace above this is really just a lot of PyEval_EvalFrameEx/fast_function/call_function entry points with Python objects that are fairly opaque to the VS debugger. One thing I do find interesting is that sometimes after continuing past the failure point the tests actually run to completion. Not sure if that means the issue might be more in test wrapper code than the tests themselves, or just that continuing past the flush failure in the debugger is something the test can't detect. I'll try to break down the test module later tonight when I have some more time. | ||
msg119668 - (view) | Author: David Bolen (db3l) * | Date: 2010-10-27 01:31 |
Ok, when it fails, the failure always appears to be one of the FileThreadingTests tests, with the affected close() call occurring within _close_file, called from _close_and_reopen_file, called from _test_close_open_io. It seems tough to backtrace the VS traces to the exact parent test due to the threading involved, but so far I've only been able to see the problem (with some debugging output added) while in test_close_open_print_buffered. It's a little tough to be positive, but I can seem to influence the odds of failure a bit with host I/O load, and I have yet to see a failure if I remove that test, while limiting the FileThreadingTests class to just that test fails with about the same frequency as the original full class did. So even if the issue is more general, that test seems the best way to exercise/analyze it. So there does seem to likely be a real race condition somewhere. | ||
msg119706 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2010-10-27 12:51 |
There's a patch in which might fix a similar problem ("test_close_open_print_buffered sometimes crashes"). Could you try it? | ||
msg119722 - (view) | Author: David Bolen (db3l) * | Date: 2010-10-27 17:14 |
Yes, the patch from (py27_fix_threaded_file_close.patch) seems to do the trick. I managed to get pretty good (90%+ success) at tickling the issue - interestingly by interactively clicking away from the test console window just after starting the test - and have run the test dozens of times with the patch with no problem. Oddly, I couldn't seem to create the issue with other tests by adding buffering to them, which I had expected if it was a general threading issue with file buffering - only the print test was crashing. Maybe since print is internally less atomic (with separate I/O due to softspace) it's more susceptible to thread switching. | ||
msg119800 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2010-10-28 16:14 |
I've committed the aforementioned patch (r85892) and this seems to be fixed. Thank you! |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:07 | admin | set | github: 54416 |
2010-10-28 16:14:29 | pitrou | set | status: open -> closedresolution: fixedmessages: + |
2010-10-27 17:14:47 | db3l | set | messages: + |
2010-10-27 12:51:02 | pitrou | set | messages: + |
2010-10-27 01:31:22 | db3l | set | messages: + |
2010-10-27 00:19:17 | db3l | set | messages: + |
2010-10-26 22:20:23 | pitrou | create |