Issue 3014: file_dealloc() assumes errno is set when EOF is returned (original) (raw)

Created on 2008-05-30 21:54 by johansen, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 6088 merged CuriousLearner,2018-03-12 17:45
Messages (6)
msg67560 - (view) Author: johansen (johansen) Date: 2008-05-30 21:54
We're using Python to build the new packaging system for OpenSolaris. Yesterday, a user reported that when they ran the pkg command, piped the output to grep, and then typed ^C, sometimes they'd get this error: $ pkg list | grep office ^Cclose failed: [Errno 11] Resource temporarily unavailable We assumed that this might be a problem in the signal handling we've employed to catch SIGPIPE; however, it turns out that the problem is in the file_dealloc() code. For the perversely curious, additional details may be found in the original bug located here: http://defect.opensolaris.org/bz/show_bug.cgi?id=2083 Essentially we found the following: The error message is emitted from fileobject.c: file_dealloc() The relevant portion of the routine looks like this: static void file_dealloc(PyFileObject *f) { int sts = 0; if (f->weakreflist != NULL) PyObject_ClearWeakRefs((PyObject *) f); if (f->f_fp != NULL && f->f_close != NULL) { Py_BEGIN_ALLOW_THREADS sts = (*f->f_close)(f->f_fp); Py_END_ALLOW_THREADS if (sts == EOF) #ifdef HAVE_STRERROR PySys_WriteStderr("close failed: [Errno %d] %s\n", errno, strerror(errno)); In the cases we encountered, the function pointer f_close is actually a call to sysmodule.c: _check_and_flush() That routine looks like this: static int _check_and_flush (FILE *stream) { int prev_fail = ferror (stream); return fflush (stream) prev_fail ? EOF : 0; } check_and_flush calls ferror(3C) and then fflush(3C) on the FILE stream associated with the file object. There's just one problem here. If it finds an error that was previously encountered on the file stream, there's no guarantee that errno will be valid. Should an error be encountered in fflush(3C), errno will get set; however, the contents of errno are undefined should fflush() return successfully. Here's what happens in the code I observed: I set a write watchpoint on errno and observed the different times it was accessed. After sifting through a bunch of red-herrings, I found that a call to PyThread_acquire_lock() that sets errno to 11 (EAGAIN). This occurs when PyThread_acquire_lock() calls sem_trywait(3C) and finds the semaphore already locked. Errno doesn't get accessed again until a call to libc.so.1`isseekable() that simply saves and restores the existing errno. Since we've taken a ^C (SIGINT), the interpreter begins the finalization process and eventually calls file_dealloc(). This routine calls _check_and_flush(). In the case that I observed, ferror(3C) returns a non-zero value but fflush(3C) completes successfully. This causes the routine to return EOF to the caller. file_dealloc() assumes that since it received an EOF an error occurred and it should call strerror(errno). However, since this is just returning the state of a previous error, errno is invalid. This is what causes the spurious EAGAIN message. Just to be sure, I traced the return value and errno of failed syscalls that were invoked by the interpreter. I was unable to observe any syscalls returning EAGAIN. This is because (at least on OpenSolaris) sem_trywait(3C) calls sema_trywait(3C). The sema_trywait returns EBUSY if the semaphore is held and sem_trywait converts this to EAGAIN. None of these errors are passed out of the kernel. It's not clear to me whether _check_and_flush(), file_dealloc(), or both need modification. At a minimum, it's not safe for file_dealloc() to assume that errno is set correctly if the function underneath it is using ferror(3C) to find the presence of an error on the stream.
msg72072 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-08-28 09:34
The title of your bug report might be misleading. Is the problem that errno is misinterpreted in the error message, or that there is an error message at all?
msg72094 - (view) Author: johansen (johansen) Date: 2008-08-28 16:24
The problem is present in Python 2.4.4, the version that we're using here. I'm not familiar with the versions keyword that's used here, but that's the version for which I'm reporting this bug. To be clear, the problem is that when file_dealloc() sees an EOF, it assumes that an error has occurred. It then checks errno. However, it's possible for file_dealloc() to get an EOF under conditions where errno hasn't been set. In that case, it reports an error with a stale value of errno. This is explained in detail in the initial report. I don't think the title is misleading; it's incorrect for file_dealloc() to assume that errno is set every time it gets an EOF.
msg72095 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-08-28 16:33
Selon johansen <report@bugs.python.org>: > > The problem is present in Python 2.4.4, the version that we're using > here. I'm not familiar with the versions keyword that's used here, but > that's the version for which I'm reporting this bug. The only changes that may be brought now to the 2.4 branch are security fixes and nothing else. Bug fixes with a slight risk of changing legitimate behaviour like the present one, on the other hand, will happen either on the 2.6 branch or on the 2.7 one. > To be clear, the problem is that when file_dealloc() sees an EOF, it > assumes that an error has occurred. It then checks errno. However, > it's possible for file_dealloc() to get an EOF under conditions where > errno hasn't been set. In that case, it reports an error with a stale > value of errno. This is explained in detail in the initial report. You still haven't explained what the correct behaviour would be, though. Must it report an error or not? If it must, then how should this specific error be detected and how should the error message be phrased? Thanks in advance.
msg72098 - (view) Author: johansen (johansen) Date: 2008-08-28 17:04
As I said before: check_and_flush calls ferror(3C) and then fflush(3C) on the FILE stream associated with the file object. There's just one problem here. If it finds an error that was previously encountered on the file stream, there's no guarantee that errno will be valid. Should an error be encountered in fflush(3C), errno will get set; however, the contents of errno are undefined should fflush() return successfully. The problem is this: The caller of check_and_flush() will check errno if check_and_flush returns an error. However, check_and_flush() checks two different error sources. If ferror() returns an error, you can't check errno, because that routine doesn't set errno. However, if fflush() returns an error, it will set errno. In its current form, no caller of check_and_flush() should check errno if the check_and_flush doesn't return successfully, since there's no way of knowing whether errno will be set or not. That doesn't make a lot of sense to me, so I would probably re-write this code. However, since I'm not an expert in the inner workings of the Python interpreter, it's hard for me to suggest just how to do this.
msg393265 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2021-05-08 16:00
I believe this is out of date - there is no longer file_dealloc or anything similar to the code in the report in fileobject.c
History
Date User Action Args
2022-04-11 14:56:35 admin set github: 47264
2021-06-14 20:06:59 iritkatriel set status: pending -> closedstage: patch review -> resolved
2021-05-08 16:00:48 iritkatriel set status: open -> pendingnosy: + iritkatrielmessages: + resolution: out of date
2018-03-12 17:45:35 CuriousLearner set keywords: + patchstage: patch reviewpull_requests: + <pull%5Frequest5849>
2011-06-01 20:42:24 schmir set nosy: + schmir
2008-08-28 22:27:45 pitrou set nosy: - pitrou
2008-08-28 17:04:49 johansen set messages: +
2008-08-28 16:33:07 pitrou set messages: +
2008-08-28 16:24:32 johansen set messages: + versions: + Python 2.4, - Python 2.7
2008-08-28 09:34:07 pitrou set priority: normalnosy: + pitroumessages: + versions: + Python 2.7, - Python 2.4
2008-08-28 06:19:51 laca set nosy: + laca
2008-05-30 21:54:10 johansen create