Issue 19017: selectors: towards uniform EBADF handling (original) (raw)
Issue19017
Created on 2013-09-14 11:01 by neologix, last changed 2022-04-11 14:57 by admin.
Messages (19) | ||
---|---|---|
msg197701 - (view) | Author: Charles-François Natali (neologix) * ![]() |
Date: 2013-09-14 11:01 |
Currently, in case of invalid file descriptor (which can happen easily e.g. if the user closes the FD before unregistering it), selector.select() will react differently depending on the backend: SelectSelector fails with EBADF PollSelector returns EVENT_READ/EVENT_WRITE for the FD (translated from poll()'s POLLNVAL) EpollSelector returns nothing (epoll silently removes the FD from the set of monitored FD, there's no EPOLLNVAL) KqueueSelector behaves like EpollSelector One possibility would be for PollSelector.select() to raise EBADF in case of POLLNVAL. That would be consistent with select, but not with epoll and kqueue. Another possibility would be to silently fix those, i.e. in case of EBADF/POLLNVAL in select/poll, remove the offending FD from the list of monitored FDs. That's what libev does (see e.g. http://cvs.schmorp.de/libev/ev_poll.c?revision=1.39&view=markup and http://cvs.schmorp.de/libev/ev.c?revision=1.457&view=markup), and that's also what twisted does, at least for select (see http://twistedmatrix.com/trac/browser/trunk/twisted/internet/selectreactor.py#L77). One advantage is that once we've detected a bad FD, we remove it so we're sure we won't loop endlessly: e.g. for poll, if the user doesn't try to use the FD no exception will be raised and selector.select() keep reporting the FD ready. (Note that performing this cleanup is easy for poll since it's flagged with POLLNVAL, but for select this requires looping over the registered FDs, but since that's only called in case of error, it's not performance critical, see Twisted's code) I personally favor the later solution. Second point: if we opt for the later solution, we should probably update {EpollSelector,KqueueSelector}.unregister() to silently catch an error if the FD wasn't registered in the underlying epoll/kqueue. That's what libev does, and amusingly that's what the select module already does: http://hg.python.org/cpython/file/c9644c1f9c94/Modules/selectmodule.c#l1329 Third point: should the current select behavior (ignoring epoll.unregister() error) be kept? Thoughts? | ||
msg197702 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2013-09-14 11:11 |
> Another possibility would be to silently fix those, i.e. in case of > EBADF/POLLNVAL in select/poll, remove the offending FD from the list of > monitored FDs. Or we could remove the fd *and* still raise an error the first time, or at least a warning. It sounds like a programming error, so I'm not sure it's a good idea to silence it. Note that Twisted still logs an error message. Also, Tornado lets errors through: https://github.com/facebook/tornado/blob/master/tornado/ioloop.py#L652 https://github.com/facebook/tornado/blob/master/tornado/platform/select.py#L61 | ||
msg197703 - (view) | Author: Charles-François Natali (neologix) * ![]() |
Date: 2013-09-14 11:20 |
> Or we could remove the fd *and* still raise an error the first time, or at least a warning. It sounds like a programming error, so I'm not sure it's a good idea to silence it. Yeah, sure. By "silently" I meant "without user intervention", sorry for the confusion. Now I'm not sure between raising an error or just logging it. | ||
msg197704 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2013-09-14 11:21 |
> Now I'm not sure between raising an error or just logging it. If "logged", it should be a Warning rather than an actual logging call, IMO. | ||
msg197718 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2013-09-14 16:27 |
Would it be possible to emit an event "invalid fd" with the fd? So the user can decide what to do on such case: ignore, log, raise an exception, ... | ||
msg197917 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2013-09-16 16:33 |
Interesting issue. ISTM that closing the FD before unregistering it is a programmer's mistake that shouldn't pass silently. And closing it in a separate thread while the selector is active sounds like an even bigger bug. Could we report an new event type for this situation? E.g. EVENT_CLOSED. The proper response would be to unregister the FD. (And yes, unregistering the FD when it was previously registered should not be an error, even if it has been closed.) I'm not sure I care that there will be an infinite loop if the caller doesn't take action -- it's the same for any event, if you don't read from an FD when you get an EVENT_READ event, you'll get an infinite loop too. Note that this is not exactly the same as the "invalid fd" that Victor proposes -- although they aren't always distinguishable, the logic errors in the app are different: in one case, closing a FD (that was valid and registered) without unregistering, in the other case, registering an FD that isn't valid. (Ideally the register() call should fail, but I suppose we can't detect that without an extra syscall -- although in some cases there's a syscall in register() anyway, and then we can use that to reject the register() call. | ||
msg197929 - (view) | Author: Charles-François Natali (neologix) * ![]() |
Date: 2013-09-16 18:41 |
> Interesting issue. ISTM that closing the FD before unregistering it is a programmer's mistake that shouldn't pass silently. And closing it in a separate thread while the selector is active sounds like an even bigger bug. Agreed. > Could we report an new event type for this situation? E.g. EVENT_CLOSED. The proper response would be to unregister the FD. (And yes, unregistering the FD when it was previously registered should not be an error, even if it has been closed.) The problem is that for epoll (and kqueue I think) the FD is automagically removed from the backend, which means that we won't get any notification for this FD, hence we're unable to report it as closed. | ||
msg197932 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2013-09-16 19:04 |
> The problem is that for epoll (and kqueue I think) the FD is > automagically removed from the backend, which means that we won't get > any notification for this FD, hence we're unable to report it as > closed. That makes it sound like it will be hard to respond the same way on all platforms. :-( Even silently removing the registration for select and poll still leaves the behavior different: if you register a bad FD with select or poll, the register() call will accept it but the select() call will silently remove it; however if you register a bad FD with epoll or kqueue, register() will raise EBADF. I would prefer to see an EBADF exception in all four cases, even if it happens in register() for some platforms and in select() for others. There is also the issue of trying to unregister (or modify?) an FD that has gone bad after successful registration -- this will also have different behavior on the different platform, right? Maybe we'll get better design guidance if we look at this from the POV of a server loop that doesn't fully control the code that may be registering and unregistering FDs. It seems a good idea to make it so that if some bad piece of code tries to register a bad FD or close a FD without unregistering doesn't bring down the entire server, but the bad code always sees an exception. Maybe we should be silent in select() but cause unregister() to fail? | ||
msg198674 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2013-09-30 03:48 |
Charles-Francois, sorry to add you back to the bug, but (a) I thought you had agreed to a compromise patch that restarts signals in most cases but not for select(), poll() etc.; (b) I may have found a flaw in the idea. The flaw (if it is one) is related to Py_AddPendingCall(). This "schedules" a pending callback, mostly for signals, but doesn't AFAICT interrupt the mainthread in any way. (TBH, I only understand the code for Python 2.7, and in that version I'm sure it doesn't.) So is this a flaw? I'm nor sure. Can you think about it? | ||
msg202122 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2013-11-04 12:54 |
What is the status of this issue? | ||
msg202139 - (view) | Author: Charles-François Natali (neologix) * ![]() |
Date: 2013-11-04 14:06 |
> What is the status of this issue? AFAICT, we haven't reached a consensus yet on the best way to handle EBADF. | ||
msg202146 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2013-11-04 15:48 |
> AFAICT, we haven't reached a consensus yet on the best way to handle EBADF. By which you mean that you still don't agree with my proposal? Which is to fix it for most syscalls but not for select(), poll() and similar (anything that the new selectors module interfaces to). | ||
msg202147 - (view) | Author: Charles-François Natali (neologix) * ![]() |
Date: 2013-11-04 16:16 |
> By which you mean that you still don't agree with my proposal? Which is to > fix it for most syscalls but not for select(), poll() and similar (anything > that the new selectors module interfaces to). I agree with your proposal, but that's another issue :-) (this thread is about EBADF in selectors, not EINTR). | ||
msg202150 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2013-11-04 17:42 |
Eww, sorry. That's the second time I mistook this thread for the other. I re-read the original description and I now think that we should follow your original advice. There are two separate cases: (1) Registering an invalid FD; this succeeds for select/poll, fails for epoll/kqueue. (2) Registering a valid FD, then closing it, then calling .select(); this raises for select, returns a special event for poll, and is silently ignored for epoll/kqueue. I agree on making all selectors do the same thing for (2), and I agree that the best thing is to be silent and unregister the bad FD if we can (IIUC epoll/kqueue don't even tell us this happened?). We then need to make unregister() be silent when the FD isn't registered. Maybe make it return True when it actually unregistered something and False when there's nothing? An alternative would be to put the bad FD into a separate "quarantine" set so that it won't be used by the select() method but will still be recognized by unregister(). (And register() should look there too.) This still leaves case (1), where the FD is already bad when we register it. I am actually fine with sometimes raising and sometimes not; I don't want to pay the extra overhead of doing an fstat() or some other syscall just to verify that it is valid. (Although this would make the argument about wanting to choose a selector class that doesn't make extra syscalls less compelling. :-) And neither do I want to ignore the error in register() and pretend success. I guess we should in any case make it consistent so that if you successfully register() an FD you can always unregister() it without raising. I really wish there was an event to tell us that an FD has become invalid, but if epoll/kqueue really don't support that, i don't think it's worth it to introduce that only for select/poll. IOW let's do what those other systems you mention do. | ||
msg202161 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2013-11-04 20:02 |
To find an invalid FD when select() fails with EBAD, we can use something like: http://ufwi.org/projects/nufw/repository/revisions/b4f66edc5d4dc837f75857f8bffe9015454fdebc/entry/src/nuauth/tls_nufw.c#L408 | ||
msg205784 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2013-12-10 10:47 |
What is the status of this issue? | ||
msg205785 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2013-12-10 10:48 |
The issue #19876 has been closed. | ||
msg235388 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2015-02-04 16:41 |
> To find an invalid FD when select() fails with EBAD, we can use something like: > http://ufwi.org/projects/nufw/repository/revisions/b4f66edc5d4dc837f75857f8bffe9015454fdebc/entry/src/nuauth/tls_nufw.c#L408 Oh, the link is dead. Copy/paste of the code: --- /* errno == EBADF */ int i; /* A client disconnects between FD_SET and select. * Will try to find it */ for (i=0; imx; ++i){ struct stat s; if (FD_ISSET(i, &context->tls_rx_set)){ if (fstat(i, &s)<0) { log_message(CRITICAL, DEBUG_AREA_USER, "Warning: %d is a bad file descriptor.", i); FD_CLR(i, &context->tls_rx_set); } } } continue; /* retry select */ --- In short: call fstat() if check if the FD is valid or not. O(n) complexity. | ||
msg235389 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2015-02-04 16:48 |
Guido wrote: "This still leaves case (1), where the FD is already bad when we register it. I am actually fine with sometimes raising and sometimes not; I don't want to pay the extra overhead of doing an fstat() or some other syscall just to verify that it is valid. (Although this would make the argument about wanting to choose a selector class that doesn't make extra syscalls less compelling. :-) And neither do I want to ignore the error in register() and pretend success." The asyncio has a debug mode which can execute expensive checks. These checks would not be acceptable in release mode since they are an impact on performances. In debug mode, asyncio can explicitly call fstat() before calling selector.register() or selector.modify(). This solution may be suggested in the documentation if someone wants a portable behaviour for all selector. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:50 | admin | set | github: 63217 |
2015-02-04 16:48:35 | vstinner | set | messages: + |
2015-02-04 16:41:18 | vstinner | set | messages: + |
2015-02-03 15:01:42 | vstinner | set | nosy: + yselivanovcomponents: + asyncio |
2013-12-10 10:48:15 | vstinner | set | messages: + |
2013-12-10 10:47:24 | vstinner | set | messages: + |
2013-11-04 20:02:44 | vstinner | set | messages: + |
2013-11-04 17:42:58 | gvanrossum | set | messages: + |
2013-11-04 16:16:09 | neologix | set | messages: + |
2013-11-04 15:48:01 | gvanrossum | set | messages: + |
2013-11-04 14:06:56 | neologix | set | messages: + |
2013-11-04 12:54:01 | vstinner | set | messages: + |
2013-09-30 03:48:21 | gvanrossum | set | messages: + |
2013-09-16 19:04:54 | gvanrossum | set | messages: + |
2013-09-16 18:41:52 | neologix | set | messages: + |
2013-09-16 16:33:26 | gvanrossum | set | messages: + versions: + Python 3.4 |
2013-09-14 16:27:29 | vstinner | set | messages: + |
2013-09-14 11:21:41 | pitrou | set | messages: + |
2013-09-14 11:20:39 | neologix | set | messages: + |
2013-09-14 11:11:00 | pitrou | set | nosy: + glyphmessages: + |
2013-09-14 11:01:58 | neologix | create |