Issue 19465: selectors: provide a helper to choose a selector using constraints
Created on 2013-10-31 22:36 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Messages (14) | ||
---|---|---|
msg201855 - (view) | Author: STINNER Victor (vstinner) * |
Date: 2013-10-31 22:36 |
multiprocessing, telnetlib (and subprocess in the near future, see #18923) use the following code to select the best selector:

    # poll/select have the advantage of not requiring any extra file descriptor,
    # contrarily to epoll/kqueue (also, they require a single syscall).
    if hasattr(selectors, 'PollSelector'):
        _TelnetSelector = selectors.PollSelector
    else:
        _TelnetSelector = selectors.SelectSelector

I don't like the principle of "a default selector"; selectors.DefaultSelector should be removed in my opinion. I would prefer a function returning the best selector using constraints. Example:

    def get_selector(use_fd=True) -> BaseSelector: ...

By default, it would return the same as the current DefaultSelector. But if you set use_fd=False, the choice would be restricted to select() or poll(). I don't want to duplicate code like the telnetlib snippet in each module; it's harder to maintain. The selectors module may get new selectors in the future, see for example #18931.

Except use_fd, I don't have other ideas of constraints. I read somewhere that different selectors may have different limits on the number of file descriptors. I don't know if such a constraint would be useful. | ||
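A minimal sketch of what the proposed helper could look like. `get_selector` and its `use_fd` parameter come from the message above, but this body is only an assumption about how it might behave; no such function was ever added to the `selectors` module. It returns a selector class rather than an instance.

```python
import selectors


def get_selector(use_fd=True):
    """Hypothetical helper: return a selector class matching the constraint.

    use_fd=True  -> any selector is acceptable (same choice as DefaultSelector).
    use_fd=False -> only selectors that need no file descriptor of their own.
    """
    if use_fd:
        return selectors.DefaultSelector
    # poll() and select() do not allocate an extra file descriptor.
    # PollSelector is missing on platforms without poll(), hence the check.
    if hasattr(selectors, 'PollSelector'):
        return selectors.PollSelector
    return selectors.SelectSelector
```

A caller that wants a throwaway selector would then write `with get_selector(use_fd=False)() as sel: ...` instead of repeating the `hasattr` check in every module.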
msg201856 - (view) | Author: Guido van Rossum (gvanrossum) * |
Date: 2013-10-31 22:45 |
What's the use case for not wanting to use an extra FD? Nevertheless I'm fine with using a function to pick the default selector (but it requires some changes to asyncio too, which currently uses DefaultSelector). Something I would find useful would be a way to override the selector choice on the command line. I currently have to build this into the app's arg parser and main(), e.g. http://code.google.com/p/tulip/source/browse/examples/sink.py#64 | ||
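The command-line override mentioned above could be sketched as follows. The `--selector` flag and the `parse_selector` name are hypothetical, chosen for illustration; this is not the code from the linked tulip example. Only selector classes that exist on the current platform are offered as choices.

```python
import argparse
import selectors

# Selector classes that exist on this platform, keyed by class name.
AVAILABLE = {
    name: getattr(selectors, name)
    for name in ('SelectSelector', 'PollSelector', 'EpollSelector',
                 'DevpollSelector', 'KqueueSelector')
    if hasattr(selectors, name)
}


def parse_selector(argv=None):
    """Return the selector class requested with --selector (hypothetical flag)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--selector', choices=sorted(AVAILABLE), default=None,
                        help='selector class to use instead of the default')
    args = parser.parse_args(argv)
    if args.selector is None:
        # No override given: fall back to the module's default choice.
        return selectors.DefaultSelector
    return AVAILABLE[args.selector]
```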
msg201857 - (view) | Author: STINNER Victor (vstinner) * |
Date: 2013-10-31 22:59 |
> What's the use case for not wanting to use an extra FD?

A selector may be used for a few milliseconds just to check if a socket is ready, and then destroyed. For such a use case, select() may be enough (1 syscall). Epoll requires more system calls: create the epoll FD, register the socket, poll, destroy the epoll FD (4 syscalls). | ||
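The short-lived pattern described above can be sketched like this. `is_readable` is a hypothetical helper name; the sketch assumes that `SelectSelector.register()` is pure Python bookkeeping, so that only the `select()` call reaches the kernel.

```python
import selectors
import socket


def is_readable(sock, timeout=0):
    """Return True if sock has data to read, using a throwaway selector."""
    # The selector lives only for the duration of this call.
    with selectors.SelectSelector() as sel:
        sel.register(sock, selectors.EVENT_READ)
        # select() returns a list of (key, events) pairs; empty means not ready.
        return bool(sel.select(timeout))
```

Swapping `SelectSelector` for `EpollSelector` would keep the same behaviour but add the create, register, and close steps listed in the message.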
msg201858 - (view) | Author: Guido van Rossum (gvanrossum) * |
Date: 2013-10-31 23:36 |
Hm... I'm trying to understand how you're using the selector in telnetlib.py (currently the only example outside asyncio). It seems you're always using it with a single file/object, which is always 'self' (which wraps a socket), except one place where you're also selecting on stdin. Sometimes you're using select(0) to check whether I/O is possible right now, and then throwing away the selector; other times you've got an actual loop. I wonder if you could just create the selector when the Telnet class is instantiated (or the first time you need the selector) and keep the socket permanently registered; IIUC selectors are level-triggered, and no resources are consumed when you're not calling its select() method. (I think this means that if the socket was ready at some point in the past, but you already read those bytes, and now you're calling select(), it won't be considered ready even though it was registered the whole time.) It still seems to me that this is pretty atypical use of selectors; the extra FD used doesn't bother me much, since it doesn't really scale anyway (that would require hooking multiple Telnet instances into the same selector, probably using an asyncio EventLoop). If you insist on having a function that prefers poll and select over kqueue or epoll, perhaps we can come up with a slightly higher abstraction for the preference order? Maybe faster startup time vs. better scalability? (And I wouldn't be surprised if on Windows you'd still be better off using IocpProactor instead of SelectSelector -- but that of course has a different API altogether.) | ||
msg201859 - (view) | Author: Guido van Rossum (gvanrossum) * |
Date: 2013-10-31 23:37 |
Hm... I'm trying to understand how you're using the selector in telnetlib.py (currently the only example outside asyncio). It seems you're always using it with a single file/object, which is always 'self' (which wraps a socket), except one place where you're also selecting on stdin. Sometimes you're using select(0) to check whether I/O is possible right now, and then throwing away the selector; other times you've got an actual loop. I wonder if you could just create the selector when the Telnet class is instantiated (or the first time you need the selector) and keep the socket permanently registered; IIUC selectors are level-triggered, and no resources are consumed when you're not calling its select() method. (I think this means that if the socket was ready at some point in the past, but you already read those bytes, and now you're calling select(), it won't be considered ready even though it was registered the whole time.) It still seems to me that this is pretty atypical use of selectors; the extra FD used doesn't bother me much, since it doesn't really scale anyway (that would require hooking multiple Telnet instances into the same selector, probably using an asyncio EventLoop). If you insist on having a function that prefers poll and select over kqueue or epoll, perhaps we can come up with a slightly higher abstraction for the preference order? Maybe faster startup time vs. better scalability? (And I wouldn't be surprised if on Windows you'd still be better off using IocpProactor instead of SelectSelector -- but that of course has a different API altogether.) | ||
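The level-triggered behaviour described above can be checked with a small sketch: one selector is created once, the socket stays registered, and readiness is sampled before data arrives, while data is pending, and after it has been read. `readiness_over_time` is a name made up for this illustration, and a socket pair stands in for the Telnet connection.

```python
import selectors
import socket


def readiness_over_time():
    """Sample readiness of a permanently registered socket at three moments."""
    reader, writer = socket.socketpair()
    sel = selectors.DefaultSelector()
    # Registered once and kept for the whole lifetime of the selector.
    sel.register(reader, selectors.EVENT_READ)
    try:
        states = [bool(sel.select(0))]      # nothing sent yet
        writer.sendall(b'x')
        states.append(bool(sel.select(1)))  # one byte is pending
        reader.recv(1)                      # drain the pending byte
        states.append(bool(sel.select(0)))  # drained again
        return states
    finally:
        sel.close()
        reader.close()
        writer.close()
```

The registration itself does not make the socket look ready: only pending data does, which is why the selector can be kept alive between calls.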
msg201863 - (view) | Author: STINNER Victor (vstinner) * |
Date: 2013-11-01 00:18 |
> It still seems to me that this is pretty atypical use of selectors

I already implemented something similar to subprocess.Popen.communicate() when I was working on old Python versions without the timeout parameter of communicate(). http://ufwi.org/projects/edw-svn/repository/revisions/master/entry/trunk/src/nucentral/nucentral/common/process.py#L222

IMO calling select with a few file descriptors (between 1 and 3) and quickly destroying the "selector" is not a rare use case. If I ported my code to selectors, I wouldn't want to rewrite it to keep the selector alive longer, just because selectors force me to use the super-powerful fast epoll/kqueue selector. (To be honest, I will probably not notice any performance impact. But I like reducing the number of syscalls, not the opposite :-)) | ||
msg201865 - (view) | Author: Guido van Rossum (gvanrossum) * |
Date: 2013-11-01 00:58 |
OK. Let's have a function to select a default selector. Can you think of a better name for the parameter? Or maybe there should be two functions? | ||
msg201866 - (view) | Author: STINNER Victor (vstinner) * |
Date: 2013-11-01 01:03 |
> OK. Let's have a function to select a default selector.
> Can you think of a better name for the parameter? Or
> maybe there should be two functions?

I prefer to leave the question to the author of the module, Charles-François :-) | ||
msg201889 - (view) | Author: Charles-François Natali (neologix) * |
Date: 2013-11-01 11:07 |
There are actually two reasons for choosing poll over epoll/kqueue (i.e. no extra FD):
- it's a bit faster (1 syscall vs 3)
- but more importantly, and that's the main reason I did it in telnetlib/multiprocessing/subprocess, sometimes you really don't want to use an extra FD: for example, if you're creating 300 telnet/subprocess instances, one more FD per instance can make you reach RLIMIT_NOFILE, which makes some syscalls fail with EMFILE (at work we have up to 100 machines, and we spawn 1 subprocess per machine when distributing files with bittorrent).

So I agree it would be nice to have a better way to get a selector not requiring any extra FD. The reason I didn't add such a method in the first place is that I don't want to end up like many Java APIs: Foo.getBarFactory().getInstance().initialize().provide() :-)

> I read somewhere that different selectors may have different limits on the number of file descriptors.

Apart from select(), the other selectors don't have an upper limit. As for the performance profiles, depending on the application usage, select() can be faster than poll(), poll() can be faster than epoll(), etc. But since it's really highly usage-specific, and of course OS-specific, I think the current choice heuristic is fine: people with specific needs can just use PollSelector/EpollSelector themselves.

To sum up, get_selector(use_fd=True) looks fine to me. | ||
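The file descriptor budget argument above can be made concrete with a rough sketch. Both helper names are hypothetical, and the test used here is a heuristic assumption: selector classes that expose a `fileno()` method (such as `EpollSelector` and `KqueueSelector`) are taken to hold a descriptor of their own, while `SelectSelector` and `PollSelector` do not expose one.

```python
import selectors

SELECTOR_NAMES = ('SelectSelector', 'PollSelector', 'EpollSelector',
                  'DevpollSelector', 'KqueueSelector')


def needs_extra_fd(selector_cls):
    """Heuristic (assumption): a selector with fileno() owns an extra FD."""
    return hasattr(selector_cls, 'fileno')


def extra_fds(instances, selector_cls):
    """Extra descriptors consumed if each instance creates its own selector."""
    return instances if needs_extra_fd(selector_cls) else 0


def fd_free_selectors():
    """Names of the selectors on this platform that need no extra FD."""
    return [name for name in SELECTOR_NAMES
            if hasattr(selectors, name)
            and not needs_extra_fd(getattr(selectors, name))]
```

With 300 instances, `extra_fds(300, selectors.SelectSelector)` is 0, whereas a selector that owns a descriptor would add 300 to the count held against RLIMIT_NOFILE.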
msg201906 - (view) | Author: Guido van Rossum (gvanrossum) * |
Date: 2013-11-01 14:41 |
Hm. If you really are going to create 300 instances, you should probably use asyncio. Otherwise, how are you going to multiplex them? Create 300 threads each doing select() on 1 FD? That sounds like a poor architecture and I don't want to bend over backwards to support or encourage that. | ||
msg201910 - (view) | Author: Charles-François Natali (neologix) * |
Date: 2013-11-01 15:13 |
Of course, when I have 300 connections to remote nodes, I use poll() to multiplex between them. But there are times when you can have a large number of threads running concurrently, and if many of them call e.g. subprocess.check_output() at the same time (which calls Popen.communicate() behind the scenes, and thus calls select/poll), then one extra FD per instance could be an issue. For example, in http://bugs.python.org/issue18756, os.urandom() would start failing when multiple threads called it at the same time. | ||
msg204108 - (view) | Author: Antoine Pitrou (pitrou) * |
Date: 2013-11-23 21:26 |
I think this is more of a documentation issue. People who don't want a new fd can hardcode PollSelector (poll has been POSIX for a long time). | ||
msg204119 - (view) | Author: Charles-François Natali (neologix) * |
Date: 2013-11-23 22:36 |
> Antoine Pitrou added the comment:
>
> I think this is more of a documentation issue. People who don't want a new fd can hardcode PollSelector (poll has been POSIX for a long time).

That's also what I now think. I don't think that the use case is common enough to warrant a "factory"; a default selector is fine. | ||
msg210994 - (view) | Author: STINNER Victor (vstinner) * |
Date: 2014-02-11 17:53 |
It looks like you rejected my idea, so I'm in favor of just closing the issue. Do you agree? |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:52 | admin | set | github: 63664 |
2014-02-11 18:36:59 | gvanrossum | set | status: open -> closed; resolution: wont fix |
2014-02-11 17:53:34 | vstinner | set | messages: + |
2013-11-23 22:36:57 | pitrou | set | assignee: docs@python; components: + Documentation; nosy: + docs@python |
2013-11-23 22:36:14 | neologix | set | messages: + |
2013-11-23 21:26:43 | pitrou | set | nosy: + pitrou; messages: + |
2013-11-01 15:13:03 | neologix | set | messages: + |
2013-11-01 14:41:05 | gvanrossum | set | messages: + |
2013-11-01 11:07:03 | neologix | set | messages: + |
2013-11-01 01:03:46 | vstinner | set | messages: + |
2013-11-01 00:58:39 | gvanrossum | set | messages: + |
2013-11-01 00:18:07 | vstinner | set | messages: + |
2013-10-31 23:37:13 | gvanrossum | set | messages: + |
2013-10-31 23:36:33 | gvanrossum | set | messages: + |
2013-10-31 22:59:13 | vstinner | set | messages: + |
2013-10-31 22:45:54 | gvanrossum | set | messages: + |
2013-10-31 22:36:21 | vstinner | create |