Issue 13215: multiprocessing Manager.connect() aggressively retries refused connections (original) (raw)
Created on 2011-10-18 19:16 by bgilbert, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Messages (9)
Author: Benjamin Gilbert (bgilbert)
Date: 2011-10-18 19:16
multiprocessing.managers.BaseManager.connect() takes 20 seconds to return on failure, even if the server refuses the connection. This is because the function that creates the connection, multiprocessing.connection.SocketClient(), handles ECONNREFUSED by retrying the connection attempt every 10 ms for 20 seconds.
While a 20 second timeout may make sense for unresponsive servers, ECONNREFUSED probably indicates that the server is not listening on this port, so hammering it with 1,999 more connection attempts isn't going to help. In this case, SocketClient() should fail immediately, or -- at most -- retry a small number of times with exponential backoff.
Author: Antoine Pitrou (pitrou) *
Date: 2011-10-23 23:05
I'm not sure, but I think that would be for the case where you are spawning the server yourself and the child process takes time to start up.
Author: Charles-François Natali (neologix) *
Date: 2011-10-24 17:16
While a 20 second timeout may make sense for unresponsive servers, ECONNREFUSED probably indicates that the server is not listening on this port, so hammering it with 1,999 more connection attempts isn't going to help.
That's funny, I noticed this a couple days ago, and it also puzzled me...
I'm not sure, but I think that would be for the case where you are spawning the server yourself and the child process takes time to start up.
That's also what I think. But that's strange, since:
- this holds for every client/server communication (e.g. why not do that for smtplib, telnetlib, etc. ?)
- it's against the classical connect() semantics
- some code may prefer failing immediately (instead of "hammering" the remote host) if the remote server is down, or the address is incorrect: it can still handle the ECONNREFUSED if it wants to retry, with a custom retry timeout I removed the retry code and run test_multiprocessing and test_concurrent_futures in loop, and didn't see any failure (on Linux), so I'd say we could probably remove that. OTOH, I would feel bad if this broke someone's code (even though code relying on the automatic retries is probably broken). So I'm +1 on removing the retry logic altogether, unless of course someone comes up with a good reason to keep it (I digged a little through the logs to see when this was introduced, but apparently it was there in the original import). If we don't remove it, I agree we should at least reduce the timeout and increase the period (an exponential backoff may be a bit overkill).
Author: Charles-François Natali (neologix) *
Date: 2011-11-15 20:40
Here's a patch removing the automatic retry on ECONNREFUSED: I tested it on Linux and other Unices, and it seems to work just fine without this hammering. Note that there's a similar mechanism for Windows (ERROR_PIPE_BUSY), but it seems to be necessary there (test_multiprocessing blows up without this). Note that I'd rather only apply this to default (not 2.7 and 3.2), since this is more of a behavior change than a bug fix.
Author: Antoine Pitrou (pitrou) *
Date: 2011-11-17 00:32
Apparently you forgot to upload the patch...
Author: Charles-François Natali (neologix) *
Date: 2011-11-17 21:16
Apparently you forgot to upload the patch...
Told you I couldn't think straight :-)
Author: Antoine Pitrou (pitrou) *
Date: 2011-11-17 21:43
Ok for me. Nice to see that piece of ugliness removed.
Author: Roundup Robot (python-dev)
Date: 2011-11-19 09:00
New changeset 34fcc0d5c3c5 by Charles-François Natali in branch 'default': Issue #13215: multiprocessing.Connection: don't hammer the remote end with http://hg.python.org/cpython/rev/34fcc0d5c3c5
Author: Charles-François Natali (neologix) *
Date: 2011-11-19 14:58
Benjamin, thanks for the report.
History
Date
User
Action
Args
2022-04-11 14:57:22
admin
set
github: 57424
2011-11-19 14:58:11
neologix
set
status: open -> closed
resolution: fixed
messages: +
stage: patch review -> resolved
2011-11-19 09:00:27
python-dev
set
nosy: + python-dev
messages: +
2011-11-17 21:43:02
pitrou
set
messages: +
2011-11-17 21:16:58
neologix
set
files: + connection_retry.diff
messages: +
2011-11-17 00:32:55
pitrou
set
messages: +
2011-11-15 20:40:07
neologix
set
keywords: + patch, needs review
stage: patch review
messages: +
versions: + Python 3.3, - Python 2.7, Python 3.2
2011-10-24 17:16:04
neologix
set
messages: +
2011-10-23 23:05:01
pitrou
set
nosy: + neologix, pitrou
messages: +
2011-10-18 19:16:57
bgilbert
create