Issue 4336: Fix performance issues in xmlrpclib (original) (raw)

Created on 2008-11-17 14:33 by kristjan.jonsson, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
xmlrpc.patch kristjan.jonsson,2008-11-17 14:33
xmlrpc.patch kristjan.jonsson,2008-11-25 11:13 A new version of the patch, taking into account various concerns raised.
xmlrpc.patch kristjan.jonsson,2008-11-27 16:06
xmlrpc.patch kristjan.jonsson,2008-11-28 10:53
xmlrpc_1.patch kristjan.jonsson,2008-12-01 10:12 a patch that focuses on the Nagle problem only
httplib.patch kristjan.jonsson,2008-12-08 15:33 a suggested improvement to the endheaders(data) feature
htmllibclients.patch kristjan.jonsson,2008-12-08 15:34 use endheaders(body) in stead of separate conn.send(body) in users of the httplib.py
Messages (19)
msg75962 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-11-17 14:33
There are two performance problems in xmlrpclib.py: POST requests use two send() calls, one to send the headers and one to send the data. This can invoke the Nagle/Delayed ACK performance problem. On many socket implementations (including some windows platforms) it can introduce delays of up to 200ms when talking over the wire (as opposed to localhost) The second is the use of no buffering when reading the xmlrpc response. It is done using the readline() call on a file-like object constructed from the socket. When no buffering is in effect, this causes a separate recv() call for each character. This patch fixes these issues separately: 1) We provide a new getheaderdata() function in httplib which just returns the data for the headers instead of sending it. This can be used instead of endheaders() if the intention is to subsequently send more data. xmlrpclib then uses this method of framing its POST messages. 2) We provide the named artgument bufsize=0 to the HTTPResponse class in httplib, and also so on for the HTTPConnection.getresponse() and HTTP.getreply(). xmlrpclib then passes bufsize=-1 to HTTP.getreply() to get default buffering for the response fileobject. This patch focuses on fixing the problems with xmlrpclib. but issue 1) above also has a number of other manifestations in the lib, there are other places where we can use getheaderdata() and send() instead of endheaders() to avoid possible Nagle/Ack problems, e.g. in urllib.py, distutils.py and loggin/handlers.py
msg76174 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-11-21 10:52
Just a thought here: Maybe it would be better just to change socket._fileobject to always use a minimum of 8k readbuffer for readline() just as read() already does. The read() documentation states that recv(1) is very inefficient, and so it is. The intent, when calling makefile(bufsize=0) is probably to make sure that buffering for write() is disabled. Any thoughts?
msg76343 - (view) Author: Jeremy Hylton (jhylton) (Python triager) Date: 2008-11-24 17:32
This patch makes sense in principle, but some of the details need to change. The _send_output() method is used by some clients, merely because it can be used :-(. It's fairly easy to preserve this API for backwards compatibility. I am also worried about this new api call getheaderdata(). It complicates the API. Even if it were necessary, it needs a better name, because getheaderdata() doesn't sound like a method that changes the connection state or consumes buffered header data. I think it would be better to have the new behavior exposed only through HTTPConnection and not HTTP, since that's a Python 1.5.2 compatibility API(!). We can make some small changes to xmlrpclib to use the newer API. It would probably be a good change merely because py3k now uses the newer API. I've got a working local change, but it's still a little ugly.
msg76346 - (view) Author: Jeremy Hylton (jhylton) (Python triager) Date: 2008-11-24 17:47
Just wanted to mention that the best solution is to update as much code as possible to use HTTPConnection instead of HTTP. I'm not sure how easy it is to do for xmlrpclib, since it exposes methods like send_content(). I guess we can preserve those APIs somehow, but it won't be pretty.
msg76349 - (view) Author: Jeremy Hylton (jhylton) (Python triager) Date: 2008-11-24 17:59
Actually, the patch doesn't work :-). The httplib API allows you to pass a file or a string for the body of the request. The getheaderdata() + body case only works for the string. Easy to fix, but I worry that he still run into Nagle problems with the file case, which was presumably added for efficiency. Jeremy On Mon, Nov 24, 2008 at 12:32 PM, Jeremy Hylton <report@bugs.python.org> wrote: > > Jeremy Hylton <jeremy@alum.mit.edu> added the comment: > > This patch makes sense in principle, but some of the details need to > change. The _send_output() method is used by some clients, merely > because it can be used :-(. It's fairly easy to preserve this API for > backwards compatibility. > > I am also worried about this new api call getheaderdata(). It > complicates the API. Even if it were necessary, it needs a better name, > because getheaderdata() doesn't sound like a method that changes the > connection state or consumes buffered header data. > > I think it would be better to have the new behavior exposed only through > HTTPConnection and not HTTP, since that's a Python 1.5.2 compatibility > API(!). We can make some small changes to xmlrpclib to use the newer > API. It would probably be a good change merely because py3k now uses > the newer API. > > I've got a working local change, but it's still a little ugly. > > ---------- > assignee: -> jhylton > nosy: +jhylton > status: open -> pending > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue4336> > _______________________________________ > _______________________________________________ > Python-bugs-list mailing list > Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/jeremy%40alum.mit.edu > >
msg76406 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-11-25 11:13
I have addressed some issues mentioned: 1) I have retained the _send_output() method. 2) the endheaders() method now takes an optional argument, send_data that defaults to True. It also returns any unsent data as a string. This simplifies any code wishing to send more data. 3) The old HTTP class needs no changes anymore. 4) I've fixed the test_xmlrpc.py test case to run all the tests on windows. The old concern with "WSAEWOULDBLOCK" seems to be no longer valid. 5) concatenating the result from endheaders() with the request body is valid, in xmlrpclib, because the assumption has already been made in send_content() that request_body is a string (str(len(request_body)). However, in httplib's request() method, we now special case this. I don't want to complicate the code for what appears to be a rare case. I chose not to mess with xmlrpclib and make it continue to use the old HTTP class in order to minimise the extent of this patch. A simple and clear performance patch has in my experience a much greater chance of finding its way into the codebase than a more extensive rewrite :) You may have concerns regarding point 3 above, and I am open to suggestions. Now, what remains is the question of the read buffering for the socket fileobject. Do you think that it is feasible to simply change the default read buffering for fileobjects to use buffering for readline() same as they do for read()? Then we wouldn't need the second half of this patch and we could make a separate "socket" performance patch.
msg76407 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-11-25 11:15
Sorry, I meant : "you may have concerns regarding point 2) above"
msg76489 - (view) Author: Jeremy Hylton (jhylton) (Python triager) Date: 2008-11-26 21:53
I think we're making progress, but I'm still not sure about the new httplib api. My current worry is that endheaders() behaves very differently when send_data is false. My chief concern is that the __state variable is going to indicate that the request has been sent when we're really depending on the client to send the header. In general, I don't like the public send() method since it's exposing a raw socket that could be called at any time regardless of the _state of the object. What if endheaders() takes body as an optional argument and sends it along with the headers? This accomplishes the same basic goal as returning the header from endheaders(), but keeps the actual sending of data encapsulated within the http connection. Jeremy On Tue, Nov 25, 2008 at 6:13 AM, Kristján Valur Jónsson <report@bugs.python.org> wrote: > > Kristján Valur Jónsson <kristjan@ccpgames.com> added the comment: > > I have addressed some issues mentioned: > 1) I have retained the _send_output() method. > 2) the endheaders() method now takes an optional argument, send_data > that defaults to True. It also returns any unsent data as a string. > This simplifies any code wishing to send more data. > 3) The old HTTP class needs no changes anymore. > 4) I've fixed the test_xmlrpc.py test case to run all the tests on > windows. The old concern with "WSAEWOULDBLOCK" seems to be no longer > valid. > 5) concatenating the result from endheaders() with the request body is > valid, in xmlrpclib, because the assumption has already been made in > send_content() that request_body is a string (str(len(request_body)). > However, in httplib's request() method, we now special case this. I > don't want to complicate the code for what appears to be a rare case. > > I chose not to mess with xmlrpclib and make it continue to use the old > HTTP class in order to minimise the extent of this patch. A simple and > clear performance patch has in my experience a much greater chance of > finding its way into the codebase than a more extensive rewrite :) > > You may have concerns regarding point 3 above, and I am open to > suggestions. > > Now, what remains is the question of the read buffering for the socket > fileobject. Do you think that it is feasible to simply change the > default read buffering for fileobjects to use buffering for readline() > same as they do for read()? Then we wouldn't need the second half of > this patch and we could make a separate "socket" performance patch. > > Added file: http://bugs.python.org/file12127/xmlrpc.patch > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue4336> > _______________________________________ > _______________________________________________ > Python-bugs-list mailing list > Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/jeremy%40alum.mit.edu > >
msg76502 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-11-27 16:06
I like the suggestion of having endheaders() accept any extra data. I have uploaded a new patch which implements this idea. It simplifies things a lot. The issue with the read buffer size is still open. I have sent an email to python-dev requesting comments.
msg76523 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-11-28 10:53
Guido pointed out the problem with _fileobject.readline() being followed by socket.recv() if readline uses read buffering. xmlrpclib.py was attempting to directly use the underlying socket, although in actual fact it never did, (since HTTPConnectio had closed it when it returned the response from getresponse()) Never the less, it is prudent to make sure that we don't attempt this. There really should be no need to use the socket directly, a buffered read() call is just as efficient.
msg76578 - (view) Author: Jeremy Hylton (jhylton) (Python triager) Date: 2008-11-29 01:52
I did the simple part of the patch, where the request and headers are sent at the same time. The applied patch didn't pass the test suite, and I want to think about the buffering change a bit more. It's definitely tricky. Jeremy On Fri, Nov 28, 2008 at 5:53 AM, Kristján Valur Jónsson <report@bugs.python.org> wrote: > > Kristján Valur Jónsson <kristjan@ccpgames.com> added the comment: > > Guido pointed out the problem with _fileobject.readline() being > followed by socket.recv() if readline uses read buffering. > xmlrpclib.py was attempting to directly use the underlying socket, > although in actual fact it never did, (since HTTPConnectio had closed > it when it returned the response from getresponse()) Never the less, it > is prudent to make sure that we don't attempt this. > There really should be no need to use the socket directly, a buffered > read() call is just as efficient. > > Added file: http://bugs.python.org/file12145/xmlrpc.patch > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue4336> > _______________________________________ > _______________________________________________ > Python-bugs-list mailing list > Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/jeremy%40alum.mit.edu > >
msg76682 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-12-01 10:12
Really? Works for me. Here is a simple patch where we leave the read buffering. passes test_xmlrpclib.py and test_httplib.py
msg76697 - (view) Author: Jeremy Hylton (jhylton) (Python triager) Date: 2008-12-01 15:26
I submitted r67442, which combines the headers and body in a single send() call. We should look at the buffering issue now, although I probably won't have any time to check on it until Friday.
msg77024 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-12-05 15:06
I think it would have been better to have endheaders() (and then perhaps _send_output()) deal with the non-string (i.e. filebuffer) case, so that endheaders(body) is semantically equivalent to endheaders(); send(body). The version you checked in makes it necessary that the user of HTTPConnection.endheaders() is aware of the distinction.
msg77309 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-12-08 15:33
Please consider applying this patch. It moves the logic handling of non-string message_body inside of endheaders(), allowind clients of HTTPConnection to substitute endheader();send(data) with endheaders(data) without fear.
msg77310 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-12-08 15:34
In light of my previous patch, here is a an improvement to urllib.py and logging that uses endheaders(body)
msg79407 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-08 10:49
Reopening this, since I am not entirely happy with the implementation that was checked in. Please see my comments from Dec 5th onwards.
msg79502 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-09 20:27
Checked in revision: 68458 and revision: 68459
msg80153 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-19 10:29
note, this has been ported to Py3k in http://svn.python.org/view? view=rev&rev=68458
History
Date User Action Args
2022-04-11 14:56:41 admin set github: 48586
2009-01-19 10:29:43 kristjan.jonsson set keywords:patch, patch, easymessages: +
2009-01-09 20:27:40 kristjan.jonsson set status: open -> closedresolution: acceptedmessages: + keywords:patch, patch, easy
2009-01-08 10:49:01 kristjan.jonsson set status: pending -> openkeywords:patch, patch, easymessages: +
2008-12-08 15:34:59 kristjan.jonsson set keywords:patch, patch, easyfiles: + htmllibclients.patchmessages: +
2008-12-08 15:33:56 kristjan.jonsson set keywords:patch, patch, easyfiles: + httplib.patchmessages: +
2008-12-05 15:06:39 kristjan.jonsson set keywords:patch, patch, easymessages: +
2008-12-01 15:26:20 jhylton set keywords:patch, patch, easymessages: +
2008-12-01 10:32:02 hodgestar set nosy: + hodgestar
2008-12-01 10:13:03 kristjan.jonsson set keywords:patch, patch, easyfiles: + xmlrpc_1.patchmessages: +
2008-11-29 01:52:25 jhylton set messages: +
2008-11-28 10:53:26 kristjan.jonsson set keywords:patch, patch, easyfiles: + xmlrpc.patchmessages: +
2008-11-27 16:06:56 kristjan.jonsson set keywords:patch, patch, easyfiles: + xmlrpc.patchmessages: +
2008-11-26 21:53:58 jhylton set messages: +
2008-11-25 11:15:33 kristjan.jonsson set keywords:patch, patch, easymessages: +
2008-11-25 11:13:39 kristjan.jonsson set keywords:patch, patch, easyfiles: + xmlrpc.patchmessages: +
2008-11-24 18:02:56 vstinner set keywords:patch, patch, easynosy: + vstinner
2008-11-24 17:59:11 jhylton set messages: +
2008-11-24 17:47:28 jhylton set keywords:patch, patch, easymessages: +
2008-11-24 17:32:14 jhylton set status: open -> pendingassignee: jhyltonmessages: + keywords:patch, patch, easynosy: + jhylton
2008-11-21 10:52:53 kristjan.jonsson set keywords:patch, patch, easymessages: +
2008-11-17 14:33:40 kristjan.jonsson create