[Python-Dev] very bad network performance

Gregory P. Smith greg at krypto.org
Mon Apr 21 20:10:24 CEST 2008


On Mon, Apr 14, 2008 at 4:41 PM, Curt Hagenlocher <curt at hagenlocher.org> wrote:

On Mon, Apr 14, 2008 at 4:19 PM, Guido van Rossum <guido at python.org> wrote:
>
> But why was imaplib apparently specifying 10MB? Did it know there was
> that much data? Or did it just not want to bother looping over all the
> data in smaller buffer increments (e.g. 64K, which is probably the max
> of what most TCP stacks will give you)?

I'm going to guess that the code in question is

    size = int(self.mo.group('size'))
    if __debug__:
        if self.debug >= 4:
            self.mesg('read literal size %s' % size)
    data = self.read(size)

It's reading however many bytes are reported by the server as the size.

> If I'm right with my hunch that the TCP stack will probably clamp at
> 64K, perhaps we should use min(system limit, max(requested size,
> buffer size))?

I have indeed missed the point of the read buffer size. This would work.

The 64K hunch is wrong. The system limit can be found using getsockopt(...SO_RCVBUF...). It can easily be (and often is) set to many megabytes, either as a system-wide default or on a per-socket basis by the user via setsockopt. When the system default is that large, limiting by the system limit would not help the 10MB read case.
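For illustration only (this is not taken from the patch), the per-socket limit can be inspected and changed along these lines in Python. The helper name receive_buffer_size and the 256 KiB request are hypothetical choices; the value the OS actually reports after setsockopt may be rounded, doubled, or capped depending on the platform.

```python
import socket

def receive_buffer_size(sock):
    """Return the receive buffer size the OS reports for this socket, in bytes."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # Whatever the system default happens to be on this machine.
    default_size = receive_buffer_size(sock)
    # Request a larger buffer (256 KiB is an arbitrary example value).
    # The OS is free to adjust the request, so read the value back.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 256 * 1024)
    adjusted_size = receive_buffer_size(sock)
finally:
    sock.close()

print("default:", default_size, "after setsockopt:", adjusted_size)
```

Because the reported value depends on system configuration, it is not a reliable upper bound to clamp read sizes against.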

Even smaller allocations like 64K cause problems, as mentioned in issue 1092502, which links to this Twisted bug: http://twistedmatrix.com/trac/ticket/1079. Twisted's solution was to make the string object returned by a recv as short-lived as possible by copying it into a StringIO. We could do the same in _fileobject.read() and readline().

I have attached a patch to issue 2632 that changes socket to use StringIO for its read buffer and keeps the lifetime of strings returned by recv() as short as possible when appropriate. It also refuses to call recv() with a size smaller than default_bufsize within read() [the source of the performance problem]. That changes internal recv() call behavior relative to the existing code, where the issue 1092502 "fix" switched to min() rather than max(), but it is -not- a significant change from the pre-1092502 "fix" behavior that exists in all released versions of Python (it already chose the larger of two values for recv sizes).
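The idea can be sketched roughly as follows. This is not the patch from issue 2632, only a minimal illustration under stated assumptions: io.BytesIO stands in for StringIO, DEFAULT_BUFSIZE is an assumed value standing in for default_bufsize, and read_with_buffer is a hypothetical helper name.

```python
import io
import socket

DEFAULT_BUFSIZE = 8192  # assumed stand-in for the socket module's default_bufsize

def read_with_buffer(sock, size, bufsize=DEFAULT_BUFSIZE):
    """Read up to `size` bytes from `sock`, accumulating into a BytesIO.

    Returns (data, leftover): `leftover` holds any bytes received beyond
    `size`, which a real file-like wrapper would keep for the next read.
    """
    buf = io.BytesIO()
    while buf.tell() < size:
        remaining = size - buf.tell()
        # Never call recv() with less than bufsize: take the larger value.
        chunk = sock.recv(max(remaining, bufsize))
        if not chunk:
            break  # peer closed the connection
        # Copy into the growing buffer and drop the recv() result at once,
        # so no list of shrunken strings is kept alive.
        buf.write(chunk)
        del chunk
    data = buf.getvalue()
    return data[:size], data[size:]

if __name__ == "__main__":
    # Small local demonstration over a connected socket pair.
    writer, reader = socket.socketpair()
    writer.sendall(b"x" * 20000 + b"tail")
    writer.close()
    payload, extra = read_with_buffer(reader, 20000)
    reader.close()
    print(len(payload), len(extra))
```

Since recv() may be asked for more than is still needed, it can return bytes past the requested size; the sketch hands those back as leftover rather than discarding them.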

The main difference behind the scenes? StringIO is using realloc only to increase its size while recv() was using realloc to shrink the allocation size and many of these recv()ed shrunken strings were being held onto in a list before the final value was constructed.

I suggest continuing the discussion within issue 2632 to keep better track of it.

My socket-strio patch in 2632 needs more testing (it passed the socket, http* and url* tests) and verification that both issues' problems are indeed gone, but they should be.

-gps


