[Python-Dev] very bad network performance

Gregory P. Smith greg at krypto.org
Mon Apr 21 20:10:24 CEST 2008


On Mon, Apr 14, 2008 at 4:41 PM, Curt Hagenlocher <curt at hagenlocher.org>
wrote:

> On Mon, Apr 14, 2008 at 4:19 PM, Guido van Rossum <guido at python.org>
> wrote:
> >
> > But why was imaplib apparently specifying 10MB? Did it know there was
> > that much data? Or did it just not want to bother looping over all the
> > data in smaller buffer increments (e.g. 64K, which is probably the max
> > of what most TCP stacks will give you)?
>
> I'm going to guess that the code in question is
>
>                size = int(self.mo.group('size'))
>                if __debug__:
>                    if self.debug >= 4:
>                        self._mesg('read literal size %s' % size)
>                data = self.read(size)
>
> It's reading however many bytes are reported by the server as the size.
>
> > If I'm right with my hunch that the TCP stack will probably clamp at
> > 64K, perhaps we should use min(system limit, max(requested size,
> > buffer size))?
>
> I have indeed missed the point of the read buffer size.  This would work.
>

The 64K hunch is wrong.  The system limit can be found using
getsockopt(...SO_RCVBUF...).  It can easily be (and often is) set to many
megabytes either at a system default level or on a per socket level by the
user using setsockopt.  When the system default is that large, limiting by
the system limit would not help the 10MB read case.
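
To make the SO_RCVBUF point concrete, here is a minimal sketch of querying and
changing the per-socket receive buffer size.  The helper name
recv_buffer_size and the 4MB request are only illustrative; the value the
kernel actually grants is platform dependent (it may be rounded, doubled or
capped), so no particular number should be expected.

```python
import socket

def recv_buffer_size(sock):
    """Return the receive buffer size the kernel reports for sock, in bytes."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # System default for a freshly created TCP socket.
    default_size = recv_buffer_size(sock)
    # A user can request a much larger buffer on a per socket level.
    # The kernel is free to adjust or cap the requested value.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4 * 1024 * 1024)
    adjusted_size = recv_buffer_size(sock)
finally:
    sock.close()

print("default SO_RCVBUF: %d bytes" % default_size)
print("after setsockopt:  %d bytes" % adjusted_size)
```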

Even smaller allocations like 64K cause problems as mentioned in issue
1092502 linking to this twisted bug
<http://twistedmatrix.com/trac/ticket/1079>.  twisted's solution was to make
the string object returned by a recv as short lived as possible by copying
it into a StringIO.  We could do the same in _fileobject.read() and
readline().

I have attached a patch to issue 2632 that changes socket to use StringIO
for its read buffer and keeps the lifetime of strings returned by recv() as
short as possible when appropriate.  It also refuses to call recv() with a
size smaller than default_bufsize within read() [the source of the
performance problem].  That changes internal recv() call behavior over the
existing code after the issue 1092502 "fix" was applied to use min() rather
than max(), but it is -not- a significant change over the pre-1092502 "fix"
behavior that exists in all released versions of python (it already chose
the larger of two values for recv sizes).

The main difference behind the scenes?  StringIO is using realloc only to
increase its size while recv() was using realloc to shrink the allocation
size and many of these recv()ed shrunken strings were being held onto in a
list before the final value was constructed.
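
A rough sketch of the buffering idea described above, not the actual patch
attached to issue 2632: it is written for Python 3 with io.BytesIO standing
in for StringIO, and read_exact, DEFAULT_BUFSIZE and the recv callable are
hypothetical names chosen for illustration.  Each recv()ed chunk is copied
into one growing buffer and dropped right away, and recv() is never asked
for less than the default buffer size.

```python
import io

DEFAULT_BUFSIZE = 8192  # assumed value standing in for default_bufsize

def read_exact(recv, size, leftover=b""):
    """Read up to size bytes through recv(n) -> bytes.

    Returns (data, leftover), where leftover holds any bytes received
    beyond size so a later read can consume them first.
    """
    buf = io.BytesIO()
    buf.write(leftover)
    while buf.tell() < size:
        remaining = size - buf.tell()
        # Never call recv() with a size smaller than the default buffer size.
        chunk = recv(max(remaining, DEFAULT_BUFSIZE))
        if not chunk:
            break  # connection closed before size bytes arrived
        buf.write(chunk)
        del chunk  # keep the recv()ed object as short lived as possible
    data = buf.getvalue()
    return data[:size], data[size:]
```

With a real socket this would be called as read_exact(sock.recv, size), and
the returned leftover would be passed back in on the next read.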

I suggest continuing the discussion within issue 2632 to keep better track
of it.

My socket-strio patch in 2632 needs more testing (it passed the socket, http*
and url* tests) and verification that both issues' problems are indeed gone,
but they should be.

-gps