[Python-Dev] very bad network performance

Guido van Rossum guido at python.org
Tue Apr 15 01:19:44 CEST 2008


On Mon, Apr 14, 2008 at 3:57 PM, Ralf Schmitt <schmir at gmail.com> wrote:
>
>
>
> On Tue, Apr 15, 2008 at 12:19 AM, Curt Hagenlocher <curt at hagenlocher.org>
> wrote:
> >
> > On Mon, Apr 14, 2008 at 2:29 PM, Ralf Schmitt <schmir at gmail.com> wrote:
> > >
> > > Sorry to reply on the mailing list. But this change is wrong.
> > > e.g. if you're using a buffer size of 16 bytes and try to read 256
> > > bytes, it should call recv with a value of 256 and not call recv 16
> > > times with a value of 16.
> > > However, there should be an upper limit (as shown by the imap bug).
> >
> > There is an upper limit.  It's called "the buffer size".  If someone
> > specifies a buffer size of 16 bytes, it means "read 16 bytes at a
> > time".  I don't know why someone would want such a small buffer size,
> > but presumably they have their reasons.
> >
>
> No, I don't agree. To me buffer size means buffer up to buffer_size bytes
> in memory.
> It does not mean that it should read only buffer_size bytes at once when
> asked to read more bytes than buffer size.
>
> The upper limit I was talking about is the buffer size limit of the
> operating system, i.e. the operating system will at a maximum return N bytes
> from a recv call. It doesn't make sense to ask for more than that, and the
> original problem with imaplib asking for 10MB of data and then realloc'ing
> that buffer would be gone.
>
>
> >
> > The only reason "min" is a problem is that there's standard library
> > code passing a zero to socket.makefile, which gets turned into a
> > bufsize of 1 by the constructor.  I actually agree with Bill Janssen
> > -- __init__ is where the real problem lies.  But I think the change to
> > read() is safer.
> >
>
> Again, no: if I pass in 4 as the buffer size, I don't expect the system to
> make 1024 calls to recv when I want to read 4096 bytes.

But why was imaplib apparently specifying 10MB? Did it know there was
that much data? Or did it just not want to bother looping over all the
data in smaller buffer increments (e.g. 64K, which is probably the max
of what most TCP stacks will give you)?

If I'm right with my hunch that the TCP stack will probably clamp at
64K, perhaps we should use min(system limit, max(requested size,
buffer size))?
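
A rough sketch of that rule, to make the arithmetic concrete. The names
recv_size, count_recv_calls and SYSTEM_LIMIT are made up for illustration,
and the 64K value is only my hunch from above, not something the socket
module exposes. The call counter also assumes each recv returns exactly
what was asked for, which a real socket does not guarantee.

```python
# Assumed value: the 64K clamp guessed at above, not a real socket constant.
SYSTEM_LIMIT = 64 * 1024


def recv_size(requested, bufsize, system_limit=SYSTEM_LIMIT):
    """Size to pass to one recv() call: min(system limit, max(requested, bufsize))."""
    return min(system_limit, max(requested, bufsize))


def count_recv_calls(total, bufsize, system_limit=SYSTEM_LIMIT):
    """Count recv() calls needed to read `total` bytes.

    Idealised: assumes every call returns exactly the number of bytes
    requested, so this is a lower bound on the calls a real socket needs.
    """
    calls = 0
    remaining = total
    while remaining > 0:
        remaining -= recv_size(remaining, bufsize, system_limit)
        calls += 1
    return calls


if __name__ == "__main__":
    # Small buffer, larger read: ask for everything in one call.
    print(recv_size(256, 16))
    # Huge read (the imaplib case): clamp each call to the system limit.
    print(recv_size(10 * 1024 * 1024, 8192))
    # Reading 4096 bytes with a buffer size of 4 no longer loops 1024 times.
    print(count_recv_calls(4096, 4))
```

Under these assumptions a small buffer size no longer forces many tiny
recv calls, and a 10MB request is never passed to recv in one piece.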

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

