urllib slow on FreeBSD 4.7? sockets too

Andrew MacIntyre andymac at bullseye.apana.org.au
Sat Nov 23 23:15:12 EST 2002


On Sat, 23 Nov 2002, Mike Brown wrote:

> "Jarkko Torppa" <torppa at staff.megabaud.fi> wrote:
> > Seems that stdio is somehow confused, try this
> >
> > import urllib, time, os
> >
> > starttime = time.time()
> > u = urllib.urlopen('http://localhost/4m')
> > fn = u.fp.fileno()
> > bytea = [ ]
> > while 1:
> >     bytes = os.read(fn, 16 * 1024)
> >     if bytes == '':
> >         break
> >     bytea.append(bytes)
> > bytes = ''.join(bytea)
> > u.close()
>
> [...]
>
> Well, look at that...
>
> bytes: 4241.5K; time: 0.322s (13171 KB/s)
>
> That's much better. At least, it now seems to be hitting the socket speed
> cap.

I'm glad that I said (in my previous post) that realloc() _may_ be the
cause of what you're seeing, because it's not.

I haven't gotten right to the bottom of the matter; however, the following
patch against the 2.2.2 sources makes an enormous difference on my system:

---8<---8<---8<---8<---
*** Lib/httplib.py.orig Mon Oct  7 11:18:17 2002
--- Lib/httplib.py      Sun Nov 24 14:44:16 2002
***************
*** 210,216 ****
      # See RFC 2616 sec 19.6 and RFC 1945 sec 6 for details.

      def __init__(self, sock, debuglevel=0, strict=0):
!         self.fp = sock.makefile('rb', 0)
          self.debuglevel = debuglevel
          self.strict = strict

--- 210,216 ----
      # See RFC 2616 sec 19.6 and RFC 1945 sec 6 for details.

      def __init__(self, sock, debuglevel=0, strict=0):
!         self.fp = sock.makefile('rb', -1)
          self.debuglevel = debuglevel
          self.strict = strict

---8<---8<---8<---8<---

With the 2.2.2 release source, I get about 113kB/s retrieving a 4MB file
from a localhost URL.  With the patch applied, I get 4-5.5MB/s.

This is on a FreeBSD 4.4 SMP system (dual Celeron 300A, 128MB RAM) with
ATA66 drives.

The change (passing -1 instead of 0 as the bufsize argument to makefile())
turns the socket's file object from unbuffered to buffered with a default
buffer size (which I believe is 1024 bytes).
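
As a rough illustration of why an unbuffered file object can be so much
slower, the sketch below times line reads through sock.makefile('rb', 0)
and sock.makefile('rb', -1) over a local socket pair.  It assumes a modern
Python 3, where makefile() no longer sits on top of C stdio, so it does not
reproduce the 2.2.2 code path on FreeBSD; it only shows the general cost of
unbuffered reads.  The helper names (send_all, read_lines) and the payload
sizes are made up for the example.

```python
import socket
import threading
import time

PAYLOAD = b"x" * 63 + b"\n"   # one 64-byte line (arbitrary size)
LINES = 2000                  # arbitrary line count for the demo


def send_all(sock):
    # Writer side: push all the data, then close so the reader sees EOF.
    with sock:
        sock.sendall(PAYLOAD * LINES)


def read_lines(bufsize):
    """Read every line via sock.makefile('rb', bufsize).

    Returns (total_bytes_read, elapsed_seconds).
    """
    reader, writer = socket.socketpair()
    sender = threading.Thread(target=send_all, args=(writer,))
    sender.start()
    start = time.perf_counter()
    total = 0
    with reader, reader.makefile('rb', bufsize) as fp:
        for line in fp:
            total += len(line)
    elapsed = time.perf_counter() - start
    sender.join()
    return total, elapsed


unbuffered = read_lines(0)    # bufsize 0: unbuffered, as in the original code
buffered = read_lines(-1)     # bufsize -1: default buffering, as in the patch

print("unbuffered: %d bytes in %.4fs" % unbuffered)
print("buffered:   %d bytes in %.4fs" % buffered)
```

Both runs read the same number of bytes; on most systems the unbuffered run
should be noticeably slower because each line is assembled from many small
reads, but the exact ratio will vary from machine to machine.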

I don't know what the implications of this change are in other
circumstances, so I can't recommend this as a permanent patch.  There
appears to be no easy way to set this buffering option from the urllib or
even httplib APIs.

At the moment I don't have the FreeBSD library source readily accessible
to investigate the stdio (specifically fread()) implementation in the
unbuffered case.

--
Andrew I MacIntyre                     "These thoughts are mine alone..."
E-mail: andymac at bullseye.apana.org.au  | Snail: PO Box 370
        andymac at pcug.org.au            |        Belconnen  ACT  2616
Web:    http://www.andymac.org/        |        Australia




