urllib slow on FreeBSD 4.7? sockets too

Mike Brown mike at skew.org
Sat Nov 23 05:49:03 EST 2002


> I'm not sure it's a good experiment to eliminate too many things at once.
> I.e., how do you know how much you gained by going to os.read and how much
> you gained by buffering via bytea.append instead of having urllib do it
> internally?

If I use a string or buffer object with "+=" instead of an array,
performance drops significantly:

import urllib, time, os
starttime = time.time()
u = urllib.urlopen('http://localhost/4MBfile')
fn = u.fp.fileno()
bytes = 1
allbytes = ''
while bytes:
    bytes = os.read(fn, 16 * 1024)
    allbytes += bytes
u.close()
endtime = time.time()
elapsed = endtime - starttime
length = len(allbytes)
print "bytes: %.1fK; time: %0.3fs (%0d KB/s)" % (length / 1024.0, elapsed, length / 1024.0 / elapsed)

bytes: 4241.5K; time: 5.809s (730 KB/s)
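The slowdown comes from repeated string concatenation: each "+=" can copy the whole accumulated buffer, so total work grows roughly quadratically with the number of chunks. A minimal stand-alone micro-benchmark (synthetic in-memory chunks, not the original HTTP fetch; written for modern Python, so byte strings) sketches the difference:

```python
import time

# Hypothetical micro-benchmark, not the original test: build ~4 MB
# from 16 KB chunks, first with "+=", then with a list joined at the end.
chunk = b'x' * (16 * 1024)
n_chunks = 256  # 256 * 16 KB = 4 MB

start = time.time()
by_concat = b''
for _ in range(n_chunks):
    by_concat += chunk          # may copy the whole buffer each pass
concat_time = time.time() - start

start = time.time()
parts = []
for _ in range(n_chunks):
    parts.append(chunk)         # O(1) amortized per chunk
by_join = b''.join(parts)       # one final copy over all chunks
join_time = time.time() - start

print("+= : %.3fs, append/join: %.3fs" % (concat_time, join_time))
```

The absolute numbers will vary by machine and interpreter, but the append/join pass only copies each byte twice in total, while the concatenation pass can recopy the growing buffer on every iteration.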


If I use cStringIO, it's much better, but still about 2 to 5 MB/s slower
than Jarkko's byte array:

import urllib, time, os, cStringIO
starttime = time.time()
u = urllib.urlopen('http://localhost/4MBfile')
fn = u.fp.fileno()
bytes = 1
allbytesf = cStringIO.StringIO()
while bytes:
    bytes = os.read(fn, 16 * 1024)
    allbytesf.write(bytes)
u.close()
allbytes = allbytesf.getvalue()
allbytesf.close()
endtime = time.time()
elapsed = endtime - starttime
length = len(allbytes)
print "bytes: %.1fK; time: %0.3fs (%0d KB/s)" % (length / 1024.0, elapsed, length / 1024.0 / elapsed)

bytes: 4241.5K; time: 0.419s (10127 KB/s)
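cStringIO avoids the quadratic copying because writes append into an internal buffer and only getvalue() makes one final string. For what it's worth, cStringIO is Python 2 only; the equivalent pattern in Python 3 uses io.BytesIO. A minimal sketch with synthetic chunks (not a socket read):

```python
import io

# Same accumulate-then-getvalue pattern with io.BytesIO, the Python 3
# successor to cStringIO; four synthetic 16 KB chunks stand in for reads.
buf = io.BytesIO()
for _ in range(4):
    buf.write(b'x' * (16 * 1024))   # each write appends to the buffer
data = buf.getvalue()               # one copy out at the end
buf.close()
print(len(data))                    # prints 65536
```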


So the list append() approach with ''.join() afterward (which is
surprisingly fast) seems to be the clear winner. On to the sockets...
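The winning pattern above can be sketched without the network at all, using an in-memory stand-in for the HTTP response (the 4 MB size and 16 KB read size match the tests above; the source object is hypothetical):

```python
import io

# Stand-in for the 4 MB file served over HTTP in the original test.
source = io.BytesIO(b'x' * (4 * 1024 * 1024))

# Append each fixed-size read to a list, then join exactly once.
chunks = []
while True:
    chunk = source.read(16 * 1024)   # same 16 KB read size as above
    if not chunk:                    # empty read signals EOF
        break
    chunks.append(chunk)             # O(1) amortized per chunk
allbytes = b''.join(chunks)          # single final copy
```

Against a real response, source.read() would be replaced by os.read() on the socket's file descriptor, as in the snippets above.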




