How to make this unpickling/sorting demo faster?

Steve Bergman sbergman27 at gmail.com
Thu Apr 17 13:15:34 EDT 2008


I'm involved in a discussion thread in which it has been stated that:

"""
Anything written in a language that is > 20x slower (Perl, Python,
PHP) than C/C++ should be instantly rejected by users on those grounds
alone.
"""

I've challenged someone to beat the snippet of code below in C, C++,
or assembler, for reading in one million pairs of random floats and
sorting them by the second member of the pair.  I'm not a master
Python programmer.  Is there anything I could do to make this even
faster than it is?

Also, if I try to write the resulting list of tuples back out to a
gdbm file, it takes a good 14 seconds, which is far longer than the
reading and sorting takes.  The problem seems to be that the 'f' flag
to gdbm.open() is being ignored and writes are being sync'd to disk
either on each write, or on close.  I'd really prefer to let the OS
decide when to actually write to disk.

I'm using Python 2.5.2, libgdbm 1.8.3, and python-gdbm 2.5.2 under
Ubuntu 8.04 beta on an x86_64 architecture.

Thanks for any tips.

=====
import cPickle, gdbm, operator
dbmIn = gdbm.open('float_pairs_in.pickel')
print "Reading pairs..."
pairs = cPickle.loads(dbmIn['pairs'])
print "Sorting pairs..."
pairs.sort(key=operator.itemgetter(1))
print "Done!"
=====
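
To see where the time actually goes, the unpickling and sorting steps can
be timed separately. The following is only a minimal sketch, not the
original benchmark: it assumes Python 3 (pickle instead of cPickle), keeps
the pickled data in memory instead of reading it from gdbm, and the
function name time_phases and the smaller default size are made up for
illustration.

```python
import operator
import pickle
import random
import time


def time_phases(n=100000, seed=0):
    """Time unpickling and sorting separately for n random float pairs.

    Hypothetical helper: returns (sorted_pairs, load_seconds, sort_seconds).
    """
    rng = random.Random(seed)
    pairs = [(rng.random(), rng.random()) for _ in range(n)]
    # Protocol 2 matches the protocol used when the input file was created.
    blob = pickle.dumps(pairs, 2)

    t0 = time.perf_counter()
    loaded = pickle.loads(blob)
    t1 = time.perf_counter()
    # Sort in place by the second member of each pair, as in the post.
    loaded.sort(key=operator.itemgetter(1))
    t2 = time.perf_counter()
    return loaded, t1 - t0, t2 - t1


if __name__ == "__main__":
    _, load_s, sort_s = time_phases()
    print("unpickle: %.3fs  sort: %.3fs" % (load_s, sort_s))
```

Knowing which of the two numbers dominates would show whether it is the
deserialization or the sort that is worth tuning.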


The input file was created with this:

=====
import random, gdbm, cPickle
print "Creating pairs file..."
pairs = [(random.random(), random.random()) for pair in range(0, 1000000)]
dbmOut = gdbm.open('float_pairs_in.pickel', 'nf')
dbmOut['pairs'] = cPickle.dumps(pairs, 2)
dbmOut.close()
print "Done!"
=====
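
For the slow write-back mentioned above, it may help to time the pickling
and the gdbm store separately, so that any sync-to-disk cost shows up on
its own. This is a rough sketch under several assumptions: it targets
Python 3, where the gdbm module is named dbm.gnu; the helper name
time_write is hypothetical; and it asks for the 'f' flag only when the
module's open_flags string lists it. It does not show why the flag appears
to be ignored on the setup described here.

```python
import os
import pickle
import tempfile
import time

try:
    import dbm.gnu as gdbm  # Python 3 name for the gdbm module
except ImportError:
    gdbm = None  # gdbm support is optional in some Python builds


def time_write(pairs, path):
    """Return (pickle_seconds, store_seconds).

    Hypothetical helper; store_seconds is None when gdbm is unavailable.
    """
    t0 = time.perf_counter()
    blob = pickle.dumps(pairs, 2)
    t1 = time.perf_counter()
    if gdbm is None:
        return t1 - t0, None
    # Request fast (unsynchronized) mode only if this build supports 'f'.
    flag = "nf" if "f" in gdbm.open_flags else "n"
    db = gdbm.open(path, flag)
    try:
        db["pairs"] = blob
    finally:
        db.close()  # the close is timed too, since syncing may happen here
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "float_pairs_out.db")
    print(time_write([(0.25, 0.75), (0.5, 0.125)], out))
```

If nearly all of the time lands in the store step, the cost is in gdbm or
the filesystem rather than in Python's pickling.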


