Complex sort on big files

Roy Smith roy at panix.com
Fri Aug 5 22:54:05 EDT 2011


Wow.

I was going to suggest using the Unix command-line sort utility via 
popen() or subprocess.  My argument was that it's written in C, has 30 
years of optimization behind it, and so on.  It almost certainly had to be 
faster than anything you could do in Python.
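
For reference, the subprocess approach I had in mind could look roughly 
like the sketch below.  It assumes Python 3.5 or later (for 
subprocess.run) and a sort binary on the PATH that accepts -n; the 
function name sort_numeric_file is made up for illustration.

```python
import subprocess


def sort_numeric_file(path):
    """Return the lines of `path`, sorted numerically by the external sort utility.

    Hypothetical helper: assumes a `sort` command that supports -n is on PATH.
    """
    result = subprocess.run(
        ["sort", "-n", path],      # argument list, so no shell quoting is involved
        stdout=subprocess.PIPE,    # capture the sorted output instead of printing it
        check=True,                # raise CalledProcessError if sort exits non-zero
        universal_newlines=True,   # decode the output to str rather than bytes
    )
    return result.stdout.splitlines()
```

This reads the whole sorted output into memory, which is fine for a file 
of this size but not necessarily for very large inputs.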

Then I tried the experiment.  I generated a file of 1 million random 
integers in the range 0 to 5000.  I wrote a little sorting program:

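# Read one integer per line from the file 'numbers' (Python 2 syntax).
numbers = [int(line) for line in open('numbers')]
numbers.sort()  # in-place sort with the built-in list sort
for i in numbers:
    print i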

and ran it on my MacBook Pro (8 GB RAM, 2 x 2.4 GHz cores) with Python 2.6.1.

$ time ./sort.py  > py-sort
real  0m2.706s
user  0m2.491s
sys   0m0.057s

and did the same with the unix utility:

$ time sort -n numbers  > cli-sort
real  0m5.123s
user  0m4.745s
sys   0m0.063s

Python took just about half the time.  Certainly knocked my socks off.  
Hard to believe, actually.
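
For anyone who wants to repeat the experiment, here is a minimal sketch 
of how the input file could be generated and the Python side timed.  It 
is written for Python 3 rather than the 2.6.1 used above, the function 
names are invented for illustration, and treating "0 to 5000" as 
inclusive at both ends is an assumption.  Note that it times only the 
read and sort, not interpreter startup or printing, so its numbers are 
not directly comparable to the `time` output above.

```python
import random
import time


def write_random_numbers(path, count=1000000, low=0, high=5000, seed=None):
    """Write `count` random integers, one per line, to `path`.

    Assumption: the range is inclusive at both ends (random.randint semantics).
    """
    rng = random.Random(seed)  # a seed makes the file reproducible
    with open(path, "w") as f:
        for _ in range(count):
            f.write("%d\n" % rng.randint(low, high))


def timed_sort(path):
    """Read integers from `path`, sort them, and return (numbers, seconds)."""
    start = time.perf_counter()
    with open(path) as f:
        numbers = [int(line) for line in f]
    numbers.sort()
    elapsed = time.perf_counter() - start
    return numbers, elapsed


if __name__ == "__main__":
    write_random_numbers("numbers")
    _, seconds = timed_sort("numbers")
    print("read + sort took %.3f s" % seconds)
```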


