Complex sort on big files

Peter Otten __peter__ at web.de
Mon Aug 1 13:00:03 EDT 2011


aliman wrote:

> Apologies I'm sure this has been asked many times, but I'm trying to
> figure out the most efficient way to do a complex sort on very large
> files.
> 
> I've read the recipe at [1] and understand that the way to sort a
> large file is to break it into chunks, sort each chunk and write
> sorted chunks to disk, then use heapq.merge to combine the chunks as
> you read them.
> 
> What I'm having trouble figuring out is what to do when I want to sort
> by one key ascending then another key descending (a "complex sort").
> 
> I understand that sorts are stable, so I could just repeat the whole
> sort process once for each key in turn, but that would involve going
> to and from disk once for each step in the sort, and I'm wondering if
> there is a better way.
> 
> I also thought you could apply the complex sort to each chunk before
> writing it to disk, so each chunk was completely sorted, but then the
> heapq.merge wouldn't work properly, because afaik you can only give it
> one key.

You can make that key as complex as needed:

>>> class Key(object):
...     def __init__(self, obj):
...             self.asc = obj[1]
...             self.desc = obj[2]
...     def __cmp__(self, other):
...             return cmp(self.asc, other.asc) or -cmp(self.desc, 
other.desc)
...
>>> sorted(["abc", "aba", "bbb", "aaa", "aab"], key=Key)
['aab', 'aaa', 'abc', 'bbb', 'aba']

See also

http://docs.python.org/library/functools.html#functools.total_ordering



More information about the Python-list mailing list