Complex sort on big files

sturlamolden sturlamolden at yahoo.no
Sat Aug 6 13:53:12 EDT 2011


On Aug 1, 5:33 pm, aliman <aliman... at googlemail.com> wrote:

> I've read the recipe at [1] and understand that the way to sort a
> large file is to break it into chunks, sort each chunk and write
> sorted chunks to disk, then use heapq.merge to combine the chunks as
> you read them.
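
For reference, the chunked approach described in the quote could look
roughly like the sketch below. The function name sort_big_file and the
lines_per_chunk parameter are made up for illustration, not taken from
the recipe. It assumes a text file where every line ends with a newline,
and the key= argument to heapq.merge needs Python 3.5 or later.

```python
import heapq
import os
import tempfile
from itertools import islice


def sort_big_file(src, dst, key=None, lines_per_chunk=100000):
    """Sort the lines of src into dst using sorted chunk files on disk.

    Hypothetical helper: assumes every line ends with a newline.
    """
    chunk_paths = []
    try:
        # Pass 1: read a bounded number of lines, sort them, spill to disk.
        with open(src) as infile:
            while True:
                chunk = list(islice(infile, lines_per_chunk))
                if not chunk:
                    break
                chunk.sort(key=key)
                fd, path = tempfile.mkstemp(suffix=".chunk", text=True)
                with os.fdopen(fd, "w") as out:
                    out.writelines(chunk)
                chunk_paths.append(path)

        # Pass 2: lazily merge the sorted chunks; only one line per
        # chunk needs to be in memory at any time.
        files = [open(p) for p in chunk_paths]
        try:
            with open(dst, "w") as out:
                out.writelines(heapq.merge(*files, key=key))
        finally:
            for f in files:
                f.close()
    finally:
        # Remove the temporary chunk files whether or not the merge worked.
        for p in chunk_paths:
            os.remove(p)
```

A "complex" sort would go in through key=, e.g. key=lambda line:
line.split("\t")[2] to order by the third tab-separated column.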

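Or just memory-map the file and sort it in place. numpy.memmap exposes
the mapped file as an array, so its .sort() works directly on the file's
contents, provided the file consists of fixed-size records. (A plain
mmap.mmap object has no .sort() method of its own.) If the file is large,
use 64-bit Python so the whole file fits in the address space. You don't
have to process the file in chunks yourself, as the operating system
pages the data in and out as needed.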
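
A minimal sketch of the memory-mapped variant, assuming NumPy is
installed and the file is a flat sequence of fixed-size binary records
whose layout you can describe as a NumPy dtype. The function name
sort_records_in_place is made up for illustration.

```python
import numpy as np


def sort_records_in_place(path, dtype, order=None):
    """Sort a binary file of fixed-size records in place via numpy.memmap.

    Hypothetical helper: the file size must be a multiple of the
    record size implied by dtype.
    """
    # mode="r+" maps the existing file read/write; nothing is read
    # until the sort actually touches the pages.
    records = np.memmap(path, dtype=dtype, mode="r+")
    # ndarray.sort() reorders the mapped data itself; for structured
    # dtypes, order= names the field(s) to compare on.
    records.sort(order=order)
    # Push modified pages back to the file, then drop the mapping.
    records.flush()
    del records
```

For example, with records made of two 32-bit integers,
sort_records_in_place("data.bin", [("key", "<i4"), ("val", "<i4")],
order="key") would order the file by the key field. Variable-length
lines of text do not fit this scheme directly.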

Sturla



