Sorting in huge files

Jeremy Sanders jeremy+plusnews at jeremysanders.net
Wed Dec 8 06:51:39 EST 2004


On Tue, 07 Dec 2004 12:27:33 -0800, Paul wrote:

> I have a large database of 15GB, consisting of 10^8 entries of
> approximately 100 bytes each. I devised a relatively simple key map on
> my database, and I would like to order the database with respect to the
> key.

You won't be able to load this into memory on a 32-bit machine, even with
loads of swap. Maybe you could do this on x86-64 with lots of swap (or
loadsa memory), or other 64-bit hardware. It will be _really_ slow,
however.

Otherwise you could do an on-disk sort (not too hard with fixed-length
records), but this will require some coding. You'll probably need to do
some reading to work out which sorting algorithm accesses the data most
sequentially, since random access on disk is slow. I think the key phrase
to search for is an "external sort" rather than an "internal sort".
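
As a rough sketch of what an external sort could look like (untested on
anything near 15GB): sort chunks that fit in memory, write each sorted chunk
to a temporary file, then merge the chunks sequentially. The 100-byte record
size comes from your description; the key being the first 8 bytes of each
record, the chunk size and all the function names are assumptions I made up
for illustration, so substitute your own key map.

```python
import heapq
import os
import tempfile

RECORD_SIZE = 100  # fixed record length in bytes (from the description above)
KEY_SIZE = 8       # ASSUMPTION: the key is the first 8 bytes of each record


def record_key(record):
    """Hypothetical key function; replace with the real key map."""
    return record[:KEY_SIZE]


def read_records(path):
    """Yield fixed-length records from a file, reading it sequentially."""
    with open(path, "rb") as f:
        while True:
            record = f.read(RECORD_SIZE)
            if len(record) < RECORD_SIZE:
                break  # end of file (a trailing partial record is ignored)
            yield record


def external_sort(src, dst, records_per_chunk=1_000_000):
    """Sort src into dst using sorted temporary chunks plus a k-way merge."""
    chunk_paths = []
    try:
        # Phase 1: sort memory-sized chunks and write each to a temp file.
        with open(src, "rb") as f:
            while True:
                data = f.read(RECORD_SIZE * records_per_chunk)
                if not data:
                    break
                records = [
                    data[i:i + RECORD_SIZE]
                    for i in range(0, len(data) - RECORD_SIZE + 1, RECORD_SIZE)
                ]
                records.sort(key=record_key)
                fd, path = tempfile.mkstemp(suffix=".chunk")
                with os.fdopen(fd, "wb") as out:
                    out.writelines(records)
                chunk_paths.append(path)

        # Phase 2: merge the sorted chunks; each chunk is read front to back.
        with open(dst, "wb") as out:
            runs = [read_records(p) for p in chunk_paths]
            out.writelines(heapq.merge(*runs, key=record_key))
    finally:
        for p in chunk_paths:
            os.remove(p)
```

Calling external_sort("data.bin", "sorted.bin") (file names made up) keeps
only one chunk in memory at a time. With the default chunk size that is
about 100MB of record data plus Python's per-object overhead, and roughly
100 temporary files open at once during the merge for 10^8 records.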

It's probably easiest to load the whole thing into a database (like
PostgreSQL) and let it do the work for you.
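
To give an idea of the database route, here is a minimal sketch. It uses the
sqlite3 module from the standard library purely as a stand-in so the example
is self-contained; it is not PostgreSQL, and I haven't tried it at this
scale. The table name, column names and the 8-byte key prefix are all
assumptions made up for illustration.

```python
import sqlite3

KEY_SIZE = 8  # ASSUMPTION: the key is the first 8 bytes of each record


def sort_with_database(records, db_path=":memory:"):
    """Load records into a table and yield them back ordered by key.

    db_path defaults to an in-memory database for the demo; a real 15GB
    data set would need an on-disk path.
    """
    conn = sqlite3.connect(db_path)
    try:
        # Hypothetical schema: the key in one column, the raw record in another.
        conn.execute("CREATE TABLE entries (sort_key BLOB, payload BLOB)")
        conn.executemany(
            "INSERT INTO entries VALUES (?, ?)",
            ((r[:KEY_SIZE], r) for r in records),
        )
        conn.commit()
        # The database decides how to sort, spilling to disk if it needs to.
        query = "SELECT payload FROM entries ORDER BY sort_key"
        for (payload,) in conn.execute(query):
            yield payload
    finally:
        conn.close()
```

The same idea carries over to PostgreSQL: bulk-load the records, then run a
SELECT with ORDER BY on the key column.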

Jeremy


