[Tutor] Sorting large numbers of co-ordinate pairs

Lie Ryan lie.1296 at gmail.com
Fri Mar 13 07:57:03 CET 2009


Dinesh B Vadhia wrote:
> Have a large number (> 1bn) of integer co-ordinates (i, j).  The i are 
> ordered and the j unordered.
>  
> I want to create (j, i) with j ordered and i unordered ie.
>  
> from:
>  
> ...
> 6940, 22886
> 6940, 38277
> 6940, 43788
> ...
>  
> to:
> ...
> 38277, 567
> 38277, 90023
> 38277, 6940
> ...
>  
> I've tried the dictionary route and it works perfectly for small set of 
> co-ordinate pairs but not for large sets as it hits memory capacity.
>  
> Any ideas how I could do this?

Can you actually load the original (unsorted) data to memory?

If you can, the easiest way is to use list.sort (which is in-place sort) 
with key:

data = [...taken from somewhere...]
data.sort(key=lambda x: x[1])

if you can't load the whole thing into memory, the easiest way (without 
relying on external modules) might be to load it in fixed size chunks 
that can be loaded to memory, sort it, then write it to a temporary 
file. Next, load all of the files, then interleave them.



More information about the Tutor mailing list