[Tutor] Sorting large numbers of co-ordinate pairs

Kent Johnson kent37 at tds.net
Thu Mar 12 15:22:19 CET 2009


On Thu, Mar 12, 2009 at 9:43 AM, Dinesh B Vadhia
<dineshbvadhia at hotmail.com> wrote:
> Have a large number (> 1bn) of integer co-ordinates (i, j).  The i are
> ordered and the j unordered.
>
> I want to create (j, i) with j ordered and i unordered ie.
>
> from:
>
> ...
> 6940, 22886
> 6940, 38277
> 6940, 43788
> ...
>
> to:
> ...
> 38277, 567
> 38277, 90023
> 38277, 6940
> ...
>
> I've tried the dictionary route and it works perfectly for small set of
> co-ordinate pairs but not for large sets as it hits memory capacity.

I'm not sure what the dictionary route is. Holding the data in lists
might be more memory efficient but you still need enough memory to
hold one or more copies of the list.

Otherwise a file-based merge sort. I would do something like this:
- read or otherwise acquire as many coordinates as you can comfortably
reorder and sort
- reorder and sort them
- write the sorted list to a file, one coordinate pair per line
- repeat until all coordinates have been written, using a new file for
each group of coordinates

- Make an iterator (use a generator function) that will open one of
your files and yield individual coordinates.
- use this recipe: http://code.activestate.com/recipes/491285/ to sort
the coordinates, processing the result however you like (writing to a
file?)

Kent


More information about the Tutor mailing list