[Python-Dev] Python3 regret about deleting list.sort(cmp=...)

Daniel Stutzbach stutzbach at google.com
Sun Mar 13 19:05:25 CET 2011


On Sat, Mar 12, 2011 at 3:44 PM, Guido van Rossum <guido at python.org> wrote:

> I recently advised a Googler who was sorting a large dataset and
> running out of memory. My analysis of the situation was that he was
> sorting a huge list of short lines of the form "shortstring,integer"
> with a key function that returned a tuple of the form ("shortstring",
> integer).


As Raymond pointed out, a change I made for 3.2 significantly shrinks the
memory footprint of sorting with a key (although it's still more
memory-intensive than sorting with cmp).

He could reduce the memory footprint further by sorting in two passes
instead of using a tuple, leveraging the fact that Python guarantees a
stable sort.  In 3.2 or later, this technique will require roughly twice as
much memory as just storing the list:

biglist.sort(key=lambda s: int(s.split(',')[1]))  # Sort by the integer
biglist.sort(key=lambda s: s.split(',')[0])  # Sort by the shortstring

I think the use cases are pretty narrow where there's plenty of memory for
storing the list but not enough to store two copies.

-- 
Daniel Stutzbach
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110313/93fb7a0f/attachment.html>


More information about the Python-Dev mailing list