Optimizing list processing

MRAB python at mrabarnett.plus.com
Wed Dec 11 19:59:42 EST 2013


On 11/12/2013 23:54, Steven D'Aprano wrote:
> I have some code which produces a list from an iterable using at least
> one temporary list, using a Decorate-Sort-Undecorate idiom. The algorithm
> looks something like this (simplified):
>
> table = sorted([(x, i) for i,x in enumerate(iterable)])
> table = [i for x,i in table]
>
> The problem here is that for large iterables, say 10 million items or so,
> this is *painfully* slow, as my system has to page memory like mad to fit
> two large lists into memory at once. So I came up with an in-place
> version that saves (approximately) two-thirds of the memory needed.
>
> table = [(x, i) for i,x in enumerate(iterable)]
> table.sort()

This looks wrong to me:

> for x, i in table:
>      table[i] = x

Couldn't it overwrite an entry that the loop still needs to read later on?

Let me see if I can find an example where it would fail.

Start with:

 >>> table = [('b', 0), ('a', 1)]

Sort it and you get:

 >>> table.sort()
 >>> table
[('a', 1), ('b', 0)]

Run that code:

 >>> for x, i in table:
        table[i] = x

Traceback (most recent call last):
   File "<pyshell#18>", line 1, in <module>
     for x, i in table:
ValueError: need more than 1 value to unpack

Why did it fail?

 >>> table
[('a', 1), 'a']

The two methods give different results anyway: the first returns a list
of indexes, while the second returns a list of items from the iterable.
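One way to avoid both the clobbering and the mismatch is to write each result back into the slot the loop has just read (position j), rather than into slot i. The sketch below is an illustration of that idea, not Steven's code; the function names `dsu_indexes`, `inplace_indexes` and `inplace_items` are hypothetical. The in-place variants still build one list of (item, index) tuples, but only one large list exists at a time.

```python
def dsu_indexes(iterable):
    """Out-of-place version quoted above: original positions in sorted order."""
    table = sorted([(x, i) for i, x in enumerate(iterable)])
    return [i for x, i in table]


def inplace_indexes(iterable):
    """Hypothetical in-place variant that matches dsu_indexes().

    The loop reads slot j and then overwrites that same slot, so it never
    touches a slot it has not visited yet; nothing still needed is lost.
    """
    table = [(x, i) for i, x in enumerate(iterable)]
    table.sort()
    for j, (_, i) in enumerate(table):
        table[j] = i          # replace the (item, index) pair with its index
    return table


def inplace_items(iterable):
    """Same idea, but keeps the sorted items instead of their indexes."""
    table = [(x, i) for i, x in enumerate(iterable)]
    table.sort()
    for j, (x, _) in enumerate(table):
        table[j] = x          # replace the pair with the item itself
    return table


data = ['b', 'a']
print(dsu_indexes(data))      # [1, 0]
print(inplace_indexes(data))  # [1, 0]
print(inplace_items(data))    # ['a', 'b']
```

Assigning to `table[j]` inside `for j, ... in enumerate(table)` is safe here because the list's length never changes and the iterator has already fetched element j before it is replaced.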

[snip]