Optimizing list processing
Terry Reedy
tjreedy at udel.edu
Wed Dec 11 21:26:51 EST 2013
On 12/11/2013 6:54 PM, Steven D'Aprano wrote:
> I have some code which produces a list from an iterable using at least
> one temporary list, using a Decorate-Sort-Undecorate idiom.
It is a non-standard version thereof, as DSU usually means to decorate
with a key that gets discarded.
A couple of examples of input and expected output would have been good ;-).
> The algorithm looks something like this (simplified):
>
> table = sorted([(x, i) for i,x in enumerate(iterable)])
This makes two temporaries when only one is needed, and shows why we
added generator expressions.
table = sorted((x, i) for i,x in enumerate(iterable))
is equivalent to the table; table.sort lines below.
The following (untested) should be faster as it avoids tuple unpacking
and repacking.
from itertools import count
table = sorted(t for t in zip(iterable, count))
> table = [i for x,i in table]
Did your original un-simplified use zip instead enumerate? Table now has
the original index of the items in iterable sorted by the items value.
The use for this is not obvious.
> The problem here is that for large iterables, say 10 million items or so,
> this is *painfully* slow, as my system has to page memory like mad to fit
> two large lists into memory at once. So I came up with an in-place
> version that saves (approximately) two-thirds of the memory needed.
With 1/3 saved by using a genex, it saves 1/2 of the remainder.
> table = [(x, i) for i,x in enumerate(iterable)]
> table.sorted()
(You meant table.sort().) This is an expansion of sorted(genex). It
might be slightly faster as list comp may be faster than list(equivalent
genex).
> for x, i in table:
> table[i] = x
I cannot understand what you are aiming for here. Besides the fact that
this does not work, it keeps x values rather than i indexes as before.
> table = [i for x,i in table]
done in place is
for j, (x,i) in enumerate(table):
table[j] = i
I expect the list comp be faster than in-place as long as the result
list can be allocated and held in memory without paging. (This of course
depends on system memory and other memory uses.) List iterators have a
__length_hint__ method giving the length of the underlying list, so the
list comp result list can potentially be allocated once and then filled
in by enumeration and replacement, but in C rather than Python code.
--
Terry Jan Reedy
More information about the Python-list
mailing list