Optimizing list processing

Chris Angelico rosuav at gmail.com
Thu Dec 12 07:25:37 EST 2013


On Thu, Dec 12, 2013 at 11:08 PM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> P.S. The algorithm I'm working on is a way of generating index and rank
> tables. Not that it really matters -- what matters is determining whether
> or not to shift from "make a copy of the list" to "modify the list in
> place".

So you're currently looking at...

if len(table) < ?????:   # the cutoff you're trying to find
    table = [i for x, i in table]
else:
    for n, (x, i) in enumerate(table):
        table[n] = i
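
(For context, here's roughly the sort of table I'm picturing -- a sorted
list of (value, original-index) pairs. That's my assumption about the
setup, not necessarily your exact code:)

# My assumption of the setup: sort (value, index) pairs, then strip the
# values to leave just the original indices in sorted order.
data = [30, 10, 20]
table = sorted((x, i) for i, x in enumerate(data))
# table == [(10, 1), (20, 2), (30, 0)]
table = [i for x, i in table]
# table == [1, 2, 0]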


Can I throw a spanner [1] in the works with other suggestions to try timing?

table[:] = [i for x, i in table]  # Does slice assignment get optimized?

SPLIT = 1048576  # Pick some useful cutoff
for part in range(0, len(table), SPLIT):
    table[part:part+SPLIT] = [i for x, i in table[part:part+SPLIT]]
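
Or, to make the SPLIT experiments easier to rerun, wrap the chunked
version in a little helper (just a sketch; the name and the default
are mine):

def strip_in_chunks(table, split=1048576):
    # Replace each (x, i) pair with its index, one slice at a time, so
    # the temporary list never holds more than 'split' elements.
    for part in range(0, len(table), split):
        table[part:part+split] = [i for x, i in table[part:part+split]]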

If slice assignment is reasonably fast (which I suspect it is), the
one-liner should be practically identical to your small-table
one-liner. Then if the million-record splits can be done inside
memory, it ought to be possible to do this in comparable time, even if
the total table length is huge. The choice of SPLIT would then matter
a lot less than the cutoff that you're trying to find; you might have
been able to do it in half the number of sections, but that won't make
as big a difference as suddenly paging out. Ideally, what I'd like to
see is that a higher SPLIT improves performance slightly until it gets
too high, at which point you go bust and the dealer wins... but the
critical word there is "slightly", meaning that it wouldn't cost too
much for SPLIT to be lower than it needs to be. That's the criterion
for the experiment; do you have the data on which to try it?
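
If it helps, the timing run I have in mind is something along these
lines (untested sketch, using the strip_in_chunks helper from above;
N and the split values are just guesses, adjust to taste):

import random, time

N = 10**7   # or however big the real tables are
data = [random.random() for _ in range(N)]

for split in (2**18, 2**20, 2**22, 2**24):
    table = sorted((x, i) for i, x in enumerate(data))
    start = time.time()
    strip_in_chunks(table, split)
    print(split, time.time() - start)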

[1] Monkey wrench, for the Yanks in the room

ChrisA



More information about the Python-list mailing list