List problem
Alex Martelli
aleaxit at yahoo.com
Sun Oct 31 17:45:24 EST 2004
Bengt Richter <bokr at oz.net> wrote:
...
> > a[:] = [x for x in a if x != 2]
> >
> >is more concise, idiomatic, and speedy. No reason to do low-level
> >twiddling with indices and fiddly loops, in cases where a list
> >comprehension can do the job so simply.
> >
> Except maybe if the list is huge, and you don't want to generate a
duplicate, e.g.,
...
> Should also be faster for large lists, IWT.
'shud' is a 4-letter word -- i prefer to measure. Of course, it does
depend what you mean by large/huge -- I hope 10 million items is enough;
and what you're doing exactly -- I've kept 'deleting item 2' as it was
the example that had been used so far.
This is the module for measurement:
def delsome_br(n=1000000, todel=[2]):
a = range(n)
i = 0
for x in a:
if x not in todel:
a[i] = x
i += 1
del a[i:]
return a
def delsome_am(n=1000000, todel=[2]):
a = range(n)
a = [x for x in a if x not in todel]
and these are the numbers I see:
kallisti:/tmp alex$ python -mtimeit -s'import br' 'br.delsome_br()'
10 loops, best of 3: 4.44 sec per loop
kallisti:/tmp alex$ python -mtimeit -s'import br' 'br.delsome_am()'
10 loops, best of 3: 3.92 sec per loop
Python 2.4b1, of course -- somebody SO interested in performance, as to
use those six delicate, fragile low-level lines, instead of the solid,
no-space-for-bugs list comprehension, surely WANTS the performance
pluses of 2.4, anyway;-).
I'm sure platforms and tasks can be devised that make delsome_br faster.
However, personally, I'm not surprised that, on a task I tried to code
as impartially as possible, the "should be faster" fussy, low-level
solution ends up taking _longer_ than the simpler, higher-level one;
it's not always that way, but it matches my general Python experience...
"simple is good, simple is fast, high-level is simple" memes;-).
The tasks ARE a bit time-consuming, as you see (30 times 4+ seconds is
over a couple of minutes per variation...), so I'm not going to dwell
much deeper, but, of course, that's just why I posted everything -- so
ingenious people can sweat it out and find a way to show the lower level
version as faster, with the appropriate combination of platform and
tasks. Some suggestions...:
The trick would be to hit just the right size of list that will make the
working set of the LC exceed available physical memory, while leaving
the low-level approach with a working set that still fits; the thrashing
will then kill the performance of one, but not the other. To avoid
having to aim for many tens of millions of items, a machine with scarce
memory or very ineffective VM algorithms would be better -- some old PC
with 64M of RAM and win/98 might be ideal, for example. Unfortunately I
don't have any machine with less than 640 M any more, even the laptops
(RAM is so cheap, and SO precious to performance, that I can't justify
stinting on THAT, of all items!), so I can't really help much there.
Alex
More information about the Python-list
mailing list