how to remove multiple occurrences of a string within a list?

Alex Martelli aleax at mac.com
Sun Apr 8 18:02:52 EDT 2007


Ayaz Ahmed Khan <ayaz at dev.slash.null> wrote:
   ...
> I am getting varying results on my system on repeated runs.  What about
> itertools.ifilter()?

Calling itertools.ifilter returns an iterator; if you never iterate on
that iterator, that, of course, is going to be very fast (O(1), since it
does not matter how long the list you _don't_ iterate on is), but why
would you care how long it takes to do no useful work at all?
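
To see that laziness directly, here is a minimal sketch. It assumes
Python 3, where the built-in filter is lazy and plays the role that
itertools.ifilter plays in the commands quoted in this thread; the
counting predicate `keep` is a hypothetical helper added only for
illustration.

```python
calls = []

def keep(item):
    """Record each call, then keep items different from '1024'."""
    calls.append(item)
    return item != '1024'

L = ['0024', 'haha', '0024']

lazy = filter(keep, L)       # builds the iterator only; predicate not called yet
calls_before = len(calls)

result = list(lazy)          # consuming the iterator is what does the work
calls_after = len(calls)

print(calls_before, calls_after, result)
```

Nothing is filtered until something iterates, so timing the bare call
measures only the construction of the iterator object.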

> $ python -m timeit -s "L = ['0024', 'haha', '0024']; import itertools"
> "itertools.ifilter(lambda i: i != '1024', L)"
> 100000 loops, best of 3: 5.37 usec per loop
> 
> $ python -m timeit -s "L = ['0024', 'haha', '0024']" 
> "[i for i in L if i != '0024']" 
> 100000 loops, best of 3: 5.41 usec per loop

Here are the numbers I see:

brain:~ alex$ python -m timeit -s "L = ['0024', 'haha', '0024'];
> import itertools" "itertools.ifilter(lambda i: i != '1024', L)"
1000000 loops, best of 3: 0.749 usec per loop

This is the "we're not doing any work" timing.

brain:~ alex$ python -m timeit -s "L = ['0024', 'haha', 
'0024']" "[i for i in L if i != '1024']"
1000000 loops, best of 3: 1.37 usec per loop

This is "make a list in the obvious way, excluding an item that's never
there" (like in your code's first case, I'm comparing with 1024, which
ain't there, rather than with 0024, which is).

brain:~ alex$ python -m timeit -s "L = ['0024', 'haha', 
'0024']; import itertools" "list(itertools.ifilter(lambda i: i !=
'1024', L))"
100000 loops, best of 3: 6.18 usec per loop

This is the "make it the hard way" (excluding a non-present item).
About 5/6 of the overhead comes from the list constructor.

When we exclude the item that IS there twice:

brain:~ alex$ python -m timeit -s "L = ['0024', 'haha', 
'0024']" "[i for i in L if i != '0024']"
1000000 loops, best of 3: 0.918 usec per loop

this is the "obvious way to do it",

brain:~ alex$ python -m timeit -s "L = ['0024', 'haha', 
'0024']; import itertools" "list(itertools.ifilter(lambda i: i !=
'0024', L))"
100000 loops, best of 3: 6.16 usec per loop

and this is the "hard and contorted way".


If you only want to loop, not to build a list, itertools.ifilter (or a
genexp) may be convenient (if the original list is long); but for making
lists, list comprehensions win hands down.
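
The same comparison can be run from a script instead of the command
line. This is only a sketch, assuming Python 3 (built-in filter in place
of itertools.ifilter); the function names are made up for the example,
and the timings it prints will differ from the figures in this post.

```python
import timeit

L = ['0024', 'haha', '0024']

def with_listcomp():
    # Build the filtered list directly.
    return [i for i in L if i != '0024']

def with_filter():
    # Build a lazy filter, then force it into a list.
    return list(filter(lambda i: i != '0024', L))

# Both approaches must agree on the result before timing means anything.
same = with_listcomp() == with_filter() == ['haha']

t_lc = timeit.timeit(with_listcomp, number=10000)
t_filter = timeit.timeit(with_filter, number=10000)
print("listcomp: %.4f s, filter: %.4f s for 10000 runs" % (t_lc, t_filter))
```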

Here are a few cases of "just looping" on lists of middling size:

brain:~ alex$ python -m timeit -s "L = 123*['0024', 'haha', 
'0024']" "for j in [i for i in L if i != '0024']: pass"
10000 loops, best of 3: 70 usec per loop

brain:~ alex$ python -m timeit -s "L = 123*['0024', 'haha', 
'0024']" "for j in (i for i in L if i != '0024'): pass"
10000 loops, best of 3: 70.9 usec per loop

brain:~ alex$ python -m timeit -s "L = 123*['0024', 'haha', 
'0024']; import itertools" "for j in itertools.ifilter(lambda i: i !=
'0024', L): pass"
10000 loops, best of 3: 151 usec per loop

Here, the overhead of itertools.ifilter is only about twice that of a
genexp (or list comprehension; but the LC should cost relatively more as
L's length keeps growing, due to memory-allocation issues).
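
A rough way to look at the memory side, assuming CPython and Python 3:
sys.getsizeof reports the size of the container object itself (not of
the strings it refers to), which is enough to show that the list built
by a list comprehension grows with the input while a genexp object does
not. The variable names are illustrative only.

```python
import sys

small = 123 * ['0024', 'haha', '0024']
large = 100 * small

# A list comprehension materializes every surviving item up front.
lc_small = sys.getsizeof([i for i in small if i != '0024'])
lc_large = sys.getsizeof([i for i in large if i != '0024'])

# A generator expression only holds its own state, whatever the input size.
ge_small = sys.getsizeof(i for i in small if i != '0024')
ge_large = sys.getsizeof(i for i in large if i != '0024')

print("list comprehension:", lc_small, "->", lc_large, "bytes")
print("genexp:            ", ge_small, "->", ge_large, "bytes")
```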

BTW, sometimes simplest is still best:

brain:~ alex$ python -m timeit -s "L = 123*['0024', 'haha',
'0024']" "for i in L:
>   if i != '0024':
>     pass"
10000 loops, best of 3: 52.5 usec per loop

I.e., when you're just looping... just loop!-)


Alex


