filter a list of strings

Jussi Piitulainen harvesting at is.invalid
Thu Dec 3 01:32:49 EST 2015


<c.buhtz at posteo.jp> writes:

> I would like to know how this could be done more elegant/pythonic.
>
> I have a big list (over 10.000 items) with strings (each 100 to 300
> chars long) and want to filter them.
>
> list = .....
>
> for item in list[:]:
>   if 'Banana' in item:
>      list.remove(item)
>   if 'Car' in item:
>      list.remove(item)
>
> There are a lot more conditions of course. This is just example
> code.  It doesn't look nice to me. Too much redundancy.

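Yes. The initial copy is redundant and the repeated .remove calls are
not only expensive (each call searches the list, so the loop is
quadratic where it could have been linear), they are also incorrect if
there are duplicates in the list or if one item matches more than one
condition: the second .remove then either removes a different, equal
item or raises ValueError. You want to copy and filter in one go: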

list = ...
list = [ item for item in list
         if ( 'Banana' not in item and
              'Car' not in item ) ]
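
To make the difference concrete, here is a small sketch with made-up
sample data (the name "words", the helper "remove_in_loop" and the
strings themselves are only for illustration). It assumes one string
happens to match both conditions, which is enough to make the
remove-in-a-loop version fail while the comprehension works:

```python
# Hypothetical sample data; the strings are made up for illustration.
words = ['Banana split', 'Carpet', 'Apple', 'Banana Car', 'Pear']

def remove_in_loop(strings):
    """The approach from the question, applied to a private copy."""
    result = strings[:]
    for item in result[:]:
        if 'Banana' in item:
            result.remove(item)
        if 'Car' in item:
            result.remove(item)
    return result

try:
    remove_in_loop(words)
    loop_failed = False
except ValueError:
    # 'Banana Car' passes both tests, so the second remove finds nothing.
    loop_failed = True

# Copy and filter in one go; each item is tested once and never removed.
kept = [item for item in words
        if 'Banana' not in item and 'Car' not in item]
```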

It's better to use another name, since "list" is the name of a built-in
type and rebinding it hides the built-in for the rest of the scope. It
may be a good idea to define a complex condition as a separate function:

def isbad(item):
    return ( 'Banana' in item or
             'Car' in item )

def isgood(item):
    return not isbad(item)

items = ...
items = [ item for item in items if isgood(item) ]
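
Since you mention many more conditions, one possible variation (my
assumption about what the conditions look like, namely plain substring
tests) is to keep the unwanted substrings in a tuple and let any() do
the looping. The name UNWANTED and the sample strings are hypothetical:

```python
# Hypothetical collection of unwanted substrings; extend as needed.
UNWANTED = ('Banana', 'Car')

def isbad(item):
    """True if item contains at least one unwanted substring."""
    return any(word in item for word in UNWANTED)

def isgood(item):
    return not isbad(item)

# Made-up sample data for illustration.
items = ['Banana split', 'Carpet', 'Apple', 'Pear']
items = [item for item in items if isgood(item)]
```

any() stops at the first match, so an item is not tested against the
remaining substrings once it is known to be bad.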

Then there's also filter, which is easy to use now that the condition is
already a named function:

items = list(filter(isgood, items))
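
A short sketch of the filter variant, again with made-up sample data.
In Python 3 filter returns a lazy iterator, hence the list() call.
As an aside that is not part of the suggestion above,
itertools.filterfalse can take isbad directly, so the negated helper is
not strictly needed:

```python
from itertools import filterfalse

def isbad(item):
    return 'Banana' in item or 'Car' in item

def isgood(item):
    return not isbad(item)

# Made-up sample data for illustration.
items = ['Banana split', 'Carpet', 'Apple', 'Pear']

lazy = filter(isgood, items)    # an iterator; nothing is tested yet
kept = list(lazy)               # the tests run while the list is built

# Same result without isgood: keep the items for which isbad is false.
also_kept = list(filterfalse(isbad, items))
```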

> btw: Is it correct to iterate over a copy (list[:]) of that string
> list and not the original one?

I think it's a good idea to iterate over a copy if you are modifying the
original during the iteration, but the above suggestions are better for
other reasons.
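
To illustrate why the copy matters, here is a minimal sketch with
made-up data (the names "direct" and "via_copy" are only for this
example). Removing from the list that is being iterated shifts the
remaining items left, so the item right after a removed one is skipped:

```python
# Made-up sample data for illustration.
direct = ['Banana 1', 'Banana 2', 'Apple']
for item in direct:             # iterating over the list being modified
    if 'Banana' in item:
        direct.remove(item)
# 'Banana 2' moved into the position already visited and was never tested.

via_copy = ['Banana 1', 'Banana 2', 'Apple']
for item in via_copy[:]:        # iterating over a copy
    if 'Banana' in item:
        via_copy.remove(item)
```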


