[Tutor] scratching my head - still
Peter Otten
__peter__ at web.de
Wed Aug 5 09:35:09 CEST 2015
Cameron Simpson wrote:
> On 05Aug2015 12:46, Steven D'Aprano <steve at pearwood.info> wrote:
>>On Tue, Aug 04, 2015 at 05:52:15PM -0700, Clayton Kirkwood wrote:
>>> As seen below (closely), some filenames are not being removed while
>>> others are, such as in the first stanza, some pdfs are removed, some
>>> aren't. In the second stanza, Thumbs.db makes it through, but was caught
>>> in the first stanza. (Thanks for those who have proffered solutions to
>>> date!) I see no logic in the results. What am I missing???
>>
>>You are modifying the list of files while iterating over it, which plays
>>all sorts of hell with the process. Watch this:
> [... detailed explanation ...]
>>The lesson here is that you should never modify a list while iterating
>>over it. Instead, make a copy, and modify the copy.
>
> What Steven said. Yes indeed.
>
> Untested example suggestion:
>
> all_filenames = set(filenames)
> for filename in filenames:
>     if test(filename):  # your test here
>         all_filenames.remove(filename)
> print(all_filenames)
>
> You could use a list instead of a set and be fine for small numbers of
> files. With large numbers of files, a set is far faster to remove things
> from.
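A small sketch of both points. The filenames and the `unwanted()` test are made up for illustration and are not from the thread. Removing items from the list being iterated skips the item right after each removal, which is why some PDFs survive. Iterating over the original list while removing from a separate set does not have that problem.

```python
def unwanted(name):
    # Hypothetical test: drop PDFs and the Windows thumbnail cache.
    return name.endswith(".pdf") or name == "Thumbs.db"

filenames = ["a.pdf", "b.pdf", "notes.txt", "Thumbs.db", "c.pdf"]

# Buggy: each remove() shifts the later items left, so the loop's
# internal index jumps over the item that moved into the freed slot.
buggy = list(filenames)
for name in buggy:
    if unwanted(name):
        buggy.remove(name)
print(buggy)        # ['b.pdf', 'notes.txt', 'c.pdf']

# Fixed: iterate over the unchanged list, remove from a separate set.
kept = set(filenames)
for name in filenames:
    if unwanted(name):
        kept.remove(name)
print(sorted(kept))  # ['notes.txt']
```

A set does not preserve order or duplicates, which is harmless for the names in one directory but is why the result is sorted before printing.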
If the list size is manageable, which is usually the case for the names of
files in one directory, you need not bother removing items at all. Just
build a new list:
all_filenames = [...]
matching_filenames = [name for name in all_filenames if test(name)]
If the list is huge and you expect most items to be kept, you might try
reverse iteration:
for i in reversed(range(len(all_filenames))):
    name = all_filenames[i]
    if test(name):
        del all_filenames[i]
This avoids both copying the list and the linear search performed by
list.remove().