Iterating over files of a huge directory

Paul Rudin paul.nospam at rudin.co.uk
Mon Dec 17 12:27:13 EST 2012


Chris Angelico <rosuav at gmail.com> writes:

> On Tue, Dec 18, 2012 at 2:28 AM, Gilles Lenfant
> <gilles.lenfant at gmail.com> wrote:
>> Hi,
>>
>> I have googled but did not find an efficient solution to my
>> problem. My customer provides a directory with a huuuuge list of
>> files (flat, potentially 100000+) and I cannot reasonably use
>> os.listdir(this_path) without creating a big memory footprint.
>>
>> So I'm looking for an iterator that yields the file names of a
>> directory and does not build a giant list of its contents.
>
> Sounds like you want os.walk.

But doesn't os.walk call listdir(), which creates a list of the
contents of a directory? That is exactly the initial problem.
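To make the concern concrete, here is a minimal sketch (the helper name
first_level_filenames is made up for illustration). However os.walk()
reads the directory internally, it yields (dirpath, dirnames, filenames)
tuples in which filenames is a fully built list, so a flat directory
with 100000 entries still produces one 100000-element list.

```python
import os


def first_level_filenames(path):
    """Return the filenames os.walk() reports for `path` itself.

    Hypothetical helper, used only to show what os.walk() hands back.
    """
    # os.walk() yields (dirpath, dirnames, filenames) tuples. The first
    # tuple describes `path` itself, and its filenames entry is a
    # complete list, not a lazy iterator.
    for _dirpath, _dirnames, filenames in os.walk(path):
        return filenames
    return []  # os.walk() yields nothing if `path` cannot be listed
```

Checking isinstance(first_level_filenames(this_path), list) gives True,
which is the memory footprint the original question wanted to avoid.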

> But... a hundred thousand files? I know the Zen of Python says that
> flat is better than nested, but surely there's some kind of directory
> structure that would make this marginally manageable?
>

Sometimes you have to deal with things other people have designed, so
the directory structure is not something you can control. I've run up
against exactly the same problem and wrote something in C that
implements an iterator over the directory entries.

It would probably be better if listdir() returned an iterator rather
than a list.
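
As a rough sketch of what such an iterator could look like in pure
Python, assuming Python 3.6 or later, where os.scandir() is available
and usable as a context manager: the generator below yields file names
one at a time instead of building a list. The name iter_filenames is
hypothetical, chosen only for this example.

```python
import os


def iter_filenames(path):
    """Yield the names of regular files in `path`, one at a time.

    Hypothetical helper; assumes Python 3.6+ for os.scandir() as a
    context manager.
    """
    # os.scandir() returns an iterator of os.DirEntry objects, so the
    # full directory listing is never collected into a Python list here.
    with os.scandir(path) as entries:
        for entry in entries:
            # Skip subdirectories and other non-file entries.
            if entry.is_file():
                yield entry.name
```

Usage is a plain loop, for example
"for name in iter_filenames(this_path): ...", and memory use on the
Python side stays roughly constant regardless of how many files the
directory holds.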


