walking a directory with very many files

Hrvoje Niksic hniksic at xemacs.org
Mon Jun 15 07:47:08 EDT 2009


Terry Reedy <tjreedy at udel.edu> writes:

> You did not specify version.  In Python3, os.walk has become a
> generator function.  So, to answer your question, use 3.1.

os.walk has been a generator function all along, but that doesn't help
the OP because it still uses os.listdir internally.  This means that it
both creates huge lists for huge directories, and holds on to those
lists until the iteration over the directory (and all subdirectories)
is finished.
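
A quick way to see this for yourself, as a sketch only: the temporary
directory and the file names below are made up for illustration.  It
checks that os.walk hands back a generator, and that each step of that
generator still delivers a directory's complete listing as a list.

```python
import inspect
import os
import tempfile

def first_step(top):
    """Return the object os.walk gives back and the first triple it yields."""
    walker = os.walk(top)
    return walker, next(walker)

# Hypothetical test directory holding a handful of empty files.
with tempfile.TemporaryDirectory() as top:
    for i in range(5):
        open(os.path.join(top, "file%d" % i), "w").close()
    walker, (dirpath, dirnames, filenames) = first_step(top)
    is_lazy = inspect.isgenerator(walker)   # walking is lazy per directory...
    walker.close()
    # ...but the names of one directory arrive all at once, as a list.
    print(is_lazy, type(filenames).__name__, len(filenames))
```

With five files the list is harmless; with millions of files in one
directory the same list is what eats the memory.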

In fact, os.walk is not suited for this kind of memory optimization
because yielding a *list* of files (and a separate list of
subdirectories) is specified in its interface.  This hasn't changed in
Python 3.1:

    dirs, nondirs = [], []
    for name in names:
        if isdir(join(top, name)):
            dirs.append(name)
        else:
            nondirs.append(name)

    if topdown:
        yield top, dirs, nondirs
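
For contrast, here is a rough sketch of a walker that hands out one
path at a time instead of whole lists.  It rests on assumptions that go
beyond the 3.1 code quoted above: it needs os.scandir used as a context
manager, which exists only in Python 3.6 and later, and iter_files is a
made-up name, not part of the os module.

```python
import os
import tempfile

def iter_files(top):
    """Yield the path of every non-directory under top, one at a time.

    Hypothetical helper: unlike os.walk it never builds a list of a
    directory's entries; only the stack of pending directories grows.
    """
    pending = [top]
    while pending:
        current = pending.pop()
        # os.scandir returns an iterator over the directory's entries.
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)   # visit it later
                else:
                    yield entry.path

# Usage on a small throwaway tree (names are illustrative only).
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "sub"))
    for name in ("a", "b", os.path.join("sub", "c")):
        open(os.path.join(top, name), "w").close()
    found = sorted(os.path.relpath(p, top) for p in iter_files(top))
    print(found)
```

The price is a different interface: the caller gets a flat stream of
paths and cannot prune dirs in place the way os.walk allows.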


