walking a directory with very many files

rkl rklein at tpg.com.au
Sun Jun 21 05:57:33 EDT 2009


On Jun 15, 2:35 am, tom <f... at thefsb.org> wrote:
> i can traverse a directory using os.listdir() or os.walk(). but if a
> directory has a very large number of files, these methods produce very
> large objects taking a lot of memory.
>
> in other languages one can avoid generating such an object by walking
> a directory as a linked list. for example, in c, perl or php one can
> use opendir() and then repeatedly readdir() until getting to the end
> of the file list. it seems this could be more efficient in some
> applications.
>
> is there a way to do this in python? i'm relatively new to the
> language. i looked through the documentation and tried googling but
> came up empty.

I might be a little late with my comment here.

David Beazley in his PyCon'2008 presentation "Generator Tricks
For Systems Programmers" had this very elegant example of handling an
unlimited number of files:


import os, fnmatch

def gen_find(filepat,top):
    """gen_find(filepat,top) - find matching files in directory tree,
                               start searching from top

    expects: a file pattern as string, and a directory path as string
    yields:  a sequence of filenames (including paths)
    """
    for path, dirlist, filelist in os.walk(top):
        for name in fnmatch.filter(filelist,filepat):
            yield os.path.join(path,name)


# print() with a single argument runs under both Python 2 and Python 3
for filename in gen_find('*.py', '/'):
    print(filename)
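
One caveat on the above: os.walk() still builds dirlist and filelist as
complete lists for each directory it visits, so the generator avoids one
big list for the whole tree but not a big list for a single huge directory.
On newer Pythons, os.scandir() (added in 3.5, usable in a with statement
from 3.6) hands back entries one at a time, much like opendir()/readdir().
Below is a minimal sketch of that idea. It is not from Beazley's talk; the
name gen_find_lazy is made up for illustration, and the list of
subdirectories still to be visited is kept in memory.

```python
import fnmatch
import os


def gen_find_lazy(filepat, top):
    """Yield paths of files under top whose names match filepat.

    Hypothetical variant of gen_find: entries are read one at a time
    with os.scandir() (Python 3.6+ for the with statement), so no
    per-directory list of file names is built.
    """
    pending = [top]                  # directories still to visit
    while pending:
        current = pending.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    # Symlinks are not followed, so a link to a directory
                    # is treated like any other non-directory name.
                    if entry.is_dir(follow_symlinks=False):
                        pending.append(entry.path)
                    elif fnmatch.fnmatch(entry.name, filepat):
                        yield entry.path
        except OSError:
            continue                 # unreadable directory: skip it


if __name__ == "__main__":
    for filename in gen_find_lazy('*.py', '.'):
        print(filename)
```

The order of results differs from os.walk(), since subdirectories are taken
from the end of the pending list rather than visited top-down in listing
order.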


