dirwalk.py generator version of os.path.walk
Tom Good
Tom_Good1 at excite.com
Wed Feb 27 19:21:39 EST 2002
jimd at vega.starshine.org (Jim Dennis) wrote in message news:<a5d36e$1daf$1 at news.idiom.com>...
> This function could probably use a bit of polishing,
> and it certainly could use some enhancement (some options to
> control if, and how, we follow symlinks, how to handle
> exceptions on listdir(), whether to be depth first, and an
> option to avoid crossing mount boundaries with os.path.ismount(),
> etc).
>
> However, it seems to work.
>
> dirwalk() simply takes an optional top level directory/path name
> as an argument and instantiates a generator which will walk down
> that tree and return every filename that it can access.
>
> It's late and I need sleep. So I'm just going to post this in
> its rough (and probably buggy) form and let y'all thrash on it
> a bit.
>
> I guess there's some sort of statcache module that might let me
> cache the stat() tuples. I guess I'm implicitly incurring a stat()
> system call for each node by checking islink() and isdir() on it
> so it seems like I ought to cache that and make it available to
> my caller (without forcing them to make an additional stat system
> call).
>
> I hope that something like this (a simple dirwalk() or other
> greatly simplified alternative to os.path.walk()) makes it into
> Python 2.3 or later.
>
> #!/usr/bin/env python2.2
> from __future__ import generators
> import os
>
> def dirwalk(startdir=None):
>     if not startdir:
>         startdir="."
>     if not os.path.isdir(startdir):
>         raise ValueError ## Is this the right exception?
>     stack = [startdir]
>     while stack:
>         cwd = stack.pop(0)
>         try:
>             current = os.listdir(cwd)
>         except (OSError):
>             continue # Skip it if we don't have access
>         for each in current:
>             each = os.path.join(cwd,each)
>             if os.path.islink(each):
>                 pass
>             elif os.path.isdir(each):
>                 stack.append(each)
>             yield(each)
>
> if __name__ == "__main__":
>     # import unittest?
>     # test suite should consist of:
>     # dirwalk() vs. os.listdir()
>     # dirwalk("/") vs. os.path.walk()
>     # dirwalk("/etc/passwd") (should raise exception)
>     import sys
>     for i in sys.argv[1:]:
>         for j in dirwalk(i):
>             print j
>     # should compare this to os.popen("find ....") and
>     # or to os.path.walk(...)
Hi,
I wrote a different implementation of this general concept at:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/105873
You don't really need to keep a stack of directories and push/pop
things, because with generators you can recurse instead.
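To illustrate the idea, here is a minimal sketch of a recursive generator. It is not necessarily the code in the recipe linked above. It reuses the name dirwalk and the skip-on-OSError behaviour from the quoted post; the default argument and the decision to yield symlinks without following them are assumptions. The explicit inner loop is used so the same text should also run on Python 2.2, provided "from __future__ import generators" is added at the top of the file.

```python
import os

def dirwalk(startdir="."):
    """Yield every path below startdir, recursing into subdirectories.

    Assumption: symlinks are yielded but never followed, and anything
    os.listdir() cannot read is silently skipped.
    """
    try:
        names = os.listdir(startdir)
    except OSError:
        return  # unreadable (or not a directory): yield nothing
    for name in names:
        path = os.path.join(startdir, name)
        yield path
        if os.path.isdir(path) and not os.path.islink(path):
            # Recurse instead of pushing onto a stack: re-yield
            # everything the nested generator produces.
            for sub in dirwalk(path):
                yield sub
```

Two differences from the stack-based version are worth noting. This sketch walks depth first, whereas popping from the front of the list gives a breadth-first order. It also yields nothing for a non-directory argument rather than raising ValueError.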
Tom