dirwalk.py generator version of os.path.walk
Jim Dennis
jimd at vega.starshine.org
Thu Feb 28 05:00:41 EST 2002
In article <ac677656.0202271621.5a134d44 at posting.google.com>, Tom Good wrote:
>jimd at vega.starshine.org (Jim Dennis) wrote in message news:<a5d36e$1daf$1 at news.idiom.com>...
>> This function could probably use a bit of polishing,
>> and it certainly could use some enhancement (some options to
>> control whether and how we follow symlinks, how to handle
>> exceptions on listdir(), whether to be depth first, and an
>> option to avoid crossing mount boundaries with os.path.ismount(),
>> etc).
>> However, it seems to work.
>> dirwalk() simply takes an optional top level directory/path name
>> as an argument and instantiates a generator which will walk down
>> that tree and return every filename that it can access.
>> It's late and I need sleep. So I'm just going to post this in
>> its rough (and probably buggy) form and let y'all thrash on it
>> a bit.
>> I guess there's some sort of statcache module that might let me
>> cache the stat() tuples. I guess I'm implicitly incurring a stat()
>> system call for each node by checking islink() and isdir() on it
>> so it seems like I ought to cache that and make it available to
>> my caller (without forcing them to make an additional stat system
>> call).
>> I hope that something like this (a simple dirwalk() or other
>> greatly simplified alternative to os.path.walk()) makes it into
>> Python 2.3 or later.
>> #!/usr/bin/env python2.2
>> from __future__ import generators
>> import os
>>
>> def dirwalk(startdir=None):
>>     if not startdir:
>>         startdir="."
>>     if not os.path.isdir(startdir):
>>         raise ValueError ## Is this the right exception?
>>     stack = [startdir]
>>     while stack:
>>         cwd = stack.pop(0)
>>         try:
>>             current = os.listdir(cwd)
>>         except (OSError):
>>             continue # Skip it if we don't have access
>>         for each in current:
>>             each = os.path.join(cwd,each)
>>             if os.path.islink(each):
>>                 pass
>>             elif os.path.isdir(each):
>>                 stack.append(each)
>>             yield(each)
>>
>> if __name__ == "__main__":
>>     # import unittest?
>>     # test suite should consist of:
>>     # dirwalk() vs. os.listdir()
>>     # dirwalk("/") vs. os.path.walk()
>>     # dirwalk("/etc/passwd") (should raise exception)
>>     import sys
>>     for i in sys.argv[1:]:
>>         for j in dirwalk(i):
>>             print j
>>     # should compare this to os.popen("find ....") and
>>     # or to os.path.walk(...)
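On the stat() caching point above, here is a minimal sketch of what that
might look like. It assumes Python 3, and the name dirwalk_stat is
hypothetical (it is not in the code posted above). It makes one lstat()
call per entry and hands the result to the caller alongside the path,
instead of letting islink() and isdir() each stat the entry separately.

```python
import os
import stat


def dirwalk_stat(startdir="."):
    """Yield (path, lstat_result) for every entry below startdir.

    Hypothetical variant of dirwalk(): one lstat() call per entry,
    shared with the caller instead of repeated by islink()/isdir().
    """
    if not os.path.isdir(startdir):
        raise ValueError("not a directory: %r" % (startdir,))
    todo = [startdir]
    while todo:
        cwd = todo.pop(0)
        try:
            names = os.listdir(cwd)
        except OSError:
            continue  # skip directories we cannot read
        for name in names:
            path = os.path.join(cwd, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished or is inaccessible
            # lstat() does not follow symlinks, so S_ISDIR is false for
            # a symlink to a directory and we never descend through one.
            if stat.S_ISDIR(st.st_mode):
                todo.append(path)
            yield path, st
```

Unlike the posted version, this sketch silently skips an entry whose
lstat() fails rather than yielding its name; whether that is the right
choice is one of the exception-handling options mentioned earlier.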
>Hi,
> I wrote a different implementation of this general concept at:
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/105873
> You don't really need to keep a stack of directories and push/pop
> things, because with generators you can recurse instead.
>Tom
But recursion is likely to cost more. The only state I need
to keep is my current "todo" list of directories. A recursive
version would keep a function frame alive for every level of the
tree (unless Python eliminated tail calls). So the append/pop
(total cost: 3 lines of code) seems like the lightest-weight way
to do this.
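
To make the trade-off concrete, here is a minimal sketch of both shapes
side by side. The names walk_iterative and walk_recursive are
hypothetical, the recursive one is my own illustration and not Tom's
recipe, and the code assumes Python 3. In the recursive form each
directory level keeps a suspended generator alive and every name is
re-yielded once per enclosing level; the iterative form holds only the
to-do list.

```python
import os


def walk_iterative(top="."):
    """One generator frame; the only state is the to-do list."""
    todo = [top]
    while todo:
        cwd = todo.pop(0)
        try:
            names = os.listdir(cwd)
        except OSError:
            continue  # unreadable directory: skip it
        for name in names:
            path = os.path.join(cwd, name)
            if os.path.isdir(path) and not os.path.islink(path):
                todo.append(path)
            yield path


def walk_recursive(top="."):
    """One suspended generator frame per directory level."""
    try:
        names = os.listdir(top)
    except OSError:
        return
    for name in names:
        path = os.path.join(top, name)
        yield path
        if os.path.isdir(path) and not os.path.islink(path):
            # Each result is re-yielded once per enclosing level.
            for sub in walk_recursive(path):
                yield sub
```

Both visit the same set of names; they differ in order (the to-do list
with pop(0) goes breadth first, the recursion goes depth first) and in
how much state is alive at any moment.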