Limitation of os.walk

Tim Chase python.list at tim.thechases.com
Tue May 11 22:38:02 EDT 2010


On 05/11/2010 09:07 PM, Terry Reedy wrote:
>> PS: I never understood why os.walk does not support hooks for key
>> events during such a tree traversal.
>
> Either 1) it is intentionally simple, with the expectation that people
> would write their own code for more complicated uses or 2) no one has
> submitted a 'full-featured' version or 3) both.
>
> If os.walk were rewritten, it should be as an iterator (generator).
> Directory entry and exit functions could still be added as params.

It *is* an iterator/generator.  However, I suspect you mean that 
it should read the dirs/files lazily, one entry at a time, instead 
of building the whole list with listdir(), as was discussed on 
comp.lang.python a few months back.
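
For illustration, a minimal sketch of what lazy iteration over 
directory entries might look like. It assumes Python 3.6 or later, 
where os.scandir() exists and can be used as a context manager (it 
was not available when this thread was written). The name iter_tree 
is hypothetical.

```python
import os

def iter_tree(top):
    """Yield (path, is_dir) pairs lazily, one directory entry at a time.

    Hypothetical helper: uses os.scandir() rather than os.listdir(),
    so a directory's entries are not collected into a list before
    the first one is yielded.
    """
    with os.scandir(top) as entries:
        for entry in entries:
            is_dir = entry.is_dir(follow_symlinks=False)
            yield entry.path, is_dir
            if is_dir:
                # Recurse while the parent's iterator is still open.
                yield from iter_tree(entry.path)
```

One trade-off of this shape: each level of recursion keeps its 
scandir() iterator open until its subtree has been consumed.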

The patch to os.py would be fairly simple, something like
--------------------------------------------
--- /usr/lib/python2.5/os.py
+++ ~/tmp/os.py
@@ -220,7 +220,7 @@

  __all__.extend(["makedirs", "removedirs", "renames"])

-def walk(top, topdown=True, onerror=None):
+def walk(top, topdown=True, onerror=None, pre=None, post=None):
      """Directory tree generator.

      For each directory in the directory tree rooted at top (including top
@@ -296,15 +296,19 @@
          else:
              nondirs.append(name)

+    if pre is not None:
+        top, dirs, nondirs = pre(top, dirs, nondirs)
      if topdown:
          yield top, dirs, nondirs
      for name in dirs:
          path = join(top, name)
          if not islink(path):
-            for x in walk(path, topdown, onerror):
+            for x in walk(path, topdown, onerror, pre=pre, post=post):
                  yield x
      if not topdown:
          yield top, dirs, nondirs
+    if post is not None:
+        post(top, dirs, nondirs)

--------------------------------------------

which would allow you to do things like

   def pre(top, dirs, nondirs):
       dirs = [d for d in dirs
               if d not in ('RCS', '.svn', '.hg', '.git')]
       return top, dirs, nondirs
   def post(top, dirs, nondirs):
       complex_process(top)
   for top, dirs, nondirs in my.walk(PATH, pre=pre, post=post):
       do_stuff(...)
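
To try the idea without patching os.py, here is a stand-alone sketch 
of the same logic. It assumes Python 3 (for "yield from"); the name 
walk_hooks is hypothetical, and unlike the real os.walk it simply 
skips directories it cannot list instead of calling onerror.

```python
import os
from os.path import isdir, islink, join

def walk_hooks(top, topdown=True, pre=None, post=None):
    """Like os.walk, plus optional pre/post callbacks per directory.

    Hypothetical stand-alone version of the patched walk(); errors
    from os.listdir() are silently skipped here for brevity.
    """
    try:
        names = os.listdir(top)
    except OSError:
        return
    dirs, nondirs = [], []
    for name in names:
        (dirs if isdir(join(top, name)) else nondirs).append(name)

    if pre is not None:
        # pre may filter or reorder; it must return the triple.
        top, dirs, nondirs = pre(top, dirs, nondirs)
    if topdown:
        yield top, dirs, nondirs
    for name in dirs:
        path = join(top, name)
        if not islink(path):
            yield from walk_hooks(path, topdown, pre, post)
    if not topdown:
        yield top, dirs, nondirs
    if post is not None:
        # Runs once the whole subtree under top has been yielded.
        post(top, dirs, nondirs)
```

Because this is a generator, post() for a directory only runs when 
the caller keeps iterating past that directory's last yield.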

I suspect that if I thought about it much longer, only one hook 
would really be needed, with the other accommodated by the 
"topdown" parameter.
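
A rough sketch of how far the unmodified os.walk already goes: with 
topdown=True, assigning to dirs in place prunes the traversal (the 
"pre" case), and with topdown=False each directory is yielded after 
its subdirectories (the "post" case). The helper names walk_pruned 
and walk_postorder are hypothetical.

```python
import os

SKIP = ('RCS', '.svn', '.hg', '.git')

def walk_pruned(path):
    """Top-down walk that skips version-control directories."""
    for top, dirs, nondirs in os.walk(path, topdown=True):
        # Slice assignment mutates the list os.walk recurses into.
        dirs[:] = [d for d in dirs if d not in SKIP]
        yield top, dirs, nondirs

def walk_postorder(path):
    """Bottom-up walk: a directory is yielded after its subdirectories."""
    for top, dirs, nondirs in os.walk(path, topdown=False):
        yield top, dirs, nondirs
```

Pruning via dirs only takes effect in top-down mode, so doing both 
in a single pass would still need something like the hooks above.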

-tkc





