[Python-Dev] os.path.walk() lacks 'depth first' option

Noah Spurrier noah@noah.org
Sun, 20 Apr 2003 01:20:57 -0700


Hello,

Recently I realized that there is no easy way to
walk a directory tree and rename each directory and file.
The standard os.path.walk() function does a breadth first walk:
it calls the visit function on a directory before it descends into
that directory's subdirectories.
This makes it hard to write scripts that modify directory names
as they walk the tree, because you need to visit subdirectories before
you rename their parents. What is really needed is a depth first walk.
For example, this naive code would not work with a breadth first walk:

     """Renames all directories and files to lower case."""
     import os

     def visit(arg, dirname, names):
         # The renames here happen before walk() descends, so walk() then
         # looks for subdirectories under their old names and skips them.
         for name in names:
             oldname = os.path.join(dirname, name)
             newname = os.path.join(dirname, name.lower())
             print oldname
             os.rename(oldname, newname)

     os.path.walk('.', visit, None)
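
For comparison, here is a minimal sketch of the same rename done in the order described above, with subdirectories visited before their parents. It is not written against os.path.walk(): it assumes a Python version that provides os.walk() with its topdown=False argument, and it uses Python 3 syntax. The function name lowercase_tree is hypothetical.

```python
import os

def lowercase_tree(top):
    """Rename every file and directory below top to lower case (sketch)."""
    # topdown=False yields each directory after all of its subdirectories,
    # so the contents of a directory are renamed before the directory itself.
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        for name in dirnames + filenames:
            lowered = name.lower()
            if lowered != name:
                # Assumption: no existing entry already has the lowered name.
                os.rename(os.path.join(dirpath, name),
                          os.path.join(dirpath, lowered))
```

The top directory itself is left alone; only the entries below it are renamed.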

On my system, os.path.walk is defined in the library source posixpath.py.
A comment in that file mentions that the visit function may
modify the filenames list to impose a different order of visiting,
but as far as I can tell this does not make a depth first walk possible.

Perhaps future versions of Python could include an option
to do a depth first walk instead of the default breadth first.
Modifying os.path.walk() to allow for optional depth first
walking is simple. I have attached a patch to posixpath.py
that demonstrates this. This adds an if conditional at
the beginning and end of the walk() function.
I have not checked whether other platforms share the posixpath.py
module for the walk() function, but if there
is interest then I'd be happy to cross-reference this.
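
To illustrate the idea outside of posixpath.py, here is a minimal standalone sketch of a walk with an optional depthfirst flag. It is written in Python 3 syntax; walk_tree is a hypothetical name, and this is only an approximation of the library function, not the library code itself. Note that the flag is passed along in the recursive call so that it applies at every level of the tree.

```python
import os

def walk_tree(top, func, arg, depthfirst=False):
    """Call func(arg, dirname, names) for each directory under top (sketch)."""
    try:
        names = os.listdir(top)
    except OSError:
        return
    if not depthfirst:
        func(arg, top, names)      # visit the parent before its children
    for name in names:
        path = os.path.join(top, name)
        # Recurse into real directories only; symbolic links are not followed.
        if os.path.isdir(path) and not os.path.islink(path):
            walk_tree(path, func, arg, depthfirst)
    if depthfirst:
        func(arg, top, names)      # visit the parent after its children
```

With depthfirst=True, the names list passed to func still holds the names read before the subdirectories were visited, so a visit function that renames entries inside dirname sees the same names it would in the default mode.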

Yours,
Noah

*** posixpath.py 2003-04-19 22:26:08.000000000 -0700
--- posixpath_walk_depthfirst.py 2003-04-19 22:12:48.000000000 -0700
***************
*** 259,265 ****
   # The func may modify the filenames list, to implement a filter,
   # or to impose a different order of visiting.

! def walk(top, func, arg):
       """Directory tree walk with callback function.

       For each directory in the directory tree rooted at top (including top
--- 259,265 ----
   # The func may modify the filenames list, to implement a filter,
   # or to impose a different order of visiting.

! def walk(top, func, arg, depthfirst=False):
       """Directory tree walk with callback function.

       For each directory in the directory tree rooted at top (including top
***************
*** 272,284 ****
       order of visiting.  No semantics are defined for, or required of, arg,
       beyond that arg is always passed to func.  It can be used, e.g., to pass
       a filename pattern, or a mutable object designed to accumulate
!     statistics.  Passing None for arg is common."""

       try:
           names = os.listdir(top)
       except os.error:
           return
!     func(arg, top, names)
       for name in names:
           name = join(top, name)
           try:
--- 272,287 ----
       order of visiting.  No semantics are defined for, or required of, arg,
       beyond that arg is always passed to func.  It can be used, e.g., to pass
       a filename pattern, or a mutable object designed to accumulate
!     statistics.  Passing None for arg is common. The optional depthfirst
!     argument may be set to True to walk the directory tree depth first.
!     The default is False (walk breadth first)."""

       try:
           names = os.listdir(top)
       except os.error:
           return
!     if not depthfirst:
!         func(arg, top, names)
       for name in names:
           name = join(top, name)
           try:
***************
*** 287,293 ****
               continue
           if stat.S_ISDIR(st.st_mode):
!             walk(name, func, arg)
!

   # Expand paths beginning with '~' or '~user'.
   # '~' means $HOME; '~user' means that user's home directory.
--- 290,297 ----
               continue
           if stat.S_ISDIR(st.st_mode):
!             walk(name, func, arg, depthfirst)
!     if depthfirst:
!         func(arg, top, names)

   # Expand paths beginning with '~' or '~user'.
   # '~' means $HOME; '~user' means that user's home directory.
***************
*** 416,420 ****
       return filename

   supports_unicode_filenames = False
-
-
--- 420,422 ----