pythonic tree-walking idioms

Terry Reedy reedy37 at home.com
Thu May 17 12:53:19 EDT 2001


Quoting Fredrik Lundh (in part):
> os.path.walk is pretty unpythonic, imo [1]
...
> a third alternative is to forget about walk, and use a directory
> walker object instead.  here's an example from the eff-bot guide
> to the standard library:

This contrast brought me to the following realization: there are two
fundamentally different approaches to applying code blocks to items in a
structure.  One, as with os.path.walk, is to package the code block into a
function and pass it down to the structure or to a function that explores
the structure.  In this approach, the current code level never sees the
items of the structure.
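
A minimal sketch of this "down" approach in present-day Python may help.
walk_down() and record() are names invented here for illustration; the
sketch only imitates the os.path.walk calling convention and is not the
library's code.  It builds a throwaway tree so that it can run anywhere.

```python
import os
import tempfile

def walk_down(top, visit, arg):
    # Hypothetical stand-in for the os.path.walk calling convention:
    # the caller's function is handed down and invoked once per directory.
    names = sorted(os.listdir(top))   # sorted only to make the order predictable
    visit(arg, top, names)
    for name in names:
        path = os.path.join(top, name)
        if os.path.isdir(path) and not os.path.islink(path):
            walk_down(path, visit, arg)

def record(arg, dirname, names):
    # The "code block", wrapped as a function so it can be passed down.
    arg.append(os.path.basename(dirname))

# Build a tiny throwaway tree: root/a/x.txt and root/b/
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "a"))
os.makedirs(os.path.join(root, "b"))
open(os.path.join(root, "a", "x.txt"), "w").close()

visited = []
walk_down(root, record, visited)
# visited now holds the base names of root, "a" and "b", in that order
```

The calling code never loops over anything itself; it only learns what was
visited through the arg object it passed down.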

> for file in DirectoryWalker("."):
>     # do whatever to file

The other approach, as with Lundh's DirectoryWalker, is to bring structure
items up to the code block, which then need not be wrapped as a function.

> 1) a function that I have to look up in the reference documentation
> everytime I want to use it cannot possibly be called pythonic ;-)

DirectoryWalker() views the filesystem as a directory/file tree with files
as leaf nodes and interior directory nodes being unimportant except as
means to get to the files.  It might better have been called
FileLeafWalker().

From the directory/file-tree view,
os.path.walk(dir, walkfunc(arg, dir, filelist), arg)
is an odd down-up hybrid.  It passes the function down only as far as the
directories and then requires that walkfunc() contain a loop to bring the
items in the filelist up to the encapsulated code block.  However, walk()
is better understood as viewing the filesystem as a directory-only tree
with each node having a filelist as its value.  The arguments to walkfunc
are arg, node, and nodevalue.  Within this view, walk() strictly implements
a down approach, and the corresponding up approach would be to write a
DirOnlyWalker() used as follows:

for dirname, filelist in DirOnlyWalker("."):
    # do whatever to dirname and filelist

Understanding how walk() views the filesystem should make it easier to use
and remember.
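
One way such a DirOnlyWalker() might look, as a sketch only: it assumes a
Python with generators and "yield from", and it assumes that filelist
means the raw directory listing (subdirectory names included).  Nothing
here is an existing library class; the names and the sample tree are
illustrative.

```python
import os
import tempfile

def DirOnlyWalker(top):
    # Hypothetical generator: yields (dirname, filelist) for top and for
    # every directory below it; filelist is the raw directory listing.
    filelist = sorted(os.listdir(top))   # sorted for a predictable order
    yield top, filelist
    for name in filelist:
        path = os.path.join(top, name)
        if os.path.isdir(path) and not os.path.islink(path):
            yield from DirOnlyWalker(path)

# Build a tiny throwaway tree: root/sub/one.txt and root/sub/two.txt
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
for fname in ("one.txt", "two.txt"):
    open(os.path.join(root, "sub", fname), "w").close()

# The "up" approach: directory nodes arrive at an ordinary for loop.
listing = {}
for dirname, filelist in DirOnlyWalker(root):
    listing[os.path.relpath(dirname, root)] = filelist
```

Here the loop body is plain inline code, and each iteration receives one
directory node together with its filelist value.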

Whichever way (down or up) the code block is applied to directory nodes and
their filelist values, it may toss away the structural information by
decomposing the list and simply visiting each file one at a time, as with a
filenode code block.  But it does not have to.  It can instead operate on
filelist as a value in itself.  A simple example would be to call
len(filelist) either to print it or to arg.append((dirname, len(filelist))).
File-at-a-time applications cannot do that without laboriously
reconstructing the discarded structural information.  So DirectoryWalker is
not a complete alternative to os.path.walk.
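
The difference can be illustrated with made-up data.  The
(dirname, filelist) pairs below are invented for the example and stand in
for what a directory-level walk would supply; no real filesystem is
touched.

```python
import os
from collections import Counter

# Hypothetical (dirname, filelist) pairs from a directory-level walk.
pairs = [
    ("proj", ["README", "setup.py"]),
    (os.path.join("proj", "src"), ["a.py", "b.py", "c.py"]),
]

# Directory view: the per-directory count is available directly.
counts = [(dirname, len(filelist)) for dirname, filelist in pairs]

# File view: only full paths arrive, one at a time.
paths = [os.path.join(d, f) for d, filelist in pairs for f in filelist]

# Recovering the counts means regrouping the paths by parent directory.
rebuilt = Counter(os.path.dirname(p) for p in paths)
```

The regrouping step is the "laborious reconstruction", and even it is
lossy: a directory with no files contributes no paths, so it would not
appear in rebuilt at all.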

It might be nice to have a function/iterator/walker for both views of the
filesystem so there is no need to press one into the use best served by the
other.  Given that the author of os.path only gave us one, he arguably made
the right choice since it is much easier to toss information than to
reconstruct it.  However, since most people mostly want to visit files
rather than directories with filelists, some confusion is understandable.

Terry J. Reedy





