Hack with os.walk()

Tim Peters tim.peters at gmail.com
Sat Feb 12 19:05:28 EST 2005


[Frans Englich]
...
> My problem, AFAICT, with using os.walk() the usual way, is that in order to
> construct the /hierarchial/ XML document, I need to be aware of the directory
> depth, and a recursive function handles that nicely; os.walk() simply
> concentrates on figuring out paths to all files in a directory, AFAICT.

walk() does a pre- (default) or post-order traversal of a directory
tree; that's it.

> I guess I could solve it with using os.walk() in a traditional way, by somehow
> pushing libxml2 nodes on a stack, after keeping track of the directory levels
> etc(string parsing..). Or, one could write ones own recursive directory
> parser..
>
> My question is: what is the least ugly? What is the /proper/ solution for my
> problem? How would you write it in the cleanest way?

Well, it's enough just to keep a dict mapping directory path to
directory object.  That doesn't require parsing anything.  For
example, here are some dumb base classes (not XML -- the XML part
distracts from the essence for me):

class HasPath:
    def __init__(self, path):
        self.path = path

    def __lt__(self, other):
        return self.path < other.path

class Directory(HasPath):
    def __init__(self, path):
        HasPath.__init__(self, path)
        self.files = [] # list of File objects
        self.subdirs =  [] # list of sub-Directory objects

class File(HasPath):
    pass

Then a generic build_tree function.  It accepts optional arguments to
specify factory functions for creating directory and file objects. 
The guts require that a directory object have .path, .files and
.subdirs attributes; you probably want to season to taste, to require
XML-ish spellings.  It returns a directory-ish object, representing
the path passed to it:

def build_tree(path, Directory=Directory, File=File):
    top = Directory(path)
    path2dir = {path: top}
    for root, dirs, files in os.walk(path):
        dirobj = path2dir[root]
        for name in dirs:
            subdirobj = Directory(os.path.join(root, name))
            path2dir[subdirobj.path] = subdirobj
            dirobj.subdirs.append(subdirobj)
        for name in files:
            dirobj.files.append(File(os.path.join(root, name)))
    return top

That looks short and sweet to me.  It could be made shorter, but not
without losing clarity to my eyes.

As a concrete example, here's a subclass of Directory (to be passed as
the Directory factory to build_tree) with a display() method that
dumps a sensibly indented listing of the tree; note that sort() gets
the intended order via the inherited HasPath.__lt__() implementation:

class ListingDirectory(Directory):
    # Display directory tree as a tree, with 4-space indents.
    # Files listed before subdirectories, both in alphabetical order.
    # Full path displayed for topmost directory, base names for all
    # other entries.  Directories listed with trailing os.sep.
    def display(self, level=0):
        self.subdirs.sort()
        self.files.sort()
        name = self.path
        if level:
            name = os.path.basename(name)
        print "%s%s%s" % (' ' * level, name, os.sep)
        for f in self.files:
            print "%s%s" % (' ' * (level + 4),
                            os.path.basename(f.path))
        for d in self.subdirs:
            d.display(level + 4)



More information about the Python-list mailing list