Hack with os.walk()
Tim Peters
tim.peters at gmail.com
Sat Feb 12 19:05:28 EST 2005
[Frans Englich]
...
> My problem, AFAICT, with using os.walk() the usual way, is that in order to
> construct the /hierarchial/ XML document, I need to be aware of the directory
> depth, and a recursive function handles that nicely; os.walk() simply
> concentrates on figuring out paths to all files in a directory, AFAICT.
walk() does a pre- (default) or post-order traversal of a directory
tree; that's it.
> I guess I could solve it with using os.walk() in a traditional way, by somehow
> pushing libxml2 nodes on a stack, after keeping track of the directory levels
> etc(string parsing..). Or, one could write ones own recursive directory
> parser..
>
> My question is: what is the least ugly? What is the /proper/ solution for my
> problem? How would you write it in the cleanest way?
Well, it's enough just to keep a dict mapping directory path to
directory object. That doesn't require parsing anything. For
example, here are some dumb base classes (not XML -- the XML part
distracts from the essence for me):
class HasPath:
def __init__(self, path):
self.path = path
def __lt__(self, other):
return self.path < other.path
class Directory(HasPath):
def __init__(self, path):
HasPath.__init__(self, path)
self.files = [] # list of File objects
self.subdirs = [] # list of sub-Directory objects
class File(HasPath):
pass
Then a generic build_tree function. It accepts optional arguments to
specify factory functions for creating directory and file objects.
The guts require that a directory object have .path, .files and
.subdirs attributes; you probably want to season to taste, to require
XML-ish spellings. It returns a directory-ish object, representing
the path passed to it:
def build_tree(path, Directory=Directory, File=File):
top = Directory(path)
path2dir = {path: top}
for root, dirs, files in os.walk(path):
dirobj = path2dir[root]
for name in dirs:
subdirobj = Directory(os.path.join(root, name))
path2dir[subdirobj.path] = subdirobj
dirobj.subdirs.append(subdirobj)
for name in files:
dirobj.files.append(File(os.path.join(root, name)))
return top
That looks short and sweet to me. It could be made shorter, but not
without losing clarity to my eyes.
As a concrete example, here's a subclass of Directory (to be passed as
the Directory factory to build_tree) with a display() method that
dumps a sensibly indented listing of the tree; note that sort() gets
the intended order via the inherited HasPath.__lt__() implementation:
class ListingDirectory(Directory):
# Display directory tree as a tree, with 4-space indents.
# Files listed before subdirectories, both in alphabetical order.
# Full path displayed for topmost directory, base names for all
# other entries. Directories listed with trailing os.sep.
def display(self, level=0):
self.subdirs.sort()
self.files.sort()
name = self.path
if level:
name = os.path.basename(name)
print "%s%s%s" % (' ' * level, name, os.sep)
for f in self.files:
print "%s%s" % (' ' * (level + 4),
os.path.basename(f.path))
for d in self.subdirs:
d.display(level + 4)
More information about the Python-list
mailing list