Abstracting algorithms (graph depth first search)

Thu May 8 16:45:19 EDT 2003

I've had an interest in graph algorithms for a long time, mainly
because graphs are highly useful data structures with interesting
behaviour, simple enough to code directly in Python, but complex
enough to *not* be built in, or part of the standard library.

I've been trying to work out a good way of encapsulating the standard
depth first search algorithm in a library module (and other graph
traversal algorithms). Using Guido's representation of graphs (which
is basically the standard adjacency list representation), it's easy to
code DFS:

# Build a random graph, as a dictionary mapping vertex -> list of successors
# (as in Guido's article)

g = {}
vertices = range(10)
for v in vertices:
    g[v] = []

# Add a random set of edges, encoding edge v->u as 10*v+u
from random import shuffle
edges = range(100)
shuffle(edges)
for n in edges[:20]: # 20 edges should be enough...
    v1, v2 = divmod(n, 10)
    g[v1].append(v2)

# Depth first search, using the standard recursive algorithm
def dfs(g):
    seen = {}
    def visit(g, v):
        # Print v here to get preorder
        print v
        seen[v] = 1
        for u in g[v]:
            if u not in seen:
                visit(g, u)
        # Print v here to get postorder
        # print v

    for v in g:
        if v not in seen:
            visit(g, v)

# Print the graph and the DFS results
for v in g:
    print v, "->", ", ".join([str(u) for u in g[v]])

print

dfs(g)

However, this function *only* prints the vertices in preorder. To
print in postorder, you need to modify the algorithm to move the print
statement as indicated. To put the results in a list, you need to pass
the list as a parameter and change the print to append. To return the
results as an iterator, you need to add "yield" statements, and
replace calls to visit with

    for v in visit(...):
        yield v

And many graph algorithms make even more subtle changes to the basic
DFS algorithm.

But the DFS pattern is constant in all this (and just a little bit
subtle, so I'd prefer not to recode it each time). It would be nice to
be able to encapsulate it somehow.

The "obvious" way is to pass in a callback function - but you'd need
2, one for preorder and one for postorder. And maybe others if extra
hooks are needed, as they are for other algorithms. And the whole
thing gets slowed down by extra tests (if callback is not None...).

Or I could write a DFS class, with dummy methods for the hooks (the
default implementation is "pass"). Users then subclass to add their
own behaviour. But that seems overcomplicated, and probably also adds
overhead due to the class mechanism.

Can anyone suggest a good way of encapsulating DFS that's better than
either of these ideas? Or even good ways of implementing either of the
above 2 methods which work well in practice?

Thanks for any pointers,
Paul.
-- 
This signature intentionally left blank