Arranging a dependency tree

Paul Rubin http
Thu Aug 12 04:19:56 EDT 2004


kyleroot at gmail.com (Kyle Root) writes:
> I've created a jumble of python modules, in each one is a tuple that
> goes something like,
> 
> deps = ("file10","file7","file3").
> 
> And some have no dependencies at all.  I have one file called "start",
> and the whole thing makes a tree of dependencies as each file has deps
> and their deps might have deps, and so on.  What I'm trying to do is
> "straighten" them or put them in order as if they were to be
> installed.  Basically I'm trying to replicate the way portage does it. :)

Suppose that deps list is for file5.  I'm reading your example to mean
file5 depends on file10, file7, and file3, and not the other way around.

As others have mentioned, what you want to do is called topological
sorting.  Since nobody's described the algorithm, it goes something
like this:

   1) Make a dictionary indexed by filename, whose entry deps[f] will
   be the set of files f depends on.  In the above example,
   deps["file5"] = Set(["file10", "file7", "file3"]).  You can
   also use a dict instead of a Set in the obvious way.

   1) Make another dictionary indexed by filename, whose entry d[f] for
   filename f will be a list of the files that depend on f.  For
   example, if the deps for file5 are ("file10","file7","file3"), then
   you'll put "file5" into d["file10"], d["file7"], and d["file3"].
   You can straightforwardly build this dict when you scan the deps lists.

   2) Also when you scan the deps lists, make a list U of all the files
   that don't depend on anything. 

   3. while U is not empty, do the following:
       choose an f from U
       output f
       remove f from U
       for g in d[f]:  # for each file depending on f
          remove f from deps[g]           
          if deps[g] is now empty, add g to U

If there are no circular dependencies and if I haven't goofed up (it's
late here), then the above scheme should output all the filenames in
topologically sorted order, in running time linear in the # of
filenames plus the # of dependencies.  

You can probably find a clearer explanation in any intro CS text with
a chapter about graph algorithms.



More information about the Python-list mailing list