Standard graph API?

Fri Sep 10 19:13:34 EDT 2004

Magnus Lie Hetland wrote:
 > Yup. Just go ahead and add stuff.

I tried.  I can view the page and get to the edit page
but I can neither save it nor preview it.  My browser
times out.  Here are my two sets of comments

Brian Kelley's "frowns" packages
(http://staffa.wi.mit.edu/people/kelley/ ) has graph code appropriate
for a chemical informatics library, including a wrapper to the VFlib
isomorphism graph library.  Andrew Dalke has some clique finding
code at http://starship.python.net/crew/dalke/clique/ .

   and

(Andrew Dalke speaking here.)

It's pretty uniformly agreed that there is no standard graph
representation, if for no other reason than that my nodes are called
"atoms" and my edges are called "bonds" and I don't care about neither
multigraphs nor directed edges.  ;)

Magnus suggests that adapters is the right approach.  My concern with
that is two-fold.  First, the adapter layer will cause a non-trivial
overhead -- at least an extra function call for every graph-specific
action.  Second, the algorithm can't be specialized for the graph at
hand.  (Eg, in a DFS the algorithm may generate events for node enter,
exit, and backtrack, while I may only need one of those.)

My thought is to use some sort of templating system.  If done well the
actual code might even be Python, with a naming scheme to allow
replacements appropriate to the data structure (eg, to use
"weights[edge]" or "edge.size" for a flow algorithm) and remove code
blocks not needed for a given algorithm (eg, don't include a callback
code if it isn't used.)

This would depend on having standard ideas of what you can do with a
graph.  Some include "iterate over all nodes", "compare two nodes for
equivalence", "get all edges for a given node."

Now responding to your post

> Your idea is interesting -- but quite a way from what I had
> envisioned... Would there be a way to combine the ideas? I.e. define
> an (abstract) interface to standard graph functionality and in
> addition have this sort of template stuff? It might be a bit far
> fetched, though, as it seems your template idea obviates the need for
> a standard interface.

As I outlined above, I believe it should be possible to build a
templating system on top of working Python code.  The code
uses the 'standard' API and the templating system gives a way
to conform the algorithm to the graph at hand.

I think many people will just use the standard scheme,
but that enough people have different needs (like chemistry)
that it's worth the effort to be malleable that way.

> Well -- I was rather aiming for a definition of a graph protocol
> (similar to e.g. the sequence protocol or the mapping protocol), and
> that would entail fixing the methods and attributes involved.

Ahh, then we are talking about different but related things.
I don't think it's possible to have a graph protocol like
mention which is good enough for all graph systems.  Eg,
does the weight for a max flow calculation come from a
property of the edge, from a table passed into the function,
or from a callable?  If a table, is it indexed by id(node),
by the node itself (is the node hashable?) or by some unique
id given to all nodes?

I know there are lessons to learn from both LEDA and
the Boost Graph Library.  Sadly, I don't know them.

> Maybe. I guess it depends on how you implement things. For example, if
> it's only a matter of renaming methods, that wouldn't entail any
> performance loss at all (you could just have another attribute bound
> to the same method).

I do not want that.  Which will users of my library use,
"edges" or "bonds"?  This sort of choice is bad.

> But coding this doesn't seem like much fun.
> 
> My hope was that I could write new graph algorithms so they looked
> somewhat pretty and readable. Having to write them in some new
> template langage doesn't seem very tempting.
> 
> (As I said, stock algorithms aren't of that much interest to me.)

What I need to do is take a look at, say, David Eppstein's
set of algorithms and see if my templating idea can work
without being too onerous.

> I don't see how this has anything to do with the issue at hand,
> really.
> 
> The person who implemented the DFS could deal with this issue (by
> parametrizing it) or something. (*Or* using adaptation, for that
> matter.) I'm sure there are many other ways of dealing with this...
> IMO this is orthogonal. Whether or not you can tell the DFS how you
> access the neighbors of a node isn't directly related to what you want
> the DFS to do...

Mmm, in some way you are right.  You've been talking
about adapting the data structure to work with the
algorithm.  I've been talking about adapting the algorithm
to deal with the data structure.

>>I found that 'is' testing for my graphs is much better.
> 
> 
> The problem is that it is impossible to override.

That's why it's faster.  :)

> If I were to use some special hardware as the source of my graph
> (which is a realistic example for me) I might have to create node
> wrappers on the fly. Unless I use the Singleton pattern (the Borg
> pattern wouldn't work) I could easily end up with equal but
> non-identical nodes, and no way of overriding this.

I had experience using a C library for the underlying
graph data.  It returned opaque integers for object
references.  Every time I got one I wrapped it inside
a new Python object.  So I could have different Python
objects for the same underlying object.

It turned out the some algorithms used so many "have
I seen this atom/bond before?" tests that the switch
from == to 'is' made a big difference.

On the other hand, it did mean I needed to convert from
my library's natural style of graph into the form that
allowed 'is' tests.

				Andrew
				dalke at dalkescientific.com