Graph library for Python

Mon Dec 7 18:28:49 EST 2009

geremy condra wrote:
> On Mon, Dec 7, 2009 at 2:51 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>> geremy condra wrote:
>>> How interested are you in a C port of graphine? I haven't had
>>> any specific requests for it, but if its something you need I
>>> can shuffle it towards the top of the to do pile.
>>
>> There are two main reasons for a C implementation:
>>
>>  1. performance
>>
>>  2. memory footprint
>>
>> These are important when using graphs with lots of nodes or
>> when using lots of graphs or operations on graphs in tight
>> loops.
>>
>> However, to get such a new type into the core, you need to start
>> a PEP process and hash out the API some more, esp. with respect
>> to integration with the other core types, such as dictionaries
>> and lists of tuples for both creation and as alternative
>> representation.
> 
> I'm happy to start the PEP process, but will need some
> guidance, as I've never written a PEP before.

Some pointers to get you started:

PEPs in general:
http://www.python.org/dev/peps/pep-0001/
http://www.python.org/dev/peps/pep-0009/

Adding modules to the standard lib:
http://www.python.org/dev/peps/pep-0002/

(though, in reality, you'd probably only be patching the
collections.py module, I guess)

>> Some other comments (in no particular order):
>>
>>  * the "name" default of using id(node) is not ideal, since
>>   id(node) will return the address of the object and that
>>   can be subject to reuse within the lifetime of the process;
>>   using a global (thread-safe) counter would be safer
> 
> Good idea. I'm going to start a branch on the repo to
> handle the changes you mention.
> 
>>  * Graph.__init__ should be able to take a list or set
>>   of nodes and edges as initializer
> 
> The format of this will need to be thought all the way
> through before being implemented. To date, we haven't
> come up with anything completely satisfactory, but
> AFAIK everybody involved is still open to suggestions
> on this.

I wasn't thinking of anything clever :-) ...

g = Graph(
      [Node("a"), Node("b"), Node("c")],
      [Edge(Node("a"), Node("b"), "ab"),
       Edge(Node("a"), Node("c"), "ac"),
       Edge(Node("b"), Node("c"), "bc"),
      ])

The main motivation here is to get lists, sets and dicts
play nice together.

>>  * Graph.__setitem__ could be mapped to .add_node() for
>>   convenience
> 
> This may be a question of playing around with it. ATM I'm
> not sold on the idea, but I'll implement it and see how it
> turns out in practice.

Thinking about it some more, I agree, it's not all that useful.

>>  * Graph.__length__ could be mapped to .size() for
>>   convenience
> 
> We decided not to do this due to the ambiguity between
> whether .size() or .order() was the intended operation,
> and looking back I'm not sure that was entirely unjustified.
> Do you see there being any confusion on that score?

There is an ambiguity here, indeed. My thinking was that
the edges define the graph and can be mapped to a dictionary,
e.g.

d = {"ab": ("a", "b"),
     "ac": ("a", "c"),
     "bc": ("b", "c")}

len(d) == 3
len(g) == 3

A graph without edges is also what you typically call an empty
graph, ie.

if not g:
    print 'empty'

http://mathworld.wolfram.com/EmptyGraph.html

Still, perhaps it's better to just not go down this route.

>>  * Graph.__iter__ could be mapped to an iterator using
>>   the fastest traversal method for the graph nodes
>>   (ie. order does not matter, it's only important that
>>   all nodes are found as fast as possible)
> 
> Again, it seems ambiguous as to whether nodes or
> edges are the intended target here, and while the
> API can obviously dictate that, it seems a bit like
> a case of "in the face of ambiguity, refuse the
> temptation to guess" to me.

Right, but sometimes "practicalty beats purity" ;-) We had
the same situation for dictionaries and then decided that
iteration over keys would be more natural than iterating over
items (key, value) or values.

It's also important to note that:

	for n in g: print n

and

	n in g

match up in terms of semantics.

Since n in g already uses the "node in graph" approach,
the for-loop should use the same logic.

>>  * Graph could also benefit from some bulk update methods,
>>   e.g. to update weights on all edges or nodes by passing
>>   in a dictionary mapping names to attribute values
> 
> Sounds good. Again, the format will need some careful
> thought, but you're right that this would improve it
> greatly.

This is only an optimization, but could lead to some great
performance improvements by avoiding function call overhead.

>>  * Graph could benefit from some bulk query methods,
>>   such as .get_node_attributes() and .add_edges().
> 
> I'm not sure how the first is different from the existing
> .data, what did you have in mind?

The second one was a typo. It should have been
.get_edge_attributes().

In both cases I was thinking of being able to quickly
extract a number of node/edge attributes, e.g.

>>> g.get_edge_attributes('weight', 'color')
{"ab": {"weight": 3, "color": "blue"},
 "ac": {"weight": 2, "color": "green"},
 "bc": {"weight": 1, "color": "red"}}

Again, the idea is to reduce call overhead and later
on be able to move these lookups to C.

>>  * Node.__getitem__ could be mapped to .data.__getitem__
>>   (following the approach taken by ElementTree); same
>>   for Edge.__getitem__
> 
>>  * Node.__setitem__ could be mapped to .data.__setitem__
>>   (following the approach taken by ElementTree); same
>>   for Edge.__setitem__
> 
> .data is just a property that returns a dictionary of non
> private members right now, so if you're wanting to just say
> node.this = 'stuff', you can already do that. Or am I
> misreading your intent?

Right, but with attributes you are restricted to Python
identifiers, e.g. keywords (e.g. "from", "pass") are not allowed.
The above approach bypasses this restriction.

>>  * I'd add a DirectedEdge alias for Edge and a new
>>   UndirectedEdge as subclass which has is_directed=False;
>>   makes code more readable
> 
> Sounds good, and thanks for the advice!

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Dec 07 2009)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/