Graph library for Python

Mon Dec 7 23:28:05 EST 2009

On Mon, Dec 7, 2009 at 6:28 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> geremy condra wrote:
>> On Mon, Dec 7, 2009 at 2:51 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>> geremy condra wrote:
>>>> How interested are you in a C port of graphine? I haven't had
>>>> any specific requests for it, but if its something you need I
>>>> can shuffle it towards the top of the to do pile.
>>>
>>> There are two main reasons for a C implementation:
>>>
>>>  1. performance
>>>
>>>  2. memory footprint
>>>
>>> These are important when using graphs with lots of nodes or
>>> when using lots of graphs or operations on graphs in tight
>>> loops.
>>>
>>> However, to get such a new type into the core, you need to start
>>> a PEP process and hash out the API some more, esp. with respect
>>> to integration with the other core types, such as dictionaries
>>> and lists of tuples for both creation and as alternative
>>> representation.
>>
>> I'm happy to start the PEP process, but will need some
>> guidance, as I've never written a PEP before.
>
> Some pointers to get you started:
>
> PEPs in general:
> http://www.python.org/dev/peps/pep-0001/
> http://www.python.org/dev/peps/pep-0009/
>
> Adding modules to the standard lib:
> http://www.python.org/dev/peps/pep-0002/
>
> (though, in reality, you'd probably only be patching the
> collections.py module, I guess)

Thanks, I'll go over these a little later.

>>> Some other comments (in no particular order):
>>>
>>>  * the "name" default of using id(node) is not ideal, since
>>>   id(node) will return the address of the object and that
>>>   can be subject to reuse within the lifetime of the process;
>>>   using a global (thread-safe) counter would be safer
>>
>> Good idea. I'm going to start a branch on the repo to
>> handle the changes you mention.
>>
>>>  * Graph.__init__ should be able to take a list or set
>>>   of nodes and edges as initializer
>>
>> The format of this will need to be thought all the way
>> through before being implemented. To date, we haven't
>> come up with anything completely satisfactory, but
>> AFAIK everybody involved is still open to suggestions
>> on this.
>
> I wasn't thinking of anything clever :-) ...
>
> g = Graph(
>      [Node("a"), Node("b"), Node("c")],
>      [Edge(Node("a"), Node("b"), "ab"),
>       Edge(Node("a"), Node("c"), "ac"),
>       Edge(Node("b"), Node("c"), "bc"),
>      ])
>
> The main motivation here is to get lists, sets and dicts
> play nice together.

Generally, we've tried to discourage people from instantiating
nodes and edges directly, in favor of having them controlled
through the graph. Maybe something along the lines of:

g = Graph(nodes=['a', 'b', 'c'], edges=[('a', 'b'), ('a', 'c'), ('b', 'c')])

?

>>>  * Graph.__setitem__ could be mapped to .add_node() for
>>>   convenience
>>
>> This may be a question of playing around with it. ATM I'm
>> not sold on the idea, but I'll implement it and see how it
>> turns out in practice.
>
> Thinking about it some more, I agree, it's not all that useful.
>
>>>  * Graph.__length__ could be mapped to .size() for
>>>   convenience
>>
>> We decided not to do this due to the ambiguity between
>> whether .size() or .order() was the intended operation,
>> and looking back I'm not sure that was entirely unjustified.
>> Do you see there being any confusion on that score?
>
> There is an ambiguity here, indeed. My thinking was that
> the edges define the graph and can be mapped to a dictionary,
> e.g.
>
> d = {"ab": ("a", "b"),
>     "ac": ("a", "c"),
>     "bc": ("b", "c")}
>
> len(d) == 3
> len(g) == 3
>
> A graph without edges is also what you typically call an empty
> graph, ie.
>
> if not g:
>    print 'empty'
>
> http://mathworld.wolfram.com/EmptyGraph.html
>
> Still, perhaps it's better to just not go down this route.

I'd rather avoid it if possible, since there are many
potential interpretations of this in different contexts.
Again, I'm open to to the idea, but not convinced.

>>>  * Graph.__iter__ could be mapped to an iterator using
>>>   the fastest traversal method for the graph nodes
>>>   (ie. order does not matter, it's only important that
>>>   all nodes are found as fast as possible)
>>
>> Again, it seems ambiguous as to whether nodes or
>> edges are the intended target here, and while the
>> API can obviously dictate that, it seems a bit like
>> a case of "in the face of ambiguity, refuse the
>> temptation to guess" to me.
>
> Right, but sometimes "practicalty beats purity" ;-) We had
> the same situation for dictionaries and then decided that
> iteration over keys would be more natural than iterating over
> items (key, value) or values.
>
> It's also important to note that:
>
>        for n in g: print n
>
> and
>
>        n in g
>
> match up in terms of semantics.
>
> Since n in g already uses the "node in graph" approach,
> the for-loop should use the same logic.

Beware, that doesn't just match nodes.

g = Graph()
g.add_node('a')
g.add_node('b')
g.add_edge('a', 'b', 'ab')
'ab' in g # returns true

>>>  * Graph could also benefit from some bulk update methods,
>>>   e.g. to update weights on all edges or nodes by passing
>>>   in a dictionary mapping names to attribute values
>>
>> Sounds good. Again, the format will need some careful
>> thought, but you're right that this would improve it
>> greatly.
>
> This is only an optimization, but could lead to some great
> performance improvements by avoiding function call overhead.

Agreed.

>>>  * Graph could benefit from some bulk query methods,
>>>   such as .get_node_attributes() and .add_edges().
>>
>> I'm not sure how the first is different from the existing
>> .data, what did you have in mind?
>
> The second one was a typo. It should have been
> .get_edge_attributes().
>
> In both cases I was thinking of being able to quickly
> extract a number of node/edge attributes, e.g.
>
>>>> g.get_edge_attributes('weight', 'color')
> {"ab": {"weight": 3, "color": "blue"},
>  "ac": {"weight": 2, "color": "green"},
>  "bc": {"weight": 1, "color": "red"}}
>
> Again, the idea is to reduce call overhead and later
> on be able to move these lookups to C.

Entirely doable, but I'm not sure I see the use case.
Would you mind providing a more realistic example?

>>>  * Node.__getitem__ could be mapped to .data.__getitem__
>>>   (following the approach taken by ElementTree); same
>>>   for Edge.__getitem__
>>
>>>  * Node.__setitem__ could be mapped to .data.__setitem__
>>>   (following the approach taken by ElementTree); same
>>>   for Edge.__setitem__
>>
>> .data is just a property that returns a dictionary of non
>> private members right now, so if you're wanting to just say
>> node.this = 'stuff', you can already do that. Or am I
>> misreading your intent?
>
> Right, but with attributes you are restricted to Python
> identifiers, e.g. keywords (e.g. "from", "pass") are not allowed.
> The above approach bypasses this restriction.

Hmm. Sounds like a plausible use case to me, although I'm
not sure its one that should be encouraged. The bigger
question in my mind is whether all attribute lookups should
have to pay the extra lookup cost to support a somewhat
narrow need. I'll definitely have to talk with the other devs
about this one, and run a little bit of timing besides.

Geremy Condra