Graph library for Python

Tue Dec 8 03:06:29 EST 2009

On Mon, Dec 7, 2009 at 11:28 PM, geremy condra <debatem1 at gmail.com> wrote:
> On Mon, Dec 7, 2009 at 6:28 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>> geremy condra wrote:
>>> On Mon, Dec 7, 2009 at 2:51 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>>> geremy condra wrote:
>>>>> How interested are you in a C port of graphine? I haven't had
>>>>> any specific requests for it, but if its something you need I
>>>>> can shuffle it towards the top of the to do pile.
>>>>
>>>> There are two main reasons for a C implementation:
>>>>
>>>>  1. performance
>>>>
>>>>  2. memory footprint
>>>>
>>>> These are important when using graphs with lots of nodes or
>>>> when using lots of graphs or operations on graphs in tight
>>>> loops.
>>>>
>>>> However, to get such a new type into the core, you need to start
>>>> a PEP process and hash out the API some more, esp. with respect
>>>> to integration with the other core types, such as dictionaries
>>>> and lists of tuples for both creation and as alternative
>>>> representation.
>>>
>>> I'm happy to start the PEP process, but will need some
>>> guidance, as I've never written a PEP before.
>>
>> Some pointers to get you started:
>>
>> PEPs in general:
>> http://www.python.org/dev/peps/pep-0001/
>> http://www.python.org/dev/peps/pep-0009/
>>
>> Adding modules to the standard lib:
>> http://www.python.org/dev/peps/pep-0002/
>>
>> (though, in reality, you'd probably only be patching the
>> collections.py module, I guess)
>
> Thanks, I'll go over these a little later.
>
>>>> Some other comments (in no particular order):
>>>>
>>>>  * the "name" default of using id(node) is not ideal, since
>>>>   id(node) will return the address of the object and that
>>>>   can be subject to reuse within the lifetime of the process;
>>>>   using a global (thread-safe) counter would be safer
>>>
>>> Good idea. I'm going to start a branch on the repo to
>>> handle the changes you mention.
>>>
>>>>  * Graph.__init__ should be able to take a list or set
>>>>   of nodes and edges as initializer
>>>
>>> The format of this will need to be thought all the way
>>> through before being implemented. To date, we haven't
>>> come up with anything completely satisfactory, but
>>> AFAIK everybody involved is still open to suggestions
>>> on this.
>>
>> I wasn't thinking of anything clever :-) ...
>>
>> g = Graph(
>>      [Node("a"), Node("b"), Node("c")],
>>      [Edge(Node("a"), Node("b"), "ab"),
>>       Edge(Node("a"), Node("c"), "ac"),
>>       Edge(Node("b"), Node("c"), "bc"),
>>      ])
>>
>> The main motivation here is to get lists, sets and dicts
>> play nice together.
>
> Generally, we've tried to discourage people from instantiating
> nodes and edges directly, in favor of having them controlled
> through the graph. Maybe something along the lines of:
>
> g = Graph(nodes=['a', 'b', 'c'], edges=[('a', 'b'), ('a', 'c'), ('b', 'c')])
>
> ?
>
>>>>  * Graph.__setitem__ could be mapped to .add_node() for
>>>>   convenience
>>>
>>> This may be a question of playing around with it. ATM I'm
>>> not sold on the idea, but I'll implement it and see how it
>>> turns out in practice.
>>
>> Thinking about it some more, I agree, it's not all that useful.
>>
>>>>  * Graph.__length__ could be mapped to .size() for
>>>>   convenience
>>>
>>> We decided not to do this due to the ambiguity between
>>> whether .size() or .order() was the intended operation,
>>> and looking back I'm not sure that was entirely unjustified.
>>> Do you see there being any confusion on that score?
>>
>> There is an ambiguity here, indeed. My thinking was that
>> the edges define the graph and can be mapped to a dictionary,
>> e.g.
>>
>> d = {"ab": ("a", "b"),
>>     "ac": ("a", "c"),
>>     "bc": ("b", "c")}
>>
>> len(d) == 3
>> len(g) == 3
>>
>> A graph without edges is also what you typically call an empty
>> graph, ie.
>>
>> if not g:
>>    print 'empty'
>>
>> http://mathworld.wolfram.com/EmptyGraph.html
>>
>> Still, perhaps it's better to just not go down this route.
>
> I'd rather avoid it if possible, since there are many
> potential interpretations of this in different contexts.
> Again, I'm open to to the idea, but not convinced.
>
>>>>  * Graph.__iter__ could be mapped to an iterator using
>>>>   the fastest traversal method for the graph nodes
>>>>   (ie. order does not matter, it's only important that
>>>>   all nodes are found as fast as possible)
>>>
>>> Again, it seems ambiguous as to whether nodes or
>>> edges are the intended target here, and while the
>>> API can obviously dictate that, it seems a bit like
>>> a case of "in the face of ambiguity, refuse the
>>> temptation to guess" to me.
>>
>> Right, but sometimes "practicalty beats purity" ;-) We had
>> the same situation for dictionaries and then decided that
>> iteration over keys would be more natural than iterating over
>> items (key, value) or values.
>>
>> It's also important to note that:
>>
>>        for n in g: print n
>>
>> and
>>
>>        n in g
>>
>> match up in terms of semantics.
>>
>> Since n in g already uses the "node in graph" approach,
>> the for-loop should use the same logic.
>
> Beware, that doesn't just match nodes.
>
> g = Graph()
> g.add_node('a')
> g.add_node('b')
> g.add_edge('a', 'b', 'ab')
> 'ab' in g # returns true
>
>>>>  * Graph could also benefit from some bulk update methods,
>>>>   e.g. to update weights on all edges or nodes by passing
>>>>   in a dictionary mapping names to attribute values
>>>
>>> Sounds good. Again, the format will need some careful
>>> thought, but you're right that this would improve it
>>> greatly.
>>
>> This is only an optimization, but could lead to some great
>> performance improvements by avoiding function call overhead.
>
> Agreed.
>
>>>>  * Graph could benefit from some bulk query methods,
>>>>   such as .get_node_attributes() and .add_edges().
>>>
>>> I'm not sure how the first is different from the existing
>>> .data, what did you have in mind?
>>
>> The second one was a typo. It should have been
>> .get_edge_attributes().
>>
>> In both cases I was thinking of being able to quickly
>> extract a number of node/edge attributes, e.g.
>>
>>>>> g.get_edge_attributes('weight', 'color')
>> {"ab": {"weight": 3, "color": "blue"},
>>  "ac": {"weight": 2, "color": "green"},
>>  "bc": {"weight": 1, "color": "red"}}
>>
>> Again, the idea is to reduce call overhead and later
>> on be able to move these lookups to C.
>
> Entirely doable, but I'm not sure I see the use case.
> Would you mind providing a more realistic example?
>
>>>>  * Node.__getitem__ could be mapped to .data.__getitem__
>>>>   (following the approach taken by ElementTree); same
>>>>   for Edge.__getitem__
>>>
>>>>  * Node.__setitem__ could be mapped to .data.__setitem__
>>>>   (following the approach taken by ElementTree); same
>>>>   for Edge.__setitem__
>>>
>>> .data is just a property that returns a dictionary of non
>>> private members right now, so if you're wanting to just say
>>> node.this = 'stuff', you can already do that. Or am I
>>> misreading your intent?
>>
>> Right, but with attributes you are restricted to Python
>> identifiers, e.g. keywords (e.g. "from", "pass") are not allowed.
>> The above approach bypasses this restriction.
>
> Hmm. Sounds like a plausible use case to me, although I'm
> not sure its one that should be encouraged. The bigger
> question in my mind is whether all attribute lookups should
> have to pay the extra lookup cost to support a somewhat
> narrow need. I'll definitely have to talk with the other devs
> about this one, and run a little bit of timing besides.
>
> Geremy Condra
>

Just as a note, I've made some of the changes we've been
talking about in a new branch over at graphine.org.

Geremy Condra