Graph library for Python (was: Re: python bijection)

Mon Dec 7 17:23:24 EST 2009

On Mon, Dec 7, 2009 at 2:51 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> geremy condra wrote:
>> How interested are you in a C port of graphine? I haven't had
>> any specific requests for it, but if its something you need I
>> can shuffle it towards the top of the to do pile.
>
> There are two main reasons for a C implementation:
>
>  1. performance
>
>  2. memory footprint
>
> These are important when using graphs with lots of nodes or
> when using lots of graphs or operations on graphs in tight
> loops.
>
> However, to get such a new type into the core, you need to start
> a PEP process and hash out the API some more, esp. with respect
> to integration with the other core types, such as dictionaries
> and lists of tuples for both creation and as alternative
> representation.

I'm happy to start the PEP process, but will need some
guidance, as I've never written a PEP before.

> Some other comments (in no particular order):
>
>  * the "name" default of using id(node) is not ideal, since
>   id(node) will return the address of the object and that
>   can be subject to reuse within the lifetime of the process;
>   using a global (thread-safe) counter would be safer

Good idea. I'm going to start a branch on the repo to
handle the changes you mention.

>  * Graph.__init__ should be able to take a list or set
>   of nodes and edges as initializer

The format of this will need to be thought all the way
through before being implemented. To date, we haven't
come up with anything completely satisfactory, but
AFAIK everybody involved is still open to suggestions
on this.

>  * Graph.__setitem__ could be mapped to .add_node() for
>   convenience

This may be a question of playing around with it. ATM I'm
not sold on the idea, but I'll implement it and see how it
turns out in practice.

>  * Graph.__length__ could be mapped to .size() for
>   convenience

We decided not to do this due to the ambiguity between
whether .size() or .order() was the intended operation,
and looking back I'm not sure that was entirely unjustified.
Do you see there being any confusion on that score?

>  * Graph.__iter__ could be mapped to an iterator using
>   the fastest traversal method for the graph nodes
>   (ie. order does not matter, it's only important that
>   all nodes are found as fast as possible)

Again, it seems ambiguous as to whether nodes or
edges are the intended target here, and while the
API can obviously dictate that, it seems a bit like
a case of "in the face of ambiguity, refuse the
temptation to guess" to me.

>  * Graph could also benefit from some bulk update methods,
>   e.g. to update weights on all edges or nodes by passing
>   in a dictionary mapping names to attribute values

Sounds good. Again, the format will need some careful
thought, but you're right that this would improve it
greatly.

>  * Graph could benefit from some bulk query methods,
>   such as .get_node_attributes() and .add_edges().

I'm not sure how the first is different from the existing
.data, what did you have in mind?

>  * Node.__getitem__ could be mapped to .data.__getitem__
>   (following the approach taken by ElementTree); same
>   for Edge.__getitem__

>  * Node.__setitem__ could be mapped to .data.__setitem__
>   (following the approach taken by ElementTree); same
>   for Edge.__setitem__

.data is just a property that returns a dictionary of non
private members right now, so if you're wanting to just say
node.this = 'stuff', you can already do that. Or am I
misreading your intent?

>  * I'd add a DirectedEdge alias for Edge and a new
>   UndirectedEdge as subclass which has is_directed=False;
>   makes code more readable

Sounds good, and thanks for the advice!

Geremy Condra