[SciPy-Dev] Graph shortest path

Tue Dec 20 00:56:20 EST 2011

On Mon, Dec 19, 2011 at 8:38 PM, Aric Hagberg <aric.hagberg at gmail.com>wrote:

> Warren posted over at the networkx-devel list and now I'm following up
> here too
> (see also
> http://groups.google.com/group/networkx-devel/browse_thread/thread/4bbd1bc3cafe3d54
> )
>
> On Mon, Dec 19, 2011 at 6:08 PM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
> >
> >
> > On Mon, Dec 19, 2011 at 5:10 PM, Gael Varoquaux
> > <gael.varoquaux at normalesup.org> wrote:
> >>
> >> On Mon, Dec 19, 2011 at 10:03:29AM -0600, Warren Weckesser wrote:
> >> >    Networkx ([3]http://networkx.lanl.gov/) already provides an
> >> > assortment of
> >> >    data structures and algorithms for graphs.  It is an active,
> >> >    well-documented project.  Personally, I'd rather not start adding
> >> > code to
> >> >    scipy that duplicates it.   If your code is better/faster/stronger
> >> > than
> >> >    theirs, why not contribute it to networkx?
> >>
> >> I think that there is some room for a lower-level and more efficient
> base
> >> structure from graphs in the scipy world, and that scipy is well suited
> >> for that. Simple graph algorithms are common to many applications. For
> >> instance a lot of the sparse linalg needs basic graph algorithms. It
> >> seems to that having sparse graph algorithms in scipy (and we already
> >> have some under the sparse package) would help the scipy world sharing
> >> code with regards to graph algorithms. For instance, the scikit-image,
> >> the scikit-learn, and pyamg all have shortest path algorithms.
> >>
> >> NetworkX is not well-suited for purely numerical application. First of
> >> all, it is pure-Python, which has a lot of overheads as many of the
> graph
> >> algorithms that cannot be implemented as array or matrix computation.
> >> Second, it is using a dict of dicts representation to store and
> >> manipulate graphs. The reason that they are using this representation is
> >> that it enables to store rich objects for the nodes, and to dynamically
> >> allocate and remove nodes. Such choices are great for the
> complex-systems
> >> types of problems that they are targeting. However, for number crunching
> >> applications, it is not great, as it has a significant memory overhead,
> >> and prevents calling easily compiled code.
> >>
> >> To put a few numbers on the computation cost, and the memory overhead, I
> >> did some small benchmarks on computing the shortest paths of a graph,
> >> something that we need in the scikit-learn. I compared the running time
> >> of the algorithm, but I also benched the time necessary to convert the
> >> data structures:
>
> Gaël is right that the overhead of using Python dictionaries (and
> converting from SciPy sparse matrices and back) will be pretty
> expensive.   For some graph algorithms that are easily expressed with
> adjacency matrices (scipy sparse matrices) it will be much better to
> implement them directly on that data structure.   There are also big
> memory gains if you don't want to store arbitrary data on edges
> (instead either nothing or a single number).
>
> We have considered, and I think Dan Schult even prototyped, a NetworkX
> graph class based on SciPy sparse matrices.  When we started NetworkX
> (2002) I don't think there even was a sparse matrix implementation in
> SciPy and we decided to use Python dictionaries.  But there are some
> good benefits from using them and your use cases might be a good
> reason to do the engineering for that.  Some parts are harder or just
> slower (like removing or adding nodes to a graph = removing or adding
> rows/columns to a scipy sparse matrix) but I think it can be done.
>
> >> Don't get me wrong, I love networkX and use it often because it has so
> >> many algorithms and is so versatile. I just think that a few graph
> >> algorithms are central and deserve a special treatment.
>
> There are some that I would definitely agree are useful in SciPy (or
> in a very SciPy sparse friendly NetworkX class?).  For example I don't
> think there is a (reverse) Cuthill-McKee ordering algorithm in SciPy.
> I implemented one using NetworkX
> https://networkx.lanl.gov/trac/browser/networkx/networkx/utils/rcm.py
> but it would be great to have one that worked directly on a SciPy sparse
> matrix.
>
> > Another graph algorithm I'd like to see in scipy is union-find. I've
> thought
> > of adding it on occasion and keep a cython version around for my own use.
>
> I'd agree with that one too.  We also use a nifty Python
> implementation that we borrowed from Eppstein and Carlson
>
> https://networkx.lanl.gov/trac/browser/networkx/networkx/utils/union_find.py
>
>
I also keep the elements of each set in a linked circular list. Merging the
lists when the sets are merged is easy and for my use cases being able to
access all elements in a set given any representative element is
convenient. I was also tempted to go the hashable object route but ended up
just using integers since pretty much anything can be indexed and it is
pretty efficient to just use an array instead of an dictionary. There might
be a middle ground in there somewhere ;)

If any other SciPy folks want to explore the possibilites of
> high-performance graph algorithms that work directly on sparse
> matrices I'm interested and would be willing to help with design and
> code.
>
>
That sounds like a useful thing to explore.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scipy-dev/attachments/20111219/cf14aec3/attachment.html>