[SciPy-Dev] Graph shortest path

Charles R Harris charlesr.harris at gmail.com
Mon Dec 19 20:08:48 EST 2011


On Mon, Dec 19, 2011 at 5:10 PM, Gael Varoquaux <
gael.varoquaux at normalesup.org> wrote:

> On Mon, Dec 19, 2011 at 10:03:29AM -0600, Warren Weckesser wrote:
> >    Networkx (http://networkx.lanl.gov/) already provides an assortment
> >    of data structures and algorithms for graphs.  It is an active,
> >    well-documented project.  Personally, I'd rather not start adding
> >    code to scipy that duplicates it.  If your code is
> >    better/faster/stronger than theirs, why not contribute it to
> >    networkx?
>
> I think that there is some room for a lower-level and more efficient base
> structure for graphs in the scipy world, and that scipy is well suited
> for that. Simple graph algorithms are common to many applications; for
> instance, a lot of sparse linalg needs basic graph algorithms. It seems
> to me that having sparse graph algorithms in scipy (and we already have
> some under the sparse package) would help the scipy world share code for
> graph algorithms. For instance, scikit-image, scikit-learn, and pyamg
> all have shortest-path algorithms.
>
> NetworkX is not well-suited for purely numerical applications. First of
> all, it is pure Python, which carries a lot of overhead for the many
> graph algorithms that cannot be implemented as array or matrix
> computations. Second, it uses a dict-of-dicts representation to store
> and manipulate graphs. The reason they use this representation is that
> it makes it possible to store rich objects on the nodes and to
> dynamically add and remove nodes. Such choices are great for the
> complex-systems types of problems that they are targeting. However, for
> number-crunching applications it is not great, as it has a significant
> memory overhead and makes it hard to call compiled code.
>
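For illustration, here is a minimal sketch of the representation difference described above, on a tiny hypothetical graph; the NetworkX internals and helper names may differ across versions:

    import networkx as nx

    # A tiny 3-node path graph: 0 -- 1 -- 2 (purely illustrative)
    G = nx.Graph()
    G.add_edge(0, 1, weight=1.0)
    G.add_edge(1, 2, weight=2.0)

    # NetworkX keeps a dict of dicts of edge-attribute dicts: one Python
    # object per node and per edge direction.
    print(dict(G.adj))

    # The scipy.sparse view of the same graph is three flat C arrays
    # (CSR format): row pointers, column indices, and edge weights.
    mat = nx.to_scipy_sparse_matrix(G).tocsr()
    print(mat.indptr, mat.indices, mat.data)
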
> To put a few numbers on the computation cost and the memory overhead, I
> did some small benchmarks on computing the shortest paths of a graph,
> something that we need in scikit-learn. I compared the running time of
> the algorithm, but I also benchmarked the time necessary to convert the
> data structures:
>
>    In [1]: from sklearn.utils import graph_shortest_path as skl_graph
>
>    In [2]: import networkx as nx
>
>  * For medium-sized graphs (negligible memory cost):
>
>    In [3]: G = nx.gnm_random_graph(100, 100, seed=0)
>
>    In [4]: mat = nx.to_scipy_sparse_matrix(G).asformat('lil')
>
>    In [5]: %timeit skl_graph.graph_shortest_path(mat, directed=False)
>    100 loops, best of 3: 3.61 ms per loop
>
>    In [6]: %timeit nx.from_scipy_sparse_matrix(mat)
>    1000 loops, best of 3: 1.02 ms per loop
>
>    In [7]: %timeit nx.shortest_paths.all_pairs_dijkstra_path_length(G)
>    10 loops, best of 3: 50.3 ms per loop
>
>  * For biggish graphs (nx uses 800 MB, scikit-learn around 10 to 20 MB):
>
>    In [8]: G = nx.gnm_random_graph(5000, 5000, seed=0)
>
>    In [9]: mat = nx.to_scipy_sparse_matrix(G).asformat('lil')
>
>    In [10]: %timeit skl_graph.graph_shortest_path(mat, directed=False)
>    1 loops, best of 3: 15.8 s per loop
>
>    In [11]: %timeit nx.from_scipy_sparse_matrix(mat)
>    10 loops, best of 3: 59.6 ms per loop
>
>    In [12]: %timeit nx.shortest_paths.all_pairs_dijkstra_path_length(G)
>    1 loops, best of 3: 197 s per loop
>
> With 10000 nodes, the above code does not run on my laptop, as it uses
> more than the available 2 GB of RAM.
>
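One rough way to compare the memory footprints mentioned in this thread is to measure the two result structures directly. The sketch below reuses the session's skl_graph helper (which may not exist in newer scikit-learn releases) plus a hypothetical nested_size() helper that only gives a crude lower bound for nested dicts:

    import sys
    import networkx as nx
    from sklearn.utils import graph_shortest_path as skl_graph

    G = nx.gnm_random_graph(1000, 1000, seed=0)   # smaller than above, for speed
    mat = nx.to_scipy_sparse_matrix(G).asformat('lil')

    # scikit-learn returns a dense ndarray of path lengths: one flat buffer.
    dist = skl_graph.graph_shortest_path(mat, directed=False)
    print("ndarray result: %.1f MB" % (dist.nbytes / 1e6))

    # NetworkX returns the lengths as nested dicts (a dict, or an iterator
    # of (node, dict) pairs in newer releases); sys.getsizeof() does not
    # follow references, so walk the containers for a rough lower bound.
    lengths = dict(nx.all_pairs_dijkstra_path_length(G))

    def nested_size(obj):
        size = sys.getsizeof(obj)
        if isinstance(obj, dict):
            size += sum(nested_size(k) + nested_size(v) for k, v in obj.items())
        return size

    print("dict-of-dicts result: >= %.1f MB" % (nested_size(lengths) / 1e6))
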
> In addition, basic graph algorithms on numerical structures are needed
> for a lot of applications. People who need efficient implementations,
> like scikit-learn, scikit-image, or pyamg, will reimplement them. It
> seems profitable to have scipy as a common resource for sharing them.
> scipy.sparse has all the necessary tools for that.
>
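As a small illustration of what a graph algorithm written straight on top of scipy.sparse can look like, here is a sketch of breadth-first search that reads only the CSR index arrays; bfs_order() is purely illustrative, not a proposed API:

    from collections import deque

    import numpy as np
    import scipy.sparse as sp

    def bfs_order(csgraph, source=0):
        """Return nodes in BFS order, reading only the CSR index arrays."""
        csgraph = sp.csr_matrix(csgraph)
        indptr, indices = csgraph.indptr, csgraph.indices
        n = csgraph.shape[0]
        seen = np.zeros(n, dtype=bool)
        seen[source] = True
        order, queue = [], deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            # Neighbours of `node` are a contiguous slice of `indices`.
            for nbr in indices[indptr[node]:indptr[node + 1]]:
                if not seen[nbr]:
                    seen[nbr] = True
                    queue.append(nbr)
        return np.array(order)

    # Example on a 4-node path graph stored as a sparse matrix.
    A = sp.csr_matrix(np.array([[0, 1, 0, 0],
                                [1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0]]))
    print(bfs_order(A, source=0))   # [0 1 2 3]
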
> Don't get me wrong, I love NetworkX and use it often because it has so
> many algorithms and is so versatile. I just think that a few graph
> algorithms are central and deserve special treatment.
>
>
Another graph algorithm I'd like to see in scipy is union-find
(http://en.wikipedia.org/wiki/Disjoint-set_data_structure). I've thought
of adding it on occasion and keep a Cython version around for my own use.
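
For anyone who hasn't seen the structure, here is a minimal pure-Python sketch of union-find (path halving plus union by rank); the Cython version mentioned above is not shown, and the class name is just for illustration:

    class DisjointSet:
        """Union-find with path halving and union by rank."""

        def __init__(self, n):
            self.parent = list(range(n))
            self.rank = [0] * n

        def find(self, x):
            # Path halving: point each visited node at its grandparent.
            while self.parent[x] != x:
                self.parent[x] = self.parent[self.parent[x]]
                x = self.parent[x]
            return x

        def union(self, x, y):
            rx, ry = self.find(x), self.find(y)
            if rx == ry:
                return False
            # Union by rank: attach the shallower tree under the deeper one.
            if self.rank[rx] < self.rank[ry]:
                rx, ry = ry, rx
            self.parent[ry] = rx
            if self.rank[rx] == self.rank[ry]:
                self.rank[rx] += 1
            return True

    # Example: connected components of edges (0,1), (2,3), (1,2).
    ds = DisjointSet(5)
    for a, b in [(0, 1), (2, 3), (1, 2)]:
        ds.union(a, b)
    print(ds.find(0) == ds.find(3))   # True: nodes 0..3 share a component
    print(ds.find(0) == ds.find(4))   # False: node 4 is isolated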

One thing to be careful about is finding an interface that is widely useful
without requiring a lot of special node structures and such.

Chuck

