Which graph library is best suited for large graphs?

Brian J Mingus Brian.Mingus at Colorado.EDU
Fri Dec 11 18:35:00 EST 2009


On Fri, Dec 11, 2009 at 3:12 AM, Wolodja Wentland <wentland at cl.uni-heidelberg.de> wrote:

> Hi all,
>
> I am writing a library for accessing Wikipedia data and it includes a
> module that generates graphs from the link structure between articles and
> other pages (such as categories).
>
> These graphs could easily contain several million nodes that are densely
> linked. The graphs I am building right now have around 300,000 nodes
> with an average in/out degree of, say, 4 and already need around 1-2 GB of
> memory. I use networkx to model the graphs and serialise them to files on
> disk (using adjacency list format, pickle and/or GraphML).
>
> The recent thread on including a graph library in the stdlib spurred my
> interest and introduced me to a number of libraries I have not seen
> before. I would like to reevaluate my choice of networkx and need some
> help in doing so.
>
> I really like the API of networkx but have no problem switching to
> another one (right now). I have the impression that graph-tool might
> be faster and have a smaller memory footprint than networkx, but I am
> unsure about that.
>
> Which library would you choose? This decision is quite important for me,
> as the choice will influence my library's external interface. Or is
> there something like WSGI for graph libraries?
>
> kind regards
>

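For anyone following along, the pipeline described above boils down to
something like the sketch below. The file name and tab-separated edge
format are made up for illustration; the networkx calls themselves are
the standard ones.

    import pickle

    import networkx as nx

    # Build a directed graph from (source, target) link pairs.
    # "wikipedia_links.tsv" and its format are hypothetical.
    G = nx.DiGraph()
    with open("wikipedia_links.tsv") as f:
        for line in f:
            source, target = line.rstrip("\n").split("\t")
            G.add_edge(source, target)

    # The three serialisation options mentioned above:
    nx.write_adjlist(G, "links.adjlist")    # plain-text adjacency list
    nx.write_graphml(G, "links.graphml")    # GraphML (XML, portable)
    with open("links.pickle", "wb") as f:   # pickle: Python-only, but fast
        pickle.dump(G, f)

Note that networkx stores the graph as nested Python dicts keyed by node
objects, which is where much of the 1-2 GB goes; mapping page titles to
integers before building the graph usually shrinks the footprint noticeably.
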
I once computed the PageRank of the English Wikipedia. I ended up using the
Boost Graph Library, which has a parallel implementation that runs on
clusters. I first tried to do it in pure Python but failed because the
memory requirements were too large. Both Boost and the parallel version have
Python interfaces.
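
To show the shape of the computation, here is a minimal sketch of a
single-machine approach: PageRank by power iteration over a scipy.sparse
matrix. This is already far leaner than a dict-based graph, but the full
English Wikipedia link matrix plus the title-to-index mapping can still
exceed one machine's RAM. The damping factor and iteration count are the
usual textbook choices, not tuned values, and dangling pages are handled
crudely.

    import numpy as np
    import scipy.sparse as sp

    def pagerank(edges, n, damping=0.85, iters=50):
        # edges: iterable of (source, target) node ids in range(n)
        rows, cols = zip(*edges)
        # Column-stochastic link matrix: M[j, i] = 1/outdegree(i) for i -> j.
        M = sp.csr_matrix((np.ones(len(rows)), (cols, rows)), shape=(n, n))
        out_degree = np.asarray(M.sum(axis=0)).ravel()
        out_degree[out_degree == 0] = 1.0  # dangling pages just leak rank here
        M = M.multiply(1.0 / out_degree).tocsr()
        rank = np.ones(n) / n
        for _ in range(iters):
            rank = damping * M.dot(rank) + (1.0 - damping) / n
        return rank

    # Toy example: a three-page link cycle with one extra link.
    print(pagerank([(0, 1), (1, 2), (2, 0), (0, 2)], n=3))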