very large graph

MRAB google at mrabarnett.plus.com
Tue Jun 24 05:40:08 EDT 2008


On Jun 24, 1:26 am, chrispoliq... at gmail.com wrote:
> I need to represent the hyperlinks between a large number of HTML
> files as a graph.  My non-directed graph will have about 63,000 nodes
> and probably close to 500,000 edges.
>
> I have looked into igraph (http://cneurocvs.rmki.kfki.hu/igraph/doc/
> python/index.html) and networkX (https://networkx.lanl.gov/wiki) for
> generating a file to store the graph, and I have also looked into
> Graphviz for visualization.  I'm just not sure which modules are
> best.  I need to be able to do the following:
>
> 1)  The names of my nodes are not known ahead of time, so I will
> extract the title from all the HTML files to name the nodes prior to
> parsing the files for hyperlinks (edges).
>
> 2) Every file will be parsed for links and nondirectional connections
> will be drawn between the two nodes.
>
> 3)  The files might link to each other so the graph package needs to
> be able to check to see if an edge between two nodes already exists,
> or at least not double draw connections between the two nodes when
> adding edges.
>
> I'm relatively new to graph theory, so I would greatly appreciate any
> suggestions for file types.  I imagine that doing this as a Python
> dictionary with a list for the edges and a node:list pairing is out of
> the question for such a large graph?
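
Steps 1 and 2 above can be handled with nothing beyond the standard library. The following is a sketch under stated assumptions, not the poster's code: it uses `html.parser.HTMLParser` to pull the `<title>` text and every `<a href>` value out of each page in one pass. The helper names (`TitleLinkParser`, `extract_title_and_links`, `build_edges`) and the input shape (a dict mapping file name to HTML text, with hrefs that are plain file names) are hypothetical.

```python
from html.parser import HTMLParser


class TitleLinkParser(HTMLParser):
    """Collects the <title> text and every <a href="..."> target in one pass."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # title text may arrive in several chunks, so collect and join later
        if self._in_title:
            self.title_parts.append(data)

    @property
    def title(self):
        return "".join(self.title_parts).strip()


def extract_title_and_links(html_text):
    """Return (title, list of href values) for one HTML document."""
    parser = TitleLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.title, parser.links


def build_edges(pages):
    """pages: hypothetical dict of file name -> HTML text.

    Pass 1 names every node by its page title (step 1); pass 2 turns each
    link into a (title, title) pair (step 2).
    """
    titles = {}
    links = {}
    for fname, text in pages.items():
        titles[fname], links[fname] = extract_title_and_links(text)
    edges = []
    for fname, hrefs in links.items():
        for href in hrefs:
            target = href.split("#")[0]  # drop any "#fragment"
            # assumption: hrefs are bare file names; skip unknown files and self-links
            if target in titles and target != fname:
                edges.append((titles[fname], titles[target]))
    return edges
```

Because two pages may link to each other, the same pair can come back in both orders (for example Page A to Page B and Page B to Page A), which is exactly the duplicate-edge concern in point 3. A real site will also need relative URLs resolved, e.g. with `urllib.parse.urljoin`, before they are matched against file names.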

Perhaps a dictionary where the key is a node and the value is a set of
destination nodes?
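
The dictionary-of-sets idea also takes care of point 3, because adding an element that is already in a set does nothing. A minimal sketch, assuming node names are the page titles and that each undirected edge is recorded in both endpoints' sets; the class and method names are hypothetical:

```python
from collections import defaultdict


class UndirectedGraph:
    """Adjacency 'dict of sets': node -> set of neighbouring nodes."""

    def __init__(self):
        self.adj = defaultdict(set)

    def add_node(self, node):
        # nodes are known (from their titles) before any links are parsed
        self.adj.setdefault(node, set())

    def add_edge(self, u, v):
        # Store the edge in both directions; set.add ignores duplicates,
        # so re-adding A-B (or B-A) never creates a second connection.
        self.adj[u].add(v)
        self.adj[v].add(u)

    def has_edge(self, u, v):
        # .get avoids inserting unknown nodes into the defaultdict
        return v in self.adj.get(u, ())

    def num_nodes(self):
        return len(self.adj)

    def num_edges(self):
        # each edge is stored twice, except a self-loop, which is stored once
        total = sum(len(nbrs) for nbrs in self.adj.values())
        loops = sum(1 for n, nbrs in self.adj.items() if n in nbrs)
        return (total + loops) // 2
```

For 63,000 nodes and about 500,000 edges this amounts to roughly a million set entries, which a modern machine should hold without trouble, though it is worth measuring on real data. For comparison, networkX's `Graph` also ignores duplicate edges and offers `G.has_edge(u, v)`.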



More information about the Python-list mailing list