[scikit-learn] [semi-supervised learning] Using a pre-existing graph with LabelSpreading API

Thu Dec 1 22:33:33 EST 2016

Hello,

I have an existing graph dataset in the edge format:

node_i node_j weight

The graph has around 3.6M nodes and around 72M edges.
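
To make the format concrete, here is a minimal sketch of how such an edge
list could be loaded into a SciPy sparse adjacency matrix. It assumes
whitespace-separated lines, zero-based integer node ids, and an undirected
graph; the helper name load_edge_list and the tiny inline example are
hypothetical, not part of my actual data or of scikit-learn.

```python
import io

import numpy as np
import scipy.sparse as sp


def load_edge_list(source, n_nodes=None):
    """Read 'node_i node_j weight' lines into a symmetric CSR adjacency matrix.

    Assumes zero-based integer node ids (an assumption, adjust as needed).
    """
    # np.loadtxt is simple but slow; for ~72M edges a faster parser
    # would likely be preferable.
    data = np.loadtxt(source, dtype=np.float64, ndmin=2)
    rows = data[:, 0].astype(np.int64)
    cols = data[:, 1].astype(np.int64)
    weights = data[:, 2]
    if n_nodes is None:
        n_nodes = int(max(rows.max(), cols.max())) + 1
    # Duplicate (i, j) entries are summed when converting COO to CSR.
    adj = sp.coo_matrix((weights, (rows, cols)), shape=(n_nodes, n_nodes)).tocsr()
    # Symmetrize by keeping the larger weight of the two directions.
    return adj.maximum(adj.T).tocsr()


# Tiny hypothetical edge list standing in for the real file.
example = io.StringIO("0 1 0.5\n1 2 2.0\n2 3 1.0\n")
adj = load_edge_list(example)
```

In CSR form the memory cost grows with the number of stored edges rather
than with the square of the number of nodes.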

I also have some labeled data (around a dozen per class with 16 classes in
total), so overall, a perfect setting for label propagation or its
variants. In particular, I want to try the LabelSpreading implementation
for the regularization. I looked at the documentation and can't find a way
to plug in a pre-computed graph (or adjacency matrix). So two questions:

1. What scaling issues should I be aware of for a dataset of this size? I
can try sparsifying the graph, but would love to learn about any knobs I
should be aware of.
2. How do I plug in an existing weighted graph with the current API? Happy
to use any undocumented features.
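
For reference, this is roughly the computation I have in mind, written
directly against a sparse matrix. It is a sketch of the normalized-graph
label spreading iteration of Zhou et al. (2004), not scikit-learn's API or
implementation; the function name spread_labels, the default parameter
values, and the convention of marking unlabeled nodes with -1 are my own
assumptions.

```python
import numpy as np
import scipy.sparse as sp


def spread_labels(adj, labels, alpha=0.2, max_iter=30, tol=1e-3):
    """Label spreading on a precomputed sparse graph (hypothetical helper).

    adj    : (n, n) symmetric, non-negative sparse adjacency matrix
    labels : length-n integer array, with -1 marking unlabeled nodes
    """
    adj = sp.csr_matrix(adj, dtype=np.float64)
    labels = np.asarray(labels)
    n = adj.shape[0]

    # Symmetric normalization S = D^{-1/2} W D^{-1/2}; isolated nodes get 0.
    degree = np.asarray(adj.sum(axis=1)).ravel()
    inv_sqrt = np.zeros(n)
    nonzero = degree > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(degree[nonzero])
    d = sp.diags(inv_sqrt)
    S = (d @ adj @ d).tocsr()

    # One-hot matrix of the known labels; unlabeled rows stay all zero.
    classes = np.unique(labels[labels >= 0])
    Y = np.zeros((n, classes.size))
    for k, c in enumerate(classes):
        Y[labels == c, k] = 1.0

    # Iterate F <- alpha * S F + (1 - alpha) * Y until the update is small.
    F = Y.copy()
    for _ in range(max_iter):
        F_next = alpha * (S @ F) + (1.0 - alpha) * Y
        delta = np.abs(F_next - F).sum()
        F = F_next
        if delta < tol:
            break
    return classes[F.argmax(axis=1)]


# Hypothetical toy graph: two 3-node paths joined by one weak edge.
rows = np.array([0, 1, 3, 4, 2])
cols = np.array([1, 2, 4, 5, 3])
vals = np.array([1.0, 1.0, 1.0, 1.0, 0.01])
W = sp.coo_matrix((vals, (rows, cols)), shape=(6, 6)).tocsr()
W = W + W.T
predicted = spread_labels(W, np.array([0, -1, -1, -1, -1, 1]))
```

Each iteration here is one sparse-dense product, and the dense score matrix
has one row per node and one column per class (about 58M floats for 3.6M
nodes and 16 classes).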

Thanks in advance!
Delip