[SciPy-user] Creating coo_matrix from data in text file

Nathan Bell wnbell at gmail.com
Tue Feb 5 19:07:51 EST 2008


On Feb 5, 2008 5:08 PM, Dinesh B Vadhia <dineshbvadhia at hotmail.com> wrote:
> The sparse coo_matrix method performs really well but our data sets are very
> large and the working arrays (ie. ij, row, column and data) take up
> significant memory.  The judicious use of <del working array object> helps
> but not that much.
>
> Is there a fast method available similar to coo_matrix to create a sparse
> matrix from a text file instead of through a set of interim working arrays?
> The file would contain the coordinates (i, j) and the value of each item.
> Once the sparse matrix has been created we can then save/load it at will
> (using Andrew Straw's fast load/save code).

Suppose you have a file named matrix.txt with the following contents:

$ cat matrix.txt
0 1 10
0 2 20
5 3 -5
6 4 14


Now run this script:

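from numpy import fromfile
from scipy.sparse import coo_matrix

# Parse the whitespace-separated text into an N x 3 float array of (i, j, value).
IJV = fromfile("matrix.txt", sep=" ").reshape(-1, 3)

row  = IJV[:, 0].astype(int)   # indices must be integers, not floats
col  = IJV[:, 1].astype(int)
data = IJV[:, 2]

# The shape is inferred from the largest row and column index.
A = coo_matrix((data, (row, col)))

print(repr(A))
print(A.todense())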



You should see:

<7x5 sparse matrix of type '<type 'numpy.float64'>'
        with 4 stored elements in COOrdinate format>
[[  0.  10.  20.   0.   0.]
 [  0.   0.   0.   0.   0.]
 [  0.   0.   0.   0.   0.]
 [  0.   0.   0.   0.   0.]
 [  0.   0.   0.   0.   0.]
 [  0.   0.   0.  -5.   0.]
 [  0.   0.   0.   0.  14.]]
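
Note that the 7x5 shape is inferred from the largest row and column
index in the file. If the real matrix has trailing empty rows or columns,
the shape can be passed explicitly. A minimal sketch, assuming a 10x10
matrix (the size and the temporary file are made up for illustration):

```python
import os
import tempfile

import numpy as np
from scipy.sparse import coo_matrix

# Write the same four entries to a temporary file (illustrative only).
text = "0 1 10\n0 2 20\n5 3 -5\n6 4 14\n"
path = os.path.join(tempfile.mkdtemp(), "matrix.txt")
with open(path, "w") as f:
    f.write(text)

IJV = np.fromfile(path, sep=" ").reshape(-1, 3)
row = IJV[:, 0].astype(np.intp)
col = IJV[:, 1].astype(np.intp)
data = IJV[:, 2]

# Without shape=, the dimensions are (max(row) + 1, max(col) + 1).
inferred = coo_matrix((data, (row, col)))

# With shape=, trailing empty rows and columns are kept.
# The 10x10 size is an assumption for this example.
explicit = coo_matrix((data, (row, col)), shape=(10, 10))

print(inferred.shape)
print(explicit.shape)
```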


This should be very fast.  The only thing faster would be the recent
scipy.io MATLAB file support, which stores data in binary format
(or storing the data in your own binary format, I suppose).
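
As a rough idea of what "your own binary format" could look like, here
is a sketch that stores the three COO arrays and the shape in a single
NumPy .npz archive and rebuilds the matrix from it. The file name and
the key names are arbitrary choices for this example, not an
established format:

```python
import os
import tempfile

import numpy as np
from scipy.sparse import coo_matrix

# The same small matrix as in the text-file example.
row = np.array([0, 0, 5, 6])
col = np.array([1, 2, 3, 4])
data = np.array([10.0, 20.0, -5.0, 14.0])
A = coo_matrix((data, (row, col)))

# Save the COO arrays and the shape in one binary archive.
# "matrix_coo.npz" is a hypothetical file name.
path = os.path.join(tempfile.mkdtemp(), "matrix_coo.npz")
np.savez(path, row=A.row, col=A.col, data=A.data, shape=np.array(A.shape))

# Load the arrays back and rebuild the matrix; no text parsing is needed.
with np.load(path) as npz:
    B = coo_matrix(
        (npz["data"], (npz["row"], npz["col"])),
        shape=tuple(int(n) for n in npz["shape"]),
    )

print(np.array_equal(A.toarray(), B.toarray()))
```

Storing the shape alongside the arrays avoids relying on the largest
index to recover the dimensions when the matrix is reloaded.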


-- 
Nathan Bell wnbell at gmail.com
http://graphics.cs.uiuc.edu/~wnbell/


