[SciPy-Dev] Cannot generate very large very sparse random matrix

Emanuele Olivetti olivetti at fbk.eu
Wed Nov 11 18:45:23 EST 2020


Hi,

I've just noticed that it is not possible to generate very large, very
sparse random matrices with scipy.sparse.random(). For example:
  scipy.sparse.random(1_000_000, 1_000_000, density = 1e-11)
should create a sparse matrix with only 10 non-zero values (10^12 entries
times a density of 1e-11), but instead it triggers a MemoryError:
----
MemoryError                               Traceback (most recent call last)
<ipython-input-8-eb81d3aec480> in <module>
----> 1 scipy.sparse.random(1_000_000, 1_000_000, density = 1e-11)

~/miniconda3/envs/lap/lib/python3.8/site-packages/scipy/sparse/construct.py
in random(m, n, density, format, dtype, random_state, data_rvs)
    787             data_rvs = partial(random_state.uniform, 0., 1.)
    788
--> 789     ind = random_state.choice(mn, size=k, replace=False)
    790
    791     j = np.floor(ind * 1. / m).astype(tp, copy=False)

mtrand.pyx in numpy.random.mtrand.RandomState.choice()

mtrand.pyx in numpy.random.mtrand.RandomState.permutation()

MemoryError: Unable to allocate 7.28 TiB for an array with shape
(1000000000000,) and data type int64
----

Here is the problematic line in the current master branch of SciPy:
  https://github.com/scipy/scipy/blob/master/scipy/sparse/construct.py#L806

In short, the issue is random_state.choice(..., replace=False): as the
traceback shows, it calls permutation(), which allocates an array holding
all m * n candidate indices just to pick ten of them.
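
To make the numbers concrete, here is a small back-of-the-envelope sketch
(my own arithmetic, not code taken from SciPy; the variable names are just
for illustration) that reproduces the figures in the error message:

```python
import numpy as np

m, n, density = 1_000_000, 1_000_000, 1e-11

mn = m * n                      # number of candidate (row, col) positions
k = int(round(density * mn))    # number of non-zeros actually requested

# A full int64 permutation of all candidate positions needs 8 bytes each.
bytes_needed = mn * np.dtype(np.int64).itemsize
tib = bytes_needed / 2**40

print(k)                        # 10
print(f"{tib:.2f} TiB")         # 7.28 TiB
```

So roughly 7 TiB are requested in order to keep 80 bytes of indices.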

I understand the technical difficulty of generating random numbers without
replacement, but it is quite counterintuitive that in order to generate a
sparse random matrix it is necessary to create an equally large but *dense*
vector first.

Is there a solution to this problem?
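
As a stopgap, I can imagine something along the following lines. This is
only a minimal sketch under my own assumptions, not what SciPy does: the
helper name sparse_random_large and its seed parameter are made up for
illustration, values are drawn uniformly in [0, 1), and the approach is
only sensible when the number of non-zeros is tiny compared to m * n. It
draws flat indices *with* replacement, discards duplicates, and tops up
until enough distinct indices have been collected:

```python
import numpy as np
import scipy.sparse as sp


def sparse_random_large(m, n, density, seed=None):
    """Hypothetical helper: random COO matrix built without ever
    allocating an array of length m * n."""
    rng = np.random.default_rng(seed)
    mn = m * n                       # must fit in int64
    k = int(round(density * mn))     # number of non-zeros wanted

    # Draw flat indices with replacement, drop duplicates, and top up
    # until k distinct indices are available. Cheap when k << m * n.
    ind = np.unique(rng.integers(0, mn, size=k, dtype=np.int64))
    while ind.size < k:
        extra = rng.integers(0, mn, size=k - ind.size, dtype=np.int64)
        ind = np.unique(np.concatenate([ind, extra]))

    # Convert flat indices into (row, column) pairs.
    rows, cols = np.divmod(ind, n)
    data = rng.uniform(0.0, 1.0, size=k)
    return sp.coo_matrix((data, (rows, cols)), shape=(m, n))


A = sparse_random_large(1_000_000, 1_000_000, 1e-11, seed=0)
print(A.shape, A.nnz)
```

Memory use here grows with the number of non-zeros rather than with m * n,
but the loop would become slow if the density were not small.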

Thanks in advance,

Emanuele
