[SciPy-Dev] Cannot generate very large very sparse random matrix

CJ Carey perimosocordiae at gmail.com
Fri Nov 13 10:22:42 EST 2020


This is a known issue, see https://github.com/scipy/scipy/issues/9699.

I haven't checked on the status of numpy.random.Generator.choice() in a
while, so maybe the issue can be resolved now.
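
In the meantime, here is a minimal workaround sketch, not what
scipy.sparse.random() does internally. It assumes k is much smaller than
m*n: draw flat indices with replacement, drop duplicates, and top up until
k distinct ones are collected, so memory stays O(k). The function name
sample_sparse_coords and its row-major index mapping are made up for
illustration.

```python
import numpy as np


def sample_sparse_coords(m, n, density, seed=None):
    """Hypothetical helper: k distinct (row, col) positions plus values.

    Assumes k << m * n; the top-up loop gets slow as k approaches m * n.
    """
    rng = np.random.default_rng(seed)
    mn = m * n                      # Python int, so no overflow here
    k = int(round(density * mn))    # number of non-zero entries

    # Draw with replacement, keep the distinct values, and draw only the
    # missing count on each pass, so we never exceed k indices.
    ind = np.empty(0, dtype=np.int64)
    while ind.size < k:
        draw = rng.integers(0, mn, size=k - ind.size, dtype=np.int64)
        ind = np.unique(np.concatenate([ind, draw]))

    # Map each flat index to a (row, col) pair, row-major.
    rows, cols = np.divmod(ind, n)
    vals = rng.uniform(0.0, 1.0, size=k)
    return rows, cols, vals


rows, cols, vals = sample_sparse_coords(1_000_000, 1_000_000, 1e-11, seed=0)
```

The three arrays can then be passed to
scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(m, n)).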

On Wed, Nov 11, 2020 at 6:46 PM Emanuele Olivetti <olivetti at fbk.eu> wrote:

> Hi,
>
> I've just noticed that it is not possible to generate very large very
> sparse random matrices with scipy.sparse.random(). For example:
>   scipy.sparse.random(1_000_000, 1_000_000, density = 1e-11)
> should create a sparse matrix with only 10 non-zero values... but instead
> triggers a MemoryError:
> ----
> MemoryError                               Traceback (most recent call last)
> <ipython-input-8-eb81d3aec480> in <module>
> ----> 1 scipy.sparse.random(1_000_000, 1_000_000, density = 1e-11)
>
> ~/miniconda3/envs/lap/lib/python3.8/site-packages/scipy/sparse/construct.py
> in random(m, n, density, format, dtype, random_state, data_rvs)
>     787             data_rvs = partial(random_state.uniform, 0., 1.)
>     788
> --> 789     ind = random_state.choice(mn, size=k, replace=False)
>     790
>     791     j = np.floor(ind * 1. / m).astype(tp, copy=False)
>
> mtrand.pyx in numpy.random.mtrand.RandomState.choice()
>
> mtrand.pyx in numpy.random.mtrand.RandomState.permutation()
>
> MemoryError: Unable to allocate 7.28 TiB for an array with shape
> (1000000000000,) and data type int64
> ----
>
> Here is the problematic line in current master branch of SciPy:
>
> https://github.com/scipy/scipy/blob/master/scipy/sparse/construct.py#L806
>
> In short, the issue is due to random_state.choice(..., replace=False),
> which (per the traceback) permutes the full range of m*n candidate
> indices, and so allocates that huge array just to pick ten of them.
>
> I understand the technical difficulty of generating random numbers without
> replacement, but it is quite counterintuitive that in order to generate a
> sparse random matrix it is necessary to create an equally large but *dense*
> vector first.
>
> Is there a solution to this problem?
>
> Thanks in advance,
>
> Emanuele