[SciPy-Dev] Cannot generate very large very sparse random matrix

CJ Carey perimosocordiae at gmail.com
Mon Jan 18 15:36:15 EST 2021


Sorry for such a late response to this thread, but I wanted to point out
another workaround that should help users with numpy 1.17+. You can pass a
`random_state` parameter to scipy.sparse.random, which will accept a
new-style Generator object.

So if you amend your example to:

    scipy.sparse.random(1_000_000, 1_000_000, density=1e-11,
                        random_state=np.random.default_rng())

then you'll get the fast behavior.
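
For reference, a self-contained version of that call might look like the sketch below. The seed and the variable names are illustrative assumptions, not part of the original report; it only assumes numpy 1.17+ and a SciPy whose sparse.random accepts a Generator for `random_state`.

```python
import numpy as np
import scipy.sparse

# New-style Generator (numpy 1.17+); the seed is an arbitrary choice
# made here only so the run is reproducible.
rng = np.random.default_rng(12345)

# 10**12 cells at density 1e-11 should yield 10 stored entries.
A = scipy.sparse.random(1_000_000, 1_000_000, density=1e-11, random_state=rng)

print(A.shape, A.nnz)
```

The point of passing the Generator explicitly is that its choice() does not need to materialize the whole population to sample without replacement.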

On Fri, Nov 13, 2020 at 6:29 PM Emanuele Olivetti <olivetti at fbk.eu> wrote:

> Thank you for your response. Indeed numpy.random.Generator.choice() solves
> the problem:
> ----
> rng = np.random.default_rng()
> rng.choice(10_000_000_000_000_000, size=10, replace=False)
>
> array([7363643319410659, 1001129358099623, 7384908776761990,
>        3610742892883208, 9484192959193500, 6273686405826185,
>        1550972534180773, 1845765940909299,  144504113475750,
>        7853188631204629])
> ----
> while:
> ----
> np.random.choice(10_000_000_000_000_000, size=10, replace=False)
> ---------------------------------------------------------------------------
> MemoryError                               Traceback (most recent call last)
> <ipython-input-11-95b556ac15b9> in <module>
> ----> 1 np.random.choice(10_000_000_000_000_000, size=10, replace=False)
>
> mtrand.pyx in numpy.random.mtrand.RandomState.choice()
>
> mtrand.pyx in numpy.random.mtrand.RandomState.permutation()
>
> MemoryError: Unable to allocate 71.1 PiB for an array with shape
> (10000000000000000,) and data type int64
> ----
>
> According to the latest comment on the github issue you mentioned: "It
> looks like np.random.Generator should be available from numpy 1.17 on, and
> the current minimum numpy version is 1.16.5."... So this may require a
> little while...
>
> As a quick fix that is also a meaningful new feature, would it be possible
> to extend the API of scipy.sparse.random() with a "replace" option
> (default False, passed through to np.random.choice())? Setting it to True
> would let the user generate very large, very sparse matrices at the cost
> of some (rare) collisions. I would gladly accept that trade-off, and it is
> also my current fix on my local SciPy.
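
The trade-off proposed above can be sketched with plain NumPy: draw flat indices with replacement, then drop duplicates. This is only an illustration of the idea, not how scipy.sparse.random is implemented; the helper name `sample_flat_indices` is hypothetical, and the collision estimate is the usual birthday-style approximation k(k-1)/(2*mn).

```python
import numpy as np

def sample_flat_indices(mn, k, rng):
    """Draw k flat indices from range(mn) with replacement, then deduplicate.

    Hypothetical helper: it may return fewer than k indices when
    two draws collide.
    """
    ind = rng.integers(0, mn, size=k, dtype=np.int64)
    return np.unique(ind)

rng = np.random.default_rng(0)
mn, k = 1_000_000 * 1_000_000, 10
ind = sample_flat_indices(mn, k, rng)

# Approximate probability that at least two of the k draws coincide.
p_collision = k * (k - 1) / (2 * mn)
print(len(ind), p_collision)
```

For ten draws out of 10**12 cells the estimate is about 4.5e-11, which is why collisions are rare in the very large, very sparse case.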
>
> Best,
>
> Emanuele
>
>
>
> On Fri, Nov 13, 2020 at 4:23 PM CJ Carey <perimosocordiae at gmail.com>
> wrote:
>
>> This is a known issue, see https://github.com/scipy/scipy/issues/9699.
>>
>> I haven't checked on the status of numpy.random.Generator.choice() in a
>> while, so maybe the issue can be resolved now.
>>
>> On Wed, Nov 11, 2020 at 6:46 PM Emanuele Olivetti <olivetti at fbk.eu>
>> wrote:
>>
>>> Hi,
>>>
>>> I've just noticed that it is not possible to generate very large very
>>> sparse random matrices with scipy.sparse.random(). For example:
>>>   scipy.sparse.random(1_000_000, 1_000_000, density = 1e-11)
>>> should create a sparse matrix with only 10 non-zero values... but
>>> instead triggers a MemoryError:
>>> ----
>>> MemoryError                               Traceback (most recent call last)
>>> <ipython-input-8-eb81d3aec480> in <module>
>>> ----> 1 scipy.sparse.random(1_000_000, 1_000_000, density = 1e-11)
>>>
>>> ~/miniconda3/envs/lap/lib/python3.8/site-packages/scipy/sparse/construct.py
>>> in random(m, n, density, format, dtype, random_state, data_rvs)
>>>     787             data_rvs = partial(random_state.uniform, 0., 1.)
>>>     788
>>> --> 789     ind = random_state.choice(mn, size=k, replace=False)
>>>     790
>>>     791     j = np.floor(ind * 1. / m).astype(tp, copy=False)
>>>
>>> mtrand.pyx in numpy.random.mtrand.RandomState.choice()
>>>
>>> mtrand.pyx in numpy.random.mtrand.RandomState.permutation()
>>>
>>> MemoryError: Unable to allocate 7.28 TiB for an array with shape
>>> (1000000000000,) and data type int64
>>> ----
>>>
>>> Here is the problematic line in current master branch of SciPy:
>>>
>>> https://github.com/scipy/scipy/blob/master/scipy/sparse/construct.py#L806
>>>
>>> In short, the issue is due to random_state.choice(..., replace=False),
>>> which allocates an array as large as the whole population (here 10^12
>>> entries) in order to pick the ten random numbers.
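
The size reported in the traceback follows from the population size: the permutation() call shown there holds one int64 per cell of the matrix. A quick check of the arithmetic (variable names are illustrative):

```python
import numpy as np

mn = 1_000_000 * 1_000_000               # number of cells in the matrix
itemsize = np.dtype(np.int64).itemsize   # bytes per int64 entry (8)

tib = mn * itemsize / 2**40              # convert bytes to TiB
print(f"{tib:.2f} TiB")                  # prints "7.28 TiB"
```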
>>>
>>> I understand the technical difficulty of generating random numbers
>>> without replacement, but it is quite counterintuitive that in order to
>>> generate a sparse random matrix it is necessary to create an equally large
>>> but *dense* vector first.
>>>
>>> Is there a solution to this problem?
>>>
>>> Thanks in advance,
>>>
>>> Emanuele
>>>
>>>
>>>
>>>
>>> --
>>> The information transmitted is intended only for the person or entity to
>>> which it is addressed and may contain confidential and/or privileged
>>> material. If you received this in error, please contact the sender and
>>> delete the material.
>>> _______________________________________________
>>> SciPy-Dev mailing list
>>> SciPy-Dev at python.org
>>> https://mail.python.org/mailman/listinfo/scipy-dev
>>>
>>
>