[SciPy-Dev] Cannot generate very large very sparse random matrix

Emanuele Olivetti olivetti at fbk.eu
Tue Jan 19 05:22:52 EST 2021


Thanks for the tip!

Emanuele

On Mon, Jan 18, 2021 at 9:36 PM CJ Carey <perimosocordiae at gmail.com> wrote:

> Sorry for such a late response to this thread, but I wanted to point out
> another workaround that should help users with numpy 1.17+. You can pass a
> `random_state` parameter to scipy.sparse.random, which will accept a
> new-style Generator object.
>
> So if you amend your example to:
>
> scipy.sparse.random(1_000_000, 1_000_000, density = 1e-11, random_state =
> np.random.default_rng())
>
> then you'll get the fast behavior.
>
> On Fri, Nov 13, 2020 at 6:29 PM Emanuele Olivetti <olivetti at fbk.eu> wrote:
>
>> Thank you for your response. Indeed numpy.random.Generator.choice()
>> solves the problem:
>> ----
>> rng = np.random.default_rng()
>> rng.choice(1_000_0000_000_000_000, size=10, replace=False)
>>
>> array([7363643319410659, 1001129358099623, 7384908776761990,
>>        3610742892883208, 9484192959193500, 6273686405826185,
>>        1550972534180773, 1845765940909299,  144504113475750,
>>        7853188631204629])
>> ----
>> while:
>> ----
>> np.random.choice(1_000_0000_000_000_000, size=10, replace=False)
>>
>> ---------------------------------------------------------------------------
>> MemoryError                               Traceback (most recent call
>> last)
>> <ipython-input-11-95b556ac15b9> in <module>
>> ----> 1 np.random.choice(1_000_0000_000_000_000, size=10, replace=False)
>>
>> mtrand.pyx in numpy.random.mtrand.RandomState.choice()
>>
>> mtrand.pyx in numpy.random.mtrand.RandomState.permutation()
>>
>> MemoryError: Unable to allocate 71.1 PiB for an array with shape
>> (10000000000000000,) and data type int64
>> ----
>>
>> According to the latest comment on the github issue you mentioned: "It
>> looks like np.random.Generator should be available from numpy 1.17 on, and
>> the current minimum numpy version is 1.16.5."... So this may require a
>> little while...
>>
>> As a quick fix but also meaningful new feature, would it be possible to
>> extend the API of scipy.sparse.random() and to add the option
>> "replace=False" (then piped to np.random.choice()) which, if set to True,
>> would give the liberty to the user to solve the issue for very large very
>> sparse matrices at the cost of some (rare) collisions? I would gladly
>> accept it - and that's also my current fix on my local SciPy.
>>
>> Best,
>>
>> Emanuele
>>
>>
>>
>> On Fri, Nov 13, 2020 at 4:23 PM CJ Carey <perimosocordiae at gmail.com>
>> wrote:
>>
>>> This is a known issue, see https://github.com/scipy/scipy/issues/9699.
>>>
>>> I haven't checked on the status of numpy.random.Generator.choice() in a
>>> while, so maybe the issue can be resolved now.
>>>
>>> On Wed, Nov 11, 2020 at 6:46 PM Emanuele Olivetti <olivetti at fbk.eu>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I've just noticed that it is not possible to generate very large very
>>>> sparse random matrices with scipy.sparse.random(). For example:
>>>>   scipy.sparse.random(1_000_000, 1_000_000, density = 1e-11)
>>>> should create a sparse matrix with only 10 non-zero values... but
>>>> instead triggers a MemoryError:
>>>> ----
>>>> MemoryError                               Traceback (most recent call
>>>> last)
>>>> <ipython-input-8-eb81d3aec480> in <module>
>>>> ----> 1 scipy.sparse.random(1_000_000, 1_000_000, density = 1e-11)
>>>>
>>>> ~/miniconda3/envs/lap/lib/python3.8/site-packages/scipy/sparse/construct.py
>>>> in random(m, n, density, format, dtype, random_state, data_rvs)
>>>>     787             data_rvs = partial(random_state.uniform, 0., 1.)
>>>>     788
>>>> --> 789     ind = random_state.choice(mn, size=k, replace=False)
>>>>     790
>>>>     791     j = np.floor(ind * 1. / m).astype(tp, copy=False)
>>>>
>>>> mtrand.pyx in numpy.random.mtrand.RandomState.choice()
>>>>
>>>> mtrand.pyx in numpy.random.mtrand.RandomState.permutation()
>>>>
>>>> MemoryError: Unable to allocate 7.28 TiB for an array with shape
>>>> (1000000000000,) and data type int64
>>>> ----
>>>>
>>>> Here is the problematic line in current master branch of SciPy:
>>>>
>>>> https://github.com/scipy/scipy/blob/master/scipy/sparse/construct.py#L806
>>>>
>>>> In short, the issue is due to random_state.choice(... replace=False)
>>>> which needs to allocate the humongous array in order to pick the ten random
>>>> numbers...
>>>>
>>>> I understand the technical difficulty of generating random numbers
>>>> without replacement, but it is quite counterintuitive that in order to
>>>> generate a sparse random matrix it is necessary to create an equally large
>>>> but *dense* vector first.
>>>>
>>>> Is there a solution to this problem?
>>>>
>>>> Thanks in advance,
>>>>
>>>> Emanuele
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Le informazioni contenute nella presente comunicazione sono di natura privata
>>>> e come tali sono da considerarsi riservate ed indirizzate esclusivamente
>>>> ai destinatari indicati e per le finalità strettamente legate al
>>>> relativo contenuto. Se avete ricevuto questo messaggio per errore, vi
>>>> preghiamo di eliminarlo e di inviare una comunicazione all’indirizzo
>>>> e-mail del mittente.
>>>> --
>>>> The information transmitted is intended only for the person or entity
>>>> to which it is addressed and may contain confidential and/or privileged
>>>> material. If you received this in error, please contact the sender and
>>>> delete the material.
>>>> _______________________________________________
>>>> SciPy-Dev mailing list
>>>> SciPy-Dev at python.org
>>>> https://mail.python.org/mailman/listinfo/scipy-dev
>>>>
>>> _______________________________________________
>>> SciPy-Dev mailing list
>>> SciPy-Dev at python.org
>>> https://mail.python.org/mailman/listinfo/scipy-dev
>>>
>>
>> --
>> Le informazioni contenute nella presente comunicazione sono di natura privata
>> e come tali sono da considerarsi riservate ed indirizzate esclusivamente
>> ai destinatari indicati e per le finalità strettamente legate al
>> relativo contenuto. Se avete ricevuto questo messaggio per errore, vi
>> preghiamo di eliminarlo e di inviare una comunicazione all’indirizzo
>> e-mail del mittente.
>> --
>> The information transmitted is intended only for the person or entity to
>> which it is addressed and may contain confidential and/or privileged
>> material. If you received this in error, please contact the sender and
>> delete the material.
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at python.org
>> https://mail.python.org/mailman/listinfo/scipy-dev
>>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at python.org
> https://mail.python.org/mailman/listinfo/scipy-dev
>

-- 
--
Le informazioni contenute nella presente comunicazione sono di natura 
privata e come tali sono da considerarsi riservate ed indirizzate 
esclusivamente ai destinatari indicati e per le finalità strettamente 
legate al relativo contenuto. Se avete ricevuto questo messaggio per 
errore, vi preghiamo di eliminarlo e di inviare una comunicazione 
all’indirizzo e-mail del mittente.

--
The information transmitted is 
intended only for the person or entity to which it is addressed and may 
contain confidential and/or privileged material. If you received this in 
error, please contact the sender and delete the material.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/scipy-dev/attachments/20210119/0a26db5b/attachment-0001.html>


More information about the SciPy-Dev mailing list