[Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

Nathaniel Smith njs at pobox.com
Fri Mar 9 18:32:18 EST 2018


On Thu, Mar 8, 2018 at 9:45 PM, Stephan Hoyer <shoyer at gmail.com> wrote:
> On Thu, Mar 8, 2018 at 5:54 PM Juan Nunez-Iglesias <jni.soma at gmail.com>
> wrote:
>>
>> On Fri, Mar 9, 2018, at 5:56 AM, Stephan Hoyer wrote:
>>
>> Marten's case 1: works exactly like ndarray, but stores data differently:
>> parallel arrays (e.g., dask.array), sparse arrays (e.g.,
>> https://github.com/pydata/sparse), hypothetical non-strided arrays (e.g.,
>> always C ordered).
>>
>>
>> Two other "hypotheticals" that would fit nicely in this space:
>> - the Open Connectome folks (https://neurodata.io) proposed linearising
>> indices using space-filling curves, which minimizes cache misses (or IO
>> reads) for giant volumes. I believe they implemented this but can't find it
>> currently.
>> - the N5 format for chunked arrays on disk:
>> https://github.com/saalfeldlab/n5
>
>
> I think these fall into another important category of duck arrays.
> "Indexable" arrays that serve as storage, but that don't support computation.
> These sorts of arrays typically support operations like indexing and define
> a handful of array-like properties (e.g., dtype and shape), but not
> arithmetic, reductions or reshaping.
>
> This means you can't quite use them as a drop-in replacement for NumPy
> arrays in all cases, but that's OK. In contrast, both dask.array and sparse
> do aspire to fill out nearly the full numpy.ndarray API.

I'm not sure if these particular formats fall into that category or
not (isn't the point of the space-filling curves to support
cache-efficient computation?). But I suppose you're also thinking of
things like h5py.Dataset? My impression is that these are mostly
handled pretty well already by defining __array__ and/or providing
array operations that implicitly convert to ndarray -- do you agree?
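
For concreteness, here is a minimal sketch of the kind of storage-only
object I have in mind. StorageOnlyArray is a made-up name, not h5py's
actual implementation; it only assumes the standard __array__ protocol,
which np.asarray and ufuncs use to convert inputs to ndarray.

```python
import numpy as np


class StorageOnlyArray:
    """Hypothetical indexable container: shape, dtype, indexing, no arithmetic."""

    def __init__(self, data):
        # A real storage class would read from disk; an in-memory
        # ndarray stands in for the backing store in this sketch.
        self._data = np.asarray(data)

    @property
    def shape(self):
        return self._data.shape

    @property
    def dtype(self):
        return self._data.dtype

    def __getitem__(self, key):
        return self._data[key]

    def __array__(self, dtype=None, copy=None):
        # Called by np.asarray() and by ufuncs when they coerce inputs.
        # The copy argument is accepted for newer NumPy versions and
        # otherwise ignored here.
        arr = self._data
        if dtype is not None:
            arr = arr.astype(dtype)
        return arr


stored = StorageOnlyArray([[1, 2], [3, 4]])
as_ndarray = np.asarray(stored)   # explicit conversion
plus_one = np.add(stored, 1)      # implicit conversion inside the ufunc
```

Note that the object itself defines no arithmetic, so `stored + 1` would
raise TypeError; only the NumPy functions that coerce their inputs work.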

This does raise an interesting point: maybe we'll eventually want an
__abstract_array__ method that asabstractarray tries calling if
defined, so e.g. if your object isn't itself an array but can be
efficiently converted into a *sparse* array, you have a way to declare
that? I think this is something to file under "worry about later,
after we have the basic infrastructure", but it's not something I'd
thought of before so mentioning here.
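
A rough sketch of how that lookup might work, purely as an assumption:
neither np.AbstractArray nor np.asabstractarray exists in NumPy, so the
AbstractArray class, the asabstractarray function, the __abstract_array__
hook and the two toy classes below are all hypothetical stand-ins.

```python
import abc

import numpy as np


class AbstractArray(abc.ABC):
    """Stand-in for the proposed np.AbstractArray (hypothetical)."""


# ndarray counts as an abstract array in this sketch.
AbstractArray.register(np.ndarray)


def asabstractarray(obj):
    """Hypothetical conversion: pass through, then hook, then np.asarray."""
    if isinstance(obj, AbstractArray):
        return obj
    # Look the hook up on the type, as Python does for special methods.
    hook = getattr(type(obj), "__abstract_array__", None)
    if hook is not None:
        result = hook(obj)
        if not isinstance(result, AbstractArray):
            raise TypeError("__abstract_array__ must return an AbstractArray")
        return result
    return np.asarray(obj)


@AbstractArray.register
class ToySparseVector:
    """Toy duck array that stores only the nonzero entries."""

    def __init__(self, length, entries):
        self.shape = (length,)
        self.entries = dict(entries)


class NonzeroTable:
    """Not an array itself, but cheap to convert into a sparse one."""

    def __init__(self, length, entries):
        self.length = length
        self.entries = entries

    def __abstract_array__(self):
        return ToySparseVector(self.length, self.entries)


sparse = asabstractarray(NonzeroTable(1000, {3: 1.5, 998: -2.0}))
dense = asabstractarray([1, 2, 3])
```

Here the NonzeroTable ends up as a ToySparseVector rather than being
densified, while the plain list still falls back to an ordinary ndarray.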

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

