[Numpy-discussion] Proposal to accept NEP-18, __array_function__ protocol

Nathaniel Smith njs at pobox.com
Tue Aug 21 21:56:06 EDT 2018


On Tue, Aug 21, 2018 at 6:12 PM, Stephan Hoyer <shoyer at gmail.com> wrote:
> On Tue, Aug 21, 2018 at 12:21 AM Nathaniel Smith <njs at pobox.com> wrote:
>>
>> >> My suggestion: at numpy import time, check for an envvar, like say
>> >> NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1. If it's not set, then all the
>> >> __array_function__ dispatches turn into no-ops. This lets interested
>> >> downstream libraries and users try this out, but makes sure that we
>> >> won't have a hundred thousand end users depending on it without
>> >> realizing.
>> >>
>> >>
>> >>
>> >> - makes it easy for end-users to check how much overhead this adds (by
>> >> running their code with it enabled vs disabled)
>> >> - if/when we decide to commit to supporting it for real, we just
>> >> remove the envvar.
>> >
>> >
>> > I'm slightly concerned that the cost of reading an environment variable
>> > with
>> > os.environ could exaggerate the performance cost of __array_function__.
>> > It
>> > takes about 1 microsecond to read an environment variable on my laptop,
>> > which is comparable to the full overhead of __array_function__.
>>
>> That's why I said "at numpy import time" :-). I was imagining we'd
>> check it once at import, and then from then on it'd be stashed in some
>> C global, so after that the overhead would just be a single
>> predictable branch 'if (array_function_is_enabled) { ... }'.
>
>
> Indeed, I missed the "at numpy import time" bit :).
>
> In that case, I'm concerned that it isn't always possible to set environment
> variables once before importing NumPy. The environment variable solution
> works great if users have full control of their own Python binaries, but
> that isn't always the case today in this era of server-less infrastructure
> and online notebooks.
>
> One example offhand is Google's Colaboratory
> (https://research.google.com/colaboratory), a web based Jupyter notebook.
> NumPy is always loaded when a notebook is opened, as you can check from
> inspecting sys.modules. Now, I work with the developers of Colaboratory, so
> we could probably figure out a work-around together, but I'm pretty sure
> this would also come up in the context of other tools.

I mean, the idea of the envvar is to be a temporary measure to enable
devs to experiment with a provisional feature, while being awkward
enough that people don't build lots of stuff assuming it's there. It
doesn't have to be 100% supported in every environment.

> Another problem is unit testing. Does pytest use a separate Python process
> for running the tests in each file? I don't know and that feels like an
> implementation detail that I shouldn't have to know :). Yes, in principle I
> could use a subprocess in my __array_function__ for unit tests, but that
> would be really awkward.

Set the envvar before invoking pytest?

For numpy itself we'll need to write a few awkward tests involving
subprocesses to make sure the envvar parsing is working properly, but
I don't think this is a big deal. As long as we only have 1-2 places
that __array_function__ dispatch funnels through, we just need to make
sure that they work properly with/without the envvar; no need to test
every API separately. Or if it is an issue we can have some private
API that's only available to the numpy test suite...
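
A subprocess-based test of that kind might look roughly like the
sketch below. This is an assumption about the shape, not actual numpy
test code: the child snippet is a stand-in for "import numpy and report
whether dispatch is enabled", and flag_in_fresh_process is a
hypothetical helper name.

```python
import os
import subprocess
import sys

ENVVAR = "NUMPY_EXPERIMENTAL_ARRAY_FUNCTION"

# Stand-in for the real child program, which would import numpy and
# print whether __array_function__ dispatch ended up enabled.
CHILD = (
    "import os; "
    "print(os.environ.get('NUMPY_EXPERIMENTAL_ARRAY_FUNCTION', '0') == '1')"
)


def flag_in_fresh_process(value):
    """Start a new interpreter with the envvar set to `value` (or unset
    if `value` is None) and report what the child saw at startup."""
    env = dict(os.environ)
    env.pop(ENVVAR, None)
    if value is not None:
        env[ENVVAR] = value
    out = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return out.stdout.strip() == "True"
```

Since the flag is only read at import, each case needs its own fresh
interpreter, which is why the helper spawns one per call.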

>> > So we may
>> > want to switch to an explicit Python API instead, e.g.,
>> > np.enable_experimental_array_function().
>>
>> If we do this, then libraries that want to use __array_function__ will
>> just call it themselves at import time. The point of the env-var is
>> that our policy is not to break end-users, so if we want an API to be
>> provisional and experimental then it's end-users who need to be aware
>> of that before using it. (This is also an advantage of checking the
>> envvar only at import time: it means libraries can't easily just
>> setenv() to enable the functionality behind users' backs.)
>
>
> I'm in complete agreement that only authors of end-user applications should
> invoke this option, but just because something is technically possible
> doesn't mean that people will actually do it or that we need to support that
> use case :).

I didn't say "authors of end-user applications", I said "end-users" :-).

That said, I dunno. My intuition is that if we have a function call
like this then libraries that define __array_function__ will merrily
call it in their package __init__ and it accomplishes nothing, but
maybe I'm being too cynical and untrusting.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

