[Numpy-discussion] Proposal to accept NEP-18, __array_function__ protocol

Nathaniel Smith njs at pobox.com
Tue Aug 21 21:56:06 EDT 2018


On Tue, Aug 21, 2018 at 6:12 PM, Stephan Hoyer <shoyer at gmail.com> wrote:

> On Tue, Aug 21, 2018 at 12:21 AM Nathaniel Smith <njs at pobox.com> wrote:
>
>>>> My suggestion: at numpy import time, check for an envvar, like say
>>>> NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1. If it's not set, then all the
>>>> __array_function__ dispatches turn into no-ops. This lets interested
>>>> downstream libraries and users try this out, but makes sure that we
>>>> won't have a hundred thousand end users depending on it without
>>>> realizing.
>>>>
>>>> - makes it easy for end-users to check how much overhead this adds (by
>>>>   running their code with it enabled vs disabled)
>>>> - if/when we decide to commit to supporting it for real, we just
>>>>   remove the envvar.
>>>
>>> I'm slightly concerned that the cost of reading an environment variable
>>> with os.environ could exaggerate the performance cost of
>>> __array_function__. It takes about 1 microsecond to read an environment
>>> variable on my laptop, which is comparable to the full overhead of
>>> __array_function__.
>>
>> That's why I said "at numpy import time" :-). I was imagining we'd
>> check it once at import, and then from then on it'd be stashed in some
>> C global, so after that the overhead would just be a single predictable
>> branch 'if (array_function_is_enabled) { ... }'.
>
> Indeed, I missed the "at numpy import time" bit :).
>
> In that case, I'm concerned that it isn't always possible to set
> environment variables once before importing NumPy. The environment
> variable solution works great if users have full control of their own
> Python binaries, but that isn't always the case today in this era of
> server-less infrastructure and online notebooks.
>
> One example offhand is Google's Colaboratory
> (https://research.google.com/colaboratory), a web based Jupyter
> notebook. NumPy is always loaded when a notebook is opened, as you can
> check from inspecting sys.modules. Now, I work with the developers of
> Colaboratory, so we could probably figure out a work-around together,
> but I'm pretty sure this would also come up in the context of other
> tools.

I mean, the idea of the envvar is to be a temporary measure to enable devs to experiment with a provisional feature, while being awkward enough that people don't build lots of stuff assuming it's there. It doesn't have to be 100% supported in every environment.
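
As a rough illustration of the import-time check discussed above, here is a minimal pure-Python sketch. It is not NumPy's implementation (the thread imagines a C global); `make_dispatch_decorator` and the simplified override lookup are assumptions made for this example, and only the envvar name and the `__array_function__(self, func, types, args, kwargs)` signature come from the proposal.

```python
import functools
import os

ENVVAR = "NUMPY_EXPERIMENTAL_ARRAY_FUNCTION"


def make_dispatch_decorator(environ=os.environ):
    """Hypothetical helper: read the envvar once, then build a decorator.

    A real module would do this read at import time and keep the result in
    a module-level (or C-level) flag.
    """
    enabled = environ.get(ENVVAR, "0") == "1"  # read exactly once

    def decorator(func):
        if not enabled:
            # Feature disabled: dispatch is a no-op, the function is untouched.
            return func

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Simplified lookup: only positional arguments are inspected.
            types = tuple(
                type(a) for a in args if hasattr(type(a), "__array_function__")
            )
            for arg in args:
                if hasattr(type(arg), "__array_function__"):
                    result = arg.__array_function__(wrapper, types, args, kwargs)
                    if result is not NotImplemented:
                        return result
            return func(*args, **kwargs)

        return wrapper

    return decorator
```

Because the flag is captured when the decorator factory runs, setting the variable afterwards has no effect, which is the property that keeps a library from enabling the feature with a late setenv().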

> Another problem is unit testing. Does pytest use a separate Python process for running the tests in each file? I don't know and that feels like an implementation detail that I shouldn't have to know :). Yes, in principle I could use a subprocess for my __array_function__ unit tests, but that would be really awkward.

Set the envvar before invoking pytest?

For numpy itself we'll need to write a few awkward tests involving subprocesses to make sure the envvar parsing is working properly, but I don't think this is a big deal. As long as we only have 1-2 places that __array_function__ dispatch funnels through, we just need to make sure that they work properly with/without the envvar; no need to test every API separately. Or if it is an issue we can have some private API that's only available to the numpy test suite...
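
A subprocess-based test of the kind mentioned above might look something like the following sketch. The child script is a hypothetical stand-in for a module that reads the envvar when it starts; `flag_in_fresh_interpreter` is a name invented for this example, not anything in numpy's test suite.

```python
import os
import subprocess
import sys

ENVVAR = "NUMPY_EXPERIMENTAL_ARRAY_FUNCTION"

# Hypothetical stand-in for the module under test: it reads the envvar once
# at startup and prints the resulting flag.
CHILD_CODE = (
    "import os\n"
    f"enabled = os.environ.get({ENVVAR!r}, '0') == '1'\n"
    "print(enabled)\n"
)


def flag_in_fresh_interpreter(value):
    """Run the child in a new interpreter with the envvar set to `value`.

    Passing None removes the variable from the child's environment.
    """
    env = dict(os.environ)
    env.pop(ENVVAR, None)
    if value is not None:
        env[ENVVAR] = value
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"
```

For a downstream project's own test run, the simpler route from the message still applies: export the variable in the shell before invoking pytest.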

>>> So we may want to switch to an explicit Python API instead, e.g.,
>>> np.enable_experimental_array_function().
>>
>> If we do this, then libraries that want to use __array_function__ will
>> just call it themselves at import time. The point of the env-var is
>> that our policy is not to break end-users, so if we want an API to be
>> provisional and experimental then it's end-users who need to be aware
>> of that before using it. (This is also an advantage of checking the
>> envvar only at import time: it means libraries can't easily just
>> setenv() to enable the functionality behind users' backs.)
>
> I'm in complete agreement that only authors of end-user applications
> should invoke this option, but just because something is technically
> possible doesn't mean that people will actually do it or that we need
> to support that use case :).

I didn't say "authors of end-user applications", I said "end-users" :-).

That said, I dunno. My intuition is that if we have a function call like this then libraries that define __array_function__ will merrily call it in their package __init__ and it accomplishes nothing, but maybe I'm being too cynical and untrusting.
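
To make that worry concrete, here is a toy sketch of a runtime opt-in. The function name follows the one floated earlier in the thread; the module-level flag and `import_some_library` are hypothetical, standing in for a downstream package's __init__.

```python
_enabled = False  # module-level flag, off by default


def enable_experimental_array_function():
    """Hypothetical runtime opt-in, named after the API floated in the thread."""
    global _enabled
    _enabled = True


def array_function_is_enabled():
    return _enabled


def import_some_library():
    """Stand-in for a downstream library's package __init__."""
    # Nothing stops the library from opting in on the end-user's behalf.
    enable_experimental_array_function()
```

Once any imported library makes that call, the feature is on for the whole process, whether or not the end-user knows it is provisional.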

-n

-- Nathaniel J. Smith -- https://vorpus.org


