[Numpy-discussion] Proposal to accept NEP-18, __array_function__ protocol

Stephan Hoyer shoyer at gmail.com
Tue Aug 21 21:12:25 EDT 2018


On Tue, Aug 21, 2018 at 12:21 AM Nathaniel Smith <njs at pobox.com> wrote:

>>> My suggestion: at numpy import time, check for an envvar, like say
>>> NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1. If it's not set, then all the
>>> __array_function__ dispatches turn into no-ops. This lets interested
>>> downstream libraries and users try this out, but makes sure that we
>>> won't have a hundred thousand end users depending on it without
>>> realizing.
>>>
>>> - makes it easy for end-users to check how much overhead this adds (by
>>>   running their code with it enabled vs disabled)
>>> - if/when we decide to commit to supporting it for real, we just
>>>   remove the envvar.
>>
>> I'm slightly concerned that the cost of reading an environment variable
>> with os.environ could exaggerate the performance cost of
>> __array_function__. It takes about 1 microsecond to read an environment
>> variable on my laptop, which is comparable to the full overhead of
>> __array_function__.

> That's why I said "at numpy import time" :-). I was imagining we'd check
> it once at import, and then from then on it'd be stashed in some C
> global, so after that the overhead would just be a single predictable
> branch 'if (array_function_is_enabled) { ... }'.

Indeed, I missed the "at numpy import time" bit :).
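Concretely, the scheme Nathaniel describes might look something like the following sketch (in Python rather than C, and with illustrative names rather than NumPy's actual implementation):

```python
import os

# Read the opt-in flag exactly once, at module import time.
ARRAY_FUNCTION_ENABLED = bool(int(
    os.environ.get('NUMPY_EXPERIMENTAL_ARRAY_FUNCTION', '0')))

def dispatch(func, args, kwargs):
    # After import, each call pays only a single predictable branch,
    # not an os.environ lookup.
    if ARRAY_FUNCTION_ENABLED:
        raise NotImplementedError("real __array_function__ dispatch here")
    # Flag unset: the dispatch is a no-op, i.e. plain NumPy behavior.
    return func(*args, **kwargs)
```

With the variable unset, dispatch() just calls the wrapped function directly, so the overhead Nathaniel mentions is only the branch.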

In that case, I'm concerned that it isn't always possible to set environment variables once before importing NumPy. The environment variable solution works great if users have full control of their own Python binaries, but that isn't always the case today in this era of server-less infrastructure and online notebooks.

One example offhand is Google's Colaboratory ( https://research.google.com/colaboratory), a web-based Jupyter notebook. NumPy is always loaded when a notebook is opened, as you can check from inspecting sys.modules. Now, I work with the developers of Colaboratory, so we could probably figure out a work-around together, but I'm pretty sure this would also come up in the context of other tools.

Another problem is unit testing. Does pytest use a separate Python process for running the tests in each file? I don't know, and that feels like an implementation detail that I shouldn't have to know :). Yes, in principle I could use a subprocess in my __array_function__ unit tests, but that would be really awkward.
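For what it's worth, testing an import-time flag from a single test process would require spawning a fresh interpreter per configuration, along these lines (a sketch of the awkwardness, not an endorsement; the variable name is from the proposal above):

```python
import os
import subprocess
import sys

def run_with_flag(code, enabled):
    """Run a snippet in a fresh interpreter with the flag set or unset."""
    env = dict(os.environ,
               NUMPY_EXPERIMENTAL_ARRAY_FUNCTION='1' if enabled else '0')
    result = subprocess.run([sys.executable, '-c', code],
                            env=env, capture_output=True, text=True)
    return result.stdout.strip()

# Each configuration needs its own process, because the parent
# interpreter only reads the flag once, at import time.
out = run_with_flag(
    "import os; print(os.environ['NUMPY_EXPERIMENTAL_ARRAY_FUNCTION'])",
    enabled=True)
```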

>> So we may want to switch to an explicit Python API instead, e.g.,
>> np.enable_experimental_array_function().

> If we do this, then libraries that want to use __array_function__ will
> just call it themselves at import time. The point of the env-var is that
> our policy is not to break end-users, so if we want an API to be
> provisional and experimental then it's end-users who need to be aware of
> that before using it. (This is also an advantage of checking the envvar
> only at import time: it means libraries can't easily just setenv() to
> enable the functionality behind users' backs.)

I'm in complete agreement that only authors of end-user applications should invoke this option, but just because something is technically possible doesn't mean that people will actually do it or that we need to support that use case :).

numpy.seterr() is a good example. It allows users to globally set how NumPy does error handling, but well-written libraries still don't do that.
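To illustrate the distinction: an end-user application might reasonably change the global setting with np.seterr(), while a well-behaved library scopes any change to its own code with the np.errstate context manager:

```python
import numpy as np

# Application code: a process-wide setting, reasonable for an end user.
old = np.seterr(all='ignore')

# Library code: scope error handling locally instead of touching globals;
# here the divide-by-zero produces inf without raising a warning.
with np.errstate(divide='ignore'):
    result = np.array([1.0]) / np.array([0.0])

np.seterr(**old)  # restore the previous global settings
```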

TensorFlow has a similar function, tf.enable_eager_execution(), for enabling "eager mode" that is also worth examining: https://www.tensorflow.org/api_docs/python/tf/enable_eager_execution

To solve the testing issue, they wrote a decorator for use with tests, run_in_graph_and_eager_modes(): https://www.tensorflow.org/api_docs/python/tf/contrib/eager/run_test_in_graph_and_eager_modes
