[Numpy-discussion] NEP: Dispatch Mechanism for NumPy’s high level API

Marten van Kerkwijk m.h.vankerkwijk at gmail.com
Sun Jun 3 19:23:58 EDT 2018


In most cases, I suspect that the overhead of a function call and checking several arguments for __array_function__ will be negligible, like the situation for __array_ufunc__. I'm not strongly opposed to either of your proposed solutions, but I do think it would be a little strange to insist that we need a solution for __array_function__ when __array_ufunc__ was fine.

Ufuncs actually do try to speed up array checks - but indeed the same can (and should) be done for __array_ufunc__. They also do have subok. This is currently ignored, but that is mostly because looking for it in kwargs is so damn slow!

Anyway, my main point was that it should be explicitly mentioned as a constraint that for pure ndarray input, things should be really fast.

A. Two "namespaces", one for the undecorated base functions, and one completely trivial one for the decorated ones. The idea would be that if one knows one is dealing with arrays only, one would do import numpy.arrayonly as np (i.e., the reverse of the suggestion currently in the NEP, where the decorated ones are in their own namespace - I agree with the reasons for discounting that one).

I will mention this as a possibility. I do think there is something to be said for clear separation of overloaded and non-overloaded APIs. But if I were to choose between adding numpy.api and numpy.arrayonly, I would pick numpy.api, because of the virtue of preserving the existing numpy namespace as it currently exists.

Good point. Overall, separate namespaces are probably not the way to go.

B. Automatic insertion by the decorator of an arrayonly=np.NoValue (or coerce and perhaps subok=... if not present) in the function signature, so that users who know that they have arrays only could pass arrayonly=True (name to be decided).

Rather than adding another argument to every NumPy function, I would rather encourage writing np.asarray() explicitly.

Good point - that is just as good, as long as the check for all-array is very fast (which it should be - arg.__class__ is np.ndarray is fast!).
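The fast all-array check mentioned above could look roughly like the sketch below. To keep it runnable without NumPy it uses a hypothetical stand-in class, BaseArray, where real code would compare against np.ndarray; the names all_base_arrays and dispatch are also made up for illustration and are not part of the NEP.

```python
class BaseArray:
    """Hypothetical stand-in for np.ndarray, so the sketch runs without NumPy."""


class SubArray(BaseArray):
    """Stand-in for a subclass that may want to override dispatch."""


def all_base_arrays(args, base=BaseArray):
    # Identity comparison on __class__ is deliberately strict: subclasses
    # do not pass, so they still reach the slower override check.
    for arg in args:
        if arg.__class__ is not base:
            return False
    return True


def dispatch(args):
    # Pure base-array input skips any search for overrides.
    if all_base_arrays(args):
        return "fast path"
    return "check overrides"


print(dispatch((BaseArray(), BaseArray())))
print(dispatch((BaseArray(), SubArray())))
```

The point of comparing with "is" rather than isinstance is that it costs one attribute lookup and one pointer comparison per argument.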

Note that both A and B could also address, at least partially, the problem of sometimes wanting to just use the old coercion methods, i.e., not having to implement every possible numpy function in one go in a new __array_function__ on one's class.

Yes, agreed.

1. I'm rather unclear about the use of types. It can help me decide what to do, but I would still have to find the argument in question (e.g., for Quantity, the unit of the relevant argument). I'd recommend passing instead a tuple of all arguments that were inspected, in the inspection order; after all, it is just an arg.__class__ away from the type, and in your example you'd only have to replace issubclass by isinstance.

The virtue of a types argument is that we can deduplicate arguments once, rather than in each __array_function__ check. This could result in significantly more efficient code, e.g., when np.concatenate() is called on 10,000 arrays with only two unique types, we don't need to loop through all 10,000 objects again to check that overloading is valid.
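A minimal sketch of the deduplication argument, assuming the dispatcher collects the distinct argument types once and hands that short list to each override check. The helper names unique_types and can_handle, and the classes HandledA and HandledB, are hypothetical and only serve the illustration.

```python
def unique_types(relevant_args):
    """Return the distinct types of the arguments, in first-seen order."""
    seen = set()
    types = []
    for arg in relevant_args:
        arg_type = type(arg)
        if arg_type not in seen:
            seen.add(arg_type)
            types.append(arg_type)
    return types


class HandledA:
    pass


class HandledB:
    pass


def can_handle(types, handled=(HandledA, HandledB)):
    # Each override check only walks the deduplicated types,
    # not the full argument list.
    return all(issubclass(t, handled) for t in types)


# 10,000 arguments, but only two distinct types among them.
args = [HandledA() if i % 2 else HandledB() for i in range(10_000)]
types = unique_types(args)
print(len(types), can_handle(types))
```

The single pass over the 10,000 arguments happens once in the dispatcher; every later check sees just two entries.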

I think one might still want to know where the type occurs (e.g., as an output or index would have different implications). Possibly, a solution would rely on the same structure as used for the "dance". But as a general point, I don't see the advantage of passing types rather than arguments - less information for no benefit.

Even for Quantity, I suspect you will want two layers of checks: 1. A check to verify that every argument is a Quantity (or something coercible to a Quantity). This could use types and return NotImplemented when it fails. 2. A check to verify that units match. This will have custom logic for different operations and will require checking all arguments -- not just their unique types.

Not sure. With Quantity, I generally do not worry about other types, but rather look at unit attributes, assume anything without one is dimensionless, cast Quantity to array with the right unit, and then defer to ndarray.

For many Quantity functions, the second check will indeed probably be super simple (i.e., verifying that all units match). But the first check (with types) really is something that basically every overload should do.
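The two layers of checks described above might be sketched as follows. The Quantity class here is a toy stand-in (not astropy's Quantity), values are plain lists, and concatenate_quantities is a hypothetical name for an override of a concatenate-like function.

```python
class Quantity:
    """Toy stand-in for a unit-carrying array; not astropy's Quantity."""

    def __init__(self, value, unit):
        self.value = list(value)
        self.unit = unit


def concatenate_quantities(types, args):
    # Layer 1: cheap check on the deduplicated types only.
    if not all(issubclass(t, Quantity) for t in types):
        return NotImplemented
    # Layer 2: operation-specific check that needs the arguments themselves.
    units = {q.unit for q in args}
    if len(units) != 1:
        raise ValueError("units do not match: %s" % sorted(units))
    values = []
    for q in args:
        values.extend(q.value)
    return Quantity(values, units.pop())


a = Quantity([1, 2], "m")
b = Quantity([3], "m")
result = concatenate_quantities([Quantity], [a, b])
print(result.value, result.unit)
```

Only the second layer has to look at every argument; the first can bail out with NotImplemented before touching any of them.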

2. For subclasses, it would be very handy to have ndarray.__array_function__, so one can call super after changing arguments. (For __array_ufunc__, there were lots of questions about whether this was useful, but it really is!!). [I think you already agreed with this, but want to have it in place, as for subclasses of ndarray this is just as useful as it would be for subclasses of dask arrays.]

Yes, indeed.
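To illustrate "call super after changing arguments", here is a rough sketch with a hypothetical BaseArray standing in for ndarray. The (func, types, args, kwargs) signature is an assumption based on the types argument discussed in this thread, and Scaled and total are made-up names.

```python
class BaseArray:
    """Hypothetical stand-in for ndarray with a default __array_function__."""

    def __init__(self, data):
        self.data = list(data)

    def __array_function__(self, func, types, args, kwargs):
        # Default behaviour: just call the undecorated implementation.
        return func(*args, **kwargs)


class Scaled(BaseArray):
    """Subclass that carries a scale factor alongside the raw data."""

    def __init__(self, data, scale):
        super().__init__(data)
        self.scale = scale

    def __array_function__(self, func, types, args, kwargs):
        # Rewrite the arguments (apply the scale), then defer to the base class.
        plain = tuple(
            BaseArray(x * a.scale for x in a.data) if isinstance(a, Scaled) else a
            for a in args
        )
        return super().__array_function__(func, types, plain, kwargs)


def total(arr):
    # Stand-in for an undecorated base function.
    return sum(arr.data)


s = Scaled([1, 2, 3], 10)
print(s.__array_function__(total, (Scaled,), (s,), {}))
```

The subclass only has to know how to turn itself into plain base arrays; everything else is inherited.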


NumPy-Discussion mailing list
NumPy-Discussion at python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


