[Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs (original) (raw)
Marten van Kerkwijk m.h.vankerkwijk at gmail.com
Mon Jun 11 10:59:49 EDT 2018
- Previous message (by thread): [Numpy-discussion] Fwd: Allowing broadcasting of code dimensions in generalized ufuncs
- Next message (by thread): [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Nathaniel: Output shape feels very similar to output dtype to me, so maybe the general way to handle this would be to make the first callback take the input shapes+dtypes and return the desired output shapes+dtypes? This hits on an interesting alternative to frozen dimensions - np.cross could just become a regular ufunc with signature np.dtype((float64, 3)), np.dtype((float64, 3)) → np.dtype((float64, 3)) As you note further down, the present proposal of just using numbers has the advantage of being clear and easy. Another (small?) advantage is that I can use
axis
to tell where my three coordinates are, rather than be stuck with having them as the last dimension.
Indeed, in my trials for wrapping the Standards Of Fundamental Astronomy routines, I started with just making every 3-vector and 3x3-matrix structured arrays with the relevant single sub-array entry. That worked, but I ended up disliking the casting to and fro.
Furthermore, the expansion quickly becomes cumbersome. For instance, for the allequal signature of (n|1),(n|1)->() …
I think this is only a good argument when used in conjunction with the broadcasting syntax. I don’t think it’s a reason for matmul not to have multiple signatures. Having multiple signatures is an disincentive to introduced too many overloads of the same function, which seems like a good thing to me But implementation for matmul is actually considerably trickier, since the internal loop now has to check the number of distinct dimensions.
Summarizing my overall opinions:
- I’m +0.5 on frozen dimensions. The use-cases seem reasonable, and it seems like an easy-ish way to get them. Allowing ufuncs to natively support subarray types might be a tidier solution, but that could come down the road Indeed, they are not mutually exclusive. My guess would be that the use cases would be somewhat different.
- I’m -1 on optional dimensions: they seem to legitimize creating many overloads of gufuncs. I’m already not a fan of how matmul has special cases for lower dimensions that don’t generalize well. To me, the best way to handle matmul would be to use the proposed arrayfunction to handle the shape-based special-case dispatching, either by: - Inserting dimensions, and calling the true gufunc np.linalg.matmul2d (which is a function I’d like direct access to anyway). - Dispatching to one of four ufuncs I must admit I wish that
@
was just pure matrix multiplication... But otherwise agree with Stephan as optional dimensions being the least-bad solution.
Aside: do agree we should think about how to expose the linalg
gufuncs.
- Broadcasting dimensions: - I know you’re not suggesting this but: enabling broadcasting unconditionally for all gufuncs would be a bad idea, masking linalg bugs. (although einsum does support broadcasting…) Indeed, definitely not suggesting that!
- - Does it really need a per-dimension flag, rather than a global one? Can you give a case where that’s useful? Mostly simply that the implementation is easier given the optional dimensions... Also, it has the benefit of being clear what the function can handle by inspection of the signature, i.e., it self-documents better (one of my main arguments in favour of frozen dimensions...).
- - If we’d already made allequal a gufunc, I’d be +1 on adding broadcasting support to it - I’m -0.5 on the allequal path in the first place. I think we either should have a more generic approach to combined ufuncs, or just declare them numbas job. I am working on and off on a way to generically chain ufuncs (goal would be to auto-create an inner loop that calls all the chained ufuncs loops in turn). Not sure that short-circuiting will be all that easy.
I actually quite like the all_equal ufunc, but it is in part because I
remember discovering how painfully slow (a==b).all() was (and still have a
place where I would use it if it existed). And it does fit in the
(admittedly vague) plans to try to make .reduce
a gufunc.
- - Can you come up with a broadcasting use-case that isn’t just chaining a reduction with a broadcasting ufunc? Perhaps the use is that it allows people to write gufuncs that are like such functions... Absent a mechanism to chain ufuncs, more complicated gufuncs are currently the easiest way to get fast more complicated algebra.
But perhaps a putative
weighted_mean(y, sigma) -> mean, sigma_mean
is a decent example? Its signature would be
(n),(n)->(),()
but then you're forced to give individual sigmas for each point. With
(n|1),(n|1)->(),()
you are no longer forced to do that (though the case of all y being the same is less than useful here... I did at some point have an implementation that worked by core dimension of each argument, but ended up feeling it was not worth the extra complication)
-- Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20180611/fcc7b4b2/attachment.html>
- Previous message (by thread): [Numpy-discussion] Fwd: Allowing broadcasting of code dimensions in generalized ufuncs
- Next message (by thread): [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]