[Numpy-discussion] A little about XND

Stefan Krah skrah at bytereef.org
Mon Jun 18 15:09:50 EDT 2018


Hi Marten,

On Mon, Jun 18, 2018 at 12:34:03PM -0400, Marten van Kerkwijk wrote:

That looks quite nice and expressive. In the context of a discussion we have been having about describing matmul/@ and possibly broadcastable dimensions, I think from your description it sounds like one would describe @ with multiple functions (the multiple dispatch we have been (are?) considering as well):

"... * N * M * T, ... * M * P * T -> ... * N * P * T" "M * T, ... * M * P * T -> ... P * T" "... * N * M * T, M * T -> ... * N * T" "M * T, M * T -> T"

Yes, that's the way, and the outer dimensions (the part matched by the ellipsis) are always broadcast like in NumPy.
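
As an illustration of the four cases above, here is a small NumPy sketch (NumPy only, not the xnd/ndtypes API) that checks the result shapes of np.matmul. The concrete sizes and the dictionary keys are arbitrary choices made for this example.

```python
import numpy as np

# Only the shapes matter here; T is float64.
a = np.ones((5, 1, 2, 3))  # ... * N * M * T with outer dims (5, 1), N=2, M=3
b = np.ones((4, 3, 6))     # ... * M * P * T with outer dims (4,), M=3, P=6
v = np.ones(3)             # M * T

shapes = {
    # Outer dims (5, 1) and (4,) broadcast to (5, 4).
    "matrix @ matrix": np.matmul(a, b).shape,
    "vector @ matrix": np.matmul(v, b).shape,
    "matrix @ vector": np.matmul(a, v).shape,
    "vector @ vector": np.matmul(v, v).shape,
}

for name, shape in shapes.items():
    print(name, shape)
```

Each entry corresponds to one of the four signatures, with the part matched by the ellipsis broadcast in the usual way.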

Is there a way to describe broadcasting? The sample case we've come up with is a function that calculates a weighted mean. This might take (values, sigmas) and return (mean, sigmamean), which would imply a signature like:

"... N * T, ... N * T -> ... * T, ... * T" But would your signature allow indicating that one could pass in a single sigma? I.e., broadcast the second 1 to N if needed?

Actually I came across this today when implementing optimized matching for binary functions.

I wanted the faster kernel

"... * N * int64, ... * N * int64 -> ... * N * int64"

to also match e.g. the input

"int64, 10 * int64".

The generic datashape spec would forbid this, but perhaps the '?' that you propose in nep-0020 would offer a way out of this for ndtypes.

It's a bit confusing for datashape, since there is already a question mark for missing variable dimensions (that have shape==0 in the data).

ndt("var * ?var * int64")

This would be the type for e.g. [[0], None, [1,2,3]].

But for symbolic dimensions (which only match fixed dimensions) perhaps this

"... * ?N * int64, ... * ?N * int64 -> ... * ?N * int64"

or, as in the NEP,

"... * N? * int64, ... * N? * int64 -> ... * N? * int64"

should mean "At least one input has ndim >= 1, broadcast as necessary".

This still means that for the "all ndim==0" case one would need an additional kernel "int64, int64 -> int64".
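
A rough sketch of that rule in plain Python, to make the two cases concrete. The helper select_kernel is hypothetical and not part of ndtypes; it assumes that "broadcast as necessary" follows NumPy's broadcasting rules and uses np.broadcast_shapes (NumPy 1.20 or later) to compute the result shape.

```python
import numpy as np

OPTIONAL_SIG = "... * N? * int64, ... * N? * int64 -> ... * N? * int64"
SCALAR_SIG = "int64, int64 -> int64"

def select_kernel(shape_a, shape_b):
    """Hypothetical helper: return (signature, output shape) for two inputs."""
    if len(shape_a) == 0 and len(shape_b) == 0:
        # All inputs have ndim == 0, so the additional scalar kernel is needed.
        return SCALAR_SIG, ()
    # At least one input has ndim >= 1: broadcast as NumPy would.
    # np.broadcast_shapes raises ValueError for incompatible shapes.
    return OPTIONAL_SIG, np.broadcast_shapes(shape_a, shape_b)

print(select_kernel((), (10,)))      # the "int64, 10 * int64" input
print(select_kernel((2, 1), (10,)))
print(select_kernel((), ()))
```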

I realize that this is no longer about describing precisely what the function doing the calculation expects, but rather what an upper level is allowed to do before calling the function (i.e., take a dimension of 1 and broadcast it).

Yes, for datashape the problem is that it also allows non-broadcastable signatures like "N * float64", really the same as "double x[]" in C.

But the '?' with occasionally one additional kernel for ndim==0 could solve this.
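
For the weighted-mean case discussed above, the split between "what the kernel expects" and "what an upper level may do first" could look like the following NumPy sketch. The function names are made up for this example, and the inverse-variance weighting is an assumption, since the thread does not spell out the formula. The ndim==0 case is deliberately not handled, matching the remark that it would need its own kernel.

```python
import numpy as np

def weighted_mean_core(values, sigmas):
    """Core kernel, roughly "... * N * T, ... * N * T -> ... * T, ... * T".

    Assumes inverse-variance weights; both inputs must have the same shape.
    """
    weights = 1.0 / sigmas**2
    wsum = weights.sum(axis=-1)
    mean = (weights * values).sum(axis=-1) / wsum
    sigma_mean = 1.0 / np.sqrt(wsum)
    return mean, sigma_mean

def weighted_mean(values, sigmas):
    """Upper level: broadcast the inputs (e.g. a single sigma) before the call."""
    values = np.asarray(values, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    values, sigmas = np.broadcast_arrays(values, sigmas)
    return weighted_mean_core(values, sigmas)

# A single sigma is broadcast to N = 3 before the core kernel runs.
mean_single, sigma_single = weighted_mean([1.0, 2.0, 3.0], 0.5)
mean_full, sigma_full = weighted_mean([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(mean_single, sigma_single)
```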

Stefan Krah


