Multidimensional groupby by rabernat · Pull Request #818 · pydata/xarray

My new commit supports unstacking in apply with shortcut=True. However, the behavior is kind of weird, in a way that is unique to the multidimensional case.

Consider the behavior of the test case:

    da = xr.DataArray([[0,1],[2,3]],
                      coords={'lon': (['ny','nx'], [[30,40],[40,50]] ),
                              'lat': (['ny','nx'], [[10,10],[20,20]] ),},
                      dims=['ny','nx'])
    da.groupby('lon').apply(lambda x : x - x.mean(), shortcut=False)
    <xarray.DataArray (lon_groups: 3, ny: 2, nx: 2)>
    array([[[ 0. ,  nan],
            [ nan,  nan]],

           [[ nan, -0.5],
            [ 0.5,  nan]],

           [[ nan,  nan],
            [ nan,  0. ]]])
    Coordinates:
      * ny          (ny) int64 0 1
      * nx          (nx) int64 0 1
        lat         (lon_groups, ny, nx) float64 10.0 nan nan nan nan 10.0 20.0 ...
        lon         (lon_groups, ny, nx) float64 30.0 nan nan nan nan 40.0 40.0 ...
      * lon_groups  (lon_groups) int64 30 40 50

When unstacking, the indices that are not part of the group get filled with NaNs. As a result, we are not able to put the per-group arrays back together into a single (ny, nx) array; they are concatenated along the new lon_groups dimension instead.
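
To make the NaN padding concrete, here is a minimal NumPy-only sketch of the same test case. It is not the code in this PR and does not call xarray at all; the names `per_group`, `filled_count` and `combined` are made up for illustration. It builds one (ny, nx) slab per unique `lon` value and shows that each cell is filled in exactly one slab. The final `nansum` is only one possible way to collapse the slabs, offered as an assumption rather than a proposal for how `apply` should behave.

```python
import numpy as np

# Same values as the test case above
data = np.array([[0.0, 1.0], [2.0, 3.0]])
lon = np.array([[30, 40], [40, 50]])

groups = np.unique(lon)  # the three group labels: 30, 40, 50

# One (ny, nx) slab per group; cells outside the group stay NaN,
# which mimics what unstacking each group separately produces.
per_group = np.full((groups.size,) + data.shape, np.nan)
for i, g in enumerate(groups):
    mask = lon == g
    per_group[i][mask] = data[mask] - data[mask].mean()

# Count how many slabs hold a real value at each (ny, nx) cell.
filled_count = (~np.isnan(per_group)).sum(axis=0)

# Hypothetical recombination: since the groups are disjoint, summing
# while ignoring NaNs collapses the slabs back to shape (ny, nx).
combined = np.nansum(per_group, axis=0)

print(per_group)
print(filled_count)
print(combined)
```

Because the groups partition the (ny, nx) cells, `filled_count` is 1 everywhere, so in principle no information is lost by the padding; the difficulty is in doing this recombination generically inside groupby.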

Note that if we do not rename the group (here, from lon to lon_groups) at this line:
https://github.com/pydata/xarray/pull/818/files#diff-96b65e0bfec9fd2b9d562483f53661f5R121

Then we get an error here:
https://github.com/pydata/xarray/pull/818/files#diff-96b65e0bfec9fd2b9d562483f53661f5R407

ValueError: the variable 'lon' has the same name as one of its dimensions ('lon', 'ny', 'nx'), but it is not 1-dimensional and thus it is not a valid index