Multidimensional groupby by rabernat · Pull Request #818 · pydata/xarray
My new commit supports unstacking in apply with shortcut=True. However, the behavior is kind of weird, in a way that is unique to the multidimensional case.
Consider the behavior of the test case:
    import xarray as xr

    da = xr.DataArray([[0, 1], [2, 3]],
                      coords={'lon': (['ny', 'nx'], [[30, 40], [40, 50]]),
                              'lat': (['ny', 'nx'], [[10, 10], [20, 20]])},
                      dims=['ny', 'nx'])
    da.groupby('lon').apply(lambda x: x - x.mean(), shortcut=False)
    <xarray.DataArray (lon_groups: 3, ny: 2, nx: 2)>
    array([[[ 0. ,  nan],
            [ nan,  nan]],

           [[ nan, -0.5],
            [ 0.5,  nan]],

           [[ nan,  nan],
            [ nan,  0. ]]])
    Coordinates:
      * ny          (ny) int64 0 1
      * nx          (nx) int64 0 1
        lat         (lon_groups, ny, nx) float64 10.0 nan nan nan nan 10.0 20.0 ...
        lon         (lon_groups, ny, nx) float64 30.0 nan nan nan nan 40.0 40.0 ...
      * lon_groups  (lon_groups) int64 30 40 50
When unstacking, the indices that are not part of the group get filled with nans. We are not able to put these arrays back together into a single array.
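To make it clearer where the NaNs come from, here is a rough sketch in plain Python (no xarray) that mimics the per-group result above. It is not how the PR implements anything; the helper name `grouped_anomaly` is made up for illustration, and the data is just the toy example from the test case. Each group's anomaly is written back onto the full (ny, nx) grid, and every cell that does not belong to the group is left as NaN.

```python
import math

# Toy data from the test case: values and a 2-D 'lon' coordinate.
values = [[0, 1], [2, 3]]
lon = [[30, 40], [40, 50]]


def grouped_anomaly(values, labels):
    """Hypothetical helper: per-label anomaly on the full 2-D grid.

    Returns {label: 2-D list}; cells outside the group are NaN,
    which is roughly what unstacking each group produces.
    """
    result = {}
    for label in sorted({l for row in labels for l in row}):
        # Collect the values of every cell carrying this label.
        members = [
            values[i][j]
            for i, row in enumerate(labels)
            for j, l in enumerate(row)
            if l == label
        ]
        mean = sum(members) / len(members)
        # Write the anomaly back on the full grid, NaN elsewhere.
        result[label] = [
            [values[i][j] - mean if l == label else math.nan
             for j, l in enumerate(row)]
            for i, row in enumerate(labels)
        ]
    return result


per_group = grouped_anomaly(values, lon)
for label, block in per_group.items():
    print(label, block)
```

Each of the three blocks is a full 2x2 grid, so stacking them along a new group axis gives the (lon_groups: 3, ny: 2, nx: 2) shape shown above rather than a single (ny, nx) array.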
Note that if we do not rename the group name here:
https://github.com/pydata/xarray/pull/818/files#diff-96b65e0bfec9fd2b9d562483f53661f5R121
then we get an error here:
https://github.com/pydata/xarray/pull/818/files#diff-96b65e0bfec9fd2b9d562483f53661f5R407
ValueError: the variable 'lon' has the same name as one of its dimensions ('lon', 'ny', 'nx'), but it is not 1-dimensional and thus it is not a valid index