xarray.combine_nested

xarray.combine_nested(datasets, concat_dim, compat='no_conflicts', data_vars='all', coords='different', fill_value=<NA>, join='outer', combine_attrs='drop')

Explicitly combine an N-dimensional grid of datasets into one by using a succession of concat and merge operations along each dimension of the grid.

Does not sort the supplied datasets under any circumstances, so the datasets must be passed in the order you wish them to be concatenated. It does align coordinates, but different variables on datasets can cause it to fail under some scenarios. In complex cases, you may need to clean up your data and use concat/merge explicitly.
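The ordering behaviour can be illustrated with a minimal sketch. The datasets `first` and `second` and the variable name `a` are made up for illustration; the point is only that the result follows the order of the input list, even when an `x` coordinate is present that could in principle be sorted.

```python
import xarray as xr

# Two hypothetical tiles along "x", each with an explicit "x" coordinate.
first = xr.Dataset({"a": ("x", [0, 1])}, coords={"x": [0, 1]})
second = xr.Dataset({"a": ("x", [10, 11])}, coords={"x": [2, 3]})

# The result follows the list order; the "x" coordinate is not sorted.
forward = xr.combine_nested([first, second], concat_dim="x")
backward = xr.combine_nested([second, first], concat_dim="x")

print(forward["x"].values.tolist())   # [0, 1, 2, 3]
print(backward["x"].values.tolist())  # [2, 3, 0, 1]
print(backward["a"].values.tolist())  # [10, 11, 0, 1]
```

If a sorted result is wanted, the caller has to order the list beforehand or sort the combined dataset afterwards.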

To concatenate along multiple dimensions, the datasets must be passed as a nested list-of-lists, with a depth equal to the length of concat_dim. combine_nested will concatenate along the top-level list first.
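As a rough sketch of the nesting rule, the example below builds a depth-2 list to go with a two-entry `concat_dim`. The helper `tile` and the variable name `v` are hypothetical; constant-valued tiles are used so that the position of each input in the result is easy to see.

```python
import numpy as np
import xarray as xr


def tile(value):
    """Hypothetical helper: a 1x2 dataset filled with a constant value."""
    return xr.Dataset({"v": (("x", "y"), np.full((1, 2), value))})


# Depth-2 nesting matches the two entries in concat_dim:
# the outer list corresponds to "x", the inner lists to "y".
grid = [[tile(0), tile(1)], [tile(2), tile(3)]]
combined = xr.combine_nested(grid, concat_dim=["x", "y"])

print(dict(combined.sizes))              # {'x': 2, 'y': 4}
print(combined["v"].values.tolist())     # [[0, 0, 1, 1], [2, 2, 3, 3]]
```

Each inner list ends up as one row along `x`, with its members laid out side by side along `y`.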

Useful for combining datasets from a set of nested directories, or for collecting the output of a simulation parallelized along multiple dimensions.

Parameters:

Returns:

combined (xarray.Dataset)

Examples

A common task is collecting data from a parallelized simulation in which each process wrote out to a separate file. A domain which was decomposed into 4 parts, 2 each along both the x and y axes, requires organising the datasets into a doubly-nested list, e.g.:

>>> import numpy as np
>>> import xarray as xr
>>> x1y1 = xr.Dataset(
...     {
...         "temperature": (("x", "y"), np.random.randn(2, 2)),
...         "precipitation": (("x", "y"), np.random.randn(2, 2)),
...     }
... )
>>> x1y1
<xarray.Dataset> Size: 64B
Dimensions:        (x: 2, y: 2)
Dimensions without coordinates: x, y
Data variables:
    temperature    (x, y) float64 32B 1.764 0.4002 0.9787 2.241
    precipitation  (x, y) float64 32B 1.868 -0.9773 0.9501 -0.1514
>>> x1y2 = xr.Dataset(
...     {
...         "temperature": (("x", "y"), np.random.randn(2, 2)),
...         "precipitation": (("x", "y"), np.random.randn(2, 2)),
...     }
... )
>>> x2y1 = xr.Dataset(
...     {
...         "temperature": (("x", "y"), np.random.randn(2, 2)),
...         "precipitation": (("x", "y"), np.random.randn(2, 2)),
...     }
... )
>>> x2y2 = xr.Dataset(
...     {
...         "temperature": (("x", "y"), np.random.randn(2, 2)),
...         "precipitation": (("x", "y"), np.random.randn(2, 2)),
...     }
... )

>>> ds_grid = [[x1y1, x1y2], [x2y1, x2y2]]
>>> combined = xr.combine_nested(ds_grid, concat_dim=["x", "y"])
>>> combined
<xarray.Dataset> Size: 256B
Dimensions:        (x: 4, y: 4)
Dimensions without coordinates: x, y
Data variables:
    temperature    (x, y) float64 128B 1.764 0.4002 -0.1032 ... 0.04576 -0.1872
    precipitation  (x, y) float64 128B 1.868 -0.9773 0.761 ... 0.1549 0.3782
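For intuition, the 2x2 combination above can be compared against explicit `xr.concat` calls. This is a sketch, not a description of how combine_nested is implemented internally: the helper `make_tile` and the seeded generator are assumptions made for reproducibility, and the manual version simply concatenates each row along `y` and then stacks the rows along `x`.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)  # seeded generator, chosen for this sketch


def make_tile():
    """Hypothetical helper: one 2x2 tile of the decomposed domain."""
    return xr.Dataset(
        {
            "temperature": (("x", "y"), rng.standard_normal((2, 2))),
            "precipitation": (("x", "y"), rng.standard_normal((2, 2))),
        }
    )


x1y1, x1y2, x2y1, x2y2 = (make_tile() for _ in range(4))

combined = xr.combine_nested(
    [[x1y1, x1y2], [x2y1, x2y2]], concat_dim=["x", "y"]
)

# Manual equivalent: join each row along "y", then stack the rows along "x".
rows = [
    xr.concat([x1y1, x1y2], dim="y"),
    xr.concat([x2y1, x2y2], dim="y"),
]
manual = xr.concat(rows, dim="x")

# Raises AssertionError if the two results differ.
xr.testing.assert_equal(combined, manual)
```

For a small, regular grid like this one the two approaches should agree; combine_nested mainly saves writing the nested concat calls by hand.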

combine_nested can also be used to explicitly merge datasets with different variables. For example, if we have 4 datasets, which are divided along two times, and contain two different variables, we can pass None to concat_dim to specify the dimension of the nested list over which we wish to use merge instead of concat:

>>> t1temp = xr.Dataset({"temperature": ("t", np.random.randn(5))})
>>> t1temp
<xarray.Dataset> Size: 40B
Dimensions:      (t: 5)
Dimensions without coordinates: t
Data variables:
    temperature  (t) float64 40B -0.8878 -1.981 -0.3479 0.1563 1.23

>>> t1precip = xr.Dataset({"precipitation": ("t", np.random.randn(5))})
>>> t1precip
<xarray.Dataset> Size: 40B
Dimensions:        (t: 5)
Dimensions without coordinates: t
Data variables:
    precipitation  (t) float64 40B 1.202 -0.3873 -0.3023 -1.049 -1.42

>>> t2temp = xr.Dataset({"temperature": ("t", np.random.randn(5))})
>>> t2precip = xr.Dataset({"precipitation": ("t", np.random.randn(5))})

>>> ds_grid = [[t1temp, t1precip], [t2temp, t2precip]]
>>> combined = xr.combine_nested(ds_grid, concat_dim=["t", None])
>>> combined
<xarray.Dataset> Size: 160B
Dimensions:        (t: 10)
Dimensions without coordinates: t
Data variables:
    temperature    (t) float64 80B -0.8878 -1.981 -0.3479 ... -0.4381 -1.253
    precipitation  (t) float64 80B 1.202 -0.3873 -0.3023 ... -0.8955 0.3869
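The position of None in concat_dim has to match the nesting level that holds the different variables. As a sketch under that reading, the grid above can also be transposed so that the outer list separates variables and the inner lists separate times; the names `by_time` and `by_variable` and the seeded generator are assumptions for this illustration.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)  # seeded generator, chosen for this sketch

t1temp = xr.Dataset({"temperature": ("t", rng.standard_normal(5))})
t1precip = xr.Dataset({"precipitation": ("t", rng.standard_normal(5))})
t2temp = xr.Dataset({"temperature": ("t", rng.standard_normal(5))})
t2precip = xr.Dataset({"precipitation": ("t", rng.standard_normal(5))})

# Outer list = time steps (concat along "t"), inner lists = variables (merge).
by_time = [[t1temp, t1precip], [t2temp, t2precip]]
combined_a = xr.combine_nested(by_time, concat_dim=["t", None])

# Transposed grid: outer list = variables (merge), inner lists = time steps.
by_variable = [[t1temp, t2temp], [t1precip, t2precip]]
combined_b = xr.combine_nested(by_variable, concat_dim=[None, "t"])

# Raises AssertionError if the two layouts give different results.
xr.testing.assert_equal(combined_a, combined_b)
print(dict(combined_a.sizes))  # {'t': 10}
```

Either layout works as long as each entry of concat_dim describes the corresponding nesting level.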