Integrating with duck arrays

Warning

This is an experimental feature. Please report any bugs or other difficulties on xarray’s issue tracker.

Xarray can wrap custom numpy-like arrays ("duck arrays"); see the user guide documentation. This page is intended for developers who are interested in wrapping a new custom array type with xarray.

Duck array requirements

Xarray does not explicitly check that required methods are defined by the underlying duck array object before attempting to wrap the given array. However, a wrapped array type should at a minimum define these attributes:

- shape property,
- dtype property,
- ndim property,
- __array__ method,
- __array_ufunc__ method,
- __array_function__ method.

These need to be defined consistently with numpy.ndarray; for example, the array shape property needs to obey numpy’s broadcasting rules (see also the Python Array API standard’s explanation of these same rules).
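As a rough illustration, the sketch below shows a class that exposes shape, dtype and ndim together with the numpy protocols __array__, __array_ufunc__ and __array_function__. The class name MinimalDuckArray and its internals are hypothetical and not part of xarray. Whether this exact set of attributes is enough for every xarray feature is an assumption; the sketch simply delegates everything to an internal numpy.ndarray so that the behaviour stays consistent with numpy.

```python
import numpy as np


class MinimalDuckArray:
    """Hypothetical wrapper, for illustration only; not part of xarray."""

    def __init__(self, data):
        self._data = np.asarray(data)

    # Properties that mirror numpy.ndarray.
    @property
    def shape(self):
        return self._data.shape

    @property
    def dtype(self):
        return self._data.dtype

    @property
    def ndim(self):
        return self._data.ndim

    def __array__(self, dtype=None, copy=None):
        # Conversion to a plain numpy array, e.g. via np.asarray(obj).
        return np.asarray(self._data, dtype=dtype)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Keep the sketch small: no support for the `out` keyword.
        if "out" in kwargs:
            return NotImplemented
        unwrapped = [x._data if isinstance(x, MinimalDuckArray) else x for x in inputs]
        result = getattr(ufunc, method)(*unwrapped, **kwargs)
        if isinstance(result, tuple):
            return tuple(MinimalDuckArray(r) for r in result)
        return MinimalDuckArray(result)

    def __array_function__(self, func, types, args, kwargs):
        # Unwrap our own type (also inside lists and tuples), call numpy,
        # then re-wrap array results.
        def unwrap(obj):
            if isinstance(obj, MinimalDuckArray):
                return obj._data
            if isinstance(obj, (list, tuple)):
                return type(obj)(unwrap(item) for item in obj)
            return obj

        result = func(*unwrap(args), **{k: unwrap(v) for k, v in kwargs.items()})
        return MinimalDuckArray(result) if isinstance(result, np.ndarray) else result


a = MinimalDuckArray([[1.0, 2.0], [3.0, 4.0]])
b = np.add(a, 1)             # dispatched through __array_ufunc__
c = np.concatenate([a, a])   # dispatched through __array_function__
print(type(b).__name__, b.shape, c.shape)
```

Delegating to a real numpy array is only a convenience here; a genuine duck array would store its data in its own format and implement the same attributes on top of it.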

Python Array API standard support

As an integration library, xarray benefits greatly from the standardization of duck-array libraries’ APIs, and so is a big supporter of the Python Array API Standard.

We aim to support any array libraries that follow the Array API standard out of the box. However, xarray does occasionally call some numpy functions which are not (yet) part of the standard (e.g. xarray.DataArray.pad() calls numpy.pad()). See xarray issue #7848 for a list of such functions. We can still support dispatching on these functions through the array protocols above. However, if you exclusively implement the methods in the Python Array API standard, then some features in xarray will not work.
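The sketch below illustrates that gap. It is not how xarray performs its dispatch. The helpers get_namespace and missing_functions, and the class StandardOnlyArray, are hypothetical names introduced only for this example. The sketch looks up an array's namespace through the standard's __array_namespace__ method and reports which of a few function names are absent from it.

```python
import types

import numpy as np


def get_namespace(array):
    """Hypothetical helper: return the array's Array API namespace.

    Falls back to the numpy module for objects without `__array_namespace__`.
    """
    if hasattr(array, "__array_namespace__"):
        return array.__array_namespace__()
    return np


def missing_functions(array, required=("sum", "pad")):
    """List the names in `required` that the array's namespace lacks."""
    xp = get_namespace(array)
    return [name for name in required if not hasattr(xp, name)]


class StandardOnlyArray:
    """Hypothetical array whose namespace exposes `sum` but no `pad`."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __array_namespace__(self, api_version=None):
        # Stand-in namespace; a real library would return its own module.
        return types.SimpleNamespace(sum=lambda x: x._data.sum())


numpy_missing = missing_functions(np.ones(3))
standard_only_missing = missing_functions(StandardOnlyArray([1, 2, 3]))
print(numpy_missing, standard_only_missing)
```

For the numpy array nothing is reported as missing, while the stand-in namespace lacks pad. This is the situation in which a feature relying on numpy.pad() would need the numpy dispatch protocols instead.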

Custom inline reprs

In certain situations (e.g. when printing the collapsed preview of variables of a Dataset), xarray will display the repr of a duck array in a single line, truncating it to a certain number of characters. If that would drop too much information, the duck array may define a _repr_inline_ method that takes max_width (number of characters) as an argument:

class MyDuckArray:
    ...

    def _repr_inline_(self, max_width):
        """format to a single line with at most max_width characters"""
        ...

    ...

To avoid duplicated information, this method must omit information about the shape and dtype. For example, the string representation of a dask array or a sparse matrix would be:

In [1]: import dask.array as da

In [2]: import numpy as np

In [3]: import sparse

In [4]: import xarray as xr

In [5]: a = da.linspace(0, 1, 20, chunks=2)

In [6]: a
Out[6]: dask.array<linspace, shape=(20,), dtype=float64, chunksize=(2,), chunktype=numpy.ndarray>

In [7]: b = np.eye(10)

In [8]: b[[5, 7, 3, 0], [6, 8, 2, 9]] = 2

In [9]: b = sparse.COO.from_numpy(b)

In [10]: b
Out[10]: <COO: shape=(10, 10), dtype=float64, nnz=14, fill_value=0.0>

In [11]: xr.Dataset(dict(a=("x", a), b=(("y", "z"), b)))
Out[11]:
<xarray.Dataset> Size: 496B
Dimensions:  (x: 20, y: 10, z: 10)
Dimensions without coordinates: x, y, z
Data variables:
    a        (x) float64 160B dask.array<chunksize=(2,), meta=np.ndarray>
    b        (y, z) float64 336B <COO: nnz=14, fill_value=0.0>
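
Following the same pattern, a custom duck array could implement _repr_inline_ roughly as sketched below. The class TaggedArray and its unit attribute are hypothetical and exist only for this example. The sketch formats the values on a single line, leaves out shape and dtype, and truncates the result so that it never exceeds max_width characters.

```python
class TaggedArray:
    """Hypothetical duck array, used only to illustrate `_repr_inline_`."""

    def __init__(self, values, unit):
        self.values = list(values)
        self.unit = unit

    def _repr_inline_(self, max_width):
        """Format to a single line with at most max_width characters."""
        # Shape and dtype are deliberately left out of the text.
        text = f"[{self.unit}] " + " ".join(f"{v:g}" for v in self.values)
        if len(text) <= max_width:
            return text
        # Too long: cut the text and mark the cut with an ellipsis,
        # keeping the total length within max_width.
        ellipsis = "..."
        if max_width <= len(ellipsis):
            return ellipsis[: max(max_width, 0)]
        return text[: max_width - len(ellipsis)] + ellipsis


t = TaggedArray([1.0, 2.5, 3.0], "m")
full = t._repr_inline_(40)
short = t._repr_inline_(8)
print(full)
print(short)
```

The attributes of the real array (here, the unit) are usually the most useful thing to keep when space is tight, since the surrounding Dataset repr already shows dimensions and dtype.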