Integrating with duck arrays
Warning
This is an experimental feature. Please report any bugs or other difficulties on xarray’s issue tracker.
Xarray can wrap custom numpy-like arrays ("duck arrays"); see the user guide documentation. This page is intended for developers who are interested in wrapping a new custom array type with xarray.
Duck array requirements#
Xarray does not explicitly check that required methods are defined by the underlying duck array object before attempting to wrap the given array. However, a wrapped array type should at a minimum define these attributes:
- shape property,
- dtype property,
- ndim property,
- __array__ method,
- __array_ufunc__ method,
- __array_function__ method.
These need to be defined consistently with numpy.ndarray; for example, the array shape property needs to obey numpy's broadcasting rules (see also the Python Array API standard's explanation of these same rules).
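The attributes listed above can be sketched with a thin wrapper around a numpy.ndarray. This is only a minimal illustration, not xarray's or any library's reference implementation: the class name MyDuckArray, the helper _unwrap, and the decision to re-wrap every ndarray result are assumptions made for the example, and a real array type would need more careful handling of out= arguments, multi-output ufuncs, and the copy semantics of __array__.

```python
import numpy as np


def _unwrap(obj):
    """Recursively replace MyDuckArray instances with their numpy data."""
    if isinstance(obj, MyDuckArray):
        return obj._data
    if isinstance(obj, (list, tuple)):
        return type(obj)(_unwrap(item) for item in obj)
    return obj


class MyDuckArray:
    """Hypothetical minimal duck array backed by a numpy.ndarray."""

    def __init__(self, data):
        self._data = np.asarray(data)

    @property
    def shape(self):
        return self._data.shape

    @property
    def dtype(self):
        return self._data.dtype

    @property
    def ndim(self):
        return self._data.ndim

    def __array__(self, dtype=None, copy=None):
        # Hand back a plain numpy.ndarray (simplified copy handling).
        arr = self._data if dtype is None else self._data.astype(dtype)
        return arr.copy() if copy else arr

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # This sketch only supports plain calls of single-output ufuncs.
        if method != "__call__" or ufunc.nout != 1 or "out" in kwargs:
            return NotImplemented
        result = ufunc(*_unwrap(inputs), **kwargs)
        return MyDuckArray(result)

    def __array_function__(self, func, types, args, kwargs):
        # Defer if an unknown array type takes part in the call.
        if not all(issubclass(t, (MyDuckArray, np.ndarray)) for t in types):
            return NotImplemented
        unwrapped_kwargs = {key: _unwrap(value) for key, value in kwargs.items()}
        result = func(*_unwrap(args), **unwrapped_kwargs)
        return MyDuckArray(result) if isinstance(result, np.ndarray) else result


a = MyDuckArray(np.arange(6).reshape(2, 3))
print(np.sin(a).shape)               # ufunc goes through __array_ufunc__
print(np.concatenate([a, a]).shape)  # function goes through __array_function__
```

Because shape, dtype, and ndim are forwarded from the wrapped numpy.ndarray, they follow numpy's conventions automatically; an array type with its own storage would have to guarantee that itself.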
Python Array API standard support#
As an integration library xarray benefits greatly from the standardization of duck-array libraries’ APIs, and so is a big supporter of the Python Array API Standard.
We aim to support any array libraries that follow the Array API standard out-of-the-box. However, xarray does occasionally call some numpy functions which are not (yet) part of the standard (e.g. xarray.DataArray.pad() calls numpy.pad()). See xarray issue #7848 for a list of such functions. We can still support dispatching on these functions through the array protocols above; it just means that if you exclusively implement the methods in the Python Array API standard, then some features in xarray will not work.
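To make the distinction between the two routes concrete, the following sketch inspects which dispatch mechanisms an object advertises. The helper dispatch_routes and the class StandardOnly are hypothetical names invented for this example; this is not how xarray itself decides what to wrap.

```python
import numpy as np


def dispatch_routes(arr):
    """Report which dispatch mechanisms an array-like object advertises.

    Hypothetical helper for illustration; this is not xarray's own check.
    """
    return {
        "array_api": hasattr(arr, "__array_namespace__"),
        "numpy_protocols": hasattr(arr, "__array_function__")
        and hasattr(arr, "__array_ufunc__"),
    }


class StandardOnly:
    """Stand-in for an array type that implements only the Array API standard."""

    def __array_namespace__(self, api_version=None):
        raise NotImplementedError("placeholder for a real Array API namespace")


# Without the numpy protocols, numpy-only functions such as numpy.pad
# have nothing to dispatch on for this type.
print(dispatch_routes(StandardOnly()))
# {'array_api': True, 'numpy_protocols': False}
```

An array type that wants numpy-only functions to keep working would, under this reading, implement the numpy protocols in addition to the standard.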
Custom inline reprs#
In certain situations (e.g. when printing the collapsed preview of variables of a Dataset), xarray will display the repr of a duck array in a single line, truncating it to a certain number of characters. If that would drop too much information, the duck array may define a _repr_inline_ method that takes max_width (number of characters) as an argument:
class MyDuckArray:
    ...

    def _repr_inline_(self, max_width):
        """format to a single line with at most max_width characters"""
        ...

    ...
To avoid duplicated information, this method must omit information about the shape and dtype. For example, the string representation of a dask array or a sparse matrix would be:
In [1]: import numpy as np

In [2]: import dask.array as da

In [3]: import xarray as xr

In [4]: import sparse

In [5]: a = da.linspace(0, 1, 20, chunks=2)

In [6]: a
Out[6]: dask.array<linspace, shape=(20,), dtype=float64, chunksize=(2,), chunktype=numpy.ndarray>

In [7]: b = np.eye(10)

In [8]: b[[5, 7, 3, 0], [6, 8, 2, 9]] = 2

In [9]: b = sparse.COO.from_numpy(b)

In [10]: b
Out[10]: <COO: shape=(10, 10), dtype=float64, nnz=14, fill_value=0.0>

In [11]: xr.Dataset(dict(a=("x", a), b=(("y", "z"), b)))
Out[11]:
<xarray.Dataset> Size: 496B
Dimensions:  (x: 20, y: 10, z: 10)
Dimensions without coordinates: x, y, z
Data variables:
    a        (x) float64 160B dask.array<chunksize=(2,), meta=np.ndarray>
    b        (y, z) float64 336B <COO: nnz=14, fill_value=0.0>
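A concrete _repr_inline_ that follows the rules above might look like the sketch below. The class TaggedArray and its label attribute are hypothetical, and the truncation strategy (cutting the text and appending "...") is just one possible choice; the only points taken from the description above are that the method receives max_width and leaves out the shape and dtype.

```python
import numpy as np


class TaggedArray:
    """Hypothetical duck array carrying a free-form label (illustration only)."""

    def __init__(self, data, label="demo"):
        self._data = np.asarray(data)
        self.label = label

    @property
    def shape(self):
        return self._data.shape

    @property
    def dtype(self):
        return self._data.dtype

    def __repr__(self):
        # The full repr includes shape and dtype.
        return (
            f"<TaggedArray: shape={self.shape}, dtype={self.dtype}, "
            f"label={self.label!r}>"
        )

    def _repr_inline_(self, max_width):
        """Format to a single line with at most max_width characters."""
        # Shape and dtype are omitted; the surrounding display shows them already.
        text = f"<TaggedArray: label={self.label!r}>"
        if len(text) <= max_width:
            return text
        if max_width <= 3:
            return text[: max(max_width, 0)]
        return text[: max_width - 3] + "..."


arr = TaggedArray(np.zeros((2, 3)))
print(arr._repr_inline_(80))  # <TaggedArray: label='demo'>
print(arr._repr_inline_(10))  # <Tagged...
```

Keeping the inline form separate from __repr__ lets the full repr stay informative when the array is printed on its own.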