Multidimensional groupby by rabernat · Pull Request #818 · pydata/xarray (original) (raw)

@@ -0,0 +1,201 @@

.. _examples.multidim:

Working with Multidimensional Coordinates

=========================================

Author: `Ryan Abernathey http://github.org/rabernat`__

Many datasets have *physical coordinates* which differ from their

*logical coordinates*. Xarray provides several ways to plot and analyze

such datasets.

.. code:: python

%matplotlib inline

import numpy as np

import pandas as pd

import xarray as xr

import cartopy.crs as ccrs

from matplotlib import pyplot as plt

print("numpy version : ", np.__version__)

print("pandas version : ", pd.__version__)

print("xarray version : ", xr.version.version)

.. parsed-literal::

('numpy version : ', '1.11.0')

('pandas version : ', u'0.18.0')

('xarray version : ', '0.7.2-32-gf957eb8')

As an example, consider this dataset from the

`xarray-data https://github.com/pydata/xarray-data`__ repository.

.. code:: python

! curl -L -O https://github.com/pydata/xarray-data/raw/master/RASM\_example\_data.nc

.. code:: python

ds = xr.open_dataset('RASM_example_data.nc')

ds

.. parsed-literal::

<xarray.Dataset>

Dimensions: (time: 36, x: 275, y: 205)

Coordinates:

* time (time) datetime64[ns] 1980-09-16T12:00:00 1980-10-17 ...

yc (y, x) float64 16.53 16.78 17.02 17.27 17.51 17.76 18.0 18.25 ...

xc (y, x) float64 189.2 189.4 189.6 189.7 189.9 190.1 190.2 190.4 ...

* x (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...

* y (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...

Data variables:

Tair (time, y, x) float64 nan nan nan nan nan nan nan nan nan nan ...

Attributes:

title: /workspace/jhamman/processed/R1002RBRxaaa01a/lnd/temp/R1002RBRxaaa01a.vic.ha.1979-09-01.nc

institution: U.W.

source: RACM R1002RBRxaaa01a

output_frequency: daily

output_mode: averaged

convention: CF-1.4

references: Based on the initial model of Liang et al., 1994, JGR, 99, 14,415- 14,429.

comment: Output from the Variable Infiltration Capacity (VIC) model.

nco_openmp_thread_number: 1

NCO: 4.3.7

history: history deleted for brevity

In this example, the *logical coordinates* are ``x`` and ``y``, while

the *physical coordinates* are ``xc`` and ``yc``, which represent the

latitudes and longitude of the data.

.. code:: python

print(ds.xc.attrs)

print(ds.yc.attrs)

.. parsed-literal::

OrderedDict([(u'long_name', u'longitude of grid cell center'), (u'units', u'degrees_east'), (u'bounds', u'xv')])

OrderedDict([(u'long_name', u'latitude of grid cell center'), (u'units', u'degrees_north'), (u'bounds', u'yv')])

Plotting

--------

Let's examine these coordinate variables by plotting them.

.. code:: python

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(14,4))

ds.xc.plot(ax=ax1)

ds.yc.plot(ax=ax2)

.. parsed-literal::

<matplotlib.collections.QuadMesh at 0x118688fd0>

.. parsed-literal::

/Users/rpa/anaconda/lib/python2.7/site-packages/matplotlib/collections.py:590: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison

if self._edgecolors == str('face'):

.. image:: multidimensional_coords_files/xarray_multidimensional_coords_8_2.png

Note that the variables ``xc`` (longitude) and ``yc`` (latitude) are

two-dimensional scalar fields.

If we try to plot the data variable ``Tair``, by default we get the

logical coordinates.

.. code:: python

ds.Tair[0].plot()

.. parsed-literal::

<matplotlib.collections.QuadMesh at 0x11b6da890>

.. image:: multidimensional_coords_files/xarray_multidimensional_coords_10_1.png

In order to visualize the data on a conventional latitude-longitude

grid, we can take advantage of xarray's ability to apply

`cartopy http://scitools.org.uk/cartopy/index.html`__ map projections.

.. code:: python

plt.figure(figsize=(14,6))

ax = plt.axes(projection=ccrs.PlateCarree())

ax.set_global()

ds.Tair[0].plot.pcolormesh(ax=ax, transform=ccrs.PlateCarree(), x='xc', y='yc', add_colorbar=False)

ax.coastlines()

ax.set_ylim([0,90]);

.. image:: multidimensional_coords_files/xarray_multidimensional_coords_12_0.png

Multidimensional Groupby

------------------------

The above example allowed us to visualize the data on a regular

latitude-longitude grid. But what if we want to do a calculation that

involves grouping over one of these physical coordinates (rather than

the logical coordinates), for example, calculating the mean temperature

at each latitude. This can be achieved using xarray's ``groupby``

function, which accepts multidimensional variables. By default,

``groupby`` will use every unique value in the variable, which is

probably not what we want. Instead, we can use the ``groupby_bins``

function to specify the output coordinates of the group.

.. code:: python

# define two-degree wide latitude bins

lat_bins = np.arange(0,91,2)

# define a label for each bin corresponding to the central latitude

lat_center = np.arange(1,90,2)

# group according to those bins and take the mean

Tair_lat_mean = ds.Tair.groupby_bins('xc', lat_bins, labels=lat_center).mean()

# plot the result

Tair_lat_mean.plot()

.. parsed-literal::

[<matplotlib.lines.Line2D at 0x11cb92e90>]

.. image:: multidimensional_coords_files/xarray_multidimensional_coords_14_1.png

Note that the resulting coordinate for the ``groupby_bins`` operation

got the ``_bins`` suffix appended: ``xc_bins``. This help us distinguish

it from the original multidimensional variable ``xc``.