xgriddedaxis: Cell Boundary-aware Operations with xarray

pypi conda forge Build Status codecov docs GitHub Workflow Status

xgriddedaxis is a Python package for managing/working with one-dimensional axes with their respective cell boundaries information. xgriddedaxis consumes and produces xarray data structures, which are coordinate and metadata-rich representations of multidimensional array data.

xgriddedaxis was motivated by the fact that xarray is not aware of cell boundary variables when performing operations such as resampling. The main objective of xgriddedaxis is to provide a set of utilities that enables fluid translation between data at different intervals while being aware of the cell boundary variables.

The fundamental concept in xgriddedaxis is a Remapper object. Remapper’s role includes:

  • Creating a source axis, i.e. the axis that your original data is on,
  • Creating a destination axis, i.e. the axis that you want to convert your data to,
  • Creating a Remapper object by passing the source and destination axis you created previously,
  • Finally, converting your data from the source axis to the destination axis, using the Remapper object you created in previous step.

For more information, read the full xgriddedaxis documentation.

Installation

xgriddedaxis can be installed from PyPI with pip:

python -m pip install xgriddedaxis

It is also available from conda-forge for conda installations:

conda install -c conda-forge xgriddedaxis

Feedback

If you encounter any errors or problems with xgriddedaxis, please open an issue at the GitHub main repository.

Installation

xgriddedaxis can be installed from PyPI with pip:

python -m pip install xgriddedaxis

It is also available from conda-forge for conda installations:

conda install -c conda-forge xgriddedaxis

Tutorial

[1]:
from xgriddedaxis import Remapper
from xgriddedaxis.testing import create_dataset
import xarray as xr
xr.set_options(display_style="html")
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-d8ee320b5cf4> in <module>
      1 from xgriddedaxis import Remapper
----> 2 from xgriddedaxis.testing import create_dataset
      3 import xarray as xr
      4 xr.set_options(display_style="html")

ModuleNotFoundError: No module named 'xgriddedaxis.testing'

Input Data

For demonstration purposes, we are going to use the create_dataset() function for generating test data.

[2]:
ds = create_dataset(start='2000-01-01', end='2002-01-01', freq='D', nlats=90, nlons=180)
ds
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-2-07ad2c7318d6> in <module>
----> 1 ds = create_dataset(start='2000-01-01', end='2002-01-01', freq='D', nlats=90, nlons=180)
      2 ds

NameError: name 'create_dataset' is not defined

Our input data set consists of two variables tmin, and tmax plus the time_bounds variable. The data was generated at a daily frequency for two years.

[3]:
ds.tmin.isel(time=0).plot(robust=True)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-3-21f5d7dc9c47> in <module>
----> 1 ds.tmin.isel(time=0).plot(robust=True)

NameError: name 'ds' is not defined
[4]:
m = ds.mean(dim=['lat', 'lon'])
m.tmin.plot()
m.tmax.plot()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-4-ef7652c33a7a> in <module>
----> 1 m = ds.mean(dim=['lat', 'lon'])
      2 m.tmin.plot()
      3 m.tmax.plot()

NameError: name 'ds' is not defined

Remapper Object

Say we want to downsample the input data from daily to monthly frequency. To achieve this, we create a remapper object, and pass in:

  • An xarray Dataset containing the time, time boundary information of the incoming time axis.
  • An outgoing frequency. For e.g ‘M’, ‘2D’, ‘H’, or ‘3T’ For full specification of available frequencies, please see here
[5]:
remapper = Remapper(ds, freq='M')
remapper
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-5-dd8fccf6b83e> in <module>
----> 1 remapper = Remapper(ds, freq='M')
      2 remapper

NameError: name 'ds' is not defined

During the Remapper object creation, xgriddedaxis uses the incoming time axis information in conjunction with the specified frequency to construct an outgoing time axis information. This information is stored as an xarray Dataset in the .info attribute of the remapper object:

[6]:
remapper.info
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-6-9abd55daf06e> in <module>
----> 1 remapper.info

NameError: name 'remapper' is not defined

The remapper is telling us that it can remap data from a daily time frequency with 731 incoming timesteps (731 days) to monthly time frequency with 24 outgoing timesteps (24 months).

The remapping weights are stored as a sparse matrix (following the Coordinate List (COO) layout) in the weights variable:

[7]:
remapper.info.weights.data
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-7-b2872189b48a> in <module>
----> 1 remapper.info.weights.data

NameError: name 'remapper' is not defined
[8]:
remapper.info.weights.data.todense()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-8-609d9ea0fe7a> in <module>
----> 1 remapper.info.weights.data.todense()

NameError: name 'remapper' is not defined

The outgoing time bounds are stored in the outgoing_time_bounds variable:

[9]:
remapper.info.outgoing_time_bounds
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-9-a25bd9ad4eb6> in <module>
----> 1 remapper.info.outgoing_time_bounds

NameError: name 'remapper' is not defined

More information about the incoming and outgoing time axes is stored in the attrs section:

[10]:
remapper.info.attrs
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-10-e5503ddef281> in <module>
----> 1 remapper.info.attrs

NameError: name 'remapper' is not defined

Performing remapping (resampling)

Now that we have an instance of the Remapper object, we can tell xgriddedaxis to convert data from the incoming time axis to the outgoing (destination) axis.

[11]:
tmin_out = remapper.average(ds.tmin)
tmin_out
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-11-d9de4c008527> in <module>
----> 1 tmin_out = remapper.average(ds.tmin)
      2 tmin_out

NameError: name 'remapper' is not defined

Check results

[12]:
tmin_out.isel(time=0).plot(robust=True)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-12-6ba65521348e> in <module>
----> 1 tmin_out.isel(time=0).plot(robust=True)

NameError: name 'tmin_out' is not defined

Check broadcasting over extra dimensions

The remapping should affect the time dimension only. We can check that xgriddedaxis tracks coordinate values over extra dimensions

[13]:
ds.lat
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-13-21970646b2de> in <module>
----> 1 ds.lat

NameError: name 'ds' is not defined
[14]:
# Passes if the output is exactly the same as the input
xr.testing.assert_identical(ds.lat, tmin_out.lat)
xr.testing.assert_identical(ds.lon, tmin_out.lon)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-14-22e9fd95de44> in <module>
      1 # Passes if the output is exactly the same as the input
----> 2 xr.testing.assert_identical(ds.lat, tmin_out.lat)
      3 xr.testing.assert_identical(ds.lon, tmin_out.lon)

NameError: name 'xr' is not defined

We can plot the time series at a specific location, to make sure the broadcasting is correct:

[15]:
ds.tmin.sel(lat=-90, lon=-180).plot()
tmin_out.sel(lat=-90, lon=-180).plot(color='red')
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-15-ebffd2cf2e5d> in <module>
----> 1 ds.tmin.sel(lat=-90, lon=-180).plot()
      2 tmin_out.sel(lat=-90, lon=-180).plot(color='red')

NameError: name 'ds' is not defined
[16]:
%load_ext watermark
%watermark -v -m -g -p xarray,xgriddedaxis,cftime,pandas
CPython 3.7.3
IPython 7.13.0

xarray 0.15.1
xgriddedaxis 0.0.post43
cftime 1.1.2
pandas 1.0.3

compiler   : GCC 7.4.0
system     : Linux
release    : 5.0.0-1032-azure
machine    : x86_64
processor  : x86_64
CPU cores  : 1
interpreter: 64bit
Git hash   : b4e6bd9215269111a1dd602831c9d6ecec6f768d

API Reference

This page provides an auto-generated summary of xgriddedaxis’s API. For more details and examples, refer to the relevant chapters in the main part of the documentation.

Axis

xgriddedaxis.Axis

Remapper

xgriddedaxis.Remapper()
class xgriddedaxis.Remapper[source]

Indices and tables