Remapping a dataset to HEALPix
In this tutorial we will remap a dataset from the native ICON grid to HEALPix. The dataset used as an example is from the EERIE project and is available online. It consists of five 2d variables and covers 23 years with a 6 hour time step.
<xarray.Dataset> Size: 4TB
Dimensions: (ncells: 5242880, height: 1, height_2: 1, time: 33604)
Coordinates:
cell_sea_land_mask (ncells) int32 21MB dask.array<chunksize=(5242880,), meta=np.ndarray>
* height (height) float64 8B 10.0
* height_2 (height_2) float64 8B 2.0
lat (ncells) float64 42MB dask.array<chunksize=(5242880,), meta=np.ndarray>
lon (ncells) float64 42MB dask.array<chunksize=(5242880,), meta=np.ndarray>
* time (time) datetime64[ns] 269kB 1990-12-31T23:59:00 ... 2...
Dimensions without coordinates: ncells
Data variables:
hus2m (time, height_2, ncells) float32 705GB dask.array<chunksize=(1, 1, 5242880), meta=np.ndarray>
psl (time, ncells) float32 705GB dask.array<chunksize=(1, 5242880), meta=np.ndarray>
ts (time, ncells) float32 705GB dask.array<chunksize=(1, 5242880), meta=np.ndarray>
uas (time, height, ncells) float32 705GB dask.array<chunksize=(1, 1, 5242880), meta=np.ndarray>
vas (time, height, ncells) float32 705GB dask.array<chunksize=(1, 1, 5242880), meta=np.ndarray> Dimensions: ncells : 5242880height : 1height_2 : 1time : 33604
Coordinates: (6)
cell_sea_land_mask
(ncells)
int32
dask.array<chunksize=(5242880,), meta=np.ndarray>
grid_type : unstructured long_name : sea (-2 inner, -1 boundary) land (2 inner, 1 boundary) mask for the cell number_of_grid_in_reference : 1 units : 2,1,-1,-
Array
Chunk
Bytes
20.00 MiB
20.00 MiB
Shape
(5242880,)
(5242880,)
Dask graph
1 chunks in 2 graph layers
Data type
int32 numpy.ndarray
5242880
1
height
(height)
float64
10.0
axis : Z long_name : height positive : up standard_name : height units : m height_2
(height_2)
float64
2.0
axis : Z long_name : height positive : up standard_name : height units : m lat
(ncells)
float64
dask.array<chunksize=(5242880,), meta=np.ndarray>
bounds : clat_vertices long_name : center latitude standard_name : grid_latitude units : radian
Array
Chunk
Bytes
40.00 MiB
40.00 MiB
Shape
(5242880,)
(5242880,)
Dask graph
1 chunks in 2 graph layers
Data type
float64 numpy.ndarray
5242880
1
lon
(ncells)
float64
dask.array<chunksize=(5242880,), meta=np.ndarray>
bounds : clon_vertices long_name : center longitude standard_name : grid_longitude units : radian
Array
Chunk
Bytes
40.00 MiB
40.00 MiB
Shape
(5242880,)
(5242880,)
Dask graph
1 chunks in 2 graph layers
Data type
float64 numpy.ndarray
5242880
1
time
(time)
datetime64[ns]
1990-12-31T23:59:00 ... 2013-12-...
units : minutes since 1950-01-01 00:00:00 calendar : gregorian standard_name : time array(['1990-12-31T23:59:00.000000000', '1991-01-01T05:59:00.000000000',
'1991-01-01T11:59:00.000000000', ..., '2013-12-31T05:59:00.000000000',
'2013-12-31T11:59:00.000000000', '2013-12-31T17:59:00.000000000'],
dtype='datetime64[ns]') Data variables: (5)
hus2m
(time, height_2, ncells)
float32
dask.array<chunksize=(1, 1, 5242880), meta=np.ndarray>
CDI_grid_type : unstructured long_name : specific humidity in 2m number_of_grid_in_reference : 1 param : 6.0.0 standard_name : qv2m units : kg kg-1
Array
Chunk
Bytes
656.33 GiB
20.00 MiB
Shape
(33604, 1, 5242880)
(1, 1, 5242880)
Dask graph
33604 chunks in 2 graph layers
Data type
float32 numpy.ndarray
5242880
1
33604
psl
(time, ncells)
float32
dask.array<chunksize=(1, 5242880), meta=np.ndarray>
CDI_grid_type : unstructured long_name : mean sea level pressure number_of_grid_in_reference : 1 param : 1.3.0 standard_name : mean sea level pressure units : Pa
Array
Chunk
Bytes
656.33 GiB
20.00 MiB
Shape
(33604, 5242880)
(1, 5242880)
Dask graph
33604 chunks in 2 graph layers
Data type
float32 numpy.ndarray
5242880
33604
ts
(time, ncells)
float32
dask.array<chunksize=(1, 5242880), meta=np.ndarray>
CDI_grid_type : unstructured long_name : surface temperature number_of_grid_in_reference : 1 param : 0.0.0 standard_name : surface_temperature units : K
Array
Chunk
Bytes
656.33 GiB
20.00 MiB
Shape
(33604, 5242880)
(1, 5242880)
Dask graph
33604 chunks in 2 graph layers
Data type
float32 numpy.ndarray
5242880
33604
uas
(time, height, ncells)
float32
dask.array<chunksize=(1, 1, 5242880), meta=np.ndarray>
CDI_grid_type : unstructured long_name : zonal wind in 10m number_of_grid_in_reference : 1 param : 2.2.0 standard_name : uas units : m s-1
Array
Chunk
Bytes
656.33 GiB
20.00 MiB
Shape
(33604, 1, 5242880)
(1, 1, 5242880)
Dask graph
33604 chunks in 2 graph layers
Data type
float32 numpy.ndarray
5242880
1
33604
vas
(time, height, ncells)
float32
dask.array<chunksize=(1, 1, 5242880), meta=np.ndarray>
CDI_grid_type : unstructured long_name : meridional wind in 10m number_of_grid_in_reference : 1 param : 3.2.0 standard_name : vas units : m s-1
Array
Chunk
Bytes
656.33 GiB
20.00 MiB
Shape
(33604, 1, 5242880)
(1, 1, 5242880)
Dask graph
33604 chunks in 2 graph layers
Data type
float32 numpy.ndarray
5242880
1
33604
Indexes: (3)
PandasIndex
PandasIndex(Index([10.0], dtype='float64', name='height')) PandasIndex
PandasIndex(Index([2.0], dtype='float64', name='height_2')) PandasIndex
PandasIndex(DatetimeIndex(['1990-12-31 23:59:00', '1991-01-01 05:59:00',
'1991-01-01 11:59:00', '1991-01-01 17:59:00',
'1991-01-01 23:59:00', '1991-01-02 05:59:00',
'1991-01-02 11:59:00', '1991-01-02 17:59:00',
'1991-01-02 23:59:00', '1991-01-03 05:59:00',
...
'2013-12-29 11:59:00', '2013-12-29 17:59:00',
'2013-12-29 23:59:00', '2013-12-30 05:59:00',
'2013-12-30 11:59:00', '2013-12-30 17:59:00',
'2013-12-30 23:59:00', '2013-12-31 05:59:00',
'2013-12-31 11:59:00', '2013-12-31 17:59:00'],
dtype='datetime64[ns]', name='time', length=33604, freq=None)) Attributes: (0)
The first step is to create a HEALPix grid that is close to the resolution of our source grid. Here we will choose an order of 9 (also known as “zoom”) and “nest” ordering.
Next, we can use our defined source and target grids to compute interpolation weights. The easygems
package provides a function to compute these weights using the Delaunay triangulation method.
These weights can be applied to single fields directly:
<matplotlib.image.AxesImage at 0x1604a7d40>
<Figure size 640x480 with 0 Axes>
Alternatively, we can use xarray’s apply_ufunc() function to lift the function onto a full dataset. This requires a coupled of additional information from the user, e.g. the input dimension along which the function should be applied, and the resulting output dimensions name and size.
<xarray.Dataset> Size: 2TB
Dimensions: (time: 33604, height_2: 1, cell: 3145728, height: 1)
Coordinates:
* height (height) float64 8B 10.0
* height_2 (height_2) float64 8B 2.0
* time (time) datetime64[ns] 269kB 1990-12-31T23:59:00 ... 2013-12-31T...
Dimensions without coordinates: cell
Data variables:
hus2m (time, height_2, cell) float32 423GB dask.array<chunksize=(1, 1, 3145728), meta=np.ndarray>
psl (time, cell) float32 423GB dask.array<chunksize=(1, 3145728), meta=np.ndarray>
ts (time, cell) float32 423GB dask.array<chunksize=(1, 3145728), meta=np.ndarray>
uas (time, height, cell) float32 423GB dask.array<chunksize=(1, 1, 3145728), meta=np.ndarray>
vas (time, height, cell) float32 423GB dask.array<chunksize=(1, 1, 3145728), meta=np.ndarray> Dimensions: time : 33604height_2 : 1cell : 3145728height : 1
Coordinates: (3)
Data variables: (5)
hus2m
(time, height_2, cell)
float32
dask.array<chunksize=(1, 1, 3145728), meta=np.ndarray>
CDI_grid_type : unstructured long_name : specific humidity in 2m number_of_grid_in_reference : 1 param : 6.0.0 standard_name : qv2m units : kg kg-1
Array
Chunk
Bytes
393.80 GiB
12.00 MiB
Shape
(33604, 1, 3145728)
(1, 1, 3145728)
Dask graph
33604 chunks in 5 graph layers
Data type
float32 numpy.ndarray
3145728
1
33604
psl
(time, cell)
float32
dask.array<chunksize=(1, 3145728), meta=np.ndarray>
CDI_grid_type : unstructured long_name : mean sea level pressure number_of_grid_in_reference : 1 param : 1.3.0 standard_name : mean sea level pressure units : Pa
Array
Chunk
Bytes
393.80 GiB
12.00 MiB
Shape
(33604, 3145728)
(1, 3145728)
Dask graph
33604 chunks in 5 graph layers
Data type
float32 numpy.ndarray
3145728
33604
ts
(time, cell)
float32
dask.array<chunksize=(1, 3145728), meta=np.ndarray>
CDI_grid_type : unstructured long_name : surface temperature number_of_grid_in_reference : 1 param : 0.0.0 standard_name : surface_temperature units : K
Array
Chunk
Bytes
393.80 GiB
12.00 MiB
Shape
(33604, 3145728)
(1, 3145728)
Dask graph
33604 chunks in 5 graph layers
Data type
float32 numpy.ndarray
3145728
33604
uas
(time, height, cell)
float32
dask.array<chunksize=(1, 1, 3145728), meta=np.ndarray>
CDI_grid_type : unstructured long_name : zonal wind in 10m number_of_grid_in_reference : 1 param : 2.2.0 standard_name : uas units : m s-1
Array
Chunk
Bytes
393.80 GiB
12.00 MiB
Shape
(33604, 1, 3145728)
(1, 1, 3145728)
Dask graph
33604 chunks in 5 graph layers
Data type
float32 numpy.ndarray
3145728
1
33604
vas
(time, height, cell)
float32
dask.array<chunksize=(1, 1, 3145728), meta=np.ndarray>
CDI_grid_type : unstructured long_name : meridional wind in 10m number_of_grid_in_reference : 1 param : 3.2.0 standard_name : vas units : m s-1
Array
Chunk
Bytes
393.80 GiB
12.00 MiB
Shape
(33604, 1, 3145728)
(1, 1, 3145728)
Dask graph
33604 chunks in 5 graph layers
Data type
float32 numpy.ndarray
3145728
1
33604
Indexes: (3)
PandasIndex
PandasIndex(Index([10.0], dtype='float64', name='height')) PandasIndex
PandasIndex(Index([2.0], dtype='float64', name='height_2')) PandasIndex
PandasIndex(DatetimeIndex(['1990-12-31 23:59:00', '1991-01-01 05:59:00',
'1991-01-01 11:59:00', '1991-01-01 17:59:00',
'1991-01-01 23:59:00', '1991-01-02 05:59:00',
'1991-01-02 11:59:00', '1991-01-02 17:59:00',
'1991-01-02 23:59:00', '1991-01-03 05:59:00',
...
'2013-12-29 11:59:00', '2013-12-29 17:59:00',
'2013-12-29 23:59:00', '2013-12-30 05:59:00',
'2013-12-30 11:59:00', '2013-12-30 17:59:00',
'2013-12-30 23:59:00', '2013-12-31 05:59:00',
'2013-12-31 11:59:00', '2013-12-31 17:59:00'],
dtype='datetime64[ns]', name='time', length=33604, freq=None)) Attributes: (0)
The resulting dataset can then be used as usual and the remapping is performed on demand.
<matplotlib.image.AxesImage at 0x3d3e13110>
<Figure size 640x480 with 0 Axes>
Storing the coordinate reference system
It is good practice to store map projection information in the Coordinate Reference Systems (CRS).