Remapping a dataset#
In this tutorial we will remap a dataset from the native ICON grid to HEALPix. The dataset used as an example is from the EERIE project and is available online. It consists of five 2D variables and covers the years 1991 to 2067, with a time step of six hours.
Note
All of the methods used in this example are generic and can be applied to any combination of source and target grids.
import healpix as hp
import numpy as np
import xarray as xr
import easygems.healpix as egh
import easygems.remap as egr
ds = xr.open_dataset(
"https://eerie.cloud.dkrz.de/datasets/icon-esm-er.eerie-control-1950.v20240618.atmos.native.2d_6h_inst/kerchunk",
engine="zarr",
chunks={},
).squeeze()
ds
<xarray.Dataset> Size: 12TB
Dimensions: (time: 112496, ncells: 5242880)
Coordinates:
cell_sea_land_mask (ncells) int32 21MB dask.array<chunksize=(5242880,), meta=np.ndarray>
height float64 8B 10.0
height_2 float64 8B 2.0
lat (ncells) float64 42MB dask.array<chunksize=(5242880,), meta=np.ndarray>
lon (ncells) float64 42MB dask.array<chunksize=(5242880,), meta=np.ndarray>
* time (time) datetime64[ns] 900kB 1991-01-01T05:59:59 ... 2...
Dimensions without coordinates: ncells
Data variables:
hus2m (time, ncells) float32 2TB dask.array<chunksize=(1, 5242880), meta=np.ndarray>
psl (time, ncells) float32 2TB dask.array<chunksize=(1, 5242880), meta=np.ndarray>
ts (time, ncells) float32 2TB dask.array<chunksize=(1, 5242880), meta=np.ndarray>
uas (time, ncells) float32 2TB dask.array<chunksize=(1, 5242880), meta=np.ndarray>
vas (time, ncells) float32 2TB dask.array<chunksize=(1, 5242880), meta=np.ndarray>
Attributes: (12/31)
Conventions: CF-1.7 CMIP-6.2
activity_id: EERIE
data_specs_version: 01.00.32
forcing_index: 1
initialization_index: 1
license: EERIE model data produced by MPI-M is licensed und...
... ...
parent_activity_id: EERIE
sub_experiment_id: none
experiment: coupled control with fixed 1950's forcing (HighRes...
source: ICON-ESM-ER (2023): \naerosol: none, prescribed MA...
institution: Max Planck Institute for Meteorology, Hamburg 2014...
sub_experiment: none
The first step is to create a HEALPix grid that is close to the resolution of our source grid. Here we will choose an order of 9 (also known as “zoom”) and “nest” ordering.
order = zoom = 9
nside = hp.order2nside(order)
pixels = np.arange(hp.nside2npix(nside))
hp_lon, hp_lat = hp.pix2ang(nside=nside, ipix=pixels, lonlat=True, nest=True)
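As a quick check that this choice is indeed close to the source resolution, we can compare the number of HEALPix cells at order 9 with the number of cells in the native ICON grid (all names below are defined above):
print(f"HEALPix order {order}: {pixels.size:,} cells")
print(f"ICON native grid: {ds.sizes['ncells']:,} cells")
At order 9 this yields 12 × 512² = 3,145,728 HEALPix cells, compared to the 5,242,880 cells of the source grid.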
Next, we can use our defined source and target grids to compute interpolation weights.
The easygems package provides compute_weights_delaunay() to compute these weights based on a Delaunay triangulation of the source grid.
weights = egr.compute_weights_delaunay(
points=(ds.lon, ds.lat),
xi=(hp_lon, hp_lat)
)
Tip
For larger grids, computing the remapping weights can be quite resource-intensive. You can therefore save the calculated weights for future reuse:
weights.to_netcdf("healpix_weights.nc")
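In a later session the weights can then simply be read back from disk instead of being recomputed (assuming the file name used above):
weights = xr.open_dataset("healpix_weights.nc")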
These weights can be applied to single fields directly:
egh.healpix_show(egr.apply_weights(ds.ts.isel(time=0), **weights));
(Figure: the ts field at the first time step, remapped to the HEALPix grid.)
Alternatively, we can use xarray’s apply_ufunc() function to lift the function onto a full dataset. This requires a couple of additional pieces of information from the user, e.g. the input dimension along which the function should be applied, and the name and size of the resulting output dimension.
ds_remap = xr.apply_ufunc(
    egr.apply_weights,
    ds,
    kwargs=weights,
    keep_attrs=True,
    input_core_dims=[["ncells"]],  # input dimension along which to apply the remapping
    output_core_dims=[["cell"]],  # name of the new output dimension
    output_dtypes=["f4"],
    vectorize=True,
    dask="parallelized",
    dask_gufunc_kwargs={
        "output_sizes": {"cell": weights.sizes["tgt_idx"]},  # size of the new output dimension
    },
)
ds_remap
<xarray.Dataset> Size: 7TB
Dimensions: (time: 112496, cell: 3145728)
Coordinates:
height float64 8B 10.0
height_2 float64 8B 2.0
* time (time) datetime64[ns] 900kB 1991-01-01T05:59:59 ... 2067-12-31T...
Dimensions without coordinates: cell
Data variables:
hus2m (time, cell) float32 1TB dask.array<chunksize=(1, 3145728), meta=np.ndarray>
psl (time, cell) float32 1TB dask.array<chunksize=(1, 3145728), meta=np.ndarray>
ts (time, cell) float32 1TB dask.array<chunksize=(1, 3145728), meta=np.ndarray>
uas (time, cell) float32 1TB dask.array<chunksize=(1, 3145728), meta=np.ndarray>
vas (time, cell) float32 1TB dask.array<chunksize=(1, 3145728), meta=np.ndarray>
Attributes: (12/31)
Conventions: CF-1.7 CMIP-6.2
activity_id: EERIE
data_specs_version: 01.00.32
forcing_index: 1
initialization_index: 1
license: EERIE model data produced by MPI-M is licensed und...
... ...
parent_activity_id: EERIE
sub_experiment_id: none
experiment: coupled control with fixed 1950's forcing (HighRes...
source: ICON-ESM-ER (2023): \naerosol: none, prescribed MA...
institution: Max Planck Institute for Meteorology, Hamburg 2014...
sub_experiment: none
The resulting dataset can then be used as usual; the remapping is performed on demand.
egh.healpix_show(ds_remap.ts.sel(time="1991-01-01").mean("time"));
(Figure: daily mean of ts on 1991-01-01 on the HEALPix grid.)
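Because the remapped data behave like any other xarray objects, further operations work as expected. As a small illustration (the location below is arbitrary), the HEALPix cell containing a given longitude/latitude can be looked up with ang2pix, assuming the healpy-compatible signature of the healpix package, and used to extract a time series:
# Look up the HEALPix cell containing an (illustrative) location near Hamburg
cell = hp.ang2pix(nside, 10.0, 53.5, lonlat=True, nest=True)
ts_series = ds_remap.ts.isel(cell=cell).sel(time="1991-01")  # one month of 6-hourly values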
Attaching a coordinate reference system#
It is good practice to store map projection information as a coordinate reference system (CRS).
By making the crs coordinate non-dimensional, it will “stick” to the dataset even if individual variables are sub-selected.
crs = xr.DataArray(
name="crs",
attrs={
"grid_mapping_name": "healpix",
"healpix_nside": nside,
"healpix_order": "nest",
},
)
ds_remap = ds_remap.assign_coords(crs=crs)
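Because crs is a scalar (non-dimensional) coordinate, xarray carries it along when a single variable is selected, which is exactly the “sticking” behaviour mentioned above:
ds_remap.ts.coords["crs"]  # the CRS metadata is still attached after sub-selecting a variable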