Remapping a dataset#
In this tutorial we will remap a dataset from the native ICON grid to HEALPix. The example dataset consists of five 2D variables sampled at six-hourly intervals.
Note
Please note that all of the methods used in this example are generic and can be applied to any combination of source and target grids.
import healpix as hp
import intake
import numpy as np
import xarray as xr
import easygems.healpix as egh
import easygems.remap as egr
cat = intake.open_catalog("https://data.nextgems-h2020.eu/online.yaml")
ds = cat.tutorial["ICON.native.2d_PT6H_inst"](chunks={}).to_dask()
ds
<xarray.Dataset> Size: 13GB
Dimensions: (time: 124, ncells: 5242880)
Coordinates:
* time (time) datetime64[ns] 992B 2020-01-01T05:59:59 ... 20...
cell_sea_land_mask (ncells) int32 21MB dask.array<chunksize=(5242880,), meta=np.ndarray>
lat (ncells) float64 42MB dask.array<chunksize=(5242880,), meta=np.ndarray>
lon (ncells) float64 42MB dask.array<chunksize=(5242880,), meta=np.ndarray>
Dimensions without coordinates: ncells
Data variables:
hus2m (time, ncells) float32 3GB dask.array<chunksize=(1, 5242880), meta=np.ndarray>
psl (time, ncells) float32 3GB dask.array<chunksize=(1, 5242880), meta=np.ndarray>
ts (time, ncells) float32 3GB dask.array<chunksize=(1, 5242880), meta=np.ndarray>
uas (time, ncells) float32 3GB dask.array<chunksize=(1, 5242880), meta=np.ndarray>
vas (time, ncells) float32 3GB dask.array<chunksize=(1, 5242880), meta=np.ndarray>
Attributes: (12/31)
Conventions: CF-1.7 CMIP-6.2
activity_id: HighResMIP
contact: juergen.kroeger@mpimet.mpg.de
creation_date: 2025-07-08T10:51:45
data_specs_version: 01.00.32
experiment: coupled future 2015-2050 using a scenario as close...
... ...
source_id: ICON-ESM-ER
source_type: AOGCM
sub_experiment: none
sub_experiment_id: none
variant_label: r1i1p1f1
version_id: v20240618The first step is to create a HEALPix grid that is close to the resolution of our source grid. Here we will choose an order of 9 (also known as “zoom”) and “nest” ordering.
order = zoom = 9
nside = hp.order2nside(order)
pixels = np.arange(hp.nside2npix(nside))
hp_lon, hp_lat = hp.pix2ang(nside=nside, ipix=pixels, lonlat=True, nest=True)
Next, we can use our defined source and target grids to compute interpolation weights.
The easygems package provides a function to compute these weights using the Delaunay triangulation method.
weights = egr.compute_weights_delaunay(
points=(ds.lon, ds.lat),
xi=(hp_lon, hp_lat)
)
Tip
For larger grids, computing the remapping weights can be quite resource-intensive. However, you can save the calculated weights for future use:
weights.to_netcdf("healpix_weights.nc")
These weights can be applied to single fields directly:
egh.healpix_show(egr.apply_weights(ds.ts.isel(time=0), **weights));
Alternatively, we can use xarray’s apply_ufunc() function to lift the function onto a full dataset. This requires a coupled of additional information from the user, e.g. the input dimension along which the function should be applied, and the resulting output dimensions name and size.
ds_remap = xr.apply_ufunc(
egr.apply_weights,
ds,
kwargs=weights,
keep_attrs=True,
input_core_dims=[["ncells"]],
output_core_dims=[["cell"]],
output_dtypes=["f4"],
vectorize=True,
dask="parallelized",
dask_gufunc_kwargs={
"output_sizes": {"cell": weights.sizes["tgt_idx"]},
},
)
ds_remap
<xarray.Dataset> Size: 8GB
Dimensions: (time: 124, cell: 3145728)
Coordinates:
* time (time) datetime64[ns] 992B 2020-01-01T05:59:59 ... 2020-01-31T23...
Dimensions without coordinates: cell
Data variables:
hus2m (time, cell) float32 2GB dask.array<chunksize=(1, 3145728), meta=np.ndarray>
psl (time, cell) float32 2GB dask.array<chunksize=(1, 3145728), meta=np.ndarray>
ts (time, cell) float32 2GB dask.array<chunksize=(1, 3145728), meta=np.ndarray>
uas (time, cell) float32 2GB dask.array<chunksize=(1, 3145728), meta=np.ndarray>
vas (time, cell) float32 2GB dask.array<chunksize=(1, 3145728), meta=np.ndarray>
Attributes: (12/31)
Conventions: CF-1.7 CMIP-6.2
activity_id: HighResMIP
contact: juergen.kroeger@mpimet.mpg.de
creation_date: 2025-07-08T10:51:45
data_specs_version: 01.00.32
experiment: coupled future 2015-2050 using a scenario as close...
... ...
source_id: ICON-ESM-ER
source_type: AOGCM
sub_experiment: none
sub_experiment_id: none
variant_label: r1i1p1f1
version_id: v20240618The resulting dataset can then be used as usual and the remapping is performed on demand.
egh.healpix_show(ds_remap.ts.isel(time=0));
Attaching a coordinate reference system#
It is good practice to store map projection information in the Coordinate Reference Systems (CRS).
By making the crs coordinate non-dimensional it will “stick” to the dataset even if individual variables are subselected.
crs = xr.DataArray(
name="crs",
attrs={
"grid_mapping_name": "healpix",
"healpix_nside": nside,
"healpix_order": "nest",
},
)
ds_remap = ds_remap.assign_coords(crs=crs)