Remapping a dataset#

In this tutorial we will remap a dataset from the native ICON grid to HEALPix. The example dataset consists of five 2D variables sampled at six-hourly intervals.

Note

Please note that all of the methods used in this example are generic and can be applied to any combination of source and target grids.

import healpix as hp
import intake
import numpy as np
import xarray as xr

import easygems.healpix as egh
import easygems.remap as egr


cat = intake.open_catalog("https://data.nextgems-h2020.eu/online.yaml")
ds = cat.tutorial["ICON.native.2d_PT6H_inst"](chunks={}).to_dask()
ds
<xarray.Dataset> Size: 13GB
Dimensions:             (time: 124, ncells: 5242880)
Coordinates:
  * time                (time) datetime64[ns] 992B 2020-01-01T05:59:59 ... 20...
    cell_sea_land_mask  (ncells) int32 21MB dask.array<chunksize=(5242880,), meta=np.ndarray>
    lat                 (ncells) float64 42MB dask.array<chunksize=(5242880,), meta=np.ndarray>
    lon                 (ncells) float64 42MB dask.array<chunksize=(5242880,), meta=np.ndarray>
Dimensions without coordinates: ncells
Data variables:
    hus2m               (time, ncells) float32 3GB dask.array<chunksize=(1, 5242880), meta=np.ndarray>
    psl                 (time, ncells) float32 3GB dask.array<chunksize=(1, 5242880), meta=np.ndarray>
    ts                  (time, ncells) float32 3GB dask.array<chunksize=(1, 5242880), meta=np.ndarray>
    uas                 (time, ncells) float32 3GB dask.array<chunksize=(1, 5242880), meta=np.ndarray>
    vas                 (time, ncells) float32 3GB dask.array<chunksize=(1, 5242880), meta=np.ndarray>
Attributes: (12/31)
    Conventions:           CF-1.7 CMIP-6.2
    activity_id:           HighResMIP
    contact:               juergen.kroeger@mpimet.mpg.de
    creation_date:         2025-07-08T10:51:45
    data_specs_version:    01.00.32
    experiment:            coupled future 2015-2050 using a scenario as close...
    ...                    ...
    source_id:             ICON-ESM-ER
    source_type:           AOGCM
    sub_experiment:        none
    sub_experiment_id:     none
    variant_label:         r1i1p1f1
    version_id:            v20240618

The first step is to create a HEALPix grid that is close to the resolution of our source grid. Here we will choose an order of 9 (also known as “zoom”) and “nest” ordering.

order = zoom = 9
nside = hp.order2nside(order)
pixels = np.arange(hp.nside2npix(nside))

hp_lon, hp_lat = hp.pix2ang(nside=nside, ipix=pixels, lonlat=True, nest=True)

Next, we can use our defined source and target grids to compute interpolation weights. The easygems package provides a function to compute these weights using the Delaunay triangulation method.

weights = egr.compute_weights_delaunay(
    points=(ds.lon, ds.lat),
    xi=(hp_lon, hp_lat)
)

Tip

For larger grids, computing the remapping weights can be quite resource-intensive. However, you can save the calculated weights for future use:

weights.to_netcdf("healpix_weights.nc")

These weights can be applied to single fields directly:

egh.healpix_show(egr.apply_weights(ds.ts.isel(time=0), **weights));
../../_images/3f9906b2958656a11240454bdb91a76449ae8373487ce1d529bb57e4cdf8ef0d.png

Alternatively, we can use xarray’s apply_ufunc() function to lift the function onto a full dataset. This requires a coupled of additional information from the user, e.g. the input dimension along which the function should be applied, and the resulting output dimensions name and size.

ds_remap = xr.apply_ufunc(
    egr.apply_weights,
    ds,
    kwargs=weights,
    keep_attrs=True,
    input_core_dims=[["ncells"]],
    output_core_dims=[["cell"]],
    output_dtypes=["f4"],
    vectorize=True,
    dask="parallelized",
    dask_gufunc_kwargs={
        "output_sizes": {"cell": weights.sizes["tgt_idx"]},
    },
)
ds_remap
<xarray.Dataset> Size: 8GB
Dimensions:  (time: 124, cell: 3145728)
Coordinates:
  * time     (time) datetime64[ns] 992B 2020-01-01T05:59:59 ... 2020-01-31T23...
Dimensions without coordinates: cell
Data variables:
    hus2m    (time, cell) float32 2GB dask.array<chunksize=(1, 3145728), meta=np.ndarray>
    psl      (time, cell) float32 2GB dask.array<chunksize=(1, 3145728), meta=np.ndarray>
    ts       (time, cell) float32 2GB dask.array<chunksize=(1, 3145728), meta=np.ndarray>
    uas      (time, cell) float32 2GB dask.array<chunksize=(1, 3145728), meta=np.ndarray>
    vas      (time, cell) float32 2GB dask.array<chunksize=(1, 3145728), meta=np.ndarray>
Attributes: (12/31)
    Conventions:           CF-1.7 CMIP-6.2
    activity_id:           HighResMIP
    contact:               juergen.kroeger@mpimet.mpg.de
    creation_date:         2025-07-08T10:51:45
    data_specs_version:    01.00.32
    experiment:            coupled future 2015-2050 using a scenario as close...
    ...                    ...
    source_id:             ICON-ESM-ER
    source_type:           AOGCM
    sub_experiment:        none
    sub_experiment_id:     none
    variant_label:         r1i1p1f1
    version_id:            v20240618

The resulting dataset can then be used as usual and the remapping is performed on demand.

egh.healpix_show(ds_remap.ts.isel(time=0));
../../_images/68d286b2862175b5aa275cfd5d831929a611845988800c80cd07681183731e54.png

Attaching a coordinate reference system#

It is good practice to store map projection information in the Coordinate Reference Systems (CRS). By making the crs coordinate non-dimensional it will “stick” to the dataset even if individual variables are subselected.

crs = xr.DataArray(
    name="crs",
    attrs={
        "grid_mapping_name": "healpix",
        "healpix_nside": nside,
        "healpix_order": "nest",
    },
)
ds_remap = ds_remap.assign_coords(crs=crs)