Remapping a dataset to HEALPix
In this tutorial we will remap a dataset from the native ICON grid to HEALPix. The dataset used as an example is from the EERIE project and is available online. It consists of five 2D variables and covers 50 years (1991 to 2040) with a 6-hourly time step.
import healpix as hp
import numpy as np
import xarray as xr
import easygems.healpix as egh
import easygems.remap as egr
ds = xr.open_dataset(
    "https://eerie.cloud.dkrz.de/datasets/icon-esm-er.eerie-control-1950.v20240618.atmos.native.2d_6h_inst/kerchunk",
    engine="zarr",
    chunks={},
)
ds
<xarray.Dataset> Size: 8TB
Dimensions:             (ncells: 5242880, height: 1, height_2: 1, time: 73051)
Coordinates:
    cell_sea_land_mask  (ncells) float64 42MB dask.array<chunksize=(5242880,), meta=np.ndarray>
  * height              (height) float64 8B 10.0
  * height_2            (height_2) float64 8B 2.0
    lat                 (ncells) float64 42MB dask.array<chunksize=(5242880,), meta=np.ndarray>
    lon                 (ncells) float64 42MB dask.array<chunksize=(5242880,), meta=np.ndarray>
  * time                (time) datetime64[ns] 584kB 1991-01-01T05:59:59 ... 2...
Dimensions without coordinates: ncells
Data variables:
    hus2m               (time, height_2, ncells) float32 2TB dask.array<chunksize=(1, 1, 5242880), meta=np.ndarray>
    psl                 (time, ncells) float32 2TB dask.array<chunksize=(1, 5242880), meta=np.ndarray>
    ts                  (time, ncells) float32 2TB dask.array<chunksize=(1, 5242880), meta=np.ndarray>
    uas                 (time, height, ncells) float32 2TB dask.array<chunksize=(1, 1, 5242880), meta=np.ndarray>
    vas                 (time, height, ncells) float32 2TB dask.array<chunksize=(1, 1, 5242880), meta=np.ndarray>
Attributes: (12/31)
    Conventions:           CF-1.7 CMIP-6.2
    activity_id:           EERIE
    data_specs_version:    01.00.32
    forcing_index:         1
    initialization_index:  1
    license:               EERIE model data produced by MPI-M is licensed und...
    ...                    ...
    parent_activity_id:    EERIE
    sub_experiment_id:     none
    experiment:            coupled control with fixed 1950's forcing (HighRes...
    source:                ICON-ESM-ER (2023): \naerosol: none, prescribed MA...
    institution:           Max Planck Institute for Meteorology, Hamburg 2014...
    sub_experiment:        none
The first step is to create a HEALPix grid that is close to the resolution of our source grid. Here we will choose an order of 9 (also known as “zoom”) and “nest” ordering.
order = zoom = 9
nside = hp.order2nside(order)  # nside = 2**order = 512
npix = hp.nside2npix(nside)  # 12 * nside**2 = 3,145,728 pixels in total

# Cell-center longitudes and latitudes of the HEALPix grid in "nest" ordering
hp_lon, hp_lat = hp.pix2ang(nside=nside, ipix=np.arange(npix), lonlat=True, nest=True)
hp_lon = (hp_lon + 180) % 360 - 180  # shift longitude range to [-180, 180)
hp_lon += 360 / (4 * nside) / 4  # shift by a quarter pixel width
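As a quick sanity check (not part of the original workflow), we can compare the size of the two grids; the numbers below follow directly from the dataset above and the chosen order:

# The source ICON grid has ~5.2 million cells; HEALPix order 9 has
# 12 * nside**2 = 3,145,728 pixels, i.e. a comparable, slightly coarser resolution.
print(f"ICON cells:    {ds.sizes['ncells']:,}")
print(f"HEALPix cells: {npix:,}")
print(f"Mean HEALPix pixel spacing: {np.degrees(np.sqrt(4 * np.pi / npix)):.3f} degrees")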
Next, we can use the source and target grids defined above to compute interpolation weights. The easygems package provides a function to compute these weights using Delaunay triangulation.
weights = egr.compute_weights_delaunay((ds.lon, ds.lat), (hp_lon, hp_lat))
# You can also save the calculated weights for future use
# weights.to_netcdf("healpix_weights.nc")
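If you saved the weights, a later session can reload them and use them unchanged. compute_weights_delaunay returns an xarray Dataset (which is why to_netcdf works above), so the reloaded weights can be unpacked with ** just like freshly computed ones; the file name is simply the one used in the comment above:

# Reload previously stored interpolation weights
weights = xr.open_dataset("healpix_weights.nc")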
These weights can be applied to single fields directly:
egh.healpix_show(egr.apply_weights(ds.ts.isel(time=0), **weights))
[Figure: egh.healpix_show of the remapped surface temperature (ts) at the first time step]
Alternatively, we can use xarray’s apply_ufunc() function to lift the function onto a full dataset. This requires a couple of additional pieces of information from the user, e.g. the input dimension along which the function should be applied, and the name and size of the resulting output dimension.
ds_remap = xr.apply_ufunc(
    egr.apply_weights,
    ds,
    kwargs=weights,
    keep_attrs=True,
    input_core_dims=[["ncells"]],
    output_core_dims=[["cell"]],
    output_dtypes=["f4"],
    vectorize=True,
    dask="parallelized",
    dask_gufunc_kwargs={
        "output_sizes": {"cell": npix},
    },
)
ds_remap
<xarray.Dataset> Size: 5TB
Dimensions:   (time: 73051, height_2: 1, cell: 3145728, height: 1)
Coordinates:
  * height    (height) float64 8B 10.0
  * height_2  (height_2) float64 8B 2.0
  * time      (time) datetime64[ns] 584kB 1991-01-01T05:59:59 ... 2040-12-31T...
Dimensions without coordinates: cell
Data variables:
    hus2m     (time, height_2, cell) float32 919GB dask.array<chunksize=(1, 1, 3145728), meta=np.ndarray>
    psl       (time, cell) float32 919GB dask.array<chunksize=(1, 3145728), meta=np.ndarray>
    ts        (time, cell) float32 919GB dask.array<chunksize=(1, 3145728), meta=np.ndarray>
    uas       (time, height, cell) float32 919GB dask.array<chunksize=(1, 1, 3145728), meta=np.ndarray>
    vas       (time, height, cell) float32 919GB dask.array<chunksize=(1, 1, 3145728), meta=np.ndarray>
Attributes: (12/31)
    Conventions:           CF-1.7 CMIP-6.2
    activity_id:           EERIE
    data_specs_version:    01.00.32
    forcing_index:         1
    initialization_index:  1
    license:               EERIE model data produced by MPI-M is licensed und...
    ...                    ...
    parent_activity_id:    EERIE
    sub_experiment_id:     none
    experiment:            coupled control with fixed 1950's forcing (HighRes...
    source:                ICON-ESM-ER (2023): \naerosol: none, prescribed MA...
    institution:           Max Planck Institute for Meteorology, Hamburg 2014...
    sub_experiment:        none
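Optionally, and not strictly needed for the steps below, the HEALPix cell-center positions computed earlier can be attached as coordinates along the new cell dimension:

# Attach HEALPix cell centers as (non-indexed) coordinates on the remapped data
ds_remap = ds_remap.assign_coords(
    lat=("cell", hp_lat),
    lon=("cell", hp_lon),
)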
The resulting dataset can then be used as usual and the remapping is performed on demand.
egh.healpix_show(ds_remap.ts.sel(time="1991-01-01").mean("time"))
[Figure: egh.healpix_show of the daily-mean remapped surface temperature for 1991-01-01]
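For example, a small subset can be computed eagerly or written to disk, and only the selected time steps are remapped. The file name below is an arbitrary choice for this sketch:

# Remap and store the mean surface temperature for January 1991
ts_mean = ds_remap.ts.sel(time="1991-01").mean("time")
ts_mean.to_netcdf("ts_healpix_1991-01_mean.nc")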
Storing the coordinate reference system
It is good practice to store the map projection information in a coordinate reference system (CRS) variable. By making the crs coordinate non-dimensional, it will “stick” to the dataset even if individual variables are subselected.
ds_remap["crs"] = xr.DataArray(
    name="crs",
    data=0,
    attrs={
        "grid_mapping_name": "healpix",
        "healpix_nside": 2**zoom,
        "healpix_order": "nest",
    },
)
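Assigning the variable as above adds crs as a data variable; to make it the non-dimensional coordinate described above, promote it with set_coords(). Following the CF conventions, each data variable can additionally point to it via a grid_mapping attribute. A minimal sketch of these optional steps:

# Promote "crs" to a scalar, non-dimensional coordinate so it "sticks" to the dataset
ds_remap = ds_remap.set_coords("crs")

# Link every data variable to the grid mapping (CF convention)
for var in ds_remap.data_vars:
    ds_remap[var].attrs["grid_mapping"] = "crs"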