Remapping a dataset to HEALPix#

In this tutorial we will remap a dataset from the native ICON grid to HEALPix. The dataset used as an example is from the EERIE project and is available online. It consists of five 2d variables and covers 23 years with a 6 hour time step.

[1]:
import healpix as hp
import numpy as np
import xarray as xr

import easygems.healpix as egh
import easygems.remap as egr


ds = xr.open_dataset(
    "https://eerie.cloud.dkrz.de/datasets/icon-esm-er.eerie-control-1950.v20240618.atmos.native.2d_6h_inst/kerchunk",
    engine="zarr",
    chunks={},
)
ds
[1]:
<xarray.Dataset> Size: 5TB
Dimensions:             (ncells: 5242880, height: 1, height_2: 1, time: 46752)
Coordinates:
    cell_sea_land_mask  (ncells) int32 21MB dask.array<chunksize=(5242880,), meta=np.ndarray>
  * height              (height) float64 8B 10.0
  * height_2            (height_2) float64 8B 2.0
    lat                 (ncells) float64 42MB dask.array<chunksize=(5242880,), meta=np.ndarray>
    lon                 (ncells) float64 42MB dask.array<chunksize=(5242880,), meta=np.ndarray>
  * time                (time) datetime64[ns] 374kB 1991-01-01T05:59:00 ... 2...
Dimensions without coordinates: ncells
Data variables:
    hus2m               (time, height_2, ncells) float32 980GB dask.array<chunksize=(1, 1, 5242880), meta=np.ndarray>
    psl                 (time, ncells) float32 980GB dask.array<chunksize=(1, 5242880), meta=np.ndarray>
    ts                  (time, ncells) float32 980GB dask.array<chunksize=(1, 5242880), meta=np.ndarray>
    uas                 (time, height, ncells) float32 980GB dask.array<chunksize=(1, 1, 5242880), meta=np.ndarray>
    vas                 (time, height, ncells) float32 980GB dask.array<chunksize=(1, 1, 5242880), meta=np.ndarray>
Attributes: (12/30)
    Conventions:           CF-1.7 CMIP-6.2
    activity_id:           EERIE
    data_specs_version:    01.00.32
    forcing_index:         1
    initialization_index:  1
    license:               EERIE model data produced by MPI-M is licensed und...
    ...                    ...
    parent_experiment_id:  eerie-spinup-1950
    parent_activity_id:    EERIE
    sub_experiment_id:     none
    experiment:            coupled control with fixed 1950's forcing (HighRes...
    source:                ICON-ESM-ER (2023): \naerosol: none, prescribed MA...
    institution:           Max Planck Institute for Meteorology, Hamburg 2014...

The first step is to create a HEALPix grid that is close to the resolution of our source grid. Here we will choose an order of 9 (also known as “zoom”) and “nest” ordering.

[2]:
order = zoom = 9
nside = hp.order2nside(order)
npix = hp.nside2npix(nside)

hp_lon, hp_lat = hp.pix2ang(nside=nside, ipix=np.arange(npix), lonlat=True, nest=True)
hp_lon = (hp_lon + 180) % 360 - 180  # [-180, 180)
hp_lon += 360 / (4 * nside) / 4  # shift quarter-width

Next, we can use our defined source and target grids to compute interpolation weights. The easygems package provides a function to compute these weights using the Delaunay triangulation method.

[3]:
weights = egr.compute_weights_delaunay((ds.lon, ds.lat), (hp_lon, hp_lat))

# You can also save the calculated weights for future use
# weights.to_netcdf("healpix_weights.nc")

These weights can be applied to single fields directly:

[4]:
egh.healpix_show(egr.apply_weights(ds.ts.isel(time=0), **weights))
[4]:
<matplotlib.image.AxesImage at 0x7ffd4919b4a0>
<Figure size 640x480 with 0 Axes>
../../_images/Processing_datasets_remapping_7_2.png

Alternatively, we can use xarray’s apply_ufunc() function to lift the function onto a full dataset. This requires a coupled of additional information from the user, e.g. the input dimension along which the function should be applied, and the resulting output dimensions name and size.

[5]:
ds_remap = xr.apply_ufunc(
    egr.apply_weights,
    ds,
    kwargs=weights,
    keep_attrs=True,
    input_core_dims=[["ncells"]],
    output_core_dims=[["cell"]],
    output_dtypes=["f4"],
    vectorize=True,
    dask="parallelized",
    dask_gufunc_kwargs={
        "output_sizes": {"cell": npix},
    },
)
ds_remap
[5]:
<xarray.Dataset> Size: 3TB
Dimensions:   (time: 46752, height_2: 1, cell: 3145728, height: 1)
Coordinates:
  * height    (height) float64 8B 10.0
  * height_2  (height_2) float64 8B 2.0
  * time      (time) datetime64[ns] 374kB 1991-01-01T05:59:00 ... 2022-12-31T...
Dimensions without coordinates: cell
Data variables:
    hus2m     (time, height_2, cell) float32 588GB dask.array<chunksize=(1, 1, 3145728), meta=np.ndarray>
    psl       (time, cell) float32 588GB dask.array<chunksize=(1, 3145728), meta=np.ndarray>
    ts        (time, cell) float32 588GB dask.array<chunksize=(1, 3145728), meta=np.ndarray>
    uas       (time, height, cell) float32 588GB dask.array<chunksize=(1, 1, 3145728), meta=np.ndarray>
    vas       (time, height, cell) float32 588GB dask.array<chunksize=(1, 1, 3145728), meta=np.ndarray>
Attributes: (12/30)
    Conventions:           CF-1.7 CMIP-6.2
    activity_id:           EERIE
    data_specs_version:    01.00.32
    forcing_index:         1
    initialization_index:  1
    license:               EERIE model data produced by MPI-M is licensed und...
    ...                    ...
    parent_experiment_id:  eerie-spinup-1950
    parent_activity_id:    EERIE
    sub_experiment_id:     none
    experiment:            coupled control with fixed 1950's forcing (HighRes...
    source:                ICON-ESM-ER (2023): \naerosol: none, prescribed MA...
    institution:           Max Planck Institute for Meteorology, Hamburg 2014...

The resulting dataset can then be used as usual and the remapping is performed on demand.

[6]:
egh.healpix_show(ds_remap.ts.sel(time="1991-01-01").mean("time"))
[6]:
<matplotlib.image.AxesImage at 0x7ffd49527d10>
<Figure size 640x480 with 0 Axes>
../../_images/Processing_datasets_remapping_11_2.png

Storing the coordinate reference system#

It is good practice to store map projection information in the Coordinate Reference Systems (CRS). By making the crs coordinate non-dimensional it will “stick” to the dataset even if individual variables are subselected.

[7]:
ds_remap["crs"] = xr.DataArray(
    name="crs",
    data=0,
    attrs={
        "grid_mapping_name": "healpix",
        "healpix_nside": 2**zoom,
        "healpix_order": "nest",
    },
)