Remapping a dataset to HEALPix#

In this tutorial we will remap a dataset from the native ICON grid to HEALPix. The dataset used as an example is from the EERIE project and is available online. It consists of five 2d variables and covers 23 years at a 6-hourly time step.

[1]:
import healpix as hp
import numpy as np
import xarray as xr

import easygems.healpix as egh
import easygems.remap as egr


ds = xr.open_dataset(
    "https://eerie.cloud.dkrz.de/datasets/icon-esm-er.eerie-control-1950.v20240618.atmos.native.2d_6h_inst/kerchunk",
    engine="zarr",
    chunks={},
)
ds
[1]:
<xarray.Dataset> Size: 4TB
Dimensions:             (ncells: 5242880, height: 1, height_2: 1, time: 33604)
Coordinates:
    cell_sea_land_mask  (ncells) int32 21MB dask.array<chunksize=(5242880,), meta=np.ndarray>
  * height              (height) float64 8B 10.0
  * height_2            (height_2) float64 8B 2.0
    lat                 (ncells) float64 42MB dask.array<chunksize=(5242880,), meta=np.ndarray>
    lon                 (ncells) float64 42MB dask.array<chunksize=(5242880,), meta=np.ndarray>
  * time                (time) datetime64[ns] 269kB 1990-12-31T23:59:00 ... 2...
Dimensions without coordinates: ncells
Data variables:
    hus2m               (time, height_2, ncells) float32 705GB dask.array<chunksize=(1, 1, 5242880), meta=np.ndarray>
    psl                 (time, ncells) float32 705GB dask.array<chunksize=(1, 5242880), meta=np.ndarray>
    ts                  (time, ncells) float32 705GB dask.array<chunksize=(1, 5242880), meta=np.ndarray>
    uas                 (time, height, ncells) float32 705GB dask.array<chunksize=(1, 1, 5242880), meta=np.ndarray>
    vas                 (time, height, ncells) float32 705GB dask.array<chunksize=(1, 1, 5242880), meta=np.ndarray>

The first step is to create a HEALPix grid that is close to the resolution of our source grid. Here we will choose an order of 9 (also known as “zoom”) and “nest” ordering.

[2]:
order = zoom = 9
nside = hp.order2nside(order)
npix = hp.nside2npix(nside)

hp_lon, hp_lat = hp.pix2ang(nside=nside, ipix=np.arange(npix), lonlat=True, nest=True)
hp_lon = (hp_lon + 180) % 360 - 180  # [-180, 180)
hp_lon += 360 / (4 * nside) / 4  # shift quarter-width
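For reference, the HEALPix parameters used above are related by simple powers of two: nside doubles with every order ("zoom") level, and the total pixel count is always 12 * nside**2. The longitude shift is a quarter of one equatorial pixel width. A minimal sketch of these relations, using plain Python arithmetic instead of the healpix package:

```python
# HEALPix parameter relations, computed without the healpix package.
order = 9
nside = 2**order        # each order level doubles nside
npix = 12 * nside**2    # 12 base pixels, each split into nside**2 cells

# At the equator, 4 * nside pixels span 360 degrees of longitude;
# the shift applied above is a quarter of one such pixel width.
pixel_width = 360 / (4 * nside)
shift = pixel_width / 4

print(nside, npix, shift)  # 512 3145728 0.0439453125
```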

Next, we can use our defined source and target grids to compute interpolation weights. The easygems package provides a function to compute these weights using the Delaunay triangulation method.

[3]:
weights = egr.compute_weights_delaunay((ds.lon, ds.lat), (hp_lon, hp_lat))

# You can also save the calculated weights for future use
# weights.to_netcdf("healpix_weights.nc")
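Conceptually, Delaunay-based remapping expresses each target cell as a weighted sum of the three source cells forming its enclosing triangle. A small self-contained sketch of that idea, using hypothetical array names (`src_idx`, `w`) that need not match the fields easygems actually stores in its weights dataset:

```python
import numpy as np

# Hypothetical weights: each target cell references three source cells
# (its enclosing Delaunay triangle) with barycentric weights summing to 1.
src_idx = np.array([[0, 1, 2],
                    [1, 2, 3]])
w = np.array([[0.5, 0.25, 0.25],
              [0.2, 0.3, 0.5]])

src_field = np.array([10.0, 20.0, 30.0, 40.0])

# Remapped value = weighted sum over the contributing source cells.
tgt_field = (src_field[src_idx] * w).sum(axis=-1)
print(tgt_field)  # [17.5 33. ]
```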

These weights can be applied to single fields directly:

[4]:
egh.healpix_show(egr.apply_weights(ds.ts.isel(time=0), **weights))
[4]:
<matplotlib.image.AxesImage at 0x1604a7d40>
<Figure size 640x480 with 0 Axes>
../../_images/Processing_datasets_remapping_7_2.png

Alternatively, we can use xarray’s apply_ufunc() function to lift the function onto a full dataset. This requires a couple of additional pieces of information from the user, e.g. the input dimension along which the function should be applied, and the name and size of the resulting output dimension.

[5]:
ds_remap = xr.apply_ufunc(
    egr.apply_weights,
    ds,
    kwargs=weights,
    keep_attrs=True,
    input_core_dims=[["ncells"]],
    output_core_dims=[["cell"]],
    output_dtypes=["f4"],
    vectorize=True,
    dask="parallelized",
    dask_gufunc_kwargs={
        "output_sizes": {"cell": npix},
    },
)
ds_remap
[5]:
<xarray.Dataset> Size: 2TB
Dimensions:   (time: 33604, height_2: 1, cell: 3145728, height: 1)
Coordinates:
  * height    (height) float64 8B 10.0
  * height_2  (height_2) float64 8B 2.0
  * time      (time) datetime64[ns] 269kB 1990-12-31T23:59:00 ... 2013-12-31T...
Dimensions without coordinates: cell
Data variables:
    hus2m     (time, height_2, cell) float32 423GB dask.array<chunksize=(1, 1, 3145728), meta=np.ndarray>
    psl       (time, cell) float32 423GB dask.array<chunksize=(1, 3145728), meta=np.ndarray>
    ts        (time, cell) float32 423GB dask.array<chunksize=(1, 3145728), meta=np.ndarray>
    uas       (time, height, cell) float32 423GB dask.array<chunksize=(1, 1, 3145728), meta=np.ndarray>
    vas       (time, height, cell) float32 423GB dask.array<chunksize=(1, 1, 3145728), meta=np.ndarray>

The resulting dataset can then be used as usual; the remapping is performed lazily, on demand.

[6]:
egh.healpix_show(ds_remap.ts.sel(time="1990-12-31").mean("time"))
[6]:
<matplotlib.image.AxesImage at 0x3d3e13110>
<Figure size 640x480 with 0 Axes>
../../_images/Processing_datasets_remapping_11_2.png

Storing the coordinate reference system#

It is good practice to store the map projection information in a coordinate reference system (CRS) variable.

[7]:
ds_remap["crs"] = xr.DataArray(
    name="crs",
    data=[],
    dims="crs",
    attrs={
        "grid_mapping_name": "healpix",
        "healpix_nside": 2**zoom,
        "healpix_order": "nest",
    },
)
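Downstream tools can recover the grid parameters from these attributes. A small sketch, assuming the attribute names from the cell above (they follow this tutorial's convention, not an official CF grid mapping):

```python
# Recover HEALPix grid parameters from stored CRS attributes
# (attribute names as written in the cell above).
crs_attrs = {
    "grid_mapping_name": "healpix",
    "healpix_nside": 512,
    "healpix_order": "nest",
}

assert crs_attrs["grid_mapping_name"] == "healpix"
nside = crs_attrs["healpix_nside"]
zoom = nside.bit_length() - 1  # invert nside = 2**zoom
npix = 12 * nside**2

print(zoom, npix)  # 9 3145728
```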