Remapping a dataset#

In this tutorial we will remap a dataset from the native ICON grid to HEALPix. The example dataset consists of five 2D variables sampled at six-hourly intervals.

Note

Please note that all of the methods used in this example are generic and can be applied to any combination of source and target grids.

import healpix as hp
import intake
import numpy as np
import xarray as xr

import easygems.healpix as egh
import easygems.remap as egr


cat = intake.open_catalog("https://data.nextgems-h2020.eu/online.yaml")
ds = cat.tutorial["ICON.native.2d_PT6H_inst"](chunks={}).to_dask()
ds
<xarray.Dataset> Size: 3GB
Dimensions:             (time: 124, ncells: 1310720)
Coordinates:
  * time                (time) datetime64[ns] 992B 2020-01-01T05:59:59 ... 20...
    cell_sea_land_mask  (ncells) float64 10MB dask.array<chunksize=(1310720,), meta=np.ndarray>
    lat                 (ncells) float64 10MB dask.array<chunksize=(1310720,), meta=np.ndarray>
    lon                 (ncells) float64 10MB dask.array<chunksize=(1310720,), meta=np.ndarray>
Dimensions without coordinates: ncells
Data variables:
    hus2m               (time, ncells) float32 650MB dask.array<chunksize=(1, 1310720), meta=np.ndarray>
    psl                 (time, ncells) float32 650MB dask.array<chunksize=(1, 1310720), meta=np.ndarray>
    ts                  (time, ncells) float32 650MB dask.array<chunksize=(1, 1310720), meta=np.ndarray>
    uas                 (time, ncells) float32 650MB dask.array<chunksize=(1, 1310720), meta=np.ndarray>
    vas                 (time, ncells) float32 650MB dask.array<chunksize=(1, 1310720), meta=np.ndarray>

The first step is to create a HEALPix grid that is close to the resolution of our source grid. Here we will choose an order of 8 (also known as zoom or refinement level) and nested ordering.

order = zoom = 8
nside = hp.order2nside(order)
pixels = np.arange(hp.nside2npix(nside))

hp_lon, hp_lat = hp.pix2ang(nside=nside, ipix=pixels, lonlat=True, nest=True)

Next, we can use our defined source and target grids to compute interpolation weights. The easygems package provides a function to compute these weights using the Delaunay triangulation method.

weights = egr.compute_weights_delaunay(
    points=(ds.lon, ds.lat),
    xi=(hp_lon, hp_lat)
)

Tip

For larger grids, computing the remapping weights can be quite resource-intensive. However, you can save the calculated weights for future use:

weights.to_netcdf("healpix_weights.nc")

These weights can be applied to single fields directly:

egh.healpix_show(egr.apply_weights(ds.ts.isel(time=0), **weights));
../../_images/5ca87f019bd0938be9a27697c671bb7120a82fa11aa677530e39a9ce6f8145ca.png

Alternatively, we can use xarray’s apply_ufunc() function to lift the function onto a full dataset. This requires a coupled of additional information from the user, e.g. the input dimension along which the function should be applied, and the resulting output dimensions name and size.

ds_remap = xr.apply_ufunc(
    egr.apply_weights,
    ds,
    kwargs=weights,
    keep_attrs=True,
    input_core_dims=[["ncells"]],
    output_core_dims=[["cell"]],
    output_dtypes=["f4"],
    vectorize=True,
    dask="parallelized",
    dask_gufunc_kwargs={
        "output_sizes": {"cell": weights.sizes["tgt_idx"]},
    },
)
ds_remap
<xarray.Dataset> Size: 2GB
Dimensions:  (time: 124, cell: 786432)
Coordinates:
  * time     (time) datetime64[ns] 992B 2020-01-01T05:59:59 ... 2020-01-31T23...
Dimensions without coordinates: cell
Data variables:
    hus2m    (time, cell) float32 390MB dask.array<chunksize=(1, 786432), meta=np.ndarray>
    psl      (time, cell) float32 390MB dask.array<chunksize=(1, 786432), meta=np.ndarray>
    ts       (time, cell) float32 390MB dask.array<chunksize=(1, 786432), meta=np.ndarray>
    uas      (time, cell) float32 390MB dask.array<chunksize=(1, 786432), meta=np.ndarray>
    vas      (time, cell) float32 390MB dask.array<chunksize=(1, 786432), meta=np.ndarray>

The resulting dataset can then be used as usual and the remapping is performed on demand.

egh.healpix_show(ds_remap.ts.isel(time=0));
../../_images/2ad5b4a16c0df4c6a6dbbd80943e7194d3fd0a9e27f692638e9e015c7854f35f.png

Attaching a coordinate reference system#

It is good practice to store map projection information in the Coordinate Reference Systems (CRS). By making the crs coordinate non-dimensional it will “stick” to the dataset even if individual variables are subselected.

crs = xr.DataArray(
    name="crs",
    attrs={
        "grid_mapping_name": "healpix",
        "refinement_level": order,
        "indexing_scheme": "nested",
    },
)

cell = xr.DataArray(
    pixels, dims=("cell",), attrs={"standard_name": "healpix_index"}
)

ds_remap_cf = ds_remap.assign_coords(crs=crs, cell=cell)
ds_remap_cf
<xarray.Dataset> Size: 2GB
Dimensions:  (time: 124, cell: 786432)
Coordinates:
  * time     (time) datetime64[ns] 992B 2020-01-01T05:59:59 ... 2020-01-31T23...
  * cell     (cell) int64 6MB 0 1 2 3 4 5 ... 786427 786428 786429 786430 786431
    crs      float64 8B nan
Data variables:
    hus2m    (time, cell) float32 390MB dask.array<chunksize=(1, 786432), meta=np.ndarray>
    psl      (time, cell) float32 390MB dask.array<chunksize=(1, 786432), meta=np.ndarray>
    ts       (time, cell) float32 390MB dask.array<chunksize=(1, 786432), meta=np.ndarray>
    uas      (time, cell) float32 390MB dask.array<chunksize=(1, 786432), meta=np.ndarray>
    vas      (time, cell) float32 390MB dask.array<chunksize=(1, 786432), meta=np.ndarray>