--- file_format: mystnb kernelspec: name: python3 execution: timeout: 600 --- # Remapping a dataset to HEALPix In this tutorial we will remap a dataset from the native ICON grid to HEALPix. The dataset used as an example is from the EERIE project and is available online. It consists of five 2d variables and covers 23 years with a 6 hour time step. ```{code-cell} ipython3 import healpix as hp import numpy as np import xarray as xr import easygems.healpix as egh import easygems.remap as egr ds = xr.open_dataset( "https://eerie.cloud.dkrz.de/datasets/icon-esm-er.eerie-control-1950.v20240618.atmos.native.2d_6h_inst/kerchunk", engine="zarr", chunks={}, ) ds ``` The first step is to create a HEALPix grid that is close to the resolution of our source grid. Here we will choose an order of 9 (also known as "zoom") and "nest" ordering. ```{code-cell} ipython3 order = zoom = 9 nside = hp.order2nside(order) npix = hp.nside2npix(nside) hp_lon, hp_lat = hp.pix2ang(nside=nside, ipix=np.arange(npix), lonlat=True, nest=True) hp_lon = (hp_lon + 180) % 360 - 180 # [-180, 180) ``` Depending on the specifics of the source and target grids, it is possible that the triangulation process will be unable to identify three surrounding points for grid points located near the edge of the grid. One way to account for the periodic nature of the longitude dimension is to stack the grid, shifting it by ±360° (towards East and West). ```{code-cell} ipython3 lon_periodic = np.hstack((ds.lon - 360, ds.lon, ds.lon + 360)) lat_periodic = np.hstack((ds.lat, ds.lat, ds.lat)) ``` Next, we can use our defined source and target grids to compute interpolation weights. The `easygems` package provides a function to compute these weights using the [Delaunay triangulation](https://docs.scipy.org/doc/scipy/tutorial/spatial.html#delaunay-triangulations) method. ```{code-cell} ipython3 weights = egr.compute_weights_delaunay( points=(lon_periodic, lat_periodic), xi=(hp_lon, hp_lat) ) ``` Finally, we remap the source indices in the weight files to the valid range in the source grid. This enables us to apply the weights to the original data while retaining the periodic information. ```{code-cell} ipython3 weights = weights.assign(src_idx=weights.src_idx % ds.lat.size) # You can also save the calculated weights for future use # weights.to_netcdf("healpix_weights.nc") ``` These weights can be applied to single fields directly: ```{code-cell} ipython3 egh.healpix_show(egr.apply_weights(ds.ts.isel(time=0), **weights)) ``` Alternatively, we can use xarray's [apply_ufunc()](https://docs.xarray.dev/en/stable/generated/xarray.apply_ufunc.html) function to lift the function onto a full dataset. This requires a coupled of additional information from the user, e.g. the input dimension along which the function should be applied, and the resulting output dimensions name and size. ```{code-cell} ipython3 ds_remap = xr.apply_ufunc( egr.apply_weights, ds, kwargs=weights, keep_attrs=True, input_core_dims=[["ncells"]], output_core_dims=[["cell"]], output_dtypes=["f4"], vectorize=True, dask="parallelized", dask_gufunc_kwargs={ "output_sizes": {"cell": npix}, }, ) ds_remap ``` The resulting dataset can then be used as usual and the remapping is performed on demand. ```{code-cell} ipython3 egh.healpix_show(ds_remap.ts.sel(time="1991-01-01").mean("time")) ``` ## Storing the coordinate reference system It is good practice to store map projection information in the Coordinate Reference Systems (CRS). By making the `crs` coordinate non-dimensional it will "stick" to the dataset even if individual variables are subselected. ```{code-cell} ipython3 crs = xr.DataArray( name="crs", attrs={ "grid_mapping_name": "healpix", "healpix_nside": 2**zoom, "healpix_order": "nest", }, ) ds_remap = ds_remap.assign_coords(crs=crs) ```