--- file_format: mystnb kernelspec: name: python3 display_name: Python 3 execution: timeout: 30 --- # Selecting geospatial features For scientific analysis it is often required to sample specific parts of the globe, whether geospatial sections or scientific regions of interest. The HEALPix grid provides an analytical mapping between coordinates and indices, enabling fast and straightforward sampling. This notebook showcases some of the convenience functions provided by the `easygems` package. For demonstration, we will use a remapped version of the ERA5 reanalysis dataset. ```{code-cell} ipython3 import intake import easygems.healpix as egh cat = intake.open_catalog("https://data.nextgems-h2020.eu/online.yaml") ds = cat.ERA5(zoom=6).to_dask() ds ``` ## Geospatial extent As a first example, we will select a geospatial extent defined by its longitudinal (W--E) and latitudinal (S--N) coverage. ```{code-cell} ipython3 ds_extent = egh.select_extent(ds, [-80, -10, 0, 30]) # W, E, S, N ds_extent ``` ```{code-cell} ipython3 egh.healpix_show(ds_extent["2t"].sel(time="2020-02-01")) ``` ## Geodesic section Next, we will select a geospatial section defined by a start point (`lon1`, `lat1`) and an end point (`lon2`, `lat2`). The returned dataset is still ordered as the original global HEALPix map. Since HEALPix indices do not follow a simple geospatial order, sorting by a coordinate such as `lat` makes the data easier to work with for plotting or further analysis. ```{code-cell} ipython3 ds_section = egh.select_section(ds, 10, -60, -60, 60).sortby("lat") ds_section ``` We can plot the coordinates along our section to verify the selection worked as expected ```{code-cell} ipython3 import matplotlib.pyplot as plt import cartopy.crs as ccrs fig, ax = plt.subplots(subplot_kw={"projection": ccrs.PlateCarree()}) ax.set_global() ax.coastlines() ax.plot(ds_section.lon, ds_section.lat, transform=ccrs.Geodetic()) ``` We can also, of course, consider physical variables along the section, such as total precipitation, averaged over time. ```{code-cell} ipython3 ds_section["tp"].sel(time="2020").mean("time").plot(x="lat") ``` ## Scientific regions For some analysis, specifically defined "scientific regions" have been defined. The [`regionmask`](https://regionmask.readthedocs.io/) package provides a convenient interface to apply these regions to latitude and longitude coordinates. Our `select_regionmask()` function is a convenient wrapper that makes use of the region masks to select HEALPix maps. By default, the AR6 climate reference regions ([Iturbide et al. (2020)](https://essd.copernicus.org/articles/12/2959/2020/essd-12-2959-2020.html)) are used. Other [defined regions](https://regionmask.readthedocs.io/en/stable/defined_scientific.html) sprovided by `regionmask` can be passed via the `defined_regions` keyword. Regions can be selected by passing their number or abbreviation. Here, we select Central and South America: ```{code-cell} ipython3 ds_roi = egh.select_regionmask(ds, [9, 10, 11, 12, 13, 14, 15]) egh.healpix_show(ds_roi["2t"].sel(time="2020-02-01")) ``` Again, we can also do arbitrary analysis on our selected data ```{code-cell} ipython3 ds_roi["2t"].mean("cell").plot() ```