Selecting geospatial features#

For scientific analysis it is often required to sample specific parts of the globe, whether geospatial sections or scientific regions of interest.

The HEALPix grid provides an analytical mapping between coordinates and indices, enabling fast and straightforward sampling. This notebook showcases some of the convenience functions provided by the easygems package. For demonstration, we will use a remapped version of the ERA5 reanalysis dataset.

import intake
import easygems.healpix as egh


cat = intake.open_catalog("https://data.nextgems-h2020.eu/online.yaml")
ds = cat.ERA5(zoom=6).to_dask()
ds
<xarray.Dataset> Size: 136GB
Dimensions:  (time: 1008, cell: 49152, level: 37)
Coordinates:
  * time     (time) datetime64[ns] 8kB 1940-01-01 1940-02-01 ... 2023-12-01
  * cell     (cell) float32 197kB 0.0 1.0 2.0 ... 4.915e+04 4.915e+04 4.915e+04
  * level    (level) float32 148B 1.0 2.0 3.0 5.0 ... 925.0 950.0 975.0 1e+03
    crs      float32 4B ...
    lat      (cell) float32 197kB dask.array<chunksize=(49152,), meta=np.ndarray>
    lon      (cell) float32 197kB dask.array<chunksize=(49152,), meta=np.ndarray>
Data variables: (12/111)
    100u     (time, cell) float32 198MB dask.array<chunksize=(24, 16384), meta=np.ndarray>
    100v     (time, cell) float32 198MB dask.array<chunksize=(24, 16384), meta=np.ndarray>
    10fg     (time, cell) float32 198MB dask.array<chunksize=(24, 16384), meta=np.ndarray>
    10si     (time, cell) float32 198MB dask.array<chunksize=(24, 16384), meta=np.ndarray>
    10u      (time, cell) float32 198MB dask.array<chunksize=(24, 16384), meta=np.ndarray>
    10v      (time, cell) float32 198MB dask.array<chunksize=(24, 16384), meta=np.ndarray>
    ...       ...
    uvb      (time, cell) float32 198MB dask.array<chunksize=(24, 16384), meta=np.ndarray>
    v        (time, level, cell) float32 7GB dask.array<chunksize=(24, 4, 4096), meta=np.ndarray>
    vo       (time, level, cell) float32 7GB dask.array<chunksize=(24, 4, 4096), meta=np.ndarray>
    w        (time, level, cell) float32 7GB dask.array<chunksize=(24, 4, 4096), meta=np.ndarray>
    z        (time, level, cell) float32 7GB dask.array<chunksize=(24, 4, 4096), meta=np.ndarray>
    zust     (time, cell) float32 198MB dask.array<chunksize=(24, 16384), meta=np.ndarray>
Attributes:
    acknowledgment:  Contains modified Copernicus Climate Change Service info...
    contact:         lukas.kluft@mpimet.mpg.de
    creator:         Lukas Kluft
    description:     HEALPixelation of ERA5
    source:          Post-processed dataset based on the ERA5 mirror located ...

Geospatial extent#

As a first example, we will select a geospatial extent defined by its longitudinal (W–E) and latitudinal (S–N) coverage.

ds_extent = egh.select_extent(ds, [-80, -10, 0, 30])  # W, E, S, N
ds_extent
<xarray.Dataset> Size: 6GB
Dimensions:  (time: 1008, cell: 2326, level: 37)
Coordinates:
  * time     (time) datetime64[ns] 8kB 1940-01-01 1940-02-01 ... 2023-12-01
  * cell     (cell) float32 9kB 1.229e+04 1.229e+04 ... 3.225e+04 3.226e+04
  * level    (level) float32 148B 1.0 2.0 3.0 5.0 ... 925.0 950.0 975.0 1e+03
    crs      float32 4B ...
    lat      (cell) float32 9kB dask.array<chunksize=(2326,), meta=np.ndarray>
    lon      (cell) float32 9kB dask.array<chunksize=(2326,), meta=np.ndarray>
Data variables: (12/111)
    100u     (time, cell) float32 9MB dask.array<chunksize=(24, 2326), meta=np.ndarray>
    100v     (time, cell) float32 9MB dask.array<chunksize=(24, 2326), meta=np.ndarray>
    10fg     (time, cell) float32 9MB dask.array<chunksize=(24, 2326), meta=np.ndarray>
    10si     (time, cell) float32 9MB dask.array<chunksize=(24, 2326), meta=np.ndarray>
    10u      (time, cell) float32 9MB dask.array<chunksize=(24, 2326), meta=np.ndarray>
    10v      (time, cell) float32 9MB dask.array<chunksize=(24, 2326), meta=np.ndarray>
    ...       ...
    uvb      (time, cell) float32 9MB dask.array<chunksize=(24, 2326), meta=np.ndarray>
    v        (time, level, cell) float32 347MB dask.array<chunksize=(24, 4, 2326), meta=np.ndarray>
    vo       (time, level, cell) float32 347MB dask.array<chunksize=(24, 4, 2326), meta=np.ndarray>
    w        (time, level, cell) float32 347MB dask.array<chunksize=(24, 4, 2326), meta=np.ndarray>
    z        (time, level, cell) float32 347MB dask.array<chunksize=(24, 4, 2326), meta=np.ndarray>
    zust     (time, cell) float32 9MB dask.array<chunksize=(24, 2326), meta=np.ndarray>
Attributes:
    acknowledgment:  Contains modified Copernicus Climate Change Service info...
    contact:         lukas.kluft@mpimet.mpg.de
    creator:         Lukas Kluft
    description:     HEALPixelation of ERA5
    source:          Post-processed dataset based on the ERA5 mirror located ...
egh.healpix_show(ds_extent["2t"].sel(time="2020-02-01"))
<matplotlib.image.AxesImage at 0x7f4817b77230>
../../_images/7ecf5282990ed7857228d28ab8c923cb42bf164086b47f7bc9446049b707a66a.png

Geodesic section#

Next, we will select a geospatial section defined by a start point (lon1, lat1) and an end point (lon2, lat2). The returned dataset is still ordered as the original global HEALPix map. Since HEALPix indices do not follow a simple geospatial order, sorting by a coordinate such as lat makes the data easier to work with for plotting or further analysis.

ds_section = egh.select_section(ds, 10, -60, -60, 60).sortby("lat")
ds_section
<xarray.Dataset> Size: 499MB
Dimensions:  (time: 1008, cell: 180, level: 37)
Coordinates:
  * time     (time) datetime64[ns] 8kB 1940-01-01 1940-02-01 ... 2023-12-01
  * cell     (cell) float32 720B 3.486e+04 3.486e+04 ... 1.589e+04 1.59e+04
  * level    (level) float32 148B 1.0 2.0 3.0 5.0 ... 925.0 950.0 975.0 1e+03
    crs      float32 4B ...
    lat      (cell) float32 720B dask.array<chunksize=(180,), meta=np.ndarray>
    lon      (cell) float32 720B dask.array<chunksize=(180,), meta=np.ndarray>
Data variables: (12/111)
    100u     (time, cell) float32 726kB dask.array<chunksize=(24, 180), meta=np.ndarray>
    100v     (time, cell) float32 726kB dask.array<chunksize=(24, 180), meta=np.ndarray>
    10fg     (time, cell) float32 726kB dask.array<chunksize=(24, 180), meta=np.ndarray>
    10si     (time, cell) float32 726kB dask.array<chunksize=(24, 180), meta=np.ndarray>
    10u      (time, cell) float32 726kB dask.array<chunksize=(24, 180), meta=np.ndarray>
    10v      (time, cell) float32 726kB dask.array<chunksize=(24, 180), meta=np.ndarray>
    ...       ...
    uvb      (time, cell) float32 726kB dask.array<chunksize=(24, 180), meta=np.ndarray>
    v        (time, level, cell) float32 27MB dask.array<chunksize=(24, 4, 180), meta=np.ndarray>
    vo       (time, level, cell) float32 27MB dask.array<chunksize=(24, 4, 180), meta=np.ndarray>
    w        (time, level, cell) float32 27MB dask.array<chunksize=(24, 4, 180), meta=np.ndarray>
    z        (time, level, cell) float32 27MB dask.array<chunksize=(24, 4, 180), meta=np.ndarray>
    zust     (time, cell) float32 726kB dask.array<chunksize=(24, 180), meta=np.ndarray>
Attributes:
    acknowledgment:  Contains modified Copernicus Climate Change Service info...
    contact:         lukas.kluft@mpimet.mpg.de
    creator:         Lukas Kluft
    description:     HEALPixelation of ERA5
    source:          Post-processed dataset based on the ERA5 mirror located ...

We can plot the coordinates along our section to verify the selection worked as expected

import matplotlib.pyplot as plt
import cartopy.crs as ccrs


fig, ax = plt.subplots(subplot_kw={"projection": ccrs.PlateCarree()})
ax.set_global()
ax.coastlines()
ax.plot(ds_section.lon, ds_section.lat, transform=ccrs.Geodetic())
[<matplotlib.lines.Line2D at 0x7f47f4bde210>]
../../_images/cd9251356c7b70982782c896b9cc7426fc715daf2cc85121374595daff8ea8cc.png

We can also, of course, consider physical variables along the section, such as total precipitation, averaged over time.

ds_section["tp"].sel(time="2020").mean("time").plot(x="lat")
[<matplotlib.lines.Line2D at 0x7f47dfe7a0d0>]
../../_images/2fb6da594b3cd46fc6e245dc5d84b2dc2f2d99315fbc865422f48d40a33cd46e.png

Scientific regions#

For some analysis, specifically defined “scientific regions” have been defined. The regionmask package provides a convenient interface to apply these regions to latitude and longitude coordinates. Our select_regionmask() function is a convenient wrapper that makes use of the region masks to select HEALPix maps.

By default, the AR6 climate reference regions (Iturbide et al. (2020)) are used. Other defined regions sprovided by regionmask can be passed via the defined_regions keyword.

Regions can be selected by passing their number or abbreviation. Here, we select Central and South America:

ds_roi = egh.select_regionmask(ds, [9, 10, 11, 12, 13, 14, 15])

egh.healpix_show(ds_roi["2t"].sel(time="2020-02-01"))
<matplotlib.image.AxesImage at 0x7f47dd2b8910>
../../_images/efc5dfe93139c28fe26d0a797cdf3d037c3da41b16592bf2edbd11e52bf23407.png

Again, we can also do arbitrary analysis on our selected data

ds_roi["2t"].mean("cell").plot()
[<matplotlib.lines.Line2D at 0x7f47dd2bbed0>]
../../_images/8ab855438775dbd0f1336ab3b584cb39f1a5fd6937688b5f923923bd2b9032c8.png