Using land-sea mask#

In this example, it is shown how to use a land-sea mask. We’ll use binary masks on the surface level, which can be extracted from the variable ocean_fraction_surface. Throughout these exercises we will use the land-sea mask to plot

Preparations#

Importing packages#

[1]:
import intake
import cartopy.crs as ccrs
import cmocean
import healpy as hp
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

Getting and selecting data#

We pick a nextgems run from the catalog as usual. Furthermore, we keep exp_name and a time_slice of interest for later usage.

[2]:
cat = intake.open_catalog("https://data.nextgems-h2020.eu/catalog.yaml")
exp_name = "ngc3028"
experiment = cat.ICON[exp_name]
time_slice = slice("2020-02-01", "2020-03-31")

Helper functions#

We define some helper functions, which * use the healpy library to compute the locations of the grid cells * change the units of the precipitation variable pr to better fit the upcoming analysis

[3]:
def get_nest(ds):
    return ds.crs.healpix_order == "nest"


def get_nside(ds):
    return ds.crs.healpix_nside


def attach_coords(ds):
    lons, lats = hp.pix2ang(
        get_nside(ds), np.arange(ds.dims["cell"]), nest=get_nest(ds), lonlat=True
    )
    return ds.assign_coords(
        lat=(("cell",), lats, {"units": "degree_north"}),
        lon=(("cell",), lons, {"units": "degree_east"}),
    )


def preprocess(ds):
    """
    Adapts dataset to the needs of this notebook.
    """
    ds = attach_coords(ds)
    ds = ds.assign(pr=(ds.pr * 86400).assign_attrs(units="mm d-1"))
    return ds

We’ll also use nnshow from the cartopy example:

[4]:
def nnshow(var, nx=1000, ny=1000, ax=None, **kwargs):
    """
    var: variable on healpix coordinates (array-like)
    nx: image resolution in x-direction
    ny: image resolution in y-direction
    ax: axis to plot on
    kwargs: additional arguments to imshow
    """
    if ax is None:
        ax = plt.gca()

    xlims = ax.get_xlim()
    ylims = ax.get_ylim()
    # NOTE: we want the center coordinate of each pixel, thus we have to
    # compute the linspace over halve a pixel size less than the plot's limits
    dx = (xlims[1] - xlims[0]) / nx
    dy = (ylims[1] - ylims[0]) / ny
    xvals = np.linspace(xlims[0] + dx / 2, xlims[1] - dx / 2, nx)
    yvals = np.linspace(ylims[0] + dy / 2, ylims[1] - dy / 2, ny)
    xvals2, yvals2 = np.meshgrid(xvals, yvals)
    latlon = ccrs.PlateCarree().transform_points(
        ax.projection, xvals2, yvals2, np.zeros_like(xvals2)
    )
    valid = np.all(np.isfinite(latlon), axis=-1)
    points = latlon[valid].T
    pix = hp.ang2pix(
        hp.npix2nside(len(var)), theta=points[0], phi=points[1], nest=True, lonlat=True
    )
    res = np.full(latlon.shape[:-1], np.nan, dtype=var.dtype)
    res[valid] = var[pix]
    return ax.imshow(res, extent=xlims + ylims, origin="lower", **kwargs)

masks for specific regions#

The purpose of this article is to look at the data based on some regions, so let’s define region masks as functions of a dataset, thus the masks can automatically adapt to the chosen zoom level of the dataset.

[5]:
def tropics(ds):
    return np.abs(ds.lat) <= 30.1


def land(ds):
    return ds.ocean_fraction_surface == 0


def ocean(ds):
    return ds.ocean_fraction_surface == 1


def south_america(ds):
    return tropics(ds) & (ds.lon >= 280) & (ds.lon <= 330) & land(ds)


def indo_pacific(ds):
    return (np.abs(ds.lat) <= 10) & (ds.lon >= 80) & (ds.lon <= 150) & ocean(ds)

testing the masks#

It’s always good to check if those masks we’ve defined actually correspond to the regions we are interested in. For testing, we need a dataset. As the result of the mask function is a boolean mask (i.e. 0 and 1 values) on the healpix grid, we can quickly visualize them using healpy functions:

[6]:
ds = experiment(zoom=6, chunks="auto").to_dask().pipe(preprocess)
hp.mollview(tropics(ds) & ocean(ds), nest=True, flip="geo")
../../_images/Processing_healpix_Land_sea_mask_13_0.png
[7]:
hp.mollview(south_america(ds), nest=True, flip="geo")
../../_images/Processing_healpix_Land_sea_mask_14_0.png

Plotting settings#

[8]:
from cartopy.mpl.ticker import LongitudeFormatter, LatitudeFormatter

proj = ccrs.PlateCarree(central_longitude=-135.5808361)
sns.set_context("talk")  # make plots nicer

Plots#

Tropical land precipitation#

To get data only for grid cells in the tropics and on land, we can combine the tropics and the land mask using &. We use .pipe(preprocess) to apply our preprocessing function defined above.

[9]:
%%time
ds = (
    experiment(zoom=6, time="P1D", chunks="auto")
    .to_dask()
    .pipe(preprocess)
    .sel(time=time_slice)
)

pr_tropics_land = ds["pr"].where(tropics(ds) & land(ds)).mean("time")
CPU times: user 104 ms, sys: 5.62 ms, total: 109 ms
Wall time: 130 ms

We can now plot the selected data:

[10]:
%%time
title = f"{exp_name} Feb.2020 - Mar.2020 - zoom=6 - Tropical land"
fig, ax = plt.subplots(
    figsize=(12, 8), facecolor="white", subplot_kw={"projection": proj}
)
ax.set_extent([-180, 180, -30, 30], crs=ccrs.PlateCarree())
artist = nnshow(pr_tropics_land, ny=200, ax=ax, cmap=cmocean.cm.rain)
ax.coastlines(color="gray", lw=1)
fig.colorbar(
    artist, label="P / mmd$^{-1}$", shrink=0.5, orientation="horizontal", pad=0.05
)
plt.title(title)
None
CPU times: user 260 ms, sys: 289 ms, total: 549 ms
Wall time: 2.04 s
../../_images/Processing_healpix_Land_sea_mask_20_1.png

PDF of precipitation#

Here, we want to look at differences in rain intensity between ocean and land areas within the tropics. To do so, we’ll apply the same masks from above, but look at histograms instead of maps. Let’s get some data first:

[11]:
ds_hist = (
    experiment(zoom=6, time="P1D")
    .to_dask()
    .pipe(preprocess)
    .sel(time=slice("2020-02-01", "2021-01-31"))
)

Calculating histogram

[12]:
bins = np.logspace(np.log10(0.1), np.log10(250), 20)
histogram_tropics, _ = np.histogram(
    ds_hist.pr.where(tropics(ds_hist)), bins, density=True
)
histogram_tropics_ocean, _ = np.histogram(
    ds_hist.pr.where(tropics(ds_hist) & ocean(ds_hist)), bins, density=True
)
histogram_tropics_land, _ = np.histogram(
    ds_hist.pr.where(tropics(ds_hist) & land(ds_hist)), bins, density=True
)
[13]:
xaxis = (bins[1:] + bins[:-1]) / 2
fig, ax = plt.subplots(figsize=(8, 6), facecolor="white")
plt.plot(xaxis, histogram_tropics, label="Tropics")
plt.plot(xaxis, histogram_tropics_ocean, label="Tropical Ocean")
plt.plot(xaxis, histogram_tropics_land, label="Tropical Land")
ax.set_yscale("log")
ax.set_xlabel("P / mmd$^{-1}$")
sns.despine(offset=10)
plt.legend()
[13]:
<matplotlib.legend.Legend at 0x7fff841230a0>
../../_images/Processing_healpix_Land_sea_mask_25_1.png

Diurnal cycle of precipitation#

For this example, we use data with a time step of 30 minutes to calculate the dirunal cycle of precipitation over two regions: South America and the Indo-Pacific. Because data with time step of 30 minutes do not have the variableocean_fraction_surface, we use the daily output to extract this variable and add it to the 30 minutes time step.

[14]:
%%time
ds_diurnal = (
    experiment(zoom=6, time="PT30M", chunks="auto")
    .to_dask()
    .pipe(preprocess)
    .sel(time=slice("2020-02-01", "2020-02-29"))
)
# unfortunately, ocean_fraction_surface is only available in daily output. Let's fix that:
ds_diurnal = ds_diurnal.assign(
    ocean_fraction_surface=experiment(zoom=6, time="P1D")
    .to_dask()
    .ocean_fraction_surface
)
CPU times: user 77.7 ms, sys: 4.21 ms, total: 81.9 ms
Wall time: 209 ms

We’ll group the data by the hour of the day and average over those datapoints over time from every day within our selection:

[15]:
%%time
pr_samerica_28 = (
    ds_diurnal["pr"].where(south_america(ds_diurnal)).groupby("time.hour").mean()
)

pr_indopacific_28 = (
    ds_diurnal["pr"].where(indo_pacific(ds_diurnal)).groupby("time.hour").mean()
)
CPU times: user 73.2 ms, sys: 2.83 ms, total: 76 ms
Wall time: 79.3 ms

Calculating diurnal cycle of precipitation

Now, while we have the data grouped by hour, it’s the UTC hour and not the local time of the day. Before averaging horizontally in space, we have to shift every grid cell’s time to local time. We’ll do so by converting the cell’s longitude to a time offset and the modulo operator. We could have done this before the first time-average, but that would create a very large intermediate time variable for each grid cell and all points in time. Doing it in two steps can save a bit of memory.

[16]:
def local_hour(data):
    return np.round(data.hour + data.lon * (24 / 360)) % 24


def d_cycle(data):
    return data.groupby(local_hour(data)).mean()
[17]:
%%time
dcycle_samerica_28 = d_cycle(pr_samerica_28)
dcycle_indopacific_28 = d_cycle(pr_indopacific_28)
CPU times: user 1.06 s, sys: 61.5 ms, total: 1.12 s
Wall time: 1.13 s
[18]:
def format_axes_time_series(ax, title):
    ax.spines.right.set_visible(False)
    ax.spines.top.set_visible(False)
    ax.set_xlabel("LST / hr")
    ax.set_ylabel("P / mmd$^{-1}$")
    ax.set_title(title)
    sns.despine(offset=10)
[19]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 4), facecolor="white")
ax1.plot(dcycle_samerica_28)
format_axes_time_series(ax1, title="South America - Feb.2020")

ax2.plot(dcycle_indopacific_28)
format_axes_time_series(ax2, title="Indo-Pacific - Feb.2020")
../../_images/Processing_healpix_Land_sea_mask_35_0.png