# Time-space diagrams#

Different ways of constructing time-space diagrams are presented, using traditional filled contours, pcolormesh, and imshow. Examples are given for two types of diagrams:

- time-latitude diagrams, to illustrate the monsoon
- time-longitude diagrams, to illustrate interseasonal variability

For this we use some generic tools and functions that have been written for the nextGEMS data, especially to deal with the HEALPix format.

[1]:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

import intake
import xarray as xr


## Time-latitude (e.g., for monsoon)#

### Setup and data sampling#

#### Coordinates and Regions#

For this analysis, it is convenient to have latitude and longitude coordinates attached to the dataset. We’ll use these coordinates both for selecting regions of interest as well as grouping the data for plotting. We can obtain these coordinates using a few helper functions and the healpy library:

[2]:

import healpy

def get_nest(dx):
    return dx.crs.healpix_order == "nest"

def get_nside(dx):
    return dx.crs.healpix_nside

def attach_coords(ds):
    lons, lats = healpy.pix2ang(
        get_nside(ds), np.arange(ds.dims["cell"]), nest=get_nest(ds), lonlat=True
    )
    return ds.assign_coords(
        lat=(("cell",), lats, {"units": "degree_north"}),
        lon=(("cell",), lons, {"units": "degree_east"}),
    )


Based on these coordinates, we can extract cell indices for desired regions. Note that you might also want to consider healpy's pixel querying routines instead. Especially for high-resolution data, it might be prohibitive to first create an array of all cell coordinates just to throw most of them away.

In this case, we first specify the domains conveniently as a function returning a boolean array of valid cells given a Dataset. In order not to repeat ourselves, we convert this boolean array to an index array using np.where in a separate function:

[3]:

domains = {
    "atlantic": lambda ds: ds["lon"] > 300,
    "east pacific": lambda ds: (ds["lon"] > 210) & (ds["lon"] < 270.0),
    "maritime continent": lambda ds: (ds["lon"] > 110) & (ds["lon"] < 150.0),
    "indian ocean": lambda ds: (ds["lon"] > 60) & (ds["lon"] < 120.0),
    "global": lambda ds: ds["lon"] >= 0.0,  # a per-cell mask selecting all cells
}

def cells_of_domain(ds, domain_name):
    return np.where(domains[domain_name](ds))[0]

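Because the domain lambdas only ever access `ds["lon"]`, they can be exercised on any mapping with a `lon` entry; a plain dictionary with a handful of made-up longitudes is enough to illustrate how `cells_of_domain` turns the boolean mask into index arrays:

```python
import numpy as np

# a minimal stand-in for the dataset: the domain lambdas only need ds["lon"]
toy = {"lon": np.array([0.0, 65.0, 115.0, 215.0, 305.0])}

domains = {
    "atlantic": lambda ds: ds["lon"] > 300,
    "east pacific": lambda ds: (ds["lon"] > 210) & (ds["lon"] < 270.0),
    "indian ocean": lambda ds: (ds["lon"] > 60) & (ds["lon"] < 120.0),
}

def cells_of_domain(ds, domain_name):
    return np.where(domains[domain_name](ds))[0]

print(cells_of_domain(toy, "indian ocean"))  # → [1 2]
print(cells_of_domain(toy, "atlantic"))      # → [4]
```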

#### Plot settings#

Scales and color maps are defined here that will be used for subsequent plotting.

[4]:

plot_properties = {
"pr": {"vmin": 0.0, "vmax": 15.0 / (24 * 60 * 60), "cmap": "inferno"},
"uas": {"vmin": -8.0, "vmax": 8.0, "cmap": "PiYG"},
"prw": {"vmin": 20.0, "vmax": 60.0, "cmap": "magma"},
}
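The precipitation limits above are specified in SI flux units (kg m⁻² s⁻¹): since 1 mm of water over 1 m² weighs 1 kg, a rate in mm/day only needs to be divided by the number of seconds per day. The `vmax` for `pr` therefore corresponds to 15 mm per day:

```python
# convert a precipitation rate from mm/day to the SI flux kg m-2 s-1:
# 1 mm of water over 1 m2 weighs 1 kg, and a day has 86400 seconds
mm_per_day = 15.0
flux = mm_per_day / (24 * 60 * 60)
print(flux)  # ≈ 1.736e-4
```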


In addition to these variable-dependent plot properties, we also want to customize the plot appearance. To do this consistently across the upcoming plot examples, we'll define a function which applies our customizations to a plot's axes:

[5]:

def format_axes_lat(ax):
    ax.set_ylabel("latitude / degN")

    ax.grid(True)
    ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=(1, 7)))
    ax.xaxis.set_minor_locator(mdates.MonthLocator())
    ax.xaxis.set_major_formatter(
        mdates.ConciseDateFormatter(ax.xaxis.get_major_locator())
    )
    # sine-latitude scale: equal vertical distances cover equal areas on the
    # sphere (forward/inverse pair assumed, as set_yscale("function") expects)
    ax.set_yscale(
        "function",
        functions=(
            lambda lat: np.sin(np.radians(lat)),
            lambda x: np.degrees(np.arcsin(np.clip(x, -1.0, 1.0))),
        ),
    )
    ax.set_ylim(-90, 90)
    ax.set_yticks([-50, -30, -10, 0, 10, 30, 50])

def format_axes_lon(ax):
    ax.set_xticks(np.arange(7) * 60)
    ax.set_xlabel("longitude / degE")

    ax.yaxis.set_major_locator(mdates.MonthLocator())
    ax.yaxis.set_minor_locator(mdates.MonthLocator())
    ax.grid(True)
    ax.yaxis.set_major_formatter(
        mdates.ConciseDateFormatter(ax.yaxis.get_major_locator())
    )

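The sine-latitude scale used for the latitude axis in this notebook maps each latitude to its sine, so that equal vertical distances on the plot correspond to equal areas on the sphere. A minimal sketch of such a forward/inverse pair (the form matplotlib's `set_yscale("function", functions=(forward, inverse))` expects):

```python
import numpy as np

def forward(lat):
    # latitude in degrees -> sine of latitude
    return np.sin(np.radians(lat))

def inverse(x):
    # sine of latitude -> latitude in degrees (clipped for numerical safety)
    return np.degrees(np.arcsin(np.clip(x, -1.0, 1.0)))

lats = np.array([-90.0, -50.0, -30.0, 0.0, 30.0, 50.0, 90.0])
assert np.allclose(inverse(forward(lats)), lats)  # the pair must round-trip

# half of the sphere's surface area lies between 30S and 30N:
print(forward(30.0) - forward(-30.0))  # ≈ 1.0, half of the full range [-1, 1]
```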

### Obtaining the data#

[6]:

cat = intake.open_catalog("https://data.nextgems-h2020.eu/catalog.yaml")
experiment = cat.ICON.ngc3028
# load daily data at zoom level 5 and attach lat/lon coordinates
# (the catalog parameter names are assumed from the nextGEMS conventions)
ds = experiment(zoom=5, time="P1D").to_dask().pipe(attach_coords)



Here comes the time-consuming step: we resample the daily data to aggregate over three-day periods and regroup it by latitude. This step took 37 s for zoom level 5 daily data, using one core of a Levante compute node. The required time will also depend on the density of the original data. Below we evaluate the data eagerly so that the subsequent plotting functions don't need to redo this time-intensive computation again and again.

[7]:

%%time

var = "prw"
domain = "maritime continent"

da = (
    ds[var]
    .isel(cell=cells_of_domain(ds, domain))
    .resample(time="3D")
    .mean(dim="time")
    .groupby("lat")
    .mean()
).compute()

CPU times: user 888 ms, sys: 1.8 s, total: 2.69 s
Wall time: 1min 14s


### Plotting the data#

In the examples below we plot the data using different approaches:

1. a simple pcolormesh plot, which should be self-explanatory,
2. a filled contour plot, which some people might know from the good old days,
3. imshow, with an explanation of why it may be problematic in this case.

Previously, a datashader scatter plot was shown here as well. It has been removed because the data is already available on a dense rectangular grid with fewer data points than pixels. In this case, pcolormesh does just as good a job and is much simpler to use correctly.

#### pcolormesh#

Using pcolormesh is xarray's default for 2D data. We'll just use it for simplicity:

[8]:

%%time
fig, ax = plt.subplots(figsize=(7, 3.5), constrained_layout=True)
# plotting call assumed here: xarray's default 2D plot is pcolormesh
da.plot(ax=ax, x="time", y="lat", **plot_properties[var])
ax.set_title(f"{domain} -- {var}")
format_axes_lat(ax)

CPU times: user 38 ms, sys: 1.92 ms, total: 39.9 ms
Wall time: 47.3 ms


#### contourf#

Some people like the discrete shading of contourf. It's easy to use as well; whether it's worth typing a few additional letters is up to the reader.

[9]:

%%time
fig, ax = plt.subplots(figsize=(7, 3.5), constrained_layout=True)
# plotting call assumed here: xarray's contourf wrapper
da.plot.contourf(ax=ax, x="time", y="lat", **plot_properties[var])
ax.set_title(f"{domain} -- {var}")
format_axes_lat(ax)

CPU times: user 40 ms, sys: 966 µs, total: 40.9 ms
Wall time: 48.5 ms


#### imshow#

imshow tends to be faster than either of the above; however, it assumes linear spacing of the data in the array, which doesn't apply in our case. Not only do we use a sine scale for latitudes to better represent their share of the global area, but the HEALPix grid also doesn't have equally spaced latitude rings:

[10]:

index = np.arange(0, len(da.lat))
lats_linspaced = np.linspace(da.lat.min(), da.lat.max(), len(da.lat))
fig, ax = plt.subplots()
plt.plot(index, lats_linspaced, label="linear")
plt.plot(index, da.lat, label="HEALPix")
plt.xlabel("index")
plt.ylabel("latitude / degN")
plt.title("non-linear latitude spacing of HEALPix grid")
None
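The non-uniform spacing can also be written down analytically (assuming the standard ring definition from the HEALPix paper): with z = sin(lat), the ring centers follow z = 1 − i²/(3·nside²) in the polar caps and z = 4/3 − 2i/(3·nside) in the equatorial belt. A quick check makes the varying latitude step explicit:

```python
import numpy as np

nside = 32  # e.g. zoom level 5

def ring_z(i, nside):
    # z = sin(latitude) of ring i (northern hemisphere, 1 <= i <= 2*nside)
    i = np.asarray(i, dtype=float)
    cap = 1.0 - i**2 / (3.0 * nside**2)         # polar cap rings: i < nside
    belt = 4.0 / 3.0 - 2.0 * i / (3.0 * nside)  # equatorial belt: i >= nside
    return np.where(i < nside, cap, belt)

i = np.arange(1, 2 * nside + 1)  # rings from the north pole down to the equator
lat = np.degrees(np.arcsin(ring_z(i, nside)))
spacing = -np.diff(lat)  # latitude step between consecutive rings, in degrees

# ring i = 2*nside sits exactly on the equator, and the steps are not constant
print(lat[-1], spacing.min(), spacing.max())
```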


If we were to ignore this fact (or assume the deviation is small), we could just do a quick plot as follows.

NOTE: the y-scale is wrong (as shown in the plot above). We intentionally don't call format_axes_lat, because the sine scale would not change the picture but would make the axis even more misleading (the scale transformation is applied only to the axis, not to the already-rasterized data).

On the machine where this notebook was developed, the difference between pcolormesh and imshow is 40 ms vs 25 ms: a noticeable difference, but maybe not worth the extra trouble.

[11]:

%%time
fig, ax = plt.subplots(figsize=(7, 3.5), constrained_layout=True)
# imshow call assumed here: the extent places the image in time/latitude
# coordinates, but the pixel rows remain linearly spaced along y
ax.imshow(
    da.transpose("lat", "time").values,
    origin="lower",
    aspect="auto",
    extent=(
        mdates.date2num(da.time[0].values),
        mdates.date2num(da.time[-1].values),
        float(da.lat[0]),
        float(da.lat[-1]),
    ),
    **plot_properties[var],
)
ax.set_title(f"{domain} -- {var}")
# format_axes_lat(ax)
None

CPU times: user 18.2 ms, sys: 2.57 ms, total: 20.8 ms
Wall time: 19.4 ms


## Longitude-time (e.g., for interseasonal variability)#

### Setup and data sampling#

HEALPix data is aligned along lines of longitude only in the tropics. This means that the simple groupby approach works only in those regions. Outside that range, one might want to use groupby_bins or resort to, e.g., a scatter plot using datashader.
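A sketch of the groupby_bins variant on synthetic data (the longitudes and values below are made up): instead of grouping by the exact `lon` values, cells are averaged into regular longitude bins, which also works where cells are not aligned along meridians:

```python
import numpy as np
import xarray as xr

# made-up values on a few irregular longitudes
da = xr.DataArray(
    np.array([1.0, 2.0, 3.0, 4.0, 5.0, 7.0]),
    dims=("cell",),
    coords={"lon": ("cell", np.array([10.0, 95.0, 170.0, 200.0, 350.0, 40.0]))},
)

# average into regular 90-degree longitude bins
binned = da.groupby_bins("lon", bins=np.arange(0.0, 361.0, 90.0)).mean()
print(binned.values)  # bin means: 4.0, 2.5, 4.0, 5.0
```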

Two examples are shown: one uses the matplotlib plot functionality, which is very fast but can lead to problems if applied to the extratropics; the other uses datashader, which requires some additional massaging of the data, as datashader is designed to work with pandas DataFrames.

[12]:

%%time

var_lon = "pr"

Slim, Nlim = 15.0, 35.0
t1, t2 = "2023-08-01", "2023-11-01"
da_by_lon = (
    ds[var_lon]
    .sel(time=slice(t1, t2))
    .where((ds["lat"] > Slim) & (ds["lat"] < Nlim), drop=True)
    .groupby("lon")
    .mean()
).compute()

CPU times: user 1.23 s, sys: 736 ms, total: 1.96 s
Wall time: 9.02 s

[13]:

%%time
fig, ax = plt.subplots(figsize=(4, 6), constrained_layout=True)
# plotting call assumed here: pcolormesh via xarray, longitude on x, time on y
da_by_lon.plot(ax=ax, x="lon", y="time", **plot_properties[var_lon])
ax.set_title(f"{var_lon}: {Slim}$^{{\circ}}$ to {Nlim}$^{{\circ}}$")
format_axes_lon(ax)
ax.set_xlim(100.0, 330.0)

CPU times: user 70.8 ms, sys: 6.48 ms, total: 77.3 ms
Wall time: 76 ms

[13]:

(100.0, 330.0)


To use datashader we must massage the data into a one-dimensional pandas DataFrame with the different coordinates attached. This works fine when we use native HEALPix data, but the time-longitude grouping results in 2D xarray objects that we have to massage back into the DataFrame format, also dealing with the fact that datashader does not like the xarray time coordinate.

[14]:

def time_as_days_since_1970(ds):
    return (ds.time - np.datetime64("1970-01-01")) / np.timedelta64(1, "D")

df = (
    da_by_lon.assign_coords(time=time_as_days_since_1970)
    .stack(z=("lon", "time"))
    .to_dataframe()
    .reset_index()  # datashader needs lon/time as ordinary columns, not index levels
)
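The conversion is plain datetime64 arithmetic, which we can sanity-check on bare values (the function above applies the same arithmetic to a dataset's time coordinate):

```python
import numpy as np

def days_since_1970(time):
    # same arithmetic as time_as_days_since_1970, on bare datetime64 values
    return (time - np.datetime64("1970-01-01")) / np.timedelta64(1, "D")

print(days_since_1970(np.datetime64("1970-01-02")))        # 1.0
print(days_since_1970(np.datetime64("1970-01-01T12:00")))  # 0.5
```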

[15]:

%%time

# dsshow lives in datashader's matplotlib extension; the glyph and aggregator
# arguments are assumed here (point glyph, mean of the plotted variable)
import datashader
from datashader.mpl_ext import dsshow

fig, ax = plt.subplots(figsize=(4, 6), constrained_layout=True)
artist = dsshow(
    df,
    datashader.Point("lon", "time"),
    datashader.mean(var_lon),
    aspect="auto",
    x_range=[100.0, 330.0],
    ax=ax,
    **plot_properties[var_lon],
)

ax.set_title(f"{var_lon}: {Slim}$^{{\circ}}$ to {Nlim}$^{{\circ}}$")
format_axes_lon(ax)
ax.set_xlim(100.0, 330.0)

CPU times: user 2.22 s, sys: 88.1 ms, total: 2.31 s
Wall time: 2.53 s

[15]:

(100.0, 330.0)