Vertical interpolation on non-coordinate dimensions#

Vertical interpolation is a very common task in geoscientific data analysis: model output is often defined on terrain-following or hybrid vertical coordinates, while analysis and diagnostics are typically performed on fixed pressure or height levels.

In simple cases, vertical interpolation can be done with xarray's interp() method. However, interp() requires the vertical coordinate to be a one-dimensional coordinate variable. In many real-world datasets, the vertical coordinate itself varies with time and horizontal position. It is then no longer a true coordinate from xarray's perspective, and a different approach is required.
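
For example, when the vertical coordinate is a proper one-dimensional coordinate variable, a call like the following works (a synthetic sketch; names and values are illustrative):

```python
import numpy as np
import xarray as xr

# Synthetic sketch: pressure is a proper 1D coordinate variable,
# shared by all columns, so .interp() applies directly.
pressure = np.array([200.0, 500.0, 850.0, 1000.0])  # hPa, ascending
ta = xr.DataArray(
    [[220.0, 255.0, 280.0, 288.0],
     [221.0, 256.0, 281.0, 290.0]],
    dims=("ncells", "pressure"),
    coords={"pressure": pressure},
    name="ta",
)

# Interpolate every column onto new fixed levels along the coordinate.
ta_on_levels = ta.interp(pressure=[300.0, 700.0, 925.0])
```

As soon as the vertical coordinate gains extra dimensions (pressure varying per column and time step), this approach no longer applies.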

This notebook demonstrates how to perform vectorized vertical interpolation with xarray when the vertical coordinate depends on multiple dimensions.

Opening the dataset#

First, we open a global ICON dataset stored on the native horizontal and vertical grids.
For performance reasons, we explicitly chunk the dataset so that each vertical column is kept within a single Dask chunk. This is important because the interpolation kernel needs an entire column at once along the vertical dimension.

import intake


cat = intake.open_catalog("https://data.nextgems-h2020.eu/online.yaml")
ds = cat.tutorial["ICON.native.3d_P1D_mean"](chunks={"height": -1}).to_dask()
ds
<xarray.Dataset> Size: 30GB
Dimensions:             (time: 4, height: 90, ncells: 5242880)
Coordinates:
  * time                (time) datetime64[ns] 32B 2026-01-02 ... 2026-01-05
  * height              (height) float64 720B 1.0 2.0 3.0 4.0 ... 88.0 89.0 90.0
    cell_sea_land_mask  (ncells) int32 21MB dask.array<chunksize=(5242880,), meta=np.ndarray>
    lat                 (ncells) float64 42MB dask.array<chunksize=(5242880,), meta=np.ndarray>
    lon                 (ncells) float64 42MB dask.array<chunksize=(5242880,), meta=np.ndarray>
Dimensions without coordinates: ncells
Data variables:
    pfull               (time, height, ncells) float32 8GB dask.array<chunksize=(1, 90, 5242880), meta=np.ndarray>
    ta                  (time, height, ncells) float32 8GB dask.array<chunksize=(1, 90, 5242880), meta=np.ndarray>
    ua                  (time, height, ncells) float32 8GB dask.array<chunksize=(1, 90, 5242880), meta=np.ndarray>
    va                  (time, height, ncells) float32 8GB dask.array<chunksize=(1, 90, 5242880), meta=np.ndarray>
Attributes: (12/31)
    Conventions:           CF-1.7 CMIP-6.2
    activity_id:           HighResMIP
    contact:               juergen.kroeger@mpimet.mpg.de
    creation_date:         2025-07-08T10:51:45
    data_specs_version:    01.00.32
    experiment:            coupled future 2015-2050 using a scenario as close...
    ...                    ...
    source_id:             ICON-ESM-ER
    source_type:           AOGCM
    sub_experiment:        none
    sub_experiment_id:     none
    variant_label:         r1i1p1f1
    version_id:            v20240618

Why interp() is not sufficient here#

The variables we want to interpolate (e.g. temperature and wind components) have dimensions:

ds.ta.dims
('time', 'height', 'ncells')

The full-level atmospheric pressure (pfull) also varies along time, height, and ncells. This means:

  • pfull is three-dimensional data, not a 1D coordinate variable

  • interp() cannot be used directly, since it only interpolates along proper coordinate dimensions

Conceptually, what we want is:

For every (time, ncells) column, interpolate the vertical profile independently.

To do this efficiently and in a Dask-compatible way, we use xarray.apply_ufunc().

Defining a 1D interpolation kernel#

We first define a small NumPy-based helper function that interpolates one vertical profile onto new target levels. This function operates purely on NumPy arrays and knows nothing about xarray or Dask. xarray.apply_ufunc() will take care of vectorizing it over the remaining dimensions.

import numpy as np


def interp_1d(y, x, x_new):
    # np.interp assumes x is ascending; this holds if pfull increases
    # along the height dimension (model top to surface). Target levels
    # outside the column range are filled with NaN.
    return np.interp(x_new, x, y, left=np.nan, right=np.nan)
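
A quick check of the kernel on a single synthetic column (values are illustrative):

```python
import numpy as np

# One synthetic column: pressure ascending from model top to surface (Pa).
pfull_col = np.array([100e2, 500e2, 850e2, 1000e2])
ta_col = np.array([210.0, 255.0, 280.0, 288.0])

# Equivalent to interp_1d(ta_col, pfull_col, targets):
targets = np.array([700e2, 1050e2])
result = np.interp(targets, pfull_col, ta_col, left=np.nan, right=np.nan)
# 700 hPa falls between the 500 and 850 hPa levels; 1050 hPa lies below
# the lowest level, so it becomes NaN.
```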

Target pressure levels#

Next, we define the pressure levels onto which we want to interpolate the data. These levels are fixed and shared across all columns.

p_levels = 100 * np.array(
    [
        1000, 975, 950, 925, 900, 875, 850, 800, 750, 700, 600,
        500, 400, 300, 250, 200, 150, 100, 70, 50, 30, 20, 10, 5, 1
    ]
)

Vectorized vertical interpolation with apply_ufunc()#

We now apply the interpolation function to the dataset.

Key points to note:

  • input_core_dims specifies that interpolation happens along height

  • output_core_dims declares the new plev output dimension, whose size is inferred from p_levels

  • vectorize=True applies the function independently over time and ncells

  • dask="parallelized" enables lazy, parallel execution

  • The whole dataset is interpolated lazily in one call; assign_coords then attaches the target levels as the plev coordinate
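
Conceptually, this is what vectorize=True does behind the scenes: loop the 1D kernel over every remaining (time, ncells) combination, with the core dimension moved to the end of the output. A plain-NumPy sketch on small synthetic arrays:

```python
import numpy as np

ntime, nheight, ncells = 2, 5, 3
rng = np.random.default_rng(0)

# Synthetic fields: pressure sorted to increase along the vertical axis.
pfull = np.sort(rng.uniform(100e2, 1000e2, (ntime, nheight, ncells)), axis=1)
ta = rng.uniform(200.0, 300.0, (ntime, nheight, ncells))
p_levels = np.array([925e2, 500e2, 200e2])

# Explicit loops over the non-core dimensions; note that the new core
# dimension (plev) ends up last, as in the apply_ufunc() result.
ta_plev = np.empty((ntime, ncells, len(p_levels)))
for t in range(ntime):
    for c in range(ncells):
        ta_plev[t, c, :] = np.interp(
            p_levels, pfull[t, :, c], ta[t, :, c], left=np.nan, right=np.nan
        )
```

The real call below avoids these Python loops at the user level and stays lazy and parallel through Dask.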

import xarray as xr


ds_plev = xr.apply_ufunc(
    interp_1d,
    ds,
    ds.pfull,
    p_levels,
    input_core_dims=[["height"], ["height"], ["plev"]],
    output_core_dims=[["plev"]],
    vectorize=True,
    dask="parallelized",
).assign_coords(plev=p_levels)

ds_plev
<xarray.Dataset> Size: 17GB
Dimensions:             (time: 4, ncells: 5242880, plev: 25)
Coordinates:
  * time                (time) datetime64[ns] 32B 2026-01-02 ... 2026-01-05
  * plev                (plev) int64 200B 100000 97500 95000 ... 1000 500 100
    cell_sea_land_mask  (ncells) int32 21MB dask.array<chunksize=(5242880,), meta=np.ndarray>
    lat                 (ncells) float64 42MB dask.array<chunksize=(5242880,), meta=np.ndarray>
    lon                 (ncells) float64 42MB dask.array<chunksize=(5242880,), meta=np.ndarray>
Dimensions without coordinates: ncells
Data variables:
    pfull               (time, ncells, plev) float64 4GB dask.array<chunksize=(1, 5242880, 25), meta=np.ndarray>
    ta                  (time, ncells, plev) float64 4GB dask.array<chunksize=(1, 5242880, 25), meta=np.ndarray>
    ua                  (time, ncells, plev) float64 4GB dask.array<chunksize=(1, 5242880, 25), meta=np.ndarray>
    va                  (time, ncells, plev) float64 4GB dask.array<chunksize=(1, 5242880, 25), meta=np.ndarray>
Attributes: (12/31)
    Conventions:           CF-1.7 CMIP-6.2
    activity_id:           HighResMIP
    contact:               juergen.kroeger@mpimet.mpg.de
    creation_date:         2025-07-08T10:51:45
    data_specs_version:    01.00.32
    experiment:            coupled future 2015-2050 using a scenario as close...
    ...                    ...
    source_id:             ICON-ESM-ER
    source_type:           AOGCM
    sub_experiment:        none
    sub_experiment_id:     none
    variant_label:         r1i1p1f1
    version_id:            v20240618
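
Note that pfull itself was interpolated as part of the dataset. Interpolating pressure onto its own target levels should reproduce those levels wherever they lie within a column, which makes for a handy consistency check. A self-contained miniature of the pipeline (synthetic values):

```python
import numpy as np
import xarray as xr


def interp_1d(y, x, x_new):
    return np.interp(x_new, x, y, left=np.nan, right=np.nan)


# Synthetic pressure field spanning 100-1000 hPa in every column.
pfull = xr.DataArray(
    np.linspace(100e2, 1000e2, 10)[None, :, None] * np.ones((2, 10, 3)),
    dims=("time", "height", "ncells"),
)
p_levels = np.array([850e2, 500e2, 200e2])

pfull_plev = xr.apply_ufunc(
    interp_1d,
    pfull,
    pfull,
    p_levels,
    input_core_dims=[["height"], ["height"], ["plev"]],
    output_core_dims=[["plev"]],
    vectorize=True,
)
# pfull interpolated onto its own levels equals the targets everywhere,
# since all targets lie inside the 100-1000 hPa column range.
```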

Memory footprint

Vertical interpolation requires loading entire vertical columns at once. If the data is also stored in large horizontal chunks (for example, full global fields), this can result in very high memory usage.
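
If memory becomes a problem, one mitigation (a sketch; the chunk sizes are illustrative and should be tuned to your memory budget) is to keep each column whole while splitting the horizontal dimension into smaller chunks:

```python
import numpy as np
import xarray as xr

ds_demo = xr.Dataset(
    {"ta": (("time", "height", "ncells"), np.zeros((2, 90, 1000), np.float32))}
)
# Keep full vertical columns (-1) but split the horizontal dimension,
# so each Dask task holds only a subset of columns in memory.
ds_demo = ds_demo.chunk({"height": -1, "ncells": 250})
```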

Inspecting the result#

As a quick sanity check, we plot the globally averaged temperature profile for a single time slice.

ds_plev.ta.sel(time="2026-01-02").mean("ncells").plot(y="plev", yincrease=False)
[<matplotlib.lines.Line2D at 0x7fe3b07b0910>]
../_images/67f10fc7a712a033925d989cf5d554040541370a1a72410c04648db469f0f86a.png

Finally, we compute wind speed from the wind components and visualize it at 200 hPa.

from easygems.resample import KDTreeResampler
from easygems.show import map_show


ds_plev = ds_plev.assign(wind=lambda dx: np.hypot(dx.ua, dx.va))

kd_resampler = KDTreeResampler(lat=ds_plev.lat, lon=ds_plev.lon)
map_show(ds_plev.wind.sel(time="2026-01-02", plev=200e2).values, kd_resampler)
<matplotlib.image.AxesImage at 0x7fe3a8b76510>
../_images/6088fdbf85b06d70b0a2c1736f06307d1bc2bd34e813c51cec6ce6bf9d735173.png

Takeaway#

When a “coordinate” varies along dimensions other than the one it indexes, it must be treated as data, not as an xarray coordinate. In these situations, xr.apply_ufunc() provides a powerful and scalable way to express column-wise operations such as vertical interpolation: no explicit Python loops, and full compatibility with Dask.