{ "cells": [ { "cell_type": "markdown", "id": "360075b1-2ac9-4ca1-9615-403e5481d250", "metadata": {}, "source": [ "# Calculating zonal means" ] }, { "cell_type": "code", "execution_count": 1, "id": "ec783093-5325-45d2-8712-a741e8faa220", "metadata": {}, "outputs": [], "source": [ "import intake\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", "from gridlocator import merge_grid" ] }, { "cell_type": "markdown", "id": "de341c08-79f1-4a4c-a93e-997fbf8e87cc", "metadata": {}, "source": [ "In this example, we compute the zonal-mean air temperature.\n", "We retrieve the 2 m air temperature (`tas`) from the NextGEMS simulation `dpp0067` using `intake-esm`." ] }, { "cell_type": "code", "execution_count": 2, "id": "06aee1d5-5f60-4586-9b10-87a5315164b3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "--> The keys in the returned dictionary of datasets are constructed as follows:\n", "\t'project.institution_id.source_id.experiment_id.simulation_id.realm.frequency.time_reduction.grid_label.level_type'\n" ] } ], "source": [ "catalog_file = \"/work/ka1081/Catalogs/dyamond-nextgems.json\"\n", "col = intake.open_esm_datastore(catalog_file)\n", "cat = col.search(\n", "    variable_id=\"tas\",\n", "    project=\"NextGEMS\",\n", "    simulation_id=\"dpp0067\",\n", ")\n", "cat_dict = cat.to_dataset_dict(cdf_kwargs={\"chunks\": {\"time\": 1}})\n", "\n", "ds = merge_grid(list(cat_dict.values())[0])  # Include the grid information!" ] }, { "cell_type": "markdown", "id": "d844c228-de01-4188-8a39-e082385a8843", "metadata": {}, "source": [ "## The general idea\n", "\n", "The idea behind the zonal mean is to average the values of all cells within a given latitude bin.\n", "Therefore, in a first step, we count how many cells fall into each of a set of equidistant latitude bins." ] }, { "cell_type": "code", "execution_count": 3, "id": "a8430e1a-0f1b-4148-b15f-4dac307fbe86", "metadata": {}, "outputs": [], "source": [ "hist_opts = dict(bins=128, range=(-np.pi / 2, np.pi / 2))\n", "cells_per_bin, lat_bins = np.histogram(ds.clat, **hist_opts)" ] }, { "cell_type": "markdown", "id": "a10c6bda-54c4-491c-9925-5b308ddfb3ff", "metadata": {}, "source": [ "Now comes the trick! In the next step, we repeat the histogram computation but assign a weight to each cell.\n", "By default, `np.histogram` weights each data point with one, i.e., it counts the values in each bin.\n", "Here, we weight each data point with its 2 m temperature instead, which yields the sum of all temperatures in each latitude bin.\n", "\n", ".. tip::\n", " The `np.histogram` function is more efficient when it is given a range and a number of bins.\n", " In that case, the bins are known to be equidistant, so the bin of each value can be computed arithmetically instead of being searched for.\n", " This is not possible when a sequence of bin edges is passed directly." 
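] }, { "cell_type": "markdown", "id": "3b9e6f21-7c4d-4a8e-b1f2-5d0c8e7a9f13", "metadata": {}, "source": [ "To see why dividing the weighted histogram by the unweighted one yields a zonal mean, the next cell applies the same trick to synthetic data.\n", "It is a minimal sketch: the random latitudes, the made-up temperature profile, and all `demo_*` names are assumptions for illustration and are not part of the NextGEMS output.\n", "The result is cross-checked against an explicit average over the points in each bin, computed with `np.digitize`." ] }, { "cell_type": "code", "execution_count": null, "id": "8a2d4c6e-1f3b-4e7a-9c5d-0b6f2e8a4d71", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# Synthetic stand-ins (assumption, for illustration only): random latitudes in\n", "# radians and a made-up temperature profile that decreases towards the poles.\n", "rng = np.random.default_rng(seed=0)\n", "demo_lat = rng.uniform(-np.pi / 2, np.pi / 2, size=10_000)\n", "demo_tas = 288.0 - 30.0 * np.sin(demo_lat) ** 2\n", "demo_opts = dict(bins=16, range=(-np.pi / 2, np.pi / 2))\n", "\n", "# Unweighted histogram: number of points per latitude bin.\n", "demo_counts, demo_edges = np.histogram(demo_lat, **demo_opts)\n", "# Weighted histogram: sum of the temperatures per latitude bin.\n", "demo_sums, _ = np.histogram(demo_lat, weights=demo_tas, **demo_opts)\n", "demo_mean = demo_sums / demo_counts\n", "\n", "# Cross-check: assign each point to its bin and average explicitly.\n", "demo_idx = np.digitize(demo_lat, demo_edges[1:-1])\n", "demo_loop = np.array([demo_tas[demo_idx == i].mean() for i in range(demo_opts[\"bins\"])])\n", "print(np.allclose(demo_mean, demo_loop))  # expected: True" 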
] }, { "cell_type": "code", "execution_count": 4, "id": "5bbb275b-3389-4008-b37b-22fb37b942c1", "metadata": {}, "outputs": [], "source": [ "varsum_per_bin, _ = np.histogram(\n", "    ds.clat, weights=ds.tas.isel(time=1, height_2=0), **hist_opts\n", ")" ] }, { "cell_type": "markdown", "id": "5d43cca2-88b3-4d44-8d7c-08d65ffd0a07", "metadata": {}, "source": [ "The zonal mean can now be computed by dividing the temperature sum in each bin by the number of cells in that bin." ] }, { "cell_type": "code", "execution_count": 5, "id": "70c53f39-1422-48eb-9a78-aad2b8f83d9f", "metadata": {}, "outputs": [], "source": [ "zonal_mean = varsum_per_bin / cells_per_bin" ] }, { "cell_type": "markdown", "id": "9ad77d8e-2bea-4bb2-a813-c21e2efadae3", "metadata": {}, "source": [ "We can check our result by plotting the zonal mean as a function of the latitude bins.\n", "While doing so, we scale the x-axis with the sine of the latitude: the surface area between two latitudes is proportional to the difference of their sines, so the width of each latitude bin on the plot is proportional to its area." ] }, { "cell_type": "code", "execution_count": 6, "id": "718e97e5-2ada-480b-b4d7-3df5e41e705e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0.5, 0, 'latitude')" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fig, ax = plt.subplots()\n", "ax.plot(0.5 * (lat_bins[1:] + lat_bins[:-1]), zonal_mean)\n", "ax.set_ylabel(\"tas / K\")\n", "ax.set_ylim(270, 305)\n", "\n", "# Scale the x-axis by sin(latitude) so that bin widths are proportional to area.\n", "ax.set_xscale(\"function\", functions=(np.sin, np.arcsin))\n", "ax.set_xlim(np.deg2rad(-80), np.deg2rad(80))\n", "ax.set_xlabel(\"latitude\")" ] }, { "cell_type": "markdown", "id": "2e87d916-0df4-4086-b8a8-b861adbb6edb", "metadata": {}, "source": [ "## Multi-dimensional input\n", "\n", "So far, we used data at a single time step to illustrate the general idea of calculating zonal means with histograms.\n", "However, most real-world data have additional dimensions such as time or height.\n", "A straightforward way to calculate zonal means for such data is to flatten it, i.e., to discard all dimensions other than the cell dimension.\n", "This approach, however, loses all information along the discarded axes: data at different heights or times are mixed together.\n", "Fortunately, there are alternatives that allow us to calculate zonal means while preserving the dimensional structure of the dataset.\n", "We achieve this by using `xr.apply_ufunc`, which lifts a function (in this case `_compute_varsum`) from (NumPy) arrays to (xarray) DataArrays.\n", "This lifting into the world of DataArrays involves describing the dimensions, shapes, and data types that the function cares about. Xarray then applies its usual looping and broadcasting rules over the dimensions that the function does **not** care about. Any necessary looping may then be carried out in parallel (e.g., using dask)." 
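] }, { "cell_type": "markdown", "id": "c4f1a7e9-2b6d-4c3a-8e5f-7a9b1d3c5e02", "metadata": {}, "source": [ "The next cell is a minimal sketch of what `xr.apply_ufunc` with `vectorize=True` does, using a small synthetic DataArray (its sizes, values, and all `demo_*` names are assumptions for illustration).\n", "The lifted function `demo_varsum` only ever sees one-dimensional arrays along the core dimension `cell`, while xarray loops over the remaining `time` dimension and appends the new `lat` dimension to the result.\n", "Dask is left out here, so the sketch runs on in-memory data and the size of `lat` is taken from the function's output." ] }, { "cell_type": "code", "execution_count": null, "id": "e7b3d9f5-6a2c-4f8e-b0d4-1c5e9a7f3b68", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import xarray as xr\n", "\n", "# Synthetic stand-in (assumption, for illustration only): 3 time steps on\n", "# 1000 cells, with a `clat` coordinate in radians along the `cell` dimension.\n", "rng = np.random.default_rng(seed=1)\n", "demo_clat = rng.uniform(-np.pi / 2, np.pi / 2, size=1000)\n", "demo_field = xr.DataArray(\n", "    rng.normal(280.0, 5.0, size=(3, 1000)),\n", "    dims=(\"time\", \"cell\"),\n", "    coords={\"clat\": (\"cell\", demo_clat)},\n", ")\n", "demo_opts = dict(bins=8, range=(-np.pi / 2, np.pi / 2))\n", "\n", "\n", "def demo_varsum(values, **kwargs):\n", "    \"\"\"Weighted histogram of a single 1-D slice along `cell`.\"\"\"\n", "    sums, _ = np.histogram(demo_clat, weights=values, **kwargs)\n", "    return sums\n", "\n", "\n", "demo_sums = xr.apply_ufunc(\n", "    demo_varsum,\n", "    demo_field,\n", "    kwargs=demo_opts,\n", "    input_core_dims=[[\"cell\"]],  # consumed by the function\n", "    output_core_dims=[[\"lat\"]],  # new dimension returned by the function\n", "    vectorize=True,  # loop over all remaining dimensions (here: time)\n", ")\n", "print(demo_sums.dims, demo_sums.shape)  # expected: ('time', 'lat') (3, 8)" 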
] }, { "cell_type": "code", "execution_count": 10, "id": "1ce575fa-e642-46a1-8a95-01b86a45a90d", "metadata": {}, "outputs": [], "source": [ "import xarray as xr\n", "\n", "\n", "def calc_zonal_mean(variable, **kwargs):\n", "    \"\"\"Compute a zonal mean (along `clat`) for multi-dimensional input.\n", "\n", "    `kwargs` are passed to `np.histogram` and must contain an integer `bins`.\n", "    \"\"\"\n", "    clat = variable.clat.values  # cell latitudes (radians)\n", "    counts_per_bin, bin_edges = np.histogram(clat, **kwargs)\n", "\n", "    def _compute_varsum(var, **kwargs):\n", "        \"\"\"Helper function to compute the weighted histogram of a single 1-D slice.\"\"\"\n", "        varsum_per_bin, _ = np.histogram(clat, weights=var, **kwargs)\n", "        return varsum_per_bin\n", "\n", "    # For more information see:\n", "    # https://docs.xarray.dev/en/stable/generated/xarray.apply_ufunc.html\n", "    varsum = xr.apply_ufunc(\n", "        _compute_varsum,  # function to map\n", "        variable,  # variable to loop over\n", "        kwargs=kwargs,  # keyword arguments passed to the function\n", "        input_core_dims=[[\"cell\"]],  # dimension consumed by the function\n", "        # Description of the output\n", "        dask=\"parallelized\",\n", "        vectorize=True,\n", "        output_core_dims=[(\"lat\",)],\n", "        dask_gufunc_kwargs={\n", "            \"output_sizes\": {\"lat\": kwargs[\"bins\"]},\n", "        },\n", "        output_dtypes=[\"f8\"],\n", "    )\n", "\n", "    # Divide the per-bin sums by the per-bin cell counts (broadcast along `lat`).\n", "    return varsum / xr.DataArray(counts_per_bin, dims=\"lat\"), bin_edges" ] }, { "cell_type": "markdown", "id": "35e6f478-b217-48c2-89b6-c6d5a74abae4", "metadata": {}, "source": [ "Using this function, we can calculate the zonal mean for each time step." ] }, { "cell_type": "code", "execution_count": 11, "id": "0e6d8689-6154-4498-8fd4-2eb37d343884", "metadata": {}, "outputs": [], "source": [ "zonal_means, lat_bins = calc_zonal_mean(\n", "    ds.tas.isel(time=slice(24, None, 48), height_2=0), **hist_opts\n", ")" ] }, { "cell_type": "markdown", "id": "a79344f8-00ca-4fb8-8166-63fb0428b1ba", "metadata": {}, "source": [ "We can now plot the zonal means of the individual time steps as well as their time average (thick black line)." 
] }, { "cell_type": "code", "execution_count": 12, "id": "3d7a4f19-b6f4-4fdc-bc18-8834b88331e8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0.5, 0, 'latitude')" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fig, ax = plt.subplots()\n", "lat_centers = 0.5 * (lat_bins[1:] + lat_bins[:-1])\n", "for zonal_mean in zonal_means:\n", "    ax.plot(lat_centers, zonal_mean)\n", "ax.plot(lat_centers, zonal_means.mean(\"time\"), lw=3, c=\"k\")\n", "ax.set_ylabel(\"tas / K\")\n", "ax.set_ylim(270, 305)\n", "\n", "# Scale the x-axis by sin(latitude) so that bin widths are proportional to area.\n", "ax.set_xscale(\"function\", functions=(np.sin, np.arcsin))\n", "ax.set_xlim(np.deg2rad(-80), np.deg2rad(80))\n", "ax.set_xlabel(\"latitude\")" ] }, { "cell_type": "markdown", "id": "6b3df69e-d608-4366-a278-e1fbb398ecf2", "metadata": {}, "source": [ ".. tip::\n", " The concept of using histograms to compute zonal means can also be generalized to other coordinates,\n", " e.g., to compute meridional means or distributions in temperature or humidity space." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (based on the module python3/2022.01)", "language": "python", "name": "python3_2022_01" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.9" } }, "nbformat": 4, "nbformat_minor": 5 }