{ "cells": [ { "cell_type": "markdown", "id": "dddb64f8-7c77-4367-81e9-10467c1e7f9a", "metadata": { "tags": [] }, "source": [ "# EERIE Data at DKRZ\n", "\n", "This notebook guides EERIE data users and explains how to find and load data available at DKRZ.\n", "\n", "The notebook works well within the `python3/unstable` kernel." ] }, { "cell_type": "markdown", "id": "38c3c860-fdd2-4f10-8f31-6195518e1706", "metadata": {}, "source": [ "All data relevant to the project are referenced in the main DKRZ-EERIE catalog:" ] }, { "cell_type": "code", "execution_count": 1, "id": "ea840052-f632-4a98-9770-c7d32fbe6c8d", "metadata": { "tags": [] }, "outputs": [ { "data": { "application/yaml": "eerie:\n args:\n path: https://raw.githubusercontent.com/eerie-project/intake_catalogues/main/eerie.yaml\n description: ''\n driver: intake.catalog.local.YAMLFileCatalog\n metadata: {}\n", "text/plain": [ "eerie:\n", " args:\n", " path: https://raw.githubusercontent.com/eerie-project/intake_catalogues/main/eerie.yaml\n", " description: ''\n", " driver: intake.catalog.local.YAMLFileCatalog\n", " metadata: {}\n" ] }, "metadata": { "application/json": { "root": "eerie" } }, "output_type": "display_data" } ], "source": [ "import intake\n", "\n", "eerie_cat = intake.open_catalog(\n", " \"https://raw.githubusercontent.com/eerie-project/intake_catalogues/main/eerie.yaml\"\n", ")\n", "eerie_cat" ] }, { "cell_type": "markdown", "id": "29053804-b345-4d7a-afbe-167337313929", "metadata": {}, "source": [ "Catalog entries are addressed with a reference syntax that follows this path-name template:\n", "\n", "**hpc.hardware.product.source_id.experiment_id.realm.grid_type**\n", "\n", "Opened with Python, the catalog is a nested dictionary of *catalog sources*. 
The lowest level contains *data sources*, which can be opened as xarray datasets with `to_dask()`.\n", "\n", "You can browse the catalog by `list`ing it and selecting keys:" ] }, { "cell_type": "code", "execution_count": 2, "id": "7d25524e-85c5-413e-9640-cb0b878f59cd", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['jasmin', 'dkrz']\n", "['disk', 'archive', 'cloud', 'main', 'dkrz_ngc3']\n" ] } ], "source": [ "# Top-level keys of the catalog\n", "print(list(eerie_cat))\n", "# Keys one level below the \"dkrz\" entry\n", "print(list(eerie_cat[\"dkrz\"]))" ] }, { "cell_type": "markdown", "id": "949423e2-fa28-4b50-b51f-47ad075d50cf", "metadata": {}, "source": [ "Entries can be *join*ed with a *'.'*, so you can reach deeper-level entries directly from the top-level catalog:" ] }, { "cell_type": "code", "execution_count": 3, "id": "39be6811-a816-432e-b016-c503cbaeee58", "metadata": { "tags": [] }, "outputs": [ { "data": { "application/yaml": "disk:\n args:\n path: https://raw.githubusercontent.com/eerie-project/intake_catalogues/main/dkrz/disk/main.yaml\n description: Use this catalog if you are working on Levante. This catalog contains\n datasets for all raw data in /work/bm1344 and accesses the data via kerchunks\n in /work/bm1344/DKRZ/kerchunks.\n driver: intake.catalog.local.YAMLFileCatalog\n metadata:\n catalog_dir: https://raw.githubusercontent.com/eerie-project/intake_catalogues/main/dkrz\n", "text/plain": [ "disk:\n", " args:\n", " path: https://raw.githubusercontent.com/eerie-project/intake_catalogues/main/dkrz/disk/main.yaml\n", " description: Use this catalog if you are working on Levante. 
This catalog contains\n", " datasets for all raw data in /work/bm1344 and accesses the data via kerchunks\n", " in /work/bm1344/DKRZ/kerchunks.\n", " driver: intake.catalog.local.YAMLFileCatalog\n", " metadata:\n", " catalog_dir: https://raw.githubusercontent.com/eerie-project/intake_catalogues/main/dkrz\n" ] }, "metadata": { "application/json": { "root": "disk" } }, "output_type": "display_data" } ], "source": [ "eerie_cat[\"dkrz.disk\"]" ] }, { "cell_type": "markdown", "id": "1bc397f4-d02c-4299-8e7f-6e43eb95913c", "metadata": {}, "source": [ "Note that catalogs support autocompletion when you press *Tab*." ] }, { "cell_type": "markdown", "id": "da7ccc06-cd24-45fe-a51b-7d8239138bfc", "metadata": {}, "source": [ "For model output stored on DKRZ's disk,\n", "you can get a table-like overview from a *\"database\"* CSV file opened with pandas:" ] }, { "cell_type": "code", "execution_count": 4, "id": "11db4baa-30c1-4133-9a96-0b244fd5a819", "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/sw/spack-levante/mambaforge-23.1.0-1-Linux-x86_64-3boc6i/lib/python3.10/site-packages/dask/dataframe/io/csv.py:542: UserWarning: Warning gzip compression does not support breaking apart files\n", "Please ensure that each individual file can fit in memory and\n", "use the keyword ``blocksize=None to remove this message``\n", "Setting ``blocksize=None``\n", " warn(\n" ] }, { "data": { "text/html": [ "
\n", " | format | \n", "grid_id | \n", "member_id | \n", "institution_id | \n", "institution | \n", "references | \n", "simulation_id | \n", "variable-long_names | \n", "variables | \n", "source_id | \n", "experiment_id | \n", "realm | \n", "grid_lable | \n", "aggregation | \n", "urlpath | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "zarr | \n", "5aff0578-9bd9-11e8-8e4a-af3d880818e6 | \n", "r1i1p1f1 | \n", "MPI-M | \n", "Max Planck Institute for Meteorology/Deutscher... | \n", "see MPIM/DWD publications | \n", "erc1011 | \n", "['10m windspeed', 'temperature in 2m'] | \n", "['sfcwind', 'tas'] | \n", "icon-esm-er | \n", "eerie-control-1950 | \n", "atmos | \n", "gr025 | \n", "2d_daily_max | \n", "reference:://work/bm1344/DKRZ/kerchunks_batche... | \n", "
1 | \n", "zarr | \n", "5aff0578-9bd9-11e8-8e4a-af3d880818e6 | \n", "r1i1p1f1 | \n", "MPI-M | \n", "Max Planck Institute for Meteorology/Deutscher... | \n", "see MPIM/DWD publications | \n", "erc1011 | \n", "['total cloud cover', 'dew point temperature i... | \n", "['clt', 'dew2', 'evspsbl', 'hfls', 'hfss', 'pr... | \n", "icon-esm-er | \n", "eerie-control-1950 | \n", "atmos | \n", "gr025 | \n", "2d_daily_mean | \n", "reference:://work/bm1344/DKRZ/kerchunks_batche... | \n", "
2 | \n", "zarr | \n", "5aff0578-9bd9-11e8-8e4a-af3d880818e6 | \n", "r1i1p1f1 | \n", "MPI-M | \n", "Max Planck Institute for Meteorology/Deutscher... | \n", "see MPIM/DWD publications | \n", "erc1011 | \n", "['temperature in 2m'] | \n", "['tas'] | \n", "icon-esm-er | \n", "eerie-control-1950 | \n", "atmos | \n", "gr025 | \n", "2d_daily_min | \n", "reference:://work/bm1344/DKRZ/kerchunks_batche... | \n", "
3 | \n", "zarr | \n", "5aff0578-9bd9-11e8-8e4a-af3d880818e6 | \n", "r1i1p1f1 | \n", "MPI-M | \n", "Max Planck Institute for Meteorology/Deutscher... | \n", "see MPIM/DWD publications | \n", "erc1011 | \n", "['vertically integrated cloud ice', 'verticall... | \n", "['clivi', 'cllvi', 'clt', 'dew2', 'evspsbl', '... | \n", "icon-esm-er | \n", "eerie-control-1950 | \n", "atmos | \n", "gr025 | \n", "2d_monthly_mean | \n", "reference:://work/bm1344/DKRZ/kerchunks_batche... | \n", "
4 | \n", "zarr | \n", "5aff0578-9bd9-11e8-8e4a-af3d880818e6 | \n", "r1i1p1f1 | \n", "MPI-M | \n", "Max Planck Institute for Meteorology/Deutscher... | \n", "see MPIM/DWD publications | \n", "erc1011 | \n", "['specific cloud ice content', 'specific cloud... | \n", "['cli', 'clw', 'gpsm', 'height_bnds', 'hus', '... | \n", "icon-esm-er | \n", "eerie-control-1950 | \n", "atmos | \n", "gr025 | \n", "model-level_monthly_mean | \n", "reference:://work/bm1344/DKRZ/kerchunks_batche... | \n", "
dkrz-catalogue catalog with 110 dataset(s) from 112 asset(s):
\n", " | unique | \n", "
---|---|
format | \n", "3 | \n", "
grid_id | \n", "3 | \n", "
member_id | \n", "1 | \n", "
institution_id | \n", "1 | \n", "
institution | \n", "2 | \n", "
references | \n", "1 | \n", "
simulation_id | \n", "3 | \n", "
variable-long_names | \n", "54 | \n", "
variables | \n", "57 | \n", "
source_id | \n", "6 | \n", "
experiment_id | \n", "9 | \n", "
realm | \n", "3 | \n", "
grid_lable | \n", "2 | \n", "
aggregation | \n", "51 | \n", "
urlpath | \n", "112 | \n", "
derived_variables | \n", "0 | \n", "
dkrz-catalogue catalog with 1 dataset(s) from 1 asset(s):
\n", " | unique | \n", "
---|---|
format | \n", "1 | \n", "
grid_id | \n", "1 | \n", "
member_id | \n", "1 | \n", "
institution_id | \n", "1 | \n", "
institution | \n", "1 | \n", "
references | \n", "1 | \n", "
simulation_id | \n", "1 | \n", "
variable-long_names | \n", "1 | \n", "
variables | \n", "1 | \n", "
source_id | \n", "1 | \n", "
experiment_id | \n", "1 | \n", "
realm | \n", "1 | \n", "
grid_lable | \n", "1 | \n", "
aggregation | \n", "1 | \n", "
urlpath | \n", "1 | \n", "
derived_variables | \n", "0 | \n", "