{ "cells": [ { "cell_type": "markdown", "id": "86a08059-c833-46a9-afd0-35e450f65087", "metadata": {}, "source": [ "# Loading data from the catalog" ] }, { "cell_type": "markdown", "id": "56ac911d-5c95-4e7c-8c7d-99be25fa378f", "metadata": { "incorrectly_encoded_metadata": "jp-MarkdownHeadingCollapsed=true", "tags": [] }, "source": [ "## Long story short:\n", "```\n", "import intake\n", "try:\n", "    import outtake\n", "except ImportError:\n", "    import sys\n", "    print(\"\"\"Could not load outtake - tape downloads might not work. Try adding\n", "\n", "module use /work/k20200/k202134/hsm-tools/outtake/module\n", "module load hsm-tools/unstable\n", "\n", "to your ~/.kernel_env file\"\"\", file=sys.stderr)\n", "\n", "\n", "catalog_file = \"/work/ka1081/Catalogs/dyamond-nextgems.json\"  # nextGEMS and DYAMOND Winter\n", "cat = intake.open_esm_datastore(catalog_file)\n", "hits = cat.search(simulation_id=\"ngc2009\", variable_id=\"tas\", frequency=\"30minute\")\n", "dataset_dict = hits.to_dataset_dict(cdf_kwargs={\"chunks\": {\"time\": 1}})\n", "keys = list(dataset_dict.keys())\n", "dataset = dataset_dict[keys[0]]\n", "dataset.tas.isel(time=1).max().values\n", "\n", "# Use get_from_cat (defined below) to inspect a catalog\n", "```" ] }, { "cell_type": "markdown", "id": "56f31167-b732-4b5f-9e88-fca82fbf401f", "metadata": { "tags": [] }, "source": [ "## Loading the catalog" ] }, { "cell_type": "markdown", "id": "ec278650-c70a-416a-93c5-0ffb0da5ef0e", "metadata": {}, "source": [ "The [intake-esm package](https://intake-esm.readthedocs.io/en/stable/) provides a way to access large amounts of data without having to worry about where they are stored. This section gives a short overview of how to use the catalog to your advantage.\n", "The root of an intake catalog is a `.json` file."
] }, { "cell_type": "code", "execution_count": 1, "id": "203d720d-31a9-4af1-9808-91534508aa57", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "pd.set_option(\"max_colwidth\", None)  # makes the tables render better\n", "\n", "import intake\n", "\n", "try:\n", "    import outtake\n", "except ImportError:\n", "    import sys\n", "\n", "    print(\n", "        \"\"\"Could not load outtake - tape downloads might not work. Try adding\n", "\n", "module use /work/k20200/k202134/hsm-tools/outtake/module\n", "module load hsm-tools/unstable\n", "\n", "to your ~/.kernel_env file\"\"\",\n", "        file=sys.stderr,\n", "    )\n", "\n", "\n", "def get_from_cat(catalog, columns):\n", "    \"\"\"A helper function for inspecting an intake catalog.\n", "\n", "    Call with the catalog to be inspected and a column name or a list of columns of interest.\"\"\"\n", "    import pandas as pd\n", "\n", "    pd.set_option(\"max_colwidth\", None)  # makes the tables render better\n", "\n", "    # Accept a single column name as well as a list of names\n", "    if isinstance(columns, str):\n", "        columns = [columns]\n", "    return (\n", "        catalog.df[columns]\n", "        .drop_duplicates()\n", "        .sort_values(columns)\n", "        .reset_index(drop=True)\n", "    )" ] }, { "cell_type": "code", "execution_count": 2, "id": "7cedb06f-3243-4fce-ac8c-a033aeff5c91", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
/work/k20200/k202134/Catalogs/dng-merged catalog with 167 dataset(s) from 120310 asset(s):

| | unique |
|---|---|
| variable_id | 643 |
| project | 2 |
| institution_id | 13 |
| source_id | 21 |
| experiment_id | 5 |
| simulation_id | 16 |
| realm | 6 |
| frequency | 16 |
| time_reduction | 5 |
| grid_label | 11 |
| level_type | 6 |
| time_min | 3153 |
| time_max | 7000 |
| grid_id | 16 |
| format | 2 |
| uri | 120044 |

| | variable_id | project | institution_id | source_id | experiment_id | simulation_id | realm | frequency | time_reduction | grid_label | level_type | time_min | time_max | grid_id | format | uri |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (c, l, i, v, i) | DYAMOND_WINTER | CAMS | GRIST-5km | DW-ATM | r1i1p1f1 | atmos | 15min | unkonwn | gn | 2d | 2020-01-20T00:00:00.000 | 2020-01-20T23:45:00.000 | not_implemented | netcdf | /work/ka1081/DYAMOND_WINTER/CAMS/GRIST-5km/DW-ATM/atmos/15min/clivi/r1i1p1f1/2d/gn/clivi_15min_GRIST-5km_DW-ATM_r1i1p1f1_2d_gn_20200120000000-20200120234500.nc |
| 1 | (c, l, t) | DYAMOND_WINTER | CAMS | GRIST-5km | DW-ATM | r1i1p1f1 | atmos | 15min | unkonwn | gn | 2d | 2020-01-20T00:00:00.000 | 2020-01-20T23:45:00.000 | not_implemented | netcdf | /work/ka1081/DYAMOND_WINTER/CAMS/GRIST-5km/DW-ATM/atmos/15min/clt/r1i1p1f1/2d/gn/clt_15min_GRIST-5km_DW-ATM_r1i1p1f1_2d_gn_20200120000000-20200120234500.nc |

| | project | experiment_id | source_id | simulation_id |
|---|---|---|---|---|
| 0 | DYAMOND_WINTER | DW-ATM | ARPEGE-NH-2km | r1i1p1f1 |
| 1 | DYAMOND_WINTER | DW-ATM | GEM | r1i1p1f1 |
| 2 | DYAMOND_WINTER | DW-ATM | GEOS-1km | r1i1p1f1 |
| 3 | DYAMOND_WINTER | DW-ATM | GEOS-3km | r1i1p1f1 |
| 4 | DYAMOND_WINTER | DW-ATM | GRIST-5km | r1i1p1f1 |
| 5 | DYAMOND_WINTER | DW-ATM | ICON-NWP-2km | r1i1p1f1 |
| 6 | DYAMOND_WINTER | DW-ATM | ICON-SAP-5km | dpp0014 |
| 7 | DYAMOND_WINTER | DW-ATM | MPAS-3km | r1i1p1f1 |
| 8 | DYAMOND_WINTER | DW-ATM | SCREAM-3km | r1i1p1f1 |
| 9 | DYAMOND_WINTER | DW-ATM | SHiELD-3km | r1i1p1f1 |
| 10 | DYAMOND_WINTER | DW-ATM | UM-5km | r1i1p1f1 |
| 11 | DYAMOND_WINTER | DW-ATM | gSAM-4km | r1i1p1f1 |
| 12 | DYAMOND_WINTER | DW-CPL | GEOS-6km | r1i1p1f1 |
| 13 | DYAMOND_WINTER | DW-CPL | ICON-SAP-5km | dpp0029 |
| 14 | DYAMOND_WINTER | DW-CPL | ICON-SAP-5km | r1i1p1f1 |
| 15 | DYAMOND_WINTER | DW-CPL | IFS-4km | r1i1p1f1 |
| 16 | DYAMOND_WINTER | DW-CPL | IFS-9km | r1i1p1f1 |
| 17 | nextGEMS | Cycle1 | IFS-FESOM2-4km | hlq0 |
| 18 | nextGEMS | Cycle1 | IFS-NEMO-4km | hmrt |
| 19 | nextGEMS | Cycle1 | IFS-NEMO-9km | hmt0 |
| 20 | nextGEMS | Cycle1 | IFS-NEMO-DEEPon-4km | hmwz |
| 21 | nextGEMS | Cycle2-alpha | ICON-ESM | dpp0066 |
| 22 | nextGEMS | Cycle2-alpha | ICON-ESM | dpp0067 |
| 23 | nextGEMS | nextgems_cycle2 | ICON-ESM | ngc2009 |
| 24 | nextGEMS | nextgems_cycle2 | ICON-ESM | ngc2012 |
| 25 | nextGEMS | nextgems_cycle2 | ICON-ESM | ngc2013 |
| 26 | nextGEMS | nextgems_cycle2 | IFS-FESOM | HQYS |
| 27 | nextGEMS | nextgems_cycle2 | IFS-FESOM | HR0N |
| 28 | nextGEMS | nextgems_cycle2 | IFS-FESOM | HR2N |
| 29 | nextGEMS | nextgems_cycle2 | IFS-FESOM | HR2N_nodeep |

| | realm | frequency | variable_id |
|---|---|---|---|
| 0 | atm | 1day | (clt, evspsbl, tas, ts, rldscs, rlutcs, rsdscs, rsuscs, rsutcs) |
| 1 | atm | 1day | (psl, clt, evspsbl, tas, ts, rldscs, rlutcs, rsdscs, rsuscs, rsutcs) |
| 2 | atm | 1month | (sfcwind, clivi, cllvi, cptgzvi, hfls, hfss, prlr, pr, prw, qgvi, qrvi, qsvi, rlds, rlus, rlut, rsds, rsdt, rsus, rsut, tauu, tauv, rpds_dir, rpds_dif, rvds_dif, rnds_dif) |
| 3 | atm | 1month | (sfcwind, clivi, cllvi, cptgzvi, hfls, hfss, prlr, pr, prw, qgvi, qrvi, qsvi, rlds, rlus, rlut, rsds, rsdt, rsus, rsut, tauu, tauv, rpds_dir, rpds_dif, rvds_dif, rnds_dif, clt, evspsbl, tas, ts, rldscs, rlutcs, rsdscs, rsuscs, rsutcs) |
| 4 | atm | 1month | (sfcwind, clivi, cllvi, cptgzvi, hfls, hfss, prlr, pr, prw, qgvi, qrvi, qsvi, rlds, rlus, rlut, rsds, rsdt, rsus, rsut, tauu, tauv, rpds_dir, rpds_dif, rvds_dif, rnds_dif, psl, clt, evspsbl, tas, ts, rldscs, rlutcs, rsdscs, rsuscs, rsutcs) |
| 5 | atm | 1month | (ua, va, wa, ta, hus, rho, clw, cli, pfull, zghalf, zg, dzghalf) |
| 6 | atm | 2hour | (phalf,) |
| 7 | atm | 2minute | (fc, frland, hsurf, p, rnds_dif, rnds_dir, rsds, rvds_dif, rvds_dir, soiltype, t, u, v, w) |
| 8 | atm | 30minute | (hydro_canopy_cond_limited_box, hydro_w_snow_box, hydro_snow_soil_dens_box) |
| 9 | atm | 30minute | (hydro_discharge_ocean_box, hydro_drainage_box, hydro_runoff_box, hydro_transpiration_box, sse_grnd_hflx_old_box) |
| 10 | atm | 30minute | (psl, ps, sit, sic, tas, ts, uas, vas, cfh_lnd) |
| 11 | atm | 30minute | (sfcwind, clivi, cllvi, cptgzvi, hfls, hfss, prlr, pr, prw, qgvi, qrvi, qsvi, rlds, rlus, rlut, rsds, rsdt, rsus, rsut, tauu, tauv, rpds_dir, rpds_dif, rvds_dif, rnds_dif) |
| 12 | atm | 6hour | (clw, cli, pfull) |
| 13 | atm | 6hour | (hydro_w_soil_sl_box, hydro_w_ice_sl_box, sse_t_soil_sl_box) |
| 14 | atm | 6hour | (ta, hus, rho) |
| 15 | atm | 6hour | (ta, ua, va, clw, hus, zfull, cli, pv) |
| 16 | atm | 6hour | (tas_gmean, rsdt_gmean, rsut_gmean, rlut_gmean, radtop_gmean, prec_gmean, evap_gmean, fwfoce_gmean) |
| 17 | atm | 6hour | (ua, va, wa) |
| 18 | atm | fx | (zghalf, zg, dzghalf) |
| 19 | lnd | 1month | (hydro_discharge_ocean_box, hydro_drainage_box, hydro_runoff_box, hydro_transpiration_box, sse_grnd_hflx_old_box, hydro_canopy_cond_limited_box, hydro_w_snow_box, hydro_snow_soil_dens_box, hydro_w_soil_sl_box, hydro_w_ice_sl_box, sse_t_soil_sl_box) |
| 20 | oce | 1day | (atlantic_hfbasin, atlantic_hfl, atlantic_moc, atlantic_sltbasin, atlantic_wfl, global_hfbasin, global_hfl, global_moc, global_sltbasin, global_wfl, pacific_hfbasin, pacific_hfl, pacific_moc, pacific_sltbasin, pacific_wfl) |
| 21 | oce | 1day | (atmos_fluxes_FrshFlux_Evaporation, atmos_fluxes_FrshFlux_Precipitation, atmos_fluxes_FrshFlux_Runoff, atmos_fluxes_FrshFlux_SnowFall, atmos_fluxes_HeatFlux_Latent, atmos_fluxes_HeatFlux_LongWave, atmos_fluxes_HeatFlux_Sensible, atmos_fluxes_HeatFlux_ShortWave, atmos_fluxes_HeatFlux_Total, atmos_fluxes_stress_x, atmos_fluxes_stress_xw, atmos_fluxes_stress_y, atmos_fluxes_stress_yw, conc, heat_content_seaice, heat_content_snow, heat_content_total, hi, hs, ice_u, ice_v, mlotst, Qbot, Qtop, sea_level_pressure, stretch_c, zos, verticallyTotal_mass_flux_e, Wind_Speed_10m) |
| 22 | oce | 1day | (so, tke, to, u, v, w, A_tracer_v_to, A_veloc_v, heat_content_liquid_water) |
| 23 | oce | 1hour | (atmos_fluxes_FrshFlux_Evaporation, atmos_fluxes_FrshFlux_Precipitation, atmos_fluxes_FrshFlux_Runoff, atmos_fluxes_FrshFlux_SnowFall, atmos_fluxes_HeatFlux_Latent, atmos_fluxes_HeatFlux_LongWave, atmos_fluxes_HeatFlux_Sensible, atmos_fluxes_HeatFlux_ShortWave, atmos_fluxes_HeatFlux_Total, atmos_fluxes_stress_x, atmos_fluxes_stress_xw, atmos_fluxes_stress_y, atmos_fluxes_stress_yw, Qbot, Qtop) |
| 24 | oce | 1hour | (so, to, u, v, conc, hi, hs, ice_u, ice_v, mlotst, sea_level_pressure, stretch_c, Wind_Speed_10m, zos) |
| 25 | oce | 1month | (A_tracer_v_to, tke) |
| 26 | oce | 1month | (atmos_fluxes_FrshFlux_Evaporation, atmos_fluxes_FrshFlux_Precipitation, atmos_fluxes_FrshFlux_Runoff, atmos_fluxes_FrshFlux_SnowFall, atmos_fluxes_HeatFlux_Latent, atmos_fluxes_HeatFlux_LongWave, atmos_fluxes_HeatFlux_Sensible, atmos_fluxes_HeatFlux_ShortWave, atmos_fluxes_HeatFlux_Total, atmos_fluxes_stress_x, atmos_fluxes_stress_xw, atmos_fluxes_stress_y, atmos_fluxes_stress_yw, conc, heat_content_seaice, heat_content_snow, heat_content_total, hi, hs, ice_u, ice_v, mlotst, Qbot, Qtop, sea_level_pressure, stretch_c, zos, Wind_Speed_10m) |
| 27 | oce | 1month | (so, tke, to, u, v, w, A_tracer_v_to, heat_content_liquid_water) |
| 28 | oce | 1month | (so, to, u, v, w) |
| 29 | oce | 3hour | (A_tracer_v_to, A_veloc_v, tke) |
| 30 | oce | 3hour | (so, to, u, v, w) |
| 31 | oce | 6hour | (total_salt, total_saltinseaice, total_saltinliquidwater, amoc26n, kin_energy_global, pot_energy_global, total_energy_global, ssh_global, sst_global, sss_global, potential_enstrophy_global, HeatFlux_Total_global, FrshFlux_Precipitation_global, FrshFlux_SnowFall_global, FrshFlux_Evaporation_global, FrshFlux_Runoff_global, FrshFlux_VolumeIce_global, FrshFlux_TotalOcean_global, FrshFlux_TotalIce_global, FrshFlux_VolumeTotal_global, totalsnowfall_global, ice_volume_nh, ice_volume_sh, ice_extent_nh, ice_extent_sh, global_heat_content, global_heat_content_solid) |

| | realm | frequency | level_type | variable_id |
|---|---|---|---|---|
| 0 | atm | 1day | ml | (clt, evspsbl, tas, ts, rldscs, rlutcs, rsdscs, rsuscs, rsutcs) |
| 1 | atm | 1day | ml | (psl, clt, evspsbl, tas, ts, rldscs, rlutcs, rsdscs, rsuscs, rsutcs) |
| 2 | atm | 1month | ml | (sfcwind, clivi, cllvi, cptgzvi, hfls, hfss, prlr, pr, prw, qgvi, qrvi, qsvi, rlds, rlus, rlut, rsds, rsdt, rsus, rsut, tauu, tauv, rpds_dir, rpds_dif, rvds_dif, rnds_dif, clt, evspsbl, tas, ts, rldscs, rlutcs, rsdscs, rsuscs, rsutcs) |
| 3 | atm | 1month | ml | (sfcwind, clivi, cllvi, cptgzvi, hfls, hfss, prlr, pr, prw, qgvi, qrvi, qsvi, rlds, rlus, rlut, rsds, rsdt, rsus, rsut, tauu, tauv, rpds_dir, rpds_dif, rvds_dif, rnds_dif, psl, clt, evspsbl, tas, ts, rldscs, rlutcs, rsdscs, rsuscs, rsutcs) |
| 4 | atm | 30minute | ml | (psl, ps, sit, sic, tas, ts, uas, vas, cfh_lnd) |
/work/k20200/k202134/Catalogs/dng-merged catalog with 1 dataset(s) from 817 asset(s):

| | unique |
|---|---|
| variable_id | 9 |
| project | 1 |
| institution_id | 1 |
| source_id | 1 |
| experiment_id | 1 |
| simulation_id | 1 |
| realm | 1 |
| frequency | 1 |
| time_reduction | 1 |
| grid_label | 1 |
| level_type | 1 |
| time_min | 817 |
| time_max | 817 |
| grid_id | 1 |
| format | 1 |
| uri | 817 |

<xarray.Dataset>
Dimensions:  (time: 36722, height: 1, ncells: 20971520)
Coordinates:
  * height   (height) float64 2.0
  * time     (time) datetime64[ns] 2020-01-20 2020-01-20T00:30:00 ... 2022-03-01
Dimensions without coordinates: ncells
Data variables:
    tas      (time, height, ncells) float32 dask.array<chunksize=(1, 1, 20971520), meta=np.ndarray>
Attributes: (12/13)
    Conventions:             CF-1.6
    institution:             Max Planck Institute for Meteorology/Deutscher W...
    number_of_grid_used:     15
    CDI:                     Climate Data Interface version 1.8.3rc (http://m...
    uuidOfHGrid:             0f1e7d66-637e-11e8-913b-51232bb4d8f9
    history:                 ./icon at 20220512 152214\n./icon at 20220512 19...
    ...                      ...
    title:                   ICON simulation
    grid_file_uri:           http://icon-downloads.mpimet.mpg.de/grids/public...
    comment:                 Sapphire Dyamond (k203123) on l10739 (Linux 4.18...
    source:                  git@gitlab.dkrz.de:icon/icon-aes.git@87a1eaded69...
    intake_esm_varname:      ['tas']
    intake_esm_dataset_key:  nextGEMS.MPI-M.ICON-ESM.nextgems_cycle2.ngc2009....

<xarray.DataArray 'tas' (time: 36722, height: 1)>
dask.array<_nanmax_skip-aggregate, shape=(36722, 1), dtype=float32, chunksize=(1, 1), chunktype=numpy.ndarray>
Coordinates:
  * height   (height) float64 2.0
  * time     (time) datetime64[ns] 2020-01-20 2020-01-20T00:30:00 ... 2022-03-01