Accessing the metadata of variables

This script creates a pandas DataFrame containing all metadata attributes of the variables in one simulation.

Setup:

[1]:
# basics
import intake
import dask  # memory-efficient parallel computation and delayed execution (lazy evaluation)

# let dask split overly large chunks created by slicing instead of warning about them
dask.config.set(**{"array.slicing.split_large_chunks": True})

import pandas as pd

pd.set_option("display.max_colwidth", None)  # show full cell contents, so long names are not truncated

# for ifs-fesom (in case you want to get the variables for gribscan-processed data sets)
try:
    import gribscan
except ImportError:
    %pip install gribscan
    import gribscan


def get_from_cat(catalog, columns):
    """A helper function for inspecting an intake catalog.

    Call with the catalog to be inspected and a column name or a list of
    column names; returns the unique combinations of their values, sorted."""

    if isinstance(columns, str):
        columns = [columns]
    return (
        catalog.df[columns]
        .drop_duplicates()
        .sort_values(columns)
        .reset_index(drop=True)
    )
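To see what the helper returns without access to the catalog, here is a minimal sketch that runs it on a small hand-made table. The SimpleNamespace stand-in and its values are hypothetical; only the .df attribute of the real catalog, which is all the helper touches, is mimicked.

```python
from types import SimpleNamespace

import pandas as pd


def get_from_cat(catalog, columns):
    """Return the unique, sorted combinations of `columns` in `catalog.df`."""
    if isinstance(columns, str):
        columns = [columns]
    return (
        catalog.df[columns]
        .drop_duplicates()
        .sort_values(columns)
        .reset_index(drop=True)
    )


# Hypothetical stand-in for an intake-esm catalog: only .df is provided.
toy_cat = SimpleNamespace(
    df=pd.DataFrame(
        {
            "simulation_id": ["dpp0067", "dpp0014", "dpp0067", "dpp0014"],
            "variable_id": ["ta", "ta", "ps", "ps"],
        }
    )
)

print(get_from_cat(toy_cat, "simulation_id"))  # one column, duplicates removed
print(get_from_cat(toy_cat, ["simulation_id", "variable_id"]))  # unique pairs
```

Passing a list of columns returns every distinct combination, which is handy for checking, for example, which variables exist in which simulation.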

Open the catalog you want to use:

[2]:
catalog_file = "/work/ka1081/Catalogs/dyamond-nextgems.json"
cat = intake.open_esm_datastore(catalog_file)
cat

ICON-ESM catalog with 130 dataset(s) from 88823 asset(s):

unique
variable_id 546
project 2
institution_id 12
source_id 19
experiment_id 4
simulation_id 12
realm 5
frequency 12
time_reduction 4
grid_label 7
level_type 9
time_min 918
time_max 1094
grid_id 3
format 1
uri 88813
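As far as we can tell, the "unique" column of this summary is the number of distinct values in each column of cat.df, the asset count is the number of rows (one per file), and a dataset is a group of assets sharing the catalog's grouping columns. A minimal pandas sketch of those three numbers on a hypothetical miniature catalog; grouping on simulation_id and realm is an assumption made for illustration only.

```python
import pandas as pd

# Hypothetical miniature of cat.df: one row per file ("asset").
toy_df = pd.DataFrame(
    {
        "simulation_id": ["dpp0067", "dpp0067", "dpp0067", "dpp0014"],
        "realm": ["atm", "atm", "oce", "atm"],
        "variable_id": ["ta", "ta", "to", "ta"],
        "uri": ["a.nc", "b.nc", "c.nc", "d.nc"],
    }
)

n_assets = len(toy_df)  # every row is one asset
# Assets sharing the grouping columns form one dataset (grouping columns assumed here).
n_datasets = toy_df.groupby(["simulation_id", "realm"]).ngroups
unique_counts = toy_df.nunique()  # distinct values per column

print(f"catalog with {n_datasets} dataset(s) from {n_assets} asset(s)")
print(unique_counts)
```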

File Selection:

[3]:
get_from_cat(cat, "simulation_id")
[3]:
simulation_id
0 dpp0014
1 dpp0029
2 dpp0052
3 dpp0054
4 dpp0065
5 dpp0066
6 dpp0067
7 hlq0
8 hmrt
9 hmt0
10 hmwz
11 r1i1p1f1

We want to look up the long names of the variables in ‘dpp0067’. We don’t want to load all files of the simulation, as that would take too long and is not necessary. The idea: if we select only the files with the lowest ‘time_min’, the cropped catalog still contains every variable, because every variable has to be present in each time interval.
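The same selection can be expressed directly in pandas. A minimal sketch on a hand-made table (the values are hypothetical), relying on the fact that ISO 8601 timestamps written in one fixed format sort chronologically when compared as plain strings:

```python
import pandas as pd

# Hypothetical miniature of cat.df for one simulation.
toy_df = pd.DataFrame(
    {
        "variable_id": ["ta", "ps", "ta", "ps", "to"],
        "time_min": [
            "2020-01-20T00:00:00",
            "2020-01-20T00:00:00",
            "2020-01-21T00:00:00",
            "2020-01-21T00:00:00",
            "2020-01-20T00:00:00",
        ],
    }
)

# Same-format ISO 8601 strings compare chronologically, so min() is the earliest start.
earliest = toy_df["time_min"].min()
first_rows = toy_df[toy_df["time_min"] == earliest]
print(earliest)
print(sorted(first_rows["variable_id"]))
```

If a catalog mixed timestamp formats (for example with and without the time part), the strings should be converted with pd.to_datetime before comparing.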

[4]:
my_simulation_id = "dpp0067"
time_min = get_from_cat(cat.search(simulation_id=my_simulation_id), "time_min")
time_min
[4]:
time_min
0 2020-01-20T00:00:00
1 2020-01-21T00:00:00
2 2020-01-22T00:00:00
3 2020-01-23T00:00:00
4 2020-01-24T00:00:00
... ...
68 2020-03-28T00:00:00
69 2020-03-29T00:00:00
70 2020-03-30T00:00:00
71 2020-03-31T00:00:00
72 2020-04-01T00:00:00

73 rows × 1 columns

Now we grab the files with the lowest time_min value:

[5]:
first_files = cat.search(simulation_id=my_simulation_id, time_min=time_min.values[0])
first_files

ICON-ESM catalog with 5 dataset(s) from 14 asset(s):

unique
variable_id 106
project 1
institution_id 1
source_id 1
experiment_id 1
simulation_id 1
realm 2
frequency 3
time_reduction 2
grid_label 1
level_type 2
time_min 1
time_max 1
grid_id 1
format 1
uri 14

Now only a handful of files are left (14 assets in 5 datasets).
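Whether the selection really kept every variable can be checked by comparing the variable names of the full and the cropped catalog. A minimal sketch; the helper name missing_variables and the example table are hypothetical, and the example deliberately contains a variable that only starts on the second day, which the check should report.

```python
import pandas as pd


def missing_variables(full_df, subset_df, column="variable_id"):
    """Return the names present in full_df but absent from subset_df, sorted."""
    return sorted(set(full_df[column]) - set(subset_df[column]))


# Hypothetical example: "to" is only written from the second day onward.
full = pd.DataFrame(
    {
        "variable_id": ["ta", "ps", "ta", "ps", "to"],
        "time_min": ["2020-01-20", "2020-01-20", "2020-01-21", "2020-01-21", "2020-01-21"],
    }
)
first = full[full["time_min"] == full["time_min"].min()]
print(missing_variables(full, first))  # ['to']
```

With the catalog above, the check would be missing_variables(cat.search(simulation_id=my_simulation_id).df, first_files.df); an empty list means the assumption holds for this simulation.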

Loading the data

Now we only have to load the files in first_files.

How to access the variable information:

[6]:
# Loading sometimes fails. Catching the exception lets us print its message,
# which usually names the dataset that caused the problem.
dataset_dict = {}  # stays empty if loading fails, so the following cells still run

try:
    dataset_dict = first_files.to_dataset_dict()
except Exception as ex:
    template = "An exception of type {0} occurred. Arguments:\n{1!r}"
    message = template.format(type(ex).__name__, ex.args)
    print(message)

--> The keys in the returned dictionary of datasets are constructed as follows:
        'project.institution_id.source_id.experiment_id.simulation_id.realm.frequency.time_reduction.grid_label.level_type'
100.00% [5/5 00:04<00:00]
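Because every key joins the catalog columns listed in the message above with dots, a key can be split back into its parts. A minimal sketch; parse_key and the example key are hypothetical, and the simple split assumes no column value itself contains a dot.

```python
KEY_FIELDS = (
    "project.institution_id.source_id.experiment_id.simulation_id"
    ".realm.frequency.time_reduction.grid_label.level_type"
).split(".")


def parse_key(key):
    """Split a dataset key into a dict mapping catalog column -> value."""
    parts = key.split(".")
    if len(parts) != len(KEY_FIELDS):
        # A value containing a dot (or a different key format) breaks the simple split.
        raise ValueError(f"expected {len(KEY_FIELDS)} fields, got {len(parts)}: {key!r}")
    return dict(zip(KEY_FIELDS, parts))


# Hypothetical key; the real ones are the keys of dataset_dict.
example = "proj.inst.src.exp.dpp0067.atm.1hour.mean.gn.ml"
print(parse_key(example)["simulation_id"])  # dpp0067
```

In the notebook, looping over dataset_dict and calling parse_key on each key shows, for instance, which realm and frequency each loaded dataset belongs to.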
[7]:
all_atts = []
for _key, data in dataset_dict.items():
    # one dict per variable: its name first, then all of its attributes (unpacked with **)
    all_atts.extend({"variable_id": var, **data[var].attrs} for var in data.data_vars)
frame = pd.DataFrame(all_atts)
frame
[7]:
variable_id standard_name long_name units param CDI_grid_type number_of_grid_in_reference level_type code
0 ps surface_air_pressure surface pressure Pa 0.3.0 unstructured 1.0 NaN NaN
1 psl mean sea level pressure mean sea level pressure Pa 1.3.0 unstructured 1.0 NaN NaN
2 rsdt toa_incoming_shortwave_flux toa incident shortwave radiation W m-2 201.4.0 unstructured 1.0 toa NaN
3 rsut toa_outgoing_shortwave_flux toa outgoing shortwave radiation W m-2 8.4.0 unstructured 1.0 toa NaN
4 rsutcs toa_outgoing_shortwave_flux_assuming_clear_sky toa outgoing clear-sky shortwave radiation W m-2 208.4.0 unstructured 1.0 toa NaN
... ... ... ... ... ... ... ... ... ...
130 ua eastward_wind Zonal wind m s-1 2.2.0 unstructured 1.0 NaN NaN
131 va northward_wind Meridional wind m s-1 3.2.0 unstructured 1.0 NaN NaN
132 height_2_bnds NaN NaN NaN NaN NaN NaN NaN NaN
133 wa upward_air_velocity Vertical velocity m s-1 9.2.0 unstructured 1.0 NaN NaN
134 cl cl cloud area fraction m2 m-2 22.6.0 unstructured 1.0 NaN NaN

135 rows × 9 columns
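Two features of the table above follow from how pandas builds a DataFrame from a list of dicts: a variable without some attribute gets NaN in that column (the bounds variables such as height_2_bnds have no attributes at all), and an integer column that contains NaN is stored as float, which is why values like 1.0 and 255.0 appear. A minimal sketch with hypothetical attribute dicts standing in for the .attrs of real variables:

```python
import pandas as pd

# Hypothetical stand-in for one dataset: variable name -> its attribute dict.
toy_attrs = {
    "ta": {"standard_name": "air_temperature", "units": "K"},
    "height_2_bnds": {},  # a variable without any attributes
    "ps": {"standard_name": "surface_air_pressure", "units": "Pa", "code": 1},
}

rows = [{"variable_id": var, **attrs} for var, attrs in toy_attrs.items()]
frame = pd.DataFrame(rows)  # missing attributes become NaN
print(frame)
print(frame["code"].dtype)  # integer values plus NaN are stored as float64
```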

Let’s drop the duplicates and show the full table.

[8]:
pd.set_option("display.max_rows", None)  # print every row instead of a truncated view
frame.drop_duplicates().sort_values(list(frame.columns)).reset_index(drop=True)
[8]:
variable_id standard_name long_name units param CDI_grid_type number_of_grid_in_reference level_type code
0 A_tracer_v_to A_tracer_v_to sea water potential temperature(A_tracer_v) kg/kg NaN unstructured 1.0 NaN 255.0
1 A_veloc_v A_veloc_v vertical velocity diffusion kg/kg NaN unstructured 3.0 NaN 255.0
2 Qbot Qbot Conductive heat flux at ice-ocean interface W/m^2 NaN unstructured 1.0 NaN 255.0
3 Qtop Qtop Energy flux available for surface melting W/m^2 NaN unstructured 1.0 NaN 255.0
4 Wind_Speed_10m Wind_Speed_10m Wind Speed at 10m height m/s NaN unstructured 1.0 NaN 255.0
5 atmos_fluxes_FrshFlux_Evaporation atmos_fluxes_FrshFlux_Evaporation atmos_fluxes_FrshFlux_Evaporation [m/s] NaN unstructured 1.0 NaN 255.0
6 atmos_fluxes_FrshFlux_Precipitation atmos_fluxes_FrshFlux_Precipitation atmos_fluxes_FrshFlux_Precipitation [m/s] NaN unstructured 1.0 NaN 255.0
7 atmos_fluxes_FrshFlux_Runoff atmos_fluxes_FrshFlux_Runoff atmos_fluxes_FrshFlux_Runoff [m/s] NaN unstructured 1.0 NaN 255.0
8 atmos_fluxes_FrshFlux_SnowFall atmos_fluxes_FrshFlux_SnowFall atmos_fluxes_FrshFlux_SnowFall [m/s] NaN unstructured 1.0 NaN 255.0
9 atmos_fluxes_HeatFlux_Latent atmos_fluxes_HeatFlux_Latent atmos_fluxes_HeatFlux_Latent [W/m2] NaN unstructured 1.0 NaN 255.0
10 atmos_fluxes_HeatFlux_LongWave atmos_fluxes_HeatFlux_LongWave atmos_fluxes_HeatFlux_LongWave [W/m2] NaN unstructured 1.0 NaN 255.0
11 atmos_fluxes_HeatFlux_Sensible atmos_fluxes_HeatFlux_Sensible atmos_fluxes_HeatFlux_Sensible [W/m2] NaN unstructured 1.0 NaN 255.0
12 atmos_fluxes_HeatFlux_ShortWave atmos_fluxes_HeatFlux_ShortWave atmos_fluxes_HeatFlux_ShortWave [W/m2] NaN unstructured 1.0 NaN 255.0
13 atmos_fluxes_HeatFlux_Total atmos_fluxes_HeatFlux_Total atmos_fluxes_HeatFlux_Total [W/m2] NaN unstructured 1.0 NaN 255.0
14 atmos_fluxes_stress_x atmos_fluxes_stress_x atmos_fluxes_stress_x Pa NaN unstructured 1.0 NaN 255.0
15 atmos_fluxes_stress_xw atmos_fluxes_stress_xw atmos_fluxes_stress_xw Pa NaN unstructured 1.0 NaN 255.0
16 atmos_fluxes_stress_y atmos_fluxes_stress_y atmos_fluxes_stress_y Pa NaN unstructured 1.0 NaN 255.0
17 atmos_fluxes_stress_yw atmos_fluxes_stress_yw atmos_fluxes_stress_yw Pa NaN unstructured 1.0 NaN 255.0
18 cl cl cloud area fraction m2 m-2 22.6.0 unstructured 1.0 NaN NaN
19 cli cli specific cloud ice content kg kg-1 82.1.0 unstructured 1.0 NaN NaN
20 clivi total_cloud_ice vertically integrated cloud ice kg m-2 70.1.0 unstructured 1.0 atmosphere NaN
21 cllvi total_cloud_water vertically integrated cloud water kg m-2 69.1.0 unstructured 1.0 atmosphere NaN
22 clt clt total cloud cover m2 m-2 1.6.0 unstructured 1.0 NaN NaN
23 clw clw specific cloud water content kg kg-1 22.1.0 unstructured 1.0 NaN NaN
24 conc conc ice concentration in each ice class NaN NaN unstructured 1.0 NaN 255.0
25 cptgz cptgz dry static energy m2 s-2 NaN unstructured 1.0 NaN 255.0
26 cptgzvi vertically integrated dry static energy vert_int_dry_static_energy m2 s-2 NaN unstructured 1.0 atmosphere 255.0
27 evspsbl evap evaporation kg m-2 s-1 6.1.0 unstructured 1.0 NaN NaN
28 gpsm geopotential_above_surface geopotential above surface m2 s-2 4.3.0 unstructured 1.0 NaN NaN
29 height_2_bnds NaN NaN NaN NaN NaN NaN NaN NaN
30 height_bnds NaN NaN NaN NaN NaN NaN NaN NaN
31 hfls lhflx latent heat flux W m-2 10.0.0 unstructured 1.0 NaN NaN
32 hfss shflx sensible heat flux W m-2 11.0.0 unstructured 1.0 NaN NaN
33 hi hi ice thickness m NaN unstructured 1.0 NaN 255.0
34 hs hs snow thickness m NaN unstructured 1.0 NaN 255.0
35 hus specific_humidity Specific humidity kg kg-1 0.1.0 unstructured 1.0 NaN NaN
36 hydro_discharge_box discharge local discharge kg m-2 s-1 NaN unstructured 1.0 NaN 255.0
37 hydro_discharge_ocean_box discharge_ocean discharge to the ocean m3 s-1 NaN unstructured 1.0 NaN 255.0
38 hydro_drainage_box drainage drainage kg m-2 s-1 NaN unstructured 1.0 NaN 255.0
39 hydro_fract_snow_box fraction of snow on surface NaN - 202.0.1 unstructured 1.0 NaN NaN
40 hydro_fract_water_box fraction of water on surface surface_wet_fraction NaN - 201.0.1 unstructured 1.0 NaN NaN
41 hydro_q_snocpymlt_box heating_snow_cpy_melt NaN W m-2 NaN unstructured 1.0 NaN 255.0
42 hydro_runoff_box surface_runoff Surface runoff kg m-2 s-1 NaN unstructured 1.0 NaN 255.0
43 hydro_transpiration_box surface_transpiration Transpiration from surface kg m-2 s-1 NaN unstructured 1.0 NaN 255.0
44 hydro_w_ice_sl_box Ice content in soil layers NaN m water equivalent NaN unstructured 1.0 NaN 255.0
45 hydro_w_skin_box skin_reservoir Water content in skin reservoir of surface m water equivalent 211.0.1 unstructured 1.0 NaN NaN
46 hydro_w_snow_box Water content of snow reservoir on surface NaN m water equivalent 212.0.1 unstructured 1.0 NaN NaN
47 hydro_w_soil_column_box Water content in the whole soil column NaN m water equivalent 213.0.1 unstructured 1.0 NaN NaN
48 hydro_w_soil_sl_box Water content in soil layers NaN m water equivalent NaN unstructured 1.0 NaN 255.0
49 ice_u ice_u zonal velocity m/s NaN unstructured 1.0 NaN 255.0
50 ice_v ice_v meridional velocity m/s NaN unstructured 1.0 NaN 255.0
51 mlotst mlotst ocean_mixed_layer_thickness_defined_by_sigma_t m NaN unstructured 1.0 NaN 255.0
52 pfull air_pressure Pressure Pa 0.3.0 unstructured 1.0 NaN NaN
53 pr pr precipitation flux kg m-2 s-1 52.1.0 unstructured 1.0 NaN NaN
54 prlr prlr large-scale precipitation flux (water) kg m-2 s-1 77.1.0 unstructured 1.0 NaN NaN
55 prls prls large-scale precipitation flux (snow) kg m-2 s-1 59.1.0 unstructured 1.0 NaN NaN
56 prw total_vapour vertically integrated water vapour kg m-2 64.1.0 unstructured 1.0 atmosphere NaN
57 ps surface_air_pressure surface pressure Pa 0.3.0 unstructured 1.0 NaN NaN
58 psl mean sea level pressure mean sea level pressure Pa 1.3.0 unstructured 1.0 NaN NaN
59 qgvi total_graupel vertically integrated graupel kg m-2 223.1.0 unstructured 1.0 atmosphere NaN
60 qrvi total_rain vertically integrated rain kg m-2 221.1.0 unstructured 1.0 atmosphere NaN
61 qsvi total_snow vertically integrated snow kg m-2 222.1.0 unstructured 1.0 atmosphere NaN
62 rld downwelling_longwave_flux_in_air downwelling longwave radiation W m-2 3.5.0 unstructured 1.0 NaN NaN
63 rlds surface_downwelling_longwave_flux_in_air surface downwelling longwave radiation W m-2 3.5.0 unstructured 1.0 NaN NaN
64 rldscs surface_downwelling_longwave_flux_in_air_assuming_clear_sky surface downwelling clear-sky longwave radiation W m-2 203.5.0 unstructured 1.0 NaN NaN
65 rlu upwelling_longwave_flux_in_air upwelling longwave radiation W m-2 4.5.0 unstructured 1.0 NaN NaN
66 rlus surface_upwelling_longwave_flux_in_air surface upwelling longwave radiation W m-2 199.5.0 unstructured 1.0 NaN NaN
67 rlut toa_outgoing_longwave_flux toa outgoing longwave radiation W m-2 4.5.0 unstructured 1.0 toa NaN
68 rlutcs toa_outgoing_longwave_flux_assuming_clear_sky toa outgoing clear-sky longwave radiation W m-2 204.5.0 unstructured 1.0 toa NaN
69 rsd downwelling_shortwave_flux_in_air downwelling shortwave radiation W m-2 7.4.0 unstructured 1.0 NaN NaN
70 rsds surface_downwelling_shortwave_flux_in_air surface downwelling shortwave radiation W m-2 7.4.0 unstructured 1.0 NaN NaN
71 rsdscs surface_downwelling_shortwave_flux_in_air_assuming_clear_sky surface downwelling clear-sky shortwave radiation W m-2 207.4.0 unstructured 1.0 NaN NaN
72 rsdt toa_incoming_shortwave_flux toa incident shortwave radiation W m-2 201.4.0 unstructured 1.0 toa NaN
73 rsu upwelling_shortwave_flux_in_air upwelling shortwave radiation W m-2 8.4.0 unstructured 1.0 NaN NaN
74 rsus surface_upwelling_shortwave_flux_in_air surface upwelling shortwave radiation W m-2 199.4.0 unstructured 1.0 NaN NaN
75 rsuscs surface_upwelling_shortwave_flux_in_air_assuming_clear_sky surface upwelling clear-sky shortwave radiation W m-2 209.4.0 unstructured 1.0 NaN NaN
76 rsut toa_outgoing_shortwave_flux toa outgoing shortwave radiation W m-2 8.4.0 unstructured 1.0 toa NaN
77 rsutcs toa_outgoing_shortwave_flux_assuming_clear_sky toa outgoing clear-sky shortwave radiation W m-2 208.4.0 unstructured 1.0 toa NaN
78 sea_level_pressure Sea_Level_Pressure Sea Level Pressure Pa NaN unstructured 1.0 NaN 255.0
79 sfcwind sfcwind 10m windspeed m s-1 1.2.0 unstructured 1.0 NaN NaN
80 sic sea_ice_cover fraction of ocean covered by sea ice NaN 0.2.10 unstructured 1.0 NaN NaN
81 sit siced sea ice thickness m 1.2.10 unstructured 1.0 NaN NaN
82 so sea_water_salinity sea water salinity psu NaN unstructured 1.0 NaN 5.0
83 soil_depth_energy_bnds NaN NaN NaN NaN NaN NaN NaN NaN
84 soil_depth_water_bnds NaN NaN NaN NaN NaN NaN NaN NaN
85 sse_grnd_hflx_old_box grnd_hflx_old Ground heat flux (old) J m-2 s-1 NaN unstructured 1.0 NaN 255.0
86 sse_hcap_grnd_old_box heat_capacity_ground_old Ground heat capacity (old) J m-2 K-1 NaN unstructured 1.0 NaN 255.0
87 sse_t_soil_sl_box soil_temperature NaN K NaN unstructured 1.0 NaN 255.0
88 ta air_temperature Temperature K 0.0.0 unstructured 1.0 NaN NaN
89 tas tas temperature in 2m K 0.0.0 unstructured 1.0 NaN NaN
90 tauu u_stress u-momentum flux at the surface N m-2 17.2.0 unstructured 1.0 NaN NaN
91 tauv v_stress v-momentum flux at the surface N m-2 18.2.0 unstructured 1.0 NaN NaN
92 tke tke turbulent kinetic energy m2 s-2 NaN unstructured 1.0 NaN 255.0
93 to sea_water_potential_temperature sea water potential temperature deg C NaN unstructured 1.0 NaN 2.0
94 ts surface_temperature surface temperature K 0.0.0 unstructured 1.0 NaN NaN
95 turb_fact_q_air_box fact_q_air NaN NaN NaN unstructured 1.0 NaN 255.0
96 turb_fact_qsat_srf_box fact_qsat_srf NaN NaN NaN unstructured 1.0 NaN 255.0
97 turb_fact_qsat_trans_srf_box fact_qsat_trans_srf NaN NaN NaN unstructured 1.0 NaN 255.0
98 u u u zonal velocity component m/s NaN unstructured 1.0 NaN 255.0
99 ua eastward_wind Zonal wind m s-1 2.2.0 unstructured 1.0 NaN NaN
100 uas uas zonal wind in 10m m s-1 2.2.0 unstructured 1.0 NaN NaN
101 v v v meridional velocity component m/s NaN unstructured 1.0 NaN 255.0
102 va northward_wind Meridional wind m s-1 3.2.0 unstructured 1.0 NaN NaN
103 vas vas meridional wind in 10m m s-1 3.2.0 unstructured 1.0 NaN NaN
104 vor relative_vorticity_on_cells Vorticity s-1 12.2.0 unstructured 1.0 NaN NaN
105 w w vertical velocity at cells m/s NaN unstructured 1.0 NaN 255.0
106 wa upward_air_velocity Vertical velocity m s-1 9.2.0 unstructured 1.0 NaN NaN
107 wap omega vertical velocity Pa s-1 8.2.0 unstructured 1.0 NaN NaN
108 zg geometric_height_at_full_level_center geometric height at full level center m 6.3.0 unstructured 1.0 NaN NaN
109 zos zos.TL2 surface elevation at cell center m NaN unstructured 1.0 NaN 1.0
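drop_duplicates() only removes rows that are identical in every column, so a variable whose attributes differ between datasets would still appear more than once. A minimal sketch that finds such variables and then looks up a long name by variable_id; the three-row table is hypothetical, not taken from the output above.

```python
import pandas as pd

# Hypothetical excerpt of a deduplicated attribute table.
table = pd.DataFrame(
    {
        "variable_id": ["ps", "ta", "ta"],
        "long_name": ["surface pressure", "Temperature", "temperature"],
        "units": ["Pa", "K", "K"],
    }
)

# Variables listed more than once because some attribute differs.
dupes = table[table.duplicated("variable_id", keep=False)]
print(dupes["variable_id"].unique().tolist())  # ['ta']

# Long name lookup; for repeated variables the first occurrence wins.
long_names = table.drop_duplicates("variable_id").set_index("variable_id")["long_name"]
print(long_names["ps"])  # surface pressure
```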