Accessing the metadata of variables#
This notebook creates a pandas DataFrame with all metadata attributes of the variables in a simulation.
Setup:#
[1]:
# basics
import intake
import dask # memory-efficient parallel computation and delayed execution (lazy evaluation).
dask.config.set(**{"array.slicing.split_large_chunks": True})
import pandas as pd
pd.set_option("max_colwidth", None) # makes the tables render better
# for ifs-fesom (in case you want to get the variables for gribscan-processed data sets)
try:
    import gribscan
except ImportError:
    # install on the fly if the package is missing (IPython magic, notebook only)
    %pip install gribscan
    import gribscan
def get_from_cat(catalog, columns):
    """A helper function for inspecting an intake catalog.

    Call with the catalog to be inspected and a column name or a list of columns of interest.
    Returns the unique, sorted combinations of values in these columns."""
    if isinstance(columns, str):
        columns = [columns]
    return (
        catalog.df[columns]
        .drop_duplicates()
        .sort_values(columns)
        .reset_index(drop=True)
    )
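To see what the helper returns without access to the real catalog, here is a minimal sketch. It assumes a stand-in object (`toy_cat`, hypothetical) that only provides a `.df` attribute, which is all that `get_from_cat` touches. The column values are made up.

```python
from types import SimpleNamespace

import pandas as pd


def get_from_cat(catalog, columns):
    """Return the sorted, unique combinations of `columns` in `catalog.df`."""
    if isinstance(columns, str):
        columns = [columns]
    return (
        catalog.df[columns]
        .drop_duplicates()
        .sort_values(columns)
        .reset_index(drop=True)
    )


# Hypothetical stand-in for an intake-esm catalog: only `.df` is needed here.
toy_cat = SimpleNamespace(
    df=pd.DataFrame(
        {
            "simulation_id": ["sim_b", "sim_a", "sim_b", "sim_a"],
            "frequency": ["1day", "1day", "1day", "6hour"],
            "uri": ["f1.nc", "f2.nc", "f3.nc", "f4.nc"],
        }
    )
)

single = get_from_cat(toy_cat, "simulation_id")  # a single column name
pairs = get_from_cat(toy_cat, ["simulation_id", "frequency"])  # several columns
print(single)
print(pairs)
```

Passing a list returns the unique combinations of the listed columns, which is handy for checking which frequencies exist for which simulation.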
Choose the catalog you want to inspect:
[2]:
catalog_file = "/work/ka1081/Catalogs/dyamond-nextgems.json"
cat = intake.open_esm_datastore(catalog_file)
cat
ICON-ESM catalog with 130 dataset(s) from 88823 asset(s):
| | unique |
---|---|
variable_id | 546 |
project | 2 |
institution_id | 12 |
source_id | 19 |
experiment_id | 4 |
simulation_id | 12 |
realm | 5 |
frequency | 12 |
time_reduction | 4 |
grid_label | 7 |
level_type | 9 |
time_min | 918 |
time_max | 1094 |
grid_id | 3 |
format | 1 |
uri | 88813 |
File Selection:#
[3]:
get_from_cat(cat, "simulation_id")
[3]:
| | simulation_id |
---|---|
0 | dpp0014 |
1 | dpp0029 |
2 | dpp0052 |
3 | dpp0054 |
4 | dpp0065 |
5 | dpp0066 |
6 | dpp0067 |
7 | hlq0 |
8 | hmrt |
9 | hmt0 |
10 | hmwz |
11 | r1i1p1f1 |
We want to look up the long names of the variables in one simulation, here ‘dpp0067’. We don’t want to load all files of the simulation, as that would take too long and is not necessary. The idea is that if we select only the files with the lowest ‘time_min’, the cropped catalog still contains all variables, because every variable has to be present in each time interval.
[4]:
my_simulation_id = "dpp0067"
time_min = get_from_cat(cat.search(simulation_id=my_simulation_id), "time_min")
time_min
[4]:
| | time_min |
---|---|
0 | 2020-01-20T00:00:00 |
1 | 2020-01-21T00:00:00 |
2 | 2020-01-22T00:00:00 |
3 | 2020-01-23T00:00:00 |
4 | 2020-01-24T00:00:00 |
... | ... |
68 | 2020-03-28T00:00:00 |
69 | 2020-03-29T00:00:00 |
70 | 2020-03-30T00:00:00 |
71 | 2020-03-31T00:00:00 |
72 | 2020-04-01T00:00:00 |
73 rows × 1 columns
Now we grab the files with the lowest time_min value:
[5]:
first_files = cat.search(simulation_id=my_simulation_id, time_min=time_min.values[0])
first_files
ICON-ESM catalog with 5 dataset(s) from 14 asset(s):
| | unique |
---|---|
variable_id | 106 |
project | 1 |
institution_id | 1 |
source_id | 1 |
experiment_id | 1 |
simulation_id | 1 |
realm | 2 |
frequency | 3 |
time_reduction | 2 |
grid_label | 1 |
level_type | 2 |
time_min | 1 |
time_max | 1 |
grid_id | 1 |
format | 1 |
uri | 14 |
Now only a handful of files are left.
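The reasoning behind this selection can be checked on a small example. The sketch below uses a made-up table in place of `cat.df` (variable names, times and file names are invented) and assumes, as stated above, that every variable has a file in each time interval.

```python
import pandas as pd

# Made-up catalog rows: each variable has one file per time interval.
toy = pd.DataFrame(
    {
        "variable_id": ["tas", "pr", "tas", "pr"],
        "time_min": [
            "2020-01-20T00:00:00",
            "2020-01-20T00:00:00",
            "2020-01-21T00:00:00",
            "2020-01-21T00:00:00",
        ],
        "uri": ["tas_0120.nc", "pr_0120.nc", "tas_0121.nc", "pr_0121.nc"],
    }
)

# ISO 8601 timestamps sort chronologically even when stored as strings.
earliest = toy["time_min"].min()
first_rows = toy[toy["time_min"] == earliest]

# The cropped table should still list every variable of the full table.
same_variables = set(first_rows["variable_id"]) == set(toy["variable_id"])
print(len(first_rows), "of", len(toy), "rows kept; all variables present:", same_variables)
```

If the check came out `False` for a real catalog, the assumption would not hold for that simulation and more than the first time interval would have to be loaded.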
Loading the data#
Now we just have to load the files listed in first_files.
How to access variable_information:#
[6]:
# Loading sometimes fails; printing the exception helps to find out for which dataset.
try:
    dataset_dict = first_files.to_dataset_dict()
except Exception as ex:
    template = "An exception of type {0} occurred. Arguments:\n{1!r}"
    message = template.format(type(ex).__name__, ex.args)
    print(message)
--> The keys in the returned dictionary of datasets are constructed as follows:
'project.institution_id.source_id.experiment_id.simulation_id.realm.frequency.time_reduction.grid_label.level_type'
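Because the keys follow this fixed pattern, they can be split back into their components. The following is a minimal sketch: the helper `split_dataset_key` and the example key are hypothetical, and it assumes that no field value contains a dot itself.

```python
# Field order as printed by to_dataset_dict() above.
KEY_FIELDS = (
    "project",
    "institution_id",
    "source_id",
    "experiment_id",
    "simulation_id",
    "realm",
    "frequency",
    "time_reduction",
    "grid_label",
    "level_type",
)


def split_dataset_key(key):
    """Map the dot-separated parts of a dataset key to their field names."""
    parts = key.split(".")
    if len(parts) != len(KEY_FIELDS):
        raise ValueError(f"expected {len(KEY_FIELDS)} fields, got {len(parts)}: {key!r}")
    return dict(zip(KEY_FIELDS, parts))


# Hypothetical key, for illustration only (not taken from the catalog).
example_key = "proj.inst.model.exp.dpp0067.atm.1day.mean.gn.ml"
fields = split_dataset_key(example_key)
print(fields["simulation_id"], fields["realm"], fields["frequency"])
```

Applied to `dataset_dict.keys()`, this makes it easier to tell which realm or frequency a given dataset belongs to.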
[7]:
all_atts = []
for name, data in dataset_dict.items():
    # we take the detour via dict and ** to add the variable name
    all_atts.extend(
        dict(variable_id=var, **data[var].attrs) for var in data.data_vars
    )
frame = pd.DataFrame(all_atts)
frame
[7]:
| | variable_id | standard_name | long_name | units | param | CDI_grid_type | number_of_grid_in_reference | level_type | code |
---|---|---|---|---|---|---|---|---|---|
0 | ps | surface_air_pressure | surface pressure | Pa | 0.3.0 | unstructured | 1.0 | NaN | NaN |
1 | psl | mean sea level pressure | mean sea level pressure | Pa | 1.3.0 | unstructured | 1.0 | NaN | NaN |
2 | rsdt | toa_incoming_shortwave_flux | toa incident shortwave radiation | W m-2 | 201.4.0 | unstructured | 1.0 | toa | NaN |
3 | rsut | toa_outgoing_shortwave_flux | toa outgoing shortwave radiation | W m-2 | 8.4.0 | unstructured | 1.0 | toa | NaN |
4 | rsutcs | toa_outgoing_shortwave_flux_assuming_clear_sky | toa outgoing clear-sky shortwave radiation | W m-2 | 208.4.0 | unstructured | 1.0 | toa | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
130 | ua | eastward_wind | Zonal wind | m s-1 | 2.2.0 | unstructured | 1.0 | NaN | NaN |
131 | va | northward_wind | Meridional wind | m s-1 | 3.2.0 | unstructured | 1.0 | NaN | NaN |
132 | height_2_bnds | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
133 | wa | upward_air_velocity | Vertical velocity | m s-1 | 9.2.0 | unstructured | 1.0 | NaN | NaN |
134 | cl | cl | cloud area fraction | m2 m-2 | 22.6.0 | unstructured | 1.0 | NaN | NaN |
135 rows × 9 columns
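The `dict(variable_id=var, **attrs)` pattern can be tried without loading any data. The sketch below assumes plain dictionaries as stand-ins for the `.attrs` of each variable; the dataset names and attribute values are made up.

```python
import pandas as pd

# Plain dicts standing in for data[var].attrs; all values are invented.
toy_datasets = {
    "atm": {
        "tas": {"long_name": "temperature in 2m", "units": "K"},
        "height_bnds": {},  # bounds variables often carry no attributes
    },
    "oce": {
        "to": {"long_name": "sea water potential temperature", "units": "deg C", "code": 2},
    },
}

rows = []
for name, variables in toy_datasets.items():
    # dict(...) merges the variable name with its attributes into one record
    rows.extend(dict(variable_id=var, **attrs) for var, attrs in variables.items())

toy_frame = pd.DataFrame(rows)
print(toy_frame)
```

Attributes that a variable does not have end up as NaN, which is why rows such as `height_2_bnds` above are almost empty. It also explains why integer codes are shown as floats like `255.0`: pandas stores a numeric column with missing values as floating point by default. Note that the pattern would raise a `TypeError` if a variable had an attribute that is itself called `variable_id`.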
Let’s drop the duplicates and show the full table.
[8]:
pd.set_option("display.max_rows", None)
frame.drop_duplicates().sort_values(list(frame.columns)).reset_index(drop=True)
[8]:
| | variable_id | standard_name | long_name | units | param | CDI_grid_type | number_of_grid_in_reference | level_type | code |
---|---|---|---|---|---|---|---|---|---|
0 | A_tracer_v_to | A_tracer_v_to | sea water potential temperature(A_tracer_v) | kg/kg | NaN | unstructured | 1.0 | NaN | 255.0 |
1 | A_veloc_v | A_veloc_v | vertical velocity diffusion | kg/kg | NaN | unstructured | 3.0 | NaN | 255.0 |
2 | Qbot | Qbot | Conductive heat flux at ice-ocean interface | W/m^2 | NaN | unstructured | 1.0 | NaN | 255.0 |
3 | Qtop | Qtop | Energy flux available for surface melting | W/m^2 | NaN | unstructured | 1.0 | NaN | 255.0 |
4 | Wind_Speed_10m | Wind_Speed_10m | Wind Speed at 10m height | m/s | NaN | unstructured | 1.0 | NaN | 255.0 |
5 | atmos_fluxes_FrshFlux_Evaporation | atmos_fluxes_FrshFlux_Evaporation | atmos_fluxes_FrshFlux_Evaporation | [m/s] | NaN | unstructured | 1.0 | NaN | 255.0 |
6 | atmos_fluxes_FrshFlux_Precipitation | atmos_fluxes_FrshFlux_Precipitation | atmos_fluxes_FrshFlux_Precipitation | [m/s] | NaN | unstructured | 1.0 | NaN | 255.0 |
7 | atmos_fluxes_FrshFlux_Runoff | atmos_fluxes_FrshFlux_Runoff | atmos_fluxes_FrshFlux_Runoff | [m/s] | NaN | unstructured | 1.0 | NaN | 255.0 |
8 | atmos_fluxes_FrshFlux_SnowFall | atmos_fluxes_FrshFlux_SnowFall | atmos_fluxes_FrshFlux_SnowFall | [m/s] | NaN | unstructured | 1.0 | NaN | 255.0 |
9 | atmos_fluxes_HeatFlux_Latent | atmos_fluxes_HeatFlux_Latent | atmos_fluxes_HeatFlux_Latent | [W/m2] | NaN | unstructured | 1.0 | NaN | 255.0 |
10 | atmos_fluxes_HeatFlux_LongWave | atmos_fluxes_HeatFlux_LongWave | atmos_fluxes_HeatFlux_LongWave | [W/m2] | NaN | unstructured | 1.0 | NaN | 255.0 |
11 | atmos_fluxes_HeatFlux_Sensible | atmos_fluxes_HeatFlux_Sensible | atmos_fluxes_HeatFlux_Sensible | [W/m2] | NaN | unstructured | 1.0 | NaN | 255.0 |
12 | atmos_fluxes_HeatFlux_ShortWave | atmos_fluxes_HeatFlux_ShortWave | atmos_fluxes_HeatFlux_ShortWave | [W/m2] | NaN | unstructured | 1.0 | NaN | 255.0 |
13 | atmos_fluxes_HeatFlux_Total | atmos_fluxes_HeatFlux_Total | atmos_fluxes_HeatFlux_Total | [W/m2] | NaN | unstructured | 1.0 | NaN | 255.0 |
14 | atmos_fluxes_stress_x | atmos_fluxes_stress_x | atmos_fluxes_stress_x | Pa | NaN | unstructured | 1.0 | NaN | 255.0 |
15 | atmos_fluxes_stress_xw | atmos_fluxes_stress_xw | atmos_fluxes_stress_xw | Pa | NaN | unstructured | 1.0 | NaN | 255.0 |
16 | atmos_fluxes_stress_y | atmos_fluxes_stress_y | atmos_fluxes_stress_y | Pa | NaN | unstructured | 1.0 | NaN | 255.0 |
17 | atmos_fluxes_stress_yw | atmos_fluxes_stress_yw | atmos_fluxes_stress_yw | Pa | NaN | unstructured | 1.0 | NaN | 255.0 |
18 | cl | cl | cloud area fraction | m2 m-2 | 22.6.0 | unstructured | 1.0 | NaN | NaN |
19 | cli | cli | specific cloud ice content | kg kg-1 | 82.1.0 | unstructured | 1.0 | NaN | NaN |
20 | clivi | total_cloud_ice | vertically integrated cloud ice | kg m-2 | 70.1.0 | unstructured | 1.0 | atmosphere | NaN |
21 | cllvi | total_cloud_water | vertically integrated cloud water | kg m-2 | 69.1.0 | unstructured | 1.0 | atmosphere | NaN |
22 | clt | clt | total cloud cover | m2 m-2 | 1.6.0 | unstructured | 1.0 | NaN | NaN |
23 | clw | clw | specific cloud water content | kg kg-1 | 22.1.0 | unstructured | 1.0 | NaN | NaN |
24 | conc | conc | ice concentration in each ice class | NaN | NaN | unstructured | 1.0 | NaN | 255.0 |
25 | cptgz | cptgz | dry static energy | m2 s-2 | NaN | unstructured | 1.0 | NaN | 255.0 |
26 | cptgzvi | vertically integrated dry static energy | vert_int_dry_static_energy | m2 s-2 | NaN | unstructured | 1.0 | atmosphere | 255.0 |
27 | evspsbl | evap | evaporation | kg m-2 s-1 | 6.1.0 | unstructured | 1.0 | NaN | NaN |
28 | gpsm | geopotential_above_surface | geopotential above surface | m2 s-2 | 4.3.0 | unstructured | 1.0 | NaN | NaN |
29 | height_2_bnds | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
30 | height_bnds | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
31 | hfls | lhflx | latent heat flux | W m-2 | 10.0.0 | unstructured | 1.0 | NaN | NaN |
32 | hfss | shflx | sensible heat flux | W m-2 | 11.0.0 | unstructured | 1.0 | NaN | NaN |
33 | hi | hi | ice thickness | m | NaN | unstructured | 1.0 | NaN | 255.0 |
34 | hs | hs | snow thickness | m | NaN | unstructured | 1.0 | NaN | 255.0 |
35 | hus | specific_humidity | Specific humidity | kg kg-1 | 0.1.0 | unstructured | 1.0 | NaN | NaN |
36 | hydro_discharge_box | discharge | local discharge | kg m-2 s-1 | NaN | unstructured | 1.0 | NaN | 255.0 |
37 | hydro_discharge_ocean_box | discharge_ocean | discharge to the ocean | m3 s-1 | NaN | unstructured | 1.0 | NaN | 255.0 |
38 | hydro_drainage_box | drainage | drainage | kg m-2 s-1 | NaN | unstructured | 1.0 | NaN | 255.0 |
39 | hydro_fract_snow_box | fraction of snow on surface | NaN | - | 202.0.1 | unstructured | 1.0 | NaN | NaN |
40 | hydro_fract_water_box | fraction of water on surface surface_wet_fraction | NaN | - | 201.0.1 | unstructured | 1.0 | NaN | NaN |
41 | hydro_q_snocpymlt_box | heating_snow_cpy_melt | NaN | W m-2 | NaN | unstructured | 1.0 | NaN | 255.0 |
42 | hydro_runoff_box | surface_runoff | Surface runoff | kg m-2 s-1 | NaN | unstructured | 1.0 | NaN | 255.0 |
43 | hydro_transpiration_box | surface_transpiration | Transpiration from surface | kg m-2 s-1 | NaN | unstructured | 1.0 | NaN | 255.0 |
44 | hydro_w_ice_sl_box | Ice content in soil layers | NaN | m water equivalent | NaN | unstructured | 1.0 | NaN | 255.0 |
45 | hydro_w_skin_box | skin_reservoir | Water content in skin reservoir of surface | m water equivalent | 211.0.1 | unstructured | 1.0 | NaN | NaN |
46 | hydro_w_snow_box | Water content of snow reservoir on surface | NaN | m water equivalent | 212.0.1 | unstructured | 1.0 | NaN | NaN |
47 | hydro_w_soil_column_box | Water content in the whole soil column | NaN | m water equivalent | 213.0.1 | unstructured | 1.0 | NaN | NaN |
48 | hydro_w_soil_sl_box | Water content in soil layers | NaN | m water equivalent | NaN | unstructured | 1.0 | NaN | 255.0 |
49 | ice_u | ice_u | zonal velocity | m/s | NaN | unstructured | 1.0 | NaN | 255.0 |
50 | ice_v | ice_v | meridional velocity | m/s | NaN | unstructured | 1.0 | NaN | 255.0 |
51 | mlotst | mlotst | ocean_mixed_layer_thickness_defined_by_sigma_t | m | NaN | unstructured | 1.0 | NaN | 255.0 |
52 | pfull | air_pressure | Pressure | Pa | 0.3.0 | unstructured | 1.0 | NaN | NaN |
53 | pr | pr | precipitation flux | kg m-2 s-1 | 52.1.0 | unstructured | 1.0 | NaN | NaN |
54 | prlr | prlr | large-scale precipitation flux (water) | kg m-2 s-1 | 77.1.0 | unstructured | 1.0 | NaN | NaN |
55 | prls | prls | large-scale precipitation flux (snow) | kg m-2 s-1 | 59.1.0 | unstructured | 1.0 | NaN | NaN |
56 | prw | total_vapour | vertically integrated water vapour | kg m-2 | 64.1.0 | unstructured | 1.0 | atmosphere | NaN |
57 | ps | surface_air_pressure | surface pressure | Pa | 0.3.0 | unstructured | 1.0 | NaN | NaN |
58 | psl | mean sea level pressure | mean sea level pressure | Pa | 1.3.0 | unstructured | 1.0 | NaN | NaN |
59 | qgvi | total_graupel | vertically integrated graupel | kg m-2 | 223.1.0 | unstructured | 1.0 | atmosphere | NaN |
60 | qrvi | total_rain | vertically integrated rain | kg m-2 | 221.1.0 | unstructured | 1.0 | atmosphere | NaN |
61 | qsvi | total_snow | vertically integrated snow | kg m-2 | 222.1.0 | unstructured | 1.0 | atmosphere | NaN |
62 | rld | downwelling_longwave_flux_in_air | downwelling longwave radiation | W m-2 | 3.5.0 | unstructured | 1.0 | NaN | NaN |
63 | rlds | surface_downwelling_longwave_flux_in_air | surface downwelling longwave radiation | W m-2 | 3.5.0 | unstructured | 1.0 | NaN | NaN |
64 | rldscs | surface_downwelling_longwave_flux_in_air_assuming_clear_sky | surface downwelling clear-sky longwave radiation | W m-2 | 203.5.0 | unstructured | 1.0 | NaN | NaN |
65 | rlu | upwelling_longwave_flux_in_air | upwelling longwave radiation | W m-2 | 4.5.0 | unstructured | 1.0 | NaN | NaN |
66 | rlus | surface_upwelling_longwave_flux_in_air | surface upwelling longwave radiation | W m-2 | 199.5.0 | unstructured | 1.0 | NaN | NaN |
67 | rlut | toa_outgoing_longwave_flux | toa outgoing longwave radiation | W m-2 | 4.5.0 | unstructured | 1.0 | toa | NaN |
68 | rlutcs | toa_outgoing_longwave_flux_assuming_clear_sky | toa outgoing clear-sky longwave radiation | W m-2 | 204.5.0 | unstructured | 1.0 | toa | NaN |
69 | rsd | downwelling_shortwave_flux_in_air | downwelling shortwave radiation | W m-2 | 7.4.0 | unstructured | 1.0 | NaN | NaN |
70 | rsds | surface_downwelling_shortwave_flux_in_air | surface downwelling shortwave radiation | W m-2 | 7.4.0 | unstructured | 1.0 | NaN | NaN |
71 | rsdscs | surface_downwelling_shortwave_flux_in_air_assuming_clear_sky | surface downwelling clear-sky shortwave radiation | W m-2 | 207.4.0 | unstructured | 1.0 | NaN | NaN |
72 | rsdt | toa_incoming_shortwave_flux | toa incident shortwave radiation | W m-2 | 201.4.0 | unstructured | 1.0 | toa | NaN |
73 | rsu | upwelling_shortwave_flux_in_air | upwelling shortwave radiation | W m-2 | 8.4.0 | unstructured | 1.0 | NaN | NaN |
74 | rsus | surface_upwelling_shortwave_flux_in_air | surface upwelling shortwave radiation | W m-2 | 199.4.0 | unstructured | 1.0 | NaN | NaN |
75 | rsuscs | surface_upwelling_shortwave_flux_in_air_assuming_clear_sky | surface upwelling clear-sky shortwave radiation | W m-2 | 209.4.0 | unstructured | 1.0 | NaN | NaN |
76 | rsut | toa_outgoing_shortwave_flux | toa outgoing shortwave radiation | W m-2 | 8.4.0 | unstructured | 1.0 | toa | NaN |
77 | rsutcs | toa_outgoing_shortwave_flux_assuming_clear_sky | toa outgoing clear-sky shortwave radiation | W m-2 | 208.4.0 | unstructured | 1.0 | toa | NaN |
78 | sea_level_pressure | Sea_Level_Pressure | Sea Level Pressure | Pa | NaN | unstructured | 1.0 | NaN | 255.0 |
79 | sfcwind | sfcwind | 10m windspeed | m s-1 | 1.2.0 | unstructured | 1.0 | NaN | NaN |
80 | sic | sea_ice_cover | fraction of ocean covered by sea ice | NaN | 0.2.10 | unstructured | 1.0 | NaN | NaN |
81 | sit | siced | sea ice thickness | m | 1.2.10 | unstructured | 1.0 | NaN | NaN |
82 | so | sea_water_salinity | sea water salinity | psu | NaN | unstructured | 1.0 | NaN | 5.0 |
83 | soil_depth_energy_bnds | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
84 | soil_depth_water_bnds | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
85 | sse_grnd_hflx_old_box | grnd_hflx_old | Ground heat flux (old) | J m-2 s-1 | NaN | unstructured | 1.0 | NaN | 255.0 |
86 | sse_hcap_grnd_old_box | heat_capacity_ground_old | Ground heat capacity (old) | J m-2 K-1 | NaN | unstructured | 1.0 | NaN | 255.0 |
87 | sse_t_soil_sl_box | soil_temperature | NaN | K | NaN | unstructured | 1.0 | NaN | 255.0 |
88 | ta | air_temperature | Temperature | K | 0.0.0 | unstructured | 1.0 | NaN | NaN |
89 | tas | tas | temperature in 2m | K | 0.0.0 | unstructured | 1.0 | NaN | NaN |
90 | tauu | u_stress | u-momentum flux at the surface | N m-2 | 17.2.0 | unstructured | 1.0 | NaN | NaN |
91 | tauv | v_stress | v-momentum flux at the surface | N m-2 | 18.2.0 | unstructured | 1.0 | NaN | NaN |
92 | tke | tke | turbulent kinetic energy | m2 s-2 | NaN | unstructured | 1.0 | NaN | 255.0 |
93 | to | sea_water_potential_temperature | sea water potential temperature | deg C | NaN | unstructured | 1.0 | NaN | 2.0 |
94 | ts | surface_temperature | surface temperature | K | 0.0.0 | unstructured | 1.0 | NaN | NaN |
95 | turb_fact_q_air_box | fact_q_air | NaN | NaN | NaN | unstructured | 1.0 | NaN | 255.0 |
96 | turb_fact_qsat_srf_box | fact_qsat_srf | NaN | NaN | NaN | unstructured | 1.0 | NaN | 255.0 |
97 | turb_fact_qsat_trans_srf_box | fact_qsat_trans_srf | NaN | NaN | NaN | unstructured | 1.0 | NaN | 255.0 |
98 | u | u | u zonal velocity component | m/s | NaN | unstructured | 1.0 | NaN | 255.0 |
99 | ua | eastward_wind | Zonal wind | m s-1 | 2.2.0 | unstructured | 1.0 | NaN | NaN |
100 | uas | uas | zonal wind in 10m | m s-1 | 2.2.0 | unstructured | 1.0 | NaN | NaN |
101 | v | v | v meridional velocity component | m/s | NaN | unstructured | 1.0 | NaN | 255.0 |
102 | va | northward_wind | Meridional wind | m s-1 | 3.2.0 | unstructured | 1.0 | NaN | NaN |
103 | vas | vas | meridional wind in 10m | m s-1 | 3.2.0 | unstructured | 1.0 | NaN | NaN |
104 | vor | relative_vorticity_on_cells | Vorticity | s-1 | 12.2.0 | unstructured | 1.0 | NaN | NaN |
105 | w | w | vertical velocity at cells | m/s | NaN | unstructured | 1.0 | NaN | 255.0 |
106 | wa | upward_air_velocity | Vertical velocity | m s-1 | 9.2.0 | unstructured | 1.0 | NaN | NaN |
107 | wap | omega | vertical velocity | Pa s-1 | 8.2.0 | unstructured | 1.0 | NaN | NaN |
108 | zg | geometric_height_at_full_level_center | geometric height at full level center | m | 6.3.0 | unstructured | 1.0 | NaN | NaN |
109 | zos | zos.TL2 | surface elevation at cell center | m | NaN | unstructured | 1.0 | NaN | 1.0 |
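For looking up a single long name, a table like this can be turned into a Series indexed by variable name. This is a minimal sketch on a made-up frame with the same column names; it assumes each variable has only one distinct long name after dropping duplicates, otherwise the lookup would return several entries.

```python
import pandas as pd

# Made-up frame with the same column names as the table above.
toy_frame = pd.DataFrame(
    {
        "variable_id": ["tas", "tas", "pr", "height_bnds"],
        "long_name": ["temperature in 2m", "temperature in 2m", "precipitation flux", None],
        "units": ["K", "K", "kg m-2 s-1", None],
    }
)

lookup = (
    toy_frame.drop_duplicates()
    .dropna(subset=["long_name"])  # skip variables without a long name
    .set_index("variable_id")["long_name"]
)
print(lookup["tas"])
```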