Finding files on the command line with query_yaml
To load query_yaml, use
module use /work/k20200/k202134/hsm-tools/outtake/module
module load hsm-tools/unstable
Then you can search for files with query_yaml. Calling it without any arguments displays a tree view of the nextGEMS catalog. Adding names of sub-trees limits the search (e.g. query_yaml ICON). Once you have narrowed the search down to one dataset, the contents of that dataset are listed (e.g. query_yaml ICON ngc4008).
In general, if you want to pass the output of query_yaml to cdo, run it on one specific dataset and combine --cdo with --var NAME.
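This pattern can be wrapped in a small shell function, sketched below. The name cdo_input is hypothetical and not part of hsm-tools, and the sketch assumes bash (or another shell that supports local). Its only addition to calling query_yaml directly is a guard: if query_yaml fails or prints nothing, for example because of a misspelled variable name, it stops with an error instead of handing cdo an empty input list.

```shell
# Hypothetical helper (not part of hsm-tools): print the cdo-formatted input
# for one variable of one dataset, or fail if query_yaml returns nothing.
# Usage: cdo_input VAR BRANCH [BRANCH ...]
cdo_input() {
    local var="$1"
    shift
    local out
    # The remaining arguments are the catalog branches,
    # e.g. IFS IFS_9-NEMO_25-cycle3 2D_monthly_0.25deg
    out=$(query_yaml "$@" --cdo --var "$var") || return
    if [ -z "$out" ]; then
        echo "cdo_input: no files found for variable '$var' in: $*" >&2
        return 1
    fi
    printf '%s\n' "$out"
}
# Example (requires the hsm-tools module):
#   input=$(cdo_input 2t IFS IFS_9-NEMO_25-cycle3 2D_monthly_0.25deg) &&
#       cdo -s --eccodes -infov [ -select,name=2t,timestep=1/15 $input ]
```

In the example, $input is deliberately left unquoted so that several files end up as separate arguments, just like the $(query_yaml ...) substitution in the cdo example further down.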
The full list of options is printed by the help option (query_yaml -h):
usage: query_yaml.py [-h] [-c CATALOG_FILE] [-s [SEARCH_ARGS ...]] [--uri] [--var VAR] [--cdo] [-v] [branches ...]
Query contents of a YAML catalog.
positional arguments:
  branches              specify branches of the tree to follow, e.g.
                        IFS tco2559-ng5-cycle3 2D_1h_native
options:
  -h, --help            show this help message and exit
  -c CATALOG_FILE, --catalog_file CATALOG_FILE
                        catalog to search, default = https://data.nextgems-h2020.eu/catalog.yaml
  -s [SEARCH_ARGS ...], --search_args [SEARCH_ARGS ...]
                        specify search arguments for the YAML dataset at the end of the tree, e.g.
                        zoom=5 time=P1D
  --uri                 print uris of files in this dataset
  --var VAR             only print uris for files containing VAR
  --cdo                 Format output for CDO
  -v, --verbose         print debugging output
Try things like
query_yaml
query_yaml ICON
query_yaml ICON ngc3028
query_yaml ICON ngc3028 --search_args time=PT3H zoom=5
query_yaml FESOM tco2559-ng5-cycle3 2d_vertices_daily --uri --var vice
cdo -s --eccodes -infov [ -select,name=2t,timestep=1/15 $(query_yaml IFS IFS_9-NEMO_25-cycle3 2D_monthly_0.25deg --cdo --var=2t) ]
Dealing with dataset variants
Zarr datasets with several variants
Variants are indicated in parentheses after the dataset name, e.g. ngc4008 (time, zoom).
query_yaml is fast for these datasets.
Use --search_args to get the desired file set, e.g. --search_args time=PT3H zoom=5.
Combine this with --cdo to get the decorations needed for opening the data with cdo (or other libnetcdf-based utilities). Note that the resulting dataset will still contain a lot of variables, so don't just feed it into cdo -timmean; select the variables you need first.
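As a sketch of the last point, the hypothetical function zarr_timmean below (not part of hsm-tools) picks one variant via --search_args, asks query_yaml for cdo-formatted output, and selects a single variable with cdo selname before computing the time mean. It assumes that the --cdo output is one or more whitespace-separated words that cdo accepts unchanged. The variable name tas and the search values in the example comment are assumptions; check them with query_yaml ICON ngc4008 first.

```shell
# Hypothetical helper (not part of hsm-tools): time mean of ONE variable from
# one variant of a zarr dataset. The variable is selected before averaging,
# so cdo does not have to process every variable in the store.
# Usage: zarr_timmean VAR OUTFILE "SEARCH_ARGS" BRANCH [BRANCH ...]
zarr_timmean() {
    local var="$1" outfile="$2" search="$3"
    shift 3
    local input
    # $search is unquoted on purpose: "time=PT3H zoom=5" becomes two arguments.
    input=$(query_yaml "$@" --search_args $search --cdo) || return
    if [ -z "$input" ]; then
        echo "zarr_timmean: query_yaml returned nothing for: $* ($search)" >&2
        return 1
    fi
    # $input is unquoted so that cdo receives it as separate word(s).
    cdo -timmean -selname,"$var" $input "$outfile"
}
# Example (requires the hsm-tools module; "tas" and the search values are
# assumptions, check them with: query_yaml ICON ngc4008):
#   zarr_timmean tas tas_timmean.nc "time=PT3H zoom=5" ICON ngc4008
```

The search arguments are passed as one quoted string so the function can tell them apart from the catalog branches without extra parsing.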
Datasets spread over several netCDF files (no kerchunk)
query_yaml will be slow to show the contents of the dataset (without --uri), as it has to open all files to check their contents.
Using query_yaml with --uri but without --var NAME will dump all files on you, regardless of which variable you are interested in (this may or may not be useful).
Combine --uri with --var to get the files for a specific variable: query_yaml FESOM IFS_4.4-FESOM_5-cycle3 2D_1h_native --uri --var sst
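A minimal sketch of this workflow as a shell function; the name extract_var is hypothetical and not part of hsm-tools. It assumes that query_yaml --uri --var prints whitespace-separated paths that cdo can open directly, and that they come out in chronological order. cdo select processes its input files in the order given and writes only the requested variable, which for an hourly native dataset may still be a large file.

```shell
# Hypothetical helper (not part of hsm-tools): collect the files of a dataset
# that contain VAR and extract that variable into a single output file.
# Usage: extract_var VAR OUTFILE BRANCH [BRANCH ...]
extract_var() {
    local var="$1" outfile="$2"
    shift 2
    local files
    files=$(query_yaml "$@" --uri --var "$var") || return
    if [ -z "$files" ]; then
        echo "extract_var: no files contain '$var' in: $*" >&2
        return 1
    fi
    # select accepts several input files; $files is unquoted so that each
    # path becomes its own argument.
    cdo select,name="$var" $files "$outfile"
}
# Example (requires the hsm-tools module):
#   extract_var sst sst.nc FESOM IFS_4.4-FESOM_5-cycle3 2D_1h_native
```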
Datasets represented via kerchunk (some netCDF, FDB/GRIB)
query_yaml will be fast.
Plain --uri will only lead you to the kerchunk index.
Use --cdo with --var NAME to get the actual file names.
File names are sorted alphabetically as a best guess. Whether this is the correct order in time depends on the person who created the files.
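Because the order is only a guess, it can be worth checking before you concatenate anything. The hypothetical function check_time_order below (not part of hsm-tools) prints the first timestamp of each file with cdo showtimestamp and lets sort -C check that they are in ascending order. It assumes that the --cdo --var output is a whitespace-separated list of files that cdo can open one at a time; for GRIB files you may need to add --eccodes to the cdo call, as in the IFS example above. It only compares start times, so overlaps or gaps between files go unnoticed.

```shell
# Hypothetical helper (not part of hsm-tools): check whether the files that
# query_yaml --cdo --var returns are in chronological order.
# Usage:  check_time_order VAR BRANCH [BRANCH ...]
# Status: 0 = ascending start times (ties allowed), 1 = out of order,
#         2 = no files found (a failing query_yaml passes on its own status).
check_time_order() {
    local var="$1"
    shift
    local files
    files=$(query_yaml "$@" --cdo --var "$var") || return
    if [ -z "$files" ]; then
        echo "check_time_order: no files found for '$var' in: $*" >&2
        return 2
    fi
    # Print the first timestamp of every file, one per line, in query_yaml's
    # order; remove any spaces; sort -C only checks the order, printing nothing.
    # ISO 8601 timestamps sort chronologically as plain strings (C locale).
    local f
    for f in $files; do
        cdo -s showtimestamp -seltimestep,1 "$f"
    done | tr -d ' ' | LC_ALL=C sort -C
}
# Example (requires the hsm-tools module):
#   check_time_order 2t IFS IFS_9-NEMO_25-cycle3 2D_monthly_0.25deg ||
#       echo "file order does not match time order" >&2
```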
see also