(dyamond-library)=
# The DYAMOND Data Library

## The Library Structure

Since the DYAMOND data sets contain several petabytes of data, most of the data is archived in DKRZ's tape archive. To access the data sets associated with the project, we provide the `get_dyamond_summer` and `get_dyamond_winter` tools (see below). Files are downloaded to the Levante hard disks and shared among all DYAMOND users. Please note that the quota for this storage is limited, so keep this in mind when downloading large amounts of data. To save space, data sets are removed from the disks after two weeks without access.

For any questions, please contact us at .

## Searching for a Data Set

### Load the hsm-tools module

To use `get_dyamond_summer` and `get_dyamond_winter`, you first need to load the `hsm-tools` module. Run the following commands, or add them to your {file}`.bash_profile` to load the module automatically when you log in. Because different versions of `slk` may conflict, we recommend unloading the `packems` module before proceeding:

```{code-block} bash
# Unload packems first to avoid conflicts between slk versions
module unload packems
module use /work/k20200/k202134/hsm-tools/outtake/module
module load hsm-tools/unstable
```

Use `slk login` to authenticate with the tape library:

```{code-block} bash
slk login
```

If you do not have access to the tape library, please see the documentation for [slk login](https://docs.dkrz.de/doc/datastorage/hsm/cli.html#slk-login) and [known issues on slk](https://docs.dkrz.de/doc/datastorage/hsm/known_issues.html#ldap-user-not-known-to-stronglink-prior-to-first-login).

### Search and retrieve your files

`get_dyamond_summer` and `get_dyamond_winter` use the same syntax. For brevity, we only show examples for `get_dyamond_summer` here; the same commands apply to `get_dyamond_winter`.

Searching for files with `get_dyamond_summer` is done using regex syntax (`.*` matches any sequence of characters).
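
To illustrate why `.*` and a bare `*` behave differently, here is a minimal sketch that uses `grep -E` as a stand-in for the tool's matcher. It assumes that `get_dyamond_summer` matches patterns as an unanchored extended regular expression, which we have not verified; the file paths are made up for the example.

```{code-block} bash
# Illustration only: grep -E stands in for get_dyamond_summer's matcher
# (assumption: unanchored POSIX extended regex). The paths are made up.
paths='FV3-3.25km/dyamond/v200_C3072_144x72.fre.nc
FV3-3.25km/dyamond/other_grid.nc
MPAS-3.75km/history.2016-08-02_00.00.00.nc'

# '.*' matches any sequence of characters, bridging the gap in the name.
# Note that '.' also matches any single character, so '3.25' is not literal.
regex='FV3-3.25km.*v200_C3072_144x72.fre.nc'

# In a regex, a bare '*' only repeats the preceding character ('m' here),
# so this pattern cannot skip over the directory part of the path.
bare_star='FV3-3.25km*v200_C3072_144x72.fre.nc'

echo "with '.*':"
printf '%s\n' "$paths" | grep -E -- "$regex"

echo "with bare '*':"
printf '%s\n' "$paths" | grep -E -- "$bare_star" || echo "(no match)"
```

With `.*`, only the first path matches; with the bare asterisk, nothing matches and `(no match)` is printed.
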
Downloaded files are printed to stdout, so they can be piped into and reused by other commands; all other output goes to stderr. Here are some examples:

```{code-block} bash
# Get the description files of all runs
get_dyamond_summer datadescription.txt
# Get all files containing MPAS-3.75km/history.2016-08-02
get_dyamond_summer MPAS-3.75km/history.2016-08-02
# A freer search, using .* to fill a gap in the file name
# (quoted so that the shell does not expand the pattern itself)
get_dyamond_summer 'FV3-3.25km.*v200_C3072_144x72.fre.nc'
# Anything from FV3-3.25km; 2>&1 redirects stderr to stdout so less shows both
get_dyamond_summer FV3-3.25km 2>&1 | less
```

The above commands return a list of files that match the search string. To retrieve the files, add the `--get` flag to the command and use `sbatch` to submit the job to the batch system:

```{code-block} bash
sbatch get_dyamond_summer --get MPAS-3.75km/history.2016-08-02
sbatch get_dyamond_summer --get 'FV3-3.25km.*v200_C3072_144x72.fre.nc'
```

The download runs in the background and **may take hours to days to complete**. You can check the status of the running job with `squeue --me` and by looking at the log file it creates: {file}`get_dyamond_summer.log${SLURM_JOB_ID}`. **You must call `sbatch` from a writable directory so that the log file can be created. Otherwise, the job will crash immediately and without any message.** You can run the command again without the `--get` flag to see which files have already been downloaded.

```{warning}
Files in the {file}`scratch` directory that `get_dyamond_summer` downloads to are deleted after two weeks without access. Download and process your data promptly.
```

```{note}
`get_dyamond_summer` searches for files using regular expressions, so please use regex syntax for patterns. For example, instead of a bare asterisk (`*`), use a dot followed by an asterisk (`.*`) to match any sequence of characters.
```
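
Because `get_dyamond_summer` prints downloaded file paths to stdout and everything else to stderr, its output can be consumed line by line in a pipe. The following is a minimal sketch of such a consumer. `summarize_files` is a hypothetical helper, not part of `hsm-tools`, and it assumes that stdout carries exactly one file path per line.

```{code-block} bash
# Hypothetical helper (not part of hsm-tools): reads one file path per line
# from stdin and reports how many of the listed files exist and their total
# size in bytes. Paths that do not exist are reported on stderr.
summarize_files() {
  local path size count=0 total=0
  while IFS= read -r path; do
    [ -n "$path" ] || continue          # skip empty lines
    if [ -f "$path" ]; then
      size=$(wc -c < "$path")           # file size in bytes
      total=$((total + size))
      count=$((count + 1))
    else
      echo "missing: $path" >&2
    fi
  done
  echo "files: $count, bytes: $total"
}
```

For example, `get_dyamond_summer MPAS-3.75km/history.2016-08-02 | summarize_files` reports how many of the listed files are on disk and how much space they occupy. Since only stdout is piped, the tool's status messages on stderr still appear in the terminal.
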