Working with Intake-ESM and the Tape archive
The combination of intake and the tape archive is still very experimental. Don’t be surprised if things don’t work as described. Simply ask for help using the usual channels…
As disk space is limited, files have to be moved to the tape archive on a regular basis. In the Intake-ESM catalog, these files then get
slk:// URIs. To avoid multiple local copies of these files, we currently use a central cache directory for the downloads and regularly remove unused files from it.
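To see how much of a selection lives on tape, the URIs can be grouped by their scheme. The following is a minimal sketch: the helper name `split_by_location` and the example paths are hypothetical, and in practice the URIs would come from the catalog (for instance from the relevant column of `search_result.df`, whose name depends on the catalog).

```python
from urllib.parse import urlsplit


def split_by_location(uris):
    """Separate tape-archive URIs (slk://) from everything else."""
    on_tape, on_disk = [], []
    for uri in uris:
        # Files that were moved to tape carry the "slk" scheme.
        if urlsplit(uri).scheme == "slk":
            on_tape.append(uri)
        else:
            on_disk.append(uri)
    return on_tape, on_disk


# Hypothetical example paths, for illustration only.
example = [
    "slk:///arch/project/run1/atm_2020.nc",
    "/work/project/run1/atm_2021.nc",
]
on_tape, on_disk = split_by_location(example)
print(f"{len(on_tape)} on tape, {len(on_disk)} on disk")
```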
You need to be in project 1153 to download files to the central cache directory. Join via luv.
To use the tape system, you need to authenticate with slk on a monthly basis. Start a shell and run
module load slk
slk login
Otherwise you will receive slk error messages along the lines of
... Please check your SLK config file: /home/.../.slk/config.json ...
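A quick check before starting a long download can catch a missing or stale login early. The sketch below is a heuristic only. It assumes that the config file sits at `~/.slk/config.json` (as the error message suggests), that `slk login` rewrites this file, and that "monthly" means roughly 30 days. None of this is confirmed by the slk documentation, and the function name is hypothetical.

```python
import time
from pathlib import Path

# Assumption: location taken from the error message above.
SLK_CONFIG = Path.home() / ".slk" / "config.json"
MAX_AGE_DAYS = 30  # assumption: "monthly" read as roughly 30 days


def slk_login_looks_fresh(config=SLK_CONFIG, max_age_days=MAX_AGE_DAYS, now=None):
    """Return True if the config file exists and was modified recently.

    Heuristic only: it assumes that `slk login` rewrites the file, so the
    modification time approximates the time of the last login.
    """
    config = Path(config)
    if not config.is_file():
        return False
    now = time.time() if now is None else now
    age_days = (now - config.stat().st_mtime) / 86400
    return age_days < max_age_days
```

If `slk_login_looks_fresh()` returns `False`, running `slk login` again in a shell is the safe choice.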
To provide access to these files, there is the slkspec python library.
When working in Python, you need to either install slkspec yourself (and keep it up to date), or load it from a local copy by adding
module use /work/k20200/k202134/hsm-tools/outtake/module
module load hsm-tools/unstable
to your ~/.kernel_env file (create it if necessary). This should usually contain an up-to-date version of slkspec and friends. In your Python code, you can then use
import outtake
to load the necessary modules.
Once you have narrowed down your search to the data you actually want to use (see other examples for reference), you can either use the usual
search_result.to_dataset_dict() call, which will then trigger downloads, or use
outtake.get(search_result) to make the downloading a bit more explicit and separate it from the remainder of the processing. In either case, files that are not in the central cache will be downloaded to the central cache.
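The two options can be combined in a small helper that keeps the download step separate from opening the data. This is a sketch under stated assumptions: `fetch_then_load` and its `prefetch` parameter are hypothetical names, `prefetch` stands in for `outtake.get`, and nothing is assumed about what `outtake.get` returns.

```python
def fetch_then_load(search_result, prefetch=None):
    """Optionally download tape files first, then open the datasets.

    `prefetch` is a hypothetical hook; in practice it would be `outtake.get`.
    """
    if prefetch is not None:
        # Explicit download step: missing files go to the central cache.
        prefetch(search_result)
    # With the files already cached, this call should have nothing left to fetch.
    return search_result.to_dataset_dict()
```

With the module setup described above, a call would look like `datasets = fetch_then_load(search_result, prefetch=outtake.get)`. Leaving `prefetch` unset falls back to the plain `to_dataset_dict()` behaviour, where downloads are triggered implicitly.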