Working with Intake-ESM and the Tape archive
Warning
The combination of intake and the tape archive is still very experimental. Don’t be surprised if things don’t work as described. Simply ask for help using the usual channels…
Because disk space is limited, files are regularly moved to the tape archive. In the Intake-ESM catalog, these files then get slk:// URIs. To avoid multiple local copies of these files, downloads currently go to a central cache directory, from which unused files are removed on a regular basis.
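Before opening a search result, it can help to know how many of its files live on tape. The following is a minimal sketch that relies only on the slk:// prefix mentioned above; the function name and the example paths are hypothetical and chosen for illustration.

```python
from typing import Iterable

TAPE_SCHEME = "slk://"  # URI prefix the catalog uses for files on tape


def split_by_location(uris: Iterable[str]) -> tuple[list[str], list[str]]:
    """Return (on_tape, elsewhere) for an iterable of catalog URIs."""
    on_tape, elsewhere = [], []
    for uri in uris:
        uri = str(uri)
        if uri.startswith(TAPE_SCHEME):
            on_tape.append(uri)
        else:
            elsewhere.append(uri)
    return on_tape, elsewhere


# Hypothetical example paths, for illustration only.
example = [
    "slk:///arch/xx0000/run1/atm_2d_1990.tar/atm_2d_1990.nc",
    "/work/xx0000/run1/atm_2d_1991.nc",
]
on_tape, elsewhere = split_by_location(example)
print(f"{len(on_tape)} file(s) on tape, {len(elsewhere)} elsewhere")
```

With a real search result, the URIs would typically come from one column of search_result.df; which column holds them depends on the catalog, so check its columns first.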
Tip
You need to be a member of project 1153 to download files to the central cache directory. Join via luv.
Tip
To use the tape system, you need to authenticate with slk once a month. Start a shell and run
module load slk
slk login
Otherwise you will receive slk error messages along the lines of
... Please check your SLK config file:
/home/.../.slk/config.json ...
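To catch an expired login before a long-running job fails, a rough check from Python might look like the sketch below. It rests on two assumptions that the text above does not confirm: that the config file sits at ~/.slk/config.json, and that its modification time reflects the last slk login. The function name and the 30-day default are hypothetical, and the check is only a heuristic.

```python
import time
from pathlib import Path


def slk_login_looks_recent(config=None, max_age_days=30.0):
    """Heuristic: True if the slk config file exists and was modified recently.

    Assumption: the file's modification time reflects the last `slk login`.
    """
    if config is not None:
        path = Path(config)
    else:
        # Assumed default location, based on the error message above.
        path = Path.home() / ".slk" / "config.json"
    if not path.is_file():
        return False
    age_days = (time.time() - path.stat().st_mtime) / 86400
    return age_days <= max_age_days


if not slk_login_looks_recent():
    print("No recent slk login found; run 'module load slk' and 'slk login' in a shell.")
```

If the check reports a stale login although downloads still work, the modification-time assumption does not hold on your system and the check should be dropped.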
The slkspec Python library provides access to these files.
When working in the shell on Levante, you can use find_files to access the data as described on the find_files page.
When working in Python, you need to either install slkspec yourself (and keep it up to date), or load it from a local copy by adding
module use /work/k20200/k202134/hsm-tools/outtake/module
module load hsm-tools/unstable
to your ~/.kernel_env file (create it if necessary). The module should usually contain an up-to-date version of slkspec and related tools. In your Python code, you can then use
import outtake
to load the necessary modules.
Once you have narrowed down your search to the data you actually want to use (see other examples for reference), you can either use the usual search_result.to_dataset_dict() call, which triggers the downloads implicitly, or use outtake.get(search_result) to make the download explicit and separate it from the rest of the processing. In either case, files that are not yet in the central cache are downloaded to it.
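The explicit variant can be sketched as a small helper that runs the download step first and opens the datasets afterwards. The helper name fetch_then_open is hypothetical; the download function is passed in as an argument so that the sketch makes no assumption about what outtake.get returns.

```python
def fetch_then_open(search_result, fetch, **open_kwargs):
    """Run the download step first, then open the datasets.

    `fetch` is any callable that takes the search result; on Levante this
    would be `outtake.get`. Its return value is deliberately ignored.
    """
    fetch(search_result)  # explicit download step
    # Open the datasets only after the download step has returned.
    return search_result.to_dataset_dict(**open_kwargs)


# Usage on Levante (not run here; needs the hsm-tools module and a catalog search):
# import outtake
# datasets = fetch_then_open(search_result, outtake.get)
```

Separating the two steps means that tape-related problems, such as an expired slk login, should show up in the download step rather than in the middle of the analysis.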