Dask chunks and best practices

This notebook is meant to be used in DKRZ’s JupyterHub with the python3 unstable kernel.

We aim to understand

  • the different types of chunks, especially

    • storage chunks

      • The smallest unit of binary data that is stored within a climate data file (Zarr, netCDF, GRIB).

    • dask chunks

      • The chunks that dask uses inside a dask array. A dask chunk is the smallest unit that dask processes at once, as a single task.

  • the best practices recommended by dask.
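
To make the notion of dask chunks concrete, here is a minimal sketch that uses a small synthetic in-memory array instead of a climate data file. The array shape and chunk sizes are made-up values for illustration only, and storage chunks do not appear here because nothing is read from disk.

```python
import dask.array as da

# Hypothetical example array: 1000 x 1000 float64 values,
# partitioned into dask chunks of 250 x 500 elements.
x = da.ones((1000, 1000), chunks=(250, 500))

print(x.chunks)     # chunk sizes along each dimension
print(x.numblocks)  # number of chunks along each dimension
print(x.chunksize)  # shape of the largest chunk

# Memory needed for one chunk: number of elements times bytes per element.
chunk_bytes = 250 * 500 * x.dtype.itemsize
print(chunk_bytes)

# Rechunking changes only how dask partitions the array, not its values.
y = x.rechunk((500, 500))
print(y.numblocks)

# The computation runs chunk by chunk and the partial results are combined.
total = y.sum().compute()
print(total)
```

Fewer, larger chunks mean fewer tasks for the scheduler but more memory per task, which is the trade-off behind most chunking recommendations.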

We use the following packages, which need to be in your environment/kernel:

[1]:
import intake
import xarray as xr
import hvplot.xarray
import numpy as np
from xhistogram.xarray import histogram
import dask
from dask.distributed import performance_report
from distributed import Client
import time
import glob