Define the encoding of a Zarr store#
[57]:
import intake
import numcodecs
import numpy as np
import xarray as xr
cat = intake.open_catalog("https://tcodata.mpimet.mpg.de/internal.yaml")
ds = cat.HIFS(datetime="2024-09-01").to_dask()
ds
[57]:
<xarray.Dataset> Size: 7GB
Dimensions:  (time: 64, cell: 196608, crs: 1, level: 13)
Coordinates:
  * crs      (crs) float64 8B nan
  * level    (level) int64 104B 50 100 150 200 250 300 ... 600 700 850 925 1000
  * time     (time) datetime64[ns] 512B 2024-09-01T03:00:00 ... 2024-09-11
Dimensions without coordinates: cell
Data variables: (12/39)
    100u     (time, cell) float32 50MB dask.array<chunksize=(6, 16384), meta=np.ndarray>
    100v     (time, cell) float32 50MB dask.array<chunksize=(6, 16384), meta=np.ndarray>
    10u      (time, cell) float32 50MB dask.array<chunksize=(6, 16384), meta=np.ndarray>
    10v      (time, cell) float32 50MB dask.array<chunksize=(6, 16384), meta=np.ndarray>
    2d       (time, cell) float32 50MB dask.array<chunksize=(6, 16384), meta=np.ndarray>
    2t       (time, cell) float32 50MB dask.array<chunksize=(6, 16384), meta=np.ndarray>
    ...       ...
    tp       (time, cell) float32 50MB dask.array<chunksize=(6, 16384), meta=np.ndarray>
    ttr      (time, cell) float32 50MB dask.array<chunksize=(6, 16384), meta=np.ndarray>
    u        (time, level, cell) float32 654MB dask.array<chunksize=(6, 1, 16384), meta=np.ndarray>
    v        (time, level, cell) float32 654MB dask.array<chunksize=(6, 1, 16384), meta=np.ndarray>
    vo       (time, level, cell) float32 654MB dask.array<chunksize=(6, 1, 16384), meta=np.ndarray>
    w        (time, level, cell) float32 654MB dask.array<chunksize=(6, 1, 16384), meta=np.ndarray>
Data types#
Explicitly set the output data type to single-precision float (float32) for all floating-point types.
[50]:
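def get_dtype(da):
    # Store every floating-point variable as single precision.
    if np.issubdtype(da.dtype, np.floating):
        return "float32"
    else:
        # Keep integer, datetime, and other dtypes unchanged.
        return da.dtype

get_dtype(ds["tcwv"])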
[50]:
'float32'
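As a quick illustration of what this rule does, the following sketch applies it to plain NumPy arrays. The arrays are synthetic stand-ins (an assumption made so that the example runs without access to the catalog); the function only needs an object with a `dtype` attribute. Floating-point data is mapped to float32, which halves the storage of float64 values, while integer data keeps its type.

```python
import numpy as np


def get_dtype(da):
    # Same rule as above: floats become float32, everything else is unchanged.
    if np.issubdtype(da.dtype, np.floating):
        return "float32"
    return da.dtype


# Synthetic stand-ins for dataset variables (not part of the original data).
double_values = np.linspace(250.0, 310.0, 1000, dtype="float64")
integer_values = np.arange(13, dtype="int64")

target = get_dtype(double_values)
single_values = double_values.astype(target)

print(target, get_dtype(integer_values))
print(double_values.nbytes, "->", single_values.nbytes, "bytes")

# Largest rounding error introduced by the cast to single precision.
print(np.abs(single_values - double_values).max())
```

Whether single precision is sufficient depends on the variable; the cast is lossy for values that need more than about seven significant decimal digits.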
Chunking#
We define multi-dimensional chunks for more efficient data access. We aim for a chunk size of about 1 MB, which is a reasonable choice when accessing data via HTTP. Depending on the total size of your dataset, this chunk size may result in millions (!) of individual files, which might cause problems on some file systems.
[51]:
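def get_chunks(dimensions):
    # Variables with a vertical dimension use smaller horizontal chunks.
    if "level" in dimensions:
        chunks = {
            "time": 24,
            "cell": 4**5,
            "level": 4,
        }
    else:
        chunks = {
            "time": 24,
            "cell": 4**6,
        }

    # Return the chunk sizes in the order of the variable's dimensions.
    return tuple(chunks[d] for d in dimensions)

get_chunks(ds["tcwv"].dims)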
[51]:
(24, 4096)
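To see how these chunk shapes translate into chunk sizes and chunk counts, the arithmetic can be sketched in plain Python. The helper names `chunk_nbytes` and `chunk_count` are hypothetical, and the sketch assumes 4 bytes per value (float32) and reports uncompressed sizes only; the compressed size on disk will be smaller and depends on the data.

```python
import math

BYTES_PER_VALUE = 4  # float32


def chunk_nbytes(chunks):
    # Uncompressed size of one chunk in bytes.
    return math.prod(chunks) * BYTES_PER_VALUE


def chunk_count(shape, chunks):
    # Number of chunks needed to cover an array, rounding up per dimension.
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))


# Shapes taken from the dataset above: (time, cell) and (time, level, cell).
cases = {
    "2D": ((64, 196608), (24, 4**6)),
    "3D": ((64, 13, 196608), (24, 4, 4**5)),
}

for name, (shape, chunks) in cases.items():
    kib = chunk_nbytes(chunks) / 1024
    print(f"{name}: {kib:.0f} KiB per chunk, {chunk_count(shape, chunks)} chunks")
```

Both layouts hold the same number of values per chunk (384 KiB uncompressed). A two-dimensional variable of this dataset is split into 144 chunks and a three-dimensional one into 2304 chunks, which shows how quickly the number of stored objects grows with the dataset size.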
Compression#
We compress all variables with Zstd inside a Blosc container. We also enable bit shuffling.
[52]:
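def get_compressor():
    # Zstd inside a Blosc container; BITSHUFFLE is the named constant for shuffle=2.
    return numcodecs.Blosc("zstd", shuffle=numcodecs.Blosc.BITSHUFFLE)

get_compressor()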
[52]:
Blosc(cname='zstd', clevel=5, shuffle=BITSHUFFLE, blocksize=0)
Plug and play#
Finally, we can put the pieces together to define an encoding for the whole dataset. The following function loops over all variables (that are not a dimension) and creates an encoding dictionary.
[53]:
def get_encoding(dataset):
    return {
        var: {
            "compressor": get_compressor(),
            "dtype": get_dtype(dataset[var]),
            "chunks": get_chunks(dataset[var].dims),
        }
        for var in dataset.variables
        if var not in dataset.dims
    }

get_encoding(ds[["t", "2t"]])
[53]:
{'t': {'compressor': Blosc(cname='zstd', clevel=5, shuffle=BITSHUFFLE, blocksize=0),
  'dtype': 'float32',
  'chunks': (24, 4, 1024)},
 '2t': {'compressor': Blosc(cname='zstd', clevel=5, shuffle=BITSHUFFLE, blocksize=0),
  'dtype': 'float32',
  'chunks': (24, 4096)}}
The encoding dictionary can be passed to the to_zarr() function. When using dask, make sure that the dask chunks align with the selected Zarr chunks, i.e. every dask chunk covers one or more complete Zarr chunks. Otherwise, the write fails with an error that prevents multiple dask chunks from writing to the same chunk on disk.
[ ]:
ds.chunk({"time": 24, "level": 4, "cell": -1}).to_zarr(
    "test_dataset.zarr", encoding=get_encoding(ds)
)
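The alignment requirement between dask chunks and Zarr chunks can be illustrated along a single dimension. The sketch below is a simplified check written for this purpose, not the implementation used by xarray or Zarr; the helper names `split` and `is_aligned` are hypothetical. The rule it encodes is that every dask chunk except the last one has to be a whole multiple of the Zarr chunk size.

```python
def split(length, chunk):
    # Chunk sizes along one dimension, e.g. split(64, 24) gives (24, 24, 16).
    full, rest = divmod(length, chunk)
    return (chunk,) * full + ((rest,) if rest else ())


def is_aligned(dask_chunks, zarr_chunk):
    # All dask chunks except the last one must cover whole Zarr chunks.
    return all(size % zarr_chunk == 0 for size in dask_chunks[:-1])


# Time axis: 64 steps, Zarr chunk size 24.
print(is_aligned(split(64, 6), 24))   # dask chunks of 6 steps, as loaded
print(is_aligned(split(64, 24), 24))  # after rechunking to 24 steps

# Cell axis: one dask chunk spanning all 196608 cells, Zarr chunk size 4**6.
print(is_aligned(split(196608, 196608), 4**6))
```

With these inputs the sketch prints False, True, True: the dask chunks of 6 time steps from the loaded dataset would share Zarr chunks of 24 steps, which is why the dataset is rechunked before writing.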