(hierarchical_healpix_output)=
# Hierarchical HEALPix output

We write output at multiple resolutions (hierarchically) on the HEALPix
grid in order to **simplify** and **speed up analysis**. To take
advantage of this output format, it helps to know some properties of
this representation. As HEALPix and hierarchical data structures are in
principle independent, we'll introduce them separately, but we'll see
that the two fit together quite snugly.

```{toctree}
---
maxdepth: 1
caption: Usage examples
---
healpix_starter
healpix_cartopy
plot-difference
time-space
regridding
pyicon_healpix
ocean_averaging
Land_sea_mask
joanne_comparison
lonlat_remap
limited_area_healpix
```

## Hierarchical Output

In this context, hierarchical output means multiple copies of the output
at different resolutions.

Combining hierarchical output with chunked storage makes it possible to
access data such that the amount of data that has to be loaded scales
with the size of the screen, plot or analysis domain instead of with the
size of the model. In general, this enables fast analysis of
high-resolution model output.

A spatial hierarchy can probably be best illustrated with how online map
services work:

<img src="hierarchy.png" class="align-center" style="width: 40%;" alt="image" />

The map shows the global temperature distribution[^1] at multiple
resolutions. The top level (`z = 0`, where $z$ denotes the zoom level)
contains only a single image (data chunk) at coarse resolution. The
second level (`z = 1`) contains four images (chunks) at double the
horizontal resolution. The third and all further levels continue in the
same manner.

With this structure, a user (or a library) can select a `zoom` level
appropriate for the current region of interest (e.g. `12` for a map of
Hamburg, `5` for central Europe, or even `0` for global means).
Regardless of the chosen `zoom` level, the amount of data that has to be
loaded stays approximately constant.
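
The following is a minimal sketch of this idea, assuming the idealised tile pyramid from the figure in which level `z` splits the globe into `4**z` equally sized tiles. The function names, the default of four tiles per view and the `max_zoom` limit are hypothetical choices for illustration, not part of any library.

```python
import math


def pick_zoom(region_fraction, tiles_wanted=4, max_zoom=12):
    """Smallest zoom at which the region spans at least `tiles_wanted` tiles.

    `region_fraction` is the fraction of the globe covered by the region.
    """
    if not 0 < region_fraction <= 1:
        raise ValueError("region_fraction must be in (0, 1]")
    # Level z has 4**z tiles, so the region spans about region_fraction * 4**z.
    zoom = math.ceil(math.log2(tiles_wanted / region_fraction) / 2)
    return min(max(zoom, 0), max_zoom)


def tiles_to_load(region_fraction, zoom):
    """Approximate number of tiles that intersect the region at `zoom`."""
    return region_fraction * 4**zoom


# The region shrinks by six orders of magnitude, the tile count barely moves.
for fraction in (1.0, 1e-2, 1e-4, 1e-6):
    zoom = pick_zoom(fraction)
    print(f"fraction={fraction:g} zoom={zoom} tiles={tiles_to_load(fraction, zoom):.1f}")
```

Under these assumptions the number of tiles stays between `tiles_wanted` and four times that value, no matter how small the region becomes, as long as `max_zoom` is not reached.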

While hierarchical datasets can dramatically speed up analysis, it's
interesting to note that **not** much more storage space is required: as
visible in the figure above, each coarser level holds only a quarter of
the data of the next finer one, so all coarser levels combined are much
smaller than the finest level alone.
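
As a rough check of the storage argument, the sketch below sums the sizes of all levels of an idealised hierarchy. It assumes that level `z` holds `4**z` equally sized chunks and ignores compression and metadata; `relative_storage` is a hypothetical helper name.

```python
def relative_storage(max_zoom):
    """Size of levels 0..max_zoom relative to the finest level alone."""
    finest = 4**max_zoom
    total = sum(4**zoom for zoom in range(max_zoom + 1))
    return total / finest


for max_zoom in (1, 5, 10):
    print(max_zoom, relative_storage(max_zoom))
```

The ratio is a geometric series that stays below $4/3$, so under these assumptions the coarser levels add at most about a third to the storage of the finest level.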

A similar hierarchical structure can be built for temporal aggregation:
if data is stored e.g. at 30-minute, 3-hourly and daily intervals, one
can select an aggregation interval appropriate for the chosen form of
analysis. Using daily data instead of 30-minute data, for example, means
that 48 times less data has to be loaded.
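
The factor of 48 can be reproduced with a few lines of Python. The sketch below also builds a coarser level from a finer one by averaging consecutive blocks; that the aggregation is a plain time mean is an assumption made here for illustration, and both helper names are hypothetical.

```python
from datetime import timedelta


def steps_per_interval(fine, coarse):
    """Number of fine time steps contained in one coarse time step."""
    ratio = coarse / fine
    if ratio != int(ratio):
        raise ValueError("coarse interval must be a multiple of the fine one")
    return int(ratio)


def block_mean(values, factor):
    """Average consecutive blocks of `factor` values (assumed aggregation)."""
    if len(values) % factor:
        raise ValueError("length must be a multiple of factor")
    return [sum(values[i:i + factor]) / factor
            for i in range(0, len(values), factor)]


factor = steps_per_interval(timedelta(minutes=30), timedelta(days=1))
half_hourly = [float(i) for i in range(2 * factor)]  # two days of dummy data
daily = block_mean(half_hourly, factor)
print(factor, len(half_hourly), len(daily))
```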

## HEALPix

The HEALPix ([Górski et al.,
2004](https://iopscience.iop.org/article/10.1086/427976/pdf)) grid is a
pixelation of a sphere, designed to support a **hierarchical structure**
and to be (exactly) **equal area** and **iso-latitude**, i.e. cell
centres lie on rings of constant latitude. It's also based on
**quadrilaterals**. To get a first impression of the grid, you might
want to check out a [grid
viewer](https://aholinch.github.io/hp3js.html). The grid is defined by a
set of equations, which can often make grid point queries very fast,
e.g. there is a function to *compute* (instead of look up) the grid cell
number for a given pair of longitude and latitude. These functions are
implemented in the [HEALPix Software
Package](https://healpix.sourceforge.io/index.php), which comes with
bindings in various languages, including the [healpy Python
package](https://healpy.readthedocs.io).

<img src="gorski_f1.jpg" class="align-center" style="width:25.0%"
alt="image" />
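
To illustrate how the hierarchical and equal-area properties work together, here is a minimal sketch that assumes the data are stored in HEALPix's NESTED ordering, where the four children of cell `p` at one level are the cells `4 * p` to `4 * p + 3` at the next finer level. Whether a given dataset uses this ordering has to be checked in its metadata; the helper names are hypothetical.

```python
def parent(cell):
    """Index of the enclosing cell one level coarser (NESTED ordering)."""
    return cell // 4


def children(cell):
    """Indices of the four cells one level finer (NESTED ordering)."""
    return [4 * cell + k for k in range(4)]


def coarsen_nested(field):
    """Average groups of four siblings to obtain the next coarser level.

    Because all cells have equal area, the plain mean is already the
    area-weighted mean.
    """
    if len(field) % 4:
        raise ValueError("field length must be a multiple of 4")
    return [sum(field[i:i + 4]) / 4 for i in range(0, len(field), 4)]


fine = [float(i) for i in range(48)]  # dummy field with 12 * 4**1 cells
coarse = coarsen_nested(fine)         # 12 cells, one per base cell
print(len(fine), len(coarse), children(1), parent(7))
```

With this ordering, moving between levels is pure index arithmetic, with no need for neighbour lookups or interpolation weights.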

### HEALPix spatial resolution

HEALPix grids comprise $12\cdot 4^z$ cells with $z$ a non-negative
integer denoting the `zoom` level (or nest). The surface area of the
Earth is about $510\times10^6$ square kilometers. A regular $1^\circ$
lat-lon grid has cells of about $111\times 111\ \mathrm{km}^2$ at the
equator, which narrow to about 79 km in the zonal direction at
$45^\circ$ latitude. An equal area grid with the same cell size as
a $1^\circ$ lat-lon grid at the equator would have 41156 grid cells.
The closest HEALPix grid has $z=6$, which consists of 49152 cells.
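
The comparison above can be sketched as follows. The code uses the surface area quoted in the text and a cell edge rounded to 111 km, so the target count comes out slightly different from the 41156 quoted above; choosing the zoom that is closest on a logarithmic scale is an assumption about what "closest" means here, and `closest_zoom` is a hypothetical helper.

```python
import math

EARTH_AREA_KM2 = 510e6  # value quoted in the text


def healpix_cells(zoom):
    """Number of cells of a HEALPix grid at the given zoom level."""
    return 12 * 4**zoom


def closest_zoom(target_cells):
    """Zoom whose cell count is closest to the target on a log scale."""
    zoom = round(math.log2(target_cells / 12) / 2)
    return max(zoom, 0)


cell_edge_km = 111.0  # approximate edge of a 1 degree cell at the equator
target = EARTH_AREA_KM2 / cell_edge_km**2
zoom = closest_zoom(target)
print(round(target), zoom, healpix_cells(zoom))
```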

:::{table} HEALPix equivalent resolution table
| HEALPix $z$ | $\sqrt{A}$ / km | number of latitudes | eff. ang. resolution |
|-------------|-----------------|---------------------|----------------------|
| 0           | 6519             | 3                   | $60^\circ$           |
| 6           | 101.9            | 255                 | $0.7^\circ$          |
| 8           | 25.5             | 1023                | $0.18^\circ$         |
| 10          | 6.4              | 4095                | $0.04^\circ$         |
| 12          | 1.6              | 16383               | $0.01^\circ$         |
:::
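
The table values can be reproduced with the short sketch below. It assumes the surface area quoted above, $4 \cdot 2^z - 1$ rings of constant latitude, and an effective angular resolution defined as $180^\circ$ divided by the number of rings; the last definition is inferred from the table values rather than stated in the text.

```python
import math

EARTH_AREA_KM2 = 510e6  # value quoted in the text


def healpix_row(zoom):
    """Return (sqrt(A) in km, number of latitudes, resolution in degrees)."""
    nside = 2**zoom
    n_cells = 12 * nside**2            # equals 12 * 4**zoom
    edge_km = math.sqrt(EARTH_AREA_KM2 / n_cells)
    n_rings = 4 * nside - 1            # rings of constant latitude
    resolution_deg = 180 / n_rings     # assumed definition, see lead-in
    return edge_km, n_rings, resolution_deg


for zoom in (0, 6, 8, 10, 12):
    edge_km, n_rings, resolution_deg = healpix_row(zoom)
    print(f"{zoom:2d} {edge_km:8.1f} {n_rings:6d} {resolution_deg:8.3f}")
```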

ICON uses R$n$B$m$ grids, corresponding to $20 n^2 4^m$ cells. We
usually use the $n=2$ family of grids, for which there are
$20 \cdot 4^{m+1}$ cells. This means that the R2B9 grid has
20,971,520 cells and, more generally, that an R2B$m$ grid has
5/3 as many cells, or $\sqrt{3/5}$ the length scale, as a
HEALPix grid with $z=m+1$. Hence the $z=10$ grid is often used as
the base grid for mapping from R2B9.
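
The cell counts and ratios in this paragraph follow directly from the two formulas, as the small sketch below shows. The function names are hypothetical helpers, not part of ICON or any HEALPix library.

```python
import math


def icon_cells(n, m):
    """Number of cells of an ICON RnBm grid."""
    return 20 * n**2 * 4**m


def healpix_cells(zoom):
    """Number of cells of a HEALPix grid at the given zoom level."""
    return 12 * 4**zoom


m = 9                      # R2B9
zoom = m + 1               # matching HEALPix zoom level
ratio = icon_cells(2, m) / healpix_cells(zoom)
print(icon_cells(2, m), healpix_cells(zoom))
print(ratio, math.sqrt(1 / ratio))  # cell-count ratio and length-scale ratio
```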

:::{admonition} Rules of thumb
:class: tip

- For many global applications, especially when comparing to standard
  data sets, a $z=5$ grid will suffice.
- For zonal averages, choosing $z=6$ gives better than $1^\circ$
  resolution.
- For near native resolution, choose $z=m+1$, i.e. $z=10$ for the
  NextGEMS cycle 3 runs.
:::

[^1]: [1972-12-11T04:00:00 of the ICON Apollo
    simulation](https://ican.pages.gwdg.de/icon-tiler/#index=https%3A%2F%2Fswift.dkrz.de%2Fv1%2Fdkrz_a973e394-5f24-4f4d-8bbf-1a83bd387ccb%2FApollo17%2Ftiles%2Fl7%2Findex.json&v=ts&t=1972-12-11T04%3A00%3A00&lat=-1.4061088354351594&lon=0.703125&z=2)