Selecting and Reindexing of Area of Interest

The elegant way of dealing with large output

Before working through this script, it is helpful to have had a look intoTriangular Meshes and Basic Plottingto get a basic understanding of plotting on a triangular basis.

Three-dimensional global ICON output for example requires many times more memory than two-dimensional output. The handling of such large amounts of data can very quickly lead to the exhaustion of the given memory. If you are sitting right next to a supercomputer, you are tempted to just request more RAM and go for it. However, there are elegant solutions besides the powerful one and since a lot of RAM also means a lot of electricity and a lot of coolant, the motivation for an elegant way is also to save limited resources.

In many cases, we rarely look at the complete global output, but rather at a specific, selected area. It is therefore advisable to cut out only this area. For this purpose it is very helpful to reduce the grid information from the global grid file to the area of interest but in such a way that the indexing makes sense starting at 0 and counting up continuously. The advantage is that we generate a new local grid-file, which looks like the global grid-file but is much smaller in terms of storage capacity and therefore easier and faster to handle.

So let’s do something for the climate 🌍 and first of all load the necessary libraries and the global grid-file:

[1]:
import xarray as xr
import numpy as np

Importing the Grid-File

[2]:
grid = xr.open_dataset(
    "/work/mh0287/k203123/Dyamond++/icon-aes-dyw2/experiments/dpp0029/icon_grid_0015_R02B09_G.nc"
)
grid
[2]:
<xarray.Dataset>
Dimensions:                         (cell: 20971520, nv: 3, vertex: 10485762,
                                     ne: 6, edge: 31457280, no: 4, nc: 2,
                                     max_stored_decompositions: 4, two_grf: 2,
                                     cell_grf: 14, max_chdom: 1, edge_grf: 24,
                                     vert_grf: 13)
Coordinates:
    clon                            (cell) float64 ...
    clat                            (cell) float64 ...
    vlon                            (vertex) float64 ...
    vlat                            (vertex) float64 ...
    elon                            (edge) float64 ...
    elat                            (edge) float64 ...
Dimensions without coordinates: cell, nv, vertex, ne, edge, no, nc,
                                max_stored_decompositions, two_grf, cell_grf,
                                max_chdom, edge_grf, vert_grf
Data variables: (12/91)
    clon_vertices                   (cell, nv) float64 ...
    clat_vertices                   (cell, nv) float64 ...
    vlon_vertices                   (vertex, ne) float64 ...
    vlat_vertices                   (vertex, ne) float64 ...
    elon_vertices                   (edge, no) float64 ...
    elat_vertices                   (edge, no) float64 ...
    ...                              ...
    edge_dual_normal_cartesian_x    (edge) float64 ...
    edge_dual_normal_cartesian_y    (edge) float64 ...
    edge_dual_normal_cartesian_z    (edge) float64 ...
    cell_circumcenter_cartesian_x   (cell) float64 ...
    cell_circumcenter_cartesian_y   (cell) float64 ...
    cell_circumcenter_cartesian_z   (cell) float64 ...
Attributes: (12/43)
    title:                    ICON grid description
    institution:              Max Planck Institute for Meteorology/Deutscher ...
    source:                   git@git.mpimet.mpg.de:GridGenerator.git
    revision:                 d00fcac1f61fa16c686bfe51d1d8eddd09296cb5
    date:                     20180529 at 222250
    user_name:                Rene Redler (m300083)
    ...                       ...
    topography:               modified SRTM30
    subcentre:                1
    number_of_grid_used:      15
    history:                  Thu Aug 16 11:05:44 2018: ncatted -O -a ICON_gr...
    ICON_grid_file_uri:       http://icon-downloads.mpimet.mpg.de/grids/publi...
    NCO:                      netCDF Operators version 4.7.5 (Homepage = http...

Since Max Planck was an important person in science, we will investigate the area around his birthplace. According to this wiki (sorry, only available in german) of the city of Kiel, he was born in a house (which unfortunately no longer exists) under these coordinates: 54.32 N, 10.13 E. We take this point as center and choose a 0.5°x0.5° rectangular window around it.

[3]:
max_plancks_birthplace_x, max_plancks_birthplace_y = np.array([10.13, 54.32])

left_bound = 10.13 - 0.25
right_bound = 10.13 + 0.25
top_bound = 54.32 + 0.25
bottom_bound = 54.32 - 0.25

Let’s see if we’re right:

[4]:
%matplotlib inline
import matplotlib.pylab as plt
from matplotlib.patches import Rectangle
import cartopy.crs as ccrs

ax = plt.axes(projection=ccrs.PlateCarree(central_longitude=0))
ax.coastlines()
ax.set_extent([left_bound - 1, right_bound + 1, bottom_bound - 1, top_bound + 1])
ax.plot(max_plancks_birthplace_x, max_plancks_birthplace_y, "-ro")
ax.add_patch(
    Rectangle(
        (left_bound, bottom_bound),
        np.abs(np.diff([left_bound, right_bound])),
        np.abs(np.diff([bottom_bound, top_bound])),
        edgecolor="darkgreen",
        facecolor="none",
        lw=3,
    )
)
gl = ax.gridlines(draw_labels=True, x_inline=False, y_inline=False)
ax.set_title("Max Planck's birthplace")

plt.show()
/sw/spack-levante/mambaforge-4.11.0-0-Linux-x86_64-sobz6z/lib/python3.9/site-packages/numpy/core/_asarray.py:102: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  return array(a, dtype, copy=False, order=order)
../../_images/Processing_playing_with_triangles_cutting_grid_window_9_1.png

… that looks good ! Now we select the corresponding cells within the dark green window and display the indices.

Cells

[5]:
window_cell = (
    (grid.clat >= np.deg2rad(bottom_bound))
    & (grid.clat <= np.deg2rad(top_bound))
    & (grid.clon >= np.deg2rad(left_bound))
    & (grid.clon <= np.deg2rad(right_bound))
).values

(window_cell_indices,) = np.where(window_cell)
window_cell_indices
[5]:
array([4282376, 4282377, 4282378, 4282400, 4282401, 4282402, 4282403,
       4282404, 4282405, 4282407, 4282408, 4282409, 4282410, 4282411,
       4282412, 4282413, 4282414, 4282415, 4282420, 4282421, 4282422,
       4282480, 4282481, 4282482, 4282483, 4282484, 4282485, 4282486,
       4282487, 4282488, 4282489, 4282490, 4282491, 4282493, 4282497,
       4282501, 4282508, 4282510, 4282511, 4282512, 4282513, 4282514,
       4282515, 4282516, 4282517, 4282518, 4282519, 4282520, 4282521,
       4282522, 4282523, 4282527, 4282558, 4282936, 4282938, 4282939,
       4282941, 4282960, 4282961, 4282962, 4282967, 4282968, 4282969,
       4282970, 4282971, 4282972, 4282973, 4283128, 4283129, 4283130])

What we have shown above are all the indices of the cells within the green window of the native grid. Since we selected them via np.where() we obtained the indices in 0-based python thinking. We now do the same for the vertices and edges that appear in our selected window:

Vertices

We select with the function .isel() and the vertex_of_cell information from the grid all the vertices of the cells we have cut out one step before. We also sort the indices of the vertices and delete all duplicates vianp.unique(). Since the ICON code is written in Fortran, i.e. the integer-values are 1-based, we subtract 1 to get back to python thinking, hence 0-based.

[6]:
window_vertex_indices = (
    np.unique(grid.vertex_of_cell.isel(cell=window_cell_indices).values) - 1
)
window_vertex_indices
[6]:
array([2144929, 2144932, 2144933, 2144935, 2144937, 2144938, 2144939,
       2144953, 2144954, 2144955, 2144956, 2144957, 2144958, 2144959,
       2144960, 2144961, 2144962, 2144963, 2144968, 2144978, 2144983,
       2144984, 2145002, 2145003, 2145004, 2145005, 2145006, 2145007,
       2145008, 2145010, 2145011, 2145014, 2145020, 2145021, 2145022,
       2145023, 2145024, 2145025, 2145026, 2145217, 2145218, 2145220,
       2145222, 2145223, 2145224, 2145231, 2145237, 2145238, 2145239,
       2145286], dtype=int32)

Edges

Same as for the vertices, we select with the function .isel() and the edge_of_cell information from the grid all the edges of the cells we have cut out two steps before. We also sort the indices of the edges and delete all duplicates via np.unique(). We subtract 1 to get back to python thinking.

[7]:
window_edge_indices = (
    np.unique(grid.edge_of_cell.isel(cell=window_cell_indices).values) - 1
)
window_edge_indices
[7]:
array([6427302, 6427311, 6427312, 6427313, 6427314, 6427315, 6427316,
       6427317, 6427318, 6427352, 6427353, 6427354, 6427355, 6427356,
       6427357, 6427358, 6427359, 6427360, 6427361, 6427362, 6427363,
       6427365, 6427366, 6427367, 6427368, 6427369, 6427370, 6427371,
       6427372, 6427373, 6427374, 6427375, 6427376, 6427377, 6427385,
       6427387, 6427388, 6427389, 6427390, 6427425, 6427426, 6427482,
       6427483, 6427484, 6427485, 6427486, 6427487, 6427488, 6427489,
       6427490, 6427491, 6427492, 6427493, 6427494, 6427495, 6427496,
       6427497, 6427498, 6427499, 6427500, 6427501, 6427504, 6427507,
       6427508, 6427513, 6427516, 6427527, 6427528, 6427529, 6427531,
       6427532, 6427533, 6427534, 6427535, 6427536, 6427537, 6427538,
       6427539, 6427540, 6427541, 6427542, 6427543, 6427544, 6427545,
       6427546, 6427547, 6427548, 6427549, 6427550, 6427553, 6427602,
       6428151, 6428152, 6428160, 6428161, 6428162, 6428164, 6428165,
       6428166, 6428167, 6428170, 6428202, 6428203, 6428204, 6428205,
       6428206, 6428207, 6428208, 6428213, 6428214, 6428215, 6428216,
       6428217, 6428218, 6428219, 6428410, 6428418, 6428419, 6428420],
      dtype=int32)

Constructing New Grid with Selected Cells, Vertices and Edges

Wow, that’s already great ! We have received a lot of information in the form of indices in individual arrays about our green window. We merge them into one dataset so that everything is compact:

[8]:
selected_indices = xr.Dataset(
    {
        "cell": ("cell", window_cell_indices),
        "vertex": ("vertex", window_vertex_indices),
        "edge": ("edge", window_edge_indices),
    }
)

selected_indices
[8]:
<xarray.Dataset>
Dimensions:  (cell: 70, vertex: 50, edge: 119)
Coordinates:
  * cell     (cell) int64 4282376 4282377 4282378 ... 4283128 4283129 4283130
  * vertex   (vertex) int32 2144929 2144932 2144933 ... 2145238 2145239 2145286
  * edge     (edge) int32 6427302 6427311 6427312 ... 6428418 6428419 6428420
Data variables:
    *empty*

It could be that we need more variables for future calculations. Therefore we create a dictionary with further interesting variables, which we reindex as a precaution.

[9]:
vars_to_renumber = {
    "cell": [
        "adjacent_cell_of_edge",
        "cells_of_vertex",
        "neighbor_cell_index",
        "cell_index",
    ],
    "vertex": ["vertex_of_cell", "edge_vertices", "vertices_of_vertex"],
    "edge": ["edge_of_cell", "edges_of_vertex"],
}

We now come to the heart of this script: the reindexing. Several things happen here, which is why it is best to define a function. This function reindex_grid() needs 3 inputs and returns 1 output. The inputs are the original, complete grid and the parts of the grid that should be reindexed, hence indices and vars_to_renumber. The indices define the cells, vertices and edges. The vars_to_renumber are all variables that we are still interested in and can be composed of cells, vertices and edges. Output of our function will be a new_grid containing all indices and variables for our green window around the birthplace of Max Planck in such a way that everything starts counting at 0.

Let’s go through it step by step:

Line 1: We define a function wiht 3 input variables.

Line 2: We define as new_grid the area in the old grid that contains the indices we selected at the beginning of this script for the cells, vertices and edges. For this we use the .load() function, which loads the 17GB file into memory and processes it there: this is a little faster.

Line 3: We open a for-loop that accesses the coordinates and the entries of the array selected_indices.

Line 4: We open an array, which is only filled with -2 (exceptional value like nan but as an integer) in the original, old grid length and call it renumbering.

Line 5: We start counting at 0 at the index positions of the long renumbering array, which belong to the indices of the selected dark green area, until we have reached the length of the short, previously selected array. So what we get is an array with the length of the original grid dimension (20971520 cells, 10485762 vertices, 31457280 edges), which contains a value other than -2 only at that position within the array which is inside the dark green selected window.

Line 6: we open another for loop over the remaining variables vars_to_renumber to be reindexed.

Line 7: For the variables stored in the dictionary of a particular dimension (cell, vertex, edge), we take one item and access it in new_grid (line 2) and subtract 1 to work in python 0-based system; this is done in the square brackets on the right side of the equal sign. We use this to select the valid position in the renumbering array but in total we add 1 to output the new_grid in the same 1-based thinking as the original grid.

Line 8: We output the new_grid.

[10]:
def reindex_grid(grid, indices, vars_to_renumber):
    new_grid = grid.load().isel(
        cell=indices.cell, vertex=indices.vertex, edge=indices.edge
    )
    for dim, idx in indices.coords.items():
        renumbering = np.full(grid.dims[dim], -2, dtype="int")
        renumbering[idx] = np.arange(len(idx))
        for name in vars_to_renumber[dim]:
            new_grid[name].data = renumbering[new_grid[name].data - 1] + 1
    return new_grid

After long theory we want to use our function and create the actual new_grid:

[11]:
new_grid = reindex_grid(grid, selected_indices, vars_to_renumber)
new_grid
[11]:
<xarray.Dataset>
Dimensions:                         (cell: 70, nv: 3, vertex: 50, ne: 6,
                                     edge: 119, no: 4, nc: 2,
                                     max_stored_decompositions: 4, two_grf: 2,
                                     cell_grf: 14, max_chdom: 1, edge_grf: 24,
                                     vert_grf: 13)
Coordinates:
    clon                            (cell) float64 0.1804 0.1805 ... 0.174
    clat                            (cell) float64 0.9453 0.946 ... 0.9488
    vlon                            (vertex) float64 0.1815 0.1807 ... 0.1717
    vlat                            (vertex) float64 0.9456 0.9466 ... 0.9482
    elon                            (edge) float64 0.1811 0.1814 ... 0.1724
    elat                            (edge) float64 0.9461 0.9471 ... 0.9487
  * cell                            (cell) int64 4282376 4282377 ... 4283130
  * vertex                          (vertex) int32 2144929 2144932 ... 2145286
  * edge                            (edge) int32 6427302 6427311 ... 6428420
Dimensions without coordinates: nv, ne, no, nc, max_stored_decompositions,
                                two_grf, cell_grf, max_chdom, edge_grf, vert_grf
Data variables: (12/91)
    clon_vertices                   (cell, nv) float64 0.1794 0.1802 ... 0.173
    clat_vertices                   (cell, nv) float64 0.9457 0.9446 ... 0.9492
    vlon_vertices                   (vertex, ne) float64 9.969e+36 ... 9.969e+36
    vlat_vertices                   (vertex, ne) float64 9.969e+36 ... 9.969e+36
    elon_vertices                   (edge, no) float64 0.1807 0.1805 ... 0.1728
    elat_vertices                   (edge, no) float64 0.9466 0.946 ... 0.9485
    ...                              ...
    edge_dual_normal_cartesian_x    (edge) float64 -0.6706 -0.7373 ... -0.7362
    edge_dual_normal_cartesian_y    (edge) float64 -0.5071 0.4974 ... 0.4973
    edge_dual_normal_cartesian_z    (edge) float64 0.5414 0.4572 ... 0.4589
    cell_circumcenter_cartesian_x   (cell) float64 0.576 0.5755 ... 0.5739
    cell_circumcenter_cartesian_y   (cell) float64 0.105 0.105 ... 0.1003 0.1009
    cell_circumcenter_cartesian_z   (cell) float64 0.8107 0.8111 ... 0.8127
Attributes: (12/43)
    title:                    ICON grid description
    institution:              Max Planck Institute for Meteorology/Deutscher ...
    source:                   git@git.mpimet.mpg.de:GridGenerator.git
    revision:                 d00fcac1f61fa16c686bfe51d1d8eddd09296cb5
    date:                     20180529 at 222250
    user_name:                Rene Redler (m300083)
    ...                       ...
    topography:               modified SRTM30
    subcentre:                1
    number_of_grid_used:      15
    history:                  Thu Aug 16 11:05:44 2018: ncatted -O -a ICON_gr...
    ICON_grid_file_uri:       http://icon-downloads.mpimet.mpg.de/grids/publi...
    NCO:                      netCDF Operators version 4.7.5 (Homepage = http...

Let’s see if everything worked as we wanted it and choose reindexed variable like .vertex_of_cell and sort it:

[12]:
np.unique(new_grid.vertex_of_cell)
[12]:
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
       35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50])

Voilà ! That worked and we are done and have now built a new grid-file tailored to the area of our interest, which was provided with a new indexing.

For further processing, the two datasets selected_indices and new_grid can be saved to have them quickly accessible for further calculations.

[13]:
selected_indices.to_netcdf(
    f"selected_indices_region_{bottom_bound}-{top_bound}_{left_bound}-{right_bound}.nc",
    mode="w",
)
new_grid.to_netcdf(
    f"new_grid_region_{bottom_bound}-{top_bound}_{left_bound}-{right_bound}.nc",
    mode="w",
)