Resampling to lon-lat grid#
This notebook shows a quick way to resample HEALPix data to a lon/lat grid using nearest neighbors.
Remapping via source indices#
Probably the easiest method for resampling HEALPix data is to compute which index of the HEALPix grid corresponds to each pixel of the target grid. We’ll take this approach and build a function which computes the HEALPix pixel indices for each lon/lat pair in a rectilinear grid of longitudes and latitudes:
[1]:
import numpy as np
import xarray as xr
import healpy
def get_nn_lon_lat_index(nside, lons, lats):
    # find the HEALPix cell containing each (lon, lat) pair of the target grid
    lons2, lats2 = np.meshgrid(lons, lats)
    return xr.DataArray(
        healpy.ang2pix(nside, lons2, lats2, nest=True, lonlat=True),
        coords=[("lat", lats), ("lon", lons)],
    )
Let’s try this function on a coarse HEALPix grid (nside=2) and for a coarse global lon/lat grid. We’ll obtain an array of cell indices:
[2]:
get_nn_lon_lat_index(2, np.linspace(-180, 180, 10), np.linspace(-90, 90, 5))
[2]:
<xarray.DataArray (lat: 5, lon: 10)>
array([[40, 40, 40, 44, 44, 32, 32, 36, 36, 40],
       [42, 42, 41, 46, 45, 34, 33, 38, 37, 42],
       [25, 25, 30, 29, 18, 17, 22, 21, 26, 25],
       [10, 10,  9, 14, 13,  2,  1,  6,  5, 10],
       [11, 11, 11, 15, 15,  3,  3,  7,  7, 11]])
Coordinates:
  * lat      (lat) float64 -90.0 -45.0 0.0 45.0 90.0
  * lon      (lon) float64 -180.0 -140.0 -100.0 -60.0 ... 60.0 100.0 140.0 180.0
This doesn’t look too bad (we expect values in the range of \(0\) to \(12 N_\mathrm{side}^2 - 1\), i.e. \(0\) to \(47\) in this case). But let’s also check this result visually with a few more pixels:
[3]:
get_nn_lon_lat_index(2, np.linspace(-180, 180, 800), np.linspace(-90, 90, 400)).plot(
    figsize=(8, 4)
)
None
Indeed, this shows us the HEALPix pixel numbers on a lon/lat grid as depicted in Górski et al., 2004:
This seems to be correct, so let’s apply it to some model output:
Remapping model output#
Of course, we first need access to some model output:
[4]:
import intake
cat = intake.open_catalog("https://data.nextgems-h2020.eu/catalog.yaml")
When selecting the zoom level, it’s best to find a level which approximately matches your desired output resolution. Please have a look at the HEALPix intro for a comparison between angular resolution and typical HEALPix zoom levels. In this case, we have an angular resolution of about 0.05°, so zoom=10 is probably appropriate.
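The quoted resolution can be estimated from the pixel area: a HEALPix map at nside = 2**zoom has 12 * nside**2 equal-area pixels, so the mean pixel spacing is roughly the square root of the per-pixel solid angle. A minimal sketch (the helper name approx_resolution_deg is ours, not part of healpy):

```python
import numpy as np


def approx_resolution_deg(zoom):
    # number of equal-area HEALPix pixels at this zoom level
    npix = 12 * (2**zoom) ** 2
    # mean pixel spacing: square root of the per-pixel solid angle (4 pi / npix)
    return np.degrees(np.sqrt(4 * np.pi / npix))


print(f"{approx_resolution_deg(10):.3f}°")  # roughly 0.057° at zoom=10
```

This agrees with healpy.nside2resol, which uses the same square-root-of-pixel-area definition.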
[5]:
zoom = 10
ds = cat.ICON.ngc3028(zoom=zoom, time="PT30M").to_dask()
Now that we have the output available, let’s compute remapping indices for a region including the Amazon and the Caribbean:
[6]:
idx = get_nn_lon_lat_index(
    2**zoom, np.linspace(-70, -55, 300), np.linspace(5, 20, 300)
)
We can now use the index to select from the cell dimension and obtain a new xr.DataArray with surface temperature on a lon/lat grid:
[7]:
tas_lon_lat = ds.tas.isel(time=0, cell=idx)
tas_lon_lat
[7]:
<xarray.DataArray 'tas' (lat: 300, lon: 300)>
[90000 values with dtype=float32]
Coordinates:
    time     datetime64[ns] 2020-01-20T00:30:00
  * lat      (lat) float64 5.0 5.05 5.1 5.151 5.201 ... 19.85 19.9 19.95 20.0
  * lon      (lon) float64 -70.0 -69.95 -69.9 -69.85 ... -55.1 -55.05 -55.0
Attributes:
    cell_methods:   time: point
    component:      atmo
    grid_mapping:   crs
    long_name:      temperature in 2m
    standard_name:  air_temperature
    units:          K
    vgrid:          height_2m
As the data is now on a rectangular grid, we can plot it quite easily:
[8]:
tas_lon_lat.plot()
None
Anti-Aliasing#
Nearest-neighbor remapping can lead to aliasing: along steep gradients (e.g. the temperature difference between land and water along the Amazon river), data is picked seemingly randomly from either side of the gradient (it depends on how the source and target grids fit onto each other at the very fine scale). While this effect likely averages out when analyzing larger areas, it can distort results at the local scale. Several methods exist to overcome this problem. A simple approximate way is supersampling, where we first interpolate to a finer grid and then average the interpolated data back to our target grid. Uniform supersampling can be implemented as follows:
[9]:
supersampling = {"lon": 4, "lat": 4}
idx = get_nn_lon_lat_index(
    2**zoom,
    np.linspace(-70, -55, supersampling["lon"] * 300),
    np.linspace(5, 20, supersampling["lat"] * 300),
)
tas_lon_lat_aa = ds.tas.isel(time=0, cell=idx).coarsen(supersampling).mean()
tas_lon_lat_aa.plot()
None
While the output barely changes in uniform areas, regions with gradients appear much smoother now.
Saving to disk#
Of course, any data remapped this way can be saved to disk by the usual means of xarray I/O methods, including netCDF
and zarr
formats. If data is opened using dask
(remember to use some chunks
definition while opening), it is possible to perform the regridding lazily, chunk by chunk while writing the output.
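As a rough sketch of the lazy pattern (using synthetic data instead of the catalog so the snippet runs stand-alone; all variable names here are ours), pointwise indexing with isel stays lazy on a dask-backed array until the result is computed or written:

```python
import numpy as np
import xarray as xr

# stand-in for a model field on a HEALPix "cell" dimension (not real model output)
ncell = 12 * 4**5  # pixel count at zoom=5
ds = xr.Dataset({"tas": ("cell", np.random.rand(ncell).astype("float32"))}).chunk(
    {"cell": 1024}
)

# stand-in for the nearest-neighbor index array from get_nn_lon_lat_index
idx = xr.DataArray(
    np.random.randint(0, ncell, size=(50, 100)),
    coords=[("lat", np.linspace(5, 20, 50)), ("lon", np.linspace(-70, -55, 100))],
)

tas_lon_lat = ds.tas.isel(cell=idx)  # still lazy: backed by a chunked dask array
# a subsequent tas_lon_lat.to_netcdf(...) or .to_zarr(...) would compute chunkwise
print(tas_lon_lat.shape)
```

Note that nothing is loaded when isel runs; the vectorized indexing is recorded in the dask graph and only evaluated during the write.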
Selecting zoom level automatically#
If we want a simple automatic way of selecting an appropriate zoom level, we can also compute HEALPix indices for multiple zoom levels and observe how many unique index values we obtain:

- If we have only a few unique values, a single model output pixel ends up in many pixels of the lon/lat projection: the HEALPix resolution is too coarse for the desired lon/lat grid.
- If we have as many unique values as lon/lat pixels, every lon/lat pixel gets data from a different model output pixel, but we might skip a bunch of model pixels, thus subsampling the output, and might see aliasing effects.

So to choose an appropriate zoom level, we might want to search for the zoom at which the unique_fraction approaches 1 without necessarily being exactly 1.
Selecting a good zoom level comes with multiple advantages:

- We don’t load excessive amounts of data, so our code becomes faster.
- Using pre-aggregated data can reduce aliasing effects (if the output hierarchy uses area averaging).
[10]:
@lambda f: np.vectorize(f, excluded={1, 2})  # vectorize over nside, keep lons/lats fixed
def unique_fraction(nside, lons, lats):
    idx = get_nn_lon_lat_index(nside, lons, lats)
    return np.unique(idx).size / idx.size
Let’s try this function for several zoom
levels and different grids:
[11]:
import matplotlib.pyplot as plt

zoom = np.arange(15)
grids = [
    (np.linspace(-70, -55, 300), np.linspace(5, 20, 300), "Caribbean, fine"),
    (np.linspace(-70, -55, 100), np.linspace(5, 20, 100), "Caribbean, coarse"),
    (np.linspace(-180, 180, 360), np.linspace(-90, 90, 180), "Globe, 1° by 1°"),
]

for lons, lats, label in grids:
    plt.plot(zoom, unique_fraction(2**zoom, lons, lats), label=label)
plt.xlabel("zoom")
plt.ylabel("unique fraction")
plt.legend()
None
As the change between 0 and 1 is typically rather steep, we can get a simple criterion for “approaching 1” by just picking the first zoom level at which unique_fraction > 0.5 and using that as the suggested zoom level:
[12]:
for lons, lats, label in grids:
    print(f"{label:30s} {np.argmax(unique_fraction(2**zoom, lons, lats) > 0.5)}")
Caribbean, fine                10
Caribbean, coarse              9
Globe, 1° by 1°                6
So based on this criterion, the best zoom level for our initial example would indeed be 10. We can also see that level 6 could be appropriate for a global 1° by 1° grid, as suggested in the HEALPix intro.
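The criterion above can be wrapped into a small helper. This is a sketch of ours, not part of any library: pass get_nn_lon_lat_index from above as index_fn. To keep the snippet runnable without healpy, we demonstrate it here with a synthetic index function:

```python
import numpy as np


def suggest_zoom(lons, lats, index_fn, max_zoom=15, threshold=0.5):
    """Return the first zoom level whose unique index fraction exceeds threshold."""
    for zoom in range(max_zoom):
        idx = np.asarray(index_fn(2**zoom, lons, lats))
        if np.unique(idx).size / idx.size > threshold:
            return zoom
    return max_zoom


# synthetic stand-in for get_nn_lon_lat_index: spreads the n target pixels
# evenly over the 12 * nside**2 available HEALPix cells
def fake_index(nside, lons, lats):
    n = len(lons) * len(lats)
    return np.arange(n).reshape(len(lats), len(lons)) % (12 * nside**2)


print(suggest_zoom(np.linspace(-70, -55, 10), np.linspace(5, 20, 10), fake_index))  # → 2
```

In real use you would call suggest_zoom(lons, lats, get_nn_lon_lat_index) and then open the catalog entry at the returned zoom level.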