Spatial interpolation with Gaussian process regression

A solution for spatial interpolation with Gaussian process regression is given below:

I have a CSV file with 140,000 points (rows). Each row consists of:

  1. a longitude value
  2. a latitude value
  3. a subsidence value at that point. I assume that these points are spatially correlated.

I want to perform a spatial interpolation analysis over the area covered by these points, i.e. a geostatistical interpolation using, for example, kriging (Gaussian process regression).

I’m reading the scikit-learn documentation on Gaussian process regression, but I’m unsure how to implement it.

What characteristics determine which kernel I should use? And how do I implement this correctly with my spatial data?

First, you should convert your data to a projected coordinate system. The best one depends on where your data are located; essentially you want the conformal projection with the least distortion for your location (e.g. Mercator near the equator, or Transverse Mercator if your data are all close to a single meridian). You can achieve this with geopandas, for example:

import pandas as pd
import geopandas as gpd

# toy data standing in for the real CSV
data = {'latitude': [54, 56, 58], 'longitude': [-62, -63, -64], 'subsidence': [10, 20, 30]}
df = pd.DataFrame(data)

params = {
    'geometry': gpd.points_from_xy(df.longitude, df.latitude),
    'crs': 'epsg:4326',  # WGS84 (geographic lon/lat)
}

gdf_ = gpd.GeoDataFrame(df, **params)
gdf = gdf_.to_crs('epsg:2961')   # reproject to UTM zone 20N
gdf

The GeoDataFrame is now in projected coordinates, so you can do some spatial prediction:

import numpy as np
from sklearn.gaussian_process.kernels import RBF
from sklearn.gaussian_process import GaussianProcessRegressor

# isotropic RBF kernel with a 100 km length scale as a starting point
kernel = RBF(length_scale=100_000)
gpr = GaussianProcessRegressor(kernel=kernel)

# features are the projected x/y coordinates, target is subsidence
X = np.array([gdf.geometry.x, gdf.geometry.y]).T
y = gdf.subsidence
gpr.fit(X, y)

Now you can predict at a location, e.g. gpr.predict([(500_000, 5_900_000)]) gives array([22.86764555]) for my toy data.
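A nice side effect of Gaussian process regression is that you also get an uncertainty estimate for each prediction. As a minimal sketch (the location here is just a placeholder), you can ask scikit-learn for the posterior standard deviation alongside the mean:

# posterior mean and standard deviation at a hypothetical location
X_new = np.array([(500_000, 5_900_000)])
mean, std = gpr.predict(X_new, return_std=True)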

To predict on a grid, you could do this:

# bounding box padded by 10 km, sampled every 10 km
x_min, x_max = np.min(gdf.geometry.x) - 10_000, np.max(gdf.geometry.x) + 10_000
y_min, y_max = np.min(gdf.geometry.y) - 10_000, np.max(gdf.geometry.y) + 10_000
grid_y, grid_x = np.mgrid[y_min:y_max:10_000, x_min:x_max:10_000]

# flatten the grid to (n_points, 2), predict, then reshape back to the grid
X_grid = np.stack([grid_x.ravel(), grid_y.ravel()]).T
y_grid = gpr.predict(X_grid).reshape(grid_x.shape)
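If you want to eyeball the result, a quick matplotlib sketch (just one way you might plot it, not part of the workflow above) could be:

import matplotlib.pyplot as plt

# interpolated surface with the observation points on top
plt.pcolormesh(grid_x, grid_y, y_grid, shading='auto')
plt.scatter(gdf.geometry.x, gdf.geometry.y, c=gdf.subsidence, edgecolor='k')
plt.colorbar(label='subsidence')
plt.gca().set_aspect('equal')
plt.show()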

Things to think about:

  • You should read the docs for geopandas and sklearn.gaussian_process
  • You should fit the kernel to your data (a sketch of this follows the list).
  • You might want to use an anisotropic kernel.
  • The estimator has a few hyperparameters which you should pay attention to.
  • Don’t forget to do some validation of your estimates, check the distribution of the residuals, etc. (see the cross-validation sketch below).
  • You might want to use a specialist geostats package like gstools, which will do a lot of the fiddly things for you (a rough sketch follows below).
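On the kernel and hyperparameter points: here is a minimal sketch of a kernel whose length scales (one per coordinate, hence anisotropic) and noise level are optimized during fitting. The starting values and bounds are placeholders, not recommendations:

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel, ConstantKernel

# anisotropic RBF (one length scale per coordinate) plus a noise term;
# the initial values and bounds are placeholders to adapt to your data
kernel = (
    ConstantKernel(1.0)
    * RBF(length_scale=[50_000, 50_000], length_scale_bounds=(1_000, 1_000_000))
    + WhiteKernel(noise_level=1.0)
)

gpr = GaussianProcessRegressor(
    kernel=kernel,
    n_restarts_optimizer=5,  # rerun the optimizer from several starts
    normalize_y=True,        # centre and scale the target before fitting
)
gpr.fit(X, y)
print(gpr.kernel_)           # the kernel with its fitted hyperparameters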
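For validation, one option is to look at cross-validated residuals. This is only a sketch; with 140,000 points an exact GP fit is expensive, so you may want to subsample first:

from sklearn.model_selection import cross_val_predict

# predict each point from a model trained on the other folds,
# then inspect the residuals (histogram, summary stats, spatial pattern)
y_cv = cross_val_predict(gpr, X, y, cv=5)
residuals = y - y_cv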
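And if you try gstools, an ordinary-kriging workflow might look roughly like this; the exact API differs between gstools versions, so treat the function names as assumptions to check against the gstools docs:

import gstools as gs

# projected coordinates and values from the GeoDataFrame above
xs, ys, vals = gdf.geometry.x.values, gdf.geometry.y.values, gdf.subsidence.values

# estimate an empirical variogram and fit a covariance model to it
bin_center, gamma = gs.vario_estimate((xs, ys), vals)
model = gs.Exponential(dim=2)
model.fit_variogram(bin_center, gamma)

# ordinary kriging with the fitted model on the same grid as before
krig = gs.krige.Ordinary(model, cond_pos=(xs, ys), cond_val=vals)
field, krige_var = krig((grid_x.ravel(), grid_y.ravel()))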