Solution for spatial interpolation with gaussion process regression
is Given Below:
I have a csv-file with 140.000 points(rows). It consists of:
- longitude value
- latitude value
- subsidence value at specific points. I assume that these points are spatially correlated.
I want to perform a spatial interpolation analysis of the area of the points. Meaning, I will do a geostatistical interpolation analysis using for example Kriging i.e gaussian process regression.
I’m reading on the sci-kit learn page about gaussian regression. But I’m unsure how to implement it.
What characteristics determine which kernel I can use? How do I implement this with my spatial data correctly?
First, you should convert your data to a projected coordinate system. The best one depends on where your data are located; essentially you want the conformal projection with the least amount of distortion for your location (e.g. Mercator near the equator, or Transverse Mercator if your data are all close to a single meridian. You can achieve this in geopandas
for example:
import pandas as pd
import geopandas as gpd
data = {'latitude': [54, 56, 58], 'longitude': [-62, -63, -64], 'subsidence': [10, 20, 30]}
df = pd.DataFrame(data)
params ={
'geometry': gpd.points_from_xy(df.longitude, df.latitude),
'crs': 'epsg:4326', # WGS84
}
gdf_ = gpd.GeoDataFrame(df, **params)
gdf = gdf_.to_crs('epsg:2961') # UTM20N
gdf
This GeoDataFrame
is now in projected coordinates. Now you can do some spatial prediction:
import numpy as np
from sklearn.gaussian_process.kernels import RBF
from sklearn.gaussian_process import GaussianProcessRegressor
kernel = RBF(length_scale=100_000)
gpr = GaussianProcessRegressor(kernel=kernel)
X = np.array([gdf.geometry.x, gdf.geometry.y]).T
y = gdf.subsidence
gpr.fit(X, y)
Now you can predict at a location, e.g. gpr.predict([(500_000, 5_900_000)])
gives array([22.86764555])
for my toy data.
To predict on a grid, you could do this:
x_min, x_max = np.min(gdf.geometry.x) - 10_000, np.max(gdf.geometry.x) + 10_000
y_min, y_max = np.min(gdf.geometry.y) - 10_000, np.max(gdf.geometry.y) + 10_000
grid_y, grid_x = np.mgrid[y_min:y_max:10_000, x_min:x_max:10_000]
X_grid = np.stack([grid_x.ravel(), grid_y.ravel()]).T
y_grid = gpr.predict(X_grid).reshape(grid_x.shape)
Things to think about:
- You should read the docs for
geopandas
andsklearn.gaussian_process
- You should fit the kernel to your data.
- You might want to use an anisotropic kernel.
- The estimator has a few hypterparameters which you should pay attention to.
- Don’t forget to do some validation of your estimates, check the distribution of the residuals, etc.
- You might want to use a specialist geostats package like
gstools
, which will do a lot of the fiddly things for you.