The whole contents of [Spatial Interpolation Theory] series are from the text book ‘Michael L. Stein, Interpolation of Spatial Data, New York: Springer, 1999.’
1. Linear Prediction
In the usual interpolation theory, it uses the linear model with random fields to estimate the target property. Therefore, to understand the spatial interpolation theory, we have to see the theory of linear model and random fields.
The random fields can be understood as a generalization of stochastic process. Remark that stochastic process is a sequence of random variables with univariate index. For example, \(\{Z(t): t=1,2,...,n\}\). They are used at analyzing sequential event such as weather forecasting, stock pricing or reinforcement learning.
The random fields simply extended the univariate index into more general domain such as Euclidean Space \(\mathbb{R}^n\) or Topological space \(T\). The most popular vertical using the theory is spatial geographical data. Notice that if we set the domain as \(\mathbb{R}^2\) then we can build coordinate indexed random variables with (longitude, latitude) or as \(\mathbb{R}^3\) then we can build 3-d random variables with (longitude, latitude, altitude).
Let’s set the random field \(\{Z(x) :x \in \mathbb{R}^d \}\). We’ll handle it with mean function $\mu(x) := E(Z(x))$ and covariance function $K(x,y)=cov(Z(x),Z(y))$.
One of the main concern of the spatial analysis is predict \(Z(x_0)\) called predictand using observation at \(Z(x_1),Z(x_2),...,Z(x_n)\). For instance, predicting the stock price at \(n+1\) time point \((Z(t_{n+1}))\) using observation of \(1,2,...,n\) time points \((Z(t_{1}),Z(t_{2}),...,Z(t_{n}))\).
Most common way to predict it is using linear predictors. Suppose we have observations \(\textbf{Z} = (Z(x_1),Z(x_2),...,Z(x_n))^t\) and predict \(Z(x_0)\) as \(\lambda_0 + \Lambda^t \textbf{Z}\). The predictors minimizing the Mean Squared Error (MSE) called Best Linear Predictors (BLP) and it is known as follow. \(\Lambda = \textbf{K}^{-1}\textbf{k} \\ \lambda_0 = \mu(x_0)-\textbf{k}^t \textbf{K}^{-1}\mu(\textbf{x}) \\ \hat{Z}(x_0) = \mu(x_0)+\textbf{k}^t\textbf{K}^{-1}(\textbf{Z}-\mu(\textbf{x}))\) Where \(\mu(\textbf{x}) = \{m(x_i)\}_{i=1}^n, \textbf{K} = \{Cov(Z(x_i),Z(x_j)\}_{i,j=1}^{n},\textbf{k}= \{Cov(Z(x_i),Z(x_0)\}_{i=1}^n\).
2. Best Linear Unbiased Prediction (BLUP)
Suppose we have the following model for a random field Z, \(Z(x) =\textbf{m}(x)^t \beta + \epsilon(x) \\ \epsilon(x) \sim N(0,\sigma_x)\) The \(\textbf{m}:\mathbb{R}^d \rightarrow \mathbb{R}^p\) is independent variable and \(\textbf{Z}: \mathbb{R}^d \rightarrow \mathbb{R}\) is dependent variable. That is, it follows a normal distribution \(Z(x) \sim N(\textbf{m}(x)^t\beta, \sigma_x)\)
Assume we have observations at $x=x_1,…,x_n$ and want to predict $Z(x_0)$. Denote sample matrix \(\textbf{M} := (\textbf{m}(x_1),\textbf{m}(x_2),...,\textbf{m}(x_n))^t \in \mathbb{R}^{n \times p},\textbf{Z}:=(\textbf{Z}(x_1),\textbf{Z}(x_2),...,\textbf{Z}(x_n))^t \in \mathbb{R}^{n \times 1}\). If we assume to know \(\beta\), then BLP of \(Z(x_0)\) is \(Z(x_0) = \textbf{m}(x_0)^t\beta + \textbf{k}^t\textbf{K}^{-1}(\textbf{Z}-\textbf{M}^t\beta)\).
If the \(\beta\) is also unknown, it have to be estimated. By using the Generalized Least Square Estimator, we can estimate \(\hat{\beta} = (\textbf{M}^t\textbf{K}^{-1}\textbf{M})^{-1}\textbf{M}^t\textbf{K}^{-1}\textbf{Z}\). (Remark that the GLS of Linear model is used when the covariance of error term is not identical i.e. \(Y \sim N(X\beta, \Sigma)\))
As a result, the Best Linear Unbiased Predictor (BLUP) of $\textbf{Z}(x_0)$ is \(Z(x_0) = \textbf{m}(x_0)^t\hat{\beta} + \textbf{k}^t\textbf{K}^{-1}(\textbf{Z}-\textbf{M}^t\hat{\beta})\).
3. About kriging
Assume some geographical data problem such as predicting the price of a house with the price of neighborhood house. It become some kind of interpolation or weighted mean. In this situation, the BLUP of the $Z(x_0)$ is called Kriging named after the South African Mining Engineer D. G. Krige.
There are various types of kriging.
-
**Ordinary Kriging : ** Assume \(\textbf{m}(x) \equiv 1\), so that $Z(x) \sim N(\mu, \sigma_x)$. It regards that the mean of the process is just unknown constant.
-
**Simple Kriging : ** Assume that $\mu(x)=0$.
-
**Universial Kriging : ** Assume \(\textbf{m}\) to be general.