Sufficient Dimension Reduction - 4. Central Mean Space
1. Central Mean Subspace
The SDR algorithms we have seen so far all target the central space. However, this target can be too broad to work well: it asks us to capture the entire conditional distribution. By narrowing the target space, we can make some improvement. This is the idea of the central mean space.
Recall that our former target was to find \(\beta\) such that \(Y \perp X \mid \beta^t X\).
This means \(f(Y \mid X) = f(Y \mid \beta^t X)\); in other words, we were trying to capture the whole conditional distribution of \(Y\) given \(X\).
We can relax this target to \(E(Y \mid X) = E(Y \mid \beta^t X)\), which amounts to capturing only the conditional mean rather than the whole distribution.
As with the central space, we define the central mean space \(S_{E(Y \mid X)}\) as the intersection of all \(span(\beta)\) satisfying the relation above. Moreover, \(S_{E(Y \mid X)} \subset S_{Y \mid X}\) holds: any \(\beta\) that captures the whole conditional distribution certainly captures the conditional mean. This is intuitively clear, and it can be proved analytically too.
To see this more intuitively, consider the model \(Y = f(\beta_1^t X) + g(\beta_2^t X)\epsilon\), where \(E(\epsilon) = 0\) and \(X \perp \epsilon\).
Notice that \(E(Y \mid X) = E(f(\beta_1^t X) + g(\beta_2^t X)\epsilon \mid X) = f(\beta_1^t X) + g(\beta_2^t X)E(\epsilon \mid X) = f(\beta_1^t X)\), since \(X \perp \epsilon\) and \(E(\epsilon) = 0\).
This means \(E(Y \mid X) = E(Y \mid \beta_1^t X) = f(\beta_1^t X)\) while \(f(Y \mid X) \neq f(Y \mid \beta_1^t X)\); we can therefore interpret \(\beta_1\) as recovering the central mean space and \(\beta_2\) as recovering the remaining part of the central space.
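As a quick numerical illustration, here is a minimal sketch (the choices \(f = \sin\), \(g = \exp\), the slice width, and the Gaussian design are mine, not from the model above): within a thin slice of \(\beta_1^t X\), the conditional mean does not depend on \(\beta_2^t X\), but the conditional spread still does.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
X = rng.standard_normal((n, 2))
b1, b2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# Y = f(b1'X) + g(b2'X) * eps with f = sin, g = exp, E(eps) = 0, X indep. of eps
Y = np.sin(X @ b1) + np.exp(X @ b2) * rng.standard_normal(n)

# Condition on b1'X being close to 1, then split by the sign of b2'X
sl = np.abs(X @ b1 - 1.0) < 0.05
pos, neg = sl & (X @ b2 > 0), sl & (X @ b2 <= 0)

# The conditional mean depends only on b1'X: both values are close to sin(1)
print(Y[pos].mean(), Y[neg].mean(), np.sin(1.0))

# ...but the conditional distribution still depends on b2'X (spreads differ)
print(Y[pos].std(), Y[neg].std())
```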
Now let’s see which algorithms can recover the central mean space.
2. Ordinary Least Squares
OLS is the algorithm for finding the coefficients of a linear regression. Since the linear model assumes \(E(Y \mid X) = \beta^t X\), it targets the simplest possible structure of the central mean subspace.
Theorem
If \(X\) satisfies \(LCM(\beta)\), where \(\beta\) is a basis of \(S_{E(Y \mid X)}\), and \(E(\|X\|^p) < \infty\),
then \(\Sigma_{XX}^{-1}\Sigma_{XY} \in S_{E(Y \mid X)}\)
Even though this algorithm successfully finds part of the space, it recovers only a single direction. Therefore, we need other algorithms to recover enough directions.
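Here is a minimal sketch of the theorem in action (the cubic model, the sample size, and the Gaussian design, which guarantees \(LCM(\beta)\), are my own illustrative choices): the sample version of \(\Sigma_{XX}^{-1}\Sigma_{XY}\) aligns with the single mean-space direction.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5000, 5
beta1 = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2)

# Gaussian X satisfies the LCM condition required by the theorem
X = rng.standard_normal((n, p))
Y = (X @ beta1) ** 3 + 0.5 * rng.standard_normal(n)

# Sample analogue of Sigma_XX^{-1} Sigma_XY (center both X and Y)
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean()
Sigma_XX = Xc.T @ Xc / n
Sigma_XY = Xc.T @ Yc / n
b_ols = np.linalg.solve(Sigma_XX, Sigma_XY)

# Normalized OLS vector: approximately proportional to beta1
print(b_ols / np.linalg.norm(b_ols))
```

Note that if \(E(Y \mid X)\) were an even function of \(\beta_1^t X\), say \((\beta_1^t X)^2\) with Gaussian \(X\), then \(\Sigma_{XY} = 0\) and OLS would find nothing; this is exactly the situation the next method is designed to handle.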
3. Principal Hessian Directions
Principal Hessian Directions (PHD) uses the third moment \(E(XX^tY) = \Sigma_{XXY}\). By using it, we can derive an estimator that recovers a larger part of the SDR space.
Theorem
If \(LCM(\beta)\) and \(CCV(\beta)\) are satisfied and \(E(\|X\|^p) < \infty\),
then \(span(\Sigma_{XX}^{-1}\Sigma_{XXY} \Sigma_{XX}^{-1}) \subset S_{E(Y \mid X)}\)
pf)
\(E(XX^tY)\) can be rewritten as follows.
\[E(XX^tY) = E(XX^tE(Y \mid X)) = E(XX^tE(Y \mid \beta^tX)) = E(E(XX^t \mid \beta^tX)Y)\]
By the definition of conditional covariance,
\[Cov(X \mid \beta^tX) = E(XX^t \mid \beta^tX) - E(X \mid \beta^tX)E(X^t \mid \beta^tX)\]
Under \(CCV(\beta)\) we have \(Cov(X \mid \beta^tX) = \Sigma_{XX}Q_\beta(\Sigma_{XX})\), and under \(LCM(\beta)\) we have \(E(X \mid \beta^tX) = P_\beta^t(\Sigma_{XX})X\). Hence
\[E(XX^t \mid \beta^tX) = \Sigma_{XX}Q_\beta(\Sigma_{XX}) + P_\beta^t(\Sigma_{XX})XX^tP_\beta(\Sigma_{XX})\]
Substituting this back,
\[E(XX^tY) = \Sigma_{XX}Q_\beta(\Sigma_{XX})E(Y) + P_\beta^t(\Sigma_{XX})E(XX^tY)P_\beta(\Sigma_{XX}) = P_\beta^t(\Sigma_{XX})E(XX^tY)P_\beta(\Sigma_{XX})\]
where the last equality uses \(E(Y) = 0\), which we may assume by centering \(Y\). Since the column space of \(P_\beta^t(\Sigma_{XX})\) is \(span(\Sigma_{XX}\beta)\),
\[span(\Sigma_{XX}^{-1}\Sigma_{XXY}\Sigma_{XX}^{-1}) \subset span(\beta) = S_{E(Y \mid X)}\]
Therefore, by solving \(GEV(E_n(ZZ^tY))\) with the standardized predictor \(Z\) and centered \(Y\), we can estimate the PHD subspace.
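A minimal sample-level sketch of PHD (the quadratic model and the dimension choices are mine, not from the text; Gaussian \(X\) guarantees both \(LCM\) and \(CCV\)): standardize \(X\), center \(Y\), form \(E_n(ZZ^tY)\), and keep the eigenvectors with the largest absolute eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, d = 5000, 5, 2
B = np.zeros((p, d))
B[0, 0] = B[1, 1] = 1.0                  # beta1 = e1, beta2 = e2

# Gaussian X satisfies both LCM and CCV
X = rng.standard_normal((n, p))
Y = (X @ B[:, 0]) ** 2 + (X @ B[:, 1]) ** 2 + 0.5 * rng.standard_normal(n)

# Standardize X to Z and center Y (the proof needs E(Y) = 0)
mu, Sigma = X.mean(axis=0), np.cov(X, rowvar=False)
eval_, evec = np.linalg.eigh(Sigma)
Sigma_inv_sqrt = evec @ np.diag(eval_ ** -0.5) @ evec.T
Z = (X - mu) @ Sigma_inv_sqrt
Yc = Y - Y.mean()

# Sample candidate matrix E_n(Z Z^t Y)
Lambda = (Z * Yc[:, None]).T @ Z / n

# Leading eigenvectors (by absolute eigenvalue), mapped back to the X scale
w, V = np.linalg.eigh(Lambda)
order = np.argsort(-np.abs(w))
B_hat = Sigma_inv_sqrt @ V[:, order[:d]]
print(B_hat)   # columns should approximately span span(e1, e2)
```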
4. Iterative Hessian Transformation
There is another, more advanced algorithm for finding the central mean space. Iterative Hessian Transformation (IHT) has two advantages over the algorithms mentioned so far.
- It does not depend on the CCV condition.
- It can recover many more vectors than OLS or SIR.
This algorithm uses an invariance property: the operator \(\Sigma_{ZZY}\) maps vectors of \(S_{E(Y \mid Z)}\) back into \(S_{E(Y \mid Z)}\), so the image of any vector in \(S_{E(Y \mid Z)}\) stays in \(S_{E(Y \mid Z)}\). Therefore, by applying the transformation iteratively to the OLS vector, we can recover more vectors in \(S_{E(Y \mid Z)}\).
Theorem
If \(LCM(\beta)\) is satisfied, then \(S_{E(Y \mid Z)}\) is an invariant subspace of \(\Sigma_{ZZY}\).
Therefore, we can recover the IHT subspace from \(\Sigma_{ZY}, \Sigma_{ZZY}\Sigma_{ZY}, \Sigma_{ZZY}^2\Sigma_{ZY}, \dots\); by the Cayley-Hamilton theorem, powers beyond \(\Sigma_{ZZY}^{p-1}\) add nothing new.
More systematically, solving the GEV problem for \(\Lambda_{IHT} = MM^t\), where \(M = (\Sigma_{ZY}, \Sigma_{ZZY}\Sigma_{ZY}, \dots, \Sigma_{ZZY}^{p-1}\Sigma_{ZY})\), recovers the central mean space.
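Finally, a minimal sketch of IHT under the same conventions (the interaction model is my own illustrative choice; here OLS alone finds only \(\beta_1\), while iterating the Hessian transform also produces \(\beta_2\)):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, d = 5000, 5, 2
b1, b2 = np.eye(p)[:, 0], np.eye(p)[:, 1]   # beta1 = e1, beta2 = e2

X = rng.standard_normal((n, p))             # Gaussian X satisfies LCM
Y = X @ b1 + (X @ b1) * (X @ b2) + 0.5 * rng.standard_normal(n)

# Standardize X to Z and center Y
mu, Sigma = X.mean(axis=0), np.cov(X, rowvar=False)
eval_, evec = np.linalg.eigh(Sigma)
Sigma_inv_sqrt = evec @ np.diag(eval_ ** -0.5) @ evec.T
Z = (X - mu) @ Sigma_inv_sqrt
Yc = Y - Y.mean()

Sigma_ZY = Z.T @ Yc / n                     # OLS vector in the Z scale
Sigma_ZZY = (Z * Yc[:, None]).T @ Z / n     # Hessian-type matrix in the Z scale

# Krylov sequence Sigma_ZZY^k Sigma_ZY, k = 0, ..., p-1
cols, v = [], Sigma_ZY
for _ in range(p):
    cols.append(v)
    v = Sigma_ZZY @ v
M = np.column_stack(cols)

# Eigenvectors of Lambda_IHT = M M^t, mapped back to the X scale
w, V = np.linalg.eigh(M @ M.T)
order = np.argsort(-w)                      # M M^t is PSD: largest first
B_hat = Sigma_inv_sqrt @ V[:, order[:d]]
print(B_hat)   # columns should approximately span span(e1, e2)
```

In practice the dimension \(d\) would itself have to be estimated (e.g., by an order-determination test), which this sketch sidesteps.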
Debnath, L., & Mikusinski, P. (2005). Introduction to Hilbert Space. London, UK: Elsevier Academic Press.
Li, B. (2018). Sufficient Dimension Reduction. Boca Raton, FL: CRC Press.