Envelope theory is a theory of subspace targeting that covers much broader concepts than traditional statistical models. The term “envelope” refers to the minimal subspace that contains a target space, wrapping around it like an envelope. Accordingly, the main mathematical tools of envelope theory are ordinary linear algebra and vectorization techniques.

Before looking at envelope theory itself, let us first go through some preliminaries.

0. Preliminaries on Vector Spaces

Matrices

  • \(\mathbb{R}^{r \times p}\) : the class of all real matrices of dimension \(r \times p\)
  • \(\mathbb{S}^{r \times r}\) : the class of all symmetric matrices of dimension \(r \times r\)
  • \(A^{\dagger}\) : the Moore-Penrose pseudoinverse of a matrix \(A\)
  • \(I_r : r \times r\) identity matrix
  • \(1_n : n \times 1\) vector of ones
  • \((A)_{ij} :\) the \(ij\)-th element of a matrix \(A\)
  • \(M>0 : M\) is a positive definite matrix
  • \(M \geq 0 : M\) is a positive semidefinite matrix
  • \(\mid M \mid :\) the determinant of a matrix \(M\)
  • \(\mid M\mid_0:\) the product of all nonzero eigenvalues of \(M\)
  • \(\parallel M \parallel :\) the spectral norm of a matrix \(M\), i.e. \(\sqrt{\lambda_{max}(M^tM)}\)

Vectorization and Kronecker Product

  • \(vec(M) : \mathbb{R}^{r \times p} \rightarrow \mathbb{R}^{rp}\) is the operator that forms a vector by stacking the columns of a matrix
  • \(vech(M) : \mathbb{S}^{r \times r} \rightarrow \mathbb{R}^{r(r+1)/2}\) is the operator that forms a vector by stacking the \(j\)-th through \(r\)-th elements of the \(j\)-th column, for \(j = 1, \ldots, r\)
  • \(C_r :\) the contraction matrix such that \(vech(A) = C_r vec(A)\) for \(A \in \mathbb{S}^{r \times r}\)
  • \(E_r:\) the expansion matrix such that \(vec(A) = E_r vech(A)\) for \(A \in \mathbb{S}^{r \times r}\)
  • \(K_{rp} :\) the commutation matrix such that \(vec(A^{t})=K_{rp} vec(A)\) for \(A \in \mathbb{R}^{r \times p}\)
  • \(\bigotimes : \mathbb{R}^{r \times p} \times \mathbb{R}^{t\times q} \rightarrow \mathbb{R}^{rt \times pq}\), the Kronecker product, with \(A \otimes B = (a_{ij}B)\) for \(i=1,2,...,r,\; j=1,2,...,p\)
  • \(bdiag(A_1,...,A_q) :\) the block diagonal matrix built from \(A_1, \ldots, A_q\)
  • \(\bigoplus : \mathbb{R}^{m \times n } \times \mathbb{R}^{p \times q} \rightarrow \mathbb{R}^{(m+p) \times (n+q)}\), the direct sum, with \(A \oplus B = bdiag(A,B)\)
  • An important formula connecting vectorization and the Kronecker product is \(vec(ABC) = (C^t \otimes A) vec(B)\); a numerical sketch follows below.
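As a quick numerical check of this identity, here is a minimal sketch assuming NumPy; the matrix shapes are arbitrary illustrative choices. Note that \(vec\) corresponds to column-major (Fortran-order) flattening.

```python
import numpy as np

rng = np.random.default_rng(0)

def vec(M):
    # Stack the columns of M into a single vector (column-major order).
    return M.reshape(-1, order="F")

# Illustrative shapes only: A (2x3), B (3x4), C (4x5).
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 5))

lhs = vec(A @ B @ C)
rhs = np.kron(C.T, A) @ vec(B)
print(np.allclose(lhs, rhs))  # True: vec(ABC) = (C^t ⊗ A) vec(B)
```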

Subspaces

For \(A \in \mathbb{R}^{p \times r},B \in \mathbb{R}^{r \times p }, S \subset \mathbb{R}^{r}\)

  • \[AS := \{Ax : x \in S\}\]
  • \[span(B):= \{Bx : x \in \mathbb{R}^p\}\], the column space of \(B\)
  • \[S_{B} := span(B)\]
  • \(S_d (A,B)\) : the span of the first \(d\) eigenvectors of \(A^{-1/2} B A^{-1/2}\), called the span of the first \(d\) eigenvectors of \(B\) relative to \(A\)
  • \[S_1 + S_2 = \{x_1+x_2 : x_1 \in S_1 , x_2 \in S_2\}\]
  • \(\langle x_1,x_2\rangle_{\Sigma} = x_1^t \Sigma x_2\) for some \(\Sigma \geq 0\)
  • \[P_{B(\Sigma)} := B(B^t\Sigma B)^{\dagger}B^t \Sigma\], the projection onto \(span(B)\) relative to the \(\Sigma\) inner product (a numerical sketch follows this list)
  • \[Q_{B(\Sigma)} := I_r - P_{B(\Sigma)}\]
  • \[S \backslash S_1 = span (P_S-P_{S_1})\]
  • For subspaces \(S_1 = span(B_1)\) and \(S_2 = span(B_2)\), the direct sum is \(S_1 \oplus S_2 := span(B_1 \oplus B_2)\)
  • Grassmann manifold \(\mathcal{G}(u,r)\) : the set of all \(u\)-dimensional subspaces of \(\mathbb{R}^r\)
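As promised above, here is a minimal NumPy sketch of the projections \(P_{B(\Sigma)}\) and \(Q_{B(\Sigma)}\); the matrices \(B\) and \(\Sigma\) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
r, q = 5, 2

# A hypothetical basis matrix B (r x q) and a positive definite Sigma (r x r).
B = rng.standard_normal((r, q))
A = rng.standard_normal((r, r))
Sigma = A @ A.T + np.eye(r)

# Projection onto span(B) relative to the Sigma inner product, and its complement.
P = B @ np.linalg.pinv(B.T @ Sigma @ B) @ B.T @ Sigma
Q = np.eye(r) - P

print(np.allclose(P @ P, P))   # idempotent
print(np.allclose(P @ B, B))   # fixes span(B)
x = rng.standard_normal(r)
print(np.allclose(B.T @ Sigma @ (Q @ x), 0))  # Q x is Sigma-orthogonal to span(B)
```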

Random Vectors

  • \(Y \sim X :\) random variables \(X\) and \(Y\) have the same distribution
  • \(Y \perp X :\) random variables \(X\) and \(Y\) are independent
  • \(Y \mid X = x :\) the conditional distribution of \(Y\) given \(X = x\)
  • \(avar(\cdot) :\) the asymptotic covariance matrix, i.e. if \(\sqrt{n}(T-\theta) \overset{D}{\rightarrow } N(0,A)\) then \(avar(\sqrt{n}T) = A\)

Finally, the definition of an envelope is as follows.

#### Envelopes

\(\mathcal{E}_{M}(S) :\) the \(M\)-envelope of \(S\) is the intersection of all reducing subspaces of \(M\) that contain \(S\), for \(M \in \mathbb{S}^{r\times r}\) and \(S \subset span (M)\) (reducing subspaces are defined in Section 1.2).

1. Response Envelopes

1.1 The Multivariate Linear Model

Envelope theory explains the relationships among multiple dependent variables and multiple independent variables. Therefore, before turning to envelope theory, we first review the multivariate linear model, which handles multiple random variables.

The multivariate linear model is a linear model whose dependent variable is a random vector. It can be seen as a concatenation of several univariate linear models, with an additional covariance structure imposed on the error terms.

\[Y_i = \alpha + \beta X_i + \epsilon_i , i =1,2,...,n \\ \mbox{Stochastic Elements : }Y_i \in \mathbb{R}^{r}, X_i \in \mathbb{R}^{p}, \epsilon_i \overset{iid}{\sim} N(0,\Sigma) \\ \mbox{Nonstochastic Elements : }\alpha \in \mathbb{R}^{r} , \beta \in \mathbb{R}^{r \times p}\]

To handle the model more easily, assume that the \(X_i\) are centered, i.e. \(\sum_{i=1}^n X_i =0\). Although the predictor \(X\) may be stochastic, we can treat it as nonstochastic by conditioning on it, i.e. \(Y_i \mid X_i = \alpha + \beta X_i + \epsilon_i\).

So the model can be written as follows.

\[\left[\begin{array}{c} Y_1 \\ Y_2 \\ \vdots \\ Y_r \end{array}\right] = \left[\begin{array}{c} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_ r \end{array}\right] + \left[\begin{array}{cccc} \beta_{11} & \beta_{12} & \cdots & \beta_{1p} \\ \beta_{21} & \beta_{22} & \cdots & \beta_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_{r1} & \beta_{r2} & \cdots & \beta_{rp} \end{array}\right] \left[\begin{array}{c} X_{1} \\ X_{2} \\ \vdots \\ X_p \end{array}\right] + \left[\begin{array}{c} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_r \end{array}\right]\]

Notice that this is the model for a single sample: it predicts \(r\) responses using \(p\) predictors. The main parameters of the model are \(\alpha, \beta\), and \(\Sigma\).

To estimate each parameter, we use the following sample quantities.

  • \(Y' = \{(Y_i - \bar{Y})^t\}_{i=1}^n : n \times r\) centered sample matrix
  • \(Y_0'= \{Y_i^t \}_{i=1}^n : n \times r\) uncentered sample matrix
  • \(X' = \{X_i^t \}_{i=1}^n: n \times p\) sample matrix
  • \[S_{YX} = Y'^t X' / n\]
  • \[S_{XX} = X'^t X' /n = n^{-1} \sum_{i=1}^nX_i X_i^t\]

Therefore, the model can be written in sample form as follows.

\[Y_0' = 1_n \otimes \alpha^t + X' \beta^t + \epsilon\]

where \(\epsilon = (e_{ij})_{i,j=1}^{n,r}\) is the \(n \times r\) error matrix whose \(i\)-th row is \(\epsilon_i^t\).

#### The Maximum Likelihood Estimators of \(\alpha, \beta, \Sigma\) are as follows.

  • \[\hat{\alpha}= \bar{Y}\]
  • \[\hat{\beta} = B := Y'^tX'(X'^tX')^{-1} = Y_0'^tX'(X'^tX')^{-1}=S_{YX}S_{XX}^{-1}\]
  • \[\hat{\Sigma} = S_{Y \mid X}:= n^{-1}\sum_{i=1}^n r_i r_i^t \mbox{ where } r_i = Y_i-\hat{Y}_i\]

The estimator \(\hat{\alpha} = \bar{Y}\) is derived naturally because the predictors \(X_i\) are centered.

The equality \(Y'^tX'= Y_0'^t X'\) holds because \(\sum_{i=1}^n X_i= 0\).

\[Y'^tX' = \sum_{i=1}^n (Y_i-\bar{Y})X_i^t = \sum_{i=1}^n Y_iX_i^t-\bar{Y}\Big(\sum_{i=1}^n X_i\Big)^t =\sum_{i=1}^n Y_iX_i^t =Y_0'^tX'\]

The sample covariance matrices of \(Y\), \(\hat{Y}\), and \(\epsilon\) are as follows.

  • \[\hat{Cov}(Y)= S_Y = n^{-1} Y'^tY' = S_{Y \circ X} + S_{Y \mid X}\]
  • \[\hat{Cov}(\hat{Y})= S_{Y \circ X}= n^{-1}Y'^t P_X Y' = S_{YX}S_{XX}^{-1}S_{XY}\]
  • \[\hat{Cov}(\epsilon) = S_{Y \mid X} = n^{-1} \sum_{i=1}^n r_i r_i^t = n^{-1} Y'^t Q_{X}Y' = S_Y-S_{Y \circ X}\]
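As a minimal simulation sketch of these estimators and of the decomposition \(S_Y = S_{Y \circ X} + S_{Y \mid X}\), assuming NumPy; the dimensions and the true \(\alpha, \beta, \Sigma\) below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r, p = 200, 4, 3

# Hypothetical true parameters.
alpha = rng.standard_normal(r)
beta = rng.standard_normal((r, p))
L = rng.standard_normal((r, r))
Sigma = L @ L.T + np.eye(r)

X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)                        # centered predictors: sum_i X_i = 0
eps = rng.multivariate_normal(np.zeros(r), Sigma, size=n)
Y0 = alpha + X @ beta.T + eps                 # n x r sample matrix Y_0'

alpha_hat = Y0.mean(axis=0)                   # hat(alpha) = bar(Y)
Yc = Y0 - alpha_hat                           # centered sample matrix Y'
B = Yc.T @ X @ np.linalg.inv(X.T @ X)         # hat(beta) = S_YX S_XX^{-1}
resid = Yc - X @ B.T                          # r_i = Y_i - hat(Y)_i
Sigma_hat = resid.T @ resid / n               # S_{Y|X}

S_Y = Yc.T @ Yc / n
S_fit = (X @ B.T).T @ (X @ B.T) / n           # S_{Y o X}
print(np.allclose(S_Y, S_fit + Sigma_hat))    # True: S_Y = S_{Y o X} + S_{Y|X}
```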

One of the main uses of the linear model is in drawing statistical inferences about \(\beta\).

Because \(\hat{\beta}\) is a random matrix, its covariance structure is a tensor containing the covariances between elements \(\hat{\beta}_{ij}\) and \(\hat{\beta}_{kl}\), which is cumbersome to handle. By vectorizing \(\hat{\beta}\), we turn the random matrix into a random vector and the covariance tensor into an ordinary covariance matrix.

The MLE of \(\beta\) is \(\hat{\beta} = Y_{0}'^t X'(X'^tX')^{-1}\). Using the formula \(vec(ABC) = (C^t \otimes A)vec(B)\), we can derive the following.

\[vec(\hat{\beta}) = vec(Y_{0}'^t X'(X'^tX')^{-1}) = ((X'^tX')^{-1}X'^t \otimes I_r)vec(Y_0'^t)\]

Similarly, \(vec(Y_0'^t)\) can be written as follows.

\[vec(Y_0'^t) = 1_n \otimes \alpha + (X'\otimes I_r)vec(\beta) + vec(\epsilon)\] \[E(vec(Y_0'^t)) = 1_n \otimes \alpha +(X' \otimes I_r)vec(\beta)\] \[Cov(vec(Y_0'^t)) = Cov(vec(\epsilon)) = I_n \otimes \Sigma\]

By combining the four equations above, we obtain the following results.

Statistical Inference about \(\hat{\beta}\)

\[E(vec(\hat{\beta})) = vec(\beta)\] \[Cov(vec(\hat{\beta})) = (X'^tX')^{-1}\otimes \Sigma = n^{-1} S_{XX}^{-1} \otimes \Sigma\] \[\hat{Cov}(vec(\hat{\beta})) = (X'^tX')^{-1} \otimes S_{Y \mid X}\]

\[\begin{align*} E(vec(\hat{\beta})) & = E(((X'^tX')^{-1}X'^t \otimes I_r)vec(Y_0'^t)) \\ & = ((X'^tX')^{-1}X'^t \otimes I_r)E(vec(Y_0'^t)) \\ &= ((X'^tX')^{-1}X'^t \otimes I_r) (1_n \otimes \alpha + (X' \otimes I_r)vec(\beta)) \\ &= (X'^tX')^{-1}X'^t1_n \otimes \alpha + ((X'^tX')^{-1}X'^tX' \otimes I_r)vec(\beta) \\ & = 0+ I_{rp} vec(\beta) = vec(\beta)\end{align*}\]

Each step follows from the formula \((A\otimes B) (C \otimes D) = AC \otimes BD\). Notice that \(X'^t1_n =\sum_{i=1}^n X_i = 0\).

\[\begin{align*} Cov(vec(\hat{\beta})) &= Cov(((X'^tX')^{-1} X'^t \otimes I_r)vec(Y_0'^t)) \\ &= ((X'^tX')^{-1} X'^t \otimes I_r)Cov(vec(Y_0'^t))(X' (X'^tX')^{-1} \otimes I_r) \\ &= ((X'^tX')^{-1} X'^t \otimes I_r)(I_n \otimes \Sigma)(X' (X'^tX')^{-1} \otimes I_r) \\ &= (X'^tX')^{-1}\otimes \Sigma \end{align*}\]
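The Kronecker-product algebra in the last two steps can also be checked numerically; below is a minimal sketch assuming NumPy, with arbitrary illustrative dimensions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r, p = 30, 3, 2

Xp = rng.standard_normal((n, p))             # plays the role of X'
L = rng.standard_normal((r, r))
Sigma = L @ L.T + np.eye(r)

XtX_inv = np.linalg.inv(Xp.T @ Xp)
H = np.kron(XtX_inv @ Xp.T, np.eye(r))       # ((X'^t X')^{-1} X'^t) ⊗ I_r

# H (I_n ⊗ Sigma) H^t = (X'^t X')^{-1} ⊗ Sigma
lhs = H @ np.kron(np.eye(n), Sigma) @ H.T
rhs = np.kron(XtX_inv, Sigma)
print(np.allclose(lhs, rhs))                 # True
```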

Partitioned Models

  • For the multivariate linear model \(Y = \mu + \beta_1 X_1 + \beta_2 X_2 +\epsilon\), the coefficient \(\beta_1\) is preserved when the model is transformed as follows (a numerical sketch follows this list).
\[Y = \mu + \beta_1 \hat{R}_{1 \mid 2} + \beta_2^{\ast} X_2 + \epsilon \\ \mbox{ where } \hat{R}_{1 \mid 2} = X_1 - P_{2}X_1 \mbox{ is the residual from the regression of } X_1 \mbox{ on } X_2\]
  • Remark that \(\hat{R}_{1 \mid 2} \perp X_2\); therefore \(\beta_1\) can be obtained either by regressing \(Y\) on \(\hat{R}_{1 \mid 2}\) or by regressing \(\hat{R}_{Y \mid 2}\), the residual from the regression of \(Y\) on \(X_2\), on \(\hat{R}_{1\mid 2}\).
  • A plot of \(\hat{R}_{Y\mid 2}\) versus \(\hat{R}_{1 \mid 2}\) is called an added variable plot.
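Here is a brief sketch of the partitioned-model fact above, assuming NumPy and made-up data: the coefficient of \(X_1\) from the full regression coincides with the coefficient obtained after residualizing on \(X_2\).

```python
import numpy as np

rng = np.random.default_rng(4)
n, r, p1, p2 = 300, 2, 2, 3

# Made-up data for illustration.
X1 = rng.standard_normal((n, p1))
X2 = rng.standard_normal((n, p2))
Y = rng.standard_normal((n, r))

# Center everything, as in the model above.
X1c, X2c, Yc = X1 - X1.mean(0), X2 - X2.mean(0), Y - Y.mean(0)
Xc = np.column_stack([X1c, X2c])

# Full regression of Y on (X1, X2): beta = [beta_1, beta_2].
beta_full = Yc.T @ Xc @ np.linalg.inv(Xc.T @ Xc)
beta1_full = beta_full[:, :p1]

# Residualize X1 and Y on X2, then regress residual on residual.
P2 = X2c @ np.linalg.inv(X2c.T @ X2c) @ X2c.T
R_1given2 = X1c - P2 @ X1c                      # hat(R)_{1|2}
R_Ygiven2 = Yc - P2 @ Yc                        # hat(R)_{Y|2}
beta1_partial = R_Ygiven2.T @ R_1given2 @ np.linalg.inv(R_1given2.T @ R_1given2)

print(np.allclose(beta1_full, beta1_partial))   # True: beta_1 is preserved
```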

1.2 Envelope Model for Response Reduction

Now let us turn to envelope theory itself. Its main idea is to reduce \(Y\) to a much smaller set of extracted variables.

A concept called \(X\)-invariance is key to understanding envelope theory.

The linear transformation \(G^tY\), where \(G \in \mathbb{R}^{r \times q}\), is \(X\)-invariant if and only if \(A^tG^tY\) is \(X\)-invariant for any nonstochastic full-rank \(A \in \mathbb{R}^{q \times q}\). Consequently, \(G\) itself is not identifiable, but \(span(G)\) is. Therefore, we target a subspace rather than specific coordinates.

Under model 1.1, we can state the following two conditions on a subspace \(\mathcal{E}\) of \(\mathbb{R}^r\).

  1. \[Q_{\mathcal{E}}Y \mid X= x_1 \sim Q_{\mathcal{E}}Y \mid X = x_2, \forall x_1,x_2 \in \mathbb{R}^p\]
  2. \[P_{\mathcal{E}} Y \perp Q_{\mathcal{E}}Y \mid X\]
  • The first condition means that \(Q_{\mathcal{E}}Y\) is invariant in \(X\). It holds if and only if \(span(\beta) \subset \mathcal{E}\), because \(Q_{\mathcal{E}}Y = Q_{\mathcal{E}}\alpha+Q_{\mathcal{E}}\beta X+ Q_{\mathcal{E}} \epsilon\) and \(Q_{\mathcal{E} }\beta X = 0\) for all \(X\) if and only if \(Q_{\mathcal{E}} \beta= 0\).

  • The second condition means that \(X\) cannot affect \(Q_{\mathcal{E}}Y\) indirectly through \(P_{\mathcal{E}}Y\), i.e. \(cov(P_{\mathcal{E}}Y,Q_{\mathcal{E}}Y \mid X) = P_{\mathcal{E}} \Sigma Q_{\mathcal{E}} =0\).

  • Together, the two conditions say that only the part \(P_{\mathcal{E}}Y\) is affected by \(X\); this part is the main target of envelope theory.

Because the subspace concept is abstract, additional concepts are needed to define it in detail. A subspace \(\mathcal{R} \subset \mathbb{R}^{r}\) is said to be a reducing subspace of \(M \in \mathbb{S}^{r \times r}\) if \(\mathcal{R}\) decomposes \(M\) as \(M= P_{\mathcal{R}}MP_{\mathcal{R}} + Q_{\mathcal{R}}MQ_{\mathcal{R}}\). If \(\mathcal{R}\) is a reducing subspace of \(M\), we say that \(\mathcal{R}\) reduces \(M\).

With a little additional mathematics, the envelope \(\mathcal{E}_M(\mathcal{B})\) can be written as \(\mathcal{E}_M(\mathcal{B}) = \sum_{k=1}^{q} P_{k}\mathcal{B}\), where \(q\) is the number of distinct eigenvalues of \(M\) and \(P_{k}\) is the projection onto the \(k\)-th eigenspace; a computational sketch follows.
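Here is a sketch of this construction assuming NumPy: group the eigenvectors of \(M\) by (numerically) distinct eigenvalues, project a basis matrix \(B\) of the target subspace onto each eigenspace, and keep the nonzero pieces. The helper name `envelope_basis` and the toy \(M\), \(B\) are made up for illustration.

```python
import numpy as np

def envelope_basis(M, B, tol=1e-8):
    """Orthonormal basis of the M-envelope of span(B), from the eigenspace projections of M."""
    vals, vecs = np.linalg.eigh(M)
    # Group indices of (numerically) equal eigenvalues: one eigenspace per distinct eigenvalue.
    groups = [[0]]
    for i in range(1, len(vals)):
        if abs(vals[i] - vals[groups[-1][0]]) < tol:
            groups[-1].append(i)
        else:
            groups.append([i])
    pieces = []
    for idx in groups:
        Vk = vecs[:, idx]                      # orthonormal basis of the k-th eigenspace
        PkB = Vk @ (Vk.T @ B)                  # P_k B : projection of the columns of B onto it
        U, s, _ = np.linalg.svd(PkB, full_matrices=False)
        pieces.append(U[:, s > tol])           # keep only the nonzero part
    # Columns are orthonormal because distinct eigenspaces are mutually orthogonal.
    return np.hstack(pieces)

# Toy example: M has eigenvalues {3, 1}, each with a 2-dimensional eigenspace.
M = np.diag([3.0, 3.0, 1.0, 1.0])
B = np.array([[1.0], [1.0], [1.0], [0.0]])
G = envelope_basis(M, B)
P, Q = G @ G.T, np.eye(4) - G @ G.T

print(G.shape[1])                              # 2: one direction per eigenspace that span(B) touches
print(np.allclose(P @ M @ P + Q @ M @ Q, M))   # the envelope reduces M
print(np.allclose(P @ B, B))                   # span(B) is contained in the envelope
```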

As a result, we can define the envelope model as follows.

Envelope Model

\(Y = \alpha + \Gamma \eta X + \epsilon\) with \(\Sigma = \Gamma \Omega \Gamma^t + \Gamma_{0} \Omega_{0} \Gamma_{0}^t\), where \(\beta = \Gamma \eta\), \(span(\Gamma) = \mathcal{E}_{\Sigma}(\mathcal{B})\) with \(\mathcal{B} = span(\beta)\), \(span(\Gamma_{0}) = \mathcal{E}^{\perp}_{\Sigma}(\mathcal{B})\), and \(( \Gamma, \Gamma_0) \in \mathbb{R}^{r \times r}\) is an orthogonal matrix.

Notice that \(\Gamma_{0}^tY = \Gamma_{0}^t \alpha + \Gamma_{0}^t\epsilon\), which means that \(\Gamma_0^t Y\) is invariant in \(X\), and \(Cov(\Gamma^t Y , \Gamma_{0}^tY \mid X) = \Gamma^t \Sigma \Gamma_0 = 0\), which means that the envelope conditions are satisfied.
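To close, here is a numerical sketch of the envelope parameterization, with made-up \(\Gamma, \eta, \Omega, \Omega_0\): it builds \(\beta = \Gamma\eta\) and \(\Sigma = \Gamma \Omega \Gamma^t + \Gamma_0 \Omega_0 \Gamma_0^t\), then checks that \(\Gamma_0^t\beta = 0\) and \(\Gamma^t \Sigma \Gamma_0 = 0\), i.e. the two envelope conditions hold.

```python
import numpy as np

rng = np.random.default_rng(5)
r, u, p = 5, 2, 3

# A hypothetical orthogonal split (Gamma, Gamma_0) of R^r into envelope and complement.
Q, _ = np.linalg.qr(rng.standard_normal((r, r)))
Gamma, Gamma0 = Q[:, :u], Q[:, u:]

eta = rng.standard_normal((u, p))                   # coordinates of beta with respect to Gamma
Omega = np.diag(rng.uniform(1.0, 2.0, size=u))      # material variation
Omega0 = np.diag(rng.uniform(1.0, 2.0, size=r - u)) # immaterial variation

beta = Gamma @ eta
Sigma = Gamma @ Omega @ Gamma.T + Gamma0 @ Omega0 @ Gamma0.T

print(np.allclose(Gamma0.T @ beta, 0))              # Gamma_0^t Y does not depend on X
print(np.allclose(Gamma.T @ Sigma @ Gamma0, 0))     # Gamma^t Y and Gamma_0^t Y are uncorrelated given X
```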