conditional distribution of multi-variate normal variable
Theorem.
Let $X$ be a random variable, taking values in $\mathbb{R}^n$, normally distributed with a non-singular covariance matrix $\Sigma$
and a mean of zero.
Suppose $Y$ is defined by $Y = B^{\mathrm{t}} X$ for some linear transformation $B \colon \mathbb{R}^m \to \mathbb{R}^n$ of maximum rank. ($\cdot^{\mathrm{t}}$ denotes the transpose operator.)
Then the distribution of $X$ conditioned on $Y$ is multi-variate normal, with conditional means and covariances
of:
$$\mathrm{E}[X \mid Y] = \Sigma B \,(B^{\mathrm{t}} \Sigma B)^{-1}\, Y\,, \qquad \operatorname{Var}[X \mid Y] = \Sigma - \Sigma B \,(B^{\mathrm{t}} \Sigma B)^{-1} B^{\mathrm{t}} \Sigma\,.$$
If $m = 1$, so that $B = b$ is simply a vector in $\mathbb{R}^n$, these formulas reduce to:
$$\mathrm{E}[X \mid Y] = \frac{\Sigma b}{b^{\mathrm{t}} \Sigma b}\, Y\,, \qquad \operatorname{Var}[X \mid Y] = \Sigma - \frac{\Sigma b\, b^{\mathrm{t}} \Sigma}{b^{\mathrm{t}} \Sigma b}\,.$$
If $X$ does not have zero mean, then the formula for $\mathrm{E}[X \mid Y]$ is modified by adding $\mathrm{E}[X]$ and replacing $Y$ by $Y - \mathrm{E}[Y]$, and the formula for $\operatorname{Var}[X \mid Y]$ is unchanged.
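As a numerical sanity check of the statement (a sketch, not part of the proof), one can sample from a zero-mean normal and compare the stated formulas against the empirical linear regression of $X$ on $Y$ and its residual covariance. The matrices `Sigma` and `B` below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 3, 2, 200_000

# Arbitrary illustrative choices: a non-singular covariance Sigma (n x n)
# and a maximum-rank transformation B (n x m).
A0 = rng.standard_normal((n, n))
Sigma = A0 @ A0.T + n * np.eye(n)          # positive definite
B = rng.standard_normal((n, m))

# Sample X ~ N(0, Sigma) and form Y = B^t X.
X = rng.multivariate_normal(np.zeros(n), Sigma, size=N)   # N x n
Y = X @ B                                                  # N x m

# Theorem: E[X|Y] = Sigma B (B^t Sigma B)^{-1} Y.
K = Sigma @ B @ np.linalg.inv(B.T @ Sigma @ B)             # n x m

# For jointly Gaussian variables the conditional mean is the linear
# regression of X on Y, so least squares should recover K.
K_hat = np.linalg.lstsq(Y, X, rcond=None)[0].T             # n x m

# Theorem: Var[X|Y] = Sigma - Sigma B (B^t Sigma B)^{-1} B^t Sigma,
# which should match the empirical covariance of the residual X - K Y.
V = Sigma - K @ B.T @ Sigma
V_hat = np.cov((X - Y @ K.T).T)                            # n x n
```

With 200,000 samples the empirical regression matrix and residual covariance agree with the closed-form expressions to within sampling error.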
Proof.
We split $X$ up into two stochastically independent parts, the first part containing exactly the information embodied in $Y$. Then the conditional distribution of $X$ given $Y$ is simply the unconditional distribution of the second part, which is independent of $Y$.
To this end, we first change variables to express everything in terms of a standard multi-variate normal. Let $\Sigma = A A^{\mathrm{t}}$ be a “square root” factorization of the covariance matrix $\Sigma$, so that:
$$Z = A^{-1} X \text{ is standard multi-variate normal, with mean zero and } \operatorname{Var}[Z] = I\,.$$
We let $P \colon \mathbb{R}^n \to \mathbb{R}^n$ be the orthogonal projection onto the range of $A^{\mathrm{t}} B$, and decompose $Z$ into orthogonal
components:
$$Z = PZ + (I - P)Z\,.$$
It is intuitively obvious that orthogonality of the two random normal vectors $PZ$ and $(I-P)Z$ implies their stochastic independence. To show this formally, observe that the Gaussian density function for $Z$ factors into a product:
$$(2\pi)^{-n/2}\, e^{-\lVert z \rVert^2 / 2} = (2\pi)^{-n/2}\, e^{-\lVert Pz \rVert^2 / 2}\, e^{-\lVert (I-P)z \rVert^2 / 2}\,,$$
since $\lVert z \rVert^2 = \lVert Pz \rVert^2 + \lVert (I-P)z \rVert^2$ by the Pythagorean theorem.
We can construct an orthonormal system of coordinates on $\mathbb{R}^n$ under which the components for $PZ$ are completely disjoint from those components of $(I-P)Z$. On the other hand, the densities for $Z$, $PZ$, and $(I-P)Z$ remain invariant even after changing coordinates, because they are radially symmetric. Hence the variables $PZ$ and $(I-P)Z$ are separable in their joint density, and they are independent.
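The orthogonality behind this independence can be checked numerically (a sketch; the full-rank matrix `T` below is an arbitrary stand-in for $A^{\mathrm{t}} B$). Since $\operatorname{Var}[Z] = I$, the cross-covariance of $PZ$ and $(I-P)Z$ is $P(I-P)^{\mathrm{t}} = P - P^2 = 0$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 2

# Arbitrary injective stand-in for A^t B (n x m, full rank almost surely).
T = rng.standard_normal((n, m))

# Orthogonal projection onto range(T), built from an orthonormal basis.
Q, _ = np.linalg.qr(T)      # columns of Q span range(T)
P = Q @ Q.T
I = np.eye(n)

# P is an orthogonal projection: idempotent and symmetric.
idempotent = np.allclose(P @ P, P)
symmetric = np.allclose(P, P.T)

# Cross-covariance of PZ and (I-P)Z when Var[Z] = I:
# Cov(PZ, (I-P)Z) = P (I-P)^t, which vanishes.
cross_cov = P @ (I - P)
```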
$PZ$ embodies the information in the linear combination $Y$. For we have the identity:
$$Y = B^{\mathrm{t}} X = B^{\mathrm{t}} A Z = (A^{\mathrm{t}} B)^{\mathrm{t}}\, PZ + (A^{\mathrm{t}} B)^{\mathrm{t}}\, (I-P)Z\,.$$
The last term is null because $(I-P)Z$ is orthogonal to the range of $A^{\mathrm{t}} B$ by definition. (Equivalently, $(I-P)Z$ lies in the kernel of $(A^{\mathrm{t}} B)^{\mathrm{t}}$.) Thus $Y$ can always be recovered by a linear transformation on $PZ$.
Conversely, $Y$ completely determines $PZ$, from the analytical expression for $P$ that we now give. In general, the orthogonal projection onto the range of an injective
transformation $T$ is $T\,(T^{\mathrm{t}} T)^{-1}\, T^{\mathrm{t}}$. Applying this to $T = A^{\mathrm{t}} B$, we have
$$P = A^{\mathrm{t}} B\, (B^{\mathrm{t}} \Sigma B)^{-1}\, (A^{\mathrm{t}} B)^{\mathrm{t}}\,,$$
since $(A^{\mathrm{t}} B)^{\mathrm{t}} (A^{\mathrm{t}} B) = B^{\mathrm{t}} A A^{\mathrm{t}} B = B^{\mathrm{t}} \Sigma B$. We see that $PZ = A^{\mathrm{t}} B\, (B^{\mathrm{t}} \Sigma B)^{-1}\, (A^{\mathrm{t}} B)^{\mathrm{t}} Z = A^{\mathrm{t}} B\, (B^{\mathrm{t}} \Sigma B)^{-1}\, Y$.
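Both directions of this equivalence can be verified for a random instance (a sketch; `A`, `B`, and the sample `z` are arbitrary illustrative choices): $Y$ is recovered from $PZ$ by applying $(A^{\mathrm{t}} B)^{\mathrm{t}}$, and $PZ$ is recovered from $Y$ via the formula just derived.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 2

# Arbitrary illustrative choices: an invertible "square root" A,
# a full-rank B, and one realization z of Z.
A = rng.standard_normal((n, n)) + n * np.eye(n)
B = rng.standard_normal((n, m))
Sigma = A @ A.T
z = rng.standard_normal(n)

T = A.T @ B                              # so that Y = T^t Z
P = T @ np.linalg.inv(T.T @ T) @ T.T     # projection onto range(T)

y = T.T @ z

# Y is recoverable from PZ:  T^t (Pz) = T^t z = y.
y_from_Pz = T.T @ (P @ z)

# PZ is recoverable from Y:  Pz = A^t B (B^t Sigma B)^{-1} y.
Pz_from_y = A.T @ B @ np.linalg.inv(B.T @ Sigma @ B) @ y
```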
We have proved that conditioning on $Y$ and on $PZ$ are equivalent, and so:
$$\mathrm{E}[Z \mid Y] = \mathrm{E}\bigl[ PZ + (I-P)Z \mid PZ \bigr] = PZ + \mathrm{E}\bigl[ (I-P)Z \bigr] = PZ$$
and
$$\operatorname{Var}[Z \mid Y] = \operatorname{Var}\bigl[ (I-P)Z \mid PZ \bigr] = \operatorname{Var}\bigl[ (I-P)Z \bigr] = (I-P)\,(I-P)^{\mathrm{t}} = I - P\,,$$
using the defining property $P = P^2 = P^{\mathrm{t}}$ of orthogonal projections.
Now we express the result in terms of $X = AZ$, and remove the dependence on the transformation $A$ (which is not uniquely defined from the covariance matrix $\Sigma$):
$$\mathrm{E}[X \mid Y] = A\, \mathrm{E}[Z \mid Y] = A A^{\mathrm{t}} B\, (B^{\mathrm{t}} \Sigma B)^{-1}\, Y = \Sigma B\, (B^{\mathrm{t}} \Sigma B)^{-1}\, Y$$
and
$$\operatorname{Var}[X \mid Y] = A \operatorname{Var}[Z \mid Y]\, A^{\mathrm{t}} = A\,(I - P)\,A^{\mathrm{t}} = \Sigma - \Sigma B\, (B^{\mathrm{t}} \Sigma B)^{-1} B^{\mathrm{t}} \Sigma\,.$$
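That the final expressions really do not depend on which square root of $\Sigma$ is chosen can be checked by evaluating $A (I-P) A^{\mathrm{t}}$ for two different factorizations $\Sigma = A A^{\mathrm{t}}$ and comparing against the closed form (a sketch; all matrices are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 2

S0 = rng.standard_normal((n, n))
Sigma = S0 @ S0.T + n * np.eye(n)        # positive definite covariance
B = rng.standard_normal((n, m))

# Two different square roots of Sigma: a Cholesky factor, and the same
# factor times an orthogonal matrix (Q Q^t = I preserves A A^t = Sigma).
A1 = np.linalg.cholesky(Sigma)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A2 = A1 @ Q

def cond_var(A):
    """A (I - P) A^t, with P the orthogonal projection onto range(A^t B)."""
    T = A.T @ B
    P = T @ np.linalg.inv(T.T @ T) @ T.T
    return A @ (np.eye(n) - P) @ A.T

closed_form = Sigma - Sigma @ B @ np.linalg.inv(B.T @ Sigma @ B) @ B.T @ Sigma
```

Both `cond_var(A1)` and `cond_var(A2)` reproduce `closed_form`, confirming the $A$-independence of $\operatorname{Var}[X \mid Y]$.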
Of course, the conditional distribution of $X$ given $Y$ is the same as that of $A(I-P)Z$ shifted by the conditional mean $\mathrm{E}[X \mid Y]$, which is multi-variate normal.
The formula in the statement of this theorem, for the single-dimensional case, follows from substituting $B = b$ in the general formulas, since $b^{\mathrm{t}} \Sigma b$ is a scalar. The formula for $\mathrm{E}[X \mid Y]$ when $X$ does not have zero mean follows from applying the base case to the shifted variable $X - \mathrm{E}[X]$. ∎