conditional distribution of multi-variate normal variable
Theorem.
Let $X$ be a random variable, taking values in $\mathbb{R}^n$, normally distributed with a non-singular covariance matrix $\Sigma$
and a mean of zero.
Suppose $Y$ is defined by $Y = B^{\mathrm{t}} X$ for some linear transformation $B \colon \mathbb{R}^m \to \mathbb{R}^n$ of maximum rank. ($\cdot^{\mathrm{t}}$ denotes the transpose operator.)
Then the distribution of $X$ conditioned on $Y$ is multi-variate normal, with conditional means and covariances
of:
$$\mathrm{E}[X \mid Y] = \Sigma B \,(B^{\mathrm{t}} \Sigma B)^{-1}\, Y\,, \qquad \operatorname{Var}[X \mid Y] = \Sigma - \Sigma B \,(B^{\mathrm{t}} \Sigma B)^{-1} B^{\mathrm{t}} \Sigma\,.$$
If $m = 1$, so that $B = b$ is simply a vector in $\mathbb{R}^n$, these formulas reduce to:
$$\mathrm{E}[X \mid Y] = \frac{\Sigma b}{b^{\mathrm{t}} \Sigma b}\, Y\,, \qquad \operatorname{Var}[X \mid Y] = \Sigma - \frac{\Sigma b\, b^{\mathrm{t}} \Sigma}{b^{\mathrm{t}} \Sigma b}\,.$$
If $X$ does not have zero mean, then the formula for $\mathrm{E}[X \mid Y]$ is modified by adding $\mathrm{E}[X]$ and replacing $Y$ by $Y - \mathrm{E}[Y]$, and the formula for $\operatorname{Var}[X \mid Y]$ is unchanged.
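As a numerical sanity check of the statement (a sketch, not part of the proof), one can sample from a zero-mean normal and compare the stated formulas against the empirical linear regression of $X$ on $Y$ and its residual covariance. The matrices `Sigma` and `B` below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 3, 2, 200_000

# Arbitrary illustrative choices: a non-singular covariance Sigma (n x n)
# and a maximum-rank transformation B (n x m).
A0 = rng.standard_normal((n, n))
Sigma = A0 @ A0.T + n * np.eye(n)          # positive definite
B = rng.standard_normal((n, m))

# Sample X ~ N(0, Sigma) and form Y = B^t X.
X = rng.multivariate_normal(np.zeros(n), Sigma, size=N)   # N x n
Y = X @ B                                                  # N x m

# Theorem: E[X|Y] = Sigma B (B^t Sigma B)^{-1} Y.
K = Sigma @ B @ np.linalg.inv(B.T @ Sigma @ B)             # n x m

# For jointly Gaussian variables the conditional mean is the linear
# regression of X on Y, so least squares should recover K.
K_hat = np.linalg.lstsq(Y, X, rcond=None)[0].T             # n x m

# Theorem: Var[X|Y] = Sigma - Sigma B (B^t Sigma B)^{-1} B^t Sigma,
# which should match the empirical covariance of the residual X - K Y.
V = Sigma - K @ B.T @ Sigma
V_hat = np.cov((X - Y @ K.T).T)                            # n x n
```

With 200,000 samples the empirical regression matrix and residual covariance agree with the closed-form expressions to within sampling error.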
Proof.
We split $X$ up into two stochastically independent parts, the first part containing exactly the information embodied in $Y$. Then the conditional distribution of $X$ given $Y$ is simply the unconditional distribution of the second part, which is independent of $Y$.
To this end, we first change variables to express everything in terms of a standard multi-variate normal. Let $\Sigma = A A^{\mathrm{t}}$ be a “square root” factorization of the covariance matrix $\Sigma$, so that:
$$Z = A^{-1} X \text{ is standard multi-variate normal, with mean zero and } \operatorname{Var}[Z] = I\,.$$
We let $P \colon \mathbb{R}^n \to \mathbb{R}^n$ be the orthogonal projection onto the range of $A^{\mathrm{t}} B$, and decompose $Z$ into orthogonal
components:
$$Z = PZ + (I - P)Z\,.$$
It is intuitively obvious that orthogonality of the two random normal vectors $PZ$ and $(I-P)Z$ implies their stochastic independence. To show this formally, observe that the Gaussian density function for $Z$ factors into a product:
$$(2\pi)^{-n/2}\, e^{-\lVert z \rVert^2 / 2} = (2\pi)^{-n/2}\, e^{-\lVert Pz \rVert^2 / 2}\, e^{-\lVert (I-P)z \rVert^2 / 2}\,,$$
since $\lVert z \rVert^2 = \lVert Pz \rVert^2 + \lVert (I-P)z \rVert^2$ by the Pythagorean theorem.
We can construct an orthonormal system of coordinates on $\mathbb{R}^n$ under which the components for $PZ$ are completely disjoint from those components of $(I-P)Z$. On the other hand, the densities for $Z$, $PZ$, and $(I-P)Z$ remain invariant even after changing coordinates, because they are radially symmetric. Hence the variables $PZ$ and $(I-P)Z$ are separable in their joint density, and they are independent.
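The orthogonality behind this independence can be checked numerically (a sketch; the full-rank matrix `T` below is an arbitrary stand-in for $A^{\mathrm{t}} B$). Since $\operatorname{Var}[Z] = I$, the cross-covariance of $PZ$ and $(I-P)Z$ is $P(I-P)^{\mathrm{t}} = P - P^2 = 0$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 2

# Arbitrary injective stand-in for A^t B (n x m, full rank almost surely).
T = rng.standard_normal((n, m))

# Orthogonal projection onto range(T), built from an orthonormal basis.
Q, _ = np.linalg.qr(T)      # columns of Q span range(T)
P = Q @ Q.T
I = np.eye(n)

# P is an orthogonal projection: idempotent and symmetric.
idempotent = np.allclose(P @ P, P)
symmetric = np.allclose(P, P.T)

# Cross-covariance of PZ and (I-P)Z when Var[Z] = I:
# Cov(PZ, (I-P)Z) = P (I-P)^t, which vanishes.
cross_cov = P @ (I - P)
```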
$PZ$ embodies the information in the linear combination $Y$. For we have the identity:
$$Y = B^{\mathrm{t}} X = B^{\mathrm{t}} A Z = (A^{\mathrm{t}} B)^{\mathrm{t}}\, PZ + (A^{\mathrm{t}} B)^{\mathrm{t}}\, (I-P)Z\,.$$
The last term is null because $(I-P)Z$ is orthogonal to the range of $A^{\mathrm{t}} B$ by definition. (Equivalently, $(I-P)Z$ lies in the kernel of $(A^{\mathrm{t}} B)^{\mathrm{t}}$.) Thus $Y$ can always be recovered by a linear transformation on $PZ$.
Conversely, $Y$ completely determines $PZ$, from the analytical expression for $P$ that we now give. In general, the orthogonal projection onto the range of an injective
transformation $T$ is $T\,(T^{\mathrm{t}} T)^{-1}\, T^{\mathrm{t}}$. Applying this to $T = A^{\mathrm{t}} B$, we have
$$P = A^{\mathrm{t}} B\, (B^{\mathrm{t}} \Sigma B)^{-1}\, (A^{\mathrm{t}} B)^{\mathrm{t}}\,,$$
since $(A^{\mathrm{t}} B)^{\mathrm{t}} (A^{\mathrm{t}} B) = B^{\mathrm{t}} A A^{\mathrm{t}} B = B^{\mathrm{t}} \Sigma B$. We see that $PZ = A^{\mathrm{t}} B\, (B^{\mathrm{t}} \Sigma B)^{-1}\, (A^{\mathrm{t}} B)^{\mathrm{t}} Z = A^{\mathrm{t}} B\, (B^{\mathrm{t}} \Sigma B)^{-1}\, Y$.
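Both directions of this equivalence can be verified for a random instance (a sketch; `A`, `B`, and the sample `z` are arbitrary illustrative choices): $Y$ is recovered from $PZ$ by applying $(A^{\mathrm{t}} B)^{\mathrm{t}}$, and $PZ$ is recovered from $Y$ via the formula just derived.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 2

# Arbitrary illustrative choices: an invertible "square root" A,
# a full-rank B, and one realization z of Z.
A = rng.standard_normal((n, n)) + n * np.eye(n)
B = rng.standard_normal((n, m))
Sigma = A @ A.T
z = rng.standard_normal(n)

T = A.T @ B                              # so that Y = T^t Z
P = T @ np.linalg.inv(T.T @ T) @ T.T     # projection onto range(T)

y = T.T @ z

# Y is recoverable from PZ:  T^t (Pz) = T^t z = y.
y_from_Pz = T.T @ (P @ z)

# PZ is recoverable from Y:  Pz = A^t B (B^t Sigma B)^{-1} y.
Pz_from_y = A.T @ B @ np.linalg.inv(B.T @ Sigma @ B) @ y
```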
We have proved that conditioning on $Y$ and on $PZ$ are equivalent, and so:
$$\mathrm{E}[Z \mid Y] = \mathrm{E}\bigl[ PZ + (I-P)Z \mid PZ \bigr] = PZ + \mathrm{E}\bigl[ (I-P)Z \bigr] = PZ$$
and
$$\operatorname{Var}[Z \mid Y] = \operatorname{Var}\bigl[ (I-P)Z \mid PZ \bigr] = \operatorname{Var}\bigl[ (I-P)Z \bigr] = (I-P)\,(I-P)^{\mathrm{t}} = I - P\,,$$
using the defining property $P = P^2 = P^{\mathrm{t}}$ of orthogonal projections.
Now we express the result in terms of $X = AZ$, and remove the dependence on the transformation $A$ (which is not uniquely defined from the covariance matrix $\Sigma$):
$$\mathrm{E}[X \mid Y] = A\, \mathrm{E}[Z \mid Y] = A A^{\mathrm{t}} B\, (B^{\mathrm{t}} \Sigma B)^{-1}\, Y = \Sigma B\, (B^{\mathrm{t}} \Sigma B)^{-1}\, Y$$
and
$$\operatorname{Var}[X \mid Y] = A \operatorname{Var}[Z \mid Y]\, A^{\mathrm{t}} = A\,(I - P)\,A^{\mathrm{t}} = \Sigma - \Sigma B\, (B^{\mathrm{t}} \Sigma B)^{-1} B^{\mathrm{t}} \Sigma\,.$$
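That the final expressions really do not depend on which square root of $\Sigma$ is chosen can be checked by evaluating $A (I-P) A^{\mathrm{t}}$ for two different factorizations $\Sigma = A A^{\mathrm{t}}$ and comparing against the closed form (a sketch; all matrices are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 2

S0 = rng.standard_normal((n, n))
Sigma = S0 @ S0.T + n * np.eye(n)        # positive definite covariance
B = rng.standard_normal((n, m))

# Two different square roots of Sigma: a Cholesky factor, and the same
# factor times an orthogonal matrix (Q Q^t = I preserves A A^t = Sigma).
A1 = np.linalg.cholesky(Sigma)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A2 = A1 @ Q

def cond_var(A):
    """A (I - P) A^t, with P the orthogonal projection onto range(A^t B)."""
    T = A.T @ B
    P = T @ np.linalg.inv(T.T @ T) @ T.T
    return A @ (np.eye(n) - P) @ A.T

closed_form = Sigma - Sigma @ B @ np.linalg.inv(B.T @ Sigma @ B) @ B.T @ Sigma
```

Both `cond_var(A1)` and `cond_var(A2)` reproduce `closed_form`, confirming the $A$-independence of $\operatorname{Var}[X \mid Y]$.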
Of course, the conditional distribution of $X$ given $Y$ is the same as that of $A(I-P)Z$ shifted by the conditional mean $\mathrm{E}[X \mid Y]$, which is multi-variate normal.
The formula in the statement of this theorem, for the single-dimensional case, follows from substituting $B = b$ in the general formulas, since $b^{\mathrm{t}} \Sigma b$ is a scalar. The formula for $\mathrm{E}[X \mid Y]$ when $X$ does not have zero mean follows from applying the base case to the shifted variable $X - \mathrm{E}[X]$. ∎