Multivariate Normal Distributions

Published

November 10, 2025

In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions.

1) Generalized Case

A multivariate normal vector partitioned into two blocks \(\boldsymbol{a}\) and \(\boldsymbol{b}\) follows the distribution:

\[ \begin{bmatrix} \boldsymbol{a}\\ \boldsymbol{b} \end{bmatrix} \sim MVN( \boldsymbol{\mu}, \boldsymbol{\Sigma} ), \text{ where } \boldsymbol{\mu} = \begin{bmatrix} \boldsymbol{\mu_a}\\ \boldsymbol{\mu_b} \end{bmatrix} \text{ and } \boldsymbol{\Sigma} = \begin{bmatrix} \boldsymbol{\Sigma_{aa}} & \boldsymbol{\Sigma_{ab}}\\ \boldsymbol{\Sigma_{ba}} & \boldsymbol{\Sigma_{bb}} \end{bmatrix} \]

One interesting property of this kind of distribution is that it allows us to compute the mean and the variance of the conditional distributions \(b \mid a\) (or \(a \mid b\)) as follows:

\[ \mathbf{b}\mid\mathbf{a}\ \sim\ N\left( \underbrace{\boldsymbol{\mu}_b + \boldsymbol{\Sigma}_{ba}\,\boldsymbol{\Sigma}_{aa}^{-1}\,(\mathbf{a}-\boldsymbol{\mu}_a)}_{\,\mathbb{E}(\mathbf{b}\mid\mathbf{a})\,}, \underbrace{\boldsymbol{\Sigma}_{bb} - \boldsymbol{\Sigma}_{ba}\,\boldsymbol{\Sigma}_{aa}^{-1}\,\boldsymbol{\Sigma}_{ab}}_{\operatorname{Var}(\mathbf{b}\mid\mathbf{a})} \right). \]

Therefore, focusing solely on the conditional mean and variance:

\[ E(b \mid a) = \mu_b + \Sigma_{ba}(\Sigma_{aa})^{-1}(a - \mu_a) \]

\[ \mathrm{var}(b \mid a) = \Sigma_{bb} - \Sigma_{ba} (\Sigma_{aa})^{-1}\Sigma_{ab} \]
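As a quick numerical sanity check, we can simulate a bivariate normal and compare the closed-form conditional moments against the empirical mean and variance of \(b\), restricted to draws where \(a\) falls near a fixed value. A minimal sketch with NumPy; the mean vector, covariance matrix, and conditioning value below are arbitrary toy choices:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy partitioned Gaussian (arbitrary illustrative values)
mu = np.array([1.0, 2.0])            # [mu_a, mu_b]
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])       # [[S_aa, S_ab], [S_ba, S_bb]]

# Closed-form conditional moments of b | a = a0
a0 = 1.5
cond_mean = mu[1] + Sigma[1, 0] / Sigma[0, 0] * (a0 - mu[0])   # = 2.30
cond_var = Sigma[1, 1] - Sigma[1, 0] / Sigma[0, 0] * Sigma[0, 1]  # = 1.64

# Monte Carlo check: keep draws whose a-coordinate falls near a0
draws = rng.multivariate_normal(mu, Sigma, size=1_000_000)
near = draws[np.abs(draws[:, 0] - a0) < 0.01, 1]
print(cond_mean, near.mean())   # closed form vs. Monte Carlo
print(cond_var, near.var())     # closed form vs. Monte Carlo
```

The empirical moments of the retained slice should agree with the formulas up to Monte Carlo noise.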

2) Implications in Quantitative Genetics

1) Conditional \(\mathbf{E[u \mid y]}\)

Let’s illustrate this with our model where we fit one variable (e.g., block) as fixed and another one (e.g., genotype) as random. Therefore:

\[ y = X\beta + Zu + \epsilon, \text{ where } u \sim N(0,\underbrace{I\sigma^2_u}_{G}) \text{ and } \epsilon\sim N(0, \underbrace{I\sigma^2_e}_{R}) \]

Therefore, we can write the joint multivariate normal distribution of \(y\) and \(u\) as follows, where \(V = \mathrm{var}(y) = ZGZ' + R\):

\[ \begin{bmatrix} y\\ u \end{bmatrix} \sim MVN\!\left( \begin{bmatrix} X\beta\\ 0 \end{bmatrix}, \begin{bmatrix} V & ZG \\ GZ' & G \end{bmatrix} \right) \]

Following the previous equations, the expected value of \(u\) is

\[ \begin{align*} E[u \mid y] &= \mu_u + \Sigma_{uy}(\Sigma_{yy})^{-1}(y - \mu_y) \\ &= GZ'V^{-1}(y - X\beta) \end{align*} \]

Given that we don’t have the true \(\beta\) but only its (generalized least squares) estimate \(\hat \beta = (X'V^{-1}X)^{-1}X'V^{-1}y\), the expression for the predictor becomes

\[ \begin{align*} E[\hat u \mid y] &= GZ'V^{-1}(y - X\hat{\beta}) \\ &= GZ'V^{-1}\big(y - X(X'V^{-1}X)^{-1}X'V^{-1}y\big) \\ &= GZ'V^{-1}\big(I - X(X'V^{-1}X)^{-1}X'V^{-1}\big)y \\ &= GZ'\big(V^{-1} - V^{-1}X(X'V^{-1}X)^{-1}X'V^{-1}\big)y \\ &= GZ'Py \\ & \text{where } P = V^{-1} - V^{-1}X(X'V^{-1}X)^{-1}X'V^{-1} \end{align*} \]

So, briefly

\[ BLUPs = \hat u = GZ'Py \]
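To see this formula in action, we can simulate the toy model above (block fixed, genotype random) and confirm that \(GZ'Py\) agrees with the more familiar route \(GZ'V^{-1}(y - X\hat\beta)\). The design sizes and variance components below are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy design: 2 blocks (fixed), 4 genotypes (random), 3 reps each
n_g, n_rep = 4, 3
n = n_g * n_rep
X = np.column_stack([np.ones(n), np.tile([0.0, 1.0], n // 2)])  # intercept + block
Z = np.kron(np.eye(n_g), np.ones((n_rep, 1)))                   # genotype incidence
G = 2.0 * np.eye(n_g)                    # var(u) = I * sigma2_u
R = 1.0 * np.eye(n)                      # var(e) = I * sigma2_e
V = Z @ G @ Z.T + R                      # var(y)

# Simulate one data set from y = X beta + Z u + e
y = X @ np.array([10.0, 1.0]) + Z @ rng.normal(0, np.sqrt(2.0), n_g) \
    + rng.normal(0, 1.0, n)

Vi = np.linalg.inv(V)
beta_hat = np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ y)          # GLS estimate
P = Vi - Vi @ X @ np.linalg.inv(X.T @ Vi @ X) @ X.T @ Vi

blup_P = G @ Z.T @ P @ y                      # u_hat = G Z' P y
blup_V = G @ Z.T @ Vi @ (y - X @ beta_hat)    # u_hat = G Z' V^{-1} (y - X beta_hat)
print(np.allclose(blup_P, blup_V))            # True: both routes agree
```

Both expressions are the same predictor; \(P\) simply absorbs the substitution of \(\hat\beta\).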

This BLUP definition is very useful to derive some variances and covariances that we might need in future chapters.

1) Variance of \(\mathbf{\hat u}\)

\[ \begin{align*} \mathrm{var}(\hat u) &= \mathrm{var}(GZ'Py) \\ &= GZ'P\,\mathrm{var}(y)\,P'ZG \\ &= GZ'PVPZG \\ &= GZ'PZG \end{align*} \]

This is possible thanks to two of the many useful properties of \(P\): it is symmetric (\(P' = P\)) and it satisfies \(PVP = P\).
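These properties of \(P\) are easy to verify numerically. A small sketch with an arbitrary positive-definite \(V\) and a two-column \(X\):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 8
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one covariate
A = rng.normal(size=(n, n))
V = A @ A.T + n * np.eye(n)          # arbitrary positive-definite V

Vi = np.linalg.inv(V)
P = Vi - Vi @ X @ np.linalg.inv(X.T @ Vi @ X) @ X.T @ Vi

print(np.allclose(P @ V @ P, P))     # True: PVP = P
print(np.allclose(P, P.T))           # True: P is symmetric
print(np.allclose(P @ X, 0))         # True: P annihilates the fixed-effect columns
```

The last check (\(PX = 0\)) is why \(X\beta\) drops out of every expression involving \(Py\).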

2) \(\mathbf{Cov(\hat{u}, u)}\)

\[ \begin{align*} \mathrm{cov}(\hat u, u) &= \mathrm{cov}(GZ'Py, u) \\ &= GZ'P\,\mathrm{cov}(y, u) \\ &= GZ'PZG \end{align*} \]

Interestingly

\[ \begin{align*} \mathrm{cov}(u, \hat u) &= \mathrm{cov}(\hat u, u)' \\ &= (GZ'PZG)' \\ &= GZ'PZG \end{align*} \]

Then \(\mathrm{cov}(u, \hat u) = \mathrm{cov}(\hat u, u)\).

3) \(\mathbf{Var(u - \hat{u})}\)

Following the properties of variances, and using \(E(u - \hat{u}) = 0\):

\[ \begin{align*} \mathrm{var}(u - \hat{u}) &= E\!\left[(u - \hat{u} - E(u - \hat{u}))(u - \hat{u} - E(u - \hat{u}))'\right] \\ &= E\!\left[(u - \hat{u})(u - \hat{u})'\right] \\ &= E(uu') - E(u\hat{u}') - E(\hat{u}u') + E(\hat{u}\hat{u}') \\ &= \mathrm{var}(u) - \mathrm{cov}(u, \hat{u}) - \mathrm{cov}(\hat{u}, u) + \mathrm{var}(\hat{u}) \end{align*} \]

Then, using the previously computed variances and covariances we can solve for

\[ \begin{align*} \operatorname{var}(u - \hat{u}) &= G - GZ'PZG - GZ'PZG + GZ'PZG \\ &= G - GZ'PZG \end{align*} \]

Remember that this difference is captured in the so-called Prediction Error Variance (PEV) matrix, so

\[ C_{22} = PEV = G - GZ'PZG \]

This is helpful since now we can prove that the marginal variance \(\mathrm{var}(\hat{u})\) equals

\[ \begin{align*} \operatorname{var}(\hat{u}) &= G - C_{22} \\ &= G - (G - GZ'PZG) \\ &= GZ'PZG \end{align*} \]
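We can check \(\operatorname{var}(\hat u) = GZ'PZG\) and the PEV identity by Monte Carlo: simulate many \((u, y)\) pairs from the toy model, predict \(\hat u = GZ'Py\) for each, and compare the empirical covariance matrices against the theoretical ones. Design and variance components are again arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy design: 2 blocks (fixed), 4 genotypes (random), 3 reps each
n_g, n_rep = 4, 3
n = n_g * n_rep
X = np.column_stack([np.ones(n), np.tile([0.0, 1.0], n // 2)])
Z = np.kron(np.eye(n_g), np.ones((n_rep, 1)))
G = 2.0 * np.eye(n_g)                              # var(u)
V = Z @ G @ Z.T + np.eye(n)                        # ZGZ' + R, with R = I
Vi = np.linalg.inv(V)
P = Vi - Vi @ X @ np.linalg.inv(X.T @ Vi @ X) @ X.T @ Vi

# Simulate many (u, y) pairs and predict u_hat = G Z' P y for each
n_sim = 50_000
U = rng.multivariate_normal(np.zeros(n_g), G, size=n_sim)
E = rng.normal(0.0, 1.0, size=(n_sim, n))
Y = X @ np.array([10.0, 1.0]) + U @ Z.T + E
U_hat = Y @ P @ Z @ G                              # P is symmetric, so P' = P

theo = G @ Z.T @ P @ Z @ G                         # GZ'PZG
print(np.abs(np.cov(U_hat.T) - theo).max())                  # var(u_hat): small
print(np.abs(np.cov((U - U_hat).T) - (G - theo)).max())      # PEV = G - GZ'PZG: small
```

Both empirical matrices should match their theoretical counterparts up to Monte Carlo noise.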

2) Conditional \(\mathbf{u \mid \hat{u}}\)

Starting from the joint distribution

\[ \begin{bmatrix} \hat{u} \\[4pt] u \end{bmatrix} \sim \text{MVN}\!\left( \begin{bmatrix} 0 \\[4pt] 0 \end{bmatrix}, \begin{bmatrix} GZ'PZG & GZ'PZG \\[4pt] GZ'PZG & G \end{bmatrix} \right) \]

We can derive \(\mathrm{var}(u \mid \hat{u})\) as

\[ \begin{align*} \mathrm{var}(u \mid \hat{u}) &= \Sigma_{uu} - \Sigma_{u\hat{u}} (\Sigma_{\hat{u}\hat{u}})^{-1}\Sigma_{\hat{u}u} \\ &= G - GZ'PZG(GZ'PZG)^{-1}GZ'PZG \\ &= G - GZ'PZG \end{align*} \]

We have thus proved that

\[ \mathrm{var}(u \mid \hat{u}) = \operatorname{var}(u - \hat{u}) \]
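This identity can also be checked numerically. One caveat for the sketch below: with an intercept in \(X\), \(Z'PZ\) is singular (since \(P\) annihilates the columns of \(X\)), so the code uses a Moore-Penrose pseudo-inverse in place of \((GZ'PZG)^{-1}\); the identity \(S S^{+} S = S\) still delivers the result. Same arbitrary toy design as before:

```python
import numpy as np

# Toy design as in the model above (assumed for illustration)
n_g, n_rep = 4, 3
n = n_g * n_rep
X = np.column_stack([np.ones(n), np.tile([0.0, 1.0], n // 2)])
Z = np.kron(np.eye(n_g), np.ones((n_rep, 1)))
G = 2.0 * np.eye(n_g)
V = Z @ G @ Z.T + np.eye(n)
Vi = np.linalg.inv(V)
P = Vi - Vi @ X @ np.linalg.inv(X.T @ Vi @ X) @ X.T @ Vi

S = G @ Z.T @ P @ Z @ G                 # var(u_hat) = cov(u_hat, u)
# Z'PZ is singular here, so use the pseudo-inverse; S pinv(S) S = S regardless
var_cond = G - S @ np.linalg.pinv(S) @ S   # var(u | u_hat)
var_diff = G - S                           # var(u - u_hat)
print(np.allclose(var_cond, var_diff))     # True
```

The conditional variance and the prediction error variance coincide, as derived above.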

Besides that, solving for the mean of the conditional distribution, we get

\[ \begin{align*} E[u \mid \hat{u}] &= \mu_u + \Sigma_{u\hat{u}}(\Sigma_{\hat{u}\hat{u}})^{-1}(\hat{u} - \mu_\hat{u}) \\ &= GZ'PZG(GZ'PZG)^{-1}(\hat{u}) \\ &= \hat{u} \end{align*} \]

So that

\[ E[u \mid \hat{u}] = E[u \mid y] = \hat{u} = BLUP \]