## 9.4 Multivariate Normal Distribution

In this section, we introduce how to work with multivariate normal distribution in R.

First, let’s review the definition of a multivariate normal distribution. Suppose a $$p$$-dimensional random vector $${\bf x} \sim N({\bf \mu}, {\bf \Sigma})$$, where $${\bf \mu}$$ is the mean vector and $${\bf \Sigma}$$ is the covariance matrix. We have $f({\bf x}) = \frac{1}{(2\pi)^{p/2}\sqrt{|{\bf \Sigma}|}}\exp\left[-\frac{1}{2}({\bf x}-{\bf\mu})^T{\bf \Sigma}^{-1}({\bf x}-{\bf\mu})\right].$

### 9.4.1 Generate Multivate Normal Distribution

To randomly generate $${\bf x} \sim N({\bf \mu}, {\bf \Sigma})$$, you can use the mvrnorm(n = 1, mu, Sigma) function in the MASS package, which is preloaded in the base R. The function mvnorm() takes three arguments.

• n: targeted sample size
• mu: mean vector
• Sigma: covariance matrix

Let’s take a look at an example where the distribution is $$\bf\mu = (1, 2)^T$$ and ${\bf \Sigma} = \left[\begin{matrix} 1&0.5\\ 0.5&1 \end{matrix}\right]$

library(MASS)
#>
#> Attaching package: 'MASS'
#> The following object is masked from 'package:dplyr':
#>
#>     select
n <- 10
mu <- c(1, 2)
Sigma <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
x <- mvrnorm(n, mu, Sigma)
x
#>            [,1]     [,2]
#>  [1,] 0.5317478 1.591908
#>  [2,] 2.4575530 2.868658
#>  [3,] 0.5484207 2.079917
#>  [4,] 1.6036965 1.085302
#>  [5,] 0.7599533 2.066511
#>  [6,] 2.3854120 1.848962
#>  [7,] 1.0867794 1.785803
#>  [8,] 1.2315478 1.703268
#>  [9,] 0.7357121 1.083617
#> [10,] 0.7476221 1.690725

x contains $$n=10$$ two-dimensional normal distributed variable, where each row represents one observation.

### 9.4.2 Sample Mean and Sample Covariance Matrix

In practical applications, we usually have a set of observations and would like to get a sample estimate of the parameters $${\bf \mu}$$ and $${\bf \Sigma}$$.

For the $${\bf \mu}$$, you can directly use the sample mean as the estimate which is unbiased. $\hat {\bf\mu} = [\hat\mu_j],$ where $\hat\mu_j = \frac{1}{n}\sum_{i=1}^nx_{ij}.$

mu_hat <- apply(x, 2, mean)
mu_hat
#>  1.208844 1.780467

To estimate $${\bf\Sigma}$$, we will use the sample covariance matrix estimate: $\hat{\bf \Sigma} = [\hat\Sigma_{jk}],$ where $\hat\Sigma_{jk} = \frac{1}{n-1}\sum_{i=1}^n(x_{ij}-\hat\mu_j)(x_{ik}-\hat\mu_k).$

To compute this matrix, we can use the function cov() on the matrix x.

sigma_hat <- cov(x)
sigma_hat
#>           [,1]      [,2]
#> [1,] 0.5164012 0.1483479
#> [2,] 0.1483479 0.2643104
1/(n-1)*sum((x[,1]-mu_hat)*(x[,2]-mu_hat)) #compute \hat\Sigma_{12} by formula
#>  0.1483479

For cov() function, in addition to computing the sample covariance matrix, you can also use it to compute the covariance among two vectors $$y$$ and $$z$$. $cov(y, z) = \frac{1}{n-1}\sum_{i=1}^n(y_{i}-\bar y)(z_i-\bar z).$

Let’s try to compute the covariance between the first and second column of $$x$$.

cov(x[,1], x[,2])
#>  0.1483479
cov(x)[1, 2]
#>  0.1483479

You can see that this is exactly the same as $$\hat\Sigma_{12}$$.

### 9.4.3 Sample Correlation Matrix

Another useful measure is the sample correlation matrix. It is defined as $\widehat{corr}(x) = [\widehat{corr}(x)_{jk}],$ where $\widehat{corr}(x)_{jk} = \frac{\hat\Sigma_{jk}} {\sqrt{\hat\Sigma_{jj}\hat\Sigma_{kk}}}.$

You can use cor() on the matrix x to get its sample correlation matrix.

cor(x)
#>           [,1]      [,2]
#> [1,] 1.0000000 0.4015417
#> [2,] 0.4015417 1.0000000
sigma_hat[1,2] / sqrt(sigma_hat[1, 1] * sigma_hat[2, 2]) #verify the [1,2] element
#>  0.4015417

Similar to the cov() function, you can also us the cor() function to compute the sample correlation between two vectors.

cor(x[,1], x[,2])
#>  0.4015417
cor(x)[1, 2]
#>  0.4015417

You can see that the sample correlation matrix include all the pairwise sample correlations between any of the two columns.