Multivariate Statistical Analysis

created : 2021-09-24T03:38:16+00:00
modified : 2021-12-26T08:49:51+00:00

Chapter 0. Introduction

0.1 Visualization of Multivariate Data

head(USArrests)
library(psych) # Scatter Plot Matrix
pairs.panels(USArrests)

# Chernoff's Faces
library(aplpack)
faces(USArrests, face.type=1, cex=0.5)

# Star plot
stars(USArrests)

# 3-D scatter plot
library(scatterplot3d)
scatterplot3d(USArrests[, -1], type="h", highlight.3d=TRUE, angle=55, scale.y=0.7, pch=16, main="USArrests")

# 3-D rotated plot
library(rgl)
plot3d(USArrests[,-3])

# Profile plot
library(MASS)
parcoord(USArrests, col=c(1+(1:50)), var.label=T)

# Growth curves for longitudinal data
library(nlme)
head(Orthodont)
library(ggplot2)
p <- ggplot(data = Orthodont, aes(x = age, y = distance, group = Subject, colour=Subject))
p + geom_line()

p + geom_line() + facet_grid(. ~ Sex)

Summary of Introduction

Chapter 1. Linear algebra

1.1 Scalars, vectors, matrices

1.2 Operations of matrices

1.3 Trace and determinant for square matrices

1.4 Rank of a matrix

1.5 Inverse matrix

1.6 Partitioned matrices

Example 1.6.2

1.7 Positive definite matrix

1.8 Orthogonal vectors and matrices

1.9 Eigenvalues and eigenvectors

1.10 Spectral decomposition

1.11 Cauchy-Schwarz inequality

1.12 Differentiation in Vectors and Matrices

1.13 Some useful quantities

1.14 Random vectors and matrices

1.14.1 Parameter vectors and matrices

Chapter 2. Multivariate Normal Distribution

2.1 Definitions

2.2 Properties of multivariate normal distribution

2.3 Estimation for sampling from a multivariate normal distribution

2.3.1 Likelihood function of a sample from a multivariate normal distribution

2.3.2 Maximum likelihood estimations (MLEs) from a multivariate normal distribution

2.4 Sampling distributions of $\bar X$ and $S$

2.5 Definition and Properties of the Wishart Distribution

2.6 Large sample distributions for $\bar X$ and $S$

2.7 Assessing the assumption of multivariate normality


2.8 Transformations to near normality


  1. Power transformations: When all observations are nonnegative, we may consider a family of power transformations. If some measurements are negative, we first add a constant to all measurements and then apply a power transformation:
    • $x_i + c \rightarrow (x_i + c)^{\lambda}$
  2. Box-Cox transformations: The Box-Cox family is similar to the power transformations and continuously connects to the logarithmic transform as the power $\lambda$ approaches zero:
    • $x^{(\lambda)} = \begin{cases} \frac{x^{\lambda} - 1}{\lambda} & \text{for } \lambda \neq 0 \\ \log x & \text{for } \lambda = 0 \end{cases}$
    • for $x > 0$. We choose $\lambda$ by maximizing the log-likelihood function (see the R sketch after this list):
    • $l(\lambda) = - \frac{n}{2} \log \left[\frac{1}{n} \sum_{i = 1}^n (x_i^{(\lambda)} - \bar{x^{(\lambda)}})^2\right] + (\lambda - 1) \sum_{i=1}^{n} \log x_i$
  3. Note that we should not expect that some transformation will always bring the data close to normality.
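As a concrete illustration, here is a minimal R sketch of choosing $\lambda$ for a single variable. It uses boxcox() from MASS on an intercept-only model; the use of the USArrests Rape column and the lambda grid are illustrative choices, not part of the original notes.

library(MASS)                                     # boxcox() profiles the Box-Cox log-likelihood

x <- USArrests$Rape                               # positive measurements (example data)
bc <- boxcox(x ~ 1, lambda = seq(-2, 2, by = 0.05))   # intercept-only model = one-sample case
lambda.hat <- bc$x[which.max(bc$y)]               # lambda maximizing l(lambda)
lambda.hat

# apply the chosen transformation and check normality of the result
x.bc <- if (abs(lambda.hat) < 1e-8) log(x) else (x^lambda.hat - 1) / lambda.hat
qqnorm(x.bc); qqline(x.bc)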

Chapter 3. Hypothesis tests

3.1 Review of hypothesis tests for a univariate normal mean

3.1.1 When $\sigma^2$ is known

3.1.2 When $\sigma^2$ is unknown

3.2 Hypothesis test on one sample multivariate normal mean vector

3.2.1 When the covariance matrix $\Sigma$ is known

3.2.2 Hotelling’s $T^2$ Statistic: when $\Sigma$ is unknown

3.3 Hotelling’s $T^2$ and likelihood ratio tests

3.4 Confidence regions and multiple testing

3.4.1 Simultaneous confidence intervals

Chapter 0. Introduction

0.1. Visualization of Multivariate Data

  1. Scatter Plot Matrix
     library(psych)
     pairs.panels(USArrests)
    
  2. Chernoff’s Faces
     library(aplpack)
     faces(USArrests, face.type=1, cex=0.5)
    
  3. Star plot
     stars(USArrests)
    
  4. 3-D scatter plot
     library(scatterplot3d)
     scatterplot3d(USArrests[,-3], type="h", highlight.3d=TRUE,
                   angle=55, scale.y=0.7, pch=16, main="USArrests")
    
  5. 3-D rotated plot
     library(rgl)
     plot3d(USArrests[, -3])
    
  6. Profile plot:
     library(MASS)
     parcoord(USArrests, col=c(1+(1:50)), var.label=T)
    
  7. Growth curves for longitudinal data
     library(nlme)
     library(ggplot2)
     p <- ggplot(data = Orthodont, aes(x = age, y = distance, group = Subject, colour = Subject))
     p + geom_line()
     p + geom_line() + facet_grid(.~Sex)
    

    Summary of Introduction

    • Two or more variables are measured on each subject $\rightarrow$ multivariate data
    • Multivariate data are commonly correlated $\rightarrow$ we need statistical methods handling those data
    • We learn how to:
      • visualize and display multivariate data
      • apply multivariate normal distribution theory to the data
      • make inferences (estimation of parameters and testing of hypotheses)
      • understand the structure of data
      • statistically extract information from multivariate data

Chapter 1. Linear algebra

1.1 Scalars, vectors, matrices

1.2 Operations of matrices

1.3 Trace and determinant for square matrices

1.4 Rank of a matrix

1.5 Inverse matrix

1.6 Partitioned matrices

1.7 Positive definite matrix

1.8 Orthogonal vectors and matrices

1.9 Eigenvalues and eigenvectors

1.10 Spectral decomposition

1.11 Cauchy-Schwarz inequality

1.12 Differentiation in Vectors and Matrices

1.13 Some useful quantities

1.14 Random vectors and matrices

1.14.1 Parameter vectors and matrices

1.14.2 Numerical summarization of multivariate data

Chapter 2. Multivariate Normal Distribution

2.1 Definitions

2.2 Properties of multivariate normal distribution

2.3 Estimation for sampling from a multivariate normal distribution

2.3.1 Likelihood function of a sample from a multivariate normal distribution

2.3.2 Maximum likelihood estimations (MLEs) from a multivariate normal distribution

2.4 Sampling distributions of $\bar X$ and $S$

2.5 Definition and Properties of the Wishart Distribution

2.6 Large sample distributions for $\bar X$ and $S$

2.7 Assessing the assumption of multivariate normality

  1. Marginal normality check for each variable (we use the univariate methods with each variable).
  2. Chi-square plot: Use $(X - \mu)^T \Sigma^{-1} (X - \mu) \sim \chi_m^2$ if $X \sim N_m(\mu, \Sigma)$.
    1. Calculate $d_j^2 = (X_j - \bar X)^T S^{-1}(X_j - \bar X)$.
    2. Rearrange $d_j^2$ in ascending order: $d_{(1)}^2 \le d_{(2)}^2 \le \cdots \le d_{(n)}^2$.
    3. Find $q_j$ such that $P(\chi_m^2 \le q_j) = \frac{j - \frac{1}{2}}{n}$.
    4. Plot $(q_j, d_{(j)}^2)$.
    5. Check whether the points are approximately on a straight line.
  3. Formal hypothesis test:
    • Mardia’s test is based on multivariate extensions of the skewness and kurtosis measures. \(MS = \frac{1}{6n} \sum_{i=1}^n \sum_{j=1}^n [ (x_i - \bar x)^T \hat \Sigma^{-1} (x_j - \bar x)]^3\) \(MK = \sqrt{\frac{n}{8m(m+2)}} \{\frac{1}{n} \sum_{i=1}^n [(x_i - \bar x)^T \hat \Sigma^{-1} (x_i - \bar x)]^2 - m(m+2)\}\) Under the null hypothesis of multivariate normality, the statistic MS has approximately a chi-squared distribution with $\frac{1}{6}m(m+1)(m+2)$ degrees of freedom, and MK is approximately standard normal $N(0,1)$.
    • Henze-Zirkler’s test based on the empirical characteristic function: \(HZ_\beta = \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n e^{-\frac{\beta^2}{2}(x_i -x_j)^T \hat \Sigma^{-1} (x_i - x_j)} - \frac{2}{n(1 + \beta^2)^{m/2}} \sum_{i=1}^n e^{- \frac{\beta^2}{2(1 + \beta^2)} (x_ i - \bar x)^T \hat \Sigma^{-1}(x_i - \bar x)} + \frac{1}{(1 + 2 \beta^2)^{m/2}}\) where $\beta = \frac{1}{\sqrt{2}} [\frac{(2m+1)n}{4}]^{1 /(m+4)}$ is a common choice. The HZ test rejects normality if $HZ_\beta$ is too large.
    • There are many other tests such as Royston’s test, Doornik-Hansen’s test, and the energy test.
    • The R package MVN implements the above tests; see the R sketch after this list.
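A minimal R sketch of the chi-square plot described above, followed by the formal tests via the MVN package. The use of USArrests is illustrative, and the mvn() argument names follow recent versions of the package.

x <- as.matrix(USArrests)
n <- nrow(x); m <- ncol(x)
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))   # d_j^2
qj <- qchisq(((1:n) - 0.5) / n, df = m)                    # chi-square quantiles
plot(qj, sort(d2), xlab = "chi-square quantile", ylab = "ordered squared distance")
abline(0, 1)                          # points close to this line support normality

library(MVN)                          # Mardia, Henze-Zirkler, Royston, etc.
mvn(USArrests, mvnTest = "mardia")
mvn(USArrests, mvnTest = "hz")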

2.8 Transformations to near normality

Common variable-specific transformations:

    • Count $y$: $\sqrt{y}$
    • Proportion $\hat p$: $\text{logit}(\hat p) = \log \frac{\hat p}{1 - \hat p}$
    • Correlation $r$: Fisher’s $z$ transform, $z = \frac{1}{2}\log \frac{1 + r}{1 - r}$
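A tiny R illustration of these transformations (the numeric values below are made up):

y <- c(3, 7, 12, 25)                   # counts
sqrt(y)

p.hat <- c(0.10, 0.40, 0.85)           # proportions
log(p.hat / (1 - p.hat))               # logit

r <- 0.65                              # sample correlation
0.5 * log((1 + r) / (1 - r))           # Fisher's z (= atanh(r))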

Chapter 3. Hypothesis tests

3.1 Review of hypothesis tests for a univariate normal mean

3.1.1 When $\sigma^2$ is known

3.1.2 When $\sigma^2$ is unknown

3.2 Hypothesis test on one sample multivariate normal mean vector

3.2.1 When the covariance matrix $\Sigma$ is known

3.2.2 Hotelling’s $T^2$ Statistic: when $\Sigma$ is unknown


  1. Note $T^2 = Z'(\frac{W}{v})^{-1}Z \sim \frac{vp}{v + 1 - p}F_{p, v + 1 - p}$, where $Z \sim N_p(0, \Sigma)$ and $W \sim Wishart(p, v, \Sigma)$ are independent.
  2. Note that $pF_{p, n-p} \rightarrow \chi_p^2$ as $n \rightarrow \infty$, so that $T^2 \approx \chi_p^2$ for a large sample under $H_0$.
  3. The $T^2$ statistic is invariant under linear transformations; that is, Hotelling’s $T^2$ statistic does not depend on the measurement units. (A quick check of the null distribution in note 1 follows this list.)
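The scaled-F null distribution in note 1 can be verified by a quick Monte Carlo sketch; the sample size, dimension, and number of replications below are arbitrary choices.

set.seed(1)
n <- 20; p <- 3; B <- 5000
T2 <- replicate(B, {
  x <- matrix(rnorm(n * p), n, p)              # N_p(0, I) sample, so H_0: mu = 0 holds
  xbar <- colMeans(x); S <- cov(x)
  n * drop(t(xbar) %*% solve(S) %*% xbar)
})
q.emp <- quantile(T2, c(0.50, 0.90, 0.95, 0.99))
q.F   <- (n - 1) * p / (n - p) * qf(c(0.50, 0.90, 0.95, 0.99), p, n - p)
rbind(empirical = q.emp, scaled.F = q.F)       # the two rows should be close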

3.3 Hotelling’s $T^2$ and likelihood ratio tests

3.4 Confidence regions and multiple testing

3.4.1 Simultaneous confidence intervals

3.5 Large sample inferences


Summary

  1. Hotelling’s $T^2$ test statistic for one sample normal mean vector
    • Reject $H_0: \mu = \mu_0$ if $T^2 = n (\bar x - \mu_0)'S^{-1}(\bar x - \mu_0) \ge \frac{p(n-1)}{n - p}F_{p, n-p, \alpha}$
  2. Confidence region and simultaneous confidence intervals with confidence level $1 - \alpha$ (see the R sketch after this list):
    • Hotelling’s $T^2$ confidence region: $n(\bar x - \mu)'S^{-1} (\bar x - \mu) \le \frac{(n-1)p}{n-p} F_{p, n-p, \alpha}$
    • Scheffé’s simultaneous CIs: $\bar x_i \pm \sqrt{\frac{p(n-1)}{n-p} F_{p, n-p, \alpha}} \sqrt{\frac{s_{ii}}{n}}$
    • Bonferroni’s simultaneous CIs: $\bar x_i \pm t_{n-1, \frac{\alpha}{2p}} \sqrt{\frac{s_{ii}}{n}}$
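A minimal R sketch of the test and both kinds of simultaneous intervals, using USArrests; the null mean vector mu0 below is made up purely for illustration.

x <- as.matrix(USArrests)
n <- nrow(x); p <- ncol(x)
xbar <- colMeans(x); S <- cov(x)
mu0 <- c(8, 170, 65, 21)                        # hypothetical mu_0, for illustration only
T2 <- n * drop(t(xbar - mu0) %*% solve(S) %*% (xbar - mu0))
crit <- (n - 1) * p / (n - p) * qf(0.95, p, n - p)
c(T2 = T2, critical.value = crit)               # reject H_0 if T2 >= crit

# Scheffe and Bonferroni simultaneous 95% CIs for each component mean
se <- sqrt(diag(S) / n)
scheffe <- sqrt((n - 1) * p / (n - p) * qf(0.95, p, n - p))
bonf    <- qt(1 - 0.05 / (2 * p), df = n - 1)
cbind(scheffe.lower = xbar - scheffe * se, scheffe.upper = xbar + scheffe * se,
      bonf.lower = xbar - bonf * se, bonf.upper = xbar + bonf * se)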

Chapter 4. Two Sample Comparison and MANOVA

4.1 Paired Comparisons and a Repeated Measures Design

    • Hotelling’s $T^2$ confidence region: $n(\delta - \bar D)'S_d^{-1} (\delta - \bar D) \le \frac{(n-1)p}{n-p} F_{p, n-p, \alpha}$
    • Scheffé’s simultaneous CIs: $\bar d_i \pm \sqrt{\frac{p(n-1)}{n-p} F_{p, n-p, \alpha}} \sqrt{\frac{s_{d_i}^2}{n}}$
    • Bonferroni’s simultaneous CIs: $\bar d_i \pm t_{n-1, \frac{\alpha}{2p}} \sqrt{\frac{s_{d_i}^2}{n}}$
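Since the paired analysis is just a one-sample $T^2$ test on the differences, here is a compact R sketch with simulated before/after data; the dimensions and effect size are purely illustrative.

set.seed(2)
n <- 30; p <- 2
before <- matrix(rnorm(n * p, mean = 10), n, p)
after  <- before + matrix(rnorm(n * p, mean = 0.5), n, p)
D <- after - before                              # paired differences
dbar <- colMeans(D); Sd <- cov(D)
T2 <- n * drop(t(dbar) %*% solve(Sd) %*% dbar)   # test H_0: delta = 0
crit <- (n - 1) * p / (n - p) * qf(0.95, p, n - p)
c(T2 = T2, critical.value = crit)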

4.2 Comparing Mean Vectors from Independent Two Samples


4.2.1 When $\Sigma = \Sigma_1 = \Sigma_2$



4.4 Simultaneous Confidence Intervals for Treatment Effects

4.5 Testing for Equality of Covariance Matrices

Chapter 5. Discriminant analysis and classification

5.1 Discriminant function

5.2 Discriminant functions for two groups

5.3 Classification analysis

5.4 Classification for multivariate normal distributions

5.5 Discriminant analysis for several groups

5.5.1 Discriminant functions

5.6 Stepwise Discriminant Analysis

Chapter 6. Principal Component Analysis (PCA)

6.1 Introduction

6.2 Method

6.3 PCA from the correlation matrix

6.4 Plotting of principal components

6.5 How many components to retain?

Chapter 7. Factor Analysis (FA)

7.1 Orthogonal factor model

7.2 Estimations

  1. (Principal component method) \(\begin{aligned} \Sigma &= \sum_{i=1}^p \lambda_i e_i e_i' = \sum_{i=1}^p (\sqrt{\lambda_i} e_i)(\sqrt{\lambda_i}e_i)' \\ & = (\sqrt{\lambda_1} e_1 : \cdots : \sqrt{\lambda_p} e_p) \begin{pmatrix}\sqrt{\lambda_1}e_1' \\ \vdots \\ \sqrt{\lambda_p} e_p'\end{pmatrix} \end{aligned}\) If $\lambda_{m+1}, \cdots, \lambda_p$ are small, then we can approximate the covariance matrix by \(\Sigma \approx ( \sqrt{\lambda_1} e_1 : \cdots : \sqrt{\lambda_m} e_m) \begin{pmatrix} \sqrt{\lambda_1} e_1' \\ \vdots \\ \sqrt{\lambda_m} e_m' \end{pmatrix} + \begin{pmatrix} \psi_1 & 0 & \cdots & 0 \\ 0 & \psi_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \psi_p\end{pmatrix}\) where $\psi_i = \Sigma_{ii} - \sum_{j=1}^m l_{ij}^2$. The communalities are \(h_i^2 = l_{i1}^2 + \cdots + l_{im}^2\)
  2. (Principal factors) We initially estimate $\Psi^{(0)}$ and then apply the principal component solution to $S - \Psi^{(r)}$: \(\begin{aligned} S - \Psi^{(r)} &= \sum_{j=1}^m \lambda_j^{(r)}e_j^{(r)}e_j^{(r)T} + \sum_{j=m+1}^p \lambda_j^{(r)} e_j^{(r)} e_j^{(r)T}\\ \Psi^{(r+1)} & = diag(S - L^{(r)}L^{(r)T}) \end{aligned}\)
    • Repeat these steps until convergence. A common initial choice of $\Psi^{(0)}$ has diagonal elements $1/s^{ii}$, the reciprocals of the diagonal elements of $S^{-1}$, when factoring the sample covariance matrix, and $1/r^{ii}$ from $R^{-1}$ when factoring the sample correlation matrix.
  3. (Maximum likelihood method) Assume $X_j - \mu = LF_j + \epsilon_j$ has a multivariate normal distribution. The likelihood function is given by \(\begin{aligned} L(\mu, \Sigma) &= \prod_{i=1}^N [\frac{1}{(2 \pi)^{p/2} \vert \Sigma \vert ^{1/2}} e^{-\frac{1}{2}(x_i - \mu)' \Sigma^{-1} (x_i - \mu)}] \\ &= \prod_{i=1}^N [\frac{1}{(2 \pi)^{p/2} \vert LL^T + \Psi \vert ^{1/2}} e^{-\frac{1}{2}(x_i - \mu)' (LL^T + \Psi)^{-1} (x_i - \mu)}] \\ \end{aligned}\) Since $LQQ^TL^T = LL^T$ for any $m \times m$ orthogonal matrix $Q$, it is necessary to impose a condition to obtain a unique maximum likelihood solution: we need $m(m-1)/2$ constraints. Note \((LL^T + \Psi)^{-1} = \Psi^{-1} - \Psi^{-1}L(I + L^T \Psi^{-1}L)^{-1}L^T \Psi^{-1}\) If we impose the condition that $L^T \Psi^{-1}L$ is a diagonal matrix (exactly $m(m-1)/2$ constraints), then we can numerically find the MLEs. Hence, we assume \(L^T\Psi^{-1}L = \Delta \text{, a diagonal matrix}\) and numerically obtain $\hat L$ and $\hat \Psi$ under this constraint. (A short R sketch of methods 1 and 3 follows this list.)
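A rough R illustration of methods 1 and 3; the choice of the built-in state.x77 data and of $m = 2$ factors is arbitrary.

R <- cor(state.x77)
m <- 2

# 1. Principal component solution: L = (sqrt(lambda_1) e_1 : ... : sqrt(lambda_m) e_m)
eig <- eigen(R)
L.pc <- eig$vectors[, 1:m, drop = FALSE] %*% diag(sqrt(eig$values[1:m]), nrow = m)
psi.pc <- diag(R - L.pc %*% t(L.pc))             # specific variances psi_i
h2 <- rowSums(L.pc^2)                            # communalities h_i^2

# 3. Maximum likelihood solution; factanal() imposes the constraint that
#    L' Psi^{-1} L is diagonal to make the solution unique
fa.ml <- factanal(covmat = R, factors = m, rotation = "none")
fa.ml$loadings
fa.ml$uniquenesses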

7.3 Hypothesis Testing on the Number of Factors

7.4 Factor Rotation

7.5 Factor scores

  1. (Weighted Least Squares Method) Bartlett suggested that weighted least squares be used to estimate the common factor values (see the R sketch after this list): \(x - \mu = Lf + \epsilon\) \(Var(\epsilon_i) = \psi_i\) \(\text{Minimize } \sum_{i=1}^p \frac{\epsilon_i^2}{\psi_i} = \epsilon' \Psi^{-1} \epsilon = (x - \mu - Lf)' \Psi^{-1}(x - \mu - Lf)\) \(\hat f = (L' \Psi^{-1} L)^{-1} L' \Psi^{-1} (x - \mu)\) Hence, the estimated factor score is \(\begin{aligned} \hat f_j &= (\hat L' \hat \Psi^{-1} \hat L)^{-1} \hat L' \hat \Psi^{-1} (x_j - \bar x) \\ & = \hat \Delta ^{-1} \hat L' \hat \Psi^{-1} (x_j - \bar x) \end{aligned}\) When the correlation matrix is factored, \(\hat f_j = (\hat L_z' \hat \Psi_z ^{-1} \hat L_z)^{-1} \hat L_z' \hat \Psi_z^{-1} z_j = \hat \Delta_z ^{-1} \hat L_z' \hat \Psi_z^{-1} z_j\) where $z_j = D^{-1/2}(x_j - \bar x)$ and $\hat \rho = \hat L_z \hat L_z' + \hat \Psi_z$. When $\hat L$ and $\hat \Psi$ are determined by the maximum likelihood method, these estimates must satisfy the uniqueness condition $\hat L' \hat \Psi^{-1} \hat L = \hat \Delta$, a diagonal matrix.

  2. (Regression Method) Since $X - \mu = LF + \epsilon \sim N_p(0, LL' + \Psi)$ and $F \sim N_m(0, I)$, they have a joint normal distribution $N_{p+m} (0, \Sigma^*)$, where \(\Sigma^* = \begin{pmatrix} \Sigma = LL' + \Psi & L \\ L' & I\end{pmatrix}\) From the conditional mean of one block of a partitioned normal random vector given the other block, \(E(F \vert x) = L'(LL' + \Psi)^{-1} (x - \mu)\) so that \(\hat f_j = \hat L'(\hat L \hat L' + \hat \Psi)^{-1} (x_j - \bar x)\) To reduce the effects of a (possibly) incorrect determination of the number of factors, practitioners tend to calculate the factor scores by using $S$ (the original sample covariance matrix) in place of $\hat \Sigma = \hat L \hat L' + \hat \Psi$: \(\hat f_j = \hat L' S^{-1} (x_j - \bar x)\) If a correlation matrix is factored, \(\hat f_j = \hat L_z ' R^{-1} z_j\)
    • Remark 1. If rotated loadings $\hat L^* = \hat L T$ are used in place of the original loadings, the corresponding factor scores $\hat f_j^*$ are obtained by $\hat f_j^* = T' \hat f_j$.
  3. (Principal component method) When the principal component solution is used, it is common to estimate the factor scores by an ordinary least squares method: \(F = (L'L)^{-1} L'(X- \mu)\) \(\hat f_j = (\hat L' \hat L)^{-1} \hat L'(x_j - \bar x)\) Since $\hat L = (\sqrt{\hat\lambda_1} \hat e_1 : \cdots : \sqrt{\hat\lambda_m} \hat e_m)$, we have $\hat L' \hat L = diag(\hat \lambda_1, \cdots, \hat \lambda_m)$ and \(\begin{aligned} \hat f_j & = (\hat L' \hat L)^{-1} \hat L' (x_j - \bar x) \\ & = \begin{pmatrix} \frac{1}{\hat\lambda_1} & 0 & \cdots & 0 \\ 0 & \frac{1}{\hat\lambda_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{\hat\lambda_m} \end{pmatrix} \begin{pmatrix} \sqrt{\hat \lambda_1} \hat e_1' \\ \sqrt{\hat \lambda_2} \hat e_2' \\ \vdots \\ \sqrt{\hat \lambda_m} \hat e_m' \end{pmatrix} (x_j - \bar x) \\ &= \begin{pmatrix} \frac{1}{\sqrt{\hat\lambda_1}} \hat e_1' (x_j - \bar x) \\ \frac{1}{\sqrt{\hat\lambda_2}} \hat e_2' (x_j - \bar x) \\ \vdots \\ \frac{1}{\sqrt{\hat\lambda_m}} \hat e_m' (x_j - \bar x) \end{pmatrix} \end{aligned}\)
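A minimal R sketch of the WLS (Bartlett) and regression scores via factanal. Note that factanal applies a varimax rotation by default, so the scores correspond to the rotated loadings; the use of state.x77 and m = 2 factors is an arbitrary choice.

fa.b <- factanal(state.x77, factors = 2, scores = "Bartlett")
fa.r <- factanal(state.x77, factors = 2, scores = "regression")
head(fa.b$scores)
head(fa.r$scores)

# Bartlett scores computed directly from the WLS formula
#   f_hat_j = (L' Psi^{-1} L)^{-1} L' Psi^{-1} z_j
L <- unclass(fa.b$loadings)
Psi.inv <- diag(1 / fa.b$uniquenesses)
z <- scale(state.x77)
f.hat <- t(solve(t(L) %*% Psi.inv %*% L, t(L) %*% Psi.inv %*% t(z)))
head(f.hat)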

7.6 Strategy for Factor Analysis

  1. Perform a principal component factor analysis, including a varimax rotation
  2. Perform a maximum likelihood factor analysis, including a varimax rotation
  3. Compare the solutions
  4. Repeat 1-3 for other numbers of common factors $m$ (a short R sketch of this workflow follows)
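A sketch of this strategy in R, using psych::principal for the principal-component solution and factanal for the maximum likelihood solution; state.x77 and m = 2 are illustrative choices.

library(psych)
m <- 2
fa.pc <- principal(state.x77, nfactors = m, rotate = "varimax")   # step 1
fa.ml <- factanal(state.x77, factors = m, rotation = "varimax")   # step 2
print(fa.pc$loadings, cutoff = 0.3)                               # step 3: compare loadings
print(fa.ml$loadings, cutoff = 0.3)
# step 4: repeat with other values of m and compare the loading patterns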

Chapter 8. Multivariate regression

8.1 The Classical (Univariate) Linear Regression Model

8.2 Least Squares Estimation

8.3 Sum of Squares Decomposition

8.4 Inferences About the Regression Model

8.4.1 Likelihood ratio tests (LRTs)

8.5 Inferences from the Estimated Regression Function

8.6 Model Checking and Other Aspects of Regression

8.7 Multivariate Multiple Regression