class: center, middle, inverse, title-slide

.title[
# Appendix
]
.subtitle[
## A series of helpful parts that did not fit neatly into lectures.
]
.author[
### 
]

---

### Outline

- Matrix Centering
- Matrix Standardization
- Generalized Least-Squares Estimation, coefficients
- Ordinary Least-Squares Estimation, coefficients
- Generalized Least-Squares Estimation, covariance matrix
- Ordinary Least-Squares Estimation, covariance matrix
- Matrix square-root
- Viable test statistics for evaluating null hypotheses

---

### Matrix Centering

Matrix centering is the process of subtracting from every column of a matrix its mean. Afterward, the column means of the .blue[**mean-centered**] matrix will all be 0. Procrustes coordinates are mean-centered via GPA.

Let `\(\mathbf{1}\)` be an `\(n \times 1\)` vector of 1s, with `\(n\)` the number of observations in the `\(n \times p\)` matrix of data, `\(\mathbf{Y}\)`.

Let `\(\mathbf{H}\)` be the `\(n \times n\)` idempotent matrix found as `\(\mathbf{H} = \mathbf{1}(\mathbf{1}^T\mathbf{1})^{-1}\mathbf{1}^T\)`.

Then, .blue[**mean-centered**] data are found as:

`$$\mathbf{Z} = \mathbf{Y} - \mathbf{HY} = (\mathbf{I} - \mathbf{H}) \mathbf{Y}$$`

---

### Matrix Standardization

Standardization might be useful for continuous variables measured on different scales. (.red[This is never a concern for Procrustes coordinates.]) After standardization, each variable in the matrix has a mean of 0 and a standard deviation of 1, so each value is expressed in standard-deviation units. This is the matrix form of variable standardization; i.e., `\(z=\frac{y - \hat\mu}{\hat\sigma}\)`.

Let `\(\mathbf{Z}_c\)` be the centered data found as `\(\mathbf{Z}_c =(\mathbf{I} -\mathbf{H})\mathbf{Y}\)`, as demonstrated in the slide on **Matrix Centering**.

Let `\(\hat{\mathbf{\Sigma}}\)` be the `\(p \times p\)` residual covariance matrix found as `\((n-1)^{-1}\mathbf{Z}_c^T\mathbf{Z}_c\)`.

Let `\(\mathbf{S}\)` be a diagonal matrix of standard deviations, found from the square-roots of the variances along the diagonal of `\(\hat{\mathbf{\Sigma}}\)`.

Then, `\(\mathbf{Z}_s\)` are the standardized data found as

`$$\mathbf{Z}_s = \mathbf{Z}_c\mathbf{S}^{-1}$$`

(A short R sketch of both centering and standardization follows on the next slide.)
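---

### Matrix Centering and Standardization: R sketch

A minimal base-R sketch of the two preceding slides, assuming only that `Y` is an `\(n \times p\)` numeric data matrix; all object names here are illustrative rather than taken from any package.

```r
set.seed(1)
Y <- matrix(rnorm(20), nrow = 5)                 # hypothetical 5 x 4 data matrix
n <- nrow(Y)

one <- matrix(1, n, 1)                           # n x 1 vector of 1s
H <- one %*% solve(t(one) %*% one) %*% t(one)    # idempotent "mean" matrix
Zc <- (diag(n) - H) %*% Y                        # mean-centered data
round(colMeans(Zc), 10)                          # all 0

Sig <- t(Zc) %*% Zc / (n - 1)                    # p x p residual covariance matrix
S <- diag(sqrt(diag(Sig)))                       # diagonal matrix of standard deviations
Zs <- Zc %*% solve(S)                            # standardized data
apply(Zs, 2, sd)                                 # all 1
```

The same result is returned by `scale(Y)`; the matrix form above is shown only because it mirrors the equations.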
---

### Generalized Least-Squares Estimation, coefficients

The general formula for estimation of linear model coefficients is:

`$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{\Omega}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{\Omega}^{-1}\mathbf{Z}$$`

These coefficients are *generalized*, as they assume that the non-independence among observations can be explained by the hypothetical covariance matrix, `\(\mathbf{\Omega}\)`.

The equation above could be *simplified* if observations could be considered independent, i.e., by substituting `\(\mathbf{I}\)` for `\(\mathbf{\Omega}\)`. This is shown in the next slide.

---

### Ordinary Least-Squares Estimation, coefficients

The general formula for estimation of linear model coefficients is:

`$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{\Omega}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{\Omega}^{-1}\mathbf{Z}$$`

In the case that **observations can be considered independent**, we assume

`$$\mathbf{\Omega} = \mathbf{I}$$`

Substituting `\(\mathbf{I}\)` for `\(\mathbf{\Omega}\)` above yields

`$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{I}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{I}^{-1}\mathbf{Z} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Z}$$`

---

### Generalized Least-Squares Estimation, covariance matrix

The general formula for estimation of the residual covariance matrix from a linear model is:

`$$\hat{\boldsymbol{\Sigma}} = (n-k)^{-1} \left(\mathbf{Z} - \mathbb{E}\left(\mathbf{Z}\right)\right)^T \mathbf{\Omega}^{-1} \left(\mathbf{Z} - \mathbb{E}\left(\mathbf{Z}\right)\right)$$`

Where:

+ `\(\mathbb{E}(\mathbf{Z})\)` indicates the expected values, which are the fitted values, `\(\hat{\mathbf{Z}} = \mathbf{HZ} = \mathbf{X}(\mathbf{X}^T\mathbf{\Omega}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{\Omega}^{-1}\mathbf{Z}\)`.
+ `\(\mathbf{\Omega}\)` is the `\(n \times n\)` covariance matrix that describes the non-independence among the `\(n\)` observations in the `\(n \times p\)` data matrix, `\(\mathbf{Z}\)`.
+ `\(k\)` is the number of parameters (columns) in the linear model design matrix, `\(\mathbf{X}\)`.

Note the following algebraic equivalency:

`$$\hat{\boldsymbol{\Sigma}} = (n-k)^{-1} \left(\mathbf{Z} - \hat{\mathbf{Z}} \right)^T \mathbf{\Omega}^{-1} \left(\mathbf{Z} - \hat{\mathbf{Z}}\right)$$`

`$$\hat{\boldsymbol{\Sigma}} = (n-k)^{-1} \left(\mathbf{Z} - \hat{\mathbf{Z}} \right)^T \mathbf{\Omega}^{-1/2} \mathbf{\Omega}^{-1/2}\left(\mathbf{Z} - \hat{\mathbf{Z}}\right)$$`

---

### Generalized Least-Squares Estimation, covariance matrix

`$$\hat{\boldsymbol{\Sigma}} = (n-k)^{-1} \left(\mathbf{\Omega}^{-1/2}\mathbf{Z} - \mathbf{\Omega}^{-1/2}\hat{\mathbf{Z}} \right)^T \left(\mathbf{\Omega}^{-1/2}\mathbf{Z} - \mathbf{\Omega}^{-1/2}\hat{\mathbf{Z}}\right)$$`

`$$\hat{\boldsymbol{\Sigma}} = (n-k)^{-1} \left(\mathbf{\Omega}^{-1/2} (\mathbf{I} - \mathbf{H})\mathbf{Z} \right)^T \left(\mathbf{\Omega}^{-1/2} (\mathbf{I} - \mathbf{H})\mathbf{Z} \right)$$`

`$$\hat{\boldsymbol{\Sigma}} = (n-k)^{-1} \left( (\mathbf{\Omega}^{-1/2} - \mathbf{\tilde{H}})\mathbf{Z} \right)^T \left( (\mathbf{\Omega}^{-1/2} - \mathbf{\tilde{H}})\mathbf{Z} \right),$$`

where `\(\mathbf{\tilde{H}} = \mathbf{\Omega}^{-1/2}\mathbf{H}\)`.

In other words, the covariance matrix for GLS estimation requires using transformed GLS residuals, `\(\mathbf{\Omega}^{-1/2}(\mathbf{Z} - \hat{\mathbf{Z}})\)`, not GLS mean-centered residuals.

---

### Ordinary Least-Squares Estimation, covariance matrix

The general formula for estimation of the residual covariance matrix from a linear model is:

`$$\hat{\boldsymbol{\Sigma}} = (n-k)^{-1} \left(\mathbf{Z} - \mathbb{E}\left(\mathbf{Z}\right)\right)^T \mathbf{\Omega}^{-1} \left(\mathbf{Z} - \mathbb{E}\left(\mathbf{Z}\right)\right)$$`

but because observations are considered independent, this simplifies to

`$$\hat{\boldsymbol{\Sigma}} = (n-k)^{-1} \left(\mathbf{Z} - \mathbb{E}\left(\mathbf{Z}\right)\right)^T \mathbf{I}^{-1} \left(\mathbf{Z} - \mathbb{E}\left(\mathbf{Z}\right)\right),$$`

which simplifies again to

`$$\hat{\boldsymbol{\Sigma}} = (n-k)^{-1} \left(\mathbf{Z} - \mathbb{E}\left(\mathbf{Z}\right)\right)^T \left(\mathbf{Z} - \mathbb{E}\left(\mathbf{Z}\right)\right).$$`

Where:

+ `\(\mathbb{E}(\mathbf{Z})\)` indicates the expected values, which are the fitted values, `\(\hat{\mathbf{Z}} = \mathbf{HZ} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Z}\)`.
+ `\(k\)` is the number of parameters (columns) in the linear model design matrix, `\(\mathbf{X}\)`.
+ And because `\(\mathbb{E} \left(\mathbf{Z}\right) = \hat{\mathbf{Z}}\)`,

`$$\hat{\boldsymbol{\Sigma}} = (n-k)^{-1} \left(\mathbf{Z} - \hat{\mathbf{Z}}\right)^T \left(\mathbf{Z} - \hat{\mathbf{Z}}\right) = (n-k)^{-1} \mathbf{E}^T\mathbf{E}$$`
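---

### OLS and GLS Estimation: R sketch

A minimal sketch of the coefficient and covariance-matrix estimators from the preceding slides, assuming `X`, `Z`, and `Omega` are known. All object names are illustrative; in practice, `lm.rrpp` in the `RRPP` package performs these computations (with its `Cov` argument for GLS).

```r
set.seed(2)
n <- 6; p <- 2
X <- cbind(1, rnorm(n))                     # n x k design matrix (k = 2)
Z <- matrix(rnorm(n * p), n, p)             # n x p data matrix
Omega <- 0.5^abs(outer(1:n, 1:n, "-"))      # hypothetical n x n covariance matrix

## OLS: Omega = I
B.ols <- solve(t(X) %*% X) %*% t(X) %*% Z
E.ols <- Z - X %*% B.ols
Sig.ols <- t(E.ols) %*% E.ols / (n - ncol(X))

## GLS: weight by the inverse of Omega
Oi <- solve(Omega)
B.gls <- solve(t(X) %*% Oi %*% X) %*% t(X) %*% Oi %*% Z
E.gls <- Z - X %*% B.gls

## GLS covariance matrix uses transformed residuals, Omega^(-1/2) E
eig <- eigen(Omega)
Oih <- eig$vectors %*% diag(1 / sqrt(eig$values)) %*% t(eig$vectors)
Sig.gls <- crossprod(Oih %*% E.gls) / (n - ncol(X))
```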
---

### Matrix square-root

A matrix square-root (of a symmetric matrix) is a matrix that satisfies the condition:

`$$\mathbf{\Omega}^{1/2}(\mathbf{\Omega}^{1/2})^T = \mathbf{\Omega}$$`

There are multiple ways to obtain `\(\mathbf{\Omega}^{1/2}\)`, and the choice of method will influence the properties of the result.

**Diagonalization**

The foremost method is **diagonalization** via eigenanalysis (equivalent to SVD for a symmetric, positive-definite `\(\mathbf{\Omega}\)`). If we solve

`$$\mathbf{\Omega} = \mathbf{V\Lambda V}^T,$$`

then

`$$\mathbf{\Omega}^{1/2} = \mathbf{V \Lambda^{1/2} V}^T,$$`

where `\(\mathbf{\Lambda}^{1/2}\)` is a diagonal matrix with values equal to `\(\lambda_i^{1/2}\)`. This square-root is symmetric.

---

### Matrix square-root

A matrix square-root (of a symmetric matrix) is a matrix that satisfies the condition:

`$$\mathbf{\Omega}^{1/2}(\mathbf{\Omega}^{1/2})^T = \mathbf{\Omega}$$`

There are multiple ways to obtain `\(\mathbf{\Omega}^{1/2}\)`, and the choice of method will influence the properties of the result.

**Cholesky decomposition**

Another common method is Cholesky decomposition, which finds

`$$\mathbf{\Omega} = \mathbf{TT}^T,$$`

where `\(\mathbf{T}\)` is a (lower) triangular matrix. Unlike diagonalization, the resultant matrix is not symmetric, so if a Cholesky decomposition is used, the transpose of the matrix, `\(\mathbf{T}^T\)`, has to be considered carefully to avoid computational errors. (It is easy to mistakenly use `\(\mathbf{T}^T\)` rather than `\(\mathbf{T}\)`.)
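---

### Matrix square-root: R sketch

A minimal sketch of both square-root methods, assuming `Omega` is a small symmetric, positive-definite matrix (illustrative values only).

```r
Omega <- matrix(c(4, 2, 2, 3), 2, 2)        # hypothetical 2 x 2 covariance matrix

## Diagonalization: Omega^(1/2) = V Lambda^(1/2) V^T (symmetric result)
eig <- eigen(Omega)
O.half <- eig$vectors %*% diag(sqrt(eig$values)) %*% t(eig$vectors)
all.equal(O.half %*% t(O.half), Omega)      # TRUE

## Cholesky: chol() returns the upper triangle, i.e., T^T, so transpose it
T.mat <- t(chol(Omega))                     # lower-triangular T with T T^T = Omega
all.equal(T.mat %*% t(T.mat), Omega)        # TRUE
```

Note the pitfall flagged on the previous slide: `chol(Omega)` itself is `\(\mathbf{T}^T\)`, not `\(\mathbf{T}\)`.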
---

### Viable test statistics for evaluating null hypotheses

The following slides present a rather long list of statistics that can be estimated with RRPP. The purpose is not to become quickly comfortable with all of them, but to see that various statistics can be used for various purposes. Examples will indicate which statistics are used, and this list can be quickly referenced. Focus more on RRPP: what the null model is, how residuals are randomized, and how the opportunity to calculate these statistics is created.

Recall, a good way to state the null hypothesis is:

`$$\boldsymbol{\Theta}_{\tilde{\mathbf{H}}_{alt} |\tilde{\mathbf{H}}_0 } = \boldsymbol{\Theta}_{0},$$`

meaning a matrix or vector of estimates has an expectation of 0 for all elements, if estimation from the alternative model is no different than from the null model. Observing exactly this is unlikely, but a test can determine whether the observed deviation from `\(\boldsymbol{\Theta}_{0}\)` is so unlikely under the null model that the null hypothesis is unreasonable as a conclusion.

---

### Viable test statistics for evaluating null hypotheses

|Statistic|Calculation|Type|OLS OK|GLS OK| Comment|
|---|-----|---|--|--|-------|
| `\(SS\)` | `\(\small SS = trace(\mathbf{S}_{effect})\)` | ANOVA| Yes | No | For OLS estimation, the total `\(SS\)` is consistent in every permutation. Not the case for GLS estimation.|
| `\(R^2\)` | `\(\small R^2 = \frac{trace(\mathbf{S}_{effect})}{trace(\mathbf{S}_{total})}\)` | ANOVA| Yes | No | For OLS estimation, the total `\(SS\)` is consistent in every permutation. Not the case for GLS estimation. `\(\mathbf{S}_{total}\)` is the SSCP for residuals of the estimated mean.|
| `\(MS\)` | `\(\small MS = (k-1)^{-1}trace(\mathbf{S}_{effect})\)` | ANOVA| Yes | No | For OLS estimation, the total `\(SS\)` is consistent in every permutation. Not the case for GLS estimation.|
| `\(F\)` | `\(\small F = \frac{(k-1)^{-1}trace(\mathbf{S}_{effect})}{(n-k)^{-1}trace(\mathbf{S}_{residuals})}\)` | ANOVA| Yes | Yes | Even if the total `\(SS\)` changes across permutations, as a ratio, this statistic will accommodate the changes. `\(\mathbf{S}_{residuals}\)` is the SSCP for residuals in the alternative model. |

---

### Viable test statistics for evaluating null hypotheses

|Statistic|Calculation|Type|OLS OK|GLS OK| Comment|
|---|-----|---|--|--|-------|
| Roy's maximum `\(\lambda\)` | `\(\lambda_{Roy} = \max(\lambda_i)\)` | MANOVA | Yes | Yes | Eigenvalues are obtained from eigenanalysis of `\(\mathbf{S}_{residuals}^{-1} \mathbf{S}_{effect}\)`|
| Wilks' `\(\lambda\)` | `\(\lambda_{Wilks} = \prod(\frac{1}{1+ \lambda_i}) = \frac{\lvert\mathbf{S}_{residuals}\rvert}{\lvert\mathbf{S}_{effect} + \mathbf{S}_{residuals}\rvert}\)` | MANOVA | Yes | Yes | Eigenvalues are obtained from eigenanalysis of `\(\mathbf{S}_{residuals}^{-1} \mathbf{S}_{effect}\)`|
| Pillai's trace | `\(trace_{Pillai} = \sum(\frac{\lambda_i}{1+ \lambda_i})\)` | MANOVA | Yes | Yes | Eigenvalues are obtained from eigenanalysis of `\(\mathbf{S}_{residuals}^{-1} \mathbf{S}_{effect}\)`|
| Hotelling-Lawley trace| `\(trace_{HT} = \sum(\lambda_i)\)` | MANOVA | Yes | Yes | Eigenvalues are obtained from eigenanalysis of `\(\mathbf{S}_{residuals}^{-1} \mathbf{S}_{effect}\)`|

*Note that `\(p \leq n-1\)` is required because of matrix inversion, and in some cases, unless `\(p << n\)`, results could be spurious.*

---

### Viable test statistics for evaluating null hypotheses

|Statistic|Calculation|Type|OLS OK|GLS OK| Comment|
|---|-----|---|--|--|-------|
| `\(\log\)`-likelihood | `\(\small L(\hat{\boldsymbol{\Sigma}}_{residuals} \vert \mathbf{V}) =\)` `\(-\frac{np}{2} \log(2\pi)\)` `\(- \frac{1}{2} \log(\lvert\mathbf{V}\rvert)\)` `\(-\frac{1}{2} vec(\mathbf{E})^T \mathbf{V}^{-1}vec(\mathbf{E})\)` | Likelihood| Yes | Yes | `\(\mathbf{V} = \hat{\boldsymbol{\Sigma}}_{residuals} \otimes \mathbf{\Omega}\)`. This does not explicitly test a hypothesis, but calculating the log-likelihood allows hypothesis testing. **The best use for multivariate data is to obtain an effect size `\((Z)\)` for the log-likelihood via RRPP.**|
| Two-sample `\(Z\)`-test | `\(\small Z_{12} = \frac{\lvert(\theta_1 - \mu_1) - (\theta_2 - \mu_2)\rvert}{\sqrt{\sigma_1^2 + \sigma_2^2}}\)` | Likelihood| Yes | Yes | This test allows one to compare effect sizes for log-likelihoods from two RRPP distributions. `\(\mu\)` and `\(\sigma\)` refer to the expected value and standard deviation of `\(\theta\)` (normalized RRPP distributions of log-likelihoods), respectively. ***There is no explicit reason that the null model has to be nested within the alternative model.***|
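---

### MANOVA statistics: R sketch

A minimal sketch of the eigenanalysis behind the MANOVA statistics listed earlier, using an illustrative one-factor OLS fit. All object names are hypothetical; in `RRPP`, `manova.update()` supplies such statistics after `lm.rrpp()`.

```r
set.seed(3)
n <- 30; p <- 3
g <- gl(3, 10)                              # hypothetical grouping factor
Z <- matrix(rnorm(n * p), n, p)             # data matrix

fit0 <- lm(Z ~ 1)                           # null model
fit1 <- lm(Z ~ g)                           # alternative model
S.effect    <- crossprod(fitted(fit1) - fitted(fit0))   # effect SSCP
S.residuals <- crossprod(resid(fit1))                   # residual SSCP

lam <- Re(eigen(solve(S.residuals) %*% S.effect)$values)

roy    <- max(lam)                          # Roy's maximum lambda
wilks  <- prod(1 / (1 + lam))               # Wilks' lambda
pillai <- sum(lam / (1 + lam))              # Pillai's trace
hl     <- sum(lam)                          # Hotelling-Lawley trace

all.equal(wilks, det(S.residuals) / det(S.effect + S.residuals))   # TRUE
```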
---

### Viable test statistics for evaluating null hypotheses

|Statistic|Calculation|Type|OLS OK|GLS OK| Comment|
|---|-----|---|--|--|-------|
| Coefficient `\(d\)` | `\(d_{\mathbf{b}} =(\mathbf{b}^T\mathbf{b})^{1/2}\)` | RRPP primary| Yes | Yes | Let `\(\mathbf{b}^T\)` be a row vector of `\(\hat{\boldsymbol{\beta}}\)`. Via RRPP, it is possible to test the null hypothesis that coefficients equal 0.|
| Pairwise `\(d\)` | `\(d_{12} = \left[(\mathbf{x}_1^T\hat{\boldsymbol{\beta}} - \mathbf{x}_2^T\hat{\boldsymbol{\beta}})^T (\mathbf{x}_1^T\hat{\boldsymbol{\beta}} - \mathbf{x}_2^T\hat{\boldsymbol{\beta}})\right]^{1/2}\)` | RRPP primary| Yes | Yes | Determines whether two estimates (like group means) among several are meaningfully different (in shape).|

---

### Viable test statistics for evaluating null hypotheses

|Statistic|Calculation|Type|OLS OK|GLS OK| Comment|
|---|-----|---|--|--|-------|
| Vector MD | `\(\small MD = \lvert d_{12_A} - d_{12_B}\rvert\)` | RRPP secondary| Yes | Yes | For two groups, `\(A\)` and `\(B\)`, contrast the difference in the amount of change between consistent estimates for states 1 and 2. **This can only be done with RRPP.**|
| Vector Correlation or Angle | `\(\small VC = \frac{\mathbf{v}_A^T\mathbf{v}_B}{\lvert\lvert \mathbf{v}_A\rvert\rvert \lvert\lvert \mathbf{v}_B\rvert\rvert}\)` `\(\small \theta = \cos^{-1}VC\)` | RRPP secondary| Yes | Yes | For two groups, `\(A\)` and `\(B\)`, find the vector correlation or angular difference between vectors that consistently estimate changes from state 1 to state 2, or slopes associated with a covariate. **This can only be done with RRPP.** Note that `\(\mathbf{v}\)` means `\((\mathbf{x}_1^T\hat{\boldsymbol{\beta}} - \mathbf{x}_2^T\hat{\boldsymbol{\beta}})\)` and `\(\lvert\lvert \mathbf{v}\rvert\rvert\)` is the length of `\(\mathbf{v}\)`.|
|Trajectory MD| `\(\small MD = \lvert \sum d_A - \sum d_B \rvert\)` | RRPP secondary| Yes | Yes | For two groups, `\(A\)` and `\(B\)`, find the difference in *path distances* connecting estimates from state 1 to state 2 to state 3 to ... **This can only be done with RRPP.** This is a component of **trajectory analysis**.|

---

### Viable test statistics for evaluating null hypotheses

|Statistic|Calculation|Type|OLS OK|GLS OK| Comment|
|---|-----|---|--|--|-------|
| Trajectory Correlation or Angle | `\(\small VC = \frac{\mathbf{v}_A^T\mathbf{v}_B}{\lvert\lvert \mathbf{v}_A\rvert\rvert \lvert\lvert \mathbf{v}_B\rvert\rvert}\)` `\(\small \theta = \cos^{-1}VC\)` | RRPP secondary| Yes | Yes | For two groups, `\(A\)` and `\(B\)`, find the vector correlation or angular difference between first PC vectors found separately for each group. **This can only be done with RRPP.**|
| Trajectory Shape (Procrustes distance)| `\(PD = \left[(\mathbf{z}_A-\mathbf{z}_B)^T(\mathbf{z}_A-\mathbf{z}_B)\right]^{1/2}\)` |RRPP secondary| Yes | Yes | For two groups, `\(A\)` and `\(B\)`, find the Procrustes distance between two vectors of estimates, `\(\mathbf{z}_A\)` and `\(\mathbf{z}_B\)`, which were obtained from GPA on trajectory points. **This can only be done with RRPP.**|
| Any logical statistic | You define |RRPP| Maybe | Maybe| RRPP is most appropriate for exploring new frontiers.|
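---

### Pairwise statistics: R sketch

A minimal sketch of the pairwise distance and vector correlation/angle statistics, computed here between least-squares group means from an illustrative fit (all names are hypothetical; in `RRPP`, the `pairwise()` function performs such comparisons with RRPP distributions).

```r
set.seed(4)
n <- 30; p <- 3
g <- gl(3, 10)                              # hypothetical grouping factor
Z <- matrix(rnorm(n * p), n, p)
fit <- lm(Z ~ g)

M <- unique(model.matrix(~ g)) %*% coef(fit)   # group means, one row per group

## Pairwise d between groups 1 and 2
d12 <- sqrt(sum((M[1, ] - M[2, ])^2))

## Vector correlation and angle between two mean-difference vectors
v.A <- M[2, ] - M[1, ]                      # change vector, group 1 -> group 2
v.B <- M[3, ] - M[1, ]                      # change vector, group 1 -> group 3
VC <- sum(v.A * v.B) / sqrt(sum(v.A^2) * sum(v.B^2))
theta <- acos(VC)                           # angle in radians
```

With RRPP, the same quantities calculated in every permutation provide the sampling distributions needed for tests.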
---

### Viable test statistics for evaluating null hypotheses

**General comments**

+ ANOVA statistics are based on the dispersion of points in the multivariate data space. **Because a distance between point estimates in the space is univariate, regardless of the dimensions of the space, the number of variables `\((p)\)` is irrelevant.** However, the covariances among variables of the data space do not influence the statistics.

+ MANOVA and likelihood statistics are based on covariance matrix determinants. **Because the covariance matrix is singular (determinant of 0) if variables exceed observations, these statistics can be limited to cases of "long" rather than "wide" data sets.** However, in certain cases they can have more statistical power because of their innate ability to consider covariance structure.

+ RRPP-specific statistics tend to be more informative as descriptive statistics. **Because RRPP can be a proxy for the true sampling distributions of statistics that have no parametric probability density function, RRPP is uniquely qualified to test certain hypotheses.** ANOVA or MANOVA statistics often "go along for the ride" but are unneeded if good specific test statistics can be used.

+ Coefficient-specific test statistics generally offer little appeal, as statistics summarizing several coefficients as *effects* are easier to appreciate. But coefficients, themselves, can be informative.

---