class: center, middle, inverse, title-slide .title[ # 2. Linear Algebra ] .subtitle[ ## AKA: The important foundation for everything that follows! ] .author[ ### ] --- ### Linear Algebra: Goals Understand why this equation is a foundational equation used in geometric morphometrics (GM) `$$\small\mathbf{Z}=[trace[\mathbf{(Y-\overline{Y})(Y-\overline{Y})^T}]]^{-1/2}\mathbf{(Y-\overline{Y})H}$$` -- Understand why this equation is a foundational equation in multivariate statistics (and burn it into your memory) `$$\hat{\boldsymbol{\beta}}=\left ( \mathbf{\tilde{X}}^{T} \mathbf{\tilde{X}}\right )^{-1}\left ( \mathbf{\tilde{X}}^{T} \mathbf{\tilde{Z}}\right )$$` -- Understand why this equation is a universal equation for describing the alignment of shape data (or any multivariate data) to an alternative set of data, covariance structure, or model `$$\mathbf{A}^T\mathbf{Z} =\mathbf{UDV}^T$$` --- ### Overview: - Scalars and Vectors - Vector addition and subtraction - Vector multiplication - Matrices (collections of vectors) - Special Matrices - Matrix inversion - Special Matrix Properties - General Linear Model Preview - Decompositions - Summary --- .center[ # Part I. Basics ## (vector and matrix operations) ] --- ### Scalars and Vectors Scalars `$$a = 5$$` `$$b = 2$$` `$$c = -3$$` Vectors `$$\mathbf{a} = \begin{pmatrix} 5 \\ 3 \\ -2 \end{pmatrix}$$` `$$\mathbf{a}^T = \begin{pmatrix} 5 & 3 & -2 \end{pmatrix}$$` The superscript, `\(^T\)`, means vector transpose. Note, `\(\left(\mathbf{a}^T \right)^T = \mathbf{a}\)` --- ### Scalars and Vectors (cont.) Vector addition/subtraction (consistent orientation is important) .pull-left[ `$$\mathbf{a} = \begin{pmatrix} 5 \\ 3 \\ -2 \end{pmatrix}$$` `$$\mathbf{b}^T = \begin{pmatrix} -5 & 0 & 1 \end{pmatrix}$$` ] .pull-right[ `$$\mathbf{a}^T + \mathbf{b}^T= \begin{pmatrix} 0 & 3 & -1 \end{pmatrix}$$` `$$\mathbf{a} - \mathbf{b} = \begin{pmatrix} 10 \\ 3 \\ -3 \end{pmatrix}$$` ] --- ### Vector Multiplication #### Not this! `$$\mathbf{a}^T = \begin{pmatrix} 5 & 3 & -2 \end{pmatrix}$$` `$$\mathbf{b}^T = \begin{pmatrix} -5 & 0 & 1 \end{pmatrix}$$` `$$\mathbf{a}^T \times \mathbf{b}^T= \begin{pmatrix} -25 & 0 & -2 \end{pmatrix}$$` But it is kind of that, in part... --- ### Vector Multiplication (cont.) Vector multiplication is both multiplication and summation of scalars. Before performing vector multiplication, there has to be consistency with **inner dimensions**. - Dimensions `\(\mathbf{a}: 3 \times 1\)` - Dimensions `\(\mathbf{a}^T: 1 \times 3\)` - Dimensions `\(\mathbf{b}: 3 \times 1\)` - Dimensions `\(\mathbf{b}^T: 1 \times 3\)` For any vector product, arrange the dimensions for the attempted product and see if inner dimensions match; e.g., `$$\mathbf{a} \times \mathbf{b}: 3 \times \color{blue} {1 \times 3} \times 1$$` The inner dimensions do not match, so this product is not possible. --- ### Vector Multiplication (cont.) Vector multiplication is both multiplication and summation of scalars. Before performing vector multiplication, there has to be consistency with **inner dimensions**. - Dimensions `\(\mathbf{a}: 3 \times 1\)` - Dimensions `\(\mathbf{a}^T: 1 \times 3\)` - Dimensions `\(\mathbf{b}: 3 \times 1\)` - Dimensions `\(\mathbf{b}^T: 1 \times 3\)` For any vector product, arrange the dimensions for the attempted product and see if inner dimensions match; e.g., `$$\mathbf{a}^T \times \mathbf{b}: 1 \times \color{blue} {3 \times 3} \times 1$$` **This matches!** Multiplication is possible.
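---
### Vector Multiplication (cont.): A quick check in `R`

The code below is a minimal sketch (base `R` only; the objects `a` and `b` are built here as explicit matrices, matching the vectors above) showing how `R` enforces the inner-dimension rule with the `%*%` operator.

``` r
a <- matrix(c(5, 3, -2), ncol = 1)   # a is 3 x 1
b <- matrix(c(-5, 0, 1), ncol = 1)   # b is 3 x 1

t(a) %*% b    # (1 x 3)(3 x 1): inner dimensions match; the 1 x 1 inner-product, -27
a %*% t(b)    # (3 x 1)(1 x 3): inner dimensions match; the 3 x 3 outer-product

# a %*% b     # (3 x 1)(3 x 1): inner dimensions do not match; R reports non-conformable arguments
```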
--- ### Vector Multiplication (cont.): Vector Inner-products If inner dimensions match, the multiplication is possible. The product has dimensions equal to the outer dimensions of the match. There are two types of products: - Inner-product: the vector product results in a single scalar, i.e., the outer dimensions are `\(1 \times 1\)` - Outer-product: the vector product results in a series of scalar products with number and arrangement defined by outer dimensions --- ### Vector Multiplication (cont.): Vector Inner-products (cont.) #### Vector Inner-Product For `\(n \times 1\)` vectors `$$\mathbf{a}^T \mathbf{b} = \sum_{i=1}^n a_ib_i$$` For example, `$$\mathbf{a}^T = \begin{pmatrix} 5 & 3 & -2 \end{pmatrix}$$` `$$\mathbf{b} = \begin{pmatrix} -5 \\ 0 \\ 1 \end{pmatrix}$$` `$$\mathbf{a}^T \mathbf{b} = \left(5\times -5 \right) + \left(3 \times 0 \right) + \left(-2 \times 1 \right) = -27$$` ###### *Helpful mnemonic: "Run Amok Computational Demon!" for row-across and column-down pattern (will be helpful for matrices)* --- ### Vector Multiplication (cont.): Vector Outer Products If inner dimensions match, the multiplication is possible. The product has dimensions equal to the outer dimensions of the match. There are two types of products: - Inner-product: the vector product results in a single scalar, i.e., the outer dimensions are `\(1 \times 1\)` - Outer-product: the vector product results in a series of scalar products with number and arrangement defined by outer dimensions --- #### Vector Outer-Product For `\(n \times 1\)` vectors `$$\mathbf{a}\mathbf{b}^T = \begin{pmatrix} a_1b_1 & a_1b_2 & a_1b_3 & \cdots & a_1b_n\\ a_2b_1 & a_2b_2 & a_2b_3 & \cdots & a_2b_n\\ a_3b_1 & a_3b_2 & a_3b_3 & \cdots & a_3b_n\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ a_nb_1 & a_nb_2 & a_nb_3 & \cdots & a_nb_n\\ \end{pmatrix} = \mathbf{M}_{n \times n}$$` --- ### Vector Multiplication (cont.): Vector Outer Products (cont.) ### Important Notes `$$\mathbf{a}\mathbf{b}^T = \begin{pmatrix} a_1b_1 & a_1b_2 & a_1b_3 & \cdots & a_1b_n\\ a_2b_1 & a_2b_2 & a_2b_3 & \cdots & a_2b_n\\ a_3b_1 & a_3b_2 & a_3b_3 & \cdots & a_3b_n\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ a_nb_1 & a_nb_2 & a_nb_3 & \cdots & a_nb_n\\ \end{pmatrix} = \mathbf{M}_{n \times n}$$` - `\(\mathbf{M}\)` is called a matrix, with dimensions defined - `\(\mathbf{M}\)` comprises row vectors and column vectors - because `\(\mathbf{M}\)` has the same number of rows and columns, it also has a diagonal vector - the sum of the diagonal vector, called the **trace**, is the inner-product of `\(\mathbf{a}\)` and `\(\mathbf{b}\)` --- ### Matrices A matrix is more than a vector outer-product (more precisely, a vector outer-product is merely one type of matrix). A matrix is a collection of vectors, arranged in a specific way, so that linear algebra operations can be carried out systematically. For example, the arrangement of vectors in a matrix indicates how to find a series of inner-products and display their results. The most basic matrices for statistics are **data frames**. In data frames, column vectors are variables and row vectors are observations. We can demonstrate this easily in `R` with the `data.frame` and `matrix` functions. ``` ## y1 y2 ## obs.1 2 -1 ## obs.2 3 -2 ## obs.3 2 0 ## obs.4 5 0 ## obs.5 6 1 ## obs.6 8 -1 ``` --- ### Matrices (Cont.)
``` ## y1 y2 ## obs.1 2 -1 ## obs.2 3 -2 ## obs.3 2 0 ## obs.4 5 0 ## obs.5 6 1 ## obs.6 8 -1 ``` Which is R's way of saying, `$$\small{\mathbf{Y} = \begin{pmatrix} 2 & -1 \\ 3 & -2 \\ 2 & 0 \\ 5 & 0 \\ 6 & 1 \\ 8 & -1 \\ \end{pmatrix}}$$` `\(\mathbf{Y}\)` is a matrix and a data frame. Row vectors are observations for the variables represented as column vectors. The dimensions of `\(\mathbf{Y}\)` are `\(n \times p\)` for the `\(n\)` observations of subjects for `\(p\)` variables. --- ### Matrices (Cont.) #### This time with additional `R` code ``` r y1 <- c(2, 3, 2, 5, 6, 8) y2 <- c(-1, -2, 0, 0, 1, -1) *Y <- data.frame(y1 = y1, y2 = y2) rownames (Y) <- paste("obs", 1:6, sep = ".") Y ``` ``` ## y1 y2 ## obs.1 2 -1 ## obs.2 3 -2 ## obs.3 2 0 ## obs.4 5 0 ## obs.5 6 1 ## obs.6 8 -1 ``` --- ### Matrix Addition, Subtraction, Multiplication .pull-left[ #### Matrix addition and subtraction Matrix addition and subtraction are no different from vector addition and subtraction. Matrices have to have *commensurate* dimensions (same `\(n \times p\)`). #### Matrix multiplication Matrix multiplication is nothing more than systematic calculation of vector inner-products and arrangement of these into precise corresponding elements of a new matrix. Like vectors, inner dimensions must match and the product is defined by the outer dimensions. The simplest way to define this is as follows ] .pull-right[ Let `\(\mathbf{X}\)` be an `\(n \times k\)` matrix and let `\(\mathbf{Y}\)` be an `\(n \times p\)` matrix, such that `$$\small \mathbf{X} = \begin{pmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_k \end{pmatrix}$$` and `$$\small \mathbf{Y} = \begin{pmatrix} \mathbf{y}_1 & \mathbf{y}_2 & \cdots & \mathbf{y}_p \end{pmatrix}$$` Note that each `\(\mathbf{x}_i\)` column vector is `\(n \times 1\)` in dimension for `\(k\)` vectors and each `\(\mathbf{y}_i\)` is `\(n \times 1\)` in dimension for `\(p\)` vectors. Thus, ] --- ### Matrix Addition, Subtraction, Multiplication (Cont.) `$$\mathbf{X} = \begin{pmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_k \end{pmatrix}$$` and `$$\mathbf{Y} = \begin{pmatrix} \mathbf{y}_1 & \mathbf{y}_2 & \cdots & \mathbf{y}_p \end{pmatrix}$$` and `$$\mathbf{X}^T\mathbf{Y} = \begin{pmatrix} \mathbf{x}_1^T\mathbf{y}_1 & \mathbf{x}_1^T\mathbf{y}_2 & \cdots & \mathbf{x}_1^T\mathbf{y}_p\\ \mathbf{x}_2^T\mathbf{y}_1 & \mathbf{x}_2^T\mathbf{y}_2 & \cdots & \mathbf{x}_2^T\mathbf{y}_p\\ \vdots & \vdots & & \vdots\\ \mathbf{x}_k^T\mathbf{y}_1 & \mathbf{x}_k^T\mathbf{y}_2 & \cdots & \mathbf{x}_k^T\mathbf{y}_p\\ \end{pmatrix}$$` The matrix product is a matrix with `\(k \times p\)` inner-products. --- ### Matrix Multiplication (Cont.) #### Matrix multiplication example (using R script) ``` r n <- 50 p <- 3 y1 <- rnorm(50) y2 <- rnorm(50) y3 <- rnorm(50) Y <- as.matrix(data.frame(y1 = y1, y2 = y2, y3 = y3)) rownames (Y) <- paste("obs", 1:n, sep = ".") dim(Y) ``` ``` ## [1] 50 3 ``` --- ### Matrix Multiplication (Cont.) #### Example data frame, `\(\mathbf{Y}\)`
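One way to display the example data frame created above (only its first rows shown here; `head` is a base `R` function):

``` r
head(Y)
```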
--- ### Matrix Multiplication (Cont.) #### Matrix multiplication example (using R script) ``` r X <- cbind(Intercept = rep(1, n), group = rep(c(0, 1), n / 2)) rownames(X) <- rownames(Y) ```
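A quick check one might add here (not in the original script) to confirm that the inner dimensions line up before multiplying:

``` r
dim(t(X))   # 2 x 50
dim(Y)      # 50 x 3; the inner dimensions match, so t(X) %*% Y will be a 2 x 3 matrix
```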
--- ### Matrix Multiplication (Cont.) #### Matrix multiplication example (using R script) .med[ ``` r t(X) %*% Y ``` ``` ## y1 y2 y3 ## Intercept 6.295172 0.7029465 0.3656386 ## group -1.744153 -2.7077558 -1.2747868 ``` ``` r crossprod(X, Y) ``` ``` ## y1 y2 y3 ## Intercept 6.295172 0.7029465 0.3656386 ## group -1.744153 -2.7077558 -1.2747868 ``` ] + Notice that the row names are the column names of `\(\mathbf{X}\)` and the column names are the column names of `\(\mathbf{Y}\)`. + Also notice the `crossprod` function is short for *matrix of cross-products*, which we will refer to simply as the **matrix cross-product**. This is a bit of unfortunate nomenclature: it means something different from the *vector cross-product* (a vector produced by a particular multiplication of two vectors), which in turn differs from the *dot-product* (essentially the same as the inner-product). (Try not to let the terminology get to you!) --- ### ~~Matrix Division~~ This is not something we can do. We can invert some matrices, but we will come back to this in a moment. ### Important Matrix and Vector Multiplication Properties - Any matrix or vector multiplied by a scalar multiplies every element by the scalar. The name, "scalar," means that every element is scaled. - Like vectors, which have inner-products (dot products) and outer-products, matrices can have analogous products. The tendency, though, is to call these cross-product matrices. - A vector inner-product of the form, `\(\mathbf{x}^T\mathbf{x}\)` is a generalized method of squaring. It is precisely the sum of squared values, rather than just a squared value. - A matrix **cross-product** of the form, `\(\mathbf{X}^T\mathbf{X}\)` is a generalized method of squaring. This matrix has summed squared values of column vectors of `\(\mathbf{X}\)` on a diagonal vector and summed cross-products between vectors in the off-diagonal elements. - A matrix **outer-product** of the form, `\(\mathbf{X}\mathbf{X}^T\)` can also be considered a generalized method of squaring. This matrix has summed squared values of row vectors of `\(\mathbf{X}\)` on a diagonal vector and summed cross-products between vectors in the off-diagonal elements. - The **trace** of a square matrix is the sum of diagonal elements. One convenience is `\(trace(\mathbf{X}^T\mathbf{X}) = trace(\mathbf{X}\mathbf{X}^T)\)`. --- ### Important Matrix and Vector Multiplication Properties (cont.) - A matrix is **square** when it has equal rows and columns, e.g., `\(n \times n\)`. - If `\(\mathbf{XY}\)` is possible and `\(\mathbf{YX}\)` is possible, the two products will generally not be the same (matrix multiplication is not commutative). The dimensions of the two products are also likely different. - The next slide will describe some special matrices. --- .center[ # Part II.
Special Matrices ## And some properties that make them special ] --- ### Special Matrices | Matrix and description | Example | Comments | | ------ | ---- | ------ | |**Square matrix:** Any matrix with the same number of rows and columns, e.g., `\(n \times n\)` | `\(\small{\mathbf{S} = \begin{pmatrix} 2 & 3 & 1 \\ 4 & 7 & 5\\ 1 & 10 & 0 \\ \end{pmatrix}}\)` | These matrices, themselves, are generally not too compelling, unless there is a relationship between the row and column variables.| |**Symmetric matrix:** A square matrix that has symmetry above and below the diagonal| `\(\small{\mathbf{S} = \begin{pmatrix} 2 & 3 & 1 \\ 3 & 7 & 10\\ 1 & 10 & 0 \\ \end{pmatrix}}\)` | These matrices have the condition that `\(\mathbf{S} = \mathbf{S}^T\)`.| |**Diagonal matrix:** A symmetric matrix that has non-0 elements only along the diagonal; all others are 0.| `\(\small{\mathbf{D} = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 7 & 0\\ 0 & 0 & 0 \\ \end{pmatrix}}\)` | Note that the diagonal elements can still be 0. If all diagonal elements are equal, we can say `\(\mathbf{D} = a\mathbf{I}\)`, for a scalar, `\(a\)`, and an identity matrix, `\(\mathbf{I}\)`.| |**Identity matrix:** A diagonal matrix that has diagonal elements equal to 1.| `\(\small{\mathbf{I} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0\\ 0 & 0 & 1 \\ \end{pmatrix}}\)` | This is the matrix equivalent to a value of 1, such that `\(\mathbf{AI} = \mathbf{A}\)`.| --- ### Special Matrices (cont.) | Matrix and description | Example | Comments | | ------ | ---- | ------ | |**Orthogonal matrix:** A square matrix with the property that `\(\mathbf{AA}^T = \mathbf{A}^T\mathbf{A} = \mathbf{I}\)`.| `\(\small{\mathbf{A} = \begin{pmatrix} 0.7071 & 0.7071 \\ 0.7071 & -0.7071 \end{pmatrix}}\)` | `\(\small{\mathbf{AA}^T} = \begin{pmatrix} 0.7071 & 0.7071 \\ 0.7071 & -0.7071 \end{pmatrix} \begin{pmatrix} 0.7071 & 0.7071 \\ 0.7071 & -0.7071 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\)` | |**Orthonormal matrix:** An orthogonal matrix with unit vectors (length of 1), such that `\(\mathbf{AA}^T = \mathbf{A}^T\mathbf{A} = \mathbf{I}\)`.| `\(\small{\mathbf{A} = \begin{pmatrix} 0.7071 & 0.7071 \\ 0.7071 & -0.7071 \end{pmatrix}}\)` | `\(\small{\lvert\lvert\mathbf{a}_1 \rvert\rvert =(\mathbf{a}_1^T\mathbf{a}_1)^{1/2} = (0.7071^2 +0.7071^2)^{1/2} =1}\)` `\(\small{\lvert\lvert\mathbf{a}_2\rvert\rvert=(\mathbf{a}_2^T\mathbf{a}_2)^{1/2} = (0.7071^2 + (-0.7071)^2)^{1/2} =1}\)`| |**Idempotent matrix:** A symmetric matrix with the property `\(\mathbf{HH} = \mathbf{H}\)` | `\(\small{\mathbf{H} = \begin{pmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{pmatrix}}\)` | `\(\begin{align*} \small{\mathbf{HH}} = \begin{pmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{pmatrix} \begin{pmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{pmatrix} \\ = \small{\begin{pmatrix} 0.5^2 + 0.5^2 & 0.5^2 + 0.5^2 \\ 0.5^2 + 0.5^2 & 0.5^2 + 0.5^2 \end{pmatrix} = \begin{pmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{pmatrix}} \end{align*}\)` ***Idempotency is a property that is important for orthogonal projection, covered next*** --- ### Special Matrices (cont.)
and Special matrix properties | Matrix and description | Example | Comments | | ------ | ---- | ------ | |**Projection matrix:** Any matrix that, when multiplied with data, maps the data to new vectors (a transformation), taking the form `\(\mathbf{PY}\)` or `\(\mathbf{YP}\)`.| `\(\small{\mathbf{P} = \begin{pmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{pmatrix}}\)` | Although projection (transformation) matrices can have many forms, typical ones are: "hat" matrices, `\(\mathbf{H}\)`, which if idempotent perform orthogonal projections, but if merely symmetric perform shears (more on this soon); and eigenvectors ( `\(\mathbf{V}\)` ). |**Positive-definite matrix:** A symmetric matrix, `\(\mathbf{X}\)`, is positive-definite if `\(\mathbf{a}^T \mathbf{Xa}\)` is a positive scalar for every vector, `\(\mathbf{a}\)`, with non-zero values.| `\(\small{\mathbf{X} = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 2 \end{pmatrix}}\)` | This also means all eigenvalues of `\(\mathbf{X}\)` are positive (more later). ***Covariance matrices for shape data are often not positive-definite because (1) generalized Procrustes analysis creates redundancies in the data and (2) there are often more shape variables than observations. We will cover these issues in much detail later*** It should be clear that matrices can have several properties. For example, a *projection matrix* can be symmetric and idempotent. --- ### Special Matrices (cont.) and Special matrix properties #### Some additional notes - Some sources refer to "triangular" matrices. Sometimes this refers loosely to the upper or lower triangle of a symmetric matrix (the elements above the diagonal mirror those below). More formally, a triangular matrix is a square matrix in which all elements below the diagonal (upper triangular) or above the diagonal (lower triangular) are 0. - A **matrix cross-product** involving a single matrix, e.g., `\(\mathbf{X}^T\mathbf{X}\)`, will always produce a symmetric matrix. --- .center[ # Part III. Matrix projection ## This needs further discussion because it is super important! ] --- ### Matrix Projection Matrix multiplication is often used for the purpose of projection of data, `\(\mathbf{Y}\)`. In a most technical and overly mathematical sense, projection is an alignment of data in a manifold of the data space (where Euclidean geometry is at least approximately appropriate). In a simpler sense, projection is a method of rotating, stretching, flipping, and/or scaling a data space. This is the essence of linear models: finding a constrained explanation of the data. Two ways to understand projection: `\(\mathbf{HY}\)`: where `\(\mathbf{H}\)` is an `\(n \times n\)` projection matrix `\(\mathbf{YH}\)`: where `\(\mathbf{H}\)` is a `\(p \times p\)` rotation/shear/scale matrix Both of these will make more sense with exposure to linear model uses, but for now, let's consider what the latter does. --- ## Matrix Projection: Geometric Interpretations To best understand attributes of high-dimensional data spaces and algebra applied to them, consider two-dimensional data spaces and realize the algebra can be generalized to higher dimensions. #### The Result of `\(\mathbf{YH}\)`, where `\(\mathbf{H}\)` is a `\(p \times p\)` matrix `\(\mathbf{H}\)` type | Description | Result/Comment :------- | :----------- | :-------------- `\(\mathbf{I}\)` | Identity matrix | No change `\(c\mathbf{I}\)` | Scaled identity matrix | Enlarge (expand) or reduce (shrink). `\(c\mathbf{I}\)` is not idempotent. `\(\mathbf{D}\)` | Diagonal matrix | Stretching of axes (variables). `\(\mathbf{D}\)` is not idempotent.
`\(\mathbf{H}_{orthogonal}\)` | `\(\mathbf{H}^T\mathbf{H} = \mathbf{I}\)` | Rigid (orthogonal) rotation of `\(\mathbf{Y}\)` (see note). `\(\mathbf{H}\)` is not idempotent (unless `\(\mathbf{H} = \mathbf{I}\)`). `\(\mathbf{H}_{oblique}\)` | `\(\mathbf{H}^T\mathbf{H} \neq \mathbf{I}\)` ( `\(\mathbf{H}\)` may or may not be symmetric) | Shear of `\(\mathbf{Y}\)`. `\(\mathbf{H}_{oblique}\)` is not orthogonal. `\(\mathbf{H}\)` is not idempotent. --- ### Matrix Projection: Geometric Interpretations To best understand attributes of high-dimensional data spaces and algebra applied to them, consider two-dimensional data spaces and realize the algebra can be generalized to higher dimensions. #### The Result of `\(\mathbf{YH}\)`, where `\(\mathbf{H}\)` is a `\(p \times p\)` matrix **Note** that `\(\mathbf{H}_{orthogonal}\)` can be viewed as a rotation matrix, which makes sense in low-dimensional data spaces. For example, in a two-dimensional data space, it can be the same as: `$$\mathbf{H}_{orthogonal} = \begin{pmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{pmatrix}$$` for a rotational angle, `\(\theta\)`, in a plane. --- ### Matrix Projection: Geometric Interpretations (Cont.) <img src="LectureData/02.matrices/projections.png" width="90%" /> --- ### Matrix Projection: Statistics Statistical analysis can largely be thought of as the comparison of alternate projections (transformations) of data, based on linear models. For example, we might compare the projections between two models: `$$\mathbf{H}_{alternative}\mathbf{Y} - \mathbf{H}_{null}\mathbf{Y} = \Delta\mathbf{H}\mathbf{Y}$$` using, perhaps, a statistic like `$$MS= trace \left((\Delta\mathbf{H}\mathbf{Y})^T \Delta\mathbf{H}\mathbf{Y}\right)/ (k_{alternative} - k_{null})$$` Note that `\(trace\)` means the sum of diagonal elements. We will learn more about statistical methods, but nearly every method will be either a comparison of projections or a decomposition of a matrix obtained from a projection. --- .center[ # Part IV. Matrix inversion ## A rather challenging operation that thankfully computers can perform under certain circumstances. ] --- ### Matrix Inversion - Matrices cannot be divided. Much like division of a scalar is the same as multiplication by its inverse, some matrices can be inverted. - A matrix needs to be square and *non-singular*; symmetric, non-singular is optimal. (A positive-definite matrix is always non-singular, but a non-singular matrix is not necessarily positive-definite.) - Matrix inversion fulfills the property that `\(\mathbf{AA}^{-1} = \mathbf{I}\)` - Illustration is simplest with a `\(2 \times 2\)` matrix Let `\(\mathbf{A}\)` be a `\(2 \times 2\)` matrix, `$$\mathbf{A} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$` Then, `$$\mathbf{A}^{-1} = \left| \mathbf{A} \right|^{-1} \begin{pmatrix} d & -b\\ -c & a \end{pmatrix}$$` where `\(\left| \mathbf{A} \right| = ad-bc\)` If `\(|\mathbf{A}| = 0\)`, the matrix is *singular* and cannot be inverted (the matrix analog of dividing by 0). --- ### Matrix Inversion (Cont.) #### Example `$$\mathbf{A} = \begin{pmatrix} 3 & 4 \\ 4 & 6 \end{pmatrix}$$` `$$\left| \mathbf{A} \right| = 18-16 = 2$$` `$$\left| \mathbf{A} \right|^{-1} = 0.5$$` `$$\mathbf{A}^{-1} = \left| \mathbf{A} \right|^{-1} \begin{pmatrix} d & -b\\ -c & a \end{pmatrix} = 0.5 \begin{pmatrix} 6 & -4\\ -4 & 3 \end{pmatrix} = \begin{pmatrix} 3 & -2\\ -2 & 1.5 \end{pmatrix}$$` --- ### Matrix Inversion (Cont.)
#### Example Confirm (using R script) .pull-left[ .med[ ``` r A = matrix(c(3, 4, 4, 6), nrow = 2, ncol = 2) A ``` ``` ## [,1] [,2] ## [1,] 3 4 ## [2,] 4 6 ``` ``` r solve(A) # solve function finds inverse ``` ``` ## [,1] [,2] ## [1,] 3 -2.0 ## [2,] -2 1.5 ``` ]] .pull-right[.med[ ``` r A %*% solve(A) ``` ``` ## [,1] [,2] ## [1,] 1 0 ## [2,] 0 1 ``` ]] --- ### Matrix Inversion (Cont.) - Note that inversion of 3 × 3 matrices or larger is not a trivial exercise .pull-left[ - Let `\(\mathbf{A}\)` be a `\(3 \times 3\)` matrix, `$$\mathbf{A} = \begin{pmatrix} a & b & c\\ d & e & f\\ g & h & i\\ \end{pmatrix}$$` ] .pull-right[ - Let's also assume before proof that `\(\mathbf{A}\)` is invertible. ] ***Algorithm:*** + Find the ***nine*** possible `\(2 \times 2\)` matrices within `\(\mathbf{A}\)`, e.g., .center[ `\(\begin{pmatrix} a & b\\ d & e\\ \end{pmatrix}\)`, `\(\begin{pmatrix} a & c\\ d & f\\ \end{pmatrix}\)`, `\(\begin{pmatrix} b & c\\ e & f\\ \end{pmatrix}\)`, etc. ] --- ### Matrix Inversion (Cont.) .pull-left[ - Let `\(\mathbf{A}\)` be a `\(3 \times 3\)` matrix, `$$\mathbf{A} = \begin{pmatrix} a & b & c\\ d & e & f\\ g & h & i\\ \end{pmatrix}$$` ] .pull-right[ - Let's also assume before proof that `\(\mathbf{A}\)` is invertible. ] ***Algorithm:*** + Find the determinant of each and give them names. Every other determinant is negative. The names correspond to the rows and columns not represented by each of the ***nine*** elements: .center[ `\(A = (ei - fh)\)`, `\(D = -(bi - ch)\)`, `\(G = (bf - ce)\)`, `\(B = -(di - fg)\)`, `\(E = (ai - cg)\)`, `\(H= -(af - cd)\)`, `\(C = (dh - eg)\)`, `\(F = -(ah - bg)\)`, `\(I = (ae - bd)\)` ] --- ### Matrix Inversion (Cont.) .pull-left[ - Let `\(\mathbf{A}\)` be a `\(3 \times 3\)` matrix, `$$\mathbf{A} = \begin{pmatrix} a & b & c\\ d & e & f\\ g & h & i\\ \end{pmatrix}$$` ***Algorithm:*** - Find the determinant of `\(\mathbf{A}\)`: `\(\left| \mathbf{A} \right| = aA + bB + cC\)` ] .pull-right[ - If the determinant is not equal to 0, then `\(\mathbf{A}\)` is invertible. ] - Make a new matrix: `$$\mathbf{A}^{-1} = \frac{1}{\left| \mathbf{A} \right|} \begin{pmatrix} A & B & C\\ D & E & F\\ G & H & I\\ \end{pmatrix}^T = \frac{1}{\left| \mathbf{A} \right|} \begin{pmatrix} A & D & G\\ B & E & H\\ C & F & I\\ \end{pmatrix}$$` --- ### Matrix Inversion (Cont.) For the sake of sanity, we will not consider `\(4 \times 4\)` or larger matrices! Generally speaking, software packages do not ***solve*** matrix inverses in a way that resembles the previous demonstration. (Also, `solve` is a function in `R` for finding a matrix inverse.) Rather, they use a library of numerical routines that find computational solutions that converge on algebraic solutions. For example, `R` relies on the **L**inear **A**lgebra **Pack**age (LAPACK), which relies on **B**asic **L**inear **A**lgebra **S**ub-programs (BLAS), the subroutines that provide building blocks for computational routines. `\(^*\)` **Basically, understanding the concept is sufficient.** One can trust that computers and their programs can be relied upon for efficient matrix inversion, within reason. Matrix inversion is, however, a ***Big Data*** concern, as computer memory is finite. .footnote[ `\(^*\)` LAPACK has roots in `Fortran`, which is used in `R`. `R` also relies on `C` and `C++`, which also use LAPACK plus other linear algebra packages. ] --- .center[ # Part V. Practice using some matrix operations ## There is a point to this -- trust it will work out. ] --- ### Putting Our Current Knowledge to Good Use!
- Recall that a matrix must be square in order to be inverted. - Recall that data frames are `\(n \times p\)` in dimension, and it might be rare that `\(n=p\)`. - Recall that matrix cross-products with a single matrix are essentially the squaring of matrices, and they are symmetric. - **Thus**, matrix cross-products are square, symmetric, and generally invertible (and frequently used as a step before inversion). #### An Example (recalling an earlier example) using R scripts .pull-left[.med[ ``` r y1 <- c(2, 3, 2, 5, 6, 8) y2 <- c(-1, -2, 0, 0, 1, -1) Y <- as.matrix(data.frame(y1 = y1, y2 = y2)) rownames (Y) <- paste("obs", 1:6, sep = ".") Y ``` ``` ## y1 y2 ## obs.1 2 -1 ## obs.2 3 -2 ## obs.3 2 0 ## obs.4 5 0 ## obs.5 6 1 ## obs.6 8 -1 ``` ]] .pull-right[.med[ ``` r x1 <- rep(1, 6) x2 <- c(rep(0, 3), rep(1,3)) X <- as.matrix(data.frame(x1 = x1, x2 = x2)) rownames(X) <- rownames(Y) X ``` ``` ## x1 x2 ## obs.1 1 0 ## obs.2 1 0 ## obs.3 1 0 ## obs.4 1 1 ## obs.5 1 1 ## obs.6 1 1 ``` ]] --- ### Putting Our Current Knowledge to Good Use! (Cont.) #### An Example (recalling an earlier example) using R scripts We cannot invert `\(\mathbf{X}\)`, but let's invert the cross-product of `\(\mathbf{X}\)`. ``` r crossprod(X) ``` ``` ## x1 x2 ## x1 6 3 ## x2 3 3 ``` ``` r solve(crossprod(X)) ``` ``` ## x1 x2 ## x1 0.3333333 -0.3333333 ## x2 -0.3333333 0.6666667 ``` --- ### Putting Our Current Knowledge to Good Use! (Cont.) #### An Example (recalling an earlier example) using R scripts While we are on the topic… Imagine that the second column is a variable, `\(X\)`, taking on the value of either 0 or 1. The columns are “dummy” variables ``` r X ``` ``` ## x1 x2 ## obs.1 1 0 ## obs.2 1 0 ## obs.3 1 0 ## obs.4 1 1 ## obs.5 1 1 ## obs.6 1 1 ``` With a little bit of mind-bending, one might notice that `$$\mathbf{X}^T\mathbf{X} = \begin{pmatrix} n & \sum X\\ \sum X & \sum X^2 \end{pmatrix}$$` --- ### Putting Our Current Knowledge to Good Use! (Cont.) #### An Example (recalling an earlier example) using R scripts Let's reduce the `\(\mathbf{X}\)` matrix, such that `$$\mathbf{X}_0 = \begin{pmatrix} 1\\ 1\\ 1\\ 1\\ 1\\ 1 \end{pmatrix}$$` --- ### Putting Our Current Knowledge to Good Use! (Cont.) #### An Example (recalling an earlier example) using R scripts .pull-left[.med[ And now in R, let's find the cross-product of this, and its inverse ``` r X0 <- X[,1] # just the first column crossprod(X0) ``` ``` ## [,1] ## [1,] 6 ``` ]] .pull-right[.med[ ``` r solve(crossprod(X0)) ``` ``` ## [,1] ## [1,] 0.1666667 ``` ]] Thus, `\(\mathbf{X}_0^T\mathbf{X}_0 = n\)` and `\(\left( \mathbf{X}_0^T\mathbf{X}_0 \right)^{-1} = n^{-1}\)` --- ### Putting Our Current Knowledge to Good Use! (Cont.) #### An Example (recalling an earlier example) using R scripts Now for fun, let's find the cross-product between `\(\mathbf{X}_0\)` and `\(\mathbf{Y}\)` .pull-left[.med[ ``` r crossprod(X0, Y) ``` ``` ## y1 y2 ## [1,] 26 -3 ``` ]] .pull-right[.med[ If we examine `\(\mathbf{Y}\)` again, you might notice something ``` ## y1 y2 ## obs.1 2 -1 ## obs.2 3 -2 ## obs.3 2 0 ## obs.4 5 0 ## obs.5 6 1 ## obs.6 8 -1 ``` ]] `\(\mathbf{X}_0^T\mathbf{Y}\)` produces vector sums, `\(\sum y_i\)` for each variable. --- ### Putting Our Current Knowledge to Good Use! (Cont.) #### An Example (recalling an earlier example) using R scripts One more step!
Let's find the result of two things: `\(\left( \mathbf{X}_0^T\mathbf{X}_0 \right)^{-1}\mathbf{X}_0^T\mathbf{Y}\)` and `\(\mathbf{X}_0 \left( \mathbf{X}_0^T\mathbf{X}_0 \right)^{-1}\mathbf{X}_0^T\mathbf{Y}\)` --- ### Putting Our Current Knowledge to Good Use! (Cont.) #### An Example (recalling an earlier example) using R scripts One more step! Let's find the result of two things: `\(\left( \mathbf{X}_0^T\mathbf{X}_0 \right)^{-1}\mathbf{X}_0^T\mathbf{Y}\)` and `\(\mathbf{X}_0 \left( \mathbf{X}_0^T\mathbf{X}_0 \right)^{-1}\mathbf{X}_0^T\mathbf{Y}\)` .pull-left[.med[ ``` r solve(crossprod(X0)) %*% crossprod(X0, Y) ``` ``` ## y1 y2 ## [1,] 4.333333 -0.5 ``` ``` r X0 %*% solve(crossprod(X0)) %*% crossprod(X0, Y) ``` ``` ## y1 y2 ## [1,] 4.333333 -0.5 ## [2,] 4.333333 -0.5 ## [3,] 4.333333 -0.5 ## [4,] 4.333333 -0.5 ## [5,] 4.333333 -0.5 ## [6,] 4.333333 -0.5 ``` ]] .pull-right[ **The former finds a (row) vector of variable means. The latter creates a matrix of variable means for every observation in the original data matrix, `\(\mathbf{Y}\)`!** Although it might not yet be obvious, we just used the general linear model to find fitted (predicted) values for a model design that has a single mean (centroid). ] --- ### Putting Our Current Knowledge to Good Use! (Cont.) #### An Example (recalling an earlier example) using R scripts If we now substitute `\(\mathbf{X}\)` for `\(\mathbf{X}_0\)`, this is the result .pull-left[ .med[ ``` r solve(crossprod(X)) %*% crossprod(X, Y) ``` ``` ## y1 y2 ## x1 2.333333 -1 ## x2 4.000000 1 ``` ``` r X %*% solve(crossprod(X)) %*% crossprod(X, Y) ``` ``` ## y1 y2 ## obs.1 2.333333 -1 ## obs.2 2.333333 -1 ## obs.3 2.333333 -1 ## obs.4 6.333333 0 ## obs.5 6.333333 0 ## obs.6 6.333333 0 ``` ]] Recall .pull-right[ ``` ## x1 x2 ## obs.1 1 0 ## obs.2 1 0 ## obs.3 1 0 ## obs.4 1 1 ## obs.5 1 1 ## obs.6 1 1 ``` ] One can see that the second variable is an indicator variable with 0 for one group and 1 for the other (two groups each with three observations). This approach allows one to estimate two group means and create a matrix of fitted values that has two means instead of one. --- .center[ # Part VI. The General Linear Model ## Only briefly ] --- ### The General Linear Model `$$\huge \mathbf{Z}=\mathbf{X}\boldsymbol{\beta } +\mathbf{E}$$` | Component | Dimension | Description |------- | :----------- | :----------------------------- | `\(\mathbf{Z}\)` | `\(n \times p\)` | Transformation of a data matrix `\((\mathbf{Y})\)` with `\(n\)` observations for `\(p\)` variables. The transformation ***could be*** matrix centering or standardization, but it is also feasible that `\(\mathbf{Z} = \mathbf{Y}\)`. **.red[Procrustes coordinates are inherently transformed.]** | `\(\mathbf{X}\)` | `\(n \times k\)` | Linear model design matrix with `\(n\)` observations for `\(k\)` parameters | `\(\boldsymbol{\beta}\)` | `\(k \times p\)` | Matrix of coefficients expressing change in values for the `\(k\)` model parameters for each of `\(p\)` variables | `\(\mathbf{E}\)` | `\(n \times p\)` | Matrix of residuals (error) for `\(n\)` observations for `\(p\)` variables Like any area of statistics, the coefficients for a linear model (which has parameters for variable change associated with some hypothetical process) are generally unknown but exist in a population, and are, therefore, estimated from a sample. We can solve this algebraically as the solution that would be true if there were no error in the model (i.e., `\(\mathbf{E}\)` = 0). The goal is to solve for `\(\boldsymbol{\beta}\)`.
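---
### The General Linear Model (Cont.)

In practice, design matrices like `\(\mathbf{X}\)` are rarely typed by hand. The sketch below (base `R`; the factor `gp` and the object name `Xd` are hypothetical, mirroring the two-group example above) shows one common way to build them.

``` r
gp <- factor(rep(c("a", "b"), each = 3))   # hypothetical two-level grouping factor
Xd <- model.matrix(~ gp)                   # n x k design matrix: an intercept plus a 0/1 dummy variable
Xd
```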
--- ### The General Linear Model (Cont.) `$$\small \mathbf{Z}=\mathbf{X}\boldsymbol{\beta}$$` `$$\small \mathbf{X}^T\mathbf{Z} = \mathbf{X}^T\mathbf{X}\boldsymbol{\beta}$$` `$$\small \left( \mathbf{X}^T\mathbf{X} \right)^{-1} \mathbf{X}^T\mathbf{Z} = \left( \mathbf{X}^T\mathbf{X} \right)^{-1} \mathbf{X}^T\mathbf{X}\boldsymbol{\beta}$$` `$$\small \left( \mathbf{X}^T\mathbf{X} \right)^{-1} \mathbf{X}^T\mathbf{Z} = \mathbf{I}\boldsymbol{\beta}$$` `$$\small \hat{\boldsymbol{\beta}}=\left ( \mathbf{X}^{T} \mathbf{X}\right )^{-1}\left ( \mathbf{X}^{T} \mathbf{Z}\right )$$` The `\(\hat{ }\)` reminds us that this is a matrix of estimated values. Notice that this resembles what we did before, .pull-left[.med[ ``` r Z <- Y solve(crossprod(X0)) %*% crossprod(X0, Z) ``` ``` ## y1 y2 ## [1,] 4.333333 -0.5 ``` ]] .pull-right[.med[ ``` r solve(crossprod(X)) %*% crossprod(X, Z) ``` ``` ## y1 y2 ## x1 2.333333 -1 ## x2 4.000000 1 ``` ]] --- ### The General Linear Model (Cont.) ***The General Linear Model will be discussed in great detail in the Shape Statistics I lecture.*** Suffice to say that the general linear model is a set of linear algebra operations, applied to a set of data, `\(\mathbf{Z}\)`, based on a model, `\(\mathbf{X}\)`. --- .center[ # Part VII. Matrix Decompositions ## Also only briefly ] --- ### Matrix Decompositions + Also called *matrix factorization* in some treatments. + There are many (but we will not consider them all) + Operations to decompose a matrix into multiple matrices with unique properties + Decompositions might be used as steps to solve an algebraic problem, or might be the solution to an algebraic problem + **The following are a few important decompositions** --- ### Matrix Decompositions ### Eigen decomposition For a `\(p \times p\)` square matrix, `\(\mathbf{S}\)`, there exists a vector of `\(p\)` values, `\(\mathbf{\lambda}\)`, such that, `$$\mathbf{Sv} =\lambda\mathbf{v}$$` which can also be written (in a more computationally feasible way) as `$$\left( \mathbf{S}-\lambda\mathbf{I} \right) \mathbf{v} = 0$$` + Solving each *possible* eigenvector ( `\(\mathbf{v}\)` ) and eigenvalue ( `\(\lambda\)` ) via a characteristic polynomial is called eigen decomposition or "eigenanalysis". + Systematically find the `\(p\)` solutions of `\(\lambda\)` for `\(\left| \mathbf{S}-\lambda\mathbf{I} \right|\)`. Up to `\(p\)` real numbers exist if `\(\mathbf{S}\)` is symmetric. Some of these numbers might be 0 if `\(\mathbf{S}\)` is not full-rank. Some can be negative if `\(\mathbf{S}\)` is not positive semidefinite. + By finding the values of `\(\lambda\)` that work, the vectors, `\(\mathbf{v}\)`, are estimated such that the solution above holds true. Computers can solve characteristic polynomials easily via numerical approximation algorithms. A popular choice is the QR algorithm, which uses QR decomposition until convergence. Do not try to perform the characteristic polynomial solution or QR algorithm by hand! --- ### Matrix Decompositions (cont.) ### QR Decomposition `$$\mathbf{X}_{n \times k} =\mathbf{Q}_{n \times k} \mathbf{R}_{k \times k}$$` `\(\mathbf{Q}\)` and `\(\mathbf{R}\)` are solved via numerical approximation algorithms. `\(\mathbf{R}\)` is a triangular matrix (only the diagonal and elements above the diagonal can have values other than 0). If `\(\mathbf{R}\)` is not full-rank, some of the upper elements will be 0. The matrix, `\(\mathbf{X}\)`, was used on purpose, as QR decomposition is often used on design matrices in linear models, for calculations of coefficients or `\(SSCP\)` matrices.
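---
### Matrix Decompositions (cont.): QR Decomposition in `R`

A minimal sketch in base `R`, assuming the small design matrix `X` and transformed data `Z` from the earlier example are still in the workspace.

``` r
QR <- qr(X)        # QR decomposition of the design matrix
Q <- qr.Q(QR)      # n x k matrix with orthonormal columns
R <- qr.R(QR)      # k x k upper-triangular matrix

all.equal(Q %*% R, X, check.attributes = FALSE)   # Q R reconstructs X

qr.coef(QR, Z)     # coefficients via QR; numerically the same as solve(crossprod(X)) %*% crossprod(X, Z)
```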
--- ### Singular Value Decomposition of a Rectangular Matrix `$$\mathbf{Y}_{n \times p} =\mathbf{U}_{n \times p'} \mathbf{D}_{p' \times p'} \mathbf{V}_{p' \times p'}^T$$` `\(\mathbf{U}\)` (left singular vectors) and `\(\mathbf{V}\)` (right singular vectors) are solved via numerical approximation algorithms. The number of singular values is `\(p'= \min(n - 1,p)\)`. If `\(\mathbf{Y}\)` is not full-rank, some of the singular values in `\(\mathbf{D}\)` will be 0. The singular values decrease from largest to smallest. --- ### Singular Value Decomposition of a Symmetric Matrix `$$\mathbf{S}_{p \times p} =\mathbf{V}_{p \times p} \mathbf{\Lambda}_{p \times p} \mathbf{V}_{p \times p}^T$$` `\(\mathbf{V}\)` (the matching left and right singular vectors) are solved via numerical approximation algorithms. If `\(\mathbf{S}\)` is symmetric, positive-definite, `\(\mathbf{V}\)` is the matrix of eigenvectors and `\(\mathbf{\Lambda}\)` the matrix of eigenvalues, equal to those found from eigenanalysis using the characteristic polynomial function. This is a "short-cut" method, often employed for principal component analysis when it is certain that `\(\mathbf{S}\)` is symmetric, positive-definite. **It is also a short-cut for eigen decomposition.** If `\(\mathbf{S}\)` is a positive-definite, symmetric matrix, the singular values are the same as eigenvalues, hence the use of `\(\mathbf{\Lambda}\)` as nomenclature, rather than `\(\mathbf{D}\)`. The matching left and right singular vectors, `\(\mathbf{V}\)`, are the same as eigenvectors. --- ### Cholesky Decomposition of a Symmetric Matrix `$$\mathbf{S}_{p \times p} =\mathbf{T}_{p \times p} \mathbf{T}_{p \times p}^T$$` where `\(\mathbf{T}\)` is a triangular matrix. No need for a lot of detail here but just realize that Cholesky decompositions are sometimes used as solutions to find the *square-root of a symmetric matrix*. *Matrix square-roots are more complex than just this. They can also be found with singular value decomposition, and other methods.* --- ### Matrix Decompositions (Cont.) + Matrix decompositions provide mechanisms for many multivariate methods, including ordinations, linear model coefficient calculations, and matrix covariation analyses. Those presented here will show up again later. One can return to this lecture to review. --- ### Summary/Important Points + Vectors are important. They are the points in high-dimensional data spaces. They are the algebraic workhorses. They have length and direction. They are the most important component of high-dimensional data analysis. + Matrices are not really more special than vectors, but they bundle vectors in important ways. + Many multivariate analyses involve inverting matrices, which means obtaining a symmetric matrix to invert. It is imperative to understand how and why these steps are important. + Once comfortable with the process of inverting matrices, one can recognize that EVERY multivariate analysis is a permutation of linear model calculations. Linear model calculations make multivariate analysis easier. + Understanding projection is also important. Recognizing that different methods are just different ways of projecting data will help simplify some complex topics.
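---
### Summary/Important Points (Cont.)

As a closing sketch (assuming the small two-group `Y` and `X` from the earlier example), the matrix operations above reproduce what `lm()` computes.

``` r
B <- solve(crossprod(X)) %*% crossprod(X, Y)   # (X'X)^(-1) X'Y, as derived above
B

fit <- lm(Y ~ X[, "x2"])   # the same model via lm(): an intercept and the group indicator
coef(fit)                  # numerically matches B
```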
--- ### Revisit Goals Understand why this equation is a foundational equation used in geometric morphometrics (GM) `$$\small\mathbf{Z}=[trace[\mathbf{(Y-\overline{Y})(Y-\overline{Y})^T}]]^{-1/2}\mathbf{(Y-\overline{Y})H}$$` Understand why this equation is a foundational equation in multivariate statistics (and burn it into your memory) `$$\hat{\boldsymbol{\beta}}=\left ( \mathbf{\tilde{X}}^{T} \mathbf{\tilde{X}}\right )^{-1}\left ( \mathbf{\tilde{X}}^{T} \mathbf{\tilde{Z}}\right )$$` Understand why this equation is a universal equation for describing the alignment of shape data (or any multivariate data) to an alternative set of data, covariance structure, or model `$$\mathbf{A}^T\mathbf{Z} =\mathbf{UDV}^T$$` ***We will revisit these some more after a couple of other lectures.***
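---
### Revisit Goals (Cont.)

A small numeric sketch of the first equation (the rotation matrix `\(\mathbf{H}\)` is omitted here; `Y` is assumed to be an `\(n \times p\)` matrix of coordinates, as in the earlier examples).

``` r
Yc <- scale(Y, scale = FALSE)   # Y - Ybar: center each column
CS <- sqrt(sum(Yc^2))           # [trace((Y - Ybar)(Y - Ybar)^T)]^(1/2)
Z  <- Yc / CS                   # centered data scaled to unit size (the H rotation is omitted)
sum(Z^2)                        # equals 1, confirming the scaling
```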