class: center, middle, inverse, title-slide .title[ # 13: Missing Data and Shape Subcomponents ] .author[ ### ] --- ### Data Compications in GMM + GPA: Aligns specimens to the average shape (reference, consensus shape) + Removes non-shape information so that shape may be reliably evaluated -- + However, GPA requires landmark correspondence among specimens + Thus, all landmarks are present on all specimens + Specimens with missing data can cause complications -- + GPA also requires that landmarks come from a single, rigid structure + Sometimes there are positional differences among subsets of landmarks + Sometimes we wish to combine anatomical components into a single analysis -- + Here we describe approaches for accounting for these challenges --- ### I: The Problem of Missing Data .pull-left[ + GPA: Aligns specimens to the average shape (reference, consensus shape) + Removes non-shape information + Requires landmark correspondence among specimens, and that all landmarks are present on all specimens ] .pull-right[ <img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-1-1.png" width="90%" style="display: block; margin: auto;" /> ] + What does one do when specimens have missing data? --- ### Missing Data + In some fields (e.g., anthropology), missing data is pervasive + Specimens are incomplete, or lack structures entirely <img src="LectureData/13.missing/MissingAnthro.png" width="70%" style="display: block; margin: auto;" /> --- ### Dealing with Missing Data: Delete Specimens + Simplest solution is to remove specimens with missing data <img src="LectureData/13.missing/FishDelSpec.png" width="90%" style="display: block; margin: auto;" /> -- + Reduces sample size + No information from deleted specimens, which could be important (e.g., a rare species) + NOT an optimal solution! --- ### Dealing with Missing Data: Delete Landmarks + Another option is to delete landmarks that are missing from some specimens <img src="LectureData/13.missing/FishDelLand.png" width="90%" style="display: block; margin: auto;" /> -- + Retains original sample size + No information from deleted landmarks, which could be biologically important + NOT an optimal solution! --- ### Estimate Missing Data `\(^1\)` + An alternative is to estimate missing data in some intelligent manner + Goal is to generate ‘complete’ specimens by estimating missing landmarks + Called data *imputation* in the statistical literature + Use some reasonable procedure to predict missing landmark locations + Several approaches possible .footnote[1: See Gunz et al. (2009). *J. Hum. Evol.*] --- ### 1: Exploiting Symmetry `\(^1\)` .pull-left[ + Use biological symmetry to estimate missing landmarks + Two approaches: + 1: Mirror-image (reflect) portion of structure to the other side + 2: Create `\(\small{2}^{nd}\)` full specimen via reflection (and relabeled) + Locations of missing landmarks estimated from location in reflected portion ] .pull-right[ <img src="LectureData/13.missing/Skull1.png" width="80%" style="display: block; margin: auto;" /> ] .footnote[1: Note: the `reflectMissingLandmarks` function in `StereoMorph` may be used for symmetry-based estimation of missing landmarks] --- ### 1: Exploiting Symmetry: Example .pull-left[ + Take one specimen (in red) and eliminate some landmarks ``` ## ## No curves detected; all points appear to be fixed landmarks. ``` <img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-6-1.png" width="95%" style="display: block; margin: auto;" /> ] .pull-right[ - Let's delete some landmarks then estimate by exploiting symmetry <img src="LectureData/13.missing/LizardMissing.png" width="90%" style="display: block; margin: auto;" /> ] --- #### 1: Exploiting Symmetry: Test Procedure .pull-left[ <img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-8-1.png" width="90%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-9-1.png" width="90%" style="display: block; margin: auto;" /> `\(\small{D}_{Proc}= 0.009\)`. Pretty good! ] --- ### 1: Exploiting Symmetry: Thoughts **Advantages** + Exploits spatial relationships of anatomy within a specimen + Leverages ‘pseudoreplication’ of symmetric points **Disadvantages** + Not all objects are symmetric + Studies of asymmetry are challenged, because by definition only the symmetric portion of shape is used in the reconstruction -- + Symmetry methods can be useful but not are not a general solution (limited to symmetric structures) --- ### 2: Mean Substitution `\(^1\)` + Use landmarks in reference to estimate missing landmarks + 1: Superimpose all complete specimens + 2: Obtain reference (average) + 3: Replace missing landmarks with values from reference .footnote[1: see Arbour and Brown. (2014). *Methods. Ecol. Evol.*] --- ### 2: Mean Substitution: Example + Here we have some (simulated) fish data: ``` ## ## No curves detected; all points appear to be fixed landmarks. ``` <img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-10-1.png" width="45%" style="display: block; margin: auto;" /> --- ### 2: Mean Substitution: Example 1 (Cont.) + Let's take one specimen (in red) and eliminate some landmarks <img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-11-1.png" width="45%" style="display: block; margin: auto;" /> --- ### 2: Mean Substitution: Example 1 (Cont.) + Now let's delete some landmarks then estimate by mean substitution .pull-left[ <img src="LectureData/13.missing/FishMinMissing.png" width="80%" style="display: block; margin: auto;" /> Test Procedure + 1: Obtain specimen, delete landmarks + 2: Estimate landmarks from mean specimen + 3: Calculate `\(\small{D}_{Proc}\)` between original and estimated ] .pull-right[ <img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-13-1.png" width="75%" style="display: block; margin: auto;" /> `\(\small{D}_{Proc_{Ref-Orig}} = 0.26\)` `\(\small{D}_{Proc_{Ref-Est}} = 0.23\)` `\(\small{D}_{Proc_{Orig-Est}} = 0.13\)` **Not good at all!** ] --- ### 2: Mean Substitution: Example 2 + Or let's take a different specimen (in red) and eliminate some landmarks <img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-14-1.png" width="45%" style="display: block; margin: auto;" /> --- ### 2: Mean Substitution: Example 1 (Cont.) .pull-left[ + Now let's delete some landmarks then estimate by mean substitution <img src="LectureData/13.missing/FishMaxMissing.png" width="80%" style="display: block; margin: auto;" /> Test Procedure + 1: Obtain specimen, delete landmarks + 2: Estimate landmarks from mean specimen + 3: Calculate `\(\small{D}_{Proc}\)` between original and estimated ] .pull-right[ <img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-16-1.png" width="75%" style="display: block; margin: auto;" /> `\(\small{D}_{Proc_{Ref-Orig}} = 0.15\)` `\(\small{D}_{Proc_{Ref-Est}} = 0.13\)` `\(\small{D}_{Proc_{Orig-Est}} = 0.08\)` **Not good at all!** ] --- ### 2: Mean Substitution: What's Going On? .pull-left[ <img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-17-1.png" width="80%" style="display: block; margin: auto;" /> + Mean substitution does not account for systematic variation (e.g., allometry) + If shape covaries with some factor (e.g., size!), it will over- or under-estimate landmark locations ] .pull-right[ <img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-18-1.png" width="80%" style="display: block; margin: auto;" /> + **Mean substitution should not be used!** ] --- ### 3: TPS Interpolation `\(^1\)` + Use thin-plate spline to estimate location of missing landmarks <img src="LectureData/04.shape.vars/GorillaHuman.png" width="70%" style="display: block; margin: auto;" /> + Procedure + 1: Identify common landmarks in both reference and target + 2: Calculate TPS interpolation + 3: Missing landmarks in target estimated from their location in reference, filtered through TPS .footnote[1: Bookstein et al. (1999). *New. Anat.*; Gunz et al. (2009). *J. Hum. Evol.*] --- ### 3: TPS Interpolation: Concept `\(^1\)` + Procedure + 1: Identify common landmarks in both reference and target + 2: Calculate TPS interpolation + 3: Missing landmarks in target estimated from their location in reference, filtered through TPS <img src="LectureData/13.missing/TPSMissingConcept.png" width="70%" style="display: block; margin: auto;" /> .footnote[1: Bookstein et al. (1999). *New. Anat.*; Gunz et al. (2009). *J. Hum. Evol.*] --- ### 3: TPS Interpolation: Example 1 .pull-left[ ``` r new.tps<-estimate.missing(shapes.missing,method="TPS") ``` <img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-22-1.png" width="80%" style="display: block; margin: auto;" /> ] .pull-right[ `\(\small{D}_{Proc_{Ref-Orig}} = 0.26\)` `\(\small{D}_{Proc_{Ref-Est}} = 0.27\)` `\(\small{D}_{Proc_{Orig-Est}} = 0.003\)` **MUCH Better!** ] --- ### 3: TPS Interpolation: Example 2 <img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-23-1.png" width="40%" style="display: block; margin: auto;" /> `\(\small{D}_{Proc_{Ref-Orig}} = 0.15\)` `\(\small{D}_{Proc_{Ref-Est}} = 0.15\)` `\(\small{D}_{Proc_{Orig-Est}} = 0.011\)` **MUCH Better!** --- ### 3: TPS Interpolation: Thoughts **Advantages** + Exploits spatial relationships of anatomy within a specimen **Disadvantages** + Less accurate if many landmarks in a region missing (common with fossils) + Does not leverage additional covariation information in sample + TPS interpolation is *very* useful, but may be improved upon --- ### 4: Regression Interpolation `\(^1\)` + Use covariation between landmarks to estimate locations + Procedure + 1: Superimpose all complete specimens + 2:Regress landmarks with missing values against complete specimens + 3: Use post-hoc prediction on regression for missing landmarks + 4: Predicted values serve as missing landmark locations .footnote[1: Note: Regression scores of PLS typically used as `\(\small{p>n}\)` 2: See Gunz et al. (2009). *J. Hum. Evol.*; reviewed in Arbour and Brown (2014). *Methods. Ecol. Evol.*] --- ### 4: Regression Interpolation: Example 1 .pull-left[ ``` r new.reg<-estimate.missing(shapes.missing,method="Reg") ``` <img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-25-1.png" width="80%" style="display: block; margin: auto;" /> ] .pull-right[ `\(\small{D}_{Proc_{Ref-Orig}} = 0.26\)` `\(\small{D}_{Proc_{Ref-Est}} = 0.27\)` `\(\small{D}_{Proc_{Orig-Est}} = 0.030\)` **Pretty good!** ] --- ### 4: Regression Interpolation: Example 2 <img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-26-1.png" width="40%" style="display: block; margin: auto;" /> `\(\small{D}_{Proc_{Ref-Orig}} = 0.15\)` `\(\small{D}_{Proc_{Ref-Est}} = 0.15\)` `\(\small{D}_{Proc_{Orig-Est}} = 0.011\)` **Even Better!** --- ### 4: Regression Interpolation: Thoughts **Advantages** + Exploits spatial relationships of anatomy within a specimen + Leverages covariation between anatomical landmarks + Leverages covariation within a sample **Disadvantages** + May be less accurate when small samples are examined ###### NOTE: Estimation may be further improved by considering within-sample variation (e.g., use specimens within a species) -- + Regression interpolation is *VERY* useful, but is it universally better? --- ### Estimating Missing Landmarks: Method Comparisons + Few systematic comparisons among methods exist, but those that do imply that regression estimation is generally preferred, followed by TPS interpolation <img src="LectureData/13.missing/MethodsCompare.png" width="75%" style="display: block; margin: auto;" /> + Higher error with `\(\small{\uparrow}\)` missing landmarks and `\(\small{\uparrow}\)` specimens containing missing data + **Regression method generally preferred** --- ### Estimating Missing Landmarks: Flow of Computations + GMM Workflow should be augmented to account for missing landmarks + 1: Digitize data & read into geomorph + 2: **Estimate missing landmarks** + 3: GPA + projection + 4: Statistical analyses and visualization --- ### Missing Data: Conclusions + Missing data has been a major challenge to GM analyses + Deleting specimens or landmarks ignores information + Morphometric-based estimation + Exploit symmetry in data + Use TPS interpolation + ‘Classical’ statistical approaches extended to morphometrics + Regression incorporates covariation in sample + **Regression approach appears most robust** --- ### II: Special Considerations: Positional Effects + Sometimes objects positional variation of their subcomponents + Articulations frequently cause this + Thus our data have: shape effects + positional effects + GMM procedures have been developed to account for this `\(^1\)` <img src="LectureData/13.missing/ArticConcept.png" width="80%" style="display: block; margin: auto;" /> .footnote[1. Adams (1999). *Evol. Ecol. Res.*] --- ### Special Considerations: Articulations + For articulated structures, several solutions exist + Fixing the angle in all specimens through a mathematical transformation + Separating the subsets to analyse separately, etc. `\(^1\)` <img src="LectureData/13.missing/ArticMath.png" width="70%" style="display: block; margin: auto;" /> .footnote[1. Adams (1999). *Evol. Ecol. Res.*] --- ### Articulation Standardization: Flow of Computations + 1: Identify articulation point, and points on each subset (or their centroids) + 2: For `\(n = 1 \rightarrow i\)` specimens, center specimens on articulation + 3: For `\(n = 1 \rightarrow i\)` specimens, calculate `\(\theta_{i}\)` + 4: Estimate `\(\bar\theta\)` for sample + 5: For `\(n = 1 \rightarrow i\)` specimens, rotate one subset so `\(\theta_{i} = \bar\theta\)` + 6: Perform GPA on standardized specimens ##### Adams (1999). *Evol. Ecol. Res.* .footnote[Note: Fixed angle method of Adams (1999) generalized for 3D data: Vidal-Garcia et al. (2018). *Ecol. Evol.*] --- ### Articulation Standardization: Example + Standardize some data for relative jaw position ``` r jaw.fixed <- fixed.angle(gpa.rand, art.pt=1, angle.pts.1 = 5, angle.pts.2 = 6, rot.pts = c(2,3,4,5)) gpa.fixed <- gpagen(jaw.fixed, print.progress = FALSE)$coords ``` .pull-left[ <img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-32-1.png" width="70%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-33-1.png" width="70%" style="display: block; margin: auto;" /> ] --- ### III: Special Considerations: Combining Shapes + Sometimes we obtain the shape of two subcomponents (configurations) separately + We wish to combine these for an overall view of shape variation + GMM procedures have been developed for this + A **critical** consideration is the appropriate size-scaling of the configurations <img src="LectureData/13.missing/CombineSubsetConcept.png" width="45%" style="display: block; margin: auto;" /> --- ### Combining Subsets: Flow of Computations + Perform GPA on each subset separately + Scale each subset configuration by: `\(\frac{w_iCS_i}{\sqrt{\sum{w_iCS^2_i}}}\)` + where `\(CS_i\)` is the centroid size of that configuration and `\(w_i\)` is a possible weight + Combine size-scaled subset configurations + Treat as overall set of shape variables for analysis + Note: using equal `\(w_i\)` guarantees that the *relative sizes* of the subset configurations are preserved (Collyer et al. 2020) + ( see Adams 1999 for a related approach for 2 subsets) .footnote[1. Collyer, Davis, Adams. (2020). *Evol. Biol.*] --- ### Combining Subsets: Example <img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-35-1.png" width="35%" style="display: block; margin: auto;" /> .center[Original Data: Heads and Tails of larval salamanders] --- ### Combining Subsets: Example Cont. ``` r comb.lm <- combine.subsets(head = head.gpa, tail = tail.gpa, gpa = TRUE) ``` .pull-left[ <img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-37-1.png" width="70%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-38-1.png" width="70%" style="display: block; margin: auto;" /> ] .center[Note correct relative sizes of subset configurations] --- ### Conclusions: Special Considerations + The Procrustes paradigm provides unparalleled rigor for statistical shape analysis + But GPA requires perfect correspondence of variables among objects + Also assumes that other non-shape variation is held constant + Real biological data have challenges + Objects are sometimes incomplete (missing components) + Objects can have positional variation between parts (articulation variation) + Objects are comprised of multiple subcomponents measured separately + Adjustments to GMM protocol enable the analysis of shape from these objects