13: Missing Data and Shape Subcomponents

class: center, middle, inverse, title-slide

.title[
# 13: Missing Data and Shape Subcomponents
]
.author[
### 
]

---

### Data Complications in GMM

+ GPA: Aligns specimens to the average shape (reference, consensus shape)

+ Removes non-shape information so that shape may be reliably evaluated

+ However, GPA requires landmark correspondence among specimens
  + Thus, all landmarks must be present on all specimens
  + Specimens with missing data can cause complications

+ GPA also requires that landmarks come from a single, rigid structure
  + Sometimes there are positional differences among subsets of landmarks
  + Sometimes we wish to combine anatomical components into a single analysis

+ Here we describe approaches for accounting for these challenges
---

### I: The Problem of Missing Data

.pull-left[
+ GPA: Aligns specimens to the average shape (reference, consensus shape)
  + Removes non-shape information
  + Requires landmark correspondence among specimens, and that all landmarks are present on all specimens
]
.pull-right[
<img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-1-1.png" width="90%" style="display: block; margin: auto;" />
]
+ What does one do when specimens have missing data?
---

### Missing Data

+ In some fields (e.g., anthropology), missing data is pervasive
  + Specimens are incomplete, or lack structures entirely

<img src="LectureData/13.missing/MissingAnthro.png" width="70%" style="display: block; margin: auto;" />
---

### Dealing with Missing Data: Delete Specimens

+ Simplest solution is to remove specimens with missing data

+ Reduces sample size
+ No information from deleted specimens, which could be important (e.g., a rare species)

+ NOT an optimal solution!
---

### Dealing with Missing Data: Delete Landmarks

+ Another option is to delete landmarks that are missing from some specimens

+ Retains original sample size
+ No information from deleted landmarks, which could be biologically important

+ NOT an optimal solution!
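
Both deletion strategies are easy to express on a landmark array, which makes the cost visible. A minimal sketch, assuming a toy `p × k × n` array `A` with `NA` for missing landmarks (the data and object names are illustrative only):

```r
# Toy landmark array: p = 4 landmarks x k = 2 dimensions x n = 5 specimens
set.seed(1)
A <- array(rnorm(4 * 2 * 5), dim = c(4, 2, 5))
A[3, , 2] <- NA   # specimen 2 lacks landmark 3
A[4, , 5] <- NA   # specimen 5 lacks landmark 4

# Option 1: delete specimens with any missing landmark
complete.spec <- apply(A, 3, function(x) !anyNA(x))
A.spec <- A[, , complete.spec, drop = FALSE]

# Option 2: delete landmarks missing from any specimen
complete.lm <- apply(A, 1, function(x) !anyNA(x))
A.lm <- A[complete.lm, , , drop = FALSE]

dim(A.spec)  # 4 landmarks, 2 dimensions, 3 specimens
dim(A.lm)    # 2 landmarks, 2 dimensions, 5 specimens
```

Either way, two missing landmarks cost 40% of the specimens or 50% of the landmarks in this toy case.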
---

### Estimate Missing Data `$^1$`

+ An alternative is to estimate missing data in some intelligent manner
+ Goal is to generate ‘complete’ specimens by estimating missing landmarks
  + Called data *imputation* in the statistical literature
+ Use some reasonable procedure to predict missing landmark locations
+ Several approaches possible

.footnote[1: See Gunz et al. (2009). *J. Hum. Evol.*]
---

### 1: Exploiting Symmetry `$^1$`

.pull-left[
+ Use biological symmetry to estimate missing landmarks
+ Two approaches:
  + 1: Mirror-image (reflect) portion of structure to the other side
  + 2: Create `$\small{2}^{nd}$` full specimen via reflection (and relabeling)
+ Locations of missing landmarks estimated from location in reflected portion
]
.pull-right[
<img src="LectureData/13.missing/Skull1.png" width="80%" style="display: block; margin: auto;" />
]
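
The second approach (reflect, relabel, refit) can be sketched in base R for 2D data. This is an illustration under stated assumptions, not the `StereoMorph` implementation: `rigid.fit` and `estimate.by.symmetry` are hypothetical helpers, `pairs` lists the left/right landmark pairs, and the mirror image is taken across the y-axis.

```r
# Hypothetical helper: rigid least-squares fit (rotation + translation) of X onto Y
rigid.fit <- function(X, Y) {
  cx <- colMeans(X); cy <- colMeans(Y)
  s <- svd(crossprod(sweep(X, 2, cx), sweep(Y, 2, cy)))
  d <- c(rep(1, ncol(X) - 1), sign(det(s$u %*% t(s$v))))  # forbid reflection
  R <- s$u %*% diag(d) %*% t(s$v)
  function(Z) sweep(sweep(Z, 2, cx) %*% R, 2, cy, "+")
}

# Hypothetical helper: fill missing landmarks from the relabeled mirror image
estimate.by.symmetry <- function(spec, pairs) {
  refl <- spec
  refl[, 1] <- -refl[, 1]                                   # mirror image
  refl[c(pairs[, 1], pairs[, 2]), ] <-
    refl[c(pairs[, 2], pairs[, 1]), ]                       # relabel left/right
  ok <- complete.cases(spec) & complete.cases(refl)         # shared landmarks
  to.spec <- rigid.fit(refl[ok, , drop = FALSE], spec[ok, , drop = FALSE])
  miss <- !complete.cases(spec)
  spec[miss, ] <- to.spec(refl)[miss, ]
  spec
}

# Toy symmetric shape: landmarks 1-3 on the midline, (4,5) and (6,7) paired
S0 <- rbind(c(0, 0), c(0, 2), c(0, -1), c(-1, 1), c(1, 1), c(-2, -0.5), c(2, -0.5))
pairs <- rbind(c(4, 5), c(6, 7))
th <- 0.3
R0 <- matrix(c(cos(th), sin(th), -sin(th), cos(th)), 2, 2)
full <- sweep(S0 %*% R0, 2, c(3, -2), "+")   # arbitrary position and orientation
damaged <- full
damaged[4, ] <- NA                           # delete one paired landmark
est <- estimate.by.symmetry(damaged, pairs)
max(abs(est - full))   # essentially zero for a perfectly symmetric shape
```

With real specimens the recovered landmark carries only the symmetric component of shape, so the error is non-zero whenever asymmetry is present.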

.footnote[1: Note: the `reflectMissingLandmarks` function in `StereoMorph` may be used for symmetry-based estimation of missing landmarks]
---

### 1: Exploiting Symmetry: Example

.pull-left[
+ Take one specimen (in red) and eliminate some landmarks

<img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-6-1.png" width="95%" style="display: block; margin: auto;" />
]
.pull-right[
- Let's delete some landmarks then estimate by exploiting symmetry

<img src="LectureData/13.missing/LizardMissing.png" width="90%" style="display: block; margin: auto;" />
]
---

#### 1: Exploiting Symmetry: Test Procedure

.pull-left[
<img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-8-1.png" width="90%" style="display: block; margin: auto;" />

]
.pull-right[
<img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-9-1.png" width="90%" style="display: block; margin: auto;" />

`$\small{D}_{Proc}= 0.009$`. Pretty good!
]
---

### 1: Exploiting Symmetry: Thoughts

**Advantages**
  + Exploits spatial relationships of anatomy within a specimen
  + Leverages ‘pseudoreplication’ of symmetric points

**Disadvantages**
  + Not all objects are symmetric
  + Studies of asymmetry are challenged, because by definition only the symmetric portion of shape is used in the reconstruction

+ Symmetry methods can be useful but are not a general solution (limited to symmetric structures)
---

### 2: Mean Substitution `$^1$`

+ Use landmarks in reference to estimate missing landmarks
  + 1: Superimpose all complete specimens 
  + 2: Obtain reference (average)
  + 3: Replace missing landmarks with values from reference

.footnote[1: See Arbour and Brown (2014). *Methods Ecol. Evol.*]
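
The three steps can be sketched in base R. Assumptions: the complete specimens in `aligned` are already superimposed (step 1 is taken as given), the reference is brought into the incomplete specimen's coordinate system by a least-squares similarity fit on the shared landmarks, and `similarity.fit` and `mean.substitute` are hypothetical helpers rather than library functions.

```r
# Hypothetical helper: least-squares similarity fit (scale, rotation, translation)
similarity.fit <- function(X, Y) {
  cx <- colMeans(X); cy <- colMeans(Y)
  Xc <- sweep(X, 2, cx)
  s <- svd(crossprod(Xc, sweep(Y, 2, cy)))
  d <- c(rep(1, ncol(X) - 1), sign(det(s$u %*% t(s$v))))  # forbid reflection
  R <- s$u %*% diag(d) %*% t(s$v)
  b <- sum(s$d * d) / sum(Xc^2)                           # optimal scale
  function(Z) sweep(b * sweep(Z, 2, cx) %*% R, 2, cy, "+")
}

# Hypothetical helper: replace missing landmarks with reference values
mean.substitute <- function(aligned, target) {
  ref <- apply(aligned, c(1, 2), mean)          # step 2: reference (average)
  ok <- complete.cases(target)
  to.target <- similarity.fit(ref[ok, , drop = FALSE], target[ok, , drop = FALSE])
  target[!ok, ] <- to.target(ref)[!ok, ]        # step 3: substitute
  target
}

# Toy data: six near-identical complete specimens, one incomplete specimen
set.seed(2)
base <- cbind(c(0, 1, 2, 1, 0.5), c(0, 0.2, 1, 2, 1))
aligned <- array(rep(base, 6) + rnorm(60, sd = 0.01), dim = c(5, 2, 6))
target <- 1.7 * base
target[3, ] <- NA
est <- mean.substitute(aligned, target)
```

By construction the estimate reproduces the reference at the missing landmarks, so any real deviation of the specimen from the average at those landmarks is lost.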
---

### 2: Mean Substitution: Example

+ Here we have some (simulated) fish data:

<img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-10-1.png" width="45%" style="display: block; margin: auto;" />
---

### 2: Mean Substitution: Example 1 (Cont.)

+ Let's take one specimen (in red) and eliminate some landmarks

<img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-11-1.png" width="45%" style="display: block; margin: auto;" />
---

### 2: Mean Substitution: Example 1 (Cont.)

+ Now let's delete some landmarks then estimate by mean substitution

.pull-left[
<img src="LectureData/13.missing/FishMinMissing.png" width="80%" style="display: block; margin: auto;" />

Test Procedure

+ 1: Obtain specimen, delete landmarks
+ 2: Estimate landmarks from mean specimen
+ 3: Calculate `$\small{D}_{Proc}$` between original and estimated
]

.pull-right[
<img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-13-1.png" width="75%" style="display: block; margin: auto;" />

`$\small{D}_{Proc_{Ref-Orig}} = 0.26$`  `$\small{D}_{Proc_{Ref-Est}} = 0.23$`

`$\small{D}_{Proc_{Orig-Est}} = 0.13$`   **Not good at all!**
]
---

### 2: Mean Substitution: Example 2

+ Or let's take a different specimen (in red) and eliminate some landmarks

<img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-14-1.png" width="45%" style="display: block; margin: auto;" />
---

### 2: Mean Substitution: Example 2 (Cont.)

.pull-left[
+ Now let's delete some landmarks then estimate by mean substitution

Test Procedure

+ 1: Obtain specimen, delete landmarks
+ 2: Estimate landmarks from mean specimen
+ 3: Calculate `$\small{D}_{Proc}$` between original and estimated

]

.pull-right[

`$\small{D}_{Proc_{Ref-Orig}} = 0.15$`  `$\small{D}_{Proc_{Ref-Est}} = 0.13$`

`$\small{D}_{Proc_{Orig-Est}} = 0.08$`   **Not good at all!**
]
---

### 2: Mean Substitution: What's Going On?

.pull-left[

+ Mean substitution does not account for systematic variation (e.g., allometry)
+ If shape covaries with some factor (e.g., size!), it will over- or under-estimate landmark locations

]

.pull-right[
<img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-18-1.png" width="80%" style="display: block; margin: auto;" />

+ **Mean substitution should not be used!**
]
---

### 3: TPS Interpolation `$^1$`

+ Use thin-plate spline to estimate location of missing landmarks

+ Procedure
  + 1: Identify common landmarks in both reference and target
  + 2: Calculate TPS interpolation
  + 3: Missing landmarks in target estimated from their location in reference, filtered through TPS

.footnote[1: Bookstein et al. (1999). *New. Anat.*; Gunz et al. (2009). *J. Hum. Evol.*]
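
The procedure can be sketched for 2D data in base R. This is an illustrative implementation under stated assumptions, not the code behind `estimate.missing`: `tps.map` and `estimate.tps` are hypothetical helpers, the kernel is `$U(r) = r^2 \log r^2$`, and `ref` stands in for the reference (e.g., the consensus of the complete specimens).

```r
# Hypothetical helper: fit a 2D thin-plate spline src -> dst, then map pts
tps.map <- function(src, dst, pts) {
  U <- function(r2) ifelse(r2 == 0, 0, r2 * log(r2))          # TPS kernel
  sqd <- function(A, B)                                       # squared distances
    pmax(outer(rowSums(A^2), rowSums(B^2), "+") - 2 * A %*% t(B), 0)
  P <- cbind(1, src)
  L <- rbind(cbind(U(sqd(src, src)), P), cbind(t(P), matrix(0, 3, 3)))
  coef <- solve(L, rbind(dst, matrix(0, 3, 2)))               # bending + affine terms
  cbind(U(sqd(pts, src)), 1, pts) %*% coef
}

# Hypothetical helper: estimate the missing landmarks of one target specimen
estimate.tps <- function(ref, target) {
  ok <- complete.cases(target)                  # step 1: common landmarks
  target[!ok, ] <- tps.map(ref[ok, , drop = FALSE],      # step 2: fit the spline
                           target[ok, , drop = FALSE],
                           ref[!ok, , drop = FALSE])     # step 3: map missing points
  target
}

# Toy check: the target is an affine deformation of the reference
ref <- rbind(c(0, 0), c(2, 0), c(3, 1), c(2, 2), c(0, 2), c(1, 1))
M <- matrix(c(1.2, 0.1, -0.3, 0.9), 2, 2)
full <- sweep(ref %*% M, 2, c(1, 1), "+")
target <- full
target[6, ] <- NA
est <- estimate.tps(ref, target)
max(abs(est - full))   # near zero: a purely affine change is recovered
```

At least three non-collinear shared landmarks are needed for the spline to be defined in 2D. Local, non-affine deformation near the missing landmark is recovered only as far as the surrounding landmarks capture it.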
---

### 3: TPS Interpolation: Concept `$^1$`

.footnote[1: Bookstein et al. (1999). *New. Anat.*; Gunz et al. (2009). *J. Hum. Evol.*]
---

### 3: TPS Interpolation: Example 1

.pull-left[

``` r
# Estimate missing landmarks (coded as NA) by thin-plate spline interpolation
new.tps <- estimate.missing(shapes.missing, method = "TPS")
```

<img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-22-1.png" width="80%" style="display: block; margin: auto;" />
]
.pull-right[
`$\small{D}_{Proc_{Ref-Orig}} = 0.26$`  `$\small{D}_{Proc_{Ref-Est}} = 0.27$`

`$\small{D}_{Proc_{Orig-Est}} = 0.003$`   **MUCH Better!**
]
---

### 3: TPS Interpolation: Example 2

`$\small{D}_{Proc_{Ref-Orig}} = 0.15$`  `$\small{D}_{Proc_{Ref-Est}} = 0.15$`

`$\small{D}_{Proc_{Orig-Est}} = 0.011$`   **MUCH Better!**
---

### 3: TPS Interpolation: Thoughts

**Advantages**
  + Exploits spatial relationships of anatomy within a specimen

**Disadvantages**
  + Less accurate if many landmarks in a region missing (common with fossils)
  + Does not leverage additional covariation information in sample

+ TPS interpolation is *very* useful, but may be improved upon
---

### 4: Regression Interpolation `$^1$`

+ Use covariation between landmarks to estimate locations

+ Procedure
  + 1: Superimpose all complete specimens 
  + 2: Regress landmarks with missing values against complete specimens
  + 3: Use post-hoc prediction on regression for missing landmarks
  + 4: Predicted values serve as missing landmark locations

.footnote[1: Note: PLS regression scores are typically used because `$\small{p>n}$` (more variables than specimens)

2: See Gunz et al. (2009). *J. Hum. Evol.*; reviewed in Arbour and Brown (2014). *Methods Ecol. Evol.*]
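
A rough base-R sketch of the idea, using scores on the first pair of PLS axes so that the regression remains estimable when `$p>n$`. Assumptions: all specimens are already in a common coordinate system, only the first PLS dimension is used, and `pls.reg.estimate` is a hypothetical helper whose details may differ from those of `estimate.missing`.

```r
# Hypothetical helper: predict the missing landmarks of one target specimen
# complete: p x k x n array of complete specimens; target: p x k matrix with NA rows
pls.reg.estimate <- function(complete, target) {
  miss <- !complete.cases(target)
  # Rows = specimens, columns = coordinates of the observed (X) and missing (Y) blocks
  X <- t(apply(complete[!miss, , , drop = FALSE], 3, as.vector))
  Y <- t(apply(complete[miss, , , drop = FALSE], 3, as.vector))
  xbar <- colMeans(X); ybar <- colMeans(Y)
  Xc <- sweep(X, 2, xbar); Yc <- sweep(Y, 2, ybar)
  s <- svd(crossprod(Xc, Yc), nu = 1, nv = 1)   # first pair of PLS axes
  xs <- Xc %*% s$u; ys <- Yc %*% s$v            # PLS scores
  b <- sum(xs * ys) / sum(xs^2)                 # regression of Y scores on X scores
  x.new <- as.vector(target[!miss, , drop = FALSE])
  score <- sum((x.new - xbar) * s$u)            # X score of the target
  y.hat <- ybar + b * score * as.vector(s$v)    # predicted missing coordinates
  target[miss, ] <- matrix(y.hat, ncol = ncol(target))
  target
}

# Toy data: shape varies along a single direction (e.g., a simple allometric trend)
set.seed(3)
base <- matrix(rnorm(12), 6, 2)
dir <- matrix(rnorm(12), 6, 2)
tt <- rnorm(8)
complete <- array(NA, dim = c(6, 2, 8))
for (i in 1:8) complete[, , i] <- base + tt[i] * dir
full <- base + 0.7 * dir
target <- full
target[c(2, 5), ] <- NA
est <- pls.reg.estimate(complete, target)
max(abs(est - full))   # near zero: the trend in the sample predicts the missing points
```

Unlike mean substitution, the prediction moves with the target's position along the covariation trend in the sample.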
---

### 4: Regression Interpolation: Example 1

.pull-left[

``` r
# Estimate missing landmarks (coded as NA) by regression
new.reg <- estimate.missing(shapes.missing, method = "Reg")
```

<img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-25-1.png" width="80%" style="display: block; margin: auto;" />
]
.pull-right[
`$\small{D}_{Proc_{Ref-Orig}} = 0.26$`  `$\small{D}_{Proc_{Ref-Est}} = 0.27$`

`$\small{D}_{Proc_{Orig-Est}} = 0.030$`   **Pretty good!**
]
---

### 4: Regression Interpolation: Example 2

`$\small{D}_{Proc_{Ref-Orig}} = 0.15$`  `$\small{D}_{Proc_{Ref-Est}} = 0.15$`

`$\small{D}_{Proc_{Orig-Est}} = 0.011$`   **Even Better!**
---

### 4: Regression Interpolation: Thoughts

**Advantages**
  + Exploits spatial relationships of anatomy within a specimen
  + Leverages covariation between anatomical landmarks
  + Leverages covariation within a sample

**Disadvantages**
  + May be less accurate when small samples are examined

###### NOTE: Estimation may be further improved by considering within-sample variation (e.g., use specimens within a species)

+ Regression interpolation is *VERY* useful, but is it universally better? 
---

### Estimating Missing Landmarks: Method Comparisons

+ Few systematic comparisons among methods exist, but those that do imply that regression estimation is generally preferred, followed by TPS interpolation

+ Higher error with `$\small{\uparrow}$` missing landmarks and `$\small{\uparrow}$`  specimens containing missing data

+ **Regression method generally preferred**
---

### Estimating Missing Landmarks: Flow of Computations

+ The GMM workflow should be augmented to account for missing landmarks
  + 1: Digitize data & read into geomorph
  + 2: **Estimate missing landmarks** 
  + 3: GPA + projection
  + 4: Statistical analyses and visualization
---
    
### Missing Data: Conclusions

+ Missing data has been a major challenge to GM analyses
+ Deleting specimens or landmarks ignores information

+ Morphometric-based estimation
  + Exploit symmetry in data
  + Use TPS interpolation

+ ‘Classical’ statistical approaches extended to morphometrics
  + Regression incorporates covariation in sample

+ **Regression approach appears most robust**

---

### II: Special Considerations: Positional Effects

+ Sometimes objects exhibit positional variation among their subcomponents
  + Articulations frequently cause this 
+ Thus our data have: shape effects + positional effects 
+ GMM procedures have been developed to account for this `$^1$`

.footnote[1. Adams (1999). *Evol. Ecol. Res.*]
---

### Special Considerations: Articulations

+ For articulated structures, several solutions exist
  + Fixing the angle in all specimens through a mathematical transformation
  + Separating the subsets to analyse separately, etc. `$^1$`
    
<img src="LectureData/13.missing/ArticMath.png" width="70%" style="display: block; margin: auto;" />

.footnote[1. Adams (1999). *Evol. Ecol. Res.*]

---

### Articulation Standardization: Flow of Computations

+ 1: Identify articulation point, and points on each subset (or their centroids)
+ 2: For `$i = 1 \rightarrow n$` specimens, center specimens on articulation
+ 3: For `$i = 1 \rightarrow n$` specimens, calculate `$\theta_{i}$`
+ 4: Estimate `$\bar\theta$` for sample
+ 5: For `$i = 1 \rightarrow n$` specimens, rotate one subset so `$\theta_{i} = \bar\theta$`
+ 6: Perform GPA on standardized specimens

##### Adams (1999). *Evol. Ecol. Res.*
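
Steps 2 through 5 can be sketched for 2D data in base R. This is an illustration, not the `geomorph` implementation: `standardize.angle` is a hypothetical helper, the angle is measured from `pt.1` to `pt.2` about the articulation, and angles are assumed to stay clear of the 0/2π wrap-around so that an arithmetic mean is meaningful.

```r
# Hypothetical helper: rotate one subset so every specimen shares the mean angle
standardize.angle <- function(A, art.pt, pt.1, pt.2, rot.pts) {
  n <- dim(A)[3]
  ang <- function(v) atan2(v[2], v[1])
  theta <- numeric(n)
  for (i in 1:n) {
    A[, , i] <- sweep(A[, , i], 2, A[art.pt, , i])    # step 2: center on articulation
    theta[i] <- (ang(A[pt.2, , i]) - ang(A[pt.1, , i])) %% (2 * pi)  # step 3
  }
  theta.bar <- mean(theta)                            # step 4
  for (i in 1:n) {                                    # step 5
    d <- theta.bar - theta[i]
    R <- matrix(c(cos(d), sin(d), -sin(d), cos(d)), 2, 2)
    A[rot.pts, , i] <- A[rot.pts, , i] %*% t(R)
  }
  A
}

# Toy data: landmark 1 = articulation, 2-3 on one arm, 4-5 on the other
set.seed(4)
A <- array(0, dim = c(5, 2, 6))
for (i in 1:6) {
  open <- runif(1, 0.4, 1.2)                          # opening angle in radians
  A[, , i] <- rbind(c(0, 0), c(2, 0), c(1, 0.2),
                    2 * c(cos(open), sin(open)),
                    c(cos(open + 0.1), sin(open + 0.1)))
}
A.fixed <- standardize.angle(A, art.pt = 1, pt.1 = 2, pt.2 = 4, rot.pts = c(4, 5))
```

The rotation is rigid, so shape within each subset is untouched; only the relative position of the two subsets changes before GPA (step 6).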

.footnote[Note: Fixed angle method of Adams (1999) generalized for 3D data: Vidal-Garcia et al. (2018). *Ecol. Evol.*]
---

### Articulation Standardization: Example

+ Standardize some data for relative jaw position

``` r
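# Rotate landmarks 2-5 about articulation point 1 to a common angle
jaw.fixed <- fixed.angle(gpa.rand, art.pt = 1, angle.pts.1 = 5,
                         angle.pts.2 = 6, rot.pts = c(2, 3, 4, 5))

# Superimpose the angle-standardized specimens
gpa.fixed <- gpagen(jaw.fixed, print.progress = FALSE)$coords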
```

.pull-left[
<img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-32-1.png" width="70%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-33-1.png" width="70%" style="display: block; margin: auto;" />
]

---

### III: Special Considerations: Combining Shapes

+ Sometimes we obtain the shape of two subcomponents (configurations) separately 
+ We wish to combine these for an overall view of shape variation 
+ GMM procedures have been developed for this
  + A **critical** consideration is the appropriate size-scaling of the configurations

<img src="LectureData/13.missing/CombineSubsetConcept.png" width="45%" style="display: block; margin: auto;" />
---

### Combining Subsets: Flow of Computations

+ Perform GPA on each subset separately

+ Scale each subset configuration by: `$\frac{w_iCS_i}{\sqrt{\sum{w_iCS^2_i}}}$`
  + where `$CS_i$` is the centroid size of that configuration and `$w_i$` is a possible weight

+ Combine size-scaled subset configurations

+ Treat as overall set of shape variables for analysis

+ Note: using equal `$w_i$` guarantees that the *relative sizes* of the subset configurations are preserved (Collyer et al. 2020)
  + (see Adams 1999 for a related approach for 2 subsets)
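
The scaling step can be illustrated for a single specimen in base R. Assumptions: each subset is simply centered and scaled to unit centroid size (the rotation part of GPA is skipped), the scaling factor is the expression given above, and `centroid.size` and `combine.scaled` are hypothetical helpers rather than the internals of `combine.subsets`.

```r
# Hypothetical helper: centroid size of one configuration (rows = landmarks)
centroid.size <- function(X) sqrt(sum(sweep(X, 2, colMeans(X))^2))

# Hypothetical helper: size-scale and stack the subsets of one specimen
combine.scaled <- function(sets, w = rep(1, length(sets))) {
  CS <- sapply(sets, centroid.size)
  rel <- w * CS / sqrt(sum(w * CS^2))       # scaling factor for each subset
  scaled <- Map(function(X, r) {
    Xc <- sweep(X, 2, colMeans(X))          # center on the subset's own centroid
    r * Xc / centroid.size(X)               # unit size, then relative size
  }, sets, rel)
  do.call(rbind, scaled)
}

# Toy specimen: a 4-landmark "head" and a larger 3-landmark "tail"
head.lm <- cbind(c(0, 2, 2, 0), c(0, 0, 1, 1))
tail.lm <- cbind(c(0, 6, 3), c(0, 0, 1))
comb <- combine.scaled(list(head = head.lm, tail = tail.lm))

centroid.size(comb)                                    # 1 with equal unit weights
centroid.size(comb[1:4, ]) / centroid.size(comb[5:7, ])  # equals the original CS ratio
```

With equal unit weights the combined configuration has unit centroid size and the head-to-tail size ratio matches that of the raw configurations.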
  
.footnote[1. Collyer, Davis, Adams. (2020). *Evol. Biol.*]
---

### Combining Subsets: Example

.center[Original Data: Heads and Tails of larval salamanders]

---
### Combining Subsets: Example Cont.

``` r
comb.lm <- combine.subsets(head = head.gpa, tail = tail.gpa, gpa = TRUE)
```

.pull-left[
<img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-37-1.png" width="70%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="13-MissingDataShapeSubComp_files/figure-html/unnamed-chunk-38-1.png" width="70%" style="display: block; margin: auto;" />
]

.center[Note correct relative sizes of subset configurations]
---

### Conclusions: Special Considerations

+ The Procrustes paradigm provides unparalleled rigor for statistical shape analysis
  + But GPA requires perfect correspondence of variables among objects
  + Also assumes that other non-shape variation is held constant
  
+ Real biological data have challenges
  + Objects are sometimes incomplete (missing components)
  + Objects can have positional variation between parts (articulation variation)
  + Objects are composed of multiple subcomponents measured separately

+ Adjustments to GMM protocol enable the analysis of shape from these objects