Day-04: Phylogenetic Comparative Methods

Content Description

Phylogenetic comparative methods (PCMs) condition the data on the phylogeny during the analysis, to account for the lack of independence among species. This is accomplished using generalized least squares (GLS) methods, where the residual error is not modeled as iid (independent and identically distributed), but instead is found as: $\epsilon\small\sim\mathcal{N}(0,\sigma^2\mathbf\Omega)$, where $\mathbf\Omega$ is the phylogenetic covariance matrix. Geomorph provides several distinct multivariate phylogenetic comparative analyses, which we review here.

Preliminaries

IMPORTANT: performing PCMs requires a strict 1:1 match between the data and the phylogeny. Species names must be included in data matrices, and these must correspond exactly to the tips-labels of the phylogeny. Typically, the initial data files do not have this: there are often more specimens in the data than on the phylogeny, and some species on the phylogeny not in the data. So one must ‘prune’ both datasets to match, so the analytics can link the data with the phylogeny.

Let´s do that here:

library(geomorph)

## Loading required package: RRPP

## Loading required package: rgl

## Loading required package: Matrix

library(geiger)

## Loading required package: ape

plethtree <- read.tree('Data/plethtree.tre')

plethtree <- read.tree('Data/plethtree.tre')
plethland <- readland.tps('Data/PlethodonLand.tps',specID = "ID",
                          warnmsg = FALSE)
gps <- read.csv('Data/PlethGps.csv', header=TRUE, row.names=1)
Y.gpa <- gpagen(plethland, print.progress = FALSE)
M <- mshape(Y.gpa$coords)
svl <- Y.gpa$Csize

shape <- Y.gpa$coords
shape.test <- treedata(phy = plethtree, data = two.d.array(shape), warnings = TRUE)
  #no warnings. Everthing matches in this case

data.matched <- treedata(phy = plethtree, data = gps, warnings=FALSE)
elev <- as.factor(data.matched$data); names(elev) <- row.names(data.matched$data)

gdf <- geomorph.data.frame(shape=shape, svl=svl,elev = elev, plethtree=plethtree)

links <- matrix(c(4,3,2,1,1,6,7,8,9,10,1,1,11,5,5,4,2,3,7,8,9,10,11,9,10,1),
                ncol=2,byrow=FALSE)
plot(ladderize(plethtree),edge.width=3)
axisPhylo(1)

NOTE: treedata will create new objects $phy and $data that contain only the set of species in common to both. It will also provide warnings if any species are not found in both the data and the phylogeny.

plotAllSpecimens(shape, links=links)

1: Phylogenetic Least Squares Models

Now we will run some common linear models in a phylogenetic context.

1A: Phylogenetic Regression

Using these data, we can now perform regression in a phylogenetic framework, using the function procD.pgls:

pgls.reg <- procD.pgls(f1 = shape~svl, phy=plethtree, data=gdf, print.progress = FALSE)
summary(pgls.reg)

## 
## Analysis of Variance, using Residual Randomization
## Permutation procedure: Randomization of null model residuals 
## Number of permutations: 1000 
## Estimation method: Generalized Least-Squares (via OLS projection) 
## Sums of Squares and Cross-products: Type I 
## Effect sizes (Z) based on F distributions
## 
##           Df        SS         MS     Rsq      F      Z Pr(>F)  
## svl        1 0.0006581 0.00065811 0.07046 3.0323 1.9503  0.028 *
## Residuals 40 0.0086814 0.00021704 0.92954                       
## Total     41 0.0093395                                          
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Call: procD.lm(f1 = shape ~ svl, iter = iter, seed = seed, RRPP = TRUE,  
##     SS.type = SS.type, effect.type = effect.type, int.first = int.first,  
##     Cov = Cov, data = data, print.progress = print.progress)

allom.plot <- plot(pgls.reg, type = "regression", predictor = gdf$svl,
                   reg.type ="RegScore", pch=19, cex=1.5, xlab = "SVL") # make sure to have a predictor 
fit.line <- lm(allom.plot$RegScore~gdf$svl)
abline(fit.line,col = "red")

One would then like to visualize shape changes along this regression. IMPORTANT NOTE: be sure to use the $pgls.fitted values for the shape prediction and NOT (OLS) the $fitted values!

preds <- shape.predictor(pgls.reg$GM$pgls.fitted, x= allom.plot$RegScore, Intercept = FALSE, 
                         predmin = min(allom.plot$RegScore), 
                         predmax = max(allom.plot$RegScore)) 
M <- mshape(shape)
par(mfrow = c(1,2))
plotRefToTarget(M, preds$predmin, mag=3, links = links)
mtext("Min")
plotRefToTarget(M, preds$predmax, mag=3, links = links)
mtext("Max")

par(mfrow = c(1,1))

1B: Phylogenetic ANOVA

Phylogenetic ANOVA works in an analogous fashion. Here we have an example with high vs low elevation species:

pgls.aov <- procD.pgls(f1 = shape~elev, phy=plethtree, data=gdf, print.progress = FALSE)
summary(pgls.aov)

## 
## Analysis of Variance, using Residual Randomization
## Permutation procedure: Randomization of null model residuals 
## Number of permutations: 1000 
## Estimation method: Generalized Least-Squares (via OLS projection) 
## Sums of Squares and Cross-products: Type I 
## Effect sizes (Z) based on F distributions
## 
##           Df        SS         MS     Rsq     F      Z Pr(>F)  
## elev       1 0.0004220 0.00042201 0.04519 1.893 1.3479    0.1 .
## Residuals 40 0.0089175 0.00022294 0.95481                      
## Total     41 0.0093395                                         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Call: procD.lm(f1 = shape ~ elev, iter = iter, seed = seed, RRPP = TRUE,  
##     SS.type = SS.type, effect.type = effect.type, int.first = int.first,  
##     Cov = Cov, data = data, print.progress = print.progress)

plot.res <- gm.prcomp(shape,phy=plethtree)
plot(plot.res,phylo = FALSE, pch=21, bg=gdf$elev, cex=2)
legend("topleft", pch=21, pt.bg = unique(gdf$elev), legend = levels(gdf$elev))

… and visualized via TPS deformation grids

1C: Phylogenetic PLS

For two sets of variables, one may perform PLS in a phylogenetic framework:

land.gps<-c("A","A","A","A","A","B","B","B","B","B","B")
PLS.Y <- phylo.integration(A = gdf$shape, partition.gp = land.gps, phy= plethtree, print.progress = FALSE)
summary(PLS.Y)

## 
## Call:
## phylo.integration(A = gdf$shape, phy = plethtree, partition.gp = land.gps,  
##     print.progress = FALSE) 
## 
## 
## 
## r-PLS: 0.843
## 
## Effect Size (Z): 4.7366
## 
## P-value: 0.001
## 
## Based on 1000 random permutations

plot(PLS.Y)

picknplot.shape may be used to visualize shapes in this plot

2: Phylogenetic signal

One may also wish to evaluate the degree to which phenotypic similarity associates with phylogenetic relatedness (i.e., phylogenetic signal). This may be investigated using physignal:

PS.shape <- physignal(A=shape,phy=plethtree,iter=999, print.progress = FALSE)
summary(PS.shape)

## 
## Call:
## physignal(A = shape, phy = plethtree, iter = 999, print.progress = FALSE) 
## 
## 
## 
## Observed Phylogenetic Signal (K): 0.392
## 
## P-value: 0.001
## 
## Based on 1000 random permutations
## 
##  Use physignal.z to estimate effect size.

plot(PS.shape)

Development of multivariate comparisons of phylogenetic signal are ongoing.

3: Phylogenetic Ordination

There are three primary ways of incorporating phylogenetic information into a multivariate ordination: Phylomorphospace, phylogenetic PCA (pPCA), and phylogenetically-aligned components analysis (PACA). All may be obtained in geomorph using gm.prcomp:

Phylomorphospace

plot.pca <- gm.prcomp(shape,phy=plethtree)
plot(plot.pca,phylo = TRUE, pch=21, bg=gdf$elev, cex=2, phylo.par = list(tip.labels = FALSE, node.labels = FALSE) )
legend("topleft", pch=21, pt.bg = unique(gdf$elev), legend = levels(gdf$elev))

Phylogenetic PCA (pPCA)

plot.ppca <- gm.prcomp(shape,phy=plethtree, GLS = TRUE, transform = TRUE)
plot(plot.ppca,phylo = TRUE, pch=21, bg=gdf$elev, cex=2, phylo.par = list(tip.labels = FALSE, node.labels = FALSE) )
legend("topleft", pch=21, pt.bg = unique(gdf$elev), legend = levels(gdf$elev))

Phylogenetically-Aligned Components Analysis (PACA)

plot.paca <- gm.prcomp(shape,phy=plethtree, align.to.phy = TRUE)
plot(plot.paca,phylo = TRUE, pch=21, bg=gdf$elev, cex=2, phylo.par = list(tip.labels = FALSE, node.labels = FALSE) )
legend("topleft", pch=21, pt.bg = unique(gdf$elev), legend = levels(gdf$elev))

Side by Side

par(mfrow=c(1,3))

plot(plot.pca,phylo = TRUE, pch=21, bg=gdf$elev, cex=2, phylo.par = list(tip.labels = FALSE, node.labels = FALSE), main = "Phylomorphospace" )
legend("topleft", pch=21, pt.bg = unique(gdf$elev), legend = levels(gdf$elev))

plot(plot.ppca,phylo = TRUE, pch=21, bg=gdf$elev, cex=2, phylo.par = list(tip.labels = FALSE, node.labels = FALSE), main = "pPCA" )
legend("topleft", pch=21, pt.bg = unique(gdf$elev), legend = levels(gdf$elev))

plot(plot.paca,phylo = TRUE, pch=21, bg=gdf$elev, cex=2, phylo.par = list(tip.labels = FALSE, node.labels = FALSE), main = "PACA" )
legend("topleft", pch=21, pt.bg = unique(gdf$elev), legend = levels(gdf$elev))

par(mfrow=c(1,1))

4: Comparing Evolutionary Models

For multivariate data, comparing evolutionary rate models are robust and well-developed. Comparisons of evolutionary ‘modes’ (e.g., BM vs OU, etc.) requires additional development (see Adams and Collyer 2018; 2019).

4a: Comparing Evolutionary Rates Among Clades

In terms of comparisons of evolutionary models, two are currently appropriate for multivariate data. The first compares net rates of phenotypic evolution among clades, using the function compare.evol.rates:

ER<-compare.evol.rates(A=gdf$shape, phy=plethtree,gp=gdf$elev,iter=999, method = 'permutation',print.progress = FALSE)
summary(ER)

## 
## Call:
## 
## 
## Observed Rate Ratio: 1.0133
## 
## P-value: 0.9645
## 
## Effect Size: -1.5892
## 
## Based on 1000 random permutations
## 
## The rate for group High is 1.00093880316059e-05  
## 
## The rate for group Low is 1.01425751777744e-05

plot(ER)

4b: Comparing Evolutionary Rates Among Traits

One can also compare net rates of phenotypic evolution among sets of multivariate traits. This is accomplished using the function compare.multi.evol.rates:

EMR <- compare.multi.evol.rates(A=gdf$shape, phy=plethtree, gp=c(rep(1,5),rep(2,6)), print.progress = FALSE)
summary(EMR)

## 
## Call:
## 
## 
## Observed Rate Ratio: 1.0826
## 
## P-value: 0.975
## 
## Effect Size: -1.943
## 
## Based on 1000 random permutations
## 
## The rate for group 1 is 9.67170500545255e-06  
## 
## The rate for group 2 is 1.04710160170649e-05

plot(EMR)