RRphylo: overfitRR – R documentation

overfitRR

Testing RRphylo methods overfit

Description

Testing the robustness of search.trend (Castiglione et al. 2019a), search.shift (Castiglione et al. 2018), and search.conv (Castiglione et al. 2019b) results to sampling effects and phylogenetic uncertainty.

Usage

overfitRR(RR,y,s=0.25,swap.args=NULL,trend.args=NULL,shift.args=NULL,conv.args=NULL,
aces=NULL,x1=NULL,aces.x1=NULL,cov=NULL,rootV=NULL,nsim=100,clus=.5)

Arguments

`RR`	an object produced by `RRphylo`.
`y`	a named vector of phenotypes.
`s`	the proportion of tips to be cut off. It is set at 0.25 (25%) by default.
`swap.args`	a list of arguments to be passed to the function `swapONE`, including `list(si=NULL,si2=NULL,node=NULL)`. If `swap.args` is unspecified, the function automatically sets both `si` and `si2` to 0.1.
`trend.args`	a list of arguments specific to the function `search.trend`, including `list(node=NULL,x1.residuals=FALSE)`. If a trend for the whole tree is to be tested, type `trend.args = list()`. No trend is tested if left unspecified.
`shift.args`	a list of arguments specific to the function `search.shift`, including `list(node=NULL,state=NULL)`. Arguments `node` and `state` can be specified at the same time.
`conv.args`	a list of arguments specific to the function `search.conv`, including `list(node=NULL,state=NULL,declust=FALSE)`. Arguments `node` and `state` can be specified at the same time.
`aces`	if used to produce the `RR` object, the named vector of ancestral character values known in advance at nodes must be specified. Names correspond to the nodes in the tree.
`x1`	the additional predictor to be specified if the RR object has been created using an additional predictor (i.e. the multiple version of `RRphylo`). The `x1` vector must be as long as the number of nodes plus the number of tips of the tree, which can be obtained by running `RRphylo` on the predictor as well, and taking the vector of ancestral states and tip values to form the `x1`.
`aces.x1`	a named vector of ancestral character values at nodes for `x1`. It must be indicated if the RR object has been created using both `aces` and `x1`. Names correspond to the nodes in the tree.
`cov`	if used to produce the `RR` object, the covariate must be specified. As in `RRphylo`, the covariate vector must be as long as the number of nodes plus the number of tips of the tree, which can be obtained by running `RRphylo` on the covariate as well, and taking the vector of ancestral states and tip values to form the covariate.
`rootV`	if used to produce the `RR` object, the phenotypic value at the tree root must be specified.
`nsim`	number of simulations to be performed. It is set at 100 by default.
`clus`	the proportion of clusters to be used in parallel computing. To run the single-threaded version of `overfitRR` set `clus = 0`.
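
Since `clus` is a proportion rather than a core count, it can help to see how the two relate. The sketch below is only an illustration: `workers_from_clus` and `clus_for_workers` are hypothetical helpers, not RRphylo functions, and the assumption that the proportion is multiplied by the number of detected cores and rounded is ours, not something this page states.

```r
# Hypothetical helper (not part of RRphylo): turn a proportion of cores into
# a worker count, ASSUMING simple multiplication followed by rounding.
workers_from_clus <- function(clus, n_cores = parallel::detectCores()) {
  if (is.na(n_cores)) n_cores <- 1L   # detectCores() can return NA
  if (clus == 0) return(1L)           # clus = 0 requests the single-threaded version
  max(1L, as.integer(round(clus * n_cores)))
}

# Hypothetical inverse: the proportion that corresponds to `n_workers` cores,
# in the spirit of `cc <- 2/parallel::detectCores()`.
clus_for_workers <- function(n_workers, n_cores = parallel::detectCores()) {
  n_workers / n_cores
}

cc <- clus_for_workers(2, n_cores = 8)   # 0.25
workers_from_clus(cc, n_cores = 8)       # 2
```

Passing `n_cores` explicitly keeps the illustration reproducible across machines; in practice the default `parallel::detectCores()` would be used.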

Details

Methods using a large number of parameters risk being overfit. This usually translates into poor fit with data and trees other than those originally used. With RRphylo methods this risk is usually very low. However, the user can assess how robust the results obtained with search.shift, search.trend or search.conv are by running overfitRR. The function subsamples the original tree and data according to the s parameter, that is, the proportion of tips to be removed from the tree. In some cases, though, removing as many tips as imposed by s would delete too many tips within the clades and/or states under testing. In these cases, the function keeps at least 5 species in each clade/state under testing (or all species if there are fewer), reducing the sampling parameter s if necessary. Internally, overfitRR further shuffles the tree by using the function swapONE. In this way, both the potential for overfit and phylogenetic uncertainty are accounted for at once.
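
The subsampling rule described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: `pick_tips_to_drop` and its `min_keep` argument are hypothetical names, tips are drawn at random without reference to the tree, and every state (including the background one) is protected by the same floor.

```r
# Sketch only: choose tips to remove while keeping a minimum number of
# species in every state. `pick_tips_to_drop` is a hypothetical helper,
# not an RRphylo function.
pick_tips_to_drop <- function(state, s = 0.25, min_keep = 5) {
  stopifnot(!is.null(names(state)), s >= 0, s < 1)
  tips <- names(state)
  # Draw the tips that s would remove if there were no constraint
  candidates <- sample(tips, size = round(s * length(tips)))
  # Current number of species per state (named integer vector)
  remaining <- lengths(split(tips, state))
  # A state may shrink to min_keep species, or stays whole if already smaller
  floor_n <- pmin(remaining, min_keep)
  dropped <- character(0)
  for (tip in candidates) {
    st <- as.character(state[[tip]])
    # Accept the removal only if the state stays at or above its floor
    if (remaining[[st]] > floor_n[[st]]) {
      remaining[[st]] <- remaining[[st]] - 1L
      dropped <- c(dropped, tip)
    }
  }
  dropped
}

set.seed(1)
# Made-up data: 40 tips, two small states and a large background state
state <- setNames(rep(c("a", "b", "nostate"), times = c(6, 3, 31)),
                  paste0("t", 1:40))
dropped <- pick_tips_to_drop(state, s = 0.5)
keep <- !names(state) %in% dropped
lengths(split(names(state)[keep], state[keep]))  # species kept per state
length(dropped) / length(state)                  # realised sampling proportion
```

Because removals that would breach the floor are skipped, the realised proportion can fall below `s`, which is the kind of shortfall that `$mean.sampling` reports.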

Value

The function returns a 'list' containing:

$mean.sampling the mean proportion of species actually removed from the tree over the iterations.

$rootCI the 95% confidence interval around the root value.

$ace.regressions the results of linear regression between ancestral state estimates before and after the subsampling.

$conv.results a list including results for search.conv performed under clade and state conditions. If a node pair is specified within conv.args, the $clade object contains the percentage of simulations producing significant p-values for convergence between the clades. If a state vector is supplied within conv.args, the object $state contains the percentage of simulations producing significant p-values for convergence within a state (single state) or between states (multiple states).

$shift.results a list including results for search.shift performed under clade and sparse conditions. If one or more nodes are specified within shift.args, the $clade object contains for each node the percentage of simulations producing significant p-values separated by shift sign, and the same figures by considering all the specified nodes as evolving under a single rate (all.clades). If a state vector is supplied within shift.args, the object $sparse contains the percentage of simulations producing significant p-values separated by shift sign ($p.states).

$trend.results a list including the percentage of simulations showing significant p-values for phenotypes versus age and absolute rates versus age regressions for the entire tree separated by slope sign ($tree). If one or more nodes are specified within trend.args, the list also includes the same results at nodes ($node) and the results for comparison between nodes ($comparison).
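
To make the "percentage of simulations producing significant p-values, separated by sign" figures concrete, the following base R sketch summarises made-up per-simulation results. Everything here is illustrative: `summarise_sims`, the column names, and the 0.05 significance threshold are assumptions for the example and are not taken from the package.

```r
# Sketch only: share of simulations that are significant, split by slope sign.
# `summarise_sims` is hypothetical and the alpha = 0.05 threshold is an assumption.
summarise_sims <- function(slope, p, alpha = 0.05) {
  stopifnot(length(slope) == length(p))
  c(positive = mean(p <= alpha & slope > 0),
    negative = mean(p <= alpha & slope < 0))
}

# Made-up results for four simulations (one slope and one p-value each)
sims <- data.frame(slope = c(1, -1, 2, 0.5),
                   p     = c(0.01, 0.02, 0.50, 0.04))
summarise_sims(sims$slope, sims$p)
# positive = 0.50 (simulations 1 and 4), negative = 0.25 (simulation 2)
```

Values close to 1 for one sign and close to 0 for the other would indicate that the original result holds up well under subsampling and tree shuffling; multiply by 100 to express the proportions as percentages.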

Author(s)

Silvia Castiglione, Carmela Serio, Pasquale Raia

References

Castiglione, S., Tesone, G., Piccolo, M., Melchionna, M., Mondanaro, A., Serio, C., Di Febbraro, M., & Raia, P. (2018). A new method for testing evolutionary rate variation and shifts in phenotypic evolution. Methods in Ecology and Evolution, 9: 974-983. https://doi.org/10.1111/2041-210X.12954

Castiglione, S., Serio, C., Mondanaro, A., Di Febbraro, M., Profico, A., Girardi, G., & Raia, P. (2019a). Simultaneous detection of macroevolutionary patterns in phenotypic means and rate of change with and within phylogenetic trees including extinct species. PLoS ONE, 14: e0210101. https://doi.org/10.1371/journal.pone.0210101

Castiglione, S., Serio, C., Tamagnini, D., Melchionna, M., Mondanaro, A., Di Febbraro, M., Profico, A., Piras, P., Barattolo, F., & Raia, P. (2019b). A new, fast method to search for morphological convergence with shape data. PLoS ONE, 14: e0226949. https://doi.org/10.1371/journal.pone.0226949

Examples

## Not run: 
data("DataOrnithodirans")
DataOrnithodirans$treedino->treedino
DataOrnithodirans$massdino->massdino
DataOrnithodirans$statedino->statedino
# Proportion of available cores corresponding to two cores
cc<- 2/parallel::detectCores()

# Extract Pterosaurs tree and data
library(ape)
extract.clade(treedino,746)->treeptero
# Keep only the phenotypes of pterosaur tips, in tree tip order
massdino[match(treeptero$tip.label,names(massdino))]->massptero


RRphylo(tree=treedino,y=massdino)->dinoRates
RRphylo(tree=treeptero,y=log(massptero))->RRptero

# Case 1 search.shift under both "clade" and "sparse" conditions
search.shift(RR=dinoRates, status.type= "clade",foldername=tempdir())->SSnode
search.shift(RR=dinoRates, status.type= "sparse", state=statedino,
             foldername=tempdir())->SSstate

overfitRR(RR=dinoRates,y=massdino,swap.args =list(si=0.2,si2=0.2),
          shift.args = list(node=rownames(SSnode$single.clades),state=statedino),
          nsim=10,clus=cc)

# Case 2 search.trend on the entire tree
search.trend(RR=RRptero, y=log(massptero),nsim=100,clus=cc,
             foldername=tempdir(),cov=NULL,ConfInt=FALSE,node=NULL)->STtree

overfitRR(RR=RRptero,y=log(massptero),swap.args =list(si=0.2,si2=0.2),
          trend.args = list(),nsim=10,clus=cc)

# Case 3 search.trend at specified nodes
search.trend(RR=RRptero, y=log(massptero),node=143,clus=cc,foldername=tempdir(),
             cov=NULL,ConfInt=FALSE)->STnode

overfitRR(RR=RRptero,y=log(massptero),trend.args = list(node=143),nsim=10,clus=cc)

# Case 4 overfitRR on multiple RRphylo
data("DataCetaceans")
DataCetaceans$treecet->treecet
DataCetaceans$masscet->masscet
DataCetaceans$brainmasscet->brainmasscet
DataCetaceans$aceMyst->aceMyst

ape::drop.tip(treecet,treecet$tip.label[-match(names(brainmasscet),
                                               treecet$tip.label)])->treecet.multi
masscet[match(treecet.multi$tip.label,names(masscet))]->masscet.multi

RRphylo(tree=treecet.multi,y=masscet.multi)->RRmass.multi
RRmass.multi$aces[,1]->acemass.multi
c(acemass.multi,masscet.multi)->x1.mass

RRphylo(tree=treecet.multi,y=brainmasscet,x1=x1.mass)->RRmulti
search.trend(RR=RRmulti, y=brainmasscet,x1=x1.mass,clus=cc,foldername=tempdir())->STcet
overfitRR(RR=RRmulti,y=brainmasscet,trend.args = list(),x1=x1.mass,nsim=10,clus=cc)

search.trend(RR=RRmulti, y=brainmasscet,x1=x1.mass,x1.residuals=TRUE,
             clus=cc,foldername=tempdir())->STcet.resi
overfitRR(RR=RRmulti,y=brainmasscet,trend.args = list(x1.residuals=TRUE),
          x1=x1.mass,nsim=10,clus=cc)

# Case 5 searching convergence between clades and within a single state
data("DataFelids")
DataFelids$PCscoresfel->PCscoresfel
DataFelids$treefel->treefel
DataFelids$statefel->statefel

RRphylo(tree=treefel,y=PCscoresfel,clus=cc)->RRfel
search.conv(RR=RRfel, y=PCscoresfel, min.dim=5, min.dist="node9",
            foldername = tempdir(),clus=cc)->SC.clade
as.numeric(c(rownames(SC.clade[[1]])[1],as.numeric(as.character(SC.clade[[1]][1,1]))))->conv.nodes

overfitRR(RR=RRfel, y=PCscoresfel,
          conv.args = list(node=conv.nodes,state=statefel,declust=TRUE),
          nsim=10,clus=cc)

## End(Not run)

RRphylo

Phylogenetic Ridge Regression Methods for Comparative Studies

v2.5.0

GPL-2

Authors

Pasquale Raia, Silvia Castiglione, Carmela Serio, Alessandro Mondanaro, Marina Melchionna, Mirko Di Febbraro, Antonio Profico, Francesco Carotenuto

Initial release

2020-12-03