Testing RRphylo methods overfit
Testing the robustness of search.trend
(Castiglione et al. 2019a), search.shift
(Castiglione et al. 2018), and search.conv
(Castiglione et al. 2019b) results to sampling effects and
phylogenetic uncertainty.
overfitRR(RR,y,s=0.25,swap.args=NULL,trend.args=NULL,shift.args=NULL,conv.args=NULL, aces=NULL,x1=NULL,aces.x1=NULL,cov=NULL,rootV=NULL,nsim=100,clus=.5)
RR |
an object produced by |
y |
a named vector of phenotypes. |
s |
the percentage of tips to be cut off. It is set at 25% by default. |
swap.args |
a list of arguments to be passed to the function
|
trend.args |
a list of arguments specific to the function
|
shift.args |
a list of arguments specific to the function
|
conv.args |
a list of arguments specific to the function
|
aces |
if used to produce the |
x1 |
the additional predictor to be specified if the RR object has been
created using an additional predictor (i.e. multiple version of
|
aces.x1 |
a named vector of ancestral character values at nodes for
|
cov |
if used to produce the |
rootV |
if used to produce the |
nsim |
number of simulations to be performed. It is set at 100 by default. |
clus |
the proportion of clusters to be used in parallel computing. To
run the single-threaded version of |
Methods using a large number of parameters risk being overfit. This
usually translates in poor fitting with data and trees other than the those
originally used. With RRphylo methods this risk is usually very low.
However, the user can assess how robust the results got by applying
search.shift, search.trend or search.conv are by
running overfitRR. With the latter, the original tree and data are
subsampled by specifying a s parameter, that is the proportion of
tips to be removed from the tree. In some cases, though, removing as many
tips as imposed by s would delete too many tips right in clades
and/or states under testing. In these cases, the function maintains no less
than 5 species at least in each clade/state under testing (or all species
if there is less), reducing the sampling parameter s if necessary.
Internally, overfitRR further shuffles the tree by using the
function swapONE. Thereby, both the potential for overfit and
phylogenetic uncertainty are accounted for straight away.
The function returns a 'list' containing:
$mean.sampling the mean proportion of species actually removed from the tree over the iterations.
$rootCI the 95% confidence interval around the root value.
$ace.regressions the results of linear regression between ancestral state estimates before and after the subsampling.
$conv.results a list including results for
search.conv performed under clade and state
conditions. If a node pair is specified within conv.args, the
$clade object contains the percentage of simulations producing
significant p-values for convergence between the clades. If a state vector
is supplied within conv.args, the object $state contains the
percentage of simulations producing significant p-values for convergence
within (single state) or between states (multiple states).
$shift.results a list including results for
search.shift performed under clade and sparse
conditions. If one or more nodes are specified within shift.args,
the $clade object contains for each node the percentage of
simulations producing significant p-value separated by shift sign, and the
same figures by considering all the specified nodes as evolving under a
single rate (all.clades).If a state vector is supplied within
shift.args, the object $sparse contains the percentage of
simulations producing significant p-value separated by shift sign
($p.states).
$trend.results a list including the percentage of
simulations showing significant p-values for phenotypes versus age and
absolute rates versus age regressions for the entire tree separated by
slope sign ($tree). If one or more nodes are specified within
trend.args, the list also includes the same results at nodes ($node)
and the results for comparison between nodes ($comparison).
Silvia Castiglione, Carmela Serio, Pasquale Raia
Castiglione, S., Tesone, G., Piccolo, M., Melchionna, M., Mondanaro, A., Serio, C., Di Febbraro, M., & Raia, P. (2018). A new method for testing evolutionary rate variation and shifts in phenotypic evolution. Methods in Ecology and Evolution, 9: 974-983.doi:10.1111/2041-210X.12954
Castiglione, S., Serio, C., Mondanaro, A., Di Febbraro, M., Profico, A., Girardi, G., & Raia, P. (2019a) Simultaneous detection of macroevolutionary patterns in phenotypic means and rate of change with and within phylogenetic trees including extinct species. PLoS ONE, 14: e0210101. https://doi.org/10.1371/journal.pone.0210101
Castiglione, S., Serio, C., Tamagnini, D., Melchionna, M., Mondanaro, A., Di Febbraro, M., Profico, A., Piras, P.,Barattolo, F., & Raia, P. (2019b). A new, fast method to search for morphological convergence with shape data. PLoS ONE, 14, e0226949. https://doi.org/10.1371/journal.pone.0226949
## Not run:
data("DataOrnithodirans")
DataOrnithodirans$treedino->treedino
DataOrnithodirans$massdino->massdino
DataOrnithodirans$statedino->statedino
cc<- 2/parallel::detectCores()
# Extract Pterosaurs tree and data
library(ape)
extract.clade(treedino,746)->treeptero
massdino[match(treeptero$tip.label,names(massdino))]->massptero
massptero[match(treeptero$tip.label,names(massptero))]->massptero
RRphylo(tree=treedino,y=massdino)->dinoRates
RRphylo(tree=treeptero,y=log(massptero))->RRptero
# Case 1 search.shift under both "clade" and "sparse" condition
search.shift(RR=dinoRates, status.type= "clade",foldername=tempdir())->SSnode
search.shift(RR=dinoRates, status.type= "sparse", state=statedino,
foldername=tempdir())->SSstate
overfitRR(RR=dinoRates,y=massdino,swap.args =list(si=0.2,si2=0.2),
shift.args = list(node=rownames(SSnode$single.clades),state=statedino),
nsim=10,clus=cc)
# Case 2 search.trend on the entire tree
search.trend(RR=RRptero, y=log(massptero),nsim=100,clus=cc,
foldername=tempdir(),cov=NULL,ConfInt=FALSE,node=NULL)->STtree
overfitRR(RR=RRptero,y=log(massptero),swap.args =list(si=0.2,si2=0.2),
trend.args = list(),nsim=10,clus=cc)
# Case 3 search.trend at specified nodes
search.trend(RR=RRptero, y=log(massptero),node=143,clus=cc,foldername=tempdir(),
cov=NULL,ConfInt=FALSE)->STnode
overfitRR(RR=RRptero,y=log(massptero),trend.args = list(node=143),nsim=10,clus=cc)
# Case 4 overfitRR on multiple RRphylo
data("DataCetaceans")
DataCetaceans$treecet->treecet
DataCetaceans$masscet->masscet
DataCetaceans$brainmasscet->brainmasscet
DataCetaceans$aceMyst->aceMyst
ape::drop.tip(treecet,treecet$tip.label[-match(names(brainmasscet),
treecet$tip.label)])->treecet.multi
masscet[match(treecet.multi$tip.label,names(masscet))]->masscet.multi
RRphylo(tree=treecet.multi,y=masscet.multi)->RRmass.multi
RRmass.multi$aces[,1]->acemass.multi
c(acemass.multi,masscet.multi)->x1.mass
RRphylo(tree=treecet.multi,y=brainmasscet,x1=x1.mass)->RRmulti
search.trend(RR=RRmulti, y=brainmasscet,x1=x1.mass,clus=cc,foldername=tempdir())->STcet
overfitRR(RR=RRmulti,y=brainmasscet,trend.args = list(),x1=x1.mass,nsim=10,clus=cc)
search.trend(RR=RRmulti, y=brainmasscet,x1=x1.mass,x1.residuals=TRUE,
clus=cc,foldername=tempdir())->STcet.resi
overfitRR(RR=RRmulti,y=brainmasscet,trend.args = list(x1.residuals=TRUE),
x1=x1.mass,nsim=10,clus=cc)
# Case 5 searching convergence between clades and within a single state
data("DataFelids")
DataFelids$PCscoresfel->PCscoresfel
DataFelids$treefel->treefel
DataFelids$statefel->statefel
RRphylo(tree=treefel,y=PCscoresfel,clus=cc)->RRfel
search.conv(RR=RRfel, y=PCscoresfel, min.dim=5, min.dist="node9",
foldername = tempdir(),clus=cc)->SC.clade
as.numeric(c(rownames(SC.clade[[1]])[1],as.numeric(as.character(SC.clade[[1]][1,1]))))->conv.nodes
overfitRR(RR=RRfel, y=PCscoresfel,conv.args =
list(node=conv.nodes,state=statefel,declust=TRUE),nsim=10,clus=cc)
## End(Not run)Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.