Testing RRphylo methods overfit
Testing the robustness of search.trend (Castiglione et al. 2019a), search.shift (Castiglione et al. 2018), and search.conv (Castiglione et al. 2019b) results to sampling effects and phylogenetic uncertainty.
overfitRR(RR,y,s=0.25,swap.args=NULL,trend.args=NULL,shift.args=NULL,conv.args=NULL, aces=NULL,x1=NULL,aces.x1=NULL,cov=NULL,rootV=NULL,nsim=100,clus=.5)
RR |
an object produced by RRphylo. |
y |
a named vector of phenotypes. |
s |
the percentage of tips to be cut off. It is set at 25% by default. |
swap.args |
a list of arguments to be passed to the function swapONE. |
trend.args |
a list of arguments specific to the function search.trend. |
shift.args |
a list of arguments specific to the function search.shift. |
conv.args |
a list of arguments specific to the function search.conv. |
aces |
if used to produce the |
x1 |
the additional predictor to be specified if the RR object has been created using an additional predictor (i.e. the multiple version of RRphylo). |
aces.x1 |
a named vector of ancestral character values at nodes for
|
cov |
if used to produce the |
rootV |
if used to produce the |
nsim |
number of simulations to be performed. It is set at 100 by default. |
clus |
the proportion of clusters to be used in parallel computing. To
run the single-threaded version of |
Methods using a large number of parameters risk being overfit. This usually translates into poor fitting with data and trees other than those originally used. With RRphylo methods this risk is usually very low. However, the user can assess how robust the results obtained by applying search.shift, search.trend or search.conv are by running overfitRR. With the latter, the original tree and data are subsampled by specifying an s parameter, that is, the proportion of tips to be removed from the tree. In some cases, though, removing as many tips as imposed by s would delete too many tips right in the clades and/or states under testing. In these cases, the function maintains no fewer than 5 species in each clade/state under testing (or all species if there are fewer), reducing the sampling parameter s if necessary. Internally, overfitRR further shuffles the tree by using the function swapONE. Thereby, both the potential for overfit and phylogenetic uncertainty are accounted for straight away.
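The subsampling rule described above can be illustrated with a minimal base R sketch. This is not the code used inside overfitRR: the helper name subsample_tips, its arguments, and the choice to protect a random set of tips per group before sampling are all assumptions made for illustration. Tips outside any clade/state under testing are marked with NA and can be dropped freely.

```r
# Hypothetical helper, NOT part of RRphylo: drop a proportion `s` of tips
# while keeping at least `min_keep` tips in every group under testing
# (or all of them, if a group holds fewer than `min_keep`).
subsample_tips <- function(tips, group, s = 0.25, min_keep = 5) {
  stopifnot(length(tips) == length(group), s >= 0, s < 1)

  # Randomly protect up to `min_keep` tips per group; tips whose group
  # is NA belong to no tested clade/state and are never protected.
  protected <- unlist(lapply(split(tips, group), function(g) {
    g[sample.int(length(g), min(length(g), min_keep))]
  }), use.names = FALSE)

  # Only unprotected tips can be removed, so the number actually dropped
  # may be smaller than the number requested through `s`.
  candidates <- setdiff(tips, protected)
  n_drop <- min(floor(s * length(tips)), length(candidates))
  dropped <- candidates[sample.int(length(candidates), n_drop)]

  list(kept = setdiff(tips, dropped),
       dropped = dropped,
       s_effective = length(dropped) / length(tips))
}

# Usage: 20 tips, two tested clades (6 and 3 tips), 11 untested tips.
set.seed(1)
tips  <- paste0("t", 1:20)
group <- c(rep("cladeA", 6), rep("cladeB", 3), rep(NA, 11))
out   <- subsample_tips(tips, group, s = 0.7)
out$s_effective  # smaller than the requested 0.7
```

In this sketch the realized proportion (s_effective) falls below the requested s whenever the floor of 5 species per group is binding, which is the kind of quantity that $mean.sampling summarizes over iterations.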
The function returns a 'list' containing:
$mean.sampling the mean proportion of species actually removed from the tree over the iterations.
$rootCI the 95% confidence interval around the root value.
$ace.regressions the results of linear regression between ancestral state estimates before and after the subsampling.
$conv.results a list including results for search.conv performed under clade and state conditions. If a node pair is specified within conv.args, the $clade object contains the percentage of simulations producing significant p-values for convergence between the clades. If a state vector is supplied within conv.args, the object $state contains the percentage of simulations producing significant p-values for convergence within (single state) or between states (multiple states).
$shift.results a list including results for search.shift performed under clade and sparse conditions. If one or more nodes are specified within shift.args, the $clade object contains, for each node, the percentage of simulations producing significant p-values separated by shift sign, and the same figures obtained by considering all the specified nodes as evolving under a single rate (all.clades). If a state vector is supplied within shift.args, the object $sparse contains the percentage of simulations producing significant p-values separated by shift sign ($p.states).
$trend.results a list including the percentage of simulations showing significant p-values for phenotypes versus age and absolute rates versus age regressions for the entire tree separated by slope sign ($tree). If one or more nodes are specified within trend.args, the list also includes the same results at nodes ($node) and the results for comparison between nodes ($comparison).
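As a rough illustration of how a "percentage of simulations producing significant p-values separated by sign" can be tabulated, the base R sketch below summarizes one slope and one p-value per simulation, as in the trend case. The helper name percent_significant, the 0.05 threshold, and the output names are assumptions for illustration only; the significance criteria applied internally by overfitRR may differ.

```r
# Hypothetical helper, NOT part of RRphylo: given one slope and one p-value
# per simulation, return the percentage of simulations that are significant
# at level `alpha`, split by the sign of the slope.
percent_significant <- function(slope, p, alpha = 0.05) {
  stopifnot(length(slope) == length(p))
  significant <- p <= alpha
  c(positive = mean(significant & slope > 0),
    negative = mean(significant & slope < 0)) * 100
}

# Usage with made-up values for four simulations.
slopes <- c(1.2, -0.4, 2.0, 3.1)
pvals  <- c(0.01, 0.02, 0.20, 0.04)
percent_significant(slopes, pvals)
```

A result is usually read as robust when one sign dominates and its percentage stays high across the subsampled and swapped trees.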
Silvia Castiglione, Carmela Serio, Pasquale Raia
Castiglione, S., Tesone, G., Piccolo, M., Melchionna, M., Mondanaro, A., Serio, C., Di Febbraro, M., & Raia, P. (2018). A new method for testing evolutionary rate variation and shifts in phenotypic evolution. Methods in Ecology and Evolution, 9: 974-983. doi:10.1111/2041-210X.12954
Castiglione, S., Serio, C., Mondanaro, A., Di Febbraro, M., Profico, A., Girardi, G., & Raia, P. (2019a). Simultaneous detection of macroevolutionary patterns in phenotypic means and rate of change with and within phylogenetic trees including extinct species. PLoS ONE, 14: e0210101. https://doi.org/10.1371/journal.pone.0210101
Castiglione, S., Serio, C., Tamagnini, D., Melchionna, M., Mondanaro, A., Di Febbraro, M., Profico, A., Piras, P., Barattolo, F., & Raia, P. (2019b). A new, fast method to search for morphological convergence with shape data. PLoS ONE, 14: e0226949. https://doi.org/10.1371/journal.pone.0226949
## Not run:
data("DataOrnithodirans")
DataOrnithodirans$treedino->treedino
DataOrnithodirans$massdino->massdino
DataOrnithodirans$statedino->statedino
cc<- 2/parallel::detectCores()

# Extract Pterosaurs tree and data
library(ape)
extract.clade(treedino,746)->treeptero
massdino[match(treeptero$tip.label,names(massdino))]->massptero
massptero[match(treeptero$tip.label,names(massptero))]->massptero

RRphylo(tree=treedino,y=massdino)->dinoRates
RRphylo(tree=treeptero,y=log(massptero))->RRptero

# Case 1 search.shift under both "clade" and "sparse" condition
search.shift(RR=dinoRates, status.type= "clade",foldername=tempdir())->SSnode
search.shift(RR=dinoRates, status.type= "sparse", state=statedino,
             foldername=tempdir())->SSstate

overfitRR(RR=dinoRates,y=massdino,swap.args =list(si=0.2,si2=0.2),
          shift.args = list(node=rownames(SSnode$single.clades),state=statedino),
          nsim=10,clus=cc)

# Case 2 search.trend on the entire tree
search.trend(RR=RRptero, y=log(massptero),nsim=100,clus=cc,
             foldername=tempdir(),cov=NULL,ConfInt=FALSE,node=NULL)->STtree

overfitRR(RR=RRptero,y=log(massptero),swap.args =list(si=0.2,si2=0.2),
          trend.args = list(),nsim=10,clus=cc)

# Case 3 search.trend at specified nodes
search.trend(RR=RRptero, y=log(massptero),node=143,clus=cc,foldername=tempdir(),
             cov=NULL,ConfInt=FALSE)->STnode

overfitRR(RR=RRptero,y=log(massptero),trend.args = list(node=143),nsim=10,clus=cc)

# Case 4 overfitRR on multiple RRphylo
data("DataCetaceans")
DataCetaceans$treecet->treecet
DataCetaceans$masscet->masscet
DataCetaceans$brainmasscet->brainmasscet
DataCetaceans$aceMyst->aceMyst

ape::drop.tip(treecet,treecet$tip.label[-match(names(brainmasscet),
                                               treecet$tip.label)])->treecet.multi
masscet[match(treecet.multi$tip.label,names(masscet))]->masscet.multi

RRphylo(tree=treecet.multi,y=masscet.multi)->RRmass.multi
RRmass.multi$aces[,1]->acemass.multi
c(acemass.multi,masscet.multi)->x1.mass

RRphylo(tree=treecet.multi,y=brainmasscet,x1=x1.mass)->RRmulti
search.trend(RR=RRmulti, y=brainmasscet,x1=x1.mass,clus=cc,foldername=tempdir())->STcet
overfitRR(RR=RRmulti,y=brainmasscet,trend.args = list(),x1=x1.mass,nsim=10,clus=cc)

search.trend(RR=RRmulti, y=brainmasscet,x1=x1.mass,x1.residuals=TRUE,
             clus=cc,foldername=tempdir())->STcet.resi
overfitRR(RR=RRmulti,y=brainmasscet,trend.args = list(x1.residuals=TRUE),
          x1=x1.mass,nsim=10,clus=cc)

# Case 5 searching convergence between clades and within a single state
data("DataFelids")
DataFelids$PCscoresfel->PCscoresfel
DataFelids$treefel->treefel
DataFelids$statefel->statefel

RRphylo(tree=treefel,y=PCscoresfel,clus=cc)->RRfel
search.conv(RR=RRfel, y=PCscoresfel, min.dim=5, min.dist="node9",
            foldername = tempdir(),clus=cc)->SC.clade
as.numeric(c(rownames(SC.clade[[1]])[1],
             as.numeric(as.character(SC.clade[[1]][1,1]))))->conv.nodes

overfitRR(RR=RRfel, y=PCscoresfel,
          conv.args = list(node=conv.nodes,state=statefel,declust=TRUE),
          nsim=10,clus=cc)
## End(Not run)