diff --git a/README.md b/README.md index 40f4318..8a48729 100644 --- a/README.md +++ b/README.md @@ -1 +1,11 @@ -# contraband-tutorial \ No newline at end of file +--- +author: Rong Zhang,Fábio K. Mendes +level: Intermediate +title: Contraband tutorial +subtitle: Total-evidence dating and trait-evolution evolutionary inference using phylogenetic multivariate Brownian motion models +beastversion: 2.7.7 +--- + +# Contraband tutorial + +This tutorial is currently only available as a [PDF]( {{site.baseurl}}/tutorials/contraband-tutorial/main.pdf ). \ No newline at end of file diff --git a/main.pdf b/main.pdf index 5ac7ed3..6d0010b 100644 Binary files a/main.pdf and b/main.pdf differ diff --git a/main.tex b/main.tex index 2027c0e..7a14e9f 100644 --- a/main.tex +++ b/main.tex @@ -49,13 +49,13 @@ \section{Background}\label{background} % \begin{quote} -\noindent \textbf{Bird's-eye view}. This tutorial shows how to use the \texttt{contraband} package in \texttt{BEAST 2} to model continuous trait evolution along a phylogeny with Brownian motion. +\noindent \textbf{Bird's-eye view}. This tutorial shows how to use the \texttt{contraband} package in \texttt{BEAST2} to model continuous trait evolution along a phylogeny with Brownian motion. Unlike methods that assume a ``known'', fixed tree, \texttt{contraband} lets you estimate the tempo and mode of trait evolution simultaneously with both species relationships and divergence times. % \end{quote} \subsection{What is \texttt{contraband} for} -In this tutorial, we will walk you through running a simple analysis with the \texttt{contraband} (\textbf{con}tinuous \textbf{tra}its \textbf{b}rowni\textbf{an} mo\textbf{d}els) \texttt{BEAST 2} package. +In this tutorial, we will walk you through running a simple analysis with the \texttt{contraband} (\textbf{con}tinuous \textbf{tra}its \textbf{b}rowni\textbf{an} mo\textbf{d}els) \texttt{BEAST2} package. 
As the name suggests, \texttt{contraband} implements Brownian motion (BM) models for the evolution of continuous traits on a phylogeny. To understand how these models can be useful to evolutionary biologists, let's put our X-ray goggles on and look at the core of the \texttt{contraband} package: the probability density function (pdf) of the multivariate Brownian motion model -- the same pdf used for a multivariate normal distribution: @@ -78,7 +78,7 @@ \subsection{What is \texttt{contraband} for} In many software tools, especially those implemented in \texttt{R} and using frequentist methods, the phylogeny ($\boldsymbol{T}$) is not estimated but instead fixed to a tree point estimate from the literature. The downside of this approach is that the continuous trait data can only inform our estimates of trait evolution parameters, $\boldsymbol{y_0}$ and $\mathbf{\Sigma}$ -- not the phylogeny itself. -While it is possible to take this approach in \texttt{BEAST 2} as well, its hierarchical Bayesian framework allows us to go further: we can co-estimate $\boldsymbol{T}$ (i.e., the species tree or phylogeny) together with the parameters of trait evolution. +While it is possible to take this approach in \texttt{BEAST2} as well, its hierarchical Bayesian framework allows us to go further: we can co-estimate $\boldsymbol{T}$ (i.e., the species tree or phylogeny) together with the parameters of trait evolution. This means we can infer trait-evolution parameters \textbf{alongside} the species divergence times and phylogenetic relationships captured in $\boldsymbol{T}$. In other words, \texttt{contraband} is a tool not only for studying how continuous traits evolve, but also for estimating the topology and divergence times of phylogenies. @@ -114,7 +114,7 @@ \subsection{A quick peek under the hood} As mentioned above, $\mathbf{M}$ contains our continuous-character data, it is a matrix whose dimensions are the number of species $\times$ the number of characters. 
On the right-hand side of this equation, you should further recognize some of the terms that have direct counterparts in models used for molecular evolution, e.g., those involved in the morphological clock model. These are the morphological global clock rate ($c_m$) and relative branch rates ($\boldsymbol{b}_m$). -Other parameters, however, are unique to multivariate Brownian models, like the character values from all characters at the root of $\Phi$ ($\boldsymbol{y_0}$), a vector containing all relative character-specific evolutionary rates ($\boldsymbol{r}$), and the between-character correlation matrix ($\boldsymbol{\rho}$). +Other parameters, however, are unique to multivariate Brownian models, like the character values from all characters ($\boldsymbol{y_0}$) at the root of the tree ($\Phi$), a vector containing all relative character-specific evolutionary rates ($\boldsymbol{r}$), and the between-character correlation matrix ($\boldsymbol{\rho}$). All of these parameters can in principle be estimated with MCMC, but the accuracy of and uncertainty about our estimates will be a function of our data set size, which include the number of traits and of species (more details can be found in \cite{zhang24}), as well as analysis running times. Among the most challenging parameters to estimate is $\boldsymbol{\rho}$. @@ -156,7 +156,7 @@ \subsection{A quick peek under the hood} \section{Programs used in this exercise}\label{programs-used-in-this-exercise} % -\subsubsection{BEAST2 - Bayesian Evolutionary Analysis Sampling Trees2} +\subsubsection*{BEAST2 - Bayesian Evolutionary Analysis Sampling Trees2} % BEAST2 (\url{http://www.beast2.org}) is a free software package for Bayesian evolutionary analysis of molecular sequences using MCMC and @@ -164,27 +164,27 @@ \subsubsection{BEAST2 - Bayesian Evolutionary Analysis Sampling Trees2} phylogenetic trees. This tutorial is written for BEAST v2.7.7 \citep{bouckaert2019beast}. 
-\subsubsection{BEAUti2 - Bayesian Evolutionary Analysis Utility} +\subsubsection*{BEAUti2 - Bayesian Evolutionary Analysis Utility} BEAUti2 is the successor of BEAUti, a graphical user interface tool that makes it easy to generate BEAST2 XML configuration files (these files are necessary to specify and run MCMC analyses). It is provided as a part of the BEAST2 package so you do not need to install it separately. Both BEAST2 and BEAUti2 are written in Java, meaning that these programs can not only be integrated at their codebase level, but that they are also cross-platform: the exact same code runs on all platforms. Although the screenshots used in this tutorial have been taken on a Mac OS X computer, both BEAST2 and BEAUti2 will have the same layout and functionality under other operating systems like Windows and Linux. -\subsubsection{TreeAnnotator}\label{treeannotator} +\subsubsection*{TreeAnnotator}\label{treeannotator} TreeAnnotator is a program we will use to produce a summary tree from a posterior sample of trees obtained via MCMC. We will also use this program to summarize and visualize the posterior estimates of other tree-related parameters (e.g., node heights). TreeAnnotator is also provided as a part of the BEAST2 package so you do not need to install it separately. -\subsubsection{Tracer}\label{tracer} +\subsubsection*{Tracer}\label{tracer} Tracer (\url{http://tree.bio.ed.ac.uk/software/tracer}) is used to summarize the posterior estimates of the various parameters sampled via MCMC. This program can be used for visual inspection of MCMC chains and to assess their convergence. Tracer makes it easy to calculate parameter median estimates, their 95\% highest posterior density (95\%-HPD) intervals, their effective sample sizes (ESS), and their correlation with other parameters. % Conventionally, ESSs of at least 200 for are interpreted as It can also be used to investigate potential parameter correlations. We will be using Tracer v1.7.2. 
-\subsubsection{FigTree}\label{figtree} +\subsubsection*{FigTree}\label{figtree} The last program we will use is FigTree (\url{http://tree.bio.ed.ac.uk/software/figtree}). FigTree was designed so that users can easily visualize trees and draw publication-quality figures. @@ -196,7 +196,7 @@ \section{Setting up} \subsection{Installing dependencies} Total-evidence dating of phylogenies is a complex task that requires a series of models, a few of which are implemented in their own BEAST2 packages. -The main package for this tutorial is called \texttt{contraband} and it implements Brownian models for the evolution of multiple characters on phylogenetic trees. +The main package for this tutorial is called \texttt{contraband} and it implements Brownian motion models for the evolution of multiple characters on phylogenetic trees. % In this tutorial we will estimate evolutionary rates, trait correlations, ancestral states and phylogenetic trees using the Brownian motion implemented in BEAST2, contraband. @@ -212,57 +212,66 @@ \subsection{Installing dependencies} % Learn how to read the output of a ``contraband" analysis % \end{itemize} -In order to install \texttt{contraband}, we have to download it using the BEAUTi \textcolor{red}{[2?]} package manager. -Open BEAUti2, go to \emph{File \textgreater{}\textgreater{} Manage Packages}, and click on the \texttt{contraband} link (Figure 1). -The package will become available in BEAUti2 once you close and restart the program. +In order to install \texttt{contraband}, we have to download it using the BEAST2 package manager. +Open BEAUti2, go to \emph{File \textgreater{}\textgreater{} Manage Packages}, click on the \texttt{contraband} link, and then click \emph{Install/Upgrade} (\autoref{fig:example1}). 
\begin{figure}[!htbp] \centering \includegraphics[width=0.5\textwidth]{figures/contrabandDownload.png} - \caption{Download the contraband package.} + \caption{Downloading the contraband package.} \label{fig:example1} \end{figure} This tutorial will also make use of a few other packages; -these are \texttt{bdtree}, \texttt{sampled-ancestors}, and \texttt{morph-models}. +these are \texttt{bdtree} (birth-death sequential sampling trees), \texttt{SA} (sampled-ancestor trees), and \texttt{MM} (morphological character evolution models). Please also install these packages if they are not already installed! + The first two implement speciation and fossilization models (for evaluating the probability of phylogenies themselves), and the latter implements models for the evolution of discrete morphological characters. -These packages have a dedicated tutorial (here \textcolor{red}{[https://taming-the-beast.org/tutorials/Total-Evidence-Tutorial/]}) and will not be discussed further in the present exercise. +The SA and MM packages have two dedicated tutorials on the Taming the BEAST website (\href{https://taming-the-beast.org/tutorials/FBD-tutorial/}{Divergence Time Estimation tutorial} and \href{https://taming-the-beast.org/tutorials/Total-Evidence-Tutorial/}{Total Evidence tutorial}) and will not be discussed further in the present exercise. + +The newly installed packages will become available in BEAUti2 once you close and restart the program. + \subsection{Preparing the data} -The data sets used in this tutorial include three types of data -- molecular, discrete morphology, continuous morphology -- scored fo up to 27 Carnivore species (11 of which are extinct and 16 of which are extant). +The data sets used in this tutorial include three types of data -- molecular, discrete morphology and continuous morphology -- scored for up to 27 Carnivore species (11 of which are extinct and 16 of which are extant). 
\subsubsection{Continuous characters} -Our TED analysis will leverage a published geometric-morphometric data set consisting of 29 three-dimensional (3D) cranium landmarks \citep{alvarez19}, each dimension of which will be treated as a separate continuous characters (i.e., we will have a total of 87 continuous characters). +Our TED analysis will leverage a published geometric-morphometric data set consisting of 29 three-dimensional (3D) cranium landmarks \citep{alvarez19}, each dimension of which will be treated as a separate continuous character (i.e., we will have a total of 87 continuous characters). This data can be found in file \texttt{carnivora\_continuous\_27.nex} attached to this tutorial. The same cranium landmarks have also been scored in 21 \textit{Vulpes vulpes} (one of the focal carnivore species) individuals. -This intraspecific data will be used in the analyses the bypasses the estimation of character correlations ($\boldsymbol{\rho}$), and can be found in attached file \texttt{vulpes\_continuous\_data.txt}. +This intraspecific data will be used in the analyses to bypass the estimation of character correlations ($\boldsymbol{\rho}$), and can be found in the attached file \texttt{vulpes\_continuous\_data.txt}. \subsubsection{Discrete characters} -12 species of interest have discrete morphological characters that describe their basicranial, dental, postcranial anatomical features (carnivora\_discrete\_27.nex) \citep{barrett21}. There are 183 features in total and the number of character states ranges from 0 to 3. +12 species of interest have discrete morphological characters that describe their basicranial, dental and postcranial anatomical features (\texttt{carnivora\_discrete\_27.nex}) \citep{barrett21}. There are 183 features in total and the number of character states ranges from 0 to 3. 
\subsubsection{Molecular sequences} -The molecular sequences of 12 mitochondrial genes for 14 species of interest are collected from NCBI database and are further were concatenated, aligned using MAFFT (carnivora\_dna\_27.fasta). +The molecular sequences of 12 mitochondrial genes for 14 species of interest were collected from the NCBI database, concatenated and aligned using MAFFT (\texttt{carnivora\_dna\_27.fasta}). \section{Practical Part \uppercase\expandafter{\romannumeral 3}: Parameter and State inference under Brownian motion model} + +\subsection{Setting up the analysis in BEAUti} + + \subsubsection{Loading the Carnivoran Continuous data}\label{load-continuous-data} -The continuous characters can be found in the \emph{data} folder named \emph{carnivora\_continuous\_27.nex}. It can be either drag and dropped into BEAUti "Partitions" panel or added using BEAUti’s menu system via \emph{File \textgreater{}\textgreater{} Load Continuous Data}. Once the character are loaded successfully into BEAUTi, the panel will show +The continuous characters can be found in the \texttt{data} folder named \texttt{carnivora\_continuous\_27.nex}. It can be either dragged and dropped into BEAUti "Partitions" panel or added using BEAUti’s menu system via \emph{File \textgreater{}\textgreater{} Load Continuous Data}. Once the character are loaded successfully into BEAUTi, the panel will show +\textcolor{red}{\textbf{ THIS DOESN'T SEEM TO WORK AND SECTION INCOMPLETE!!!}} -\subsubsection{Get the fossil ages (Tip Dates)}\label{parse-fosill-age} -Since the data set have fossil species, we will need to open the "Tip Dates" panel and then select the "Use tip dates" checkbox to specify the fossil ages. This can be done in multiple ways. In our case, we can obtain the date information from the species names. We can tell BEAUti to use these by clicking the \emph{Auto-configure} button. The fossil ages appear following the second underscore "\_" in the species name. 
To extract these times, select "use everything", then select "after last" from the drop-down box to the right, and input "\_" (without the quotes) in the text box immediately to the right, as shown in the figure below \autoref{fig:example2}. Clicking "Ok" should now populate the table with the fossil ages extracted from the species names. + +\subsubsection{Setting the fossil ages (Tip Dates)}\label{parse-fosill-age} +Since the data set contains fossil species, we will need to open the "Tip Dates" panel and then select the "Use tip dates" checkbox to specify the fossil ages. This can be done in multiple ways. In our case, we can obtain the date information from the species names. We can tell BEAUti to use these by clicking the \emph{Auto-configure} button. The fossil ages appear following the second underscore "\_" in the species name. To extract these times, select "use everything", then select "after last" from the drop-down box to the right, and input "\_" (without the quotes) in the text box immediately to the right, as shown in the figure below \autoref{fig:example2}. Clicking "OK" should now populate the table with the fossil ages extracted from the species names. \begin{figure}[!htbp] \centering \includegraphics[width=0.5\textwidth]{figures/BMParseDates.png} - \caption{Guess sampling times.} + \caption{Guessing sampling times.} \label{fig:example2} \end{figure} -In the populated table, the two columns \textbf{Date} and \textbf{Height} should now have values between 0.0 and 35.55 in million years \autoref{fig:example3}. +In the populated table, the two columns \textbf{Date} and \textbf{Height} should now have values between 0.0 and 35.55 in million years (\autoref{fig:example3}). 
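% Illustrative sketch only: the taxon labels below are hypothetical placeholders, not names taken from the tutorial's data files.
As an illustration of the ``after last'' rule, consider two purely hypothetical taxon labels (placeholders, assuming the age is the final underscore-separated field, as described above). BEAUti would keep everything after the last ``\_'' as the tip date:

\begin{verbatim}
Genus_species_35.55  ->  35.55
Genus_species_0.0    ->  0.0
\end{verbatim}

The two ages shown are simply the end points of the range reported in the populated table; real labels in the data set will carry their own values.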
\begin{figure}[!htbp] \centering @@ -271,8 +280,8 @@ \subsubsection{Get the fossil ages (Tip Dates)}\label{parse-fosill-age} \label{fig:example3} \end{figure} -\subsubsection{Set the Brownian motion Model} -As is introduced above, the parameters under the Brownian motion model include trait evolutionary rate (Sigmasq), trait correlations (Correlation) and ancestral states at the root (Root Values). Here we assume that all characters share one evolutionary rate. Therefore, we put a tick in the box in front of the "One Rate Only". +\subsubsection{Setting the Brownian motion Model} +As introduced above, the parameters under the Brownian motion model include trait evolutionary rate (Sigmasq), trait correlations (Correlation) and ancestral states at the root (Root Values). Here we assume that all characters share one evolutionary rate. Therefore, we put a tick in the box in front of the "One Rate Only". \begin{figure}[!htbp] \centering @@ -282,193 +291,219 @@ \subsubsection{Set the Brownian motion Model} \end{figure} -\subsubsection{Set the Clock model}\label{bm-clock-model} +\subsubsection{Setting the Clock model}\label{bm-clock-model} We assume the relative branch-specific rates are independently distributed and follow a LogNormal distribution with a fixed mean of 1. Therefore, we specify a relaxed clock model by selecting "Optimised Relaxed Clock" in the drop-down menu, where the mean clock rate represents the global morphological clock rate that will be estimated by default. The detailed description of the model can be found in \cite{douglas2021}. 
\begin{figure}[!htbp] \centering \includegraphics[width=0.700000\textwidth]{figures/BMClockModel.png} - \caption{Set the initial clock rate.} + \caption{Setting the initial clock rate.} \label{fig:example5} \end{figure} -\subsubsection{Specify the priors} -In the ''Priors" panel, we select "Fossilized Birth Death Model" \citep{gavryushkina2014} as the tree prior and leave the rest of the parameters having their default prior distributions. +\subsubsection{Specifying the priors} + +% FBD model is from Heath et al 2014, BEAST implementation from Gavryushkina et al 2014! +In the ''Priors" panel, we select "Fossilized Birth Death Model" \citep{heath2014, gavryushkina2014} as the tree prior and leave the rest of the parameters at their default prior distributions. \begin{figure}[!htbp] \centering \includegraphics[width=0.700000\textwidth]{figures/BMPriors.png} - \caption{Set the tree model and priors on parameters.} + \caption{Setting the tree model and priors on parameters.} \label{fig:example6} \end{figure} -\subsubsection{Specify the MCMC chain length (MCMC)}\label{specify-the-mcmc-chain-length-mcmc} +\subsubsection{Specifying the MCMC chain length (MCMC)}\label{specify-the-mcmc-chain-length-mcmc} % Here we can set the length of the MCMC chain and after how many iterations the parameter and trees a logged. For this dataset, 2 million iterations should be sufficient. In order to have enough samples but not create too large files, we can set the logEvery to 2000, so we have 1001 -samples overall. Next, we have to save the \lstinline!*.xml! file under +samples overall. Next, we have to save the \texttt{*.xml} file under \emph{File \textgreater{}\textgreater{} Save as}. 
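% Worked check of the sample count; the symbols L and k are notation introduced here for illustration.
As a quick check of the sample count quoted above, write $L$ for the chain length and $k$ for the logging interval (\texttt{logEvery}). Assuming the initial state is logged as well, which is what the figure of 1001 implies, the number of logged samples is
\[
  N_{\mathrm{samples}} = \frac{L}{k} + 1 = \frac{2\,000\,000}{2000} + 1 = 1001 .
\]
Under the same assumption, doubling the chain length at a fixed logging interval would roughly double both the number of samples and the size of the log files.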
\begin{figure}[!htbp] \centering \includegraphics[width=0.700000\textwidth]{figures/BMMCMC.png} - \caption{save the *.xml.} + \caption{Saving the \texttt{*.xml} file.} \label{fig:example7} \end{figure} -\subsubsection{Run the Analysis using BEAST2}\label{run-the-analysis-using-beast2} +\subsection{Running the Analysis using BEAST2}\label{run-the-analysis-using-beast2} % -Run the \lstinline!*.xml! using BEAST2 or use finished runs from the -\emph{precooked-runs} folder. The analysis should take about 6 to 7 +Run the \texttt{*.xml} file using BEAST2. The analysis should take about 6 to 7 minutes. % -\subsubsection{Post analysis} +\subsection{Analysing the results} -\begin{itemize} -\item Analyse the log file using Tracer -\\ -First, we can open the \lstinline!*.log! file in tracer to check if the +For this section, either use the output files from your own analysis or use finished runs from the +\emph{precooked-runs} folder. \textcolor{red}{\textbf{ Precooked runs don't exist!}} + +Follow the steps below: + +\subsubsection{Analysing the log file using Tracer} + +First, we can open the \texttt{*.log} file in Tracer to check if the MCMC has converged. The ESS value should be above 200 for almost all -values and especially for the posterior estimates (\autoref{fig:bm_res1}). +parameters and especially for the posterior estimates (\autoref{fig:bm_res1}). This is clearly not the case here and this analysis should be run for much longer to reach convergence. 
\begin{figure}[!htbp] \centering \includegraphics[width=0.700000\textwidth]{figures/results/BM_posterior.png} - \caption{Check if the posterior converged.} + \caption{Checking if the chain converged.} \label{fig:bm_res1} \end{figure} -\item The estimated evolutionary rate (\autoref{fig:bm_res2}) -\begin{figure}[!htbp] - \centering - \includegraphics[width=0.700000\textwidth]{figures/results/BM_evo_rate.png} - \caption{Evolutionary rate shared by 87 characters.} - \label{fig:bm_res2} -\end{figure} -\item The estimated character correlations (\autoref{fig:bm_res3}) -\begin{figure}[!htbp] - \centering - \includegraphics[width=0.700000\textwidth]{figures/results/BM_covValues.png} - \caption{Character correlations among 87 characters.} - \label{fig:bm_res3} -\end{figure} +Next, examine the posterior estimates for the following parameters: -\item The estimated ancestral states at the root (\autoref{fig:bm_res4}) -\begin{figure}[!htbp] - \centering - \includegraphics[width=0.700000\textwidth]{figures/results/BM_rootValues.png} - \caption{87 trait values at the root of the tree.} - \label{fig:bm_res4} -\end{figure} +\begin{itemize} + \item The estimated evolutionary rate (\autoref{fig:bm_res2}), + \begin{figure}[!htbp] + \centering + \includegraphics[width=0.700000\textwidth]{figures/results/BM_evo_rate.png} + \caption{Evolutionary rate shared by 87 characters.} + \label{fig:bm_res2} + \end{figure} + + \item The estimated character correlations (\autoref{fig:bm_res3}), + \begin{figure}[!htbp] + \centering + \includegraphics[width=0.700000\textwidth]{figures/results/BM_covValues.png} + \caption{Character correlations among 87 characters.} + \label{fig:bm_res3} + \end{figure} + + \item The estimated ancestral states at the root (\autoref{fig:bm_res4}), + \begin{figure}[!htbp] + \centering + \includegraphics[width=0.700000\textwidth]{figures/results/BM_rootValues.png} + \caption{87 trait values at the root of the tree.} + \label{fig:bm_res4} + \end{figure} + + \item The 
inferred morphological clock model (\autoref{fig:bm_res5}). + \begin{figure}[!htbp] + \centering + \includegraphics[width=0.700000\textwidth]{figures/results/BM_clock_sigma.png} + \caption{Comparing the inferred migration rates.} + \label{fig:bm_res5} + \end{figure} +\end{itemize} -\item The inferred morphological clock model (\autoref{fig:bm_res5}) -\begin{figure}[!htbp] - \centering - \includegraphics[width=0.700000\textwidth]{figures/results/BM_clock_sigma.png} - \caption{Compare the inferrred migration rates.} - \label{fig:bm_res5} -\end{figure} +\clearpage -\item Make the summary tree using TreeAnnotator -\\ +\subsubsection{Constructing the summary tree using TreeAnnotator} Open \textbf{TreeAnnotator} and then set the options including \textbf{Burnin percentage}, \textbf{Target tree type}, \textbf{Node heights}, \textbf{Input Tree File} and the \textbf{Output File}. -Use the logged trees in the file \lstinline!carnivora_continuous_27_tree_bm.trees! as \textbf{Input Tree File}. Name output file \lstinline!carnivora_continuous_27_bm_mcc.tree!. +Use the logged trees in the file +\\ +\texttt{carnivora\_continuous\_27\_tree\_bm.trees} as \textbf{Input Tree File}. Name the output file +\\ +\texttt{carnivora\_continuous\_27\_bm\_mcc.tree}. After clicking \textbf{Run} the program should summarize the trees. 
+ \begin{figure}[!htbp] \centering \includegraphics[width=0.700000\textwidth]{figures/results/BM_MCC_tree.png} - \caption{Summarised MCC tree of logged trees under Brownian motion model.} + \caption{Summarised MCC tree of logged trees under the Brownian motion model.} \label{fig:bm_res6} \end{figure} -\end{itemize} -\section{Practical Part \uppercase\expandafter{\romannumeral 4}: Parameter and State inference using combined data with Brownian motion model with shrinkage method} +\section{Practical Part \uppercase\expandafter{\romannumeral 4}: Parameter and State inference using combined data with the Brownian motion model combined with a shrinkage method} + +\subsection{Setting up the analysis in BEAUti} + + \subsubsection{Loading the Carnivoran data sets} -We first load the continuous data and parse the fossil ages as is mentioned in previous sections \ref{load-continuous-data} and \ref{parse-fosill-age}. Then, in the "Partitions" panel, we continue to load the Carnivoran molecular sequences via \emph{File \textgreater{}\textgreater{} Import Alignment}. Finally, we add the discrete characters by \emph{File \textgreater{}\textgreater{} Add Morphological Data}. As is shown in \autoref{fig:example8}, +We first load the continuous data and parse the fossil ages as in sections~\ref{load-continuous-data} and~\ref{parse-fosill-age}. Then, in the "Partitions" panel, we also load the Carnivoran molecular sequences via \emph{File \textgreater{}\textgreater{} Import Alignment}. Finally, we add the discrete characters by \emph{File \textgreater{}\textgreater{} Add Morphological Data}. 
As is shown in \autoref{fig:example8}, + +\textcolor{red}{\textbf{ SECTION INCOMPLETE?}} \begin{figure} \centering \includegraphics[width=0.700000\textwidth]{figures/DataPartitions.png} - \caption{Load continuous characters, molecular sequences and discrete characters.} + \caption{Loading continuous characters, molecular sequences and discrete characters.} \label{fig:example8} \end{figure} -\subsubsection{Set the Shrinkage Model} -In the "Shrinkage Model" panel, we will need to fill in three components of the model. First, the shrinkage parameter is given by a constant value in the box to the right of "Delta". Second, the continuous characters from 21 \textit{Vulpes vulpes} individuals are given in the block of "Population Traits". To be more specific, the trait data should be written in one-line data separated by spaces. In addition, the number of trait is given by "Minordimension" and should be consistent with the dimension of the continuous data in ''Partitions" panel. Third, the added individual trait values are not only used for estimating correlations, but also normalizing the continuous data of the 19 carnivoran species. Therefore, we put a $\checkmark$ in the box in front of "Include Pop Var". +\subsubsection{Setting the Shrinkage Model} +In the "Shrinkage Model" panel, we will need to fill in three components of the model. First, the shrinkage parameter is given by a constant value in the box to the right of "Delta". Second, the continuous characters from 21 \textit{Vulpes vulpes} individuals are given in the block of "Population Traits". To be more specific, the trait data should be written in one-line data separated by spaces. In addition, the number of traits is given by "Minordimension" and should be consistent with the dimension of the continuous data in the ''Partitions" panel. Third, the added individual trait values are not only used for estimating correlations, but also normalizing the continuous data of the 19 carnivoran species. 
Therefore, we put a $\checkmark$ in the box in front of "Include Pop Var" (\autoref{fig:example9}). + \begin{figure}[!htbp] \centering \includegraphics[width=0.700000\textwidth]{figures/ShrinkageModel.png} - \caption{Set the shrinkage model.} + \caption{Setting the shrinkage model.} \label{fig:example9} \end{figure} -\subsubsection{Set the Substitution Model} -In the "Site Model" panel, we assume a HKY+Gamma for nucleotide substitutions by specifying 4 categories \autoref{fig:example1}. In addition, we assume Mk models \citep{lewis2001} for discrete characters, as is shown in \autoref{fig:example11}. +\subsubsection{Setting the Substitution Model} +In the "Site Model" panel, we assume an HKY+Gamma model for nucleotide substitutions by specifying 4 categories under "Gamma Category Count" (\autoref{fig:example10}). In addition, we assume Mk models \citep{lewis2001} for the discrete characters, as shown in \autoref{fig:example11}. \begin{figure}[!htbp] \centering \includegraphics[width=0.700000\textwidth]{figures/SiteModelDNA.png} - \caption{Set site models for molecular sequences and discrete characters.} + \caption{Setting site models for molecular sequences.} \label{fig:example10} \end{figure} \begin{figure}[!htbp] \centering \includegraphics[width=0.700000\textwidth]{figures/SiteModelMk.png} - \caption{Set site models for molecular sequences and discrete characters.} + \caption{Setting site models for discrete characters.} \label{fig:example11} \end{figure} -\subsubsection{Set the Clock model} -Similar to what is mentioned in section \autoref{bm-clock-model}, we assume relaxed clock model for each data partition. The specifications are shown in \autoref{fig:example12}. +\subsubsection{Setting the Clock model} +Similar to section~\ref{bm-clock-model}, we assume a relaxed clock model for each data partition. The specifications are shown in \autoref{fig:example12}. 
\begin{figure}[!htbp] \centering \includegraphics[width=0.700000\textwidth]{figures/ClockModel.png} - \caption{Set the initial clock models for continuous data, molecular data and discrete data.} + \caption{Setting the clock models for continuous data, molecular data and discrete data.} \label{fig:example12} \end{figure} -\subsubsection{Specify the priors} +\subsubsection{Specifying the priors} -First, we select ''Fossilized Birth Death Model" from the drop-down menu and set it as the tree prior. Then we also keep the default priors for the rest of the parameters \autoref{fig:example13}. +First, we select ''Fossilized Birth Death Model" from the drop-down menu and set it as the tree prior. Then we again retain the default priors for the rest of the parameters (\autoref{fig:example13}). \begin{figure}[!htbp] \centering \includegraphics[width=0.700000\textwidth]{figures/Priors.png} - \caption{Set up tree model and the prior distributions.} + \caption{Setting up tree model and the prior distributions.} \label{fig:example13} \end{figure} -\subsubsection{Specify the MCMC chain length (MCMC)}\label{specify-the-mcmc-chain-length-mcmc} +\subsubsection{Specifying the MCMC chain length (MCMC)}\label{specify-the-mcmc-chain-length-mcmc} % Here we can set the length of the MCMC chain and after how many iterations the parameter and trees a logged. For this dataset, 2 million iterations should be sufficient. In order to have enough samples but not create too large files, we can set the logEvery to 2000, so we have 1001 -samples overall. Next, we have to save the \lstinline!*.xml! file under +samples overall. Next, we have to save the \texttt{*.xml} file under \emph{File \textgreater{}\textgreater{} Save as}. -\subsubsection{Run the Analysis using BEAST2}\label{run-the-analysis-using-beast2} + +\subsection{Running the Analysis using BEAST2}\label{run-the-analysis-using-beast2} % -Run the \lstinline!*.xml! using BEAST2 or use finished runs from the -\emph{precooked-runs} folder. 
The analysis should take about 6 to 7 +Run the \texttt{*.xml} file using BEAST2. The analysis should take about 6 to 7 minutes. % -\subsubsection{Post analysis} +\subsection{Analysing the results} + +For this section either use the output files from your own analysis or use finished runs from the +\emph{precooked-runs} folder. \textcolor{red}{\textbf{ Precooked runs don't exist!}} + \begin{itemize} -\item Inferred clock models for continuous data, discrete data and molecular sequences (\autoref{fig:bm_res6} and \autoref{fig:bm_res7}) +\item Examine the posterior estimates for the inferred clock models for continuous data, discrete data and molecular sequences in Tracer (\autoref{fig:bm_res6} and \autoref{fig:bm_res7}) \begin{figure}[!htbp] \centering \includegraphics[width=0.700000\textwidth]{figures/results/EST_clock_rate.png} @@ -483,24 +518,28 @@ \subsubsection{Post analysis} \label{fig:bm_res7} \end{figure} -\item Make the summary tree using TreeAnnotator (\autoref{fig:bm_res8}) +\item Construct the summary tree using TreeAnnotator (\autoref{fig:bm_res8}) \begin{figure}[!htbp] \centering \includegraphics[width=0.700000\textwidth]{figures/results/EST_tree_height.png} - \caption{Summarised MCC trees estimated by combined data sets.} + \caption{Summarised MCC trees estimated from the combined data sets.} \label{fig:bm_res8} \end{figure} \end{itemize} +% Comment out incomplete section? \section{Errors that can occur (Work in progress)} One of the errors message that can occur regularly is the following: \emph{Infinity likelihood} +% Not necessarily an error, could just be a very unlikely state, or more likely numerical underflow \emph{Negative branch length} +% If this is in the MCC tree, this is not an error, just an unfortunate side effect of summarising a set of posterior trees into one tree. 
\emph{Unequal likelihoods} +% This seems serious and seems like a problem with the implementation of the BEAST2 package %%%%%%%%%%%%%%%%%%%%%%% diff --git a/master-refs.bib b/master-refs.bib index 1931a88..a443ffc 100755 --- a/master-refs.bib +++ b/master-refs.bib @@ -123,16 +123,17 @@ @article{bouckaert2019beast @article{mitov20, author = {Venelin Mitov and Krzysztof Bartoszek and Georgios Asimomitis and Tanja Stadler}, - journal = {{Theor. Popul. Biol.}}, + journal = {Theoretical Population Biology}, pages = {66--78}, publisher = {Elsevier}, title = {Fast likelihood calculation for multivariate {G}aussian phylogenetic models with shifts}, volume = {131}, - year = {2020}} + year = {2020} +} @article{ronquist12, author = {Ronquist, Fredrik and Klopfstein, Seraina and Vilhelmsen, Lars and Schulmeister, Susanne and Murray, Debra L and Rasnitsyn, Alexandr P}, - journal = {{S}yst. {B}iol.}, + journal = {Systematic Biology}, number = {6}, pages = {973--999}, publisher = {Oxford University Press}, @@ -155,7 +156,7 @@ @article{zhang24 @article{alvarez19, author = {Sandra {\'{A}lvarez--Carretero} and Anjali Goswami and Ziheng Yang and Mario {dos Reis}}, - journal = {{Syst. Biol.}}, + journal= {Systematic Biology}, number = {6}, pages = {967--986}, publisher = {Oxford University Press}, @@ -166,7 +167,7 @@ @article{alvarez19 @article{barrett21, author = {Barrett, Paul Z and Hopkins, Samantha SB and Price, Samantha A}, - journal = {{J}. {V}ertebr. {P}aleontol.}, + journal = {J. Vertebr. 
Paleontol.}, number = {1}, pages = {e1923523}, publisher = {Taylor \& Francis}, @@ -178,7 +179,7 @@ @article{barrett21 @article{douglas2021, title={Adaptive dating and fast proposals: {R}evisiting the phylogenetic relaxed clock model}, author={Douglas, Jordan and Zhang, Rong and Bouckaert, Remco}, - journal={{PL}o{S} computational biology}, + journal={PLoS computational biology}, volume={17}, number={2}, pages={e1008322}, @@ -189,7 +190,7 @@ @article{douglas2021 @article{lewis2001, title={A likelihood approach to estimating phylogeny from discrete morphological character data}, author={Lewis, Paul O}, - journal={Systematic biology}, + journal={Systematic Biology}, volume={50}, number={6}, pages={913--925}, @@ -209,5 +210,13 @@ @article{gavryushkina2014 publisher={Public Library of Science San Francisco, USA} } - +@ARTICLE{heath2014, + author = {Heath, Tracy A and Huelsenbeck, John P and Stadler, Tanja}, + title = {The fossilized birth-death process for coherent calibration of divergence-time + estimates}, + journal = PNAS, + year = {2014}, + volume = {111}, + pages = {E2957--E2966} +} diff --git a/xmls/BM.xml b/xml/BM.xml similarity index 100% rename from xmls/BM.xml rename to xml/BM.xml diff --git a/xmls/ShrinkageBM.xml b/xml/ShrinkageBM.xml similarity index 100% rename from xmls/ShrinkageBM.xml rename to xml/ShrinkageBM.xml