Bayesian inference in phylogeny

Bayesian inference of phylogeny combines a likelihood function, computed under a model of evolution, with prior probabilities to produce a quantity called the posterior probability of trees. The tree with the highest posterior probability is taken as the most probable phylogenetic tree for the given data. The Bayesian approach has become popular due to advances in computing speed and the integration of Markov chain Monte Carlo (MCMC) algorithms. Bayesian inference has a number of applications in molecular phylogenetics and systematics.

Bayesian inference of phylogeny background and bases

Bayesian inference refers to a probabilistic method developed by Reverend Thomas Bayes and based on Bayes' theorem. Published posthumously in 1763, Bayes's work was the first expression of inverse probability and the basis of Bayesian inference. Independently, and unaware of Bayes's work, Pierre-Simon Laplace developed Bayes' theorem in 1774.
Bayesian inference was widely used until the 1900s, when there was a shift to frequentist inference, mainly due to computational limitations. Based on Bayes' theorem, the Bayesian approach combines the prior probability of a tree, P(A), with the likelihood of the data, B, to produce a posterior probability distribution on trees, P(A|B). The posterior probability of a tree is the probability that the tree is correct given the data, the prior and the model; the tree with the highest posterior probability is the one chosen to best represent the phylogeny. It was the introduction of Markov chain Monte Carlo (MCMC) methods by Nicholas Metropolis in 1953 that revolutionized Bayesian inference, which by the 1990s had become a widely used method among phylogeneticists. Some of the advantages over traditional maximum parsimony and maximum likelihood methods are the ability to account for phylogenetic uncertainty, the use of prior information and the incorporation of complex models of evolution that limited computational analyses for traditional methods. Although MCMC avoids complex analytical operations, the posterior probability still involves a summation over all trees and, for each tree, integration over all possible combinations of substitution model parameter values and branch lengths.
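As a minimal sketch of how Bayes' theorem turns priors and likelihoods into posterior probabilities of trees, the following Python example normalizes over a toy set of three topologies. The log-likelihood values are hypothetical and are assumed to be already integrated over branch lengths and model parameters; the function name tree_posteriors is an illustration, not part of any phylogenetics package.

```python
import math


def tree_posteriors(log_likelihoods, priors):
    """Return P(tree | data) for every tree in a small, enumerable set.

    log_likelihoods: dict mapping tree -> log P(data | tree)
    priors:          dict mapping tree -> P(tree)
    """
    # Work in log space and subtract the maximum to avoid numerical underflow.
    log_joint = {t: log_likelihoods[t] + math.log(priors[t]) for t in priors}
    shift = max(log_joint.values())
    unnormalized = {t: math.exp(v - shift) for t, v in log_joint.items()}
    # The denominator of Bayes' theorem: a summation over all trees.
    total = sum(unnormalized.values())
    return {t: u / total for t, u in unnormalized.items()}


# Hypothetical log-likelihoods for the three unrooted four-taxon topologies.
log_lik = {
    "((A,B),(C,D))": -1000.0,
    "((A,C),(B,D))": -1002.0,
    "((A,D),(B,C))": -1005.0,
}
prior = {t: 1.0 / 3.0 for t in log_lik}  # flat prior over topologies (assumption)

post = tree_posteriors(log_lik, prior)
best = max(post, key=post.get)  # tree with the highest posterior probability
```

With only three trees the sum in the denominator is trivial; for realistic numbers of taxa the set of trees cannot be enumerated, which is why sampling methods such as MCMC are used instead.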
MCMC methods can be described in three steps. First, a new state for the Markov chain is proposed using a stochastic mechanism. Secondly, the acceptance probability of this new state is calculated. Thirdly, a random number is drawn uniformly from (0, 1); if this value is less than the acceptance probability, the new state is accepted and the state of the chain is updated, otherwise the chain stays in its current state. This process is repeated thousands or millions of times. The proportion of time a single tree is visited during the course of the chain is a valid approximation of its posterior probability. Some of the most common algorithms used in MCMC methods include the Metropolis-Hastings algorithm, Metropolis-coupled MCMC (MC³) and the LOCAL algorithm of Larget and Simon.

Metropolis-Hastings algorithm

One of the most common MCMC methods used is the Metropolis-Hastings algorithm, a modified version of the original Metropolis algorithm. It is a widely used method for sampling randomly from complicated, multi-dimensional probability distributions. The Metropolis algorithm is described in the following steps:

1. An initial tree, T_i, is randomly selected.
2. A neighbour tree, T_j, is selected from the collection of trees.
3. The ratio, R, of the probabilities (or probability density functions) of T_j and T_i is computed as follows: R = f(T_j)/f(T_i)
4. If R ≥ 1, T_j is accepted as the current tree.
5. If R < 1, T_j is accepted as the current tree with probability R, otherwise T_i is kept.
6. At this point the process is repeated from step 2, N times.

The algorithm keeps running until it reaches an equilibrium distribution. It also assumes that the probability of proposing a new tree T_j when we are at the old tree state T_i, is the same probability of proposing T_i when we are at T_j. When this is not the case Hastings corrections are applied.
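The Metropolis steps described above can be sketched in Python as follows. This is a toy illustration under stated assumptions: the three "trees" are just labels, the unnormalized weights in f are hypothetical, and every tree has the same number of neighbours so that the proposal is symmetric and no Hastings correction is needed.

```python
import random
from collections import Counter


def metropolis(f, neighbours, start, n_steps, seed=0):
    """Run a Metropolis chain over a finite set of states.

    f:          dict state -> unnormalized target probability
    neighbours: dict state -> list of states that can be proposed from it
    Returns the fraction of steps spent in each state.
    """
    rng = random.Random(seed)
    current = start
    visits = Counter()
    for _ in range(n_steps):
        proposal = rng.choice(neighbours[current])  # pick a neighbour tree
        ratio = f[proposal] / f[current]            # R = f(T_j) / f(T_i)
        # Accept if R >= 1, otherwise accept with probability R.
        if ratio >= 1 or rng.random() < ratio:
            current = proposal
        visits[current] += 1                        # rejected moves count too
    return {state: visits[state] / n_steps for state in f}


# Hypothetical unnormalized posterior weights for three trees.
f = {"T1": 6.0, "T2": 3.0, "T3": 1.0}
neighbours = {"T1": ["T2", "T3"], "T2": ["T1", "T3"], "T3": ["T1", "T2"]}

freqs = metropolis(f, neighbours, start="T3", n_steps=100_000)
```

The visit frequencies in freqs approximate the normalized weights (0.6, 0.3 and 0.1 here), which is the sense in which the time a chain spends on a tree estimates that tree's posterior probability.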
The aim of the Metropolis-Hastings algorithm is to produce a collection of states with a specified distribution once the Markov process has reached its stationary distribution. The algorithm has two components:

A potential transition from one state to another using a transition probability function q_i,j
Movement of the chain to state j with probability α_i,j, remaining in state i with probability 1 – α_i,j.

Metropolis-coupled MCMC

The Metropolis-coupled MCMC algorithm (MC³) has been proposed to solve a practical concern: the Markov chain may fail to move across peaks when the target distribution has multiple local peaks separated by low valleys, as is known to occur in tree space. This is the case during heuristic tree search under maximum parsimony (MP), maximum likelihood (ML), and minimum evolution (ME) criteria, and the same can be expected for stochastic tree search using MCMC. As a result, the samples do not correctly approximate the posterior density. MC³ improves the mixing of Markov chains in the presence of multiple local peaks in the posterior density. It runs multiple ($m$) chains in parallel, each for $n$ iterations and with different stationary distributions $\pi_j(\cdot)$, $j = 1, 2, \ldots, m$, where the first one, $\pi_1 = \pi$, is the target density, while $\pi_j$, $j = 2, 3, \ldots, m$, are chosen to improve mixing. For example, one can choose incremental heating of the form:
$$\pi_j(\theta) = \pi(\theta)^{1/[1 + \lambda (j - 1)]}, \quad \lambda > 0,$$
so that the first chain is the cold chain with the correct target density, while chains $2, 3, \ldots, m$ are heated chains. Note that raising the density $\pi(\cdot)$ to the power $1/T$ with $T > 1$ has the effect of flattening out the distribution, similar to heating a metal. In such a distribution, it is easier to traverse between peaks (separated by valleys) than in the original distribution. After each iteration, a swap of states between two randomly chosen chains is proposed through a Metropolis-type step. Let $\theta^{(j)}$ be the current state in chain $j$, $j = 1, 2, \ldots, m$. A swap between the states of chains $i$ and $j$ is accepted with probability:
$$\alpha = \min\left(1, \frac{\pi_i(\theta^{(j)})\, \pi_j(\theta^{(i)})}{\pi_i(\theta^{(i)})\, \pi_j(\theta^{(j)})}\right)$$
At the end of the run, output from only the cold chain is used, while the output from the hot chains is discarded. Heuristically, the hot chains will visit the local peaks rather easily, and swapping states between chains will let the cold chain occasionally jump valleys, leading to better mixing. However, if $\pi_i(\theta)/\pi_j(\theta)$ is unstable, proposed swaps will seldom be accepted. This is the reason for using several chains which differ only incrementally.
An obvious disadvantage of the algorithm is that $m$ chains are run and only one chain is used for inference. For this reason, MC³ is ideally suited for implementation on parallel machines, since each chain will in general require the same amount of computation per iteration.
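The heating and swapping scheme can be illustrated with a small Python sketch. It is not a tree sampler: the target is an assumed one-dimensional bimodal density standing in for a posterior with two peaks, and the names log_target and mc3 as well as the values of lam, step and the number of chains are hypothetical choices. Chain j is heated by raising the target to the power 1/(1 + lam * j), with j = 0 as the cold chain.

```python
import math
import random


def log_target(x):
    """Log of an unnormalized bimodal density with peaks near -3 and +3."""
    a = -0.5 * ((x + 3.0) / 0.5) ** 2
    b = -0.5 * ((x - 3.0) / 0.5) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def mc3(n_iter, n_chains=4, lam=1.0, step=1.0, seed=1):
    """Metropolis-coupled MCMC; returns the samples of the cold chain only."""
    rng = random.Random(seed)
    # Heating exponents: 1 for the cold chain, smaller for hotter chains.
    betas = [1.0 / (1.0 + lam * j) for j in range(n_chains)]
    states = [-3.0] * n_chains  # every chain starts in the left peak
    cold_samples = []
    for _ in range(n_iter):
        # Ordinary Metropolis update within each chain on its heated density.
        for j, beta in enumerate(betas):
            proposal = states[j] + rng.uniform(-step, step)
            log_r = beta * (log_target(proposal) - log_target(states[j]))
            if log_r >= 0 or rng.random() < math.exp(log_r):
                states[j] = proposal
        # Propose swapping the states of two randomly chosen chains.
        i, j = rng.sample(range(n_chains), 2)
        log_r = (betas[i] - betas[j]) * (log_target(states[j]) - log_target(states[i]))
        if log_r >= 0 or rng.random() < math.exp(log_r):
            states[i], states[j] = states[j], states[i]
        cold_samples.append(states[0])  # hot chains are discarded
    return cold_samples


samples = mc3(20_000)
```

The swap test is the acceptance ratio for exchanging states written in log form: with heated densities equal to powers of the same target, the ratio reduces to exp((beta_i - beta_j) * (log pi(theta_j) - log pi(theta_i))).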

LOCAL algorithm of Larget and Simon

The LOCAL algorithm offers a computational advantage over previous methods and demonstrates that a Bayesian approach can assess uncertainty in a computationally practical way in larger trees. The LOCAL algorithm is an improvement on the GLOBAL algorithm presented by Mau, Newton and Larget, in which all branch lengths are changed in every cycle. The LOCAL algorithm modifies the tree by selecting an internal branch of the tree at random. The nodes at the ends of this branch are each connected to two other branches. One of each pair is chosen at random. Imagine taking these three selected edges and stringing them like a clothesline from left to right, where the direction (left/right) is also selected at random. Each of the two endpoints of the first branch selected will have a sub-tree hanging from it, like a piece of clothing strung on the line. The algorithm proceeds by multiplying the three selected branches by a common random amount, akin to stretching or shrinking the clothesline. Finally, the leftmost of the two hanging sub-trees is disconnected and reattached to the clothesline at a location selected uniformly at random. The result is the candidate tree.
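Suppose we began by selecting the internal branch with length $t_8$ that separates taxa $A$ and $B$ from the rest. Suppose also that we have (randomly) selected branches with lengths $t_1$ and $t_9$ from each side, and that we oriented these branches. Let $m = t_1 + t_8 + t_9$ be the current length of the clothesline. We select the new length to be $m^\star = m \exp(\lambda (U_1 - 0.5))$, where $U_1$ is a uniform random variable on $(0, 1)$. Then for the LOCAL algorithm, the acceptance probability can be computed to be:
$$\min\left(1, \frac{h(y)}{h(x)} \times \frac{{m^\star}^3}{m^3}\right)$$
where $h(x)$ and $h(y)$ are the unnormalized posterior densities of the current and the candidate tree.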
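The stretching step of the clothesline can be sketched in Python as below. This covers only the rescaling of the three selected branches, not the detachment and reattachment of the sub-tree. It assumes a multiplier of the form exp(lam * (U - 0.5)) with U uniform on (0, 1) and a Hastings factor equal to the cube of the length ratio; the function names and the numeric values are hypothetical.

```python
import math
import random


def stretch_clothesline(branches, lam, rng):
    """Rescale the three selected branch lengths by a common random factor.

    branches: tuple of the three branch lengths forming the clothesline
    lam:      tuning parameter controlling how bold the proposal is
    Returns (new_branches, hastings_factor).
    """
    m = sum(branches)                        # current clothesline length
    u = rng.random()                         # uniform on [0, 1)
    m_star = m * math.exp(lam * (u - 0.5))   # proposed clothesline length
    factor = m_star / m
    new_branches = tuple(b * factor for b in branches)
    # Assumed Hastings factor for this move: (m_star / m) cubed.
    return new_branches, factor ** 3


def acceptance_probability(h_current, h_candidate, hastings_factor):
    """min(1, posterior ratio times Hastings factor)."""
    return min(1.0, (h_candidate / h_current) * hastings_factor)


rng = random.Random(42)
old = (0.10, 0.05, 0.20)                     # hypothetical branch lengths
new, hastings = stretch_clothesline(old, lam=0.5, rng=rng)
```

Because all three branches are multiplied by the same factor, their relative proportions are preserved and only the total length of the clothesline changes.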

Assessing convergence

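To estimate the branch length $t$ of a 2-taxon tree under the Jukes-Cantor (JC) model, in which $n_1$ sites are unvaried and $n_2$ are variable, assume an exponential prior distribution with rate $\lambda$. The density is $p(t) = \lambda e^{-\lambda t}$. The probabilities of the possible site patterns are:
$$\frac{1}{4}\left(\frac{1}{4} + \frac{3}{4} e^{-4t/3}\right)$$
for unvaried sites, and
$$\frac{1}{4}\left(\frac{1}{4} - \frac{1}{4} e^{-4t/3}\right)$$
for variable sites. Thus the unnormalized posterior distribution is:
$$h(t) = \left(\frac{1}{4}\right)^{n_1 + n_2} \left(\frac{1}{4} + \frac{3}{4} e^{-4t/3}\right)^{n_1} \left(\frac{1}{4} - \frac{1}{4} e^{-4t/3}\right)^{n_2} \lambda e^{-\lambda t}$$
or, alternately, dropping the constant factors that do not depend on $t$,
$$h(t) = \left(\frac{1}{4} + \frac{3}{4} e^{-4t/3}\right)^{n_1} \left(\frac{1}{4} - \frac{1}{4} e^{-4t/3}\right)^{n_2} e^{-\lambda t}$$
Update the branch length by choosing a new value uniformly at random from a window of half-width $w$ centered at the current value:
$$t^\star = |t + U|$$
where $U$ is uniformly distributed between $-w$ and $w$. The acceptance probability is:
$$\min\left(1, \frac{h(t^\star)}{h(t)}\right)$$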
Example: $n_1 = 70$, $n_2 = 30$. We will compare results for two values of $w$, $w = 0.1$ and $w = 0.5$. In each case, we will begin with an initial length of $5$ and update the length $2000$ times.
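A sliding-window sampler for this two-taxon problem can be sketched in Python as follows. The Jukes-Cantor site-pattern probabilities and the exponential prior follow the setup described above; the specific numbers used here (70 unvaried and 30 variable sites, prior rate 1, window half-widths 0.1 and 0.5, starting length 5, 2000 updates) are illustrative assumptions, as are the function names.

```python
import math
import random


def log_h(t, n1, n2, lam):
    """Log of the unnormalized posterior of branch length t (constants dropped)."""
    if t <= 0.0:
        return -math.inf
    e = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * e                        # unvaried-site term
    diff = -0.25 * math.expm1(-4.0 * t / 3.0)     # 0.25 - 0.25 * e, kept positive
    return n1 * math.log(same) + n2 * math.log(diff) - lam * t


def sample_branch_length(n1, n2, lam, w, t0, n_updates, seed=0):
    """Metropolis sampler with a uniform window of half-width w, reflected at 0."""
    rng = random.Random(seed)
    t = t0
    chain = [t]
    for _ in range(n_updates):
        t_star = abs(t + rng.uniform(-w, w))      # reflect negative proposals
        log_r = log_h(t_star, n1, n2, lam) - log_h(t, n1, n2, lam)
        if log_r >= 0 or rng.random() < math.exp(log_r):
            t = t_star
        chain.append(t)
    return chain


narrow = sample_branch_length(70, 30, lam=1.0, w=0.1, t0=5.0, n_updates=2000)
wide = sample_branch_length(70, 30, lam=1.0, w=0.5, t0=5.0, n_updates=2000)
wide_tail_mean = sum(wide[-1000:]) / 1000
```

Plotting narrow and wide against the iteration number gives trace plots: a chain that is still drifting away from its starting value has not yet converged, and the window width strongly affects how many updates that drift takes.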

Maximum parsimony and maximum likelihood

There are many approaches to reconstructing phylogenetic trees, each with advantages and disadvantages, and there is no straightforward answer to the question "what is the best method?". Maximum parsimony (MP) and maximum likelihood (ML) are traditional methods widely used for the estimation of phylogenies, and both use character information directly, as Bayesian methods do.
Maximum parsimony recovers one or more optimal trees based on a matrix of discrete characters for a certain group of taxa, and it does not require a model of evolutionary change. MP gives the simplest explanation for a given set of data, reconstructing a phylogenetic tree that includes as few changes across the sequences as possible; that is, the tree that requires the smallest number of evolutionary steps to explain the relationships between taxa. The support of the tree branches is represented by bootstrap percentages. For the same reason that it has been widely used, its simplicity, MP has also received criticism and has been pushed into the background by ML and Bayesian methods. MP presents several problems and limitations. As shown by Felsenstein, MP can be statistically inconsistent, meaning that as more and more data are accumulated, results can converge on an incorrect tree. This leads to long branch attraction, a phylogenetic phenomenon where taxa with long branches appear more closely related in the phylogeny than they really are. For morphological data, recent simulation studies suggest that trees built using parsimony may be less accurate than those built using Bayesian approaches, potentially due to overprecision, although this has been disputed. Studies using novel simulation methods have demonstrated that differences between inference methods result from the search strategy and consensus method employed, rather than the optimization used.
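To make the idea of counting evolutionary steps concrete, here is a minimal Python sketch of Fitch's small-parsimony algorithm, one standard way of computing the minimum number of changes a fixed tree requires. The source does not name a specific algorithm, so this choice, the nested-tuple tree encoding and the toy alignment are all assumptions made for illustration.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one character.

    tree:   a taxon name (leaf) or a pair (left_subtree, right_subtree)
    states: dict taxon -> observed character state at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change is needed.
        return common, left_cost + right_cost
    # The children disagree: one change is required on this part of the tree.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_length(tree, alignment):
    """Sum the minimum number of changes over all sites of an alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        site = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch(tree, site)[1]
    return total


# Hypothetical three-site alignment for four taxa.
alignment = {"A": "AAG", "B": "AAG", "C": "GAT", "D": "GAT"}
tree_ab_cd = (("A", "B"), ("C", "D"))
tree_ac_bd = (("A", "C"), ("B", "D"))

length_ab_cd = parsimony_length(tree_ab_cd, alignment)  # 2 changes
length_ac_bd = parsimony_length(tree_ac_bd, alignment)  # 4 changes
```

Under the parsimony criterion the first tree is preferred because it explains the same data with fewer changes; no branch lengths or substitution model enter the calculation.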
Like maximum parsimony, maximum likelihood evaluates alternative trees. However, it considers the probability of each tree explaining the given data based on a model of evolution, and the tree with the highest probability of explaining the data is chosen over the others. In other words, it compares how well different trees predict the observed data. The introduction of a model of evolution in ML analyses presents an advantage over MP, as the probabilities of nucleotide substitutions and the rates of these substitutions are taken into account, explaining the phylogenetic relationships of taxa in a more realistic way. An important consideration in this method is branch length, which parsimony ignores: changes are more likely to happen along long branches than along short ones. This approach might eliminate long branch attraction and explain the greater consistency of ML over MP. Although considered by many to be the best approach to inferring phylogenies from a theoretical point of view, ML is computationally intensive, and it is almost impossible to explore all trees as there are too many. Bayesian inference also incorporates a model of evolution, and its main advantages over MP and ML are that it is computationally more efficient than traditional methods, it quantifies and addresses the source of uncertainty, and it is able to incorporate complex models of evolution.
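As a small illustration of how a model of evolution and branch length enter a likelihood calculation, the Python sketch below evaluates the Jukes-Cantor log-likelihood of two aligned sequences as a function of the branch length separating them, together with the closed-form maximum likelihood estimate. The two-sequence setting, the function names and the site counts are assumptions chosen to keep the example short; real ML tree searches optimize many branch lengths over many candidate trees.

```python
import math


def jc_log_likelihood(t, n_same, n_diff):
    """Log-likelihood of branch length t under Jukes-Cantor for two sequences.

    n_same: number of sites where the two sequences agree
    n_diff: number of sites where they differ
    """
    e = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)   # probability of one identical-site pattern
    p_diff = 0.25 * (0.25 - 0.25 * e)   # probability of one differing-site pattern
    return n_same * math.log(p_same) + n_diff * math.log(p_diff)


def jc_ml_distance(n_same, n_diff):
    """Closed-form ML estimate of the branch length under Jukes-Cantor."""
    p = n_diff / (n_same + n_diff)      # observed proportion of differing sites
    if p >= 0.75:
        raise ValueError("sequences too divergent for a finite JC estimate")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical data: 70 identical and 30 differing sites.
t_hat = jc_ml_distance(70, 30)
best_ll = jc_log_likelihood(t_hat, 70, 30)
```

The estimate t_hat is larger than the raw proportion of differing sites (0.3) because the model accounts for multiple substitutions at the same site, something a simple count of changes does not do.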

Pitfalls and controversies

Bootstrap values vs Posterior Probabilities. It has been observed that bootstrap support values, calculated under parsimony or maximum likelihood, tend to be lower than the posterior probabilities obtained by Bayesian inference. This fact leads to a number of questions such as: Do posterior probabilities lead to overconfidence in the results? Are bootstrap values more robust than posterior probabilities?
Controversy of using prior probabilities. Using prior probabilities for Bayesian analysis has been seen by many as an advantage, as it gives a hypothesis a more realistic view of the real world. However, some biologists argue that incorporating these priors makes Bayesian posterior probabilities subjective.
Model choice. The results of a Bayesian analysis of a phylogeny depend directly on the model of evolution chosen, so it is important to choose a model that fits the observed data; otherwise, inferences about the phylogeny will be erroneous. Many scientists have raised questions about the interpretation of Bayesian inference when the model is unknown or incorrect. For example, an oversimplified model might give higher posterior probabilities.

MrBayes software

MrBayes is a free software tool that performs Bayesian inference of phylogeny. It was originally written by John P. Huelsenbeck and Fredrik Ronquist in 2001. As Bayesian methods increased in popularity, MrBayes became one of the programs of choice for many molecular phylogeneticists. It is available for Macintosh, Windows, and UNIX operating systems and has a command-line interface. The program uses the standard MCMC algorithm as well as the Metropolis-coupled MCMC variant. MrBayes reads aligned matrices of sequences (DNA or amino acids) in the standard NEXUS format.
MrBayes uses MCMC to approximate the posterior probabilities of trees. The user can change the assumptions of the substitution model, the priors and the details of the MC³ analysis. It also allows the user to remove taxa and characters from the analysis and to add them back. The program uses the most standard model of DNA substitution, the 4x4 model also called JC69, which assumes that changes between nucleotides occur with equal probability. It also implements a number of 20x20 models of amino acid substitution, and codon models of DNA substitution. It offers different methods for relaxing the assumption of equal substitution rates across nucleotide sites. MrBayes is also able to infer ancestral states while accommodating uncertainty in the phylogenetic tree and model parameters.
MrBayes 3 was a completely reorganized and restructured version of the original MrBayes. The main novelty was the ability of the software to accommodate heterogeneity in data sets. This framework allows the user to mix models and take advantage of the efficiency of Bayesian MCMC analysis when dealing with different types of data. It uses Metropolis-coupled MCMC by default.
MrBayes 3.2, a new version of MrBayes, was released in 2012. The new version allows users to run multiple analyses in parallel. It also provides faster likelihood calculations and allows these calculations to be delegated to graphics processing units (GPUs). Version 3.2 provides wider output options compatible with FigTree and other tree viewers.

List of phylogenetics software

This table includes some of the most common phylogenetic software used for inferring phylogenies under a Bayesian framework. Some of them do not use exclusively Bayesian methods.

Name	Description	Method	Author	Website link
Armadillo Workflow Platform	Workflow platform dedicated to phylogenetic and general bioinformatic analysis	Inference of phylogenetic trees using Distance, Maximum Likelihood, Maximum Parsimony, Bayesian methods and related workflows	E. Lord, M. Leclercq, A. Boc, A.B. Diallo and V. Makarenkov	https://web.archive.org/web/20161024081942/http://www.bioinfo.uqam.ca/armadillo/.
Bali-Phy	Simultaneous Bayesian inference of alignment and phylogeny	Bayesian inference, alignment as well as tree search	Suchard MA, Redelings BD	http://www.bali-phy.org
BATWING	Bayesian Analysis of Trees With Internal Node Generation	Bayesian inference, demographic history, population splits	I. J. Wilson, D. Weale, D. Balding	http://www.maths.abdn.ac.uk/~ijw
Bayes Phylogenies	Bayesian inference of trees using Markov Chain Monte Carlo methods	Bayesian inference, multiple models, mixture model	M. Pagel, A. Meade	http://www.evolution.rdg.ac.uk/BayesPhy.html
PhyloBayes / PhyloBayes MPI	Bayesian Monte Carlo Markov Chain sampler for phylogenetic reconstruction.	Non-parametric methods for modeling among-site variation in nucleotide or amino-acid propensities.	N. Lartillot, N. Rodrigue, D. Stubbs, J. Richer	http://www.atgc-montpellier.fr/phylobayes/
BEAST	Bayesian Evolutionary Analysis Sampling Trees	Bayesian inference, relaxed molecular clock, demographic history	A. J. Drummond, A. Rambaut & M. A. Suchard	https://beast.community
BEAST 2	A software platform for Bayesian evolutionary analysis	Bayesian inference, multiple models	R. Bouckaert, J. Heled, D. Kühnert, T. Vaughan, C. H. Wu, D. Xie, M. A. Suchard, A. Rambaut, A. J. Drummond	http://www.beast2.org
BUCKy	Bayesian concordance of gene trees	Bayesian concordance using modified greedy consensus of unrooted quartets	C. Ané, B. Larget, D.A. Baum, S.D. Smith, A. Rokas and B. Larget, S.K. Kotha, C.N. Dewey, C. Ané	http://www.stat.wisc.edu/~ane/bucky/
Geneious	Geneious provides genome and proteome research tools	Neighbor-joining, UPGMA, MrBayes plugin, PHYML plugin, RAxML plugin, FastTree plugin, GARLi plugin, PAUP* plugin	A. J. Drummond, M. Suchard, V. Lefort et al.	http://www.geneious.com
MrBayes	Phylogenetic inference	A program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models.	Zhang, Huelsenbeck, van der Mark, Ronquist & Teslenko	https://nbisweden.github.io/MrBayes/
TOPALi	Phylogenetic inference	Phylogenetic model selection, Bayesian analysis and Maximum Likelihood phylogenetic tree estimation, detection of sites under positive selection, and recombination breakpoint location analysis	I. Milne, D. Lindner, et al.	http://www.topali.org

Applications

Bayesian inference has been used extensively by molecular phylogeneticists for a wide range of applications. Some of these include:

Inference of phylogenies.
Inference and evaluation of uncertainty of phylogenies.
Inference of ancestral character state evolution.
Inference of ancestral areas.
Molecular dating analysis.
Modelling the dynamics of species diversification and extinction.
Elucidating patterns of pathogen dispersal.
