Bayesian Phylogenetics
Bret Larget
The Reverend Thomas Bayes was born in London in 1702.
He was the son of one of the first Nonconformist ministers to be
ordained in England.
Bayes’ Theorem explains how to calculate inverse probabilities. For
example, suppose that Box B1 contains four balls, three of
which are black and one of which is white.
Box B2 has four balls, two of which are black and two of
which are white.
If a ball is chosen uniformly at random from Box
B1, there is a 3/4 chance that it is black.
But if a black ball is drawn, how likely is it that it came from Box
B1?
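The question above can be worked through numerically. This is a minimal sketch that assumes each box is equally likely to be chosen before the draw; the text does not state that prior, so the names `prior` and `likelihood_black` and the 1/2 values are illustrative assumptions.

```python
from fractions import Fraction

# Assumption (not stated in the text): each box is equally likely to be chosen.
prior = {"B1": Fraction(1, 2), "B2": Fraction(1, 2)}

# P(black | box), taken from the box contents described above.
likelihood_black = {"B1": Fraction(3, 4), "B2": Fraction(2, 4)}

# Total probability of drawing a black ball.
p_black = sum(prior[b] * likelihood_black[b] for b in prior)

# Bayes' theorem: P(box | black) = P(black | box) P(box) / P(black).
posterior = {b: prior[b] * likelihood_black[b] / p_black for b in prior}

print(posterior["B1"])  # 3/5
```

Under the equal-prior assumption, a black ball came from Box B1 with probability 3/5. A different prior on the boxes would change this answer.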
Mathematical Background
Bayes’ Theorem
Things are further complicated in that additional parameters such as
branch lengths and likelihood model parameters affect the likelihood,
but are also unknown.
A posterior distribution is a probability distribution on
parameters after data is observed.
Nuisance parameters can be handled in two ways: optimize them, or average over them.
Then we are interested in computing
$$P(\text{tree} \mid \text{data}) = \frac{P(\text{data} \mid \text{tree})\,P(\text{tree})}{P(\text{data})}$$
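When the set of candidate trees is small enough to enumerate, this posterior can be computed by direct normalization. The sketch below assumes that each value of log P(data | tree) is already available, with branch lengths and other parameters dealt with elsewhere. The function name `tree_posterior`, the three topologies, the log-likelihood values, and the uniform prior are all hypothetical choices made for illustration.

```python
import math

def tree_posterior(log_likelihoods, priors):
    """Return P(tree | data) for each tree in a finite candidate set."""
    # Unnormalized log posterior: log P(data | tree) + log P(tree).
    log_unnorm = {t: log_likelihoods[t] + math.log(priors[t]) for t in priors}
    # Subtract the maximum before exponentiating to avoid underflow.
    m = max(log_unnorm.values())
    weights = {t: math.exp(v - m) for t, v in log_unnorm.items()}
    total = sum(weights.values())  # proportional to P(data)
    return {t: w / total for t, w in weights.items()}

# Hypothetical log-likelihoods for the three rooted topologies on three taxa.
log_lik = {"((A,B),C)": -1000.0, "((A,C),B)": -1002.0, "((B,C),A)": -1005.0}
prior = {t: 1.0 / 3.0 for t in log_lik}  # uniform prior, an assumption
post = tree_posterior(log_lik, prior)

for tree, p in post.items():
    print(tree, round(p, 4))
```

For realistic numbers of taxa the set of trees is far too large to enumerate, so P(data) cannot be computed this way.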
Bayesian Phylogenetic Methods
Metropolis-Hastings Example
Example
What is Markov Chain Monte Carlo?
Metropolis-Hastings is a form of MCMC that uses any Markov chain to
propose the next item to sample, but rejects proposals with a
specified probability.
If rejected, set x_{i+1} = x_i.
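A single update of this kind might be sketched as follows. The sketch assumes a symmetric proposal, so that the acceptance probability reduces to min(1, h(x*) / h(x)) for an unnormalized target h. The general Metropolis-Hastings ratio also includes the proposal densities, which are omitted here. The names `mh_step`, `h`, and `propose` are hypothetical.

```python
import random

def mh_step(x, h, propose, rng):
    """One Metropolis-Hastings update, assuming a symmetric proposal.

    h is the target known up to a constant and must be positive at x.
    Returns the next state and whether the proposal was accepted.
    """
    x_star = propose(x, rng)
    # With a symmetric proposal the acceptance probability is min(1, h(x*) / h(x)).
    accept_prob = min(1.0, h(x_star) / h(x))
    if rng.random() < accept_prob:
        return x_star, True
    # If rejected, the next state is the current state again.
    return x, False

rng = random.Random(0)
state, accepted = mh_step(0.0, lambda t: 1.0, lambda t, r: t + r.uniform(-1, 1), rng)
```

Because only the ratio of h values is used, any normalizing constant cancels.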
We have a function h(θ) from which we want to
sample.
We only need to know h up to a normalizing constant.
We begin the Markov chain at a single point.
We evaluate the value of h at this point.
Current state θ; Proposed state θ∗
This proposal is accepted.
Second Proposal
Accept with probability 0.153
Third Proposal
The proposal was rejected, so the current state is sampled
again and remains current.
Sample So Far
Repeat this for 10,000 proposals and show the sample.
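A run of 10,000 proposals can be sketched in a few lines. The target used in the slides is not recoverable from the text, so this sketch substitutes a hypothetical h(θ) = exp(-θ²/2), a standard normal density without its normalizing constant, and a uniform random-walk proposal. The step size, starting point, and seed are arbitrary assumptions.

```python
import math
import random
import statistics

def h(theta):
    # Hypothetical target: standard normal density without its normalizing constant.
    return math.exp(-0.5 * theta * theta)

def metropolis(h, theta0, n_proposals, step=1.0, seed=1):
    rng = random.Random(seed)
    theta = theta0
    sample = []
    accepted = 0
    for _ in range(n_proposals):
        # Symmetric random-walk proposal: uniform within +/- step of the current state.
        theta_star = theta + rng.uniform(-step, step)
        if rng.random() < min(1.0, h(theta_star) / h(theta)):
            theta = theta_star
            accepted += 1
        # The current state is recorded whether or not the proposal was accepted.
        sample.append(theta)
    return sample, accepted / n_proposals

sample, rate = metropolis(h, theta0=1.0, n_proposals=10_000)
print(len(sample), round(rate, 2))
print(statistics.mean(sample), statistics.median(sample))
```

For this assumed target the true mean and median are both 0, so the two printed summaries give a rough check on the sample.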
Large Sample

distribution at all: almost any type of proposal method would have worked.
▶ The sample mean converges to the mean of the target.
▶ The sample median converges to the median of the target.
The model parameters for a Bayesian phylogenetics analysis typically
include:
▶ a tree (topology and branch lengths);
▶ substitution process parameters.
Cautions