Lecture slides on Estimating Species Divergence Times in RevBayes (https://github.com/revbayes/revbayes).
By Tracy Heath and Tanja Stadler
** this version was taught at the 2014 NESCent Academy Course: Phylogenetic analysis using RevBayes, 8/27/2014 (https://www.nescent.org/sites/academy/Phylogenetic_analysis_using_RevBayes) **
OECD bibliometric indicators: Selected highlights, April 2024
Divergence-time estimation in RevBayes
1. ESTIMATING SPECIES DIVERGENCE TIMES
Tracy A. Heath
University of Kansas
University of California, Berkeley
Iowa State University (in 2015)
Tanja Stadler
ETH Zürich
2014 Phylogenetic Analysis in RevBayes
NESCent Academy
2. A TIME-SCALE FOR EVOLUTION
Phylogenetic trees can provide both topological information
and temporal information
Primates
Artiodactyla
Carnivora
0.2 expected
substitutions/site
Cetacea
Simiiformes
Microcebus
Hippo
Homininae
Phylogenetic Relationships Sequence
Fossil
Data
Calibrations
Did Simiiformes experience
accelerated rates of molecular
evolution?
What is the age of the MRCA of
mouse lemurs (Microcebus)?
Equus
Rhinoceros
Bos
Hippopotamus
Balaenoptera
Physeter
Ursus
Canis
Felis
Homo
Pan
Gorilla
Pongo
Macaca
Callithrix
Loris
Galago
Daubentonia
Varecia
Eulemur
Lemur
Hapalemur
Propithecus
Lepilemur
Cheirogaleus
Mirza
M. murinus
M. griseorufus
M. myoxinus
M. berthae
M. rufus1
M. tavaratra
M. rufus2
M. sambiranensis
M. ravelobensis
100 80.0 60.0 40.0 20.0 0.0
Simiiformes
Microcebus
Cretaceous Paleogene Neogene Q
Time (Millions of years)
Understanding Evolutionary Processes (Yang & Yoder Syst. Biol. 2003)
3. A TIME-SCALE FOR EVOLUTION
Phylogenetic divergence-time
estimation
• What was the spacial and
climatic environment of ancient
angiosperms?
• Did the uplift of the Patagonian
Andes drive the diversity of
Peruvian lilies?
• How has mammalian body-size
changed over time?
• Is diversification in Caribbean
anoles correlated with ecological
opportunity?
• How has the rate of molecular
evolution changed across the
Tree of Life?
Historical biogeography Molecular evolution
(Antonelli & Sanmartin. Syst. Biol. 2011)
(Nabholz, Glemin, Galtier. MBE 2008)
Trait evolution
Diversication
Anolis fowleri (image by L. Mahler)
(Lartillot & Delsuc. Evolution 2012) (Mahler, Revell, Glor, & Losos. Evolution 2010)
Understanding Evolutionary Processes
4. THE DYNAMICS OF INFECTIOUS DISEASES
Divergence time estimation
of rapidly evolving
pathogens provide
information about spatial
and temporal dynamics of
infectious diseases
Sequences sampled at
different time horizons
impose a temporal structure
on the tree by providing
ages for non-contemporaneous
tips
(Pybus & Rambaut. 2009. Nature Reviews Genetics.)
5. DIVERGENCE TIME ESTIMATION
Goal: Estimate the ages of interior nodes to understand the
timing and rates of evolutionary processes
Model how rates are
distributed across the tree
Describe the distribution of
speciation events over time
External calibration
information for estimates of
absolute node times
calibrated node
Equus
Rhinoceros
Bos
Hippopotamus
Balaenoptera
Physeter
Ursus
Canis
Felis
Homo
Pan
Gorilla
Pongo
Macaca
Callithrix
Loris
Galago
Daubentonia
Varecia
Eulemur
Lemur
Hapalemur
Propithecus
Lepilemur
Cheirogaleus
Mirza
M. murinus
M. griseorufus
M. myoxinus
M. berthae
M. rufus1
M. tavaratra
M. rufus2
M. sambiranensis
M. ravelobensis
100 80.0 60.0 40.0 20.0 0.0
Simiiformes
Microcebus
Cretaceous Paleogene Neogene Q
Time (Millions of years)
6. A TIME-SCALE FOR EVOLUTION
Phylogenetic trees can provide both topological information
and temporal information
Equus
Rhinoceros
Bos
Hippopotamus
Balaenoptera
Physeter
Ursus
Canis
Felis
Homo
Pan
Gorilla
Pongo
Macaca
Callithrix
Loris
Galago
Daubentonia
Varecia
Eulemur
Lemur
Hapalemur
Propithecus
Lepilemur
Cheirogaleus
Mirza
M. murinus
M. griseorufus
M. myoxinus
M. berthae
M. rufus1
M. tavaratra
M. rufus2
M. sambiranensis
M. ravelobensis
100 80.0 60.0 40.0 20.0 0.0
Simiiformes
Microcebus
Cretaceous Paleogene Neogene Q
Time (Millions of years)
Understanding Evolutionary Processes (Yang & Yoder Syst. Biol. 2003; Heath et al. MBE 2012)
7. THE GLOBAL MOLECULAR CLOCK
Assume that the rate of
evolutionary change is
constant over time
A B C
(branch lengths equal
percent sequence
divergence) 10%
400 My
200 My
20%
10%
10%
(Based on slides by Jeff Thorne; http://statgen.ncsu.edu/thorne/compmolevo.html)
8. THE GLOBAL MOLECULAR CLOCK
A B C
We can date the tree if we
10%
know the rate of change is
1% divergence per 10 My 20%
N
10%
10%
200 My
400 My
200 My
(Based on slides by Jeff Thorne; http://statgen.ncsu.edu/thorne/compmolevo.html)
9. THE GLOBAL MOLECULAR CLOCK
If we found a fossil of the
MRCA of B and C, we can
use it to calculate the rate
of change & date the root
of the tree
A B C
N
20%
10%
10%
10%
200 My
400 My
(Based on slides by Jeff Thorne; http://statgen.ncsu.edu/thorne/compmolevo.html)
10. REJECTING THE GLOBAL MOLECULAR CLOCK
Rates of evolution vary across lineages and over time
Mutation rate:
Variation in
• metabolic rate
• generation time
• DNA repair
Fixation rate:
Variation in
• strength and targets of
selection
• population sizes
A B C
10%
400 My
200 My
20%
10%
10%
11. UNCONSTRAINED ANALYSIS
Sequence data provide
information about branch
lengths
In units of the expected # of
substitutions per site
branch length = rate × time
0.2 expected
substitutions/site
Phylogenetic Relationships Sequence
Data
12. PHYLOGENETIC LIKELIHOOD
f (D | V, θs, )
V Vector of branch lengths
θs Sequence model parameters
D Sequence data
Tree topology
13. RATE AND TIME
The expected # of substitutions/site occurring along a
branch is the product of the substitution rate and time
length = rate × time length = rate length = time
Methods for dating species divergences estimate the
substitution rate and time separately
14. RATES AND TIMES
The sequence data
provide information
about branch length
for any possible rate,
there’s a time that fits
the branch length
perfectly
5
4
3
2
1
0
branch length = 0.5
time = 0.8
rate = 0.625
0 1 2 3 4 5
Branch Rate
Branch Time
(based on Thorne & Kishino, 2005)
15. BAYESIAN DIVERGENCE TIME ESTIMATION
length = rate length = time
R = (r1, r2, r3, . . . , r2N−2)
A = (a1, a2, a3, . . . , aN−1)
N = number of tips
16. BAYESIAN DIVERGENCE TIME ESTIMATION
length = rate length = time
R = (r1, r2, r3, . . . , r2N−2)
A = (a1, a2, a3, . . . , aN−1)
N = number of tips
17. BAYESIAN DIVERGENCE TIME ESTIMATION
Posterior probability
f (R,A, θR, θA, θs | D, )
R Vector of rates on branches
A Vector of internal node ages
θR, θA, θs Model parameters
D Sequence data
Tree topology
18. BAYESIAN DIVERGENCE TIME ESTIMATION
f(R,A, θR, θA, θs | D) =
f (D | R,A, θs) f(R | θR) f(A | θA) f(θs)
f(D)
f(D | R,A, θR, θA, θs) Likelihood
f(R | θR) Prior on rates
f(A | θA) Prior on node ages
f(θs) Prior on substitution parameters
f(D) Marginal probability of the data
19. BAYESIAN DIVERGENCE TIME ESTIMATION
Estimating divergence times relies on 2 main elements:
• Branch-specific rates: f (R | θR)
• Node ages: f (A | θA, C)
20. MODELING RATE VARIATION
Some models describing lineage-specific substitution rate
variation:
• Global molecular clock (Zuckerkandl & Pauling, 1962)
• Local molecular clocks (Hasegawa, Kishino & Yano 1989;
Kishino & Hasegawa 1990; Yoder & Yang 2000; Yang & Yoder
2003, Drummond and Suchard 2010)
• Punctuated rate change model (Huelsenbeck, Larget and
Swofford 2000)
• Log-normally distributed autocorrelated rates (Thorne,
Kishino & Painter 1998; Kishino, Thorne & Bruno 2001; Thorne &
Kishino 2002)
• Uncorrelated/independent rates models (Drummond et al.
2006; Rannala & Yang 2007; Lepage et al. 2007)
• Mixture models on branch rates (Heath, Holder, Huelsenbeck
2012)
Models of Lineage-specific Rate Variation
21. GLOBAL MOLECULAR CLOCK
The substitution rate is
constant over time
All lineages share the same
rate
branch length = substitution rate
low high
Models of Lineage-specific Rate Variation (Zuckerkandl & Pauling, 1962)
22. GLOBAL MOLECULAR CLOCK
rate prior distribution
rate
density
r
Models of Lineage-specific Rate Variation (Zuckerkandl & Pauling, 1962)
23. GLOBAL MOLECULAR CLOCK
The sampled rate is applied
to every branch in the tree
rate prior distribution
rate
density
r
Models of Lineage-specific Rate Variation (Zuckerkandl & Pauling, 1962)
24. REJECTING THE GLOBAL MOLECULAR CLOCK
Rates of evolution vary across lineages and over time
Mutation rate:
Variation in
• metabolic rate
• generation time
• DNA repair
Fixation rate:
Variation in
• strength and targets of
selection
• population sizes
A B C
10%
400 My
200 My
20%
10%
10%
25. RELAXED-CLOCK MODELS
To accommodate variation in substitution rates
‘relaxed-clock’ models estimate lineage-specific substitution
rates
• Local molecular clocks
• Punctuated rate change model
• Log-normally distributed autocorrelated rates
• Uncorrelated/independent rates models
• Mixture models on branch rates
26. LOCAL MOLECULAR CLOCKS
Rate shifts occur
infrequently over the tree
Closely related lineages
have equivalent rates
(clustered by sub-clades)
branch length = substitution rate
low high
Models of Lineage-specific Rate Variation (Yang & Yoder 2003, Drummond and Suchard 2010)
27. LOCAL MOLECULAR CLOCKS
Most methods for
estimating local clocks
required specifying the
number and locations of
rate changes a priori
Drummond and Suchard
(2010) introduced a
Bayesian method that
samples over a broad range
of possible random local
clocks
branch length = substitution rate
low high
Models of Lineage-specific Rate Variation (Yang & Yoder 2003, Drummond and Suchard 2010)
28. AUTOCORRELATED RATES
Substitution rates evolve
gradually over time –
closely related lineages have
similar rates
The rate at a node is
drawn from a lognormal
distribution with a mean
equal to the parent rate
branch length = substitution rate
low high
Models of Lineage-specific Rate Variation (Thorne, Kishino & Painter 1998; Kishino, Thorne & Bruno 2001)
29. AUTOCORRELATED RATES
Models of Lineage-specific Rate Variation (Thorne, Kishino & Painter 1998; Kishino, Thorne & Bruno 2001)
30. PUNCTUATED RATE CHANGE
Rate changes occur along
lineages according to a
point process
At rate-change events, the
new rate is a product of
the parent’s rate and a
-distributed multiplier
branch length = substitution rate
low high
Models of Lineage-specific Rate Variation (Huelsenbeck, Larget and Swofford 2000)
31. INDEPENDENT/UNCORRELATED RATES
Lineage-specific rates are
uncorrelated when the rate
assigned to each branch is
independently drawn from
an underlying distribution
branch length = substitution rate
low high
Models of Lineage-specific Rate Variation (Drummond et al. 2006)
33. INFINITE MIXTURE MODEL
Dirichlet process prior:
Branches are partitioned
into distinct rate categories
branch length = substitution rate
c
5
c
4
c
3
c
2
c
1
substitution rate classes
Models of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE)
34. THE DIRICHLET PROCESS PRIOR (DPP)
A stochastic process that models data as a mixture of
distributions and can identify latent classes present in the
data
Branches are assumed to
form distinct substitution
rate clusters
Efficient Markov chain
Monte Carlo (MCMC)
implementations allow for
inference under this model
branch length = substitution rate
c
5
c
4
c
3
c
2
c
1
substitution rate classes
DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)
35. THE DIRICHLET PROCESS PRIOR (DPP)
A stochastic process that models data as a mixture of
distributions and can identify latent classes present in the
data
Random variables under the
DPP informed by the data:
• the number of rate
classes
• the assignment of
branches to classes
• the rate value for each
class
branch length = substitution rate
c
5
c
4
c
3
c
2
c
1
substitution rate classes
DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)
36. THE DIRICHLET PROCESS PRIOR (DPP)
A stochastic process that models data as a mixture of
distributions and can identify latent classes present in the
data
Random variables under the
DPP informed by the data:
• the number of rate
classes
• the assignment of
branches to classes
• the rate value for each
class
DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)
37. DPP FOR CLUSTERING PROBLEMS
Global molecular
clock
branch length = substitution rate
18
c
1
rate classes
DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)
38. DPP FOR CLUSTERING PROBLEMS
Independent
rates
branch length = substitution rate
1
c
1
1
c
18
1
c
3
1
c
2
rate classes
DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)
39. DPP FOR CLUSTERING PROBLEMS
Local molecular
clock
branch length = substitution rate
5
c
3
5
c
2
8
c
1
rate classes
(262,142 different local molecular clocks)
DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)
40. DPP FOR CLUSTERING PROBLEMS
Global molecular
clock
branch length = substitution rate
4
c
3
6
c
2
8
c
1
rate classes
Each of the
682,076,806,159
configurations
has a prior weight
DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)
41. MODELING RATE VARIATION
These are only a subset of the available models for
branch-rate variation
• Global molecular clock
• Local molecular clocks
• Punctuated rate change model
• Log-normally distributed autocorrelated rates
• Uncorrelated/independent rates models
• Dirchlet process prior
Models of Lineage-specific Rate Variation
42. MODELING RATE VARIATION
Are our models appropriate across all data sets?
sloth bear
brown bear
polar bear
cave bear
Asian
black bear
American
black bear
sun bear
American giant
short-faced bear
spectacled
bear
giant panda
harbor seal
100•100•100•1.00
4.08
5.39
t9
99•97•94•1.00
5.66
12.86
2.75
5.05
19.09
MP•MLu•MLp•Bayesian
35.7
0.88
[3.11–5.27]
4.58
[4.26–7.34]
[9.77–16.58]
[3.9–6.48]
[0.66–1.17]
[4.2–6.86]
[2.1–3.57]
[14.38–24.79]
[3.51–5.89]
mean age (Ma)
14.32
[9.77–16.58]
95% CI
t2
t3
t4
t6
t7
t5
t8
t10
tx
node
100•100•100•1.00
100•100•100•1.00
85•93•93•1.00
76•94•97•1.00
100•100•100•1.00
100•100•100•1.00
t1
Eocene Oligocene Miocene Plio Plei Hol
34 23.8 5.3 1.8 0.01
Epochs
Ma
Global expansion of C4 biomass
Major temperature drop and increasing seasonality
Faunal turnover
Krause et al., 2008. Mitochondrial genomes reveal an
explosive radiation of extinct and extant bears near the
Miocene-Pliocene boundary. BMC Evol. Biol. 8.
Taxa
1
5
10
50
100
500
1000
5000
10000
20000
300 200 100 0
MYA
Polypteriformes
Chondrostei
Holostei
Elopomorpha
Osteoglossomorpha
Clupeomorpha
Denticipidae
Gonorynchiformes
Cypriniformes
Gymnotiformes
Siluriformes
Characiformes
Esociformes
Salmoniformes
Galaxiiformes
Osmeriformes
Stomiiformes
Argentiniformes
Myctophiformes
Aulopiformes
Percopsif. + Gadiif.
Polymixiiformes
Zeiforms
Lampriformes
Beryciformes
Percomorpha
Ophidiiformes
Clade r ε ΔAIC
1. 0.041 0.0017 25.3
2. 0.081 * 25.5
3. 0.067 0.37 45.1
4. 0 * 3.1
Bg. 0.011 0.0011
Ostariophysi Acanthomorpha
Teleostei
Santini et al., 2009. Did genome duplication drive the origin
of teleosts? A comparative study of diversification in
ray-finned fishes. BMC Evol. Biol. 9.
43. MODELING RATE VARIATION
These are only a subset of the available models for
branch-rate variation
• Global molecular clock
• Local molecular clocks
• Punctuated rate change model
• Log-normally distributed autocorrelated rates
• Uncorrelated/independent rates models
• Dirchlet process prior
Model selection and model uncertainty are very important
for Bayesian divergence time analysis
Models of Lineage-specific Rate Variation
44. BAYESIAN DIVERGENCE TIME ESTIMATION
Estimating divergence times relies on 2 main elements:
• Branch-specific rates: f (R | θR)
• Node ages: f (A | θA, C)
http://bayesiancook.blogspot.com/2013/12/two-sides-of-same-coin.html
45. PRIORS ON NODE TIMES
Sequence data are only informative on relative rates times
Node-time priors cannot give precise estimates of absolute
node ages
We need external information (like fossils) to calibrate or
scale the tree to absolute time
Node Age Priors
46. CALIBRATING DIVERGENCE TIMES
Fossils (or other data) are necessary to estimate absolute
node ages
There is no information in
the sequence data for
absolute time
Uncertainty in the
placement of fossils
A B C
N
20%
10%
10%
10%
200 My
400 My
47. CALIBRATION DENSITIES
Bayesian inference is well suited to accommodating
uncertainty in the age of the calibration node
Divergence times are
calibrated by placing
parametric densities on
internal nodes offset by age
estimates from the fossil
record
A B C
N
200 My
Density
Age
48. FOSSIL CALIBRATION
Fossil and geological data
can be used to estimate the
absolute ages of ancient
divergences
Time (My)
Calibrating Divergence Times
49. FOSSIL CALIBRATION
The ages of extant taxa
are known
Time (My)
Calibrating Divergence Times
50. FOSSIL CALIBRATION
Fossil taxa are assigned to
monophyletic clades
Minimum age Time (My)
Calibrating Divergence Times Notogoneus osculus (Grande Grande J. Paleont. 2008)
51. FOSSIL CALIBRATION
Fossil taxa are assigned to
monophyletic clades and
constrain the age of the
MRCA
Minimum age Time (My)
Calibrating Divergence Times
52. ASSIGNING FOSSILS TO CLADES
Misplaced fossils can affect node age estimates throughout
the tree – if the fossil is older than its presumed MRCA
Calibrating the Tree (figure from Benton Donoghue Mol. Biol. Evol. 2007)
53. ASSIGNING FOSSILS TO CLADES
Crown clade: all
living species and
their most-recent
common ancestor
(MRCA)
Calibrating the Tree (figure from Benton Donoghue Mol. Biol. Evol. 2007)
54. ASSIGNING FOSSILS TO CLADES
Stem lineages:
purely fossil forms
that are closer to
their descendant
crown clade than
any other crown
clade
Calibrating the Tree (figure from Benton Donoghue Mol. Biol. Evol. 2007)
55. ASSIGNING FOSSILS TO CLADES
Fossiliferous
horizons: the
sources in the
rock record for
relevant fossils
Calibrating the Tree (figure from Benton Donoghue Mol. Biol. Evol. 2007)
56. FOSSIL CALIBRATION
Age estimates from fossils
can provide minimum time
constraints for internal
nodes
Reliable maximum bounds
are typically unavailable
Minimum age Time (My)
Calibrating Divergence Times
57. PRIOR DENSITIES ON CALIBRATED NODES
Parametric distributions are
typically off-set by the age
of the oldest fossil assigned
to a clade
These prior densities do not
(necessarily) require
specification of maximum
bounds
Uniform (min, max)
Log Normal (μ, s2)
Gamma (a, b)
Exponential (l)
Minimum age Time (My)
Calibrating Divergence Times
58. PRIOR DENSITIES ON CALIBRATED NODES
Describe the waiting time
between the divergence
event and the age of the
oldest fossil
Minimum age Time (My)
Calibrating Divergence Times
59. PRIOR DENSITIES ON CALIBRATED NODES
Overly informative priors
can bias node age
estimates to be too young
Minimum age
Exponential (l)
Time (My)
Calibrating Divergence Times
60. PRIOR DENSITIES ON CALIBRATED NODES
Uncertainty in the age of
the MRCA of the clade
relative to the age of the
fossil may be better
captured by vague prior
densities
Minimum age
Exponential (l)
Time (My)
Calibrating Divergence Times
61. PUT A PRIOR ON IT
Hyperprior: place an higher-order prior on the parameter of
a prior distribution
Sample the time from
the MRCA to the
fossil from a mixture
of different
exponential
distributions
Account for
uncertainty in values
of λ
Density
Hyperprior
Hyperparameter
Density
Prior
Parameter
63. PRIOR DENSITIES ON CALIBRATED NODES
Common practice in Bayesian divergence-time estimation:
Estimates of absolute node
ages are driven primarily by
the calibration density
Specifying appropriate
densities is a challenge for
most molecular biologists
Uniform (min, max)
Log Normal (μ, s2)
Gamma (a, b)
Exponential (l)
Minimum age Time (My)
Calibration Density Approach
64. IMPROVING FOSSIL CALIBRATION
We would prefer to
eliminate the need for
ad hoc calibration
prior densities
Calibration densities
do not account for
diversification of fossils
Domestic dog
Spotted seal
Zaragocyon daamsi
Ballusia elmensis
Ursavus brevihinus
Ursavus primaevus
Giant panda
Ailurarctos lufengensis
Agriarctos spp.
Kretzoiarctos beatrix
Indarctos vireti
Indarctos arctoides
Indarctos punjabiensis
Spectacled bear
Giant short-faced bear
Sloth bear
Brown bear
Polar bear
Cave bear
Sun bear
Am. black bear
Asian black bear
Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)
65. IMPROVING FOSSIL CALIBRATION
We want to use all
of the available fossils
Example: Bears
12 fossils are reduced
to 4 calibration ages
with calibration density
methods
Domestic dog
Spotted seal
Zaragocyon daamsi
Ballusia elmensis
Ursavus brevihinus
Ursavus primaevus
Giant panda
Ailurarctos lufengensis
Agriarctos spp.
Kretzoiarctos beatrix
Indarctos vireti
Indarctos arctoides
Indarctos punjabiensis
Spectacled bear
Giant short-faced bear
Sloth bear
Brown bear
Polar bear
Cave bear
Sun bear
Am. black bear
Asian black bear
Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)
66. IMPROVING FOSSIL CALIBRATION
We want to use all
of the available fossils
Example: Bears
12 fossils are reduced
to 4 calibration ages
with calibration density
methods
Domestic dog
Spotted seal
Zaragocyon daamsi
Ballusia elmensis
Ursavus brevihinus
Ursavus primaevus
Giant panda
Ailurarctos lufengensis
Agriarctos spp.
Kretzoiarctos beatrix
Indarctos vireti
Indarctos arctoides
Indarctos punjabiensis
Spectacled bear
Giant short-faced bear
Sloth bear
Brown bear
Polar bear
Cave bear
Sun bear
Am. black bear
Asian black bear
Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)
67. IMPROVING FOSSIL CALIBRATION
Because fossils are
part of the
diversification process,
we can combine fossil
calibration with
birth-death models
Domestic dog
Spotted seal
Zaragocyon daamsi
Ballusia elmensis
Ursavus brevihinus
Ursavus primaevus
Giant panda
Ailurarctos lufengensis
Agriarctos spp.
Kretzoiarctos beatrix
Indarctos vireti
Indarctos arctoides
Indarctos punjabiensis
Spectacled bear
Giant short-faced bear
Sloth bear
Brown bear
Polar bear
Cave bear
Sun bear
Am. black bear
Asian black bear
Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)
68. IMPROVING FOSSIL CALIBRATION
This relies on a
branching model that
accounts for
speciation, extinction,
and rates of
fossilization,
preservation, and
recovery
Domestic dog
Spotted seal
Zaragocyon daamsi
Ballusia elmensis
Ursavus brevihinus
Ursavus primaevus
Giant panda
Ailurarctos lufengensis
Agriarctos spp.
Kretzoiarctos beatrix
Indarctos vireti
Indarctos arctoides
Indarctos punjabiensis
Spectacled bear
Giant short-faced bear
Sloth bear
Brown bear
Polar bear
Cave bear
Sun bear
Am. black bear
Asian black bear
Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)
69. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
Improving statistical inference of absolute node ages
Eliminates the need to specify arbitrary
calibration densities
Better capture our statistical
uncertainty in species divergence dates
All reliable fossils associated with a
clade are used 150 100 50 0
Time
Heath, Huelsenbeck, Stadler. 2014. The fossilized birth-death process for
coherent calibration of divergence time estimates. PNAS 111(29):E2957–E2966.
http://www.pnas.org/content/111/29/E2957.abstract
70. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
Recovered fossil specimens
provide historical
observations of the
diversification process that
generated the tree of
extant species
150 100 50 0
Time
Diversification of Fossil Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
71. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
The probability of the tree
and fossil observations
under a birth-death model
with rate parameters:
λ = speciation
μ = extinction
ψ = fossilization/recovery
150 100 50 0
Time
Diversification of Fossil Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
72. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
We assume that the fossil
is a descendant of a
specified calibrated node
The time of the fossil: A
indicates an observation of
the birth-death process after
the age of the node N
250 200 150 100 50 0
Time (My)
Diversification of Fossil Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
73. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
The fossil must attach to
the tree at some time: ⋆
250 200 150 100 50 0
Time (My)
N
Diversification of Fossil Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
74. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
If it is the descendant of
an unobserved lineage, then
there is a speciation event
at time ⋆ on one of the 2
branches
250 200 150 100 50 0
Time (My)
N
Diversification of Fossil Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
75. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
If it is the descendant of
an unobserved lineage, then
there is a speciation event
at time ⋆ on one of the 2
branches
250 200 150 100 50 0
Time (My)
N
Diversification of Fossil Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
76. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
If ⋆ = A, the fossil is an
observation of a lineage
ancestral to the extant
species
250 200 150 100 50 0
Time (My)
N
Diversification of Fossil Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
77. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
If ⋆ = A, the fossil is an
observation of a lineage
ancestral to the extant
species
250 200 150 100 50 0
Time (My)
N
Diversification of Fossil Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
78. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
The probability of this
realization of the
diversification process is
conditional on:
λ, μ, and ψ
250 200 150 100 50 0
Time (My)
N
Diversification of Fossil Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
79. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
Using MCMC, we can
sample the age of the
calibrated node • while
conditioning on
λ, μ, and ψ
other node ages
A and ⋆
250 200 150 100 50 0
Time (My)
N
Diversification of Fossil Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
80. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
MCMC allows us to
consider all possible values
of ⋆ (marginalization)
250 200 150 100 50 0
Time (My)
N
Diversification of Fossil Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
81. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
MCMC allows us to
consider all possible values
of ⋆ (marginalization)
250 200 150 100 50 0
Time (My)
N
Diversification of Fossil Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
82. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
MCMC allows us to
consider all possible values
of ⋆ (marginalization)
250 200 150 100 50 0
Time (My)
N
Diversification of Fossil Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
83. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
MCMC allows us to
consider all possible values
of ⋆ (marginalization)
250 200 150 100 50 0
Time (My)
Diversification of Fossil Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
84. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
The posterior samples of
the calibrated node age are
informed by the fossil
attachment times
250 200 150 100 50 0
Time (My)
Diversification of Fossil Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
85. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
The FBD model allows
multiple fossils to calibrate
a single node
N
250 200 150 100 50 0
Time (My)
N
Diversification of Fossil Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
86. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
Given ⋆ and A, the new
fossil can attach to the tree
via speciation along either
branch in the extant tree
N
250 200 150 100 50 0
Time (My)
N
Diversification of Fossil Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
87. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
Given ⋆ and A, the new
fossil can attach to the tree
via speciation along either
branch in the extant tree
N
250 200 150 100 50 0
Time (My)
N
Diversification of Fossil Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
88. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
Given ⋆ and A, the new
fossil can attach to the tree
via speciation along either
branch in the extant tree
N
250 200 150 100 50 0
Time (My)
N
Diversification of Fossil Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
89. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
Or the unobserved branch
leading to the other
calibrating fossil
N
250 200 150 100 50 0
Time (My)
N
Diversification of Fossil Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
90. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
If ⋆ = A, then the new
fossil lies directly on a
branch in the extant tree
N
250 200 150 100 50 0
Time (My)
N
Diversification of Fossil Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
91. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
If ⋆ = A, then the new
fossil lies directly on a
branch in the extant tree
N
250 200 150 100 50 0
Time (My)
N
Diversification of Fossil Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
92. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
Or it is an ancestor of the
other calibrating fossil
N
250 200 150 100 50 0
Time (My)
N
Diversification of Fossil Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
93. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
The probability of this
realization of the
diversification process is
conditional on:
λ, μ, and ψ
N
250 200 150 100 50 0
Time (My)
N
Diversification of Fossil Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
94. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
Using MCMC, we can
sample the age of the
calibrated node while
conditioning on
λ, μ, and ψ
other node ages
A and ⋆
A and ⋆
N
250 200 150 100 50 0
Time (My)
N
Diversification of Fossil Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
95. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
MCMC allows us to
consider all possible values
of ⋆ (marginalization)
250 200 150 100 50 0
Time (My)
Diversification of Fossil Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
96. BAYESIAN INFERENCE UNDER THE FBD
Implemented in:
• DPPDiv:
github.com/trayc7/FDPPDIV
• BEAST2:
beast2.cs.auckland.ac.nz/
Available soon:
• RevBayes:
github.com/revbayes/revbayes
150 100 50 0
Time
250 200 150 100 50 0
Time (My)
FBD Implementation (Heath, Huelsenbeck, Stadler. PNAS 2014)
97. LINEAGES THROUGH TIME
The FBD model provides an
estimate of the number of
lineages over time
200 150 100 50 0
Time (My)
1.0
3.0
2.0
1.0
200 150 100 50 0
Time (My)
Number of lineages
98. LINEAGES THROUGH TIME
Lineage diversity over time with fossils
MCMC samples the
times of lineages in
the reconstructed tree
and the times of the
fossil lineages
(for 1 simulation replicate with 10% random sample of fossils)
Simulations: Results
99. LINEAGES THROUGH TIME
Lineage diversity over time with fossils
Visualize extant and
sampled fossil lineage
diversification when
using all available
fossils (21 total)
(for 1 simulation replicate with 10% random sample of fossils)
Simulations: Results
100. LINEAGES THROUGH TIME
Lineage diversity over time with fossils
Choosing only the
oldest (calibration)
fossils reduced the set
of sampled fossils
from 21 to 12,
giving less information
about diversification
over time
(for 1 simulation replicate with 10% random sample of fossils)
Simulations: Results
101. BEARS: DIVERGENCE TIMES
Sequence data for extant
species:
• 8 Ursidae
• 1 Canidae (gray wolf)
• 1 Phocidae (spotted seal)
Fossil ages:
• 12 Ursidae
• 5 Canidae
• 5 Pinnipedimorpha
DPP relaxed clock model
(Heath, Holder, Huelsenbeck MBE 2012)
Fixed tree topology
Sequence
Data
Fossil Data Phylogenetic Relationships
fossil canids
fossil pinnepeds
stem fossil ursids
fossil Ailuropodinae
giant short-faced bear
cave bear
Empirical Analysis (Wang 1994, 1999; Krause et al. 2008; Fulton Strobeck 2010; Abella et al. 2012)
102. BEARS: DIVERGENCE TIMES
Gray wolf
Spotted seal
Giant panda
Spectacled bear
Sloth bear
Brown bear
Polar bear
Sun bear
Am. black bear
Asian black bear
1. fossil canids
2. fossil pinnepeds
2. stem fossil ursids
3. fossil Ailuropodinae
4. giant short-faced bear
5. cave bear
1
3
2
4
5
Empirical Analysis (Heath et al. PNAS 2014; silhouette images: http://phylopic.org/)
103. BEARS: DIVERGENCE TIMES
Gray wolf
Spotted seal
Giant panda
Spectacled bear
Sloth bear
Brown bear
Polar bear
Sun bear
Am. black bear
Asian black bear
Ursidae
Eocene Oligocene Miocene
60 40 20 0
Time (My)
Plio
Pleis
Paleocene
95% CI
Empirical Analysis (Heath et al. PNAS 2014; silhouette images: http://phylopic.org/)
104. BEARS: DIVERGENCE TIMES
fossil canids
fossil pinnipeds
fossil Ailuropodinae
fossil Arctodus
fossil Ursus
Gray wolf
Spotted seal
Giant panda
Spectacled bear
Sloth bear
Brown bear
Polar bear
Sun bear
Am. black bear
Asian black bear
Ursidae
Eocene Oligocene Miocene
60 40 20 0
Time (My)
Plio
Pleis
Paleocene
stem fossil Ursidae
Empirical Analysis (Heath et al. PNAS 2014; silhouette images: http://phylopic.org/)
105. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
Improved statistical inference of absolute node ages
Biologically motivated
models can better
capture statistical
uncertainty
fossil canids
fossil pinnipeds
fossil Ailuropodinae
fossil Arctodus
fossil Ursus
Gray wolf
Spotted seal
Giant panda
Spectacled bear
Sloth bear
Brown bear
Polar bear
Sun bear
Am. black bear
Asian black bear
Ursidae
Eocene Oligocene Miocene
60 40 20 0
Time (My)
Plio
Pleis
Paleocene
stem fossil Ursidae
Modeling Diversification Processes (Heath, Huelsenbeck, Stadler. PNAS 2014)
106. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
Improved statistical inference of absolute node ages
Use all available fossils
Eliminates arbitrary
choice of calibration
priors
fossil canids
fossil pinnipeds
fossil Ailuropodinae
fossil Arctodus
fossil Ursus
Gray wolf
Spotted seal
Giant panda
Spectacled bear
Sloth bear
Brown bear
Polar bear
Sun bear
Am. black bear
Asian black bear
Ursidae
Eocene Oligocene Miocene
60 40 20 0
Time (My)
Plio
Pleis
Paleocene
stem fossil Ursidae
Modeling Diversification Processes (Heath, Huelsenbeck, Stadler. PNAS 2014)
107. THE FOSSILIZED BIRTH-DEATH PROCESS (FBD)
Improved statistical inference of absolute node ages
Extensions of the FBD
can account for
stratigraphic sampling
of fossils and shifts in
rates of speciation
and extinction
Preservation Rate
175 150 125 100 75 50 25 0
Time
0.2 1.05
Modeling Diversification Processes (Heath, Huelsenbeck, Stadler. PNAS 2014)