The sample frequency spectrum (SFS) describes the distribution of allele
counts at segregating sites, and is a useful statistic for both
summarizing genetic data and inferring biological parameters. SFS-based
inference proceeds by comparing observed and expected values of the SFS,
but computing these expectations is challenging when
there are multiple populations related by a complex demographic history.
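
As a minimal single-population sketch of the quantities involved, the Python snippet below tabulates an observed SFS from a small made-up haplotype matrix and scores it against an expected SFS with a multinomial log-likelihood. The function names and toy data are hypothetical, and the expected SFS used here is the textbook constant-size neutral result (entry k proportional to 1/k), not the multipopulation computation described in this abstract.

```python
import math
from collections import Counter

def sample_frequency_spectrum(haplotypes):
    """Observed SFS: entry k-1 counts sites with k derived alleles, 1 <= k <= n-1.

    `haplotypes` is a list of n rows of 0/1 values (1 = derived allele).
    Monomorphic sites (0 or n derived copies) are not segregating and are dropped.
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    counts = Counter(sum(row[j] for row in haplotypes) for j in range(n_sites))
    return [counts.get(k, 0) for k in range(1, n)]

def neutral_sfs_probabilities(n):
    """Normalized expected SFS for one constant-size neutral population (prop. to 1/k)."""
    weights = [1.0 / k for k in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]

def multinomial_log_likelihood(observed, probabilities):
    """Log-likelihood of observed counts given class probabilities, up to a constant."""
    return sum(o * math.log(p) for o, p in zip(observed, probabilities) if o > 0)

# Toy data (hypothetical): 4 haplotypes, 5 sites.
haps = [
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
sfs = sample_frequency_spectrum(haps)
loglik = multinomial_log_likelihood(sfs, neutral_sfs_probabilities(len(haps)))
print(sfs, loglik)
```

In an inference setting, the expected probabilities would be a function of the demographic parameters, and the log-likelihood would be maximized over those parameters.
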
We are developing a new software package, momi (MOran Models for
Inference), that computes the multipopulation SFS under population size
changes (including exponential growth), population mergers and splits,
and pulse admixture events. Underlying momi is a multipopulation Moran
model, which is equivalent to the coalescent and the Wright-Fisher
diffusion, but has computational advantages in both speed and numerical
stability. Techniques from graphical models are used to integrate out
historical allele frequencies. Automatic differentiation provides the
gradient and Hessian, which are useful for searching through parameter
space and for computing asymptotic confidence intervals.
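
To illustrate the kind of object a Moran model provides, the sketch below builds the rate matrix for the number of derived alleles among n lineages in a single population. It assumes one common convention, in which each ordered pair of lineages has a copying event at rate 1/2; the function name is hypothetical and this is not momi's own code or interface.

```python
def moran_rate_matrix(n):
    """Rate matrix over derived-allele counts 0..n for a Moran model with n lineages.

    Assumed convention: each ordered pair of lineages has a copying event at
    rate 1/2, so the count moves k -> k+1 and k -> k-1 at rate k*(n-k)/2 each.
    States 0 and n (loss and fixation) are absorbing.
    """
    Q = [[0.0] * (n + 1) for _ in range(n + 1)]
    for k in range(1, n):
        rate = k * (n - k) / 2.0
        Q[k][k + 1] = rate       # a derived lineage copies onto an ancestral one
        Q[k][k - 1] = rate       # an ancestral lineage copies onto a derived one
        Q[k][k] = -2.0 * rate    # diagonal makes each row sum to zero
    return Q

Q = moran_rate_matrix(4)
for row in Q:
    print(row)
```

Because the state space is finite (n + 1 states), transition probabilities over any time interval reduce to a matrix exponential of a small tridiagonal matrix, which is one way to see why a finite Moran model can be convenient numerically.
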
Using momi, we are able to compute the exact SFS for more complex demographies than
previously possible. In addition, the expectations of a wide range of
statistics, such as the time to most recent common ancestor (TMRCA) and
total branch length, can also be efficiently computed. The scaling
properties of momi depend heavily on the pattern of migration events,
but for certain demographic histories, momi can scale to tens or even
hundreds of populations. We demonstrate the accuracy of momi by applying
it to simulated data, and are in the process of applying it to
real data to infer a model of human history involving archaic hominins
(Neanderthal and Denisovan) and modern humans in Africa, Europe, East
Asia, and Melanesia.
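
For reference, the expected TMRCA and total branch length mentioned above have closed forms in the simplest case of a single constant-size population. The sketch below computes them under the standard coalescent, assuming time is scaled so that each pair of lineages coalesces at rate 1. The function name is hypothetical, and this is only a baseline for sanity checks, not the multipopulation method described in the abstract.

```python
from fractions import Fraction

def coalescent_expectations(n):
    """Expected TMRCA and total branch length for n samples, constant population size.

    While k lineages remain, the waiting time to the next coalescence has mean
    1 / C(k, 2) = 2 / (k * (k - 1)) in coalescent time units.
    """
    tmrca = Fraction(0)
    total_length = Fraction(0)
    for k in range(2, n + 1):
        mean_wait = Fraction(2, k * (k - 1))
        tmrca += mean_wait             # the tree height accumulates every interval
        total_length += k * mean_wait  # k branches are present during this interval
    return tmrca, total_length

tmrca, total_length = coalescent_expectations(4)
print(tmrca, total_length)
```

These sums simplify to 2(1 - 1/n) for the expected TMRCA and twice the (n - 1)-th harmonic number for the expected total branch length.
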