Project Summary/Abstract
Phylogeny is fundamental to our understanding of biology and has translational applications to many areas of human health, including epidemiology, cancer biology, and immunology. Genome sequences from closely related species such as the great apes contain a wealth of information about their evolutionary history, including the species phylogeny and divergence times, population demography, and possible episodes of hybridization or admixture. However, extracting this information requires advanced probability models and efficient statistical and computational methods. This is because population genetic processes are stochastic and sequences from closely related species are highly similar, containing only weak historical information about some parameters. For this reason, it is critical to develop parametric statistical methods that maximize the information extracted from the data. In this project we aim to develop efficient Bayesian computational methods for analysis of genome-scale datasets under the multispecies-coalescent-with-introgression (MSci) model. The proposed research will develop and implement novel algorithms and statistical methods in the program bpp to infer the number, directions, timings, and intensities of introgression events between species (Aim 1). The program will then naturally accommodate both deep coalescence and introgression in the model. This will also allow a novel Bayesian method to be developed for inferring the probability that particular loci (genomic regions) are introgressed from a particular species admixture event for each sequence of a diploid individual (Aim 2). This question is of broad relevance and has been a subject of intense interest with respect to hominid admixtures. Another useful extension will be the addition of ongoing migration between pairs of populations using an efficient new migration model formulation (Aim 3).
The method will provide estimates of migration rates that are particularly relevant for designing safe CRISPR gene drive experiments in wild populations. The range of species to which the bpp program can be applied will be expanded by incorporating a more parameter-rich model of DNA substitution (GTR+G) that better accommodates multiple substitutions per site and is necessary for analyzing more distantly related species. Moreover, we will allow fossil calibrations and a relaxed molecular clock, incorporating into bpp the features of MCMCtree, our other program for divergence time estimation (Aim 4). Fossil calibrations will allow estimates of divergence times in units of years rather than expected DNA substitutions. To broaden the accessibility of the program to users without command-line experience, we will further develop a cross-platform GUI for bpp (BPPg) using a modern JavaScript framework (Aim 5). Finally, the statistical performance of the method will be studied and compared to other methods (when they exist) through simulations and analysis of paradigmatic datasets (Aim 6).