A fundamental understanding of microbial genome and species evolution, as well as human genome evolution, is important for public health and medical science. This proposal addresses the concept of codivergence, i.e., the divergence of one gene or species lineage concomitantly with the divergence of another. In this process two or more lineages stay closely associated with one another: genes with species and hosts with pathogens, parasites or symbionts. Deviations from codivergence that are increasingly recognized in pathogen and human genomes include gene duplications, lateral gene transfers between species, retention of ancestral polymorphisms by balancing selection, and accelerated evolution by neofunctionalization. This project will bring together complementary expertise and cross-train students in algebraic geometry and mathematical biology, molecular biology and evolution of symbiotic systems, and computer science and bioinformatics. The investigators propose to: (1) develop a new statistical model and corresponding methods and algorithms for simultaneous derivation of pairs of gene trees to allow rigorous tests of their codivergence or deviation from codivergence; (2) design and develop software that efficiently implements these new methods; and (3) apply such methods to the large number of genes available from genome sequences in order to better assess the history of speciation and genome evolution. Unlike existing methods based on independently constructed phylogenetic trees, a novel statistical model is proposed based on polyhedral and algebraic geometry to determine whether sets of gene sequences exhibit codivergence. The method will develop two or more trees jointly, to better reflect the properties of codivergent lineages.
This approach will be implemented together with phylogenomic tools to characterize ancestral genomes or species, modeling most-recent common ancestor species as clouds of associated gene lineages with related but nonidentical gene tree topologies. Because exact algorithms for these techniques will be computationally impractical for genome-scale data sets, heuristics and approximations will also be developed and tested. The proposed methods will be applicable to a broad array of biological problems, such as identifying codivergent and noncodivergent gene sets in whole genomes, evaluating possible host-parasite codivergence and coevolution, and testing codivergence of modules in multi-modular enzymes. In this project, teams of graduate and undergraduate students from all of the represented disciplines will conduct joint projects. They will learn how to gather sequence data, design and use bioinformatic tools, and analyze biological data with mathematics and statistics to infer gene and species relationships. Additionally, a new phylogenetics course, building on an existing interdisciplinary Informatics Certificate curriculum, will be offered to graduate students.