Complete vertebrate genomes are accumulating rapidly, and the pace of accumulation will only increase. This is excellent news, because the utility of comparative analysis depends heavily on the diversity of species sampling. There are, however, substantial challenges to exploiting the full potential of such extensive data: novel methods and analytical approaches are needed. We aim to develop and extend our capacity to analyze the dynamic evolutionary processes (across regions and through time) that have shaped extant genomes. We will achieve this goal using a Bayesian evolutionary analysis approach we recently developed, which offers a speed advantage of many orders of magnitude over competing approaches and scales well with model complexity and data size. Many of the studies we propose are based on biologically realistic paradigms that were previously impossible to consider or test because of computational limitations. We propose to comprehensively delineate the repetitive content of a selected set of vertebrate genomes, including annotation of ancient elements from the dark matter of genomes (the currently unannotated portion). The transposable elements in this set of repeat sequences will be used to build the first complete genome-wide models of context-dependent substitution processes. We will consider contexts such as recombination, rearrangement, expression, and local nucleotide content, as well as unknown contexts, and analyze how the evolutionary processes influenced by these contexts have changed over time. These context-dependent substitution models will provide a powerful tool for identifying and annotating functional regions in interspecific comparisons of vertebrate genomes, and for differentiating and characterizing fitness-based effects in proteins.
The core concept is that if we better understand genome-wide patterns of background nucleotide substitution, then we will be able to identify more accurately the genomic regions that are likely functional, and to understand how selection directs the evolution of proteins.

PUBLIC HEALTH RELEVANCE: The proposed research is relevant to public health because it will develop new methods for understanding and interpreting vertebrate (and human) genomes, and for identifying genomic regions that are functionally important and thus relevant to phenotype and disease. The project is relevant to the NIH mission because it will provide methods for extracting information from comparative genomic data that will inform our understanding of the structure and function of genomes, and of how they relate to the phenotypes of disease-related mutations in humans.