Project Summary: Enabling Comparative Pangenomics

To many in the field, it is clear that we are moving rapidly toward a golden age of vertebrate comparative genomics, in which thousands of high-quality genomes of different species are publicly available and used in understanding the human genome. Despite the opportunity presented by the growth in available genomes, the software used to compare complete genomes has stagnated: most existing tools are old and limited in capability. To remedy this situation, we will create a hardened toolkit for genome comparison and annotation that can be robustly applied to thousands of vertebrate genomes. To demonstrate this toolkit and deliver its results to the broader genomics community, we will apply it to create a resource within the existing UCSC and Ensembl Genome Browsers that will incorporate thousands of vertebrate genomes.

Large, well-organized consortia have coalesced to take on the challenge of sequencing and assembling vertebrate genomes. Our alignments will form a backbone of these projects' analysis, and our synthesis of their data will create a resource that is much greater than the sum of what might otherwise be a series of smaller, fragmented, and not directly comparable efforts. We will gather more than 600 vertebrate genomes into our proposed resource in the first year of the proposal, rapidly delivering results.

Paralleling the growth in available reference genomes, the last decade has been marked by an explosion in population sequencing projects. Although much of the cataloged human variation has a very recent evolutionary origin, there is a tremendous opportunity to combine, and so better understand, intra- and inter-species change using models from population genetics. We will create pangenome software to (i) avoid reference bias in species comparisons (i.e.,
avoiding assumptions about which alleles are fixed when comparing between species, which is important in quasi-species such as cichlids), (ii) allow ancestral alleles to be comprehensively estimated, including those that are part of structural variation, and (iii) more easily enable the study of balancing selection. To demonstrate the utility of comprehensive variation integration, we will create a prototype pangenome graph for the apes. We will use this graph to identify ancestral alleles and to dynamically convert annotations between species and assembly versions, and, via population mapping experiments, we will demonstrate its power for typing segregating but ancient variation. Using knowledge of ape evolution, we will ultimately extend this graph to adequately model the most complex regions of the human genome.