PROJECT SUMMARY Structural variation (SV), is a diverse class of genome variation that includes copy number variants (CNVs) such as deletions and duplications, as well as balanced rearrangements, such as inversions and reciprocal translocations. A typical human genome harbors >4,000 SVs larger than 300bp and their large size increases the potential to delete or duplicate genes, disrupt chromatin structure, and alter expression. Despite their prevalence and potential for phenotypic consequence, SVs remain notoriously difficult to detect and genotype with high accuracy. Much of this difficulty is driven by the fact DNA sequence alignment ?signals? indicating SVs are far more complex than for single-nucleotide and insertion deletion variants. Unlike SNP alignments that vary only in allele state, alignments supporting SVs vary in state (supports an alternate structure or not) alignment location, and type. Consequently, the accuracy of SV discovery is much lower than that of SNPs and INDELs. Furthermore, SV pipelines scale poorly and are difficult to run. These challenges are a barrier for single genome analysis and studies of families must invest substantial effort into eliminating a sea of false positives. These problems become exponentially more acute for large-scale sequencing efforts such as TOPmed, the Centers for Common Disease Genetics, and the All of Us program. Software efficiency is key to scalability for such projects. However, of equal importance is comprehensive, accurate discovery. Building upon more than a decade of software development experience and analyzing SV in diverse disease contexts, we have invested significant effort into understanding the causes of the insufficient accuracy for SV discovery. These efforts, together with our research and development experience in this area, give us unique insight into improving the accuracy and scalability of SV discovery. Our goal is to narrow the accuracy gap between SNP/INDEL variation and structural variation discovery. These developments will empower studies of human genomes in diverse contexts and will therefore have broad impact. Our goals are to: 1. Develop a deep learning model to correct systematic variation in sequence depth. This new machine learning model will correct systematic biases in DNA sequence depth and dramatically improve the discovery of deletions and duplications. 2. Improve the speed, scalability, and accuracy of SV detection and genotyping. Using new algorithms, we will bring the accuracy of SV detection much closer to that of SNP and INDEL discovery and allow accurate SV discovery to be deployed at scale. 3. Create a map of genomic constraint for SV from population-scale genome analysis. We will deploy our new methods to detect and genotype structural variation among tens of thousands of human genomes. The resulting SV map will empower the creation of a model of genomic constraint for SV and enable new software to predict deleterious SVs, especially in the noncoding genome.