Structural variants, including duplications, insertions, deletions, inversions, and translocations of large blocks of DNA sequence, have been shown to be associated with various human diseases. These variants also frequently occur as somatic alterations in cancer. Identifying and characterizing structural variants in a genome sequence is a challenging task. We propose to develop computational methods to enable comprehensive studies of structural variation in normal and diseased genomes. In Aim 1, we develop a general computational framework for classification and comparison of structural variants across multiple samples and measurement platforms using a novel geometric and probabilistic approach. In Aim 2, we design algorithms to maximize the effectiveness of emerging single-molecule sequencing technologies for detecting and assembling complex structural variants and rearranged transcripts. In Aim 3, we develop algorithms to reconstruct the organization of cancer genomes and investigate how structural variants alter genome organization during somatic evolution. Finally, in Aim 4, we study the population genetics of inversion polymorphisms in the human genome, including their effects on haplotype block structure and whether inversions under selection leave distinctive genetic signatures. We will apply these approaches to data from human, cancer, mouse, and pathogen genomes in collaboration with several biomedical researchers. Successful completion of the proposed studies will facilitate future research on the role of structural variation in human and cancer genetics.

PUBLIC HEALTH RELEVANCE: Identifying the inherited genetic differences associated with disease and the acquired mutations that lead to cancer is a major challenge in genomics. One important class of such mutations is structural variants, which include duplications, insertions, deletions, inversions, and translocations of large blocks of DNA sequence.
These variants have been implicated in several diseases, including autism and cancer. New genome technologies are enabling large-scale measurement of these variants but demand novel computational methods to extract the most information from these measurements. We will develop algorithms to facilitate the identification and characterization of structural variants. These approaches will aid in the discovery of genetic variants that will provide better diagnostics or personalized treatments for various diseases.