Fine-scale nucleotide changes, along with genetic recombination, are often cited as the major source of human genetic variation [1, 13, 14]. Less is known about larger-scale (>10 kb) genomic structural variations. As genomic technologies improve, structural variations are being detected in ever-increasing numbers, including genomic inversions [24, 48, 71, 65, 31], insertion/deletion polymorphisms [12, 26, 42], and copy number polymorphisms [28, 59, 60]. These large variations can disrupt coding and regulatory sites and alter gene copy number, and can thereby have a substantial impact on human phenotypes and disease susceptibility [23, 61]. Deleterious effects have indeed been observed in cancer and other diseases [70, 43]. Our understanding of the scale and impact of these variations can be enhanced by improving computational tools for mining the data these technologies produce. Here, I propose to develop algorithms and computational tools that improve the detection and resolution (location of breakpoints) of structural variation. Specifically, I will develop algorithms for (a) experimental design of sequencing projects for detecting and resolving structural variations; (b) fine-mapping of breakpoints using end sequence profiling, to detect gene disruptions and gene fusions; (c) reconstruction of tumor genome architectures; (d) detection of targeted genomic variations in a heterogeneous mix of normal and mutated cells via multiplex PCR; and (e) detection of balanced structural variation in genotype data. The tools will be designed using techniques from statistical machine learning and combinatorial algorithms. Validation will be performed using known structural variations, simulation studies, and extensive experimental collaborations with technology developers and early technology adopters. All data and software will be freely available for academic and non-commercial use.
PUBLIC HEALTH RELEVANCE: The proposed computational tools will be used to detect structural variations in human populations as a starting point for understanding their role in normal evolution and in disease, specifically cancer. Reconstructing the architecture of tumor genomes will help reveal genes that are disrupted and differentially expressed in tumor cells. The targeted detection of genomic lesions in a heterogeneous mix of mutated and wild-type cells will find application as an early diagnostic for cancer. Thus, our computational methods will have both an immediate and a long-term effect on human health.