The discovery of genome structural variants has increased exponentially with the application of next-generation sequencing (NGS) technologies. Current approaches, however, are biased against variants of particular sizes and sequence contexts. Most notable is the skew against highly duplicated regions, which are enriched in genes and disease-causing variation. This program project will focus on the sequence characterization of more complex structural genetic variation, with a particular emphasis on regions of biomedical relevance. The approach will be to leverage existing and new clone resources in combination with next-generation sequencing technologies to target variation that has not been adequately assessed for copy number, content and structure. The specific aims of this proposal are to 1) discover, sequence and integrate novel insertion sequences, duplicated regions of high diversity, and recurrent structural variants into the human reference genome; 2) develop a next-generation sequencing-based platform to accurately predict copy number and sequence content of duplicated genes for 2,000 genomes being analyzed as part of the 1000 Genomes Project; and 3) generate a BAC clone resource (n=18 individuals) and completely sequence and characterize structurally variant haplotypes for 20 biomedically relevant loci where structural variation predisposes to disease. This program project is a collaborative effort that brings together expertise in large-scale genome sequencing, library production and structural variation. This work will provide fundamental information that will inform and complement efforts as part of the 1000 Genomes Project's Structural Variation Initiative and the Genome Reference Consortium.
It will continue to develop the first high-quality reference set of sequenced variants, provide insight into the molecular mechanisms underlying these differences, and lead to the development of genotyping platforms that will be needed to assess the phenotypic consequences of these regions in terms of human disease and adaptation.
RELEVANCE: This program project will develop methods, resources and high-quality sequence data corresponding to human genome structural variation that predisposes to both common and rare human genetic diseases. The program focuses particularly on classes of genetic variation that remain poorly understood in ongoing efforts to sequence genomes. The results of this work will provide insight into the mechanisms and risk factors leading to human disease, explore more fully the spectrum of human genetic variation, and lead to the detailed characterization of structural variant haplotypes of biomedical importance.