One goal of the Human Genome Project is to provide the complete sequence of the euchromatic portions of human chromosomes. Even among chromosomes that are near completion, there remain large gaps (greater than 100 kb) that will require specialized efforts to fill, as well as regions in which the present assembly is suspect. Many of these gaps lie within pericentromeric regions that are not necessarily refractory to subcloning. Instead, these same regions contain many highly duplicated segments in which the degree of sequence variation among duplicated loci (paralogous sequence variation) approaches levels of allelic variation. The goal of this proposal is to utilize the paralogous nature of pericentromeric DNA sequence to develop a high-throughput, genome-wide approach to systematic gap closure and proper assembly of these regions. In particular, the grant will focus on gap closure and on resolving the biological complexity of ten pericentromeric regions (approximately 20 Mb of human DNA) that share various domains of duplicated sequence. First, we will exploit the duplicative nature of these regions to develop a sequence-based method to rapidly recover large-insert clones that map to gaps within the current assembly, to integrate the appropriate clones into the sequencing queue, and to direct and validate the assembly of overlaps within the appropriate chromosomal regions. Second, by combining high-resolution FISH methods with paired-end sequencing reads, we will assess the degree of structural polymorphism of these regions within the human population. The end product of this analysis will be high-quality sequence and genomic continuity within the biologically complex regions that demarcate the boundary between euchromatin and heterochromatin. These results should provide a framework for understanding the peculiar genomic architecture of pericentromeric DNA and for defining the instability of these regions with respect to genetic disease.