Project 2 (Sequencing and Clone Characterization) will generate much of the data produced by the overall Program Project. Its basic mission is to take fosmid clones that Project 1 has identified as likely to contain novel structural variants of the human genome and to characterize them through a hierarchical procedure. The steps in this procedure will include fingerprinting with multiple restriction enzymes, light-shotgun sequencing, full-shotgun sequencing, and finishing. At each step, we will evaluate the likelihood that each fosmid has been correctly flagged as containing a structural variant relative to the human-reference sequence. A second activity in Project 2 will be to characterize clones that have two good-quality end sequences, neither of which aligns with the human-reference sequence. Fosmids in this class are candidates for falling either in gaps in the reference sequence or within large insertions relative to it. Project 2 will build contigs from these "no-hitter" clones and, provided that capacity is available, choose a minimal tiling path across the contigs for sequencing; alternatively, we will take the lead in finding other options for sequencing these paths. In keeping with the overall goals of the Program Project, priority within Project 2 will be given to clones from insertions relative to the reference sequence rather than from gaps.

Complete definition of the sequences of intermediate-scale variants of the human genome will have direct applications in human genetics. Unlike the more intensively studied single-nucleotide polymorphisms (SNPs), these variants typically affect the arrangement of thousands of nucleotides or more; hence, any given intermediate-scale variant is more likely than a SNP to alter the function of one or more human genes.
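To illustrate what choosing a minimal tiling path across a no-hitter contig involves, the following is a minimal sketch of one standard approach, greedy interval covering. It assumes that each clone's approximate start and end positions along the contig are already known (for example, from fingerprint-based contig assembly). The function name `minimal_tiling_path`, the `Clone` tuple layout, and the `min_overlap` parameter are hypothetical and are not part of the project's stated methods.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (clone_id, start, end) in contig coordinates.
Clone = Tuple[str, int, int]


def minimal_tiling_path(clones: List[Clone], contig_start: int, contig_end: int,
                        min_overlap: int = 0) -> Optional[List[str]]:
    """Greedily choose the fewest clones that span [contig_start, contig_end].

    Consecutive clones in the path must overlap by at least min_overlap bases.
    Returns None if the clones cannot cover the contig without a gap.
    """
    ordered = sorted(clones, key=lambda c: c[1])  # sort by start position
    path: List[str] = []
    covered = contig_start  # rightmost position covered by the path so far
    required = 0            # the first clone needs no overlap with a predecessor
    i = 0
    while covered < contig_end:
        best: Optional[Clone] = None
        # Among clones that start early enough to overlap the current path end,
        # keep the one that extends coverage furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered - required:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            return None  # gap: no eligible clone extends the path
        path.append(best[0])
        covered = best[2]
        required = min_overlap
    return path


if __name__ == "__main__":
    example = [("A", 0, 40000), ("B", 5000, 38000), ("C", 30000, 70000),
               ("D", 35000, 75000), ("E", 65000, 100000)]
    print(minimal_tiling_path(example, 0, 100000, min_overlap=2000))
```

With the made-up clone coordinates above and a 2-kb minimum overlap, the sketch selects clones A, D, and E. Always taking the clone that reaches furthest right among those that can still overlap the path is what keeps the number of clones sent for sequencing small.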
The complete sequences of the variants, which will be determined in Project 2, will allow the design of reliable genotyping assays in Project 3. These assays can be used to assess the presence or absence of each variant in large numbers of individuals, for example in a series of cases and controls ascertained to study genetic influences on susceptibility to a particular disease.