Background: Large-scale genomic sequencing currently requires high-cost equipment and is labor intensive. The throughput of conventional sequencing has become inadequate to meet the escalating demand for genomic sequence. Understanding the intricacies of human genetic organization, and how it relates to human health and inheritance, requires genomic-level comparative analyses that cannot currently be performed because of the lack of sequence information. The 454 Corporation has developed a massively parallel, high-throughput sequencing instrument that combines simultaneous sequencing in hundreds of thousands of picoliter-scale reaction wells with high-powered bioinformatics. The method does not require an exponential scale-up in effort or cost, despite exponential increases in genome size. By contrast, the effort and cost of conventional sequencing scale up proportionately with the size of the genome. The 454 approach will be low cost and will make sequencing of large genomes available to a wide variety of laboratories. Specific Aims: In this program we will (i) construct a robust double-ended sequencing method that generates short sequences from both ends of each individual fragment; (ii) develop a robust sequence assembly tool appropriate for double-ended sequencing; and (iii) prototype the use of our scaled-up system, with double-ended sequencing, for economical, rapid haplotyping of mammalian organisms. Our study design incorporates a multidisciplinary effort across the molecular biology, chemistry, engineering, software, and bioinformatics groups at 454 Corporation. The molecular biology and sequencing efforts will be led by the PI, Dr. Kenton L. Lohman. The hardware, fluidics, optics, software, and bioinformatics efforts will be led by the co-PI, Dr. Marcel Margulies. We will take advantage of current 454 infrastructure and key personnel. Relevance: There is a growing need across the research, pharmaceutical, and clinical communities for low-cost, high-throughput genomic sequencing.
Comparative genomics, SNP, and haplotype analyses have shown tremendous potential to rapidly characterize individual susceptibilities to many classes of chronic and acute diseases and disorders. Current costs of whole-genome sequencing can be borne only by large institutions. The cost of automated sample preparation and sequencing scales up proportionately with the size of the genome being sequenced. The 454 Sequencing system simultaneously analyzes millions of fragments in massively parallel sequencing of mammalian organisms. The combination of massively parallel sequencing and bioinformatics analysis creates a low-cost, high-throughput sequencing system.