Deep resequencing of human genes has led to the discovery of rare, nonsynonymous sequence variants that are robustly associated with complex human phenotypes. Such studies have historically been rate-limited by the cost of DNA sequencing. Although a new generation of sequencing platforms is reducing costs by over two orders of magnitude, the routine sequencing of complete human genomes continues to be prohibitively expensive. Recently, methods have been developed to enable the efficient capture of specific subsets of the genome. With these methods, the cost of sequencing all of the protein-coding sequences (i.e., ~1% of the human genome split across ~180,000 discontiguous subsequences) may soon be on par with that of dense genotyping arrays. The goals of this proposal are to further the development of these targeting methods, and to integrate them into a scalable resequencing pipeline that relies on second-generation sequencing technology. Specifically, we will: (1) optimize and evaluate candidate strategies for multiplex capture, including array hybridization and gap-fill molecular inversion probes, while extending their application to the full protein-coding genome; (2) integrate optimized capture methods, second-generation sequencing technology, and sequence analysis software into a scalable resequencing pipeline; (3) develop the requisite computational tools for translating raw sequence data generated by new sequencing platforms into quality-tagged, consensus predictions of sequence variants; (4) make our data and methods broadly available, and facilitate the goals of this program through open communication with other investigators and the NIH.
PUBLIC HEALTH RELEVANCE: As we enter an era of "personalized medicine", DNA sequencing technology will be increasingly important to public health, contributing towards the unraveling of the genetic basis of human disease, as well as serving an increasing role in clinical diagnostics.
Next-generation sequencing technologies have the potential to markedly accelerate genetics research, but are hindered by the lack of equivalently powerful methods to target specific subsets of the human genome. We propose here to develop technologies that meet this critical need, focusing specifically on the development of a scalable resequencing pipeline that targets the ~1% of the genome that is protein-coding. The principal investigator (PI) proposes to evaluate two different methods for capturing this protein-coding genome (PCG).