We propose to develop, demonstrate, and validate a pipeline for high-throughput, low-cost targeted resequencing of all human exons using next-generation (gen2) sequencing technologies. This work supports the long-term goal of making sequencing a routine means of characterizing genotypes and genetic variation at genome-wide medical targets across large populations. We will develop and integrate two genome-scale target-capture methods, padlock capture and hybridization capture, both well suited to the short reads produced by the gen2 platforms that are our primary focus (Illumina Genome Analyzer and Polony). We expect that a pipeline capable of sequencing the RefSeq exome of thousands of subjects will play a critical role in the next phase of genetic medical research. We propose three specific Aims:
Aim 1 (Capture and sequence the human exome with padlock probes): We will develop a padlock probe library for exon capture; optimize the capture protocol to reduce cost and amplification bias and to increase coverage; similarly optimize generation of the padlock probe set; and scale up from small sets of exons to the entire RefSeq exome.
Aim 2 (Capture and sequence the human exome by hybridization selection): We will develop methods for targeted capture of sheared exonic DNA fragments on nitrocellulose filters; optimize the capture and hybridization protocols and reduce cost; develop molecular bar-coding methods for multiplexed sequencing of exome libraries from multiple subjects; and scale up to sequencing of the full RefSeq exome.
Aim 3 (Develop data analysis and management for targeted exome sequencing): We will develop algorithms for calling genotypes from the sequence generated in Aims 1 and 2 that account for sequence quality and coverage distributions; provide feedback to Aims 1 and 2 on the feasibility of the proposed coverage and accuracy targets; develop algorithms for indel detection; and provide computing resources, software, and data distribution.
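The genotype-calling algorithms of Aim 3 are not specified in this summary. Purely as an illustration of how sequence quality and coverage can enter a call, the following is a minimal sketch assuming a simple diploid model: reads are independent, each read samples either allele with probability 1/2, Phred-scaled base qualities give the per-base error rate, errors are spread evenly over the other three bases, and all genotypes have equal prior probability. The function names and the min_depth cutoff are hypothetical, not part of the proposed pipeline.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def phred_to_error(q):
    """Convert a Phred-scaled base quality to an error probability."""
    return 10 ** (-q / 10)


def base_likelihood(observed, true_allele, q):
    """P(observed base | true allele); errors assumed equally likely to each other base."""
    e = phred_to_error(q)
    return 1 - e if observed == true_allele else e / 3


def genotype_log_likelihoods(pileup):
    """Log-likelihood of each of the 10 unordered diploid genotypes.

    pileup is a list of (base, phred_quality) pairs covering one position.
    Each read is assumed independent and drawn from either allele with probability 1/2.
    """
    scores = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        ll = 0.0
        for base, q in pileup:
            p = 0.5 * base_likelihood(base, a1, q) + 0.5 * base_likelihood(base, a2, q)
            ll += math.log(p)
        scores[a1 + a2] = ll
    return scores


def call_genotype(pileup, min_depth=8):
    """Return the maximum-likelihood genotype, or None when depth is below a hypothetical cutoff."""
    if len(pileup) < min_depth:
        return None  # too little coverage for a confident call
    scores = genotype_log_likelihoods(pileup)
    return max(scores, key=scores.get)
```

For example, `call_genotype([("A", 30)] * 6 + [("G", 30)] * 5)` returns `"AG"`, while a position covered by only three reads returns `None`. A practical caller would also weigh mapping quality, priors on heterozygosity, and strand or position effects; the sketch only shows where base quality and depth enter the likelihood.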
Accuracy, replicability, coverage, cost, and quality control are common themes across all Aims. We will use samples of known genotype to assess capture efficiency, accuracy, and algorithm effectiveness. The two capture technologies we will develop are complementary and mutually supporting. For instance, sheared-DNA exome capture will allow detection of larger indels than padlock capture, while the padlock probes may serve as useful reagents for sheared-DNA exome capture. Our group is one of the few that has pioneered all of the component technologies required by this RFA.
PUBLIC HEALTH RELEVANCE
This research will immediately advance medical research by providing technology for the cost-effective sequencing of thousands of individual DNA samples from large populations, giving deep insight into the genetic variation at work in human health and disease. It will eventually also enable health providers to learn their patients' genetic variants inexpensively and thus help refine and improve their medical care.