Project Abstract: This application addresses the NHLBI-RFA-OD-09-004 for Large-scale DNA Sequencing and Molecular Profiling of Well-phenotyped NHLBI Cohorts. We propose to establish a sequencing center to perform the production-level resequencing of exomes from 10,000 genomic DNA samples derived from well-phenotyped NHLBI cohorts. Second-generation methods for targeted capture and DNA sequencing have matured rapidly. Exome sequencing currently has advantages over whole genome sequencing for studies aimed at understanding the contribution of rare variants to heart, lung and blood diseases. These advantages include much lower costs per sample and an increased likelihood of identifying variants of large effect that are amenable to functional interpretation. In our preliminary studies, we developed methods for targeted capture and second-generation sequencing of protein-coding sequences at a genome-wide scale, i.e., the exome. We are consistently able to identify coding variants at 96% of targeted bases for 5% of the sequencing effort required for a whole genome. The result is high-quality exomes, with a concordance to genotype calls of >99.75% and a false discovery rate for novel variants of <1%. We also show the power of exome sequencing for the direct identification of the causative gene for a monogenic disease. This proof-of-concept serves as a starting point for extending exome sequencing to study extreme and/or complex phenotypes of relevance to the NHLBI mission. Improvements that increase throughput or decrease costs while maintaining high data quality will be integrated into the exome production pipeline. Our recent innovations include a novel algorithm that nearly doubles the usable amount of sequence data that can be extracted from second-generation sequencing image sets.
The production focus of our team will be complemented by experts in high-throughput sequencing and genotyping, technology development (experimental and algorithmic), the statistical analysis of rare variation, population genetics, and copy number variation. Samples will be received from NHLBI cohorts and undergo extensive quality control prior to exome sequencing. Following sequencing, we will deliver a fully annotated set of coding variants for each individual. For the final deliverable, we will develop a custom genotyping chip for up to 50,000 high-impact, nonsynonymous variants to be assayed on a larger set of cohort samples (up to 50,000). We anticipate working closely with cohort investigators and the NHLBI to maximize the scientific value of these data and of this program.
PUBLIC HEALTH RELEVANCE: Project Narrative: Well-phenotyped cohorts provide a key resource for studying the contribution of genetic variation to traits related to heart, lung or blood diseases. Applying targeted capture and massively parallel sequencing of all protein-coding regions in the human genome (the exome) to well-phenotyped cohorts will help to delineate the contributions of both rare and common protein-altering variants to common diseases for the first time.