NGS in Large CAD Families: In-Depth Identification of Rare Risk Genomic Variants

Abstract

Coronary artery disease (CAD) is the leading cause of death worldwide, and genetic factors contribute significantly to its development. The long-term objective of this project is to identify novel genetic and molecular determinants and markers for CAD. Toward this goal, we have spent more than 10 years identifying and acquiring data for 24 very large, multigenerational families (GeneQuest II, mean pedigree size=16), a unique and highly valuable resource for discovering susceptibility genes and genomic variants that confer risk of CAD. We have completed a genome-wide linkage scan in the GeneQuest II families with 408 polymorphic markers spaced every 10 cM across the entire human genome, and identified two highly significant CAD loci, on chromosomes 3q28 and 7p22.3, and four other significant loci. In the 1990s, we also established another well-characterized US cohort of 428 families with familial, early-onset CAD (GeneQuest, mean pedigree size=5). The same 3q28 CAD locus also showed highly significant linkage in GeneQuest. Whole-genome next-generation sequencing (NGS) has become an enabling technology for identifying susceptibility genes for complex diseases. We therefore propose an innovative, integrated strategy that combines whole-genome NGS and genome-wide linkage analysis in the 24 GeneQuest II families to identify genomic variants associated with CAD. All affected family members in the 24 GeneQuest II families will undergo whole-genome NGS, and novel rare genomic variants will be identified. Private variants will be characterized by simple co-segregation with disease in families to determine whether they are disease-causing mutations.
Other rare variants will be analyzed for association with CAD in the 24 large GeneQuest II families using family-based rare-variant association tests that jointly consider multiple variants in a gene or functional region, as well as haplotypes formed from multiple variants. Positive associations will be validated in the replication population (428 GeneQuest families). We will prioritize rare variants in the following order: (1) rare variants under linkage peaks; (2) rare variants at or near CAD loci identified by genome-wide association studies (GWAS); (3) rare variants outside of linkage peaks and GWAS loci. Bioinformatics analysis and relevant functional/expression studies will be used to determine whether variants associated with CAD affect the function or expression of nearby genes. These studies should lead to the identification of new genomic variants that confer risk of CAD and uncover novel genetic/molecular pathways in the pathogenesis of CAD.
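
The three-tier prioritization described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual pipeline: the `Region` and `Variant` types, the function names, and the `gwas_flank` parameter (one possible reading of "at or near" a GWAS locus) are all hypothetical, and any coordinates passed in would be placeholders rather than the real boundaries of the 3q28 or 7p22.3 loci.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Region:
    """A genomic interval (hypothetical type; coordinates are inclusive)."""
    chrom: str
    start: int
    end: int


@dataclass(frozen=True)
class Variant:
    """A rare variant reduced to its position (hypothetical type)."""
    chrom: str
    pos: int


def in_any_region(variant: Variant, regions: Iterable[Region], flank: int = 0) -> bool:
    """Return True if the variant lies within any region, widened by `flank` bp."""
    return any(
        r.chrom == variant.chrom and r.start - flank <= variant.pos <= r.end + flank
        for r in regions
    )


def priority_tier(
    variant: Variant,
    linkage_peaks: List[Region],
    gwas_loci: List[Region],
    gwas_flank: int = 0,
) -> int:
    """Assign tier 1, 2, or 3 following the order stated in the abstract.

    Tier 1: under a linkage peak.
    Tier 2: at or near a GWAS locus ("near" is modeled by `gwas_flank`,
            which is an assumption, not a value given in the source).
    Tier 3: everything else.
    """
    if in_any_region(variant, linkage_peaks):
        return 1
    if in_any_region(variant, gwas_loci, flank=gwas_flank):
        return 2
    return 3


def rank_variants(
    variants: List[Variant],
    linkage_peaks: List[Region],
    gwas_loci: List[Region],
    gwas_flank: int = 0,
) -> List[Variant]:
    """Sort variants by tier; sorted() is stable, so input order is kept within a tier."""
    return sorted(
        variants,
        key=lambda v: priority_tier(v, linkage_peaks, gwas_loci, gwas_flank),
    )
```

A variant that falls under a linkage peak and also near a GWAS locus is placed in tier 1 here, since the abstract lists linkage peaks first. How such overlaps should actually be handled is a design choice the source does not specify.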