Enabling imputation and CNV analysis in genetic studies of African Americans

This application addresses broad Challenge Area (08) Genomics, and specific Challenge Topic 08-HL-104: Assess genetic variation in African Americans and determine its effect on disease.

Project summary

Rapid progress is being made in identifying regions of the human genome that harbor disease-causing genetic variation. However, almost all genome-wide association (GWA) studies to date have been carried out in populations whose ancestry derives from a single continent, usually Europe. Populations such as African Americans and Latino Americans have been excluded from most GWA studies, both because their mixed ancestry (from multiple continents) creates complexities for GWA analyses, and because the data resources needed to study such populations are currently under-developed. The routine exclusion of populations with mixed ancestry from GWA studies is problematic, not only because it produces research findings that may be less relevant to minority populations, but also because it reduces the number of biological discoveries. Genetic variants that may be important in affecting disease in minority populations (for example, because of high frequency in those populations, or because of interaction with environmental or cultural factors) are less likely to be discovered in populations of European ancestry than in the populations in which they are most significant. Here we propose to develop methods and resources that enable effective, fully powered GWA studies in admixed populations. Two areas of modern genetics that are critical to GWA studies are imputation and the analysis of copy number variation (CNV).
The extension of GWA studies beyond the polymorphisms that are directly typed to far larger sets of variants ("imputation") has become a fundamental tool for extending the reach of such studies and for integrating the results of multiple studies (in "meta-analysis") that have directly typed different sets of variants. It has also become clear in recent years that human genomes differ at large physical scales in the form of copy number variants (CNVs) that extend for thousands of bases, and that such variation is frequently associated with disease. Fully determining the relevance of CNVs to disease and phenotypic variation is becoming a core goal of GWA studies. In the work to be supported by this award, we will:

- Extend the best methodology for imputation to African Americans, producing public-domain software to execute these analyses, validating our methods, and providing a public-domain database of all known polymorphisms and their imputability in African American cohorts;
- Extend effective CNV data resources and imputation strategies to African Americans, including maps of CNV locations, allele frequencies, and LD properties in African American populations, by integrating data from CARe, the 1000 Genomes Project, and HapMap phase 3;
- Critically evaluate these methods, resources, and analyses by genotyping SNPs and CNVs in a large African American cohort, the Jackson Heart Study (JHS), that is also being analyzed in CARe;
- Use these methods and resources to extend the reach of GWA studies in CARe to far more SNPs and CNVs. To validate our methods, we will genotype in JHS the disease-associated SNPs and CNVs that we discover by imputation in CARe.

The work we propose will offer opportunities both to uncover genetic associations of medical importance to African Americans and to demonstrate the efficacy of advanced GWA studies in an admixed population.
Although our work emphasizes genetic studies in African Americans, the methods we provide will also improve the reach and quality of GWA studies in all groups whose ancestry derives from multiple continents. Populations such as Africans and Europeans, which were separated from each other for thousands of generations, differ both in the frequencies of specific genetic variants and in the relationships these variants have to each other (sometimes called "linkage patterns"). We will characterize the linkage patterns in persons of African ancestry and in persons of European ancestry, and use them to analyze the DNA of African Americans, most of whom have ancestry from both of these populations. We will develop methods that use information from genetic variants that have been genotyped to predict data for variants that have not been genotyped, a process called "imputation," thus providing useful data for millions of untyped variants and greatly increasing the power of efforts to find variants that contribute to human disease.
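The idea of imputation described above can be illustrated with a toy sketch. This is not the methodology proposed in this application; it is a minimal nearest-haplotype illustration under simplifying assumptions. The function name `impute_untyped`, the 0/1 allele coding, and the small reference panel are all hypothetical and chosen only for illustration. Practical imputation methods use probabilistic models of haplotype structure and, in admixed populations, must also account for ancestry that varies along the genome.

```python
from collections import Counter


def impute_untyped(reference_haplotypes, typed_alleles):
    """Fill untyped sites (None) by copying from the closest reference haplotypes.

    reference_haplotypes: list of equal-length lists of 0/1 alleles (hypothetical panel).
    typed_alleles: list of 0/1 alleles, with None at sites that were not genotyped.
    """
    typed_sites = [i for i, allele in enumerate(typed_alleles) if allele is not None]

    def mismatches(haplotype):
        # Count disagreements with the study haplotype at the genotyped sites only.
        return sum(haplotype[i] != typed_alleles[i] for i in typed_sites)

    # Keep every reference haplotype that is tied for the fewest mismatches.
    best = min(mismatches(h) for h in reference_haplotypes)
    closest = [h for h in reference_haplotypes if mismatches(h) == best]

    imputed = list(typed_alleles)
    for i, allele in enumerate(typed_alleles):
        if allele is None:
            # Majority vote among the closest reference haplotypes.
            votes = Counter(h[i] for h in closest)
            imputed[i] = votes.most_common(1)[0][0]
    return imputed


# Hypothetical reference panel: four haplotypes typed at five sites.
reference_panel = [
    [0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
]

# Study haplotype genotyped at sites 0, 2, and 4 only.
study_haplotype = [0, None, 1, None, 1]
result = impute_untyped(reference_panel, study_haplotype)
print(result)
```

In this sketch the untyped alleles are predicted well only because nearby variants are correlated in the reference panel, which is the "linkage pattern" information that the proposed reference resources are meant to supply for African American cohorts.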