The goal of this proposal is to identify genes that affect susceptibility to stimulant and cannabis dependence using whole genome sequencing with genotype imputation. The advent of increasingly economical whole genome sequencing provides new opportunities to identify trait-associated sequence variations not detected by linkage or association methods. It is not yet clear, however, which analytic method will be most effective for identifying trait-related loci. Thus, we propose to study four distinct analytic approaches to identify sequence variants that affect stimulant and cannabis dependence in three cohorts using a state-of-the-art informatics infrastructure. The three study cohorts were ascertained as part of the Mission Indian Study (PI Cindy Ehlers), the combined Yale-University of Connecticut Addiction Study samples (PI Joel Gelernter), and the San Francisco Family Study (PI Kirk Wilhelmsen). Because we are studying populations with three different continental origins, we will be able to determine whether the major genetic risk factors for the traits of interest are shared or population-specific. Two of the three populations that make up our sample (i.e., Native Americans and African Americans) are understudied. Inclusion of these cohorts will allow for analyses within and across populations that differ in ascertainment strategies and ethnic composition, permitting strong tests of replication across populations and enabling direct comparisons of ascertainment and analytic approaches as they apply to genetic studies of addiction. We expect that the approach used for this study will evolve.
Currently, we plan to use four complementary analytic strategies: 1) a conventional SNP analysis of genotypes detected using >5X sequence coverage for polymorphisms with minor allele frequency (MAF) greater than 0.1%; 2) a gene-based approach to determine whether more cases than controls have rare sequence variants (MAF < 1%) likely to affect the function of a given gene; 3) use of a simplified form of affected-only linkage analysis of known and distantly related individuals to identify long chromosomal segments that are likely to be shared identical by descent; and 4) extended analysis of higher-level structural variation. The technology and economics of DNA sequence analysis are rapidly evolving. Based on current costs, we plan to complete >5X whole genome sequencing of at least 3,000 subjects. Critical to our decision to pursue low-pass genomic sequencing was the development of multipoint imputation methods. Simulation analysis indicates that, for the same cost, more subjects can undergo >5X whole genome sequencing with imputation than exomic sequencing, thus providing more exomic data as well as rich data for the rest of the genome. The findings from this study may provide insight into the genetics of dependence on other substances of abuse, which can be confirmed through data sharing with other projects responding to this RFA and with other projects in the NIDA Genetics Consortium.
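To make strategy 2 concrete, the gene-based comparison of rare-variant carriers in cases versus controls could be sketched as below. This is a minimal illustration, not the analysis pipeline of the proposal: the function names (`count_carriers`, `fisher_one_sided`), the genotype encoding (alternate-allele counts per subject and variant), the toy data, and the choice of a one-sided Fisher's exact test are all assumptions made for the example. In practice, variants would also be filtered by predicted functional effect and the test adjusted for ancestry and relatedness.

```python
from math import comb

def count_carriers(genotypes, mafs, maf_cutoff=0.01):
    """Count subjects with at least one alternate allele at a rare variant.

    genotypes: one list per subject of alternate-allele counts (0, 1, or 2),
               with one entry per variant in the gene (hypothetical encoding).
    mafs:      minor allele frequency of each variant, in the same order.
    """
    rare = [i for i, f in enumerate(mafs) if f < maf_cutoff]
    return sum(1 for g in genotypes if any(g[i] > 0 for i in rare))

def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact test for an excess of carriers among cases.

    Returns P(X >= case_carriers) when X follows the hypergeometric
    distribution implied by the fixed margins of the 2x2 table.
    """
    total = n_cases + n_controls
    k_total = case_carriers + control_carriers
    denom = comb(total, k_total)
    upper = min(n_cases, k_total)
    tail = sum(
        comb(n_cases, k) * comb(n_controls, k_total - k)
        for k in range(case_carriers, upper + 1)
    )
    return tail / denom

# Toy data for one gene with three variants; only the first and third are rare.
mafs = [0.005, 0.2, 0.008]
cases = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 2, 1]]
controls = [[0, 1, 0], [0, 0, 0], [0, 2, 0], [0, 0, 1]]

case_carriers = count_carriers(cases, mafs)
control_carriers = count_carriers(controls, mafs)
p_value = fisher_one_sided(case_carriers, len(cases),
                           control_carriers, len(controls))
print(case_carriers, control_carriers, p_value)
```

A carrier-based collapsing test of this kind is only one of several gene-based designs; it is used here because it maps directly onto the question of whether more cases than controls carry rare variants in a given gene.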