Many health conditions, including substance use and mental illnesses, are complex and depend on both genetic and environmental factors. In the past several years, genome-wide association (GWA) studies have identified single-nucleotide polymorphisms implicating hundreds of robustly replicated loci for common traits. Despite numerous successes, it remains persistently difficult to identify genes, environmental factors, and the interactions among them for complex diseases. This has been referred to as the geneticist's nightmare. Most of the identified variants carry low associated risks and account for little heritability, and there is increasing interest in finding the missing heritability of complex diseases. To this end, it is important to develop novel statistical methods. Our Preliminary Progress demonstrates that our proposed methods have already produced significant findings on the association between genes, environments, and complex traits. Several genetic variants that we identified with our novel methods will be cataloged by the National Human Genome Research Institute. This project will take advantage of the PI's many years of experience in data collection and analysis for GWA studies and build on his success in developing statistical methods and software for genetic studies. The primary aim of this application is to continue our effort and success in developing, evaluating, and applying new statistical models, methods, and software to conduct GWA analyses of complex diseases. Our specific aims are as follows: (A1) to develop statistical methods to perform inference for multidimensional and multi-modal traits. 
New methods will be developed to find the hidden heritability by incorporating multiple variants, simultaneously considering genetics and environment, and modeling multiple and heterogeneous traits; (A2) to develop tree- and forest-based methods for association analyses by incorporating multiple genetic variants, covariates, and gene-covariate interactions, as well as existing biological information; (A3) to develop and release software for public use through the PI's website. As the methods and software are developed, they will be applied to a variety of real studies that will serve as motivation for and validation of our methods and software. In this regard, our secondary aims are to (B1) identify genes and environmental factors for addiction, mental illnesses, and the co-morbidity of psychiatric disorders; and (B2) identify genetic variants and environmental factors for preterm deliveries. In short, the objective of this project is significant, the foundation of our approach has been tested, and the new developments will be novel and useful. The PI has decades of experience related to this project and leads a research center with well-established infrastructure, supporting personnel, and students.

PUBLIC HEALTH RELEVANCE: Despite great advances in technology and methodology that have led to recent successes in identifying genetic variants for complex diseases, the development of novel statistical methods is critically important in dealing with the difficulties inherent in genetic studies of complex phenotypes. This project will have a significant impact on the analysis of genetic data and hence on public health, because our methods and software can help investigators understand the genetic and environmental factors of common and complex diseases, including substance use, cancer, and preterm birth.