Genes underlie numerous diseases. Gene-gene and gene-environment interactions are likely to be present in most common, complex diseases, including substance use disorders, cancer, and preterm birth and its sequelae. Demonstrating such interactions beyond what would be expected by chance is very difficult. The Human Genome Project and the HapMap Project have greatly advanced our ability to study the genetic and environmental factors underlying complex diseases. In particular, high-throughput genotyping technologies have been evolving rapidly. However, the etiologies of many complex diseases remain poorly understood, and using this wealth of information to understand complex diseases remains a tremendous challenge. Advanced data analysis and data mining techniques have become indispensable in this endeavor. Developing powerful analytic methods to understand biological systems is the greatest challenge in using genomic information. In the last few years, several groups of investigators have successfully identified genes underlying various common, complex diseases using genome-wide association (GWA) studies. Those successes have led to the NIH-wide Gene Environment Initiative to identify genetic variants for complex diseases. Recently, the PI has played a leading role in the planning, design, database development, statistical analysis, and study coordination for two major national networks of genetic studies that use the GWA approach. This project will take advantage of the PI's involvement in those two studies and will build on his success in developing statistical methods for genetic studies during the previous funding period. The primary aim of this application is to continue our efforts and successes in developing, evaluating, and applying new statistical models, methods (both parametric and nonparametric), and software to conduct GWA analyses of complex diseases.
Specifically, we will develop (A1) statistical methods for the genetic analysis of multiple traits, and (A2) tree- and forest-based models for association analyses of complex traits. Companion software will be developed for each of these models and made publicly available on Dr. Zhang's website. We will apply our methods and software to the data available to the PI to achieve the following secondary aims: (1) to identify genes and environmental factors for tobacco use, substance use, and their comorbidity with psychiatric disorders, including anxiety, continuing our previous effort; and (2) to identify genetic variants and environmental factors for preterm delivery and its sequelae, including intraventricular hemorrhage. Despite great advances in technology and methodology that have led to recent successes in identifying genetic variants for complex diseases, the development of novel statistical methods remains critically important for dealing with the difficulties inherent in genetic studies of complex phenotypes.

PUBLIC HEALTH RELEVANCE: This project will have a significant impact on the analysis of genetic data, and hence on public health, because our methods and software can help investigators understand the genetic and environmental factors underlying common, complex diseases, including substance use, cancer, and preterm birth. This understanding can in turn lead to better prevention and treatment strategies.