Recent developments in the Human Genome Project and breakthroughs in high-throughput technologies have changed how researchers approach complex diseases, moving the field toward cross-disciplinary studies that collect data on all facets of disease. The objective of this application is to develop efficient statistical and computational approaches for integrating genetic, genomic, and epidemiologic data to understand the interplay of genetics and environment in complex diseases, with the long-term goal of devising personalized strategies to prevent and treat these diseases. Genome-wide association studies (GWAS) have identified thousands of trait-associated genetic variants and provided valuable insights into the genetic architecture of these traits. However, most variants identified so far confer relatively small increments in risk and explain only a small proportion of heritability, leading many to question how the remaining 'missing' heritability can be explained. This application addresses this 'missing' heritability from several angles: rare variant association analysis, gene-environment interaction, and heritability estimation beyond additive genetic effects. Accordingly, we propose the following specific aims. Aim 1 is to develop methods for integrating functional information into rare variant association analysis. To achieve this goal, Aim 1 includes developing databases of tissue-specific functional annotation and constructing regulatory expression networks (eQTL) from public data generated by large collaborative projects such as the Encyclopedia of DNA Elements (ENCODE) and the Genotype-Tissue Expression (GTEx) project. The theoretical properties of rare variant analysis will also be studied to devise powerful tests that account for genomic features such as linkage disequilibrium and sparse signals. Aim 2 is to develop methods for rare variant gene-environment interaction (GxE) analysis that incorporate functional information.
Efficient and versatile screening strategies will also be developed for genome-wide discovery of GxE. Although this aim focuses on GxE, the methods are also applicable to gene-gene interaction (GxG). Aim 3 is to develop methods for estimating heritability that incorporate GxE and GxG, in order to understand the complex interplay between genetic susceptibility and environment. The proposed work is motivated by a large consortium on colorectal cancer, which has over 40,000 participants from well-characterized studies with detailed data on environmental risk factors as well as GWAS and whole-genome sequencing data. The developed methods will be applied to the consortium data to gain new insights into colorectal cancer and to demonstrate the feasibility of the methods. Because the methods are applicable to other complex diseases and traits, R-based open-source software will be developed and submitted to the Comprehensive R Archive Network (CRAN) for broad dissemination.