Personalized medicine and individualized lifestyle recommendations based on both genetic and environmental factors are being promoted as the future of public health. Recent developments stemming from the Human Genome Project and high-throughput technologies offer many opportunities to improve risk prediction and elucidate the underlying biological mechanisms by integrating genetic and environmental data. The objective of this application is to develop statistical methods for estimating absolute risk using both gene and environment data, assessing gene-environment interaction, and translating findings into public health and personalized recommendations for intervention. Accurate age-specific absolute risk prediction is critical in patient management and disease prevention. Key to such translation is the development of statistical tools for risk estimation. There are two urgent and unmet needs: (a) statistical tools for developing robust risk prediction models that take advantage of multiple data sources, including cohort and case-control studies and population-wide reports on age-specific disease rates and exposure distributions; (b) guidance, with statistical rigor, on how a developed prediction model should be used in the clinical setting to aid decision making. Aim 1 is to develop statistical methods for robustly estimating age-specific absolute risk under complex study designs and for determining an individualized recommended age to start intervention. To develop individually tailored risk prediction and provide guidance on potential lifestyle and screening interventions, it is important to understand how genes and environment act together, as differences in genetic makeup can cause people to respond differently to the same environmental exposure (gene-environment interaction, GxE).
As whole-genome sequencing studies are being conducted, much progress has been made in rare-variant association analysis, but little has been done on GxE for rare variants, in part because data are too sparse to detect and estimate the GxE effect of individual rare variants. To this end, the functional information generated by recent large collaborative initiatives such as ENCODE and TCGA can guide the aggregation of variants with shared functional characteristics, thereby leveraging data across variants. To our knowledge, no method yet incorporates such information for GxE. Aim 2 is to develop methods for assessing GxE risks for rare variants by integrating this functional information. The proposed work will be applied to the Genetics and Epidemiology of Colorectal Cancer Consortium (PI: Ulrike Peters; Lead Biostatistician: Li Hsu). The growing consortium currently has over 40,000 participants from population-based case-control and cohort studies, with detailed data on environmental risk factors as well as genome-wide association and whole-genome sequencing data. Because the methods are also applicable to other complex diseases, we will develop open-source software in R and make it publicly available.