Project Summary
Recent developments in molecular biology and cancer epidemiology are jointly making fundamental contributions to the study of the etiology, diagnosis, prognosis and treatment of cancers. Over the last two decades, case-control studies have been increasingly used to study the association between different types of cancer and a candidate gene. More recently, many premier cancer and health research institutes have undertaken efforts to form global consortia of large case-control genome-wide association studies (GWAS) for various types of cancer. The modest contribution of GWAS findings in terms of explaining cancer risk has again emphasized that the role of environmental factors cannot be ignored in cancer etiology. In the post-GWAS era, many epidemiologic studies are exploring gene-environment interactions (G x E studies). The proposed research considers a variation of the case-control sampling design, namely the two-phase sampling design, for G x E studies. This design describes a study setting where a set of inexpensive covariates is available on a larger study base (Phase I sample) and outcome-exposure stratified sampling has been employed to select a sub-sample (Phase II sub-sample). On the Phase II sub-sample, expensive genetic or biomarker data are measured. The goal is to investigate G x E interactions under such sampling designs. The proposed methods make efficient use of all available data in Phase I and Phase II through an appropriate two-phase joint retrospective likelihood. More subtle issues, such as the existence of non-monotone missing data in the Phase II sub-sample, relaxing the gene-environment independence assumption, and variable selection in a multi-gene model, are also considered. A semiparametric profile likelihood based approach and an alternative semiparametric Bayes approach are proposed for two-phase G x E studies in Specific Aims 1 and 2, respectively.
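As a rough illustration of the two-phase design described above, the following Python sketch simulates a Phase I sample with a binary disease outcome D and a binary exposure E, then draws a Phase II sub-sample by outcome-exposure stratified sampling. All function names, sample sizes, and probabilities here are hypothetical and are not taken from the proposal. The genotype step is a placeholder for an expensive measurement and assumes, purely for simplicity, a carrier frequency that does not depend on D or E. The sketch shows only the sampling scheme, not the proposed likelihood or Bayesian methods.

```python
import random


def simulate_phase_one(n, seed=0):
    """Simulate inexpensive Phase I data: disease D and exposure E.

    The probabilities below are arbitrary illustrative values.
    """
    rng = random.Random(seed)
    records = []
    for i in range(n):
        exposure = int(rng.random() < 0.30)
        risk = 0.15 if exposure else 0.10
        disease = int(rng.random() < risk)
        records.append({"id": i, "D": disease, "E": exposure})
    return records


def phase_two_sample(phase_one, n_per_stratum, seed=1):
    """Draw up to n_per_stratum subjects from each (D, E) stratum.

    Returns the selected records and the sampling fraction per stratum.
    """
    rng = random.Random(seed)
    strata = {}
    for rec in phase_one:
        strata.setdefault((rec["D"], rec["E"]), []).append(rec)

    selected = []
    fractions = {}
    for key, members in sorted(strata.items()):
        k = min(n_per_stratum, len(members))
        selected.extend(rng.sample(members, k))
        fractions[key] = k / len(members)
    return selected, fractions


def measure_genotype(sub_sample, carrier_freq=0.2, seed=2):
    """Placeholder for the expensive Phase II measurement.

    Assumes (hypothetically) a binary genotype G whose frequency is the
    same in every stratum. Returns copies so Phase I records stay unchanged.
    """
    rng = random.Random(seed)
    return [dict(rec, G=int(rng.random() < carrier_freq)) for rec in sub_sample]


if __name__ == "__main__":
    phase_one = simulate_phase_one(5000)
    sub_sample, fractions = phase_two_sample(phase_one, n_per_stratum=100)
    phase_two = measure_genotype(sub_sample)
    print("Phase I size:", len(phase_one))
    print("Phase II size:", len(phase_two))
    for stratum, frac in fractions.items():
        print("stratum (D, E) =", stratum, "sampling fraction =", round(frac, 3))
```

Because the sampling fractions differ across the (D, E) strata, an analysis that used only the Phase II records and ignored the design would generally be biased, which is why the Phase I stratum counts need to enter the analysis alongside the Phase II data.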
Specific Aim 1: Development of a semiparametric profile likelihood based estimation strategy for two-phase studies of gene-environment interaction. The proposed estimation strategy can handle non-monotone missing covariate data patterns and addresses the critical issue of relaxing the gene-environment independence assumption.
Specific Aim 2: Development of an alternative semiparametric Bayesian procedure to accomplish the same modeling objectives as in Aim 1. The Bayesian methods would offer more flexibility to handle a large number of main effects and interaction terms in the disease risk model and to relax gene-environment independence. The possibility of extending Aim 2 to haplotype-based interactions will be explored.
The project team has expertise in biostatistical methodology, cancer epidemiology, human genetics, cancer therapeutics and clinical research. A concrete data example from the Molecular Epidemiology of Colorectal Cancer Study, which examines the evidence of effect modification of the association between colorectal cancer and long-term use of statins by genes in the cholesterol synthesis/lipid metabolism pathway, has been identified as a motivating and illustrative example for the proposed methods. However, the methods developed in the application are generic and may be broadly applied to other cancer epidemiology studies that employ outcome-exposure stratified sampling schemes. There are currently no Bayesian approaches for two-phase G x E studies. The planned research will also contribute towards filling a gap in the classical frequentist literature on handling non-monotone missing data patterns in two-phase studies. The research will provide valuable clinical insight into the chemoprotective association of statins with colorectal cancer as modified by genotypic variation.