My career goal is to become an established researcher in molecular epidemiology with expertise in the genetic epidemiology of cancer. This goal builds upon my previous training in statistical science and statistical genetics, but requires further training in molecular techniques and epidemiology. At the end of this training period, I will be an established researcher in cancer prevention. I have developed a comprehensive education and reentering plan. My education plan will provide intensive instruction in the areas of genetics, cancer biology and epidemiologic methodologies. I have chosen three mentors who will supervise specific portions of my training. Dr. Xifeng Wu will mentor me in molecular epidemiology; Dr. Christopher Amos will supervise the development and application of the statistical methods; and Dr. Margaret Spitz will provide guidance in cancer epidemiology and cancer prevention and control methodologies. My research proposal will focus on developing risk models for lung cancer. I propose to capitalize on the availability of epidemiologic and genetic marker data from an ongoing case-control study (R01 CA55679) under the direction of Dr. Margaret Spitz. This study currently includes over 2700, mostly Caucasian, lung cancer cases and controls, matched on sex, age, ethnicity and smoking status. I will also utilize data from her completed study involving an additional 597 Mexican- and African-American lung cancer cases and controls. My research goal is to construct robust risk models to characterize the most important genetic and environmental risk factors for lung cancer. I aim to investigate lung cancer risk by simultaneously analyzing genetic and epidemiologic data. The specific aims are: 1. To investigate the effect of missing genotypic data on risk modeling and apply methodology to deal with missing data in the assessment of risk for cancer. Missing genotypic data are a real deficiency in many studies, and I will investigate approaches to handle them.
I propose a simulation study to investigate the validity of imputation methods in a case-control framework. As a result of this simulation study, I will be able to identify the pros and cons of imputing missing genetic data. Only after an optimal method to impute missing marker values has been identified will I implement imputation in the lung study data. 2. To build a risk model that simultaneously includes available susceptibility markers and epidemiologic data, including gene-environment and gene-gene interactions. I will employ multiple logistic regression and classification and regression trees (CART) to model lung cancer as a function of genetic and epidemiologic variables. I will incorporate into the risk model gene-environment and gene-gene interactions cited in the risk assessment literature as well as those identified from CART. I will explore techniques to compare the models obtained by these two methods in order to arrive at an optimal risk model.
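
The simulation study proposed under Aim 1 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the protocol of the proposed study: the single additive marker, the binary exposure, the effect sizes, the 20% missing-completely-at-random rate, and all function names are hypothetical, and matching on sex, age, ethnicity and smoking status is not simulated. The sketch compares the genotype log odds ratio estimated from the full data, from complete cases only, and after a naive most-common-genotype imputation.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson fit of a logistic regression; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        # Newton step: solve (X' W X) delta = X' (y - p)
        beta = beta + np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
    return beta

def simulate_case_control(rng, n_per_group=500, maf=0.3, beta_g=0.5, beta_e=0.7):
    """Generate a source population, then sample equal numbers of cases and controls."""
    n_pop = 40 * n_per_group
    g = rng.binomial(2, maf, size=n_pop)   # hypothetical additive genotype coded 0/1/2
    e = rng.binomial(1, 0.4, size=n_pop)   # hypothetical binary exposure
    risk = 1.0 / (1.0 + np.exp(-(-3.0 + beta_g * g + beta_e * e)))
    d = rng.random(n_pop) < risk           # disease status in the population
    cases = rng.choice(np.flatnonzero(d), n_per_group, replace=False)
    controls = rng.choice(np.flatnonzero(~d), n_per_group, replace=False)
    idx = np.concatenate([cases, controls])
    return g[idx], e[idx], d[idx].astype(float)

def genotype_log_or(g, e, y):
    """Estimated per-allele log odds ratio, adjusted for the exposure."""
    X = np.column_stack([np.ones(len(y)), g, e])
    return fit_logistic(X, y)[1]

def run_simulation(seed=0, n_reps=50, missing_rate=0.2, beta_g=0.5):
    """Average the genotype log-OR estimate over replicates for each analysis strategy."""
    rng = np.random.default_rng(seed)
    est = {"full": [], "complete_case": [], "mode_imputed": []}
    for _ in range(n_reps):
        g, e, y = simulate_case_control(rng, beta_g=beta_g)
        miss = rng.random(len(g)) < missing_rate   # genotypes missing completely at random
        est["full"].append(genotype_log_or(g, e, y))
        est["complete_case"].append(genotype_log_or(g[~miss], e[~miss], y[~miss]))
        g_imp = g.copy()
        g_imp[miss] = np.bincount(g[~miss]).argmax()  # fill with most common observed genotype
        est["mode_imputed"].append(genotype_log_or(g_imp, e, y))
    return {name: float(np.mean(values)) for name, values in est.items()}

if __name__ == "__main__":
    for name, value in run_simulation().items():
        print(f"{name}: mean estimated log-OR = {value:.3f}")
```

Under these assumptions, the complete-case estimate should stay close to the simulated effect, while the naive imputation tends to pull the estimate toward the null. A fuller study would vary the missingness mechanism and compare more principled imputation methods before any are applied to the lung study data.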