One of the challenges in developing a risk assessment model for cancer is missing data. Missing data, particularly missing genotypic data, can reduce both the power to detect important risk factors and the precision with which their effects are estimated. The goal of this project is to develop a quantitative method for handling missing genotypic data so that robust risk models can be constructed to characterize the most important genetic and environmental risk factors for lung cancer. The proposal builds on an existing resource (epidemiological and molecular data) from an ongoing case-control study (R01 CA55679) under the direction of Dr. Margaret Spitz. This study currently includes a panel of susceptibility markers (including in vitro bleomycin-induced chromatid breaks, DNA repair capacity, and several metabolic polymorphisms) for more than 3700 lung cancer cases and controls, most of them Caucasian, matched on sex, age, ethnicity, and smoking status. Few previous case-control studies of lung cancer have had the sample size and range of markers available for this proposal. The specific aim of this proposal is to investigate the effect of missing genotypic data on risk modeling and to apply methodology for dealing with missing data in the assessment of cancer risk. Approaches to handling missing data will be investigated in a simulation study that evaluates the validity of imputation methods in a case-control framework and weighs the advantages and disadvantages of imputing missing genetic data. Once an optimal method for imputing missing marker values has been identified, I will apply it to the lung cancer study data to facilitate the development of a risk assessment model for lung cancer.