The primary objective of this proposal is to develop statistical models for the analysis of DNA methylation data. Questions currently posed for gene expression profiles, such as how to classify samples based on cell-wide gene expression patterns or how to classify genes of unknown function based on their expression patterns across samples, may similarly be posed for DNA methylation patterns. Our focus is on the discovery of new subgroups based on patterns of DNA methylation. We view DNA methylation patterns as an intermediate variable along a pathway from exposures to outcomes, and we take a flexible hierarchical modeling approach that can incorporate measured and unmeasured covariates. Our aims are motivated by ongoing and planned studies in the Department of Preventive Medicine and Norris Cancer Center at the University of Southern California. Specifically, we propose to:

1. Develop model-based class discovery methods (cluster analysis/unsupervised learning approaches) using quantitative measures of DNA methylation, adjusting for locus-specific and sample-specific covariate effects.
   a. Analyze paired samples of tumor tissue and normal tissue taken from the margin of the tumor.
   b. Extend the model to allow for multiple tumors per subject.
   c. Extend the model to a two-dimensional cluster analysis of samples and loci.
2. Develop models for characterizing methylation patterns as an intermediate variable by modeling a) the association between risk factors and methylation pattern clusters, and b) the association between methylation pattern clusters and outcome.
3. Apply these methods to studies of DNA methylation in colorectal adenomas and lung cancer.