Abstract With the advent of new DNA sequencing technologies, genomic data has moved to the forefront in the quest for individualized medicine. However, the lack of new computational tools to interpret and exploit the deluge of genomic data has severely hampered its clinical benefit. A further complication is that clinical genomic data is often obtained from tissues containing a mixture of different cell types. For example, tumor samples are a mixture of tumor subclones and normal cell lineages. As a platform for understanding this heterogeneity in tumor cell populations, our studies will focus on DNA methylation changes in diffuse large B-cell lymphoma (DLBCL), the most prevalent, aggressive form of Non-Hodgkin Lymphoma (NHL). DNA methylation is cell type-specific, likely contributes to the phenotypic variation in tumor subclones, and plays key roles in gene regulation and lymphoma pathogenesis. Defining the methylation and transcriptional profiles of tumor subclones is necessary to identify potential drug targets, predict disease progression, and determine overall prognosis. Further, current computational approaches to understand epigenetic heterogeneity are not well-suited for tumor analysis and do not link epigenetic changes to expression. Here, we propose to develop a new computational approach that uses genomic methylation data from bulk samples to define DNA methylation- and expression-based subclonal populations and relate these populations to disease progression. Our approach exploits the fact that methylation levels are an accurate measure of the underlying distribution of allelic methylation states in the tumor (e.g., 30% methylation means 30% of the alleles are methylated). Our approach takes genome-wide methylation data as input, and outputs the number of subpopulations, their respective proportions, and each underlying subpopulation's DNA methylation profile.
After deconvolution, we will extend our previous work showing that DNA methylation profiles are predictive of expression levels, and apply tools from machine learning to estimate the relative expression profiles in each subpopulation. This approach will allow us to take advantage of the vast amount of data currently being collected in large-scale sequencing projects and obtain genome-wide methylation information from cellular subpopulations even when surface markers cannot be used to physically separate the different subpopulations. Lastly, we will assess the clinical relevance of these approaches by using them to understand the molecular changes found in tumor subpopulations from samples of relapsed DLBCL. We envision these methods can be applied as investigative tools first, and later as cost-effective clinical tools, to analyze the epigenetic and transcriptomic heterogeneity in different tumor types and diseased tissues.
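The mixture idea behind the deconvolution step can be illustrated with a toy example. The following is a minimal sketch, not the proposed method: it assumes that each subpopulation is either fully methylated or fully unmethylated at a site, that the subpopulation profiles are already known, and that the data are noise-free, whereas the approach described above must also infer the number of subpopulations and their profiles. The function names and the toy matrix are hypothetical and chosen only for illustration.

```python
import numpy as np


def mix_bulk_methylation(profiles, proportions):
    """Bulk methylation per site as the proportion-weighted sum of profiles.

    profiles: (n_sites, n_subpops) array, 1 = methylated, 0 = unmethylated
              (binary profiles are an assumption of this sketch).
    proportions: (n_subpops,) array of subpopulation fractions summing to 1.
    """
    profiles = np.asarray(profiles, dtype=float)
    proportions = np.asarray(proportions, dtype=float)
    return profiles @ proportions


def estimate_proportions(profiles, bulk):
    """Least-squares estimate of proportions when profiles are known.

    Hypothetical helper: solves profiles @ p = bulk, then clips negative
    values and renormalizes so the estimate is a valid set of fractions.
    """
    profiles = np.asarray(profiles, dtype=float)
    bulk = np.asarray(bulk, dtype=float)
    est, *_ = np.linalg.lstsq(profiles, bulk, rcond=None)
    est = np.clip(est, 0.0, None)
    total = est.sum()
    if total == 0:
        raise ValueError("all estimated proportions are zero")
    return est / total


# Toy data: 5 CpG sites, 3 subpopulations (made up for illustration).
profiles = np.array([
    [1, 0, 0],  # methylated only in subpopulation 1
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
], dtype=float)
true_props = np.array([0.3, 0.5, 0.2])

bulk = mix_bulk_methylation(profiles, true_props)
estimated = estimate_proportions(profiles, bulk)
```

In this toy setup the first site is methylated only in the subpopulation that makes up 30% of the sample, so its bulk methylation level is 0.3, which mirrors the "30% methylation means 30% of the alleles are methylated" reading. Real bulk data are noisy and the profiles are unknown, so a practical method would need a joint estimation procedure rather than a single least-squares solve.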