A Data-driven Pan-cancer Study of Biological Bases of Cancer Health Disparities
Project Abstract
To date, significant progress has been made in our understanding of the role of socioeconomic factors in cancer racial disparities. Increasing evidence now suggests that a number of intrinsic molecular factors specific to malignant cells must also partly account for the observed health inequalities. Although research has begun to explore the biological basis of cancer disparities, most existing work is limited to several common cancer types and does not methodically explore whether the observed genetic and molecular differences represent clinically meaningful racial disparities in other fatal human cancers. Moreover, the massive amounts of multi-faceted omics data generated by high-throughput technologies have been neither fully utilized nor well integrated with clinical data to search for race-specific molecular characteristics, biomarkers, or potential drug targets. The goal of this RCMI research project is therefore to address these significant limitations by performing an in-depth, data-driven, pan-cancer study to investigate cancer-specific mutome, epigenome, and RNA-Seq transcriptome differences among racial groups. The proposed study will focus on the eight TCGA cancer types, with pertinent cancer data from other sources (e.g., dbGaP, GEO, and ICGC) being systematically utilized for methodology development and/or empirical validation throughout the entire project. For each specific cancer, we will develop new bioinformatics algorithms and pipelines to analyze these multiple types of omics data, individually and collectively, in connection with clinical data. As such, we will establish a pan-cancer, race-relevant assemblage of single- and multi-level coherent genes, modules, and biological pathways, some of which will hold significance and promise for clinical use.
This will provide large-scale, direct molecular-level evidence for the biological mechanisms underlying racial disparities in cancer, evidence that is practically impossible to obtain using in vitro, in vivo, and/or population follow-up approaches. Furthermore, we will biologically validate the identified signatures for prostate cancer using clinical samples. A database of all pinpointed signatures will be constructed so that cancer disparity researchers can interrogate how various levels of molecular variation may alter gene functions in different cancers and races. A set of efficient and powerful analytical tools for the proposed data-driven analyses of health disparities in cancer will also be made publicly available as open source software. We anticipate that this project will have a large and sustained impact that will enable us 1) to better understand the mechanisms underlying the most-studied disparities and to predict understudied disparities across races for various cancer types; 2) to search for race-specific sets of biomarkers (working through the causal mechanisms) and potential drug targets; and 3) to ultimately contribute to reducing and eventually eliminating health disparities in personalized cancer prevention and treatment.