Developing graph models and efficient algorithms for the study of cancer disease mechanisms

Abstract: Cancers are driven by heritable genetic and epigenetic changes, including somatic mutations and copy number variations (CNVs). These changes perturb cellular signaling pathways through two mechanisms: 1) by changing the structures (and therefore the functions) of signaling proteins, through somatic mutations; and 2) by changing the quantity of proteins involved in signaling pathways (e.g., increasing expression of an enzyme that produces a certain signaling molecule), through CNV and DNA methylation. Each signaling pathway regulates the expression of a set of genes that usually perform certain functions together, such as controlling cell proliferation or death. We call this set of genes a response-module. If a pathway is perturbed, the expression of the genes it regulates changes accordingly, further altering the behavior of cells and turning normal cells into cancerous ones. One important objective of cancer disease mechanism research is to understand which genetic change is responsible for the perturbation of which signaling pathway. Large-scale studies, such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC), have detected genetic changes (e.g., somatic mutations and copy number alterations (CNAs)) and expression changes (e.g., RNA expression and protein expression) in tens of thousands of tumors. These data provide an unprecedented opportunity to study cancer disease mechanisms and to investigate the heterogeneity of common cancers. However, the scale of the data also poses significant challenges for the development of computational methods, particularly because many bioinformatics problems belong to the class of NP-hard problems. The PI of this transitional K99/R00 proposal has extensive experience in developing algorithms that address this type of computational problem.
The major goal of this proposal is to give the PI the training needed to gain biological insight, so that he can develop efficient algorithms that enhance computational cancer research. In addition to providing the PI with formal didactic and out-of-class training in cancer biology, the project will develop specific computational algorithms and tools to study cancer disease mechanisms using the TCGA data. The project proposes two aims. AIM 1 is to develop graph models and design efficient algorithms capable of revealing perturbed signaling pathways by combining multiple types of omics data. AIM 2 is to study the impact of pathway perturbations on cancer development and clinical outcome. These aims address two major challenges: 1) given a set of genes that are differentially expressed as the result of perturbations of multiple signaling pathways in tumors, how do we group them into units such that the genes in each unit are likely to be regulated by one common signal; and 2) given the dozens or hundreds of somatic mutations or CNAs in a tumor, how do we recognize the small portion that is likely to perturb cancer pathways, and trace these perturbation sources, i.e., the somatic mutations or CNAs in the tumor, to the pathways they affect? By applying the tools developed in this project to data on different types of cancer from TCGA or other resources, we expect to gain a better understanding of the disease mechanisms of different cancers.