DESCRIPTION: Cancer is a consequence of the accumulation of genetic alterations. Large whole-genome-scale resequencing projects such as The Cancer Genome Atlas (TCGA) have been launched in an effort to comprehensively catalog the genomic mutations and epigenetic modifications that are associated with cancer. It is essential to identify cancer-causing genes and pathways to gain insight into the disease mechanisms and hence facilitate early diagnosis and optimal treatment. However, identifying cancer-causing genes and their functional pathways remains challenging due to the complexity of biological interactions and the heterogeneity of the disease. Genetic mutations in disease-causing genes can disturb signaling pathways that affect the expression of a set of genes performing certain biological functions. We refer to such a set of genes as a functional module. We hypothesize that driver mutations, that is, mutations that lead to cancer progression, are likely to affect common disease-associated functional modules, and that the causal relationship between the mutations and the perturbed signals of the modules can be reconstructed from gene expression data and protein interaction data. In this project, we will develop a novel approach to infer disease-causing genes and networks by integrating information from multiple types of data, including genomic variations, gene expression and protein interactions. We will first dynamically identify disease-associated modules, each consisting of a set of interacting genes, and then develop a Bayesian approach to infer causative genes from those modules. Next, we will develop a stochastic search method to determine the paths connecting causative genes to gene modules, and infer disease-related pathways from those paths. Furthermore, we will integrate those pathways with the human interactome to discover higher-level disease-associated networks.
In addition, we will develop machine-learning-based classifiers to predict disease types and clinical outcomes using the molecular signatures identified in this project, such as differentially expressed gene modules and causative genes. Our computational framework and classifiers will be made available to the research community via a web server. The PI serves as the university bioinformatics program director and has extensive teaching and research experience. The project also aims to provide scientific research training to students and to help them gain biological insight through their involvement with the project. Students will learn practical scientific computing skills from the PI and develop their own computational approaches to solving specific biomedical problems under the PI's guidance. Thus, the project will serve as an effective learning-research model in bioinformatics.