Project Summary/Abstract

Genetic heterogeneity is a common feature of many diseases: different individuals with the same disease carry different causal variants, or mutations. This heterogeneity complicates the identification of the genetic basis of disease, because any modest-sized study will contain individuals with different causal variants. One reason for the heterogeneity is that causal variants occur in groups of genes that interact in cellular signaling and regulatory pathways. Genetic heterogeneity therefore demands testing combinations of variants, rather than individual variants, for association with a disease. However, while individual variants can be tested exhaustively for association, combinations of variants cannot: there are too many combinations to test, and the number of samples required for statistical significance would be astronomical. We propose to develop new computational and statistical approaches to identify combinations of variants that are associated with a disease. In contrast to existing approaches, we do not restrict attention a priori to known pathways or groups of genes. Rather, our algorithms use genome-scale interaction networks and combinatorial/statistical constraints to identify combinations of variants and rigorously assess their statistical significance. Further, we extend these approaches to find associations between combinations of variants and clinical parameters such as survival time or response to treatment. We will apply these techniques to cancer genome sequencing projects, including The Cancer Genome Atlas (TCGA), in collaboration with several biomedical research groups. Successful completion of the proposed research will facilitate the study of genetically heterogeneous diseases, and in particular cancer, using only the modest number of samples attainable with present DNA sequencing technologies.
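
To make the scale of the combinatorial problem concrete, the following minimal sketch counts how many gene sets of a given size would have to be tested exhaustively, and the per-test significance threshold that a simple Bonferroni correction would then impose. It is an illustration only and not the proposed method: the gene count of 20,000 is an assumed round figure, Bonferroni is used as a stand-in for whatever multiple-testing correction a study would actually apply, and the function names are hypothetical.

```python
from math import comb


def num_combinations(n_genes: int, k: int) -> int:
    """Number of distinct gene sets of size k drawn from n_genes genes."""
    return comb(n_genes, k)


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    # Assumption: a round figure for the number of genes considered.
    N_GENES = 20_000
    for k in (1, 2, 3, 4):
        n_tests = num_combinations(N_GENES, k)
        threshold = bonferroni_threshold(n_tests)
        print(f"set size {k}: {n_tests:.3e} tests, "
              f"per-test threshold {threshold:.3e}")
```

Under these assumptions, the number of tests grows by roughly four orders of magnitude with each additional gene in the set, which is why restricting the search, for example with interaction networks, is needed to keep both computation and sample requirements modest.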