ABSTRACT In this application we propose to study clonal heterogeneity at the single cell level using transcriptomic and mutational data. This approach must be accompanied by the development of effective mathematical and computational techniques to visualize the data and extract meaningful biological and clinical information. Single cell approaches (and indeed all high-throughput biological methods, whether based on sequencing, proteomics, or imaging) generate large, multivariate data sets and therefore require methods for identifying the most biologically and clinically relevant features. The Mathematical Core aims to extract robust patterns from large-scale genomic data, using a set of recently developed techniques in algebraic topology, evolutionary moduli spaces, and causal networks. Our group is pioneering the development and large-scale genomic application of topological data analysis (TDA), a new mathematical framework for capturing global structural properties of large data sets that is particularly well suited to high-dimensional, high-throughput biological data. This core aims to extend this work and to develop and implement mathematical approaches that address the inherent complexity of single cell approaches to cancer evolution and heterogeneity and make their results biologically interpretable. To develop an appropriately expressive language, we will rely on ideas from topology (phylogenetic moduli spaces and TDA), modal logic, model-checking algorithms, and a probabilistic theory of causality. These tools complement each other; they will be further developed in the context of single cell data and applied in the specific research projects of this proposal. In particular, the proposed methods will be calibrated and evaluated in Project 1, tested using single cell data from primary tumors in Project 2, and tested in the context of therapy in Project 3.