COMPUTATIONAL SHARED RESEARCH CORE Project Summary: This Center will dissect the clonal heterogeneity of tumors by profiling their genotypes using whole genome sequencing, whole exome sequencing, RNA sequencing, and single cell RNA sequencing, and by developing software for the analysis of their genotypes and phenotypes. Because both scientific projects will require overlapping sets of data and software, unifying their management will improve efficiency and make it easier to implement frameworks that enforce best practices. Therefore, we will create a Computational Shared Research Core to enhance synergy among projects by managing the storage, analysis, and dissemination of data. It will do this in a manner that is reproducible and secure and that maintains patient privacy. It will also develop, archive, and disseminate novel computational tools based on Bayesian statistics, mathematics, and computer science. As we developed this Core, we agreed upon the following principles that will guide its mission: 1) The integrity, security, and privacy of the data are paramount. 2) Ensuring the reproducibility of research requires an active effort at all steps of the research process. 3) Centralization of preprocessing and standardized analyses can accelerate research. 4) Public dissemination of data and code in a manner that meets or exceeds community standards is necessary to promote the progress of research. These values are encoded in the five aims and operating procedures that we have established. The specific aims for the Core are: 1) Data management, storage, and distribution. We will maintain and distribute the raw and processed genomic data generated by the cores and projects. 2) Data preprocessing. We will standardize the procedures to check the quality of and preprocess the raw data (DNA-Seq, RNA-Seq, and single cell RNA-Seq) into meaningful measures that can be analyzed in the projects. 
3) Development and application of novel computational methods. Our Core will develop novel tools that will be applied to both projects. We will develop SNIPER (Structural Network Integrative PhEnotypeR), based on a Bayesian structural network modeling approach, to integrate gene- and pathway-level information into phenotypic signatures. We will then model the evolution of these phenotypes using Cancer IPM, a differential equation model based on evolutionary theory. Finally, we will apply and optimize SuperSeeker, which interrogates the structure, evolution, and phylogeny of tumor subclones. 4) Pipeline development. To process and analyze the data, we will use BETSY, an expert system we developed that creates pipelines. Automating the analysis in this way both accelerates research and promotes reproducibility. Further, BETSY automatically documents each analysis in detail. 5) Standardization of computational environment. We will create and distribute standardized computing environments in Docker containers.