Abstract - Core C (Computational Core)
The DNA and RNA next-generation sequencing data and other experimental data generated by Projects 1 through 3 require trained bioinformaticians to process them and to conduct the downstream analyses that help project investigators interpret the results and identify target genes for validation. Over the past years, the Computational Core has built an infrastructure of powerful computing clusters, storage space, and well-established software to perform the required computations. We have also worked closely with investigators from Project 1 and Core B (Preclinical Therapeutics Core) to develop and improve algorithms for tracking molecular barcodes in vivo. These algorithms allow us to accurately evaluate genes for their tumor-initiating potential and to identify targets for the clinical treatment of pancreatic ductal adenocarcinoma. We propose the following three specific aims: 1. A significant contribution the Computational Core can make is integrative analysis, carried out by dedicated computational biologists, that leverages readily available internal MD Anderson Cancer Center (MDACC) patient data and external public data sources to seek answers, formulate hypotheses, and validate experimental results. We will mine private and public databases to provide the information needed to guide the experiments and to validate their results. Databases and archives containing data collected from TCGA, CCLE, COSMIC, and MD Anderson Cancer Center's patient data repositories have been built, and interfaces and applications have been developed to facilitate data exploration and visualization. Data mining results will be communicated to project investigators. 2. We will provide P01 investigators with statistical analysis of RNA-seq gene expression data to identify genes that are differentially expressed between genomic groups, disease subtypes, or treatment conditions as candidate biomarkers.
A collection of publicly available and internally developed tools has been configured to run on the computing cluster, and the appropriate tools can be selected according to the nature of the experiment and the structure of the data. We will also further improve the algorithm we developed for in vivo screening projects to identify genes as potential treatment targets. 3. We will process next-generation sequencing data following established standards and apply quality control measures at each step, producing high-quality data for the downstream analyses used to identify target genes with clinical implications. A production-level infrastructure has been established to achieve this goal.
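The abstract does not describe the Core's barcode-tracking algorithm. To illustrate the general idea behind scoring genes in a barcoded in vivo screen, the following is a minimal sketch under stated assumptions, not the Core's method. It assumes each barcode tags one gene, that read counts per barcode are available for the pre-injection library and for a tumor sample, and that a gene's score is the median log2 enrichment of its barcodes after counts-per-million normalization with a pseudo-count. The function names `cpm` and `gene_enrichment` and the example barcode and gene labels are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def cpm(counts):
    """Normalize raw barcode read counts to counts per million (CPM)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {bc: c * 1e6 / total for bc, c in counts.items()}


def gene_enrichment(input_counts, tumor_counts, barcode_to_gene, pseudo=1.0):
    """Hypothetical scoring: median log2(tumor CPM / input CPM) over each gene's barcodes.

    A pseudo-count avoids division by zero and log of zero for barcodes
    that were lost in the tumor or absent from the input library.
    """
    inp = cpm(input_counts)
    tum = cpm(tumor_counts)
    per_gene = defaultdict(list)
    for bc, gene in barcode_to_gene.items():
        lfc = math.log2((tum.get(bc, 0.0) + pseudo) / (inp.get(bc, 0.0) + pseudo))
        per_gene[gene].append(lfc)
    return {gene: median(vals) for gene, vals in per_gene.items()}


if __name__ == "__main__":
    # Toy data: two barcodes per gene, equal representation in the input library.
    input_counts = {"bc1": 100, "bc2": 100, "bc3": 100, "bc4": 100}
    tumor_counts = {"bc1": 400, "bc2": 400, "bc3": 100, "bc4": 100}
    barcode_to_gene = {"bc1": "GENE_A", "bc2": "GENE_A", "bc3": "CTRL", "bc4": "CTRL"}
    scores = gene_enrichment(input_counts, tumor_counts, barcode_to_gene)
    for gene, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(gene, round(score, 3))
```

Taking the median across several barcodes per gene is one simple way to damp the effect of a single clone that expands by chance. A production pipeline would likely add replicate-aware statistics, for example a count-based model such as those used for RNA-seq differential expression.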