CORE D: PROJECT SUMMARY

The main goal of this core is to develop computational procedures for analyzing high-throughput data on B and T cells, including receptor repertoire sequencing, RNA-seq, and flow cytometry data, and to apply them to the tissue-specific data generated in the proposed projects. Sequencing of T cell receptor (TCR) and B cell receptor (BCR) genes provides information about the clonal lineage and tissue-specific expansion of T and B cell populations, which is key to testing the hypotheses in projects 1, 2, and 4. RNA-seq is a powerful approach for profiling gene expression and alternative splicing, which are important for characterizing the specific states of lymphocytes and the local environment of different tissues; it will be applied extensively in projects 1, 2, and 3. All projects require a streamlined procedure for analyzing large-scale, multidimensional flow cytometry data so that the immune cell populations under study can be separated precisely.

We have three specific service aims in this core:

(1) Establish and apply computational approaches to analyze RNA-seq data and identify expression signatures that distinguish cell lineages and tissues. We have a mature analytical pipeline for RNA-seq data at the Columbia Genome Center Next-Generation Sequencing Laboratory. Because the field is in active development and newer methods continue to be published, we will assess the performance of new and existing methods and optimize the procedure for finding expression signatures that define the local environment in different tissues and immune cell states. We will perform the computational analysis for projects 1 through 3.

(2) Establish and apply computational approaches to analyze T and B cell receptor repertoire sequencing data. We have published ImmuneDB, an in-house bioinformatics pipeline for analyzing large volumes of TCR and BCR repertoire sequencing data from the Illumina HiSeq or MiSeq platforms.
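As an illustration of the kind of clonal inference a repertoire pipeline performs, the Python sketch below groups annotated sequences into putative clones using a simplified, greedy version of a common heuristic: sequences are compared only if they share V gene, J gene, and CDR3 length, and a sequence joins an existing clone when its CDR3 amino acid sequence is at least 85% identical to that clone's founding sequence. This is a minimal sketch under assumed inputs, not ImmuneDB's implementation; the `Read` record, the `group_clones` and `cdr3_identity` helpers, the 85% threshold, and the gene names and CDR3 sequences in the example are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal record for one annotated receptor sequence."""
    v_gene: str
    j_gene: str
    cdr3_aa: str


def cdr3_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length CDR3s."""
    if not a:
        return 1.0  # two empty CDR3s are treated as identical
    return sum(x == y for x, y in zip(a, b)) / len(a)


def group_clones(reads: list[Read], min_identity: float = 0.85) -> list[list[Read]]:
    """Greedily group reads into putative clones (hypothetical helper).

    Reads are compared only when they share V gene, J gene and CDR3 length.
    Within such a bucket, a read joins the first clone whose founding read's
    CDR3 is at least `min_identity` identical; otherwise it founds a new clone.
    """
    buckets: dict[tuple[str, str, int], list[Read]] = defaultdict(list)
    for read in reads:
        buckets[(read.v_gene, read.j_gene, len(read.cdr3_aa))].append(read)

    clones: list[list[Read]] = []
    for bucket in buckets.values():
        bucket_clones: list[list[Read]] = []
        for read in bucket:
            for clone in bucket_clones:
                if cdr3_identity(clone[0].cdr3_aa, read.cdr3_aa) >= min_identity:
                    clone.append(read)
                    break
            else:  # no existing clone in this bucket was similar enough
                bucket_clones.append([read])
        clones.extend(bucket_clones)
    return clones


# Example with made-up annotations.
reads = [
    Read("IGHV3-23", "IGHJ4", "CARDYYFDYW"),
    Read("IGHV3-23", "IGHJ4", "CARDYYFDVW"),  # 9/10 identical: same clone
    Read("IGHV3-23", "IGHJ4", "CARGGSFDYW"),  # 7/10 identical: new clone
    Read("IGHV1-69", "IGHJ4", "CARDYYFDYW"),  # different V gene: new clone
]
print([len(clone) for clone in group_clones(reads)])  # [2, 1, 1]
```

Bucketing by V gene, J gene, and CDR3 length first keeps the pairwise comparisons small, which matters at the data volumes produced by HiSeq runs; a production pipeline would also need to handle ambiguous gene calls and sequencing errors.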
For this part of the core, we will continue to develop analytical methods for characterizing repertoire diversity and for comparing the repertoires of different tissues across individuals. We will then perform the computational and mathematical analysis of TCR and BCR repertoires for projects 1, 2, and 4.

(3) Create novel tools for the integration, visualization, and management of high-throughput sequencing data. We will address issues of scale in data integration, annotation, and analysis. More specifically, we will combine TCR/BCR and RNA-seq data to identify clone-specific transcriptional programs. Using novel visualizations that associate sequence repertoire (BCR/TCR) data with gene expression data, we will link relevant clonal information to the corresponding gene expression data in our other RNA-seq and ATAC-seq data repositories.
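The repertoire diversity and cross-tissue comparisons described in aim (2) are often summarized with standard ecological indices computed from clone sizes. The Python sketch below shows Shannon entropy, a normalized clonality score (1 minus Shannon entropy divided by the natural log of the number of clones), Simpson's index (sum of squared clone frequencies), and the Morisita-Horn overlap between two tissues. These indices are an illustrative choice, not necessarily the methods this core will adopt, and the function names, clone identifiers, and counts are hypothetical.

```python
import math
from collections import Counter


def frequencies(counts: Counter) -> dict[str, float]:
    """Convert clone counts to relative frequencies, dropping empty clones."""
    total = sum(counts.values())
    return {clone: n / total for clone, n in counts.items() if n > 0}


def shannon_entropy(counts: Counter) -> float:
    """H = -sum(p * ln p) over clones; higher means a more diverse repertoire."""
    return -sum(p * math.log(p) for p in frequencies(counts).values())


def clonality(counts: Counter) -> float:
    """1 - H / ln(S) for S clones: 0 if perfectly even, near 1 if a few clones dominate."""
    richness = sum(1 for n in counts.values() if n > 0)
    if richness < 2:
        return 1.0  # undefined for a single clone; this sketch returns 1.0
    return 1.0 - shannon_entropy(counts) / math.log(richness)


def simpson(counts: Counter) -> float:
    """Probability that two reads drawn with replacement belong to the same clone."""
    return sum(p * p for p in frequencies(counts).values())


def morisita_horn(a: Counter, b: Counter) -> float:
    """Overlap between two repertoires: 2*sum(p*q) / (sum(p^2) + sum(q^2)).

    Ranges from 0 (no shared clones) to 1 (identical clone frequency profiles).
    Both repertoires must be non-empty.
    """
    pa, pb = frequencies(a), frequencies(b)
    shared = sum(pa[c] * pb[c] for c in pa.keys() & pb.keys())
    return 2 * shared / (simpson(a) + simpson(b))


# Made-up clone counts for one individual in two tissues.
blood = Counter({"clone1": 50, "clone2": 30, "clone3": 20})
gut = Counter({"clone1": 5, "clone3": 40, "clone4": 55})
print(round(clonality(blood), 3), round(clonality(gut), 3))
print(round(morisita_horn(blood, gut), 3))
```

Because Morisita-Horn weights clones by frequency, it is driven mainly by expanded clones, which suits questions about tissue-specific expansion; a presence/absence measure such as the Jaccard index would instead weight rare and common clones equally.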