PROJECT SUMMARY
We propose to build a cloud-based integrated solution for scalable, customizable, privacy-preserving, and interactive antibody repertoire analysis. Immune repertoire sequencing (IR-seq) has become a useful tool in both basic research and clinical settings. The B cell receptor (BCR) repertoire is at the heart of adaptive immunity to infection and of the response to many vaccines. Its abundance, its diversity, and its dynamic changes in health and disease carry information that can be used to evaluate immune health, to perform disease diagnosis and prognosis, and to measure the effect of vaccination. However, three obstacles remain: a computational bottleneck in large-scale antibody lineage construction, a lack of decomposable pipeline modules that preserve privacy and ownership, and a lack of tools for interactive lineage analysis and visualization. In this Phase I grant, we will (1) break the bottlenecks of pipeline processing and scale up the core algorithms to handle large sequence data sets; (2) protect private data and proprietary processing algorithms with a modularized pipeline, integrated cloud-local processing, and data perturbation methods; and (3) develop end-to-end web services for pipeline composition and interactive analysis and visualization in a cloud-based deployment solution. Existing commercial efforts mostly focus on cancer-related IR-seq analysis, which aims to trace the disappearance of cancer cells after therapy and is limited to cataloging sequence species and their abundance. This kind of analysis is much simpler than analyzing IR-seq data from infection and vaccination. Providing insight into host immune responses is a much more challenging but much-needed task. Once the pipeline is built, it can be readily adapted to analyze cancer IR-seq data. In addition, existing algorithms and optimizations developed for other big data analyses can be further developed and applied to IR-seq data analysis.
We will use a publicly available BCR repertoire data set from an influenza vaccination cohort and a T cell receptor (TCR) repertoire data set from an aging cohort to test the feasibility of the project. The long-term goal of this proposal is to build accessible and customizable cloud-based services for experts as well as non-specialists. We aim to provide an integrated solution for the ingestion, processing, analysis, exploration, visualization, interpretation, and sharing of data generated by deep sequencing of full-length antibody and TCR repertoires. The success of this Phase I SBIR will provide a solid foundation for the launch of a commercial cloud-based product in Phase II. During Phase II, we will continue our investigation of big data security and privacy to facilitate compliance with institutional policies, and we will design and develop programmable APIs and tools to facilitate integration with more third-party modules and services.