Project Summary The immunology database and analysis portal (ImmPort, http://immport.niaid.nih.gov) is the NIAID-funded public resource for data archive and dissemination from clinical trials and mechanistic research projects. Among the current 291 studies archived in ImmPort, 114 are focused on vaccine responses (91 for influenza vaccine responses), which is the largest category when organized by research focus. As the most effective method of preventing infectious diseases, development of the next-generation vaccines is faced with the bottleneck that traditional empirical design becomes ineffective to stimulate human protective immunity against HIV, RSV, CMV, and other recent major public health threats. This project will focus on three important aspects of informatics approaches to secondary analysis of ImmPort data for influenza vaccination research: a) expanding the data analytical capabilities of ImmPort and ImmPortGalaxy through adding innovative computational methods for user-friendly unsupervised identification of cell populations, b) processing and analyzing a subset of the existing human influenza vaccination study data in ImmPort to identify cell-based biomarkers using the new computational methods, and c) returning data analysis results with data analytical provenance to ImmPort for dissemination of derived data, software tools, as well as semantic assertions of the identified biomarkers. Each aspect is one specific research aim in the proposed work. The project outcome will not only demonstrate the utility of the ImmPort data archive but also generate a foundation for the Human Vaccine Project (HVP) to establish pilot programs for influenza vaccine research, which currently include Vanderbilt University Medical Center; University of California San Diego (UCSD); Scripps Research Institute; La Jolla Institute of Allergy and Immunology; and J. Craig Venter Institute (JCVI). Once such computational analytical workflow is established, it can be applied to the secondary analysis of other ImmPort studies as well as to support the user-driven analytics of their own cytometry data. Each of the specific aims contains innovative methods or new applications of the existing methods. The computational method for population identification in Aim 1 is a newly developed constrained data clustering method, which combines advantages of unsupervised and supervised learning. Cutting-edge machine learning approaches including random forest will be used in Aim 2 for the identification of biomarkers across study cohorts, in addition to the traditional statistical hypothesis testing. Standardized knowledge representation to be developed in Aim 3 for cell-based biomarkers is also innovative, as semantic networks with inferring and deriving capabilities can be built based on the machine-readable knowledge assertions. The proposed work, when accomplished, will foster broader collaboration between ImmPort and the existing vaccine research consortia. It will also accelerate the deployment of up-to-date informatics software tools on ImmPortGalaxy.