Over the past year, we have been active in: (1) developing computationally efficient methods and algorithms to solve known problems in the analysis of biomedical and clinical data and study complex interactions in biological systems; (2) developing knowledge-based data management systems for the discovery and curation of biomedical knowledge, including distributed annotation systems and laboratory information management systems; (3) applying predictive-analytic models to scientific and administrative domains; and (4) consulting with NIH leadership to provide evidence-based solutions to improve the grant application and review process.

Specifically, in 2018, collaborative efforts in support of these goals included the following:

- In a partnership with Dr. John Tsang of the NIAID Laboratory of Systems Biology, HPCIO is conducting a multifaceted project to profile the immune system using the latest high-throughput, multiplexed technologies and systems approaches. One of the goals of this collaboration is to develop novel computational methodologies that can exploit inter-subject heterogeneity and measurements at various scales to assess the roles of the immune system in health and disease. Currently, we are developing a multi-level linear model utilizing ATAC-seq and RNA-seq data from monogenic disease patients to dissect the gene regulatory network. The results will provide insights on the impacts specific genetic lesions have on downstream genes and how they are manifested at the clinical phenotypical level.
- HPCIO has entered into a wide-ranging collaboration to provide data science support and innovation to Dr. Leorey Saligan and the Symptom Management Branch of NINR. An existing effort to identify radiotherapy-related fatigue genes from a PCR assay of oxidative stress using machine-learning methods has largely been completed. In addition, a number of new efforts were initiated in 2018.
To guide future scRNA-seq experiments aimed at identifying clusters of cells and their marker genes associated with wound healing in mice, we analyzed a similar experiment deposited in the Sequence Read Archive. We also initiated a machine-learning analysis of mouse proteomic data to identify targets or pathways that may be involved in both the generation of radiation-related fatigue and rescue through taltirelin.
- As a pilot for a potential future service offering across NIH, HPCIO is working with Dr. John Tsang of NIAID to develop a laboratory information management system (LIMS) that can more effectively capture data associated with the large number of assays performed at the Center for Human Immunology (CHI). The CHI-LIMS will facilitate the standardization and management of procedures and workflows for various types of high-throughput assays, enhancing the tracking of relevant information at each step and improving both the reproducibility of results and the identification of operational issues. Modules will be developed to ensure interoperability with other NIH research information systems. We envision that this system can serve as a model and be readily customized for other labs with similar needs.
- In collaboration with the NINDS Spinal Circuits and Plasticity Unit, HPCIO is helping develop novel methods for analyzing single-cell RNA-seq data collected from mice undergoing treatment for spinal injuries. We are working with NINDS researchers to understand the pathways involved in spinal cord injury and healing.
- HPCIO is working with the NCI Occupational & Environmental Epidemiology Branch to develop methodologies for incorporating occupational risk factors into epidemiological models. We are enlarging the training data to improve our novel classifiers for coding free-text job descriptions into the 840 codes of the 2010 U.S. Standard Occupational Classification System.
Our classifier is being embedded within the data collection software of the NCI Agricultural Health Study, a large-scale epidemiological study, to automatically code job descriptions as they are entered by study participants.
- In collaboration with the Membrane Transport Biophysics Section of NINDS, HPCIO is (1) developing a computational tool to accurately identify the boundaries of lysosomes in fluorescence microscopy images and (2) using the fluorescence ratio to measure lysosomal pH within each organelle, to better understand lysosomal pH regulation.
- A freely available plasmid database that is interoperable with popular freeware is currently being developed for the NIDA Optogenetics and Transgenic Technology Core. The Plasmid Manager offers a versatile yet simple platform for scientists to store and analyze their plasmid data. Motivated by the need for a more comprehensive approach to archiving plasmid data, the database platform is enriched with numerous components beyond the repository, serving as an informatics platform designed to enhance the efficiency and analytic capabilities of scientists.
- The Human Salivary Proteome Wiki is a community-driven Web portal developed by HPCIO, in collaboration with NIDCR, to enable scientists to add their own research data, share results, and discover new knowledge. Many features and external content have been incorporated over the last few years to make it easier for users to extract different kinds of information from the wiki. We are actively adding new data to the system to allow users to discern the origin of proteins found in saliva, for instance, whether they come from the different salivary glands or from blood plasma. Current efforts also include working with major stakeholders to engage with the research community and gather its feedback.
- In collaboration with CSR, HPCIO is applying text analytics to provide CSR leadership with evidence-based decision support in evaluating the grant review process. A Web-based automated referral tool, called ART, was developed and deployed to help PIs and SROs identify the most relevant study section(s) or special emphasis panel(s) based on the scientific content of an application. In 2018, HPCIO continued to maintain ART and to retrain its machine-learning models to reflect new and changing study sections. In addition, HPCIO is analyzing text from quick feedback surveys on peer review: we have developed a system that captures the sentiment of reviewer comments in these surveys, assigns each comment a sentiment score, and classifies the comments into broad categories. In support of the ARGO initiative, we represented study sections through Word2Vec and generated distance metrics as well as diagrams, allowing leadership to analyze the relationships among study sections. We developed an interactive tool that allows users to curate grant applications for measures of scientific rigor