Over the past year, we have been active in: (1) developing computationally efficient methods and algorithms to solve known problems in the analysis of biomedical and clinical data and to study complex interactions in biological systems; (2) developing knowledge-based data management systems for the discovery and curation of biomedical knowledge, including distributed annotation systems and clinical information management systems; (3) applying predictive-analytic models to scientific and administrative domains; and (4) consulting with NIH leadership to provide evidence-based solutions to improve the grant application and review process. Specifically, in 2017, collaborative efforts in support of these goals included the following:
- In a partnership with Dr. John Tsang of the NIAID Laboratory of Systems Biology, HPCIO is conducting a multifaceted project to profile the immune system using the latest high-throughput, multiplexed technologies and systems approaches. One of the goals of this collaboration is to develop novel computational methodologies that can exploit inter-subject heterogeneity and measurements at various scales to assess the roles of the immune system in health and disease. We have collected samples from a large cohort of patients with immune-mediated monogenic diseases and are in the process of deeply phenotyping blood samples from these patients. By studying the immune systems of patients with multiple monogenic, immune-mediated diseases, we will have the opportunity to infer cellular and molecular networks of the human immune system. HPCIO is actively involved in the development of a database to record clinical information from patient visits and in the bioinformatics analyses of data generated by the project.
- HPCIO is working with the NCI Occupational & Environmental Epidemiology Branch to develop methodologies to incorporate occupational risk factors into epidemiological models.
We are enlarging the training data to improve our novel classifiers for coding free-text job descriptions into the 840 codes of the 2010 U.S. Standard Occupational Classification (SOC) System. Agreement between our classification system and expert coders is measured using SOC code agreement and exposure prediction from CANJEM, a job-exposure matrix of over 250 exposure agents developed by Jerome Lavoue at the University of Montreal. We are also working with NCI to develop a two-stage mixed generalized linear model to predict lifetime occupational exposure to lead.
- In collaboration with the Membrane Transport Biophysics Section of NINDS, HPCIO is (1) developing a computational tool to accurately identify the boundaries of lysosomes in fluorescence microscopy images and (2) using the fluorescence ratio to measure lysosomal pH within each organelle, in order to better understand lysosomal pH regulation.
- A freely available plasmid database that is interoperable with popular freeware is currently being developed for the NIDA Optogenetics and Transgenic Technology Core. The Plasmid Manager offers a versatile yet simple platform for scientists to store and analyze their plasmid data. Motivated by the need for a more comprehensive approach to archiving plasmid data, the database platform is enriched with numerous components beyond the repository, serving as an informatics platform designed to enhance the efficiency and analytic capabilities of scientists.
- As high-throughput next-generation sequencing (NGS) technology plays an important role in systematically identifying novel cancer driver mutations in genome-wide surveys, NGS data generation is rapidly increasing; data are currently accumulating at a rate of several terabytes every month at the Lymphoid Malignancies Section of NCI. In collaboration with the Louis Staudt Laboratory, HPCIO is developing a bioinformatics website containing useful tools for the analysis of the laboratory's diffuse large B-cell lymphoma data.
This website enables users with very little computer expertise to run their own analyses rather than relying on a specialist to run them. Parallelization and text-searching methods have also been incorporated so that analysis results are returned much more quickly and efficiently than before. In 2017, a new dimension of this collaboration was the development of machine learning methods to distinguish somatic from germline mutations in NGS data. Machine learning models have also been tested for identifying subtypes of diffuse large B-cell lymphoma based on their gene aberration features.
- In collaboration with NIA and NCI, we are applying machine learning and visualization techniques to large biological datasets to discover novel patterns of functional gene or protein interactions related to aging. In this collaboration, we are developing a machine learning method that models the temporal nature of longitudinal clinical data to predict the progression of amyotrophic lateral sclerosis. Such a method may also work well for prediction from high-dimensional time-series genomic data.
- The Human Salivary Proteome Wiki is a community-driven Web portal developed by HPCIO, in collaboration with NIDCR, to enable scientists to add their own research data, share results, and discover new knowledge. Many features and external content sources have been incorporated over the last few years to make it easier for users to extract different kinds of information from the wiki. One of the latest enhancements is the integration of RNA-seq transcriptional data and protein immunohistochemistry data from the Human Protein Atlas. This allows users to weigh evidence generated by different, independent modalities, in addition to the original mass-spectrometry-based data, when assessing the status of a protein.
- In collaboration with CSR, HPCIO is applying text analytics to provide CSR leadership with evidence-based decision support in evaluating the grant review process. A Web-based automated referral tool, called ART, was developed and deployed to help PIs and SROs identify the most relevant study section(s) or special emphasis panel(s) based on the scientific content of an application. In addition, HPCIO is analyzing text from quick feedback surveys on peer review. HPCIO has developed a system to capture the sentiment of reviewer comments in these surveys and to classify the comments, together with their sentiment scores, into broad categories. Progress has been made in identifying the needs and suggestions offered by reviewers and in assigning topic labels to them. In 2017, HPCIO began to explore appropriate topological network mapping diagrams of CSR study sections, superimposed with measures of scientific productivity for those study sections.
- In collaboration with the Office of Data Analysis Tools and Systems, NIH Office of Extramural Research, HPCIO has been developing a standard database update pipeline for NIH Topic Maps, originally developed by Dr. Ned Talley of NINDS. This effort was concluded in 2017.
- In collaboration with NIAID, HPCIO has supported the release of HT JoinSolver(R), a new application capable of analyzing V(D)J recombination in thousands of immunoglobulin gene sequences produced by high-throughput sequencing.
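
As an illustration of the SOC code agreement measure mentioned for the occupational coding project, the following is a minimal sketch, not the project's actual evaluation code. It assumes that agreement is reported at each level of the SOC hierarchy by comparing code prefixes (2 digits for the major group, 3 for the minor group, 5 for the broad occupation, and all 6 for the detailed occupation). The function names, the prefix convention, and the example code pairings are all hypothetical choices made for this sketch.

```python
# Assumed prefix length (in digits) for each SOC hierarchy level.
# This convention is an assumption of the sketch, not taken from the report.
SOC_LEVELS = {"major": 2, "minor": 3, "broad": 5, "detailed": 6}


def soc_digits(code):
    """Return the six digits of a SOC code such as '29-1141'."""
    digits = code.replace("-", "").strip()
    if len(digits) != 6 or not digits.isdigit():
        raise ValueError(f"not a 6-digit SOC code: {code!r}")
    return digits


def agreement_by_level(predicted, expert):
    """Fraction of jobs whose predicted and expert codes share each prefix."""
    if len(predicted) != len(expert) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    pairs = [(soc_digits(p), soc_digits(e)) for p, e in zip(predicted, expert)]
    return {
        level: sum(p[:n] == e[:n] for p, e in pairs) / len(pairs)
        for level, n in SOC_LEVELS.items()
    }


if __name__ == "__main__":
    # Hypothetical classifier output and expert codes for four job descriptions.
    predicted = ["29-1141", "47-2111", "11-9021", "47-2061"]
    expert = ["29-1141", "47-2152", "47-2031", "47-2061"]
    for level, rate in agreement_by_level(predicted, expert).items():
        print(f"{level:>8}: {rate:.2f}")
```

Reporting agreement per level shows whether disagreements with expert coders are near misses within the same occupational group or assignments to an unrelated part of the classification.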