Over the past year, we have been active in: (1) developing computationally efficient methods and algorithms to solve known problems in the analysis of biomedical and clinical data and to study complex interactions in biological systems; (2) developing knowledge-based data management systems for the discovery and curation of biomedical knowledge, including distributed annotation systems and clinical information management systems; (3) applying predictive-analytic models to scientific and administrative domains; and (4) consulting with NIH leadership to provide evidence-based solutions to improve the grant application and review process.

Specifically, in 2016, collaborative efforts in support of these goals included the following:

- In a partnership with Dr. John Tsang of the NIAID Laboratory of Systems Biology, HPCIO is conducting a multifaceted project to profile the immune system using the latest high-throughput, multiplexed technologies and systems approaches. One of the goals of this collaboration is to develop novel computational methodologies that can exploit inter-subject heterogeneity and measurements at various scales to assess the roles of the immune system in health and disease. In a cohort of more than 1,000 patients who underwent gastric bypass surgery, we have developed an approach to deconvolve individual populations of immune cell subsets in complex tissues based on tissue gene expression data. Associations between the immune cell populations and clinical traits were measured to understand the roles these immune cells play in response to obesity-driven signals. In another aspect of the collaboration, we have compared the chromatin accessibility of fresh and cryopreserved samples of four immune cell types from healthy donors.
The results from the ATAC-seq analyses indicate subtle but interesting changes after cells have been frozen, which may have implications for the practicality of using cryopreserved samples to assess the regulatory landscape of gene expression in the cells.
- HPCIO is working with the NCI Occupational & Environmental Epidemiology Branch to develop methodologies for incorporating occupational risk factors into epidemiological models. We are enlarging the training data to improve our novel classifiers for coding free-text job descriptions into the 840 codes of the 2010 U.S. Standard Occupational Classification System. Agreement between our classification system and expert coders is measured using SOC code agreement and exposure prediction from CANJEM, a job-exposure matrix of over 250 exposure agents developed by Jerome Lavoue at the University of Montreal. We are also working with NCI to develop a two-stage mixed generalized linear model to predict lifetime occupational exposure to lead.
- In collaboration with the Membrane Transport Biophysics Section of NINDS, HPCIO is (1) developing a computational tool to accurately identify the boundaries of lysosomes in fluorescence microscopy images and (2) using the fluorescence ratio to measure lysosomal pH within each organelle, for a better understanding of lysosomal pH regulation.
- A freely available plasmid database that is interoperable with popular freeware is currently being developed for the NIDA Optogenetics and Transgenic Technology Core. The Plasmid Manager offers a versatile yet simple platform for scientists to store and analyze their plasmid data. Motivated by the need for a more comprehensive approach to archiving plasmid data, the database is enriched with numerous components beyond the repository itself, serving as an informatics platform designed to enhance the efficiency and analytic capabilities of scientists.
- As high-throughput next-generation sequencing (NGS) technology plays an important role in systematically identifying novel cancer driver mutations in genome-wide surveys, NGS data generation is rapidly increasing, currently accumulating at a rate of several terabytes every month at the Lymphoid Malignancies Section of NCI. In collaboration with the Louis Staudt Laboratory, a bioinformatics website is being developed containing useful tools for the analysis of the laboratory's Diffuse Large B-Cell Lymphoma data. This website enables users with very little computer expertise to run their own analyses, as opposed to having a specialist run the analyses for them. Methodologies in parallelization and text searching have also been incorporated to return analysis results much more quickly and efficiently than before.
- In collaboration with NIA and NCI, we are applying machine learning and visualization techniques to large biological datasets to discover novel patterns of functional gene or protein interactions related to aging. In this collaboration, we are developing a machine learning method that models the temporal nature of longitudinal clinical data to predict the progression of amyotrophic lateral sclerosis. Such a machine learning method may also work well for prediction from high-dimensional time-series genomic data.
- The Human Salivary Proteome Wiki is a community-driven Web portal developed by HPCIO, in collaboration with NIDCR, to enable scientists to add their own research data, share results, and discover new knowledge. Many features and external contents have been incorporated over the last few years to make it easier for users to extract different kinds of information from the wiki. One of the latest enhancements is the integration of RNA-seq transcriptional and protein immunohistochemistry data from the Human Protein Atlas.
This affords users the ability to weigh evidence generated by different, independent modalities, in addition to the original mass-spectrometry-based data, to assess the status of a protein.
- In collaboration with CSR, HPCIO is applying text analytics to provide CSR leadership with evidence-based decision support in evaluating the grant review process. A Web-based automated referral tool, called ART, is being developed to help PIs and SROs identify the most relevant study section(s) or special emphasis panel(s) based on the scientific content of an application. In addition, HPCIO is analyzing text from quick feedback surveys on peer review. This effort includes a pilot study evaluating the feasibility of analyzing free text from peer reviewers on their perception of study section quality. If successful, the pilot results will be used as initial input for a full-scale implementation.
- In collaboration with the Office of Data Analysis Tools and Systems, NIH Office of Extramural Research, HPCIO has been developing a standard database update pipeline for NIH Topic Maps, originally developed by Dr. Ned Talley of NINDS. We are preparing this pipeline for incorporation into a stable hosted instance and hope to go live in late 2016.
- In collaboration with NIAID, HPCIO has released HT JoinSolver(R), a new application capable of analyzing V(D)J recombination in thousands of immunoglobulin gene sequences produced by high-throughput sequencing.
- Based on its experience in building novel models for classifying research grants and projects, HPCIO has collaborated with several ICs to develop the Portfolio Learning Tool, a comprehensive classification workflow system that will allow users to select from multiple classification algorithms, feature spaces, and training regimes to build and run their own classifiers.
HPCIO has developed an augmented support vector machine (SVM) that expands a training set by sampling from a corpus of unknowns and runs a large ensemble on various samples of this augmented space. The results obtained from this classifier suggest that, when coupled with an effective annotation strategy, such a classifier can be quite effective at categorizing a research portfolio.
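
The augmented-SVM idea described above can be illustrated with a minimal sketch. It is not HPCIO's implementation: the report does not say how sampled unknowns are labeled or how ensemble outputs are combined, so the sketch assumes that sampled unknowns are treated as provisional negatives and that each unknown is scored by the fraction of ensemble members that vote it positive when it was left out of their training sample. The function names (`train_linear_svm`, `augmented_ensemble`), the Pegasos-style trainer, and the synthetic two-dimensional data are all hypothetical and chosen only to keep the example self-contained.

```python
import random

def train_linear_svm(X, y, lam=0.1, epochs=30, seed=0):
    """Pegasos-style stochastic sub-gradient descent for a linear SVM.
    X: list of feature lists; y: labels in {-1, +1}. A constant 1.0 is
    appended to every row so the last weight acts as a bias term."""
    rng = random.Random(seed)
    rows = [list(x) + [1.0] for x in X]
    w = [0.0] * len(rows[0])
    order = list(range(len(rows)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, rows[i]))
            w = [wj * (1.0 - eta * lam) for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active for this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, rows[i])]
    return w

def decision(w, x):
    """Signed score of x under weights w (bias is the last weight)."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))

def augmented_ensemble(positives, unknowns, n_rounds=25, sample_size=None, seed=0):
    """For each unknown, return the fraction of ensemble members that voted
    it positive, counting only members that did not train on it."""
    rng = random.Random(seed)
    k = sample_size or len(positives)
    votes = [0] * len(unknowns)
    counts = [0] * len(unknowns)
    for _ in range(n_rounds):
        sampled = rng.sample(range(len(unknowns)), k)
        in_bag = set(sampled)
        # ASSUMPTION: sampled unknowns serve as provisional negatives.
        X = list(positives) + [unknowns[i] for i in sampled]
        y = [1] * len(positives) + [-1] * len(sampled)
        w = train_linear_svm(X, y, seed=rng.randrange(10**6))
        for i, x in enumerate(unknowns):
            if i not in in_bag:
                counts[i] += 1
                votes[i] += int(decision(w, x) > 0)
    return [v / c if c else 0.5 for v, c in zip(votes, counts)]

def make_cluster(center, n, rng, sd=0.5):
    """Synthetic 2-D points around a center (illustrative data only)."""
    return [[rng.gauss(center[0], sd), rng.gauss(center[1], sd)] for _ in range(n)]

data_rng = random.Random(42)
labeled_pos = make_cluster((2.0, 2.0), 20, data_rng)   # annotated items
hidden_pos = make_cluster((2.0, 2.0), 10, data_rng)    # relevant but unlabeled
negatives = make_cluster((-2.0, -2.0), 40, data_rng)   # irrelevant, unlabeled
unknowns = hidden_pos + negatives

scores = augmented_ensemble(labeled_pos, unknowns)
print("mean vote, hidden positives:", sum(scores[:10]) / 10)
print("mean vote, true negatives:  ", sum(scores[10:]) / 40)
```

In this toy setting, unlabeled items that resemble the annotated positives should receive high vote fractions even though some ensemble members saw a few of them mislabeled as negatives. In a real portfolio-classification setting the features would come from grant text rather than synthetic points, and the sample size, ensemble size, and vote threshold would need tuning.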