The Collective Intelligence, Knowledge Infrastructure, and High Performance Program, which operates within the High Performance Computing and Informatics Office (HPCIO), Division of Computational Bioscience of CIT, is collaborating with NIH investigators to build a critical mass in collective intelligence, which is envisioned to encompass a number of related disciplines in biomedical research, including semantic interoperability, knowledge engineering, computational linguistics, text and data mining, natural language processing, machine learning, and visualization. The program is intended to foster advances in critical domains at NIH, including biomedical and clinical informatics, translational research, proteomics research, genomics, systems biology, and portfolio analysis. In 2009, collaborations in support of these goals included the following:

- The human salivary protein catalog has been made available online on a community-based Web portal developed by HPCIO, in collaboration with NIDCR, to enable scientists to add their own research data, share results, and discover new knowledge. This is a major step towards the discovery and use of saliva biomarkers to diagnose oral and systemic diseases.
- HPCIO has developed a context-sensitive text-mining methodology for identifying High-Risk, High-Reward (HRHR) research based on NIH Summary Statements. The method, which uses natural language processing to parse text and classify documents, has been successful in a retrospective analysis of summaries from the most recent five years. This work is being conducted in collaboration with the Division of Program Coordination, Planning, and Strategic Initiatives (DPCPSI), NIH Office of the Director (OD).
- HPCIO is developing a corpus of annotated NIH medical records for use in developing methods of document de-identification. The goal is to create a gold standard against which de-identification algorithms for use at NIH can be measured. This work, in collaboration with the Clinical Center, will enhance the availability of medical records stored within the Biomedical Translational Research Information System.
- HPCIO is working with the caBIG Clinical Trial Management System Workspace (in collaboration with NCI) to develop a Protocol Lifecycle Tracking (PLT) tool. The tool provides real-time protocol status information on all relevant trials to clinicians and researchers, so that bottlenecks and latencies in protocol management can be identified and corrected by those responsible for the conduct of a trial and for the overall success of a clinical trial program.
- As a component of the Molecular Libraries Roadmap initiative, the Common Assay Reporting System (CARS) allows investigators and program directors to track the status of assay projects and related information at each screening center within the Molecular Libraries Program Center Network (MLPCN). The system also provides a means of collecting, sharing, and retrieving bioassay information among the centers and the program office at NIH.
- HPCIO is initiating a pilot endeavor with NCI to develop deep knowledge bases representing NCI's scientific portfolio. The pilot will explore several different representation paradigms (which store not only scientific concepts but also the relationships between concepts) to evaluate their effectiveness at various tasks, including document categorization, clustering, and visualization. A similar collaboration has recently been initiated with DPCPSI/OD.
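
As an illustration of what measuring a de-identification algorithm against a gold standard can involve, the sketch below compares predicted identifier spans with annotated ones and reports precision, recall, and F1. It is a minimal example under stated assumptions, not a description of HPCIO's method: the `Span` type, the `score` function, the category labels, the exact-match criterion, and the sample offsets are all hypothetical and invented for this illustration.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical annotation: character offsets plus an identifier category."""
    start: int
    end: int
    label: str


def score(gold: set[Span], predicted: set[Span]) -> dict[str, float]:
    """Exact-match precision, recall, and F1 of predicted spans against gold."""
    # A prediction counts only if offsets and label both match a gold span.
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Invented sample data; not drawn from any real record.
gold = {Span(0, 8, "NAME"), Span(20, 30, "DATE"), Span(45, 52, "MRN")}
predicted = {Span(0, 8, "NAME"), Span(20, 30, "DATE"), Span(60, 66, "NAME")}

result = score(gold, predicted)
print(result)
```

In a de-identification setting, recall usually deserves the closest attention, since a missed identifier remains in the released text, whereas a false positive only removes some non-identifying content. A real evaluation might also credit partial overlaps rather than requiring exact matches.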