A crucial component of the recent major advances in genomic research has been the uniting of advances in biology with those in computing, informatics, and networking. As technologies have advanced to allow high-throughput, genome-scale data collection, the technological burden has shifted to analysis and informatics. This project was established to ensure that the necessary computational tools and resources are available to the NIH intramural community.

OIR's long-term collaboration with Dr. Louis Staudt (Distinguished Investigator, NCI Center for Cancer Research, and Director, NCI Center for Cancer Genomics) has yielded significant findings and discoveries that have led to improvements in the treatment of lymphoma. With comprehensive computational expertise, resources, and support from OIR, Dr. Staudt's lab has been able to perform sophisticated analyses on large-scale, high-dimensional data, which have in turn been instrumental in achieving a number of highly significant findings. This support entails maintaining databases of genomic data, providing computational servers with custom software for running a variety of analyses, and developing and maintaining public and local-access Web sites. The supported resources include the following:

- LLMPP/SPECS: The Lymphoma/Leukemia Molecular Profiling Project/Strategic Partnering to Evaluate Cancer Signatures (SPECS) program is a multi-institution grant for translational cancer research funded by the National Cancer Institute. The Web site is designed for entering and managing clinical data for cases associated with samples included in the SPECS study. The LLMPP/SPECS project is using microarrays and other high-throughput, whole-genome technologies to define the molecular profiles of all types of human lymphoid malignancies. One primary goal of this project is to redefine the classification of human lymphoid malignancies in molecular terms.
A second major goal is to define molecular correlates of clinical parameters that can be used in prognosis and in the selection of appropriate therapy for these patients. As members of the international LLMPP/SPECS consortium, we provide the informatics development and support critical to the success of this project. A database and tools have been implemented to facilitate integrating and analyzing clinical parameters with genomic and genetic data from high-throughput technologies. The consortium involves 12 participating centers in 7 countries. Data for 3,000 clinical cases have been uploaded into the system.
- LYMPHCX: A Web site that allows researchers to predict DLBCL (diffuse large B-cell lymphoma) subtypes from samples processed with a NanoString protocol. Determining these subtypes can be critical in deciding appropriate therapy, since some subtypes are more aggressive than others.
- VDB (Variation Database): An interactive Web site for lab researchers to search and compare mutations from various tumor types. An analysis pipeline was developed and implemented for processing next-generation sequencing data generated from RNA-Seq libraries. Variation results derived from more than 500 lymphoma, pancreatic, and prostate RNA samples have been stored in a database, classified, and integrated with relevant external annotations.
- Signature database: A Web site companion to Shaffer AL et al., "A library of gene expression signatures to illuminate normal and pathological lymphoid biology," Immunol Rev. 2006 Apr;210:67-85.
- Staudt lab analytical test bed: A Web site that supports quick turnaround of analytical methods under test and lets lab members readily explore their own data with new algorithms.
- Database support: OIR maintains information on more than 10 million mutations across more than 3,000 clinical samples. Digital expression data are also stored.
- Machine learning: Development of machine learning methods to distinguish somatic from germline mutations in NGS data. Machine learning models have also been tested for identifying subtypes of diffuse large B-cell lymphoma based on their gene aberration features.

The mAdb (microArray database, https://madb.nci.nih.gov) system provides secure data management for gathering, storing, and managing experimental information and expression array data. A variety of Web-accessible tools have been implemented to support the multiple analytical approaches needed to interpret array data meaningfully. Important to the mAdb system design is compatibility with any platform (Unix, Windows, or Macintosh) capable of running a Web browser. A natural extension of mAdb has been the inclusion of additional data resources. This includes supporting information from various data sources (e.g., Gene Ontology, GenBank, Entrez Gene, UniGene, BioSystems Pathways, Biocarta Pathways, COSMIC, and 1000 Genomes) to enable drilling down into the rapidly expanding biological knowledge base. To ensure effective use of the informational resources developed to support microArray analysis, ongoing user training and support are provided through CIT facilities for this collaborative effort.

While development of new and improved analysis tools continues, the mAdb system is in routine service, having supported over 1,900 NIH researchers and collaborators and containing over 111,000 microArray experiments. A critical design element of the mAdb system was scalability, allowing expansion to support other ICDs. The design allows us to support separate Web servers serving different user communities from a single code base. For example, the mAdb system has been set up on separate Web servers to support users of the NIAID microArray core facility.
In addition to user-specific, Web-based analysis, our group has facilitated the submission of over 7,000 samples to the NCBI Gene Expression Omnibus (GEO) public repository to meet the data-sharing requirements associated with publications. Over the past year, we have developed next-generation sequencing extensions to mAdb, allowing it to accept RNA-Seq digital expression data and perform edgeR normalization and differential expression analysis. We have also developed lightweight JavaScript implementations of several data visualization features.

In collaboration with Dr. Timothy Meyers of NIAID, CIT/OIR also provides comprehensive computational support to the Genomic Technologies Section (GTS) of NIAID. Because GTS provides state-of-the-art bioinformatics support to the entire NIAID intramural research program, we effectively support all users of the GTS facility. In addition to maintaining GTS computational servers and databases, OIR maintains a number of commercial software packages for GTS, including Partek Flow, CLC-Bio