Project Summary: High-throughput, short-read DNA sequencing technologies have transformed sequence data production, necessitating new hardware and software approaches to sequence data analysis. The cost of producing sequence data has decreased 1000-fold in the past 5 years, and the volume of data produced has increased by the same proportion. Currently, the Baylor College of Medicine Human Genome Sequencing Center (BCM-HGSC) is generating immense amounts of data from mammalian-scale whole-genome sequencing, large environmental DNA sampling projects, and high-throughput targeted and whole-genome human medical resequencing. Processing, analyzing, and manipulating these large data sets, alone or in combination, requires substantial computational power. To meet the computational demands of processing these data and to make the results rapidly available to collaborators and the wider community, we require a commodity Massive RAM Genomic Analysis Cluster (MRGAC). This cluster would consist of 4 servers, each with 512 GB of random access memory (RAM). This computational resource will accelerate essential data processing where memory is a bottleneck by holding entire multi-billion-read datasets in RAM. More importantly, it will enable novel forms of analysis previously unachievable due to memory constraints. The anticipated major users of the MRGAC include three NIH-funded HGSC faculty, Drs. Chen, Gibbs, and Rogers, plus additional faculty and research staff in the BCM-HGSC. Immediate benefits will also be realized by the large number of HGSC research collaborators at BCM and other centers in the Texas Medical Center. Dr. Chen will use HGSC-customized applications that maximize the utility of a large-memory system, and will evaluate the utility of these approaches for other groups in the research community, specifically in whole-transcriptome analysis and copy number variation. Dr.
Gibbs will use the software for whole-genome and targeted medical resequencing, for functional variant discovery, de novo genome assembly, and genome-wide variant analysis in large populations. Dr. Rogers's research focuses on comparative primate genomics and the application of this genomic information to studies of primate models of human disease. The differences discovered between humans and other primates, using large-scale comparative genomics, have significant functional consequences and implications for disease. Within the BCM-HGSC, other faculty members analyze genomic and epigenomic variation in human disease and cancer. Further, collaborative groups at the HGSC undertake whole-genome mammalian and large-scale insect assemblies using short-read data. Uniquely, the HGSC makes heavy use of all three major sequencing machines and is in an ideal position to use these technologies in a complementary fashion. In all cases, the large RAM resource provided by the MRGAC will be both enabling and transformational in our ability to rapidly analyze massive amounts of data, and will yield process improvements that enable stronger research for collaborators and the broader community.

PUBLIC HEALTH RELEVANCE: Recent advances in technology have placed DNA sequencing at the forefront of a revolution in our understanding of the root causes of genetic diseases, susceptibility to infectious disease, and our ability to target individualized treatments for cancer and other illnesses. As the cost of producing data for this massive undertaking continues to fall, enabling the Human Genome Sequencing Center to process and analyze these data for community projects and collaborators becomes paramount.