Public health research increasingly incorporates high-throughput biomedical data, opening new avenues for data-driven research. Scientists have recently begun to realize the potential for modern biology to move 'beyond the genome' to examine the genome's complex interactions with the social and physical environment. This work focuses on disease etiology and on the role that all aspects of cellular function play in promoting health. To realize this potential, our scientists have been moving from individual, ad hoc studies to collaborative projects designed to scale across a broad range of disciplines. Over the last five years, data acquisition has grown dramatically in scale, in environmental and health studies and through sequencing and assay technologies. At the same time, data generation has become increasingly decentralized. Together, these trends have created a growing bottleneck in data management and analysis.

Our long-term goal at the School of Public Health is to provide a seamless collaboratory environment in which researchers can apply the full breadth of our expertise to shared datasets spanning investigations from the cell to the population. To achieve this aim, we need to fundamentally restructure our existing shared data storage. It is currently geared toward low-volume, high-stability, high-performance, high-cost storage, under a model in which users pay all costs. We propose to replace it with a tiered, institutionally subsidized storage model flexible enough to meet a broad range of requirements. We wish to: (a) co-locate genomic, genetic, environmental, epidemiological, social, and statistical data in a shared data environment; (b) apply consistent policies, access, user support, computing environments, workflows, and user interfaces; and (c) provide a scalable, low-cost data storage resource to accommodate the rapid growth of genomic and cohort datasets.
Effective management, storage, and processing of these complex experimental data are therefore crucial. They require computational infrastructure that can consistently store and organize both primary data and derived results. Scalable, shared data storage will directly benefit studies of complex diseases, host response to infectious diseases, pathogen diversity, nutrition, and gene-environment interactions. The Harvard School of Public Health (HSPH) is requesting funding to deploy a centralized, tiered, high-performance data storage system. This system will support our NIH-funded research in computational biology, genomics, and biostatistics as applied to public health.