This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator. The California Institute for Telecommunications and Information Technology (Calit2) [1] and the J. Craig Venter Institute (JCVI) [2] jointly propose to develop a state-of-the-art center for genomics data, analysis, and synthesis. The center will be built on the most advanced computational data grid, optical networking, and computing technologies, and it will be devoted to the needs of scientists studying the complexity of organisms as they function in natural ecosystems. Such a center is urgently needed to enable the transition from organismal genomics to environmental metagenomics. Molecular biology, and more recently computational biology, emerged from the interplay of numerous disciplines examining microbiological phenomena. In the same way, marine microbial ecology is becoming a focus of similar creative, interdisciplinary energy: a frontier for innovation at the interface of genomics and information technology. The pace of development and the power of genomics for biological discovery are increasing rapidly as shotgun sequencing is applied to entire microbial communities [3]. The Moore Foundation [4], the U.S. Department of Energy (DOE), and the National Science Foundation (NSF) are funding initial metagenomic studies of natural ecosystems. The largest of these is the Venter Institute's Sorcerer II Expedition [5], which samples marine microbial communities every 200 miles around the globe. The sequence data from the first leg of the expedition will add only a few percent to the total number of bases in GenBank [6], yet it will double the number of predicted proteins. 
Moreover, about 6,000 new gene families will be added to the approximately 4,000 found in GenBank today. The number of genomes sequenced from cultured organisms is also increasing rapidly. For example, the Venter Institute, with Moore Foundation funding, will sequence the genomes of more than 75 marine microbes this year alone [7], and DOE's Joint Genome Institute (JGI) [8] sequences more than 50 per year. These new sequences, genes, and gene families, together with associated environmental data, offer tremendous potential for better understanding how natural ecosystems function. Environmental metagenomics examines the interplay of perhaps thousands of species present at a single point in space and time. Each individual sequence is no longer just a piece of a genome; it is a piece of an entire biological community. Each sequence can therefore be considered from the perspective of the ecological sciences. That perspective includes the composition of the rest of the community and the environmental conditions in which the sequence is found. It also includes the sequence's relationships with other species alongside which it is found at other times and places. More importantly, metagenomics provides a more complete picture by capturing genomes from many species that cannot be sustained in laboratory culture and exist only in their native environments. The explosion of these information-rich metagenomic samples requires an innovative architecture built on emerging Cyberinfrastructure concepts for data storage, access, analysis, and synthesis. Current gene sequence resources simply do not provide these capabilities. The overarching goal of this project is to create a community resource and center for environmental genomics. That center will enable revolutionary advances in our knowledge of microbial ecology in marine and other natural environments, and in evolutionary biology. 
To this end, the proposed Center will bring together two groups of leaders. On the one hand are leaders in the new technologies of high-throughput DNA sequencing and metagenomic analysis tools. On the other are leaders in Cyberinfrastructure innovations such as optically coupled computing, emerging Grid middleware, and user workspaces.