An interdisciplinary team of experimental biologists (Drs. Tyrrell Conway, Barry L. Wanner, and Daoguo Zhou), computational biologists (Drs. Michael R. Gribskov, Daisuke Kihara, and David W. Ussery), and mathematical modelers (Drs. Julio Collado-Vides and Bernhard O. Palsson) will tackle the challenge of creating whole transcriptome maps of the model cell E. coli K-12 at single-nucleotide resolution. High-throughput deep sequencing of cDNAs made from total cellular RNA (RNA-Seq) will be used to generate comprehensive maps of all transcribed regions across the entire genome, to computationally define all large and small protein-coding and noncoding RNAs, and to quantify expression levels under a variety of growth conditions in wild-type cells and in selected transcription factor mutants. Comprehensive maps of transcription start sites will be created with a protocol for identifying primary transcripts recently developed by our consultant Joerg Vogel. These measurements will be combined with mathematical modeling to decode the first comprehensive transcriptional network of a living cell. This network will provide a framework for integrating diverse data types, including genetic interactions, protein-DNA interactions (ChIP-chip and ChIP-Seq), protein-protein interactions, metabolomics, phenotyping, proteomics, cellular localization of E. coli proteins (e.g., imaging data for fluorescently tagged E. coli ASKA ORFeome clones at www.EcoliHub.org/GenoBase), three-dimensional imaging (electron tomography) of E. coli cells, and other data sets generated elsewhere. These studies will be extended to other E. coli strains by developing whole transcriptome maps of pathogenic E. coli EDL933, the prototype enterohemorrhagic E. coli O157:H7 (EHEC), during growth in vitro and in the mouse intestine, leading to the first comprehensive extracellular in vivo expression transcriptome.
These studies will use methods developed by our consultant Jay C. Hinton for isolating bacterial RNA from mice (and from infected cell cultures) and preparing cDNAs for deep sequencing. Similar procedures will be used to generate whole transcriptome maps of Salmonella enterica serovar Typhimurium during growth in vitro and following infection of cultured macrophages and epithelial cells, thereby creating the first comprehensive intracellular in vivo expression transcriptome. Results will be released throughout the course of this project, in accordance with NIH data-sharing guidelines, for analysis, visualization, comparison, and download at www.EcoliHub.org/GenExpDB. Likewise, all computational tools implemented or developed in this project will be freely available to users at www.EcoliHub.org. No organism can rival E. coli in the amount of baseline information or in experimental tractability for the measurements required for whole-cell systems biology. Whole transcriptome maps of E. coli will lay the foundation for robust mathematical models of E. coli biochemistry and physiology, and thereby for a computerized, interactive "virtual cell." Solving the E. coli cell will provide critical new insights into the fundamental nature of life.

PUBLIC HEALTH RELEVANCE: No other organism comes close to E. coli in the depth and breadth of existing knowledge of its component parts and cellular processes. Understanding how these processes interact to form a living cell will require their characterization, quantification, integration, and mathematical modeling; in other words, Systems Biology. A comprehensive whole transcriptome map of E. coli K-12 will provide the groundwork for predicting the behavior of other cells, including disease-causing microbes.