Genome-scale sequencing, RNA and protein expression studies, and systematic functional characterization projects have provided the foundation for a new microbiology, supported by computational, predictive analysis of gene, RNA and protein sequences and structures. The parallel development of microbial genome science, bioinformatics, Internet 2.0 and desktop supercomputers has helped bring this revolution to academic, industrial and governmental laboratories worldwide. However, managing this flood of new data accurately and efficiently, with expert curatorial oversight, to create reliable information systems that support experimental and systems biology research has been an ongoing challenge. EcoGene.org demonstrates that a high-impact, reliable bacterial genome database can be built with open source tools and maintained with low overhead in an academic research environment.

Two broad, long-term objectives drive EcoGene.org development: (1) the accurate, comprehensive and timely delivery of reliable, non-redundant, database-unified E. coli K-12 information, vetted by an expert gatekeeper and suitable as a foundation for future interdisciplinary research; and (2) the support of other bacterial annotation and research projects through software and methods developed using E. coli as a model system.

The specific aims are: (1) to collect and organize all newly published E. coli research results while controlling the quality of the input data streams, to improve interface functionality, to expand the scope of EcoGene.org to include all E. coli strains, and to provide the public with open source code for EcoGene.org and its generic derivative ProkGene.org; (2) to bring the E. coli K-12 MG1655 GenBank genome information up to date on a monthly basis, with EcoGene.org acting as a curatorial gateway for feature, function and citation update suggestions from partner databases and the public, and to expand the datasets and documentation presented in the E. coli K-12 GenBank record; (3) to perform bioinformatics research (a) to document bioinformatics validation and annotation suites for small proteins, sRNAs, pseudogenes and 5' UTRs, and to create standardized test and training sets, and (b) to integrate diverse statistical algorithms into a combined pattern search tool; and (4) to perform selected laboratory verification studies to (a) resolve annotation gaps and ambiguities in proteome verification, including signal peptide/anchor discrimination, lipoproteins, and translation starts, (b) verify the genotypes/phenotypes of, and fill gaps in, large mutant collections, and (c) precisely revert mutations acquired during laboratory maintenance to restore lost functions and phenotypes.

Accurate annotation of the E. coli genome is necessary in its own right, because E. coli is the best-understood cellular organism. It is also important for facilitating continued research on E. coli K-12, the reference strain most critical for developing antibacterial strategies to defend against bacterial bioterrorism and to protect against the increasing threat of antibiotic-resistant bacterial epidemics.

PUBLIC HEALTH RELEVANCE: EcoGene.org is a web site for scientists interested in the biology of Escherichia coli K-12. Even before genomes began to be sequenced, we knew more about the life of E. coli than about any other organism. Now that we have all of the genes of E. coli, we have the possibility of attaining a nearly complete understanding of how a cell works in minute detail. The wealth of information about the life of E. coli, both friend and foe of man, collected at EcoGene.org from all over the world should help in achieving this goal.