Vibrio cholerae is the etiologic agent of cholera, a potentially lethal diarrheal disease that remains a dangerous threat because of its tendency to occur in explosive outbreaks. After a century's absence, epidemic cholera has reestablished a foothold in the Western hemisphere. This project will sequence the complete genome of Vibrio cholerae and will use this sequence information to generate a database of nucleotide sequences, protein sequences, and genome organization, as well as a set of sequenced, recombinant clones covering the entire V. cholerae genome. Determining the V. cholerae genomic sequence will expedite ongoing research worldwide and will reduce the time and money spent on gene identification, thus freeing research funds for studying the biological function of gene products and the environmental source, distribution, and seasonality of V. cholerae and of cholera epidemics. The complete DNA sequence of the Vibrio cholerae genome (serogroup O1, biotype El Tor; approximately 2.5 million base pairs) will be obtained using a random library shotgun sequencing strategy followed by gap closure, finishing, and structural confirmation by polymerase chain reaction. The assembled genome will then be analyzed and annotated, employing a variety of computer techniques to: find all the open reading frames (ORFs) and relate ORFs to known proteins by similarity searches against databases; identify untranslated features such as tRNA genes, rRNA genes, insertion sequences, and repetitive elements; determine potential promoter sequences and ribosome binding sites; and characterize V. cholerae metabolically. Simultaneously, a second pathogenic V. cholerae strain, serogroup O1, biotype classical, will be shotgun sequenced at 1.5X coverage.
This relatively modest sequencing investment will net 78% of the genome for a second strain, providing data on inter- and intragenic divergence between biotypes, as well as information on genome structure conservation and gene content. This information will be of critical importance in understanding variation in V. cholerae epidemiology and pathogenicity, providing for the first time a detailed, direct characterization of interstrain variation and its effect on pathogenicity. In addition to depositing sequence data in GenBank, GSDB, and other appropriate public databases, a genomic map, gene list, nucleotide and protein sequences, and clone availability information will be made available on the World Wide Web (http://www.tigr.org) in the TIGR Microbial Database, using a format modeled on the Haemophilus influenzae and Mycoplasma genitalium genomes already available there. New formats will be developed to accommodate expression data, interbiotype sequence alignments, and other pertinent information.
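
The figure of 78% of the genome at 1.5X coverage quoted above is consistent with the standard Poisson (Lander-Waterman) approximation for random shotgun sequencing, in which the expected fraction of bases covered at least once at coverage c is 1 - exp(-c). The abstract does not state how the figure was derived, so the short Python sketch below is an illustration under that assumption only; the function name expected_fraction_covered is hypothetical, and the model ignores cloning bias, repeats, and read-length effects.

```python
import math


def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of genome bases sampled at least once.

    Assumes reads land uniformly at random, so the read depth at any
    base is Poisson-distributed with mean `coverage` (an idealization;
    real libraries show cloning bias and repeat-induced gaps).
    """
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    # P(depth >= 1) = 1 - P(depth == 0) = 1 - exp(-coverage)
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    # Compare a few coverage levels, including the 1.5X proposed for
    # the second (classical biotype) strain.
    for c in (1.0, 1.5, 3.0, 6.0):
        print(f"{c:.1f}X coverage: {expected_fraction_covered(c):.1%} of bases expected")
```

Under this model the uncovered fraction falls off exponentially with coverage, which is why a low-coverage pass over a second strain can sample most of its genome at a fraction of the cost of a finished sequence.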