The completion and annotation of the DNA sequences of two strains of M. tuberculosis, the laboratory strain H37Rv and the virulent clinical isolate CDC1551, will provide the mycobacteria and tuberculosis research communities with an unprecedented resource for investigating the infectivity, virulence, colonization, pathogenicity, and mechanisms of drug sensitivity and resistance of M. tuberculosis. Despite the abundance of information provided by the genomic sequence, the exceedingly slow growth rate and highly virulent nature of the organism make working with M. tuberculosis, and manipulating it genetically, formidable tasks. A great deal of effort has gone into developing M. smegmatis, a fast-growing non-pathogen, as a genetic surrogate for M. tuberculosis. As a result, mycobacterial researchers can use M. smegmatis in much the same way that E. coli is used in the broader microbiological community. Obtaining the complete genome sequence of M. smegmatis will represent a milestone in mycobacterial research. The sequence will provide invaluable information for the continued development of M. smegmatis as a genetic system for recombinant methodologies. It will also strengthen the organism's value as a model for studying aspects of biology unique to mycobacteria, including cell wall biosynthesis, growth-rate regulation, drug sensitivity, protein secretion, and gene regulation. The Institute for Genomic Research (TIGR) has developed a cost-effective and efficient whole genome shotgun approach to microbial genome sequencing that has been used to sequence and assemble seven microbial genomes. The genome of M. smegmatis strain mc²155 will be sequenced using this strategy. Small- and large-insert random libraries will be prepared, and enough random shotgun sequences will be obtained to reach 8-fold coverage of the genome.
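The 8-fold coverage target can be related to the number of reads with the standard Lander-Waterman model. In that model, coverage C = N·L/G for N reads of length L over a genome of G bases, and the expected fraction of bases left unsampled is about e^(-C). The Python sketch below is illustrative only. The genome size (~7 Mb) and the 500-bp read length are assumptions that are not given in this proposal, and the model ignores cloning bias and the minimum overlap needed to join reads during assembly.

```python
import math


def reads_needed(genome_bp: int, read_len_bp: int, coverage: float) -> int:
    """Smallest N with N * L / G >= C (Lander-Waterman coverage definition)."""
    return math.ceil(coverage * genome_bp / read_len_bp)


def expected_uncovered_fraction(coverage: float) -> float:
    """Expected fraction of bases hit by no read: e^(-C)."""
    return math.exp(-coverage)


def expected_contigs(n_reads: int, coverage: float) -> float:
    """Expected number of contigs, N * e^(-C).

    Simplification: assumes any overlap, however short, is detected
    (i.e. no minimum-overlap threshold).
    """
    return n_reads * math.exp(-coverage)


if __name__ == "__main__":
    # Hypothetical inputs: ~7 Mb genome and 500-bp reads (not from the proposal).
    G, L, C = 7_000_000, 500, 8.0
    n = reads_needed(G, L, C)
    print(f"reads needed: {n}")
    print(f"expected unsampled fraction: {expected_uncovered_fraction(C):.5f}")
    print(f"expected contigs (no overlap threshold): {expected_contigs(n, C):.1f}")
```

Under these assumed values, roughly 112,000 reads would be needed, and about 0.03% of bases (e^-8) would be expected to remain unsampled. Random sequencing alone therefore leaves a small number of gaps, and a separate closure phase is required to finish the genome.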
Assembly of the genome will be performed with version 2.0 of the TIGR Assembly software, and gaps will be closed using a variety of PCR-based techniques developed at The Institute. Annotation of the genome will employ computational methods to identify open reading frames and to identify proteins of known function through similarity searches against curated databases. Additional analyses will identify features such as exported proteins, tRNA and rRNA genes, repeated sequences, promoter sequences, and ribosome binding sites. The data will be made available to the research community through the TIGR Microbial Database on the World Wide Web (www.tigr.org).
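The proposal does not name the gene-finding programs that will be used. As a rough illustration of what identifying open reading frames means computationally, the Python sketch below scans all six reading frames for stretches that begin at a start codon and end at the first in-frame stop codon. The helper find_orfs is hypothetical and deliberately naive. It is not TIGR's annotation pipeline, because production gene finders also score coding potential. The 100-codon minimum and the ATG-only default are arbitrary assumptions. Mycobacterial genes also frequently start with GTG, and additional start codons can be passed through start_codons.

```python
from typing import FrozenSet, List, Tuple

STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(
    seq: str,
    min_codons: int = 100,
    start_codons: FrozenSet[str] = frozenset({"ATG"}),
) -> List[Tuple[str, int, int, int]]:
    """Naive six-frame ORF scan (hypothetical helper, not a published tool).

    Returns (strand, frame, start, end) tuples with 0-based, half-open
    coordinates on the scanned strand ("+" = input, "-" = reverse complement).
    Length is counted in codons including the stop codon. ORFs that run off
    the end of the sequence without a stop codon are not reported.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon in start_codons:
                        start = i  # first start codon opens the ORF
                elif codon in STOP_CODONS:
                    # Internal start codons were skipped, so this is the longest ORF.
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs
```

For example, find_orfs("ATGAAATTTTAA", min_codons=1) returns [("+", 0, 0, 12)], which is a four-codon ORF on the forward strand. ORFs on the "-" strand are reported in reverse-complement coordinates, so a real pipeline would map them back to the forward strand before annotation.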