This project will demonstrate high-speed genome sequencing capabilities by implementing state-of-the-art automation in all steps of computer-assisted multiplex sequencing and data analysis. In years 1 and 2, the genomes of two important mycobacterial pathogens, M. tuberculosis and M. leprae, will be completed. The focus will then shift toward systematic sequencing of a 33 Mb region of human chromosome 10 (q24-q26) and a small syntenic mouse region. The sequencing milestones are 2.4 Mb, 4 Mb, 6.4 Mb, 11.3 Mb, and 20 Mb of contiguous finished sequence in years 1 through 5, respectively. A technology development team will focus on refining sequencing techniques to provide routine, high-quality read lengths of at least 700 nucleotides. An automation implementation team will test new instrumentation and work closely with project 2 and the informatics core. Sequence images will be generated by infrared fluorescence scanning on automatic hybridizers developed in project 2. The images will be processed on high-speed computer workstations using automated image analysis software developed by the informatics core and REPLICA™, developed by L. Mintz and G. Church at HHMI at Harvard Medical School. Contigs will be assembled and proofread using GTAC™ (developed by G. Gryan and G. Church at HHMI/HMS), GelAssemble, REPLICA™, and powerful new software being developed by the informatics group. Sequence analysis will be carried out using a variety of software tools, including BLAST, GRAIL, the Large Sequence Analysis Suite, MycDB, and programs from the GCG and Staden packages.