We will develop a general-purpose sequence assembly algorithm, and corresponding software, for use by large-scale sequencing projects. Our software will incorporate many features of existing assembly programs, such as sensitive overlap detection, robust detection of sequencing errors, and multiple sequence alignment. In addition, we plan the following enhancements: 1) We will employ a conservative assembly protocol that makes errors only in extreme cases and with low probability, and that is designed to handle complex repetitive DNA, sequencing errors, and chimeric fragments. 2) We will allow the user to supply further biological information to aid the assembly, such as the constraint that the two sequenced ends of a plasmid insert be separated by the length of that insert. 3) When faced with ambiguity caused by repeats, we will enumerate all possible assembled sequences, although we anticipate that the discriminatory power of our algorithm will yield the unique, correct assembly in most cases. 4) The algorithm will design simple restriction digests by which the correct assembly can be verified or, if there are multiple candidate sequences, the correct one identified. The algorithm will be tested on the large-scale sequencing projects at Collaborative Research, as well as on simulated data.
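To illustrate enhancement 4, the following is a minimal sketch of how a digest might discriminate between candidate assemblies. It is not the proposed algorithm. It assumes a small, illustrative enzyme table (EcoRI, BamHI, and HindIII with their standard six-base recognition sites), assumes linear sequences, and simplifies each cut to the start of the recognition site. The function names `fragment_lengths` and `discriminating_digests` are hypothetical. An enzyme is reported as useful when every candidate yields a different fragment-length pattern, so a single gel could tell the candidates apart.

```python
# Hypothetical, illustrative subset of enzymes: name -> recognition site.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def fragment_lengths(seq, site):
    """Sorted fragment lengths after cutting a linear seq at every occurrence of site.

    Simplification (assumption): the cut is placed at the start of the site
    rather than at the enzyme's true cut offset.
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    # Drop zero-length pieces (e.g. a site at position 0).
    return sorted(b - a for a, b in zip(bounds, bounds[1:]) if b > a)


def discriminating_digests(candidates, enzymes=ENZYMES):
    """Return enzymes whose fragment-length patterns differ for every candidate."""
    useful = []
    for name, site in enzymes.items():
        patterns = {tuple(fragment_lengths(c, site)) for c in candidates}
        if len(patterns) == len(candidates):
            useful.append(name)
    return useful


# Two hypothetical candidate assemblies that differ only in the order of two segments.
A = "AAAA" + "GAATTC" + "CCCCCCCC" + "GGATCC" + "TT"
B = "AAAA" + "GGATCC" + "CCCCCCCC" + "GAATTC" + "TT"
print(discriminating_digests([A, B]))  # ['EcoRI', 'BamHI']
```

Here EcoRI gives fragments of 4 and 22 bases for `A` but 8 and 18 for `B`, whereas HindIII does not cut either sequence and so cannot separate them. A practical design would also need to account for true cut offsets, fragment sizes that can be resolved on a gel, and read errors in the candidates.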