In collaboration with Dr. M. Miller, NCI, a critical, quantitative analysis of several commercial sequence assembly and analysis packages was carried out. A fundamental problem in contemporary molecular biology is the determination and interpretation of DNA sequences. Because of the limitations of current sequencing technology, sequence determination entails piecing together short, overlapping sequence fragments into a single, long contiguous sequence. A number of commercial computer programs have been marketed to automate this process. While reviews of individual packages have been published, this is the first known study that critically compares the accuracy of assembly by these programs. Eleven programs were selected, primarily on the basis of their availability on the NIH campus. Sequence data are not random but contain ordered, repeated sequences. Likewise, errors in sequence determination are not randomly distributed. To provide a controlled and realistic dataset for measuring performance and accuracy, a known sequence, the rat multidrug resistance gene (RATMDRM, 5254 base pairs, accession number M62425), was split into 58 random overlapping fragments of 200 to 400 base pairs in length. These were then randomly seeded with 0 to 15% error, based on the error distribution of the fragments originally used to determine the sequence. Errors took the form of miscalled, deleted, or added bases. The programs tested fell into three general groups based on accuracy. To rule out conditions unique to the chosen test sequence, the tests were repeated with four other sequences of between 4500 and 4600 base pairs. With one exception, the error rates were comparable to those encountered using RATMDRM. Additionally, some programs were tested with different permutations of RATMDRM to ascertain their capacity to assemble the sequence properly regardless of the order in which the fragments were input. Ease of editing the assembled sequences was also compared.
Results of this study were accepted for publication by the Journal of Biological Computation.
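
The construction of the test dataset described above (random fragments of 200 to 400 base pairs, each seeded with 0 to 15% error in the form of miscalled, deleted, or added bases) can be illustrated with a minimal sketch. This is not the procedure actually used in the study. The function names are hypothetical, fragment positions are drawn uniformly (which does not by itself guarantee full coverage or overlap), and the three error types are chosen with equal probability, whereas the study based its errors on the empirical error distribution of the original sequencing fragments, which is not reproduced here.

```python
import random

BASES = "ACGT"


def random_fragments(sequence, count, min_len, max_len, rng):
    """Cut `count` fragments of random length and random position.

    Assumes len(sequence) >= max_len. Returns (start, fragment) pairs.
    """
    fragments = []
    for _ in range(count):
        length = rng.randint(min_len, max_len)
        start = rng.randint(0, len(sequence) - length)
        fragments.append((start, sequence[start:start + length]))
    return fragments


def seed_errors(fragment, error_rate, rng):
    """Return a copy of `fragment` with errors introduced per base.

    Assumption: each error is a miscall, deletion, or addition with
    equal probability; the study used an empirical distribution instead.
    """
    out = []
    for base in fragment:
        if rng.random() >= error_rate:
            out.append(base)  # base read correctly
            continue
        kind = rng.choice(("miscall", "delete", "add"))
        if kind == "miscall":
            out.append(rng.choice([b for b in BASES if b != base]))
        elif kind == "add":
            out.append(base)
            out.append(rng.choice(BASES))  # spurious extra base
        # "delete": the base is simply dropped
    return "".join(out)


def make_test_set(sequence, count=58, min_len=200, max_len=400,
                  max_error=0.15, seed=0):
    """Build (start, error_rate, read) triples for one test sequence."""
    rng = random.Random(seed)
    reads = []
    for start, fragment in random_fragments(sequence, count,
                                            min_len, max_len, rng):
        rate = rng.uniform(0.0, max_error)  # per-fragment error rate
        reads.append((start, rate, seed_errors(fragment, rate, rng)))
    return reads
```

Called as make_test_set(known_sequence), where known_sequence is a string of bases such as the 5254 base pair RATMDRM sequence, the sketch yields 58 error-seeded reads. Shuffling the resulting list before passing it to an assembler would correspond to the input-order permutation tests mentioned above.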