Completing the sequence of the human genome will involve very high throughput sequencing centers that differ quantitatively and qualitatively from even the largest of today's "large scale" sequencing laboratories. Increasing the size and throughput of a center by more than an order of magnitude will substantially alter computing support needs, with new requirements for adaptability and reliability as well as overall capacity. This project will examine these scaling issues in detail and will design an informatics architecture to support very high throughput sequencing. The cost of DNA sequence analysis is highly dependent on the accuracy required of the finished data. Models will be developed to predict expected error rates for different sequencing strategies and to assess the impact of likely error rates on data utility. The Genome Sequencing Center at Washington University will be used as a case study and test bed.

The objectives of this proposal are:

Specific Aim 1. Designing a modular architecture to support very high throughput sequencing
Specific Aim 2. Understanding the determinants of DNA sequence accuracy
Specific Aim 3. Analyzing error-prone DNA sequence
  a. Improved tools for the analysis of error-prone DNA sequence
  b. Understanding the utility of error-prone DNA sequence
Specific Aim 4. Automating accurate consensus sequence generation
Specific Aim 5. Developing quality assurance procedures for very high throughput DNA sequencing
Specific Aim 6. Disseminating results