The development of rapid methods for molecular cloning, DNA sequencing, and protein and DNA sequence comparison has revolutionized the practice of molecular biology. Newly determined sequences are routinely compared against large sequence databases, and increasingly, inferences about structure are based on sequence similarity. During the past grant period, we (1) developed a rigorous new approach to evaluating sequence comparison algorithms and scoring parameters and discovered a simple but very effective normalization that significantly improves similarity searches; (2) implemented our search programs within six network-parallel programming environments and evaluated their performance; (3) investigated tree-based multiple alignment strategies; and (4) developed more exhaustive distance-based evolutionary tree methods. During the next period, we will pursue five aims. (1) We will develop rapid methods for protein sequence comparison that perform as well as or better than the rigorous Smith-Waterman approach. We will incorporate statistical estimates into the FASTA and Smith-Waterman comparison programs. We will also search for scoring parameters and normalization functions that provide better search performance. We will extend our studies to DNA sequence comparison, using repeated sequence families and exon sequences to characterize the ability of algorithms and scoring parameters to both identify and properly bound homologous DNA sequences. (2) Our current network-parallel comparison programs are not well suited for production environments. We will augment them to provide all the functions present in the widely used serial versions. We will also develop more robust and usable parallel platforms for "production" sequence searching on networks of shared workstations. (3) We will develop efficient heuristics for constructing evolutionary trees based on distance and parsimony criteria.
We will focus on new approaches that sample evolutionary tree space more broadly and can produce information on sub-optimal trees. (4) We will continue to develop and characterize tree-based approaches to multiple sequence alignment. We will develop heuristic tree-based alignment algorithms that are capable of rapidly aligning dozens of sequences, and we will develop parallel implementations of these algorithms. We will also examine more sophisticated gap penalties for tree-based alignments. (5) We will examine effective "unified" approaches to phylogeny and alignment by combining the approaches outlined in aims 3 and 4 above.