Recent improvements in computer algorithms for comparing DNA and protein sequences have dramatically decreased the time required to compare an unidentified sequence to a DNA or protein sequence library. Using the FASTA sequence comparison program, one can compare a newly sequenced protein to the entire NBRF protein sequence library (1.2 × 10^6 amino acids) in less than 5 minutes on an IBM-PC/AT microcomputer, or in less than 1 minute on a SUN 3/50 workstation. Sequence comparisons against the entire GenBank™ DNA sequence library (1.5 × 10^7 nt) require about 15 times as long. Although the FASTA program has decreased the time required for library searches almost 100-fold, large-scale sequence comparisons, such as the comparison of two protein sequence libraries with one another, still require 50-100 hours of SUN 3/50 computer time, and 50- to 200-fold increases in the size of the DNA and protein sequence libraries are expected in the future. More rapid methods for comparing DNA and protein sequence libraries will be required for timely analysis of these larger libraries. We propose to decrease the time required for rapid library searches by 10- to 200-fold by (1) using encoded libraries to very rapidly identify closely related sequences; and (2) examining the performance of the FASTA program on several parallel machine architectures. In addition to increasing the speed of the FASTA rapid comparison algorithm, we propose to test several strategies that may improve the sensitivity of protein sequence comparisons by incorporating more structural information into the comparison process. One approach will be to evaluate strategies that abstract information about a family of proteins or protein structures, such as the hemoglobin fold or the serine protease active site, into a consensus search sequence or pattern. To do this, we will develop new multiple alignment programs that can rapidly align several dozen sequences.
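As a rough illustration of what abstracting a protein family into a consensus search sequence could look like, the sketch below derives a majority-rule consensus from sequences that are assumed to be already aligned. It is not the multiple alignment method proposed here; the function name `consensus`, the `min_fraction` threshold, the `x` wildcard, and the toy input are all hypothetical choices made for the example.

```python
from collections import Counter


def consensus(aligned, min_fraction=0.6, wildcard="x"):
    """Majority-rule consensus over equal-length, pre-aligned sequences.

    Assumption: gaps are written as '-' and the alignment was produced
    elsewhere; this function does not align anything itself.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    pattern = []
    for column in zip(*aligned):
        residues = [r for r in column if r != "-"]
        if not residues:
            # Column contains only gaps: nothing to conserve.
            pattern.append(wildcard)
            continue
        residue, count = Counter(residues).most_common(1)[0]
        # Keep the residue only if it is present in enough of the sequences.
        conserved = count / len(column) >= min_fraction
        pattern.append(residue if conserved else wildcard)
    return "".join(pattern)


# Toy, made-up alignment of a short active-site-like segment.
toy_alignment = ["GDSGGP", "GDSGGP", "GDSGGA", "GESGGP"]
loose = consensus(toy_alignment)                     # tolerant threshold
strict = consensus(toy_alignment, min_fraction=0.8)  # stricter threshold
```

With the tolerant threshold every column yields its most common residue, while the stricter threshold replaces the two variable positions with the wildcard. This is the kind of trade-off between specificity and sensitivity that a consensus pattern has to balance.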
In addition, we propose to examine the hypothesis that some local protein sequence similarities are due to common tertiary structure rather than common ancestry. FASTA and other protein sequence comparison programs sometimes find alignments with unexpectedly high similarity scores between sequences that are not believed to share common ancestry. Although it is assumed that such sequences may share common structural features, the relationship between sequence similarity and structural similarity in non-homologous sequences is not well understood. We propose to evaluate the structural basis of these high similarity scores by comparing the sequences in the protein crystal-structure database and examining sequence alignments with high similarity scores in the absence of known homology. These sequences will then be compared at the structural level to determine whether structural similarity can be detected from sequence similarity in the absence of common ancestry.