In the last two years, the rapid accumulation of genome sequences and protein structures has been paralleled by major advances in sequence database search methods. The powerful Position-Specific Iterated BLAST (PSI-BLAST) method developed at the NCBI formed the basis of our work on protein motif analysis. New functions of PSI-BLAST, namely the ability to save a sequence profile from a database search and use it for subsequent searches of other databases, were developed and applied to the task of protein fold recognition and to the analysis of a variety of protein superfamilies. Starting from the classification of protein structures in the SCOP database, a library of profiles that could serve as fold identifiers was developed. This library was used to examine the distribution of predicted protein folds in all completely sequenced genomes. A significant (approximately twofold) increase in the fold prediction rate compared with previous studies was achieved. Major differences in the distribution of protein folds in bacteria and archaea compared with eukaryotes were demonstrated. It was shown that, with systematic selection of optimal starting points for iterative database searches, PSI-BLAST allows the detection of subtle relationships between proteins that had previously been deemed detectable only by structure-structure comparison. A considerable number of completely unexpected structural and evolutionary relationships between proteins were identified using this approach. Two examples are the discovery of the Toprim domain, present in two classes of topoisomerases, bacterial-type DNA primases, and a specific family of nucleases, and the demonstration of a common structure shared by the multifunctional adaptor POZ domain and the tetramerization domain of potassium channels.