The BLAST family of protein and DNA database search programs constitutes one of the key services offered by the NCBI. On a typical weekday, these programs are run on NCBI servers about 70,000 times. This project represents an ongoing effort to improve and extend their functionality. Efforts this year have focused on improving the PSI-BLAST program, which searches a database of protein sequences using a position-specific score matrix (PSSM) as its query. The PSSMs used are generally constructed on the fly, through multiple iterations of database searching that start from a standard protein sequence query. PSI-BLAST has been widely used to annotate proteins inferred from new DNA sequences and to generate sets of PSSMs representing large classes of proteins.
To improve the sensitivity of PSI-BLAST to distant sequence relationships, we developed a system for evaluating the program's performance. For a set of about 100 query sequences, experts in the group compiled an exhaustive list of related proteins in yeast. Each query can then be compared to a comprehensive protein sequence database through an arbitrary number of PSI-BLAST iterations, and the resulting PSSM compared to the complete set of yeast protein sequences. This procedure generates a list of yeast sequences ordered by E-value, from which a plot of false positives versus true positives can be obtained. We used this evaluation system to improve the average sensitivity of PSI-BLAST to distant relationships.
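The evaluation step above, ranking yeast hits by E-value and counting false positives against true positives, can be sketched in Python as follows. This is a minimal illustration, not the group's actual evaluation code: the function names `fp_tp_curve` and `true_positives_within`, the hit identifiers, and the E-values are all hypothetical, and we assume each hit is a `(sequence_id, e_value)` pair and that the expert-compiled list of related yeast proteins is available as a set of identifiers.

```python
from typing import Iterable, List, Set, Tuple


def fp_tp_curve(hits: Iterable[Tuple[str, float]],
                true_set: Set[str]) -> List[Tuple[int, int]]:
    """Walk hits in order of increasing E-value and record the cumulative
    (false positives, true positives) counts after each hit."""
    curve: List[Tuple[int, int]] = []
    fp = tp = 0
    for seq_id, _evalue in sorted(hits, key=lambda hit: hit[1]):
        if seq_id in true_set:
            tp += 1
        else:
            fp += 1
        curve.append((fp, tp))
    return curve


def true_positives_within(curve: List[Tuple[int, int]], max_fp: int) -> int:
    """Number of true positives found before the false-positive count
    exceeds max_fp (a hypothetical one-number summary of the curve)."""
    best = 0
    for fp, tp in curve:
        if fp > max_fp:
            break
        best = tp
    return best


# Hypothetical example: identifiers and E-values are made up for illustration.
hits = [("seqA", 1e-30), ("seqB", 0.5), ("seqC", 1e-5), ("seqD", 2.0)]
curve = fp_tp_curve(hits, true_set={"seqA", "seqC"})
print(curve)                            # [(0, 1), (0, 2), (1, 2), (2, 2)]
print(true_positives_within(curve, 0))  # 2
```

Plotting the `(fp, tp)` pairs gives the false-positive versus true-positive curve; a program change that raises the curve at small false-positive counts is more sensitive to distant relationships.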
The changes adopted include:
1) Filtering the database sequences, rather than the query, for segments of restricted amino acid composition;
2) Using the Smith-Waterman algorithm to construct any alignments reported;
3) Improving the numerical precision in the calculation of amino acid pair target/background frequency ratios;
4) Adopting improved estimates of statistical and edge-effect parameters;
5) Calculating E-values based upon the composition of the database sequence hit, rather than upon a standard protein amino acid composition;
6) Letting gaps in a given alignment column bring the projected amino acid frequencies for that column closer to background frequencies;
7) Adopting composition-based statistics only when they have the effect of increasing E-values;
8) Decreasing the pseudocount constant from 10 to 9;
9) Increasing the minimum percent difference from other sequences required for inclusion in the multiple alignment from 2% to 6%.
All these changes have been incorporated into the version of PSI-BLAST now available through the public NCBI web page. The new program is much less likely to return false positives with spuriously low E-values.
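Item 2 uses the Smith-Waterman algorithm for the reported alignments. A minimal sketch of the local-alignment score recurrence with affine gap costs (Gotoh's formulation) follows; it returns only the optimal score, whereas reporting an alignment would additionally need a traceback. The toy match/mismatch scoring stands in for a real substitution matrix such as BLOSUM62, the default gap costs (existence 11, extension 1, so a gap of length k costs 11 + k) are assumptions, and `smith_waterman_score` is a hypothetical name.

```python
from typing import Callable

NEG_INF = float("-inf")


def smith_waterman_score(a: str, b: str,
                         score: Callable[[str, str], int],
                         gap_open: int = 11, gap_extend: int = 1) -> int:
    """Best local alignment score of a and b with affine gaps.
    A gap of length k costs gap_open + k * gap_extend."""
    n = len(b)
    prev_h = [0] * (n + 1)        # H for row i-1: best local score ending at (i-1, j)
    prev_f = [NEG_INF] * (n + 1)  # F for row i-1: alignments ending with a[i-1] over a gap
    best = 0
    for i in range(1, len(a) + 1):
        h = [0] * (n + 1)
        f = [NEG_INF] * (n + 1)
        e = NEG_INF               # E: alignments ending with b[j-1] over a gap
        for j in range(1, n + 1):
            e = max(h[j - 1] - gap_open - gap_extend, e - gap_extend)
            f[j] = max(prev_h[j] - gap_open - gap_extend, prev_f[j] - gap_extend)
            diag = prev_h[j - 1] + score(a[i - 1], b[j - 1])
            h[j] = max(0, diag, e, f[j])  # 0 lets a local alignment start afresh
            best = max(best, h[j])
        prev_h, prev_f = h, f
    return best


def toy_score(x: str, y: str) -> int:
    """Hypothetical scoring: +5 for identical residues, -4 otherwise."""
    return 5 if x == y else -4


# Six matches (30) minus one length-1 gap (12) beats the ungapped "ACD" or "EFG" (15).
print(smith_waterman_score("ACDEFG", "ACDWEFG", toy_score))  # 18
```

Because the recurrence considers every gapped path, it finds the optimal local alignment rather than the heuristic one produced by seed extension, at a cost proportional to the product of the two sequence lengths.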
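Items 5 and 7 concern composition-based E-values. One way to sketch the idea, under stated assumptions, is to solve the Karlin-Altschul equation, sum over i, j of p_i q_j exp(λ s_ij) = 1, for λ using a query-side composition p and the composition q of the particular database hit, then compute E = K m n exp(-λS) with both the standard and the hit-specific λ and keep the larger E-value, as item 7 prescribes. This is a simplification, not PSI-BLAST's actual procedure: holding K fixed across compositions, omitting edge-effect corrections, and the two-letter toy alphabet are assumptions, and the helper names are hypothetical.

```python
import math
from typing import Dict, Tuple

Matrix = Dict[Tuple[str, str], int]


def ungapped_lambda(s: Matrix, p: Dict[str, float], q: Dict[str, float],
                    tol: float = 1e-12) -> float:
    """Solve sum_ij p_i * q_j * exp(lambda * s_ij) = 1 for the unique lambda > 0.
    p: query-side residue frequencies; q: residue frequencies of the hit."""
    pairs = [(x, y) for x in p for y in q]

    def f(lam: float) -> float:
        return sum(p[x] * q[y] * math.exp(lam * s[(x, y)]) for x, y in pairs) - 1.0

    expected = sum(p[x] * q[y] * s[(x, y)] for x, y in pairs)
    if expected >= 0 or not any(s[(x, y)] > 0 and p[x] * q[y] > 0 for x, y in pairs):
        raise ValueError("need a negative expected score and a possible positive score")
    # f(0) = 0, f'(0) = expected < 0 and f is convex, so f <= 0 on [0, lambda]
    # and f > 0 beyond it: bracket the root by doubling, then bisect.
    lo, hi = 0.0, 1.0
    while f(hi) <= 0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def evalue(score: float, lam: float, K: float, m: int, n: int) -> float:
    """Karlin-Altschul expectation E = K * m * n * exp(-lambda * S)."""
    return K * m * n * math.exp(-lam * score)


def conservative_evalue(score: float, lam_standard: float, lam_composition: float,
                        K: float, m: int, n: int) -> float:
    """Use the composition-based E-value only when it is the larger of the
    two (K is held fixed here for simplicity, an assumption)."""
    return max(evalue(score, lam_standard, K, m, n),
               evalue(score, lam_composition, K, m, n))


# Toy two-letter alphabet, match +1, mismatch -2 (hypothetical values).
s = {("A", "A"): 1, ("B", "B"): 1, ("A", "B"): -2, ("B", "A"): -2}
uniform = {"A": 0.5, "B": 0.5}
print(round(ungapped_lambda(s, uniform, uniform), 6))  # ln((1 + sqrt 5) / 2) ≈ 0.481212
```

Taking the maximum means a hit can only lose significance under the composition adjustment, which matches item 7's intent of suppressing spuriously low E-values without ever making a hit look more significant than standard statistics would.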
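Item 8 lowers the pseudocount constant. As a sketch, assuming the data-dependent pseudocount scheme described for the original PSI-BLAST, the observed residue frequencies f_i in a column are blended with pseudocount frequencies g_i = sum over j of (f_j / p_j) q_ij, where p_j are background frequencies and q_ij the substitution matrix's target frequencies, as Q_i = (α f_i + β g_i) / (α + β) with α = N_c − 1. Here N_c is an estimate of the number of independent observations in the column, which the caller is assumed to supply, and β is the pseudocount constant. The function name is hypothetical, and the gap handling of item 6 is not modeled.

```python
from typing import Dict, Tuple


def pseudocount_frequencies(f: Dict[str, float],
                            background: Dict[str, float],
                            target: Dict[Tuple[str, str], float],
                            n_independent: float,
                            beta: float = 9.0) -> Dict[str, float]:
    """Blend a column's observed residue frequencies f with pseudocount
    frequencies g: Q_i = (alpha * f_i + beta * g_i) / (alpha + beta)."""
    # g_i = sum_j (f_j / p_j) * q_ij, with q_ij the joint target frequencies
    # (sum_i q_ij = p_j) and p_j the background frequencies.
    g = {i: sum(f[j] / background[j] * target[(i, j)] for j in f) for i in f}
    alpha = n_independent - 1.0  # weight given to the observed data
    return {i: (alpha * f[i] + beta * g[i]) / (alpha + beta) for i in f}


# Hypothetical two-letter example: a column that is entirely "A".
p = {"A": 0.5, "B": 0.5}
q = {("A", "A"): 0.4, ("B", "B"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1}
print(pseudocount_frequencies({"A": 1.0, "B": 0.0}, p, q, n_independent=10))
# approximately {'A': 0.9, 'B': 0.1}
```

With this weighting, a smaller β gives the observed column data relatively more influence for a given N_c, which is the direction of the change from 10 to 9.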
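Item 9 tightens the rule for admitting sequences into the multiple alignment. A minimal greedy sketch: each candidate aligned row is kept only if it differs from every already-kept row by at least `min_diff` percent. How PSI-BLAST measures percent difference is not stated here, so we assume it is computed over columns where both rows have a residue (gaps written as `-`); the function names and example rows are hypothetical.

```python
from typing import List


def percent_difference(x: str, y: str) -> float:
    """Percent of columns, among those where both aligned rows have a
    residue, at which the residues differ (100 if there is no overlap)."""
    pairs = [(a, b) for a, b in zip(x, y) if a != "-" and b != "-"]
    if not pairs:
        return 100.0
    differing = sum(1 for a, b in pairs if a != b)
    return 100.0 * differing / len(pairs)


def purge_near_identical(rows: List[str], min_diff: float = 6.0) -> List[str]:
    """Greedily keep rows that differ by at least min_diff percent from
    every row kept so far."""
    kept: List[str] = []
    for row in rows:
        if all(percent_difference(row, k) >= min_diff for k in kept):
            kept.append(row)
    return kept


rows = ["ACDEFGHIKLMNPQRSTVWY",   # reference row
        "ACDEFGHIKLMNPQRSTVWF",   # 1 of 20 columns differ: 5%
        "ACDEFGHIKLMNPQRSTVAA"]   # 2 of 20 columns differ: 10%
print(purge_near_identical(rows, min_diff=6.0))  # keeps the first and third rows
```

Under the old 2% threshold the second row would also be admitted; raising the threshold keeps near-duplicate sequences from dominating the column frequencies used to build the PSSM.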