Background: We are developing and testing automated techniques for assigning protein structure to novel, uncharacterized sequences, an approach called fold recognition. In previous years, we developed new techniques for protein secondary structure prediction. These predictions are now used with our hidden Markov model (HMM) approach to protein fold recognition, called FORESST. The power of this statistical technique is best assessed in carefully controlled retrospective statistical studies or in prospective trials. Our studies show that FORESST is more effective than other techniques at finding very distantly homologous folds. Software developed in this Lab in previous years was distributed in single copies on diskette. We have shifted all software distribution to Web-based servers. Additionally, some software is now being developed to run in server mode, so that users send data to the server rather than the server sending software to the user. The advantage is that users need only a standard Web browser, yet can make use of the latest software versions. We are also using Web technology to provide a common interface for programs used by our own Section.
Progress in FY99: The FORESST method was tested extensively with existing protein families using a cross-validated analysis and a larger set of models of protein fold families. This study compared the method to local sequence similarity, sequence-motif recognition using HMMs, and global sequence recognition using HMMs. These four methods were compared on the problem of recognizing distantly homologous proteins or protein folds, which is a critical problem facing genome annotation efforts today. Results showed that a method incorporating secondary structure propensity (FORESST) outperformed purely sequence-based methods for the most difficult remote homology detection problems, whereas local sequence homology was generally the most powerful for moderate to close homolog detection.
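As a rough illustration of the kind of HMM scoring that underlies fold recognition from secondary structure, the following Python sketch ranks toy fold models by the log-likelihood each assigns to a predicted secondary-structure string. It is not the FORESST implementation: the three-letter alphabet (H, E, C), the two-state models, all probabilities, and the names forward_log_likelihood, rank_folds, and MODELS are assumptions made purely for illustration.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) using the forward algorithm in log space.

    obs   : string of observed symbols, e.g. "HHHCCEEE"
    start : list of initial state probabilities
    trans : trans[p][s] = P(next state s | current state p)
    emit  : emit[s][symbol] = P(symbol | state s)
    """
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n = len(start)
    # Initialisation: start in state s and emit the first symbol.
    alpha = [math.log(start[s]) + math.log(emit[s][obs[0]]) for s in range(n)]
    # Recursion: sum over all predecessor states, then emit the next symbol.
    for symbol in obs[1:]:
        alpha = [
            log_sum_exp([alpha[p] + math.log(trans[p][s]) for p in range(n)])
            + math.log(emit[s][symbol])
            for s in range(n)
        ]
    # Termination: sum over all possible final states.
    return log_sum_exp(alpha)


# Hypothetical toy models: state 0 is a "core" state, state 1 a "loop" state.
_START = [0.6, 0.4]
_TRANS = [[0.9, 0.1], [0.3, 0.7]]
_LOOP = {"H": 0.1, "E": 0.1, "C": 0.8}

MODELS = {
    "all_alpha": (_START, _TRANS, [{"H": 0.8, "E": 0.05, "C": 0.15}, _LOOP]),
    "all_beta": (_START, _TRANS, [{"H": 0.05, "E": 0.8, "C": 0.15}, _LOOP]),
}


def rank_folds(query, models):
    """Return (model name, log-likelihood) pairs, best-scoring model first."""
    scores = [
        (name, forward_log_likelihood(query, *params))
        for name, params in models.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # A predicted secondary-structure string for a hypothetical query protein.
    for name, score in rank_folds("HHHHHHCCHHHHHH", MODELS):
        print(f"{name}: {score:.3f}")
```

In practice, raw log-likelihoods depend on sequence length, so scores of this kind are usually compared against a background or null model before ranking; that step is omitted here for brevity.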
ABS staff also collaborated on several projects with NIH Intramural investigators, applying bioinformatics and structure prediction tools to particular sequences or protein families of interest. These techniques include secondary structure prediction, fold assignment, determination of sequence-structure relationships using multiple sequence alignments, homology modeling, motif analysis, and database searching. ABS staff also provided statistical advice and collaboration in ligand binding data analysis, dose-response curve analysis, repeated-measures ANOVA and MANOVA, and, in one project, analysis of endocrine time series. The ABS structure prediction team has completed an entry into the CASP3 (Critical Assessment of Structure Prediction 3) international competition. Our entries included secondary structure prediction and protein fold recognition on a variety of newly solved but unpublished protein structures. While our fold predictions were not the most accurate of those entered into this competition, their accuracy was approximately what we had anticipated based on earlier theoretical studies. Also, few if any of the prediction targets fell into the range where our prediction tool FORESST is known to exceed other methods in prediction accuracy. The P-SCAN program was made available for download first to NIH staff and then, with the release of version 1.0, to the entire Web. Downloads have occurred at a rate of about one per day. The ABS Web server also provides a number of unique sequence analysis services, including secondary structure prediction by the GOR4 algorithm, multiple sequence alignment by CLUSTALW, and various reformatting services.