We have shown that a consensus secondary structure for a family of proteins can be accurately predicted from a set of aligned homologous protein sequences using heuristics that extract conformational information from patterns of conservation and variation in the alignment, provided that these heuristics are based on an accurate understanding of how protein sequences divergently evolve under functional constraints. We will continue to develop these heuristics for predicting the secondary, supersecondary, and tertiary structure of proteins starting from sequence alignments. This work will test heuristics that have already been used successfully to make bona fide predictions of secondary structure, by applying them systematically and in fully automated form to proteins of known structure. These heuristics will be made available to the public via a server accessible through electronic mail. Efforts will then be directed towards creating "perfect" secondary structure predictions that are compelling starting points for modelling tertiary structure. A detailed analysis will be made of the remaining misassignments made using our heuristics, which fall into five classes: (a) errors where conformation within a family of proteins has diverged, (b) errors where the multiple alignment is poor, (c) misassignments of internal helices, (d) misassignments of secondary structure near the active site, and (e) misassignments of surface beta strands. Modified heuristics will be developed to eliminate each class of misassignment, or to ensure that misassignments do not disrupt efforts to model tertiary structure starting from a predicted secondary structure. "Core" and "non-core" secondary structural elements will be defined, and prediction heuristics will be scored by their relative ability to detect core elements. A systematic study of internal helices will be undertaken to refine techniques that have successfully identified an internal helix in the hemorrhagic metalloproteinases. 
A systematic study of secondary structure near the active site will be undertaken to learn more about how peptide conformation in these regions might be modelled. Tools for identifying joints between domains, long-distance compensatory covariation, and "parsing" strings will be developed. A database of surface beta strands will be constructed and analyzed to learn more about how these might be distinguished from surface coils. Advanced statistical methods (GOR, neural networks) and our heuristics will then be comparatively evaluated, with the goal of obtaining a composite prediction tool that combines the best elements of each method. This grant will also permit us to continue to make bona fide predictions. In a second set of studies, we will continue our work analyzing patterns of amino acid substitution during divergent evolution under functional constraints, implementing our most advanced substitution matrices to improve pairwise alignments. We will then develop tools for detecting distant homologies, both through the reconstruction of probabilistic ancestral sequences in a protein family and through the comparison of predicted secondary structures of separate protein families.