DESCRIPTION: The last several years have seen a paradigm shift in biology brought about by the sequencing of the genomes of numerous organisms. The next important step in the post-genomic era is to identify the function of all the genes and gene products in a given genome. To assist in this objective, the long-term goal of the research described in this proposal is the development of protein threading algorithms which, when coupled to the appropriate active-site functional descriptors, identify both the global fold and the biochemical function of the protein. To be relevant in the post-genomic era, these methods must be applicable on a genomic scale and provide information beyond sequence-based, evolutionary approaches. The predicted structures must therefore be at the resolution required to identify biochemical function and binding regions in the protein. However, threading models often have significant errors in their alignments, even when the global fold is correct, so techniques must be developed to improve the initial alignments generated by threading approaches. To realize the goal of a predictive sequence-to-structure-to-function methodology, the following specific aims are proposed: (1) Dr. Skolnick's current generation of threading algorithms will be further improved by enhancements in the scoring functions that evaluate sequence-structure specificity. (2) Algorithms that refine distant threading models will be further developed and generalized. Such techniques should be able to take structures whose backbone root-mean-square deviation from native is 8-10 Å to the 4-6 Å range necessary for active-site identification. (3) An expanded, high-quality library of three-dimensional, active-site functional motifs will be developed. (4) Whole-genome threading followed by threading model refinement and active-site library screening will be carried out on M. genitalium, E. coli, S. cerevisiae, C. elegans and human. (5) A structure-function database of fold/function predictions for these five genomes will be constructed and made available on a website (http://bioinformatics.danforthcenter.org).