Algorithmic assignment of probable function to proteins of previously unknown function
Objectives and Specific Aims: The goal of this project is to extend and apply algorithms that show promise in assigning a probable function to PDB entries of currently unknown function. This should help derive benefit from the Protein Structure Initiative by "help[ing] researchers illuminate structure-function relationships and thus formulate better hypotheses and design better experiments."
Research Design and Methods: New protein structures are being determined faster than their biological functions can be assigned. There are currently 2939 entries in the Protein Data Bank with the classification "Unknown Function". A number of computational methods have been developed to provide rapid, inexpensive means of function prediction for these structures, including those that focus on alignment of entire backbones and others that focus on identification and alignment of active site residues based on the unusual charge distributions in protein structures. We have developed ProMOL, a software plug-in for the PyMOL molecular graphics environment that relies on the geometric relationships conserved in enzyme catalytic sites. Motifs in ProMOL were created from the active site specifications found in the Catalytic Site Atlas (CSA) (http://www.ebi.ac.uk/thornton-srv/databases/CSA/). Our approach explicitly searches for CSA-defined catalytic site residues according to specific atomic geometry, similar in concept to the CSA JESS templates. This dispenses with the need to filter out confounding elements such as conserved folding domains or ligand binding regions. Extensive testing of structural files from the serine protease and peroxidase families confirmed that the geometric relationships of catalytic residues alone are effective and sufficient for function prediction in protein structures.
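The idea of searching for catalytic residues by atomic geometry can be illustrated with a minimal sketch. This is not ProMOL's implementation and does not use the PyMOL API; the motif layout, the function name `find_motif`, the residue numbers, the coordinates, and the distance values are all hypothetical and chosen only for illustration. The sketch assumes one representative atom per residue and a motif defined by residue types plus target inter-atomic distances with a tolerance.

```python
import math

# Hypothetical motif: residue types in order, target distances (angstroms)
# between representative atoms, and an allowed deviation. Illustrative values.
MOTIF = {
    "residues": ["SER", "HIS", "ASP"],
    "distances": {(0, 1): 3.0, (1, 2): 2.8},  # keys are (earlier, later) positions
    "tolerance": 0.5,
}

# Made-up structure: (residue id, residue name, representative atom coordinate).
ATOMS = [
    (195, "SER", (0.0, 0.0, 0.0)),
    (57, "HIS", (3.0, 0.0, 0.0)),
    (102, "ASP", (5.8, 0.0, 0.0)),
    (214, "SER", (20.0, 0.0, 0.0)),
]


def find_motif(atoms, motif):
    """Return tuples of residue ids whose pairwise distances fit the motif."""
    names = motif["residues"]
    tol = motif["tolerance"]
    # Candidate atoms for each motif position, filtered by residue type.
    candidates = [[a for a in atoms if a[1] == name] for name in names]
    hits = []

    def extend(partial):
        pos = len(partial)
        if pos == len(names):
            hits.append(tuple(a[0] for a in partial))
            return
        for cand in candidates[pos]:
            if cand in partial:
                continue  # a residue may fill only one motif position
            # Check every distance constraint that ends at this position.
            fits = all(
                abs(math.dist(partial[j][2], cand[2]) - target) <= tol
                for (j, k), target in motif["distances"].items()
                if k == pos
            )
            if fits:
                extend(partial + [cand])

    extend([])
    return hits


print(find_motif(ATOMS, MOTIF))
```

Because the search considers only residue identity and distances between selected atoms, it needs no information about the overall fold, which is consistent with the point that confounding features such as folding domains do not have to be filtered out first.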
In addition to the extensive characterization of serine proteases and peroxidases, we performed a preliminary study of 39 PDB entries classified as "Structural Genomics, Unknown Function" using the Motif Finder in ProMOL, which contains 22 "native" ProMOL motifs along with the corresponding CSA JESS C1C2 motifs and CSA Functional Atom motifs. Of the 39 entries studied, 26 (67%) yielded a prediction value of 1 (an exact match to an existing template). An active site lacking one residue or containing an extra (outlier) residue was identified for 36 (92%) of the structures, and no match was reported for only three of the test cases. We will extend the number of motifs in ProMOL's Motif Finder, using both newly created ProMOL motifs and existing JESS motifs, to include representatives from the most prominent protein families; increase automation of the process; and then evaluate all PDB entries described as having "unknown function". Entries that show a positive correlation will then be explored further using sequence and structure alignment tools. Both the software and the results will be openly released to the community.
PUBLIC HEALTH RELEVANCE: One expected benefit of the Protein Structure Initiative (PSI) is that structural descriptions will help researchers illuminate structure-function relationships and thus formulate better hypotheses and design better experiments; however, even after a three-dimensional structure of a protein has been obtained, the function or functions of that protein are not always apparent.
Algorithms that compare salient structural features of proteins of known function with similar features in PSI targets of unknown function can provide helpful guidance in assigning probable functions to those targets. The aim of this project is to use such algorithms to assign probable functions to a significant subset of the PSI targets of unknown function and thereby improve understanding of structure-function relationships.