The long-term goal of this project is to develop a structure-based approach for the prediction of protein molecular function so that the information provided by both genome sequencing and structural genomics can be more fully exploited. To achieve this overall objective, this proposal further develops a very promising and tightly integrated sequence-to-structure-to-function approach that employs protein structure to predict protein-protein interactions, protein molecular function, and ligand binding sites. It also holds considerable promise for improved ligand screening. In particular, the following Specific Aims are proposed: (1) Monomeric sequence profile-based threading algorithms, which currently fail to find good template structures in the PDB for the ~25% of single-domain proteins with very low sequence identity to solved protein structures, will be extended and improved. (2) A purely structure-based version of threading will be developed, as the best contemporary threading algorithms have a strong evolutionary component that limits their structure recognition ability when the target and template proteins are evolutionarily distant or have analogous structures. In that regard, potentials of mean force suitable for structure-based threading will be derived from a new AMBER-related, physics-based atomic potential that shows significant ability to refine structures closer to native. (3) The multimeric structure prediction algorithm, m-TASSER, will be enhanced by improving the accuracy of interfacial side chain contact predictions and by using physics-based interfacial potentials for structure refinement. In addition, by exploiting the fact that the library of single-domain protein structures is likely complete, all-against-all docking will provide an estimate of the number of possible dimer complexes of single-domain proteins. (4) The FINDSITE structure-based protein molecular function prediction algorithm will be extended and improved.
Included are enhancements of its ligand screening ability based on the insight that for evolutionarily distant proteins, there are conserved anchor regions in both the protein binding site and in the bound ligands that can be exploited for rapid ligand binding pose prediction and screening. (5) EFICAz2, a precise enzyme function inference approach, will be combined with FINDSITE to develop a more powerful ligand screening approach. (6) The entire set of tools developed in Aims 1-5 will be applied to all sequenced proteomes and the resulting sequence-to-structure-to-function (S2F) database made available to the academic community. Whole proteome structure predictions will be combined with EFICAz2 and FINDSITE to identify possible receptors of small regulatory molecules, including the targets of anticancer metabolites, and to provide whole proteome screened ligand libraries, libraries of protein-protein interactions, quaternary structures, and molecular functional annotations. In all cases, careful, large-scale benchmarking will be done. Thus, this project holds the promise of making a significant impact across a wide spectrum of biologically important problems.

PUBLIC HEALTH RELEVANCE: The development and whole proteome application of the tightly integrated protein sequence-to-structure-function approach described in this project will be of utility to a broad spectrum of researchers. By assisting in the early stages of drug discovery, the proposed algorithms could have significant therapeutic utility. Also, most of the estimated 650,000 protein-protein interactions in the human interactome are unknown; the predicted protein quaternary structures will provide insight into how these proteins perform their functions.