This research plan describes the computational aspects of a strategy for predicting the substrate specificities of enzymes of unknown function from the genome projects in order to direct and facilitate experimental assignment of their functions. Anchored by functional predictions that are validated by the experimental projects, high-quality functional annotations can then be made for many additional sequences by annotation transfer. Focusing on non-trivial problems in function prediction, we have integrated our expertise in bioinformatics, in silico docking, and comparative structural modeling to achieve substantial success, contributing to the discovery of 32 new functions in the large and functionally diverse enolase and amidohydrolase (AH) superfamilies and to the annotation of hundreds of orthologous sequences by annotation transfer. In close collaboration with the experimental investigators, we will continue to develop an iterative cycle in which multiple parallel and serial paths are integrated to obtain high-quality information useful for functional prediction. We aim in the next funding period to build on breakthroughs in docking against both experimentally determined and modeled structures, in particular to predict the functions of proteins in the metabolic pathways in which these superfamily members (and those of a new target superfamily, the RuBisCO-like proteins) reside, thereby extending our efforts toward a more general solution for prediction of functional specificity. Proteins encoded in the same operons as our target superfamily members are expected to catalyze reactions in the pathway that can be linked to the fundamental chemical capabilities of those members, providing clues to metabolic context. Similarly, we can expect substrates for enzymes in the pathway to contain substructures related to those of our target superfamilies, providing additional clues for filtering docking results against these proteins.
To take advantage of these similarities, new methods developed by our groups for comparison of ligand structures and substructures will be applied to docking hit lists to identify patterns across multiple proteins of an operon that are useful for narrowing the set of potential substrates for further evaluation and experimental testing. To the extent we succeed, this effort will lay the groundwork for generalization of our approaches for the discovery of new enzyme functions, new pathways, and new biology.
RELEVANCE: Accurate prediction of molecular function for sequences in the genome projects is required to identify mechanisms of disease and improve drug discovery and development. In collaboration with the experimental projects, continuation of this computational project will contribute to this goal by applying orthogonal methods to correctly predict molecular function in large enzyme superfamilies and by extending those predictions on a large scale to determine the biological function of associated metabolic pathways.