Metabolic pathway databases provide a biological framework in which relationships among an organism's genes may be revealed. This context can be exploited to boost the accuracy of genome annotation, to discover new targets for therapeutics, or to engineer metabolic pathways in bacteria to produce a historically expensive drug cheaply and quickly. But, knowledge of metabolism in ill-characterized species is limited and dependent on computational predictions of pathways. Our ultimate target is to develop methods for the prediction of novel metabolic pathways in any organism, coupled with robust assessment of the validity of any predicted pathway. We hypothesize that integrating evidence from multiple levels of an organism's metabolic network - from the fit of a pathway within the network to evolutionary relationships between pathways - will allow us to assess pathway validity and to predict novel metabolic pathways. We have successfully applied machine learning methods to the problem of identifying missing enzymes in metabolic pathways and believe similar methods will prove fruitful in this application. Our preliminary studies have identified several properties of predicted metabolic pathways that differ between sets of true positive pathway predictions (i.e., pathways known to occur in an organism) and sets of false positive pathway predictions. We will expand on these features and develop methods to address the following specific aims: 1) Identify features that are informative in distinguishing between correct and incorrect pathway predictions in computationally-generated pathway/genome databases based on predictions for highly-curated organisms (e.g., Escherichia coli and Arabidopsis thaliana). 2) Develop methods for computing the probability that a pathway is correctly predicted. Informative features identified in Specific Aim #1 will be integrated into a classifier that will compute the probability that a predicted pathway is correct given the associated evidence. 3) Extend the Pathologic program (the Pathway Tools algorithm used to infer the metabolic network of an organism) to predict alternate, previously unknown pathways in an organism. We will search the MetaCyc reaction space (comprising almost 6000 reactions) for novel subpathways, explicitly constraining our search using organism-specific evidence (i.e., homology, experimental evidence, etc.) at each step.