In pursuit of novel therapeutics, drug developers are screening more compounds and covering more chemical space than ever before. Yet the time required to bring a new drug to market has not decreased, and the cost of drug discovery is steadily increasing. Recent rigorous analyses attribute this problem to poor efficacy, toxicity, and inappropriate absorption, distribution, metabolism and excretion. To address these ADMET (Absorption, Distribution, Metabolism, Elimination and Toxicity) issues, pharmaceutical research groups have, since the late 1990s, moved various physicochemical property screens earlier in the drug discovery process. The metabolic transformations of pharmaceuticals profoundly affect their bioavailability, efficacy, chronic toxicity, metabolic idiosyncrasies, and excretion rates and routes, which makes metabolism one of the major hurdles to overcome. In silico tools enable fast virtual screening of large numbers of compounds before they are synthesized. Such tools help researchers understand complicated metabolic processes, eliminate poor candidates, and use the knowledge gained to discern possible deficiencies in compounds. Metabolism, however, remains poorly understood and is the most difficult of the ADMET processes to predict. The overall goal of this proposal is to develop a system that predicts xenobiotic metabolism in mammals and yields insights into metabolism mechanisms (Aim 1), and to study the differences in metabolism between humans and model animals (Aim 2). We will use MDL's Metabolite database as the source of information about drug metabolism reactions. For Aim 1, we will develop both global metabolism prediction systems, which can be applied to diverse substrates without prior knowledge of the enzymes involved, and local models for particular enzymes, for cases where the enzymes involved in a reaction are known.
Each global metabolism prediction system will comprise many individual models, each focused on one animal species (e.g., human), one enzyme (e.g., CYP3A4) and one specific biotransformation (e.g., hydroxylation). Each individual model will be built with machine learning techniques, using a variety of features that characterize the chemical environments of functional groups within molecules. For the local models of an enzyme, we assume that tight binding between ligand and enzyme is not required, and that reactions instead occur at sites the enzyme can easily attack. We will design ways to model the probability that a particular site will be attacked by an enzyme. In Aim 1, both the global metabolism prediction system and the local models are trained on human reactions, so the models are species-specific. In Aim 2, we will build models for the rat, a model animal used in drug development. Using rat reactions listed in MDL's Metabolite database, we will establish a global metabolism prediction system and local models for rats with the same methods outlined in Aim 1. Although the methods are the same, the training sets differ, so we expect the models to make different predictions. Using drugs that are known to be metabolized differently in humans and rats, we will study the differences between the human and rat models.
PUBLIC HEALTH RELEVANCE: In pursuit of novel therapeutics, drug developers are screening more compounds and covering more chemical space than ever before. ADMET (Absorption, Distribution, Metabolism, Elimination and Toxicity) has assumed center stage in the drug discovery process. Predicting metabolism is one of the major challenges to be met: metabolism is the most poorly understood of the ADMET processes and the most difficult to predict. We propose a machine learning approach to improve metabolism prediction, to gain insights into metabolism mechanisms, and to study differences in metabolism between humans and model animals.
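As an illustration only, the global prediction system described under Aim 1 could be organized as a collection of individual models keyed by species, enzyme and biotransformation, each returning a probability that a given site is attacked. The Python sketch below is a minimal outline under that assumption, not the proposed implementation: the class `GlobalMetabolismSystem`, the feature name `is_benzylic_carbon`, the placeholder enzyme `ENZYME_X`, the site labels and the toy scoring rules are all hypothetical stand-ins for models that would be trained on reactions from the Metabolite database.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# One individual model is identified by (species, enzyme, biotransformation).
ModelKey = Tuple[str, str, str]
# A candidate site is described by named numeric features (hypothetical names).
SiteFeatures = Dict[str, float]
# A model maps the features of one site to a probability of attack in [0, 1].
SiteModel = Callable[[SiteFeatures], float]


@dataclass
class SitePrediction:
    site_id: str
    key: ModelKey
    probability: float


class GlobalMetabolismSystem:
    """Hypothetical container for per-(species, enzyme, biotransformation) models."""

    def __init__(self) -> None:
        self._models: Dict[ModelKey, SiteModel] = {}

    def register(self, key: ModelKey, model: SiteModel) -> None:
        self._models[key] = model

    def predict(self, species: str, sites: Dict[str, SiteFeatures]) -> List[SitePrediction]:
        """Score every site with every model of one species, most likely first."""
        predictions = [
            SitePrediction(site_id, key, model(features))
            for key, model in self._models.items()
            if key[0] == species
            for site_id, features in sites.items()
        ]
        return sorted(predictions, key=lambda p: p.probability, reverse=True)


# Toy stand-ins for trained models: made-up rules, not real chemistry.
def toy_human_hydroxylation(features: SiteFeatures) -> float:
    return 0.8 if features.get("is_benzylic_carbon", 0.0) else 0.1


def toy_rat_hydroxylation(features: SiteFeatures) -> float:
    return 0.4 if features.get("is_benzylic_carbon", 0.0) else 0.2


system = GlobalMetabolismSystem()
system.register(("human", "CYP3A4", "hydroxylation"), toy_human_hydroxylation)
# ENZYME_X is a placeholder, not the name of a real rat enzyme.
system.register(("rat", "ENZYME_X", "hydroxylation"), toy_rat_hydroxylation)

# Two hypothetical candidate sites on one substrate.
sites = {
    "C7": {"is_benzylic_carbon": 1.0},
    "C2": {"is_benzylic_carbon": 0.0},
}

human_ranking = system.predict("human", sites)
rat_ranking = system.predict("rat", sites)
```

Because the species is part of each model key, the same substrate can be scored separately by the human and rat models, and the two rankings can then be compared in the manner Aim 2 describes.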