Overall: Our project combines the significant advantages of a genetic model organism, sophisticated pathway mapping tools, accurate high-throughput quantum mechanical (QM) calculations, and state-of-the-art experimental measurements. The result will be an efficient and cost-effective approach to identifying unknown compounds in metabolomics, a task that remains one of the major limitations facing this growing field of medical science. Caenorhabditis elegans has several advantages for this study, including over 10,000 available genetic mutants, well-developed CRISPR/Cas9 technology, and a panel of over 500 wild C. elegans isolates with complete genomes. Half of C. elegans genes are homologous to human disease genes, making this model organism an outstanding choice to improve our understanding of metabolic pathways in human disease. We will develop an automated pipeline for sample preparation to reproducibly measure tens of thousands of unknown features by UHPLC-MS/MS. We will use the wild isolates to conduct metabolome-wide genetic association studies (m-GWAS) and will use SEM-path to locate unknowns in pathways through partial correlations. The relevance of the unknown metabolites to specific pathways will be tested by acquiring UHPLC-MS/MS data from genetic mutants of those pathways. Molecular formulas and pathway information will be the inputs for automated QM calculations on all possible structures, yielding accurate NMR chemical shifts that will be matched to experimental data. The best-matching structures will be validated against 2D NMR data for the same compound. The validated computed structures will then be used, together with the experimental UHPLC-MS/MS data, to improve QM-based MS/MS fragment prediction. The Computational Core (CC) will have two primary components: metabolite pathway mapping and quantum chemical calculations of NMR and MS/MS data.
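As a rough illustration of how computed NMR chemical shifts might be matched to experimental data, the sketch below ranks candidate structures by mean absolute error (MAE) between computed and observed shifts. It is a simplified stand-in, not the project's scoring method: the candidate names and shift values are hypothetical, and pairing shifts by sorted order is an assumption made because peak assignments are not known in advance.

```python
def mean_absolute_error(computed, observed):
    """Mean absolute difference (ppm) between computed and observed shifts.

    Assumption: shifts are paired by rank (both lists sorted), because the
    atom-to-peak assignment of an unknown is not known beforehand.
    """
    if len(computed) != len(observed):
        raise ValueError("shift lists must have the same length")
    pairs = zip(sorted(computed), sorted(observed))
    return sum(abs(c - o) for c, o in pairs) / len(observed)


def rank_candidates(candidates, observed):
    """Return (name, MAE) pairs sorted from best to worst match."""
    scored = [
        (name, mean_absolute_error(shifts, observed))
        for name, shifts in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical computed 13C shifts (ppm) for two made-up candidate structures.
candidates = {
    "candidate_A": [20.0, 60.0, 170.0],
    "candidate_B": [25.0, 70.0, 180.0],
}
# Hypothetical experimental shifts (ppm) for the unknown.
observed = [21.0, 61.0, 171.0]

ranking = rank_candidates(candidates, observed)
best_name, best_mae = ranking[0]
print(best_name, round(best_mae, 2))
```

In practice, a score of this kind would only shortlist candidates; as stated above, the best-matching structures would still be validated against 2D NMR data.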
The pathway mapping component interfaces with the Experimental Core in generating m-GWAS results from the wild isolates and LC-MS/MS analysis. These genetic associations will relate known metabolites to known genes. The resulting pathways will be expanded by locating unknown features through partial correlations, which will significantly reduce the chemical space available to the unknowns. QM calculations will use this pathway information to limit the number of possible structures for a given molecular formula, which the Experimental Core will supply. The output of the QM calculations will be accurate NMR chemical shifts, which will be compared with experimental NMR data acquired at the same chromatographic retention times as the LC-MS/MS data of the unknown, allowing us to find the best computed structure. We will also improve computational MS/MS predictions. All experimental and computational data will be added to a relational database, which will allow us to search on any field (e.g., retention time windows or m/z values). The CC will provide robust computing infrastructure at two sites, shared notebooks for analysis, and deposition of data to repositories.
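To make the role of partial correlations concrete, the sketch below computes a first-order partial correlation between an unknown feature and a known metabolite while controlling for a third metabolite. It is a minimal illustration with made-up intensities, not SEM-path itself; the variable names and data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical intensities across six samples (arbitrary units).
upstream = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]       # known upstream metabolite
unknown = [1.2, 1.8, 3.2, 3.8, 5.2, 5.8]        # unknown feature
known = [1.2, 1.8, 2.8, 4.2, 5.2, 5.8]          # known metabolite

raw = pearson(unknown, known)
partial = partial_correlation(unknown, known, upstream)
print(f"raw r = {raw:.2f}, partial r = {partial:.2f}")
```

In this toy data the unknown and the known metabolite are strongly correlated only because both track the upstream metabolite; conditioning on it removes most of that association, which is the kind of distinction that helps place an unknown next to its direct neighbors in a pathway.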