Overall: Our project combines the significant advantages of a genetic model organism, sophisticated pathway mapping tools, high-throughput and accurate quantum mechanical (QM) calculations, and state-of-the-art experimental measurements. The result will be an efficient and cost-effective approach to identifying unknown compounds in metabolomics, a problem that is one of the major limitations facing this growing field of medical science. Caenorhabditis elegans has several advantages for this study, including over 10,000 available genetic mutants, well-developed CRISPR/Cas9 technology, and a panel of over 500 wild C. elegans isolates with complete genomes. Half of C. elegans genes have homologs among human disease genes, making this model organism an outstanding choice for improving our understanding of metabolic pathways in human disease. We will develop an automated sample preparation pipeline to reproducibly measure tens of thousands of unknown features by UHPLC-MS/MS. We will use the wild isolates to conduct metabolome-wide genetic association studies (m-GWAS), and we will use SEM-path to locate unknowns in pathways using partial correlations. The relevance of the unknown metabolites to specific pathways will be tested by collecting UHPLC-MS/MS data from genetic mutants of those pathways. Molecular formula and pathway information will be the inputs for automated QM calculations of all possible structures, which will be used to accurately calculate NMR chemical shifts that will be matched to experimental data. The correct structures will be validated by comparing them with 2D NMR data of the same compound. The validated computed structures will then be used, together with the experimental UHPLC-MS/MS data, to improve QM-based MS/MS fragment prediction.
The Experimental Core (EC) will be responsible for the preparation of, and spectral data collection for, several different types of C. elegans metabolome samples. These include (i) a large-scale reference sample of the common laboratory strain "N2", (ii) a set of over 100 wild C. elegans isolates, representing genetically diverse but homozygous "individuals", which will be used for mapping conserved biochemical pathways using a metabolome-wide genetic association (m-GWAS) approach, and (iii) a set of deletion mutants that will be used to validate gene function predictions and characterize unknown features in known genetic pathways. These samples will be characterized by taking advantage of the complementary strengths of LC-MS/MS (speed and broad metabolite coverage), high-resolution FTMS (direct determination of experimental molecular formulas), and NMR (atomic-level structural data). When analyzed with the approaches described in the Computational Core, the generated spectral data will be used to develop an automated pipeline for unknown compound identification that will be generally applicable to a wide range of model systems, including higher animals and human samples.
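The step in which computed NMR chemical shifts for candidate structures are matched to experimental data can be illustrated with a minimal sketch. The summary does not specify a scoring method, so the following assumes a simple mean absolute error (MAE) between computed and experimental shifts, paired by sorted order. The function names, candidate names, and shift values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def mean_absolute_error(computed: List[float], experimental: List[float]) -> float:
    """Return the MAE (ppm) between two shift lists, paired by sorted order.

    Sorted-order pairing is an assumption of this sketch; it avoids needing
    atom-by-atom assignments of the experimental peaks.
    """
    if not computed or len(computed) != len(experimental):
        raise ValueError("shift lists must be non-empty and of equal length")
    pairs = zip(sorted(computed), sorted(experimental))
    return sum(abs(c - e) for c, e in pairs) / len(computed)


def rank_candidates(
    candidates: Dict[str, List[float]], experimental: List[float]
) -> List[Tuple[str, float]]:
    """Score each candidate structure and return (name, MAE), best match first."""
    scored = [
        (name, mean_absolute_error(shifts, experimental))
        for name, shifts in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical 13C chemical shifts (ppm), invented for illustration only.
experimental_shifts = [20.5, 62.1, 128.4, 171.0]
candidate_shifts = {
    "candidate_A": [21.0, 61.5, 129.0, 170.2],
    "candidate_B": [30.2, 55.0, 120.1, 180.5],
}

ranking = rank_candidates(candidate_shifts, experimental_shifts)
for name, mae in ranking:
    print(f"{name}: MAE = {mae:.2f} ppm")
```

In this sketch the candidate with the lowest MAE is ranked first. A production pipeline would likely use a more robust statistical comparison and would confirm the top-ranked structure against 2D NMR data, as described above.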