ABSTRACT Mass spectrometry (MS)-based metabolomics is an important tool in the biological and clinical sciences; however, unambiguous annotation and/or identification of detected compounds remain major challenges. Annotation errors propagate through differential and pathway analysis, making interpretation difficult and inaccurate. Confident annotations and validated identifications can be obtained through additional experimentation, including comparison to authentic standards. However, this is expensive and time-consuming, and it is generally performed only after pathway and differential analysis have been completed. We propose a metabolite annotation algorithm that improves annotation by integrating gene and pathway information, which in turn will lead to better interpretation of metabolomics data. The goal is to develop a powerful tool that aids biomedical researchers and bioinformaticians in metabolite annotation; the result will be improved utilization of datasets and more accurate interpretation in the context of disease. Briefly, gene expression will be used to determine which proteins are likely to be present in KEGG pathways. The proximity of these proteins to the metabolite of interest in the pathway will then be used to determine the likelihood of the metabolite's presence. To develop the algorithm, we will use targeted data from the COPDGene cohort, which contains not only the MS data for which we want to improve annotation but also tandem MS data acquired using purchased standards. The latter will be used to evaluate different scoring schemes. We will validate the algorithm on untargeted data from a COPDGene cohort that has a small overlap with the targeted cohort. We expect to provide a tool for biomedical researchers and bioinformaticians that improves their ability to annotate and interpret their MS data; this will reduce the resources needed to follow up on annotated metabolites. The gene expression data used here is publicly available.
The metabolomics data will be available at the Metabolomics Workbench. Ultimately, we expect that this tool can be applied to any study for which combined genomics and metabolomics data is available.
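
As a rough illustration of the proximity idea described in this abstract, the sketch below ranks candidate annotations for a single MS feature on a toy pathway graph. Everything in it is an assumption made for illustration: the function names (`build_graph`, `score_candidate`, `rank_candidates`), the expression threshold, and the exponential decay with hop distance are hypothetical choices, not the scoring scheme that will be developed or evaluated, and the toy graph stands in for a real KEGG pathway rather than querying KEGG.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs.

    Nodes are plain strings; here metabolites and genes share one graph,
    with a gene linked to the metabolites its protein product acts on.
    """
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs_distances(graph, start):
    """Return the shortest hop count from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def score_candidate(graph, expression, candidate, threshold=1.0, decay=0.5):
    """Score a candidate metabolite by nearby expressed genes.

    Assumed scheme: a gene counts as present if its expression is at or
    above `threshold`; each present gene at hop distance d contributes
    decay ** (d - 1), so closer genes weigh more.
    """
    score = 0.0
    for node, d in bfs_distances(graph, candidate).items():
        if d > 0 and expression.get(node, 0.0) >= threshold:
            score += decay ** (d - 1)
    return score


def rank_candidates(graph, expression, candidates, threshold=1.0, decay=0.5):
    """Return (candidate, score) pairs sorted from most to least supported."""
    scores = {
        c: score_candidate(graph, expression, c, threshold, decay)
        for c in candidates
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy pathway (hypothetical): M_A - G1 - M_X - G2, and separately M_B - G3.
toy_graph = build_graph([
    ("M_A", "G1"), ("G1", "M_X"), ("M_X", "G2"), ("M_B", "G3"),
])
# Hypothetical expression values; G3 falls below the presence threshold.
toy_expression = {"G1": 5.0, "G2": 3.0, "G3": 0.1}

ranking = rank_candidates(toy_graph, toy_expression, ["M_A", "M_B"])
```

In this toy case the candidate `M_A` is favored because two expressed genes lie near it in the pathway, while the only gene adjacent to `M_B` is not expressed. Alternative scoring schemes (for example, continuous expression weights instead of a threshold) could be swapped into `score_candidate` and compared against identifications confirmed with standards.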