PROJECT SUMMARY
With the number of human health studies involving metabolomics rising rapidly, methods that address key analytic barriers in metabolomics data analysis are critically important. Missing values (MVs) are a pervasive and often ignored issue in metabolomics, yet the treatment of MVs can have a substantial impact on differential abundance and other downstream statistical analyses. The MV problem in metabolomics is challenging chiefly because the source of MVs is not always clear: an MV can arise because the metabolite is i) not biologically present in the sample, ii) present in the sample but at a concentration below the lower limit of detection (LOD), or iii) present in the sample but undetected due to technical issues in sample pre-processing (e.g., peak resolution). Commonly used methods (e.g., substitution by zero, the LOD, or the mean value) tend to be overly simplistic and produce sub-optimal and potentially misleading results. Because the literature lacks imputation methods that properly account for the different types of missingness in metabolomics data, there is an urgent need to invest in improving statistical models of MVs that are specific to metabolomics. We have recently developed a modified K-nearest neighbors (KNN) imputation algorithm, KNN-TN, that accounts for the truncation point (i.e., the LOD) in the data. In simulations derived from real metabolomics studies, this algorithm showed considerable improvement in imputation accuracy (root-mean-squared error) compared with single-value (LOD, mean, zero) imputation approaches and standard KNN imputation. In this proposal, we will develop an alternative Bayesian modeling approach that accounts for the uncertainty due to imputation and stabilizes estimates for small samples by sharing information across metabolites.
Further, we will evaluate the impact of MV imputation on downstream statistical analyses based on simulations from a wide variety of publicly available datasets from the Metabolomics Workbench. Our analyses will allow us to make comprehensive recommendations to analysts about which imputation algorithm(s) are optimal in terms of biological impact. Lastly, we will develop publicly available software implementing all developed imputation methods, including a web-accessible interface to broaden outreach and impact. The overall long-term goal of this proposal is to develop user-friendly software and best-practice guidelines for imputation strategies in metabolomics data, thereby improving the accuracy of downstream statistical analysis and the resulting biological impact.
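
As a purely illustrative aside, the single-value substitution baselines named in the summary (zero, LOD, mean) and the root-mean-squared error used to score imputation accuracy can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the abundance values, the LOD, and all function names are hypothetical and chosen only for illustration. It covers only missingness caused by truncation below the LOD, and it is neither the KNN-TN algorithm nor the proposed Bayesian approach.

```python
import math


def truncate_below_lod(values, lod):
    """Mark values below the limit of detection (LOD) as missing (None)."""
    return [v if v >= lod else None for v in values]


def impute_single_value(observed, method, lod):
    """Fill missing entries with zero, the LOD, or the mean of observed values."""
    present = [v for v in observed if v is not None]
    if method == "zero":
        fill = 0.0
    elif method == "lod":
        fill = lod
    elif method == "mean":
        if not present:
            raise ValueError("mean imputation needs at least one observed value")
        fill = sum(present) / len(present)
    else:
        raise ValueError(f"unknown method: {method}")
    return [fill if v is None else v for v in observed]


def rmse_on_missing(truth, observed, imputed):
    """RMSE computed only over the entries that were missing."""
    errors = [
        (t - i) ** 2
        for t, o, i in zip(truth, observed, imputed)
        if o is None
    ]
    if not errors:
        raise ValueError("no missing entries to score")
    return math.sqrt(sum(errors) / len(errors))


# Hypothetical abundances for one metabolite across six samples (toy data).
truth = [0.5, 1.0, 2.0, 3.0, 4.0, 5.0]
lod = 1.5  # assumed limit of detection

observed = truncate_below_lod(truth, lod)
for method in ("zero", "lod", "mean"):
    imputed = impute_single_value(observed, method, lod)
    print(method, round(rmse_on_missing(truth, observed, imputed), 3))
```

Because the true values are known in a simulation like this, each strategy can be scored directly on the entries it filled in. The numbers from a toy example of this size say nothing about how the methods compare on real data.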