PROJECT SUMMARY
Increasingly available metabolomics data enable a greater understanding of how metabolites change in response to physiological or disease processes. Recent developments have shown metabolomics to be a valuable technology for advancing medical research by accelerating the translation of knowledge from bench to bedside. However, effective use of these data requires expertise in both metabolomics and statistics, because a series of data pre-processing steps must precede statistical analysis, including data conversion, scaling, normalization, peak alignment, and metabolite annotation, among many others. Despite the promise of metabolomics in the clinic, well-documented challenges limit its full potential, such as the identification and validation of metabolite biomarkers and metabolite-based prediction of disease or disease progression. These barriers have significantly hampered the application of metabolomics to clinical and translational research. To overcome these challenges, our team proposes to develop a series of multivariate statistical methods designed specifically for metabolomics data analysis. Instead of investigating one metabolite at a time, these methods will model a group of biologically related metabolites simultaneously, while also evaluating the effects of clinical covariates (such as gender, age, and BMI) on the metabolites. The proposed project has three main goals: (1) Introduce the idea of using a group of metabolites as potential biomarkers for diseases. By incorporating biological knowledge to group correlated metabolites, we propose to employ the seemingly unrelated regression model to investigate the relationship between a group of metabolites and disease status while adjusting for the effects of other clinical covariates.
(2) Construct metabolic networks to better understand the systematic perturbations that accompany human diseases, where the networks can serve as more robust biomarkers for disease diagnosis. (3) Advocate disease prediction based on the combination of metabolite profiles, clinical covariates, and their interactions. A direct modeling approach, generalized orthogonal components regression, is proposed to handle the large number of metabolites relative to the small number of individuals. The utility of the methods will be evaluated extensively through simulation studies and through real data from different diseases, including publicly available data as well as in-house data from our ongoing cancer care engineering project. On all of these data, the proposed methods will be compared with the most popular existing method, partial least squares discriminant analysis. The proposed statistical methods will be made freely available to the research community through GitHub, cceHUB, the Metabolomics Consortium Data Repository, and the Metabolomics Workbench. The project is directly responsive to RFA-RM-15-021 because it will foster close collaboration between metabolomics experts and biostatisticians, produce efficient and reliable statistical methods that maximize the value of existing metabolomics resources, and help realize the promise of metabolomics in the early diagnosis of common complex diseases.