PROJECT SUMMARY/ABSTRACT Rationale. Diabetes is a leading cause of renal disease, accounting for 40% of the estimated 20 million US adult cases of chronic kidney disease. There is, however, substantial heterogeneity across diabetic patients with regard to the development of kidney disease. Hence, there is an urgent need to identify prognostic biomarkers that can provide early and reliable evidence of future kidney disease, so that high-risk patients can receive optimal medical care. Existing clinical, proteomic, and genomic markers do not consistently or accurately predict kidney function decline. Metabolomics, the systematic evaluation of the end products of cellular function in fluids, has the potential to inform our understanding of the physiological and pathological effects of chronic diseases. Metabolomic analysis combined with advanced quantitative methods could play a key role in building clinically useful prognostic signatures of diabetic kidney disease. Yet the development of computational methods with adequate rigor has lagged behind the technical capacity to perform large-scale quantitative metabolomics. In this proposal, we aim to address this computational gap in diabetic kidney disease research. Aims. We will implement rigorous computational methods to identify robust prognostic signatures of diabetic kidney disease progression that combine metabolite, clinical, and genetic markers. Specifically, we aim to (i) test the accuracy of previous signatures, and apply state-of-the-art analytic techniques and novel statistical methods to identify new multivariate metabolite sets for predicting kidney disease progression; (ii) quantify patterns of co-regulation of metabolites in diabetic kidney disease, and develop new tools in network biology to discover novel enzymes, proteins, metabolites, and molecular pathways that are implicated in diabetic kidney disease progression; and (iii) test whether these models can accurately predict kidney disease progression in independent prospective cohorts. Methods.
Using clinical, genetic, and metabolomic data from large prospective cohorts of >1200 diverse, well-characterized patients with type 2 diabetes, we will apply statistical methods for variable selection (e.g., penalized regression) and machine learning methods (e.g., random forest), which are known to perform well in the high-dimensional setting, to identify robust and parsimonious signatures of kidney disease progression. We will quantify inter-metabolite co-regulation patterns and infer biological pathways implicated in diabetic kidney disease. Throughout the modeling process, a rigorous training-validation paradigm will be adopted to improve the reproducibility of models and reduce chance findings. Impact. A major product of this work will be a clinically useful algorithm for identifying diabetic patients at high risk of kidney function decline. Our findings will also provide insight into markers of renal dysfunction and elucidate possible therapeutic targets for treating diabetic kidney disease, thus potentially informing the design of future clinical trials.
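Illustrative sketch. The training-validation paradigm and penalized-regression variable selection named under Methods can be illustrated with a minimal example. This is only a sketch under stated assumptions, not the project's analysis pipeline: the data are synthetic, the penalty is an L1 (lasso) penalty fitted by plain coordinate descent, and all function names, sample sizes, and penalty values are hypothetical choices made for illustration.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def fit_lasso(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb, starting from b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of feature j with the residual that excludes feature j
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, lam) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_b
    return beta

def mse(X, y, beta):
    """Mean squared prediction error of the linear model with coefficients beta."""
    total = 0.0
    for row, yi in zip(X, y):
        pred = sum(b * x for b, x in zip(beta, row))
        total += (yi - pred) ** 2
    return total / len(y)

def make_synthetic(n, p, seed=0):
    """Synthetic stand-in for metabolite data: only features 0 and 1 carry signal."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 0.5) for row in X]
    return X, y

X, y = make_synthetic(n=200, p=20)

# Hold out a validation set that is never used for fitting coefficients.
n_train = 140
X_train, y_train = X[:n_train], y[:n_train]
X_valid, y_valid = X[n_train:], y[n_train:]

# Fit on the training set only; choose the penalty by validation error.
best = None
for lam in [0.01, 0.05, 0.1, 0.3, 1.0]:
    beta = fit_lasso(X_train, y_train, lam)
    err = mse(X_valid, y_valid, beta)
    if best is None or err < best[1]:
        best = (lam, err, beta)

best_lam, best_err, best_beta = best
selected = [j for j, b in enumerate(best_beta) if b != 0.0]
print("lambda:", best_lam, "validation MSE:", round(best_err, 3), "selected:", selected)
```

Because the validation set is used here to choose the penalty, its error is an optimistic estimate of performance; an unbiased estimate would require data not touched during tuning, such as the independent prospective cohorts described in Aim (iii).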