Anticipated Impacts on Veterans Health Care: This proposal will use natural language processing (NLP) methods and machine learning approaches to develop and compare predictive models of non-alcoholic fatty liver disease (NAFLD) among Veterans. Proposed analyses will also examine racial/ethnic differences in NAFLD diagnosis, treatment, and outcomes with the goal of identifying patient groups at highest risk of progression to liver cirrhosis and cirrhosis-related complications. The long-term goal of this research, which this pilot study will facilitate, is the development and effective targeting of integrated multidisciplinary treatment algorithms alongside simple, culturally appropriate, and cost-effective interventions to curb the epidemic of NAFLD and its complications among Veterans.
Background: NAFLD is a significant and growing health problem closely associated with obesity, type 2 diabetes mellitus (T2DM), hypertension, and dyslipidemia. In the VA, NAFLD prevalence has been estimated to be as high as 46%. The prevalence of NAFLD varies significantly depending on the population studied and on the tests used. In the Dallas Heart Study, over 30% of participants were estimated to have NAFLD by MR spectroscopy. Importantly, investigators found that the highest prevalence of NAFLD occurred among Hispanics (58%) and those with T2DM (over 70%). Hispanic populations have a higher incidence of NAFLD and potentially higher rates of progression to advanced fibrosis compared to non-Hispanic White (NHW) patients. Current therapy aims to optimize both cardiovascular and liver-related risk factors (e.g., T2DM, hypertension, hyperlipidemia, obesity, and smoking). Lifestyle changes driven by dietary intervention and exercise are the first line of therapy to induce and maintain weight loss, reducing fat mass, hyperinsulinemia, and insulin resistance, and thus decreasing lipotoxic liver damage and multisystem metabolic consequences.
The VA NAFLD Clinic provides an Intensive Weight Loss program that includes nutrition, exercise, and behavioral interventions, VA-approved pharmaceuticals (e.g., bupropion/naltrexone, lorcaserin), and bariatric surgery. Hence, it is important to identify patients who are at high risk of progression to the poor outcomes associated with advanced NAFLD and to provide them with the treatments available at VA NAFLD Clinics.
Objectives: In this 1-year pilot, we propose using the VA NAFLD Team curated cohort (n=61,900) of Veterans from the national Veterans Affairs Informatics and Computing Infrastructure (VINCI) system who have received liver biopsies. The dataset will be augmented to include medical records from 8 years prior to and 1 year after biopsy. We will use clustering and machine learning predictive analytic approaches to identify patients at higher risk of developing cirrhosis, cirrhosis-related complications, and cardiovascular events, with a focused analysis of racial and ethnic disparities.
Methods: Convolutional neural networks and random forests will be used to identify NAFLD patients from NLP-derived variables, laboratory values, and comorbidities available in the patient records in the VINCI system. To identify rapidly progressing NAFLD patients, we will cluster fibrosis risk score trend data. We will tailor this approach to the identification of NAFLD and its progression and augment it with machine learning analysis. The outcome of our pilot will be predictive models that identify NAFLD patients and estimate disease severity. These models can be used to determine which groups of patients are at higher risk of progression to cirrhosis, cirrhosis complications, and cardiovascular events and would thus benefit from a clinical intervention to proactively reduce their risk.
The next step is a follow-on study that uses the high-risk prediction models derived in the pilot as part of an intervention to improve access to appropriate care in VA NAFLD Clinics for Veterans at high risk of progression to liver complications and cardiovascular events.
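Illustrative sketch: The Methods propose clustering fibrosis risk score trend data to identify rapidly progressing patients. The following is a minimal sketch of one way such a step could look, not the study's actual pipeline. It assumes the FIB-4 index as the fibrosis risk score (the proposal does not name a specific score), summarizes each patient's trend as a least-squares slope, and splits the slopes into two groups with a simple one-dimensional k-means. The function names, the choice of two clusters, and the patient records are all hypothetical and synthetic.

```python
from math import sqrt
from statistics import mean

def fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l):
    """FIB-4 index: (age * AST) / (platelets * sqrt(ALT)). Assumed score, not named in the proposal."""
    return (age_years * ast_u_l) / (platelets_10e9_l * sqrt(alt_u_l))

def trend_slope(times, scores):
    """Ordinary least-squares slope of score versus time (score units per year)."""
    t_bar, s_bar = mean(times), mean(scores)
    denom = sum((t - t_bar) ** 2 for t in times)
    if denom == 0:
        raise ValueError("need at least two distinct time points")
    return sum((t - t_bar) * (s - s_bar) for t, s in zip(times, scores)) / denom

def two_means_1d(values, iterations=50):
    """Minimal 1-D k-means with k=2; label 1 marks the higher-centroid cluster."""
    lo, hi = min(values), max(values)  # initialize centroids at the extremes
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to the nearer centroid (ties go to the lower cluster).
        labels = [int(abs(v - hi) < abs(v - lo)) for v in values]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        new_lo = mean(group0) if group0 else lo
        new_hi = mean(group1) if group1 else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return labels

# Synthetic, made-up records per patient:
# (years since first measurement, age, AST U/L, ALT U/L, platelets 10^9/L)
patients = {
    "A": [(0, 55, 30, 35, 250), (2, 57, 32, 36, 245), (4, 59, 31, 34, 240)],
    "B": [(0, 60, 40, 38, 220), (2, 62, 65, 45, 170), (4, 64, 90, 50, 120)],
    "C": [(0, 50, 28, 30, 260), (2, 52, 29, 31, 255), (4, 54, 30, 30, 250)],
}

slopes = {}
for pid, rows in patients.items():
    times = [r[0] for r in rows]
    scores = [fib4(*r[1:]) for r in rows]
    slopes[pid] = trend_slope(times, scores)

labels = dict(zip(slopes, two_means_1d(list(slopes.values()))))
rapid = sorted(pid for pid, lab in labels.items() if lab == 1)
print(rapid)
```

In this toy data only patient B has a steeply rising score, so B alone falls in the higher-slope cluster. A real analysis would need to handle irregular measurement intervals, missing laboratory values, and more than two trajectory shapes, and would likely rely on an established clustering library rather than this hand-written routine.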