Project Summary/Abstract
Disparities in health and health care have been a longstanding challenge in the United States. One area of medical care in which racial/ethnic disparities have been identified is total joint arthroplasty (TJA), particularly total knee arthroplasty (TKA) and total hip arthroplasty (THA). The large, population-based studies needed to address health care disparities can be costly and difficult to perform, and they may be compromised by sampling strategies and patient selection biases. Publicly available databases from the Healthcare Cost and Utilization Project (HCUP), namely the State Inpatient Databases (SID) and the nationally representative National Inpatient Sample (NIS), offer an efficient alternative. The SID provide information on all patients admitted to hospitals within participating states, allowing comparisons of health care access among many vulnerable populations, across states, and over time. The NIS is the largest publicly available all-payer inpatient health care database in the nation. It is sampled from the SID through a complex survey design and yields national estimates of health care utilization, quality, and outcomes. A significant limitation of both the NIS and the SID is the quantity of missing data. In particular, "patient race", a key indicator for health disparities research, has a high proportion of missing values. Multiple imputation (MI) has become an increasingly popular, statistically sound approach to handling missing data. When conducting MI, it is recommended that the imputation model be as general as the data allow, so that it can accommodate a wide range of subsequent analyses of the imputed datasets. This requires that every relationship to be investigated in any subsequent analysis, including nonlinearities and interactions, be included in the imputation model. Unfortunately, traditional MI methods, such as multivariate imputation by chained equations (MICE), are built on parametric imputation models.
These models are often not flexible enough to capture interactions and nonlinearities in high-dimensional, large-scale data. Unlike parametric models, machine learning techniques (MLTs) are essentially model-free, in that they do not require the functional form of the imputation model to be specified in advance, and thus offer flexibility for missing data imputation. MLTs use algorithms that learn automatically and iteratively from the data, detecting statistical dependencies among variables without being explicitly programmed where to look. The goal of this study is to make the two HCUP databases a more useful resource for the study of surgical disparities and other areas of medicine. Accordingly, we propose novel MLT-based MI methods to impute missing data in the SID and the NIS, and we will use the imputed datasets to measure racial disparities in TKA.
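Whichever engine produces the imputations (MICE or an MLT-based method), a standard way to analyze multiply imputed data is to fit the disparity model separately to each completed dataset and then combine the results with Rubin's rules. The pooled variance adds the average within-imputation variance to an inflated between-imputation variance, which is how uncertainty about the missing race values carries into the final disparity estimate. The sketch below is a minimal, standard-library illustration under stated assumptions: each of the m imputed datasets yields one scalar estimate (for example, a hypothetical log odds ratio for receipt of TKA) and its squared standard error. The function name `pool_rubin` and the numbers in the usage line are hypothetical and are not study results. Survey-design adjustments for the NIS would enter through the per-dataset variances and are not shown.

```python
import math
from dataclasses import dataclass
from statistics import fmean, variance


@dataclass(frozen=True)
class PooledEstimate:
    estimate: float     # Q-bar: mean of the per-dataset point estimates
    within_var: float   # U-bar: mean of the per-dataset sampling variances
    between_var: float  # B: sample variance of the estimates across imputations
    total_var: float    # T = U-bar + (1 + 1/m) * B
    df: float           # Rubin's (1987) degrees of freedom for the pooled estimate

    @property
    def std_error(self) -> float:
        return math.sqrt(self.total_var)


def pool_rubin(estimates, variances):
    """Combine m per-imputation estimates and variances with Rubin's rules (hypothetical helper)."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = fmean(estimates)
    u_bar = fmean(variances)
    b = variance(estimates)  # divisor m - 1
    inflated_b = (1 + 1 / m) * b
    t = u_bar + inflated_b
    # With no between-imputation variability the missing data add no
    # uncertainty, and the reference distribution is effectively normal.
    df = math.inf if b == 0 else (m - 1) * (1 + u_bar / inflated_b) ** 2
    return PooledEstimate(q_bar, u_bar, b, t, df)


# Illustrative numbers only: five imputed datasets, one estimate each.
pooled = pool_rubin([0.30, 0.34, 0.28, 0.32, 0.36], [0.010, 0.012, 0.011, 0.009, 0.013])
print(pooled.estimate, pooled.std_error, pooled.df)
```

The `(1 + 1/m)` factor corrects for using a finite number of imputations. A large between-imputation component relative to the within component signals that the missing race data contribute substantially to the uncertainty of the disparity estimate.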