Eliminating healthcare disparities so that underserved communities (e.g., minorities, the elderly, low-income groups) and other AHRQ priority populations are assured access to quality medical care remains a national priority. The large, population-based studies needed to address healthcare disparities can be costly and difficult to perform, and may be compromised by sampling strategies and patient selection biases. An efficient and increasingly attractive alternative is the use of the Healthcare Cost & Utilization Project (HCUP) State Inpatient Databases (SID). A significant limitation of the SID and other large databases is the quantity of missing data. In particular, patient race, a key indicator for health disparities research, has a high proportion of missing values. The goal of this study is to make the SID a more useful and reliable resource for the study of racial disparities. Accordingly, two multiple imputation (MI) methods, (1) sequential regression multivariate imputation and (2) latent normal multivariate imputation, are proposed for addressing the missing data issue in the SID. These approaches will be compared through a comprehensive simulation study, which will also illustrate their advantages over three commonly used missing data approaches (i.e., complete case analysis, the missing indicator method, and hot deck imputation). Based on the simulation, we will select the optimal MI method for imputation. Multiply imputed datasets will then be generated as a companion to the SID, allowing users to analyze a wide range of substantive research questions with existing complete-data software. We will use the imputed SID data to conduct musculoskeletal healthcare disparities research. The application is to determine whether race is a risk factor for a set of adverse outcomes after total knee replacement (TKR) and whether racial disparities exist in the utilization of TKR.
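
The abstract does not state how analyses run on the multiply imputed datasets would be combined. A common choice, assumed here purely for illustration, is Rubin's rules: fit the same complete-data model to each imputed dataset, then pool the point estimates and their variances. The function name `pool_rubin` and its arguments are hypothetical and are not part of the SID or of any HCUP software.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool results from m imputed datasets using Rubin's rules.

    estimates: one point estimate per imputed dataset (e.g., a log odds ratio).
    variances: the squared standard error of each of those estimates.
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    pooled = mean(estimates)       # average of the m point estimates
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # sample variance across imputations (m - 1 denominator)

    # Total variance adds the between-imputation variance, inflated by 1/m
    # to account for using a finite number of imputations.
    total = within + (1 + 1 / m) * between
    return pooled, total
```

Under this assumption, a user would call `pool_rubin` once per model coefficient, passing the estimates and squared standard errors obtained from each imputed copy of the data.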