ABSTRACT Total joint arthroplasty (TJA) is the most common and fastest growing surgical procedure in the nation. Despite the high procedure volume, the evidence base for TJA procedures and associated interventions are limited. This is mainly due to lack of high quality data sources and the logistical difficulties associated with manually extracting TJA information from the unstructured text of the Electronic Health Records (EHR). Meanwhile, the rapid adoption of EHR and the advances in health information technology offer the potential to transform unstructured EHR notes into structured, codified format that can then be analyzed and shared with local and national arthroplasty registries and other agencies. We therefore propose to leverage unique data resources and natural language processing (NLP) technologies to build an informatics infrastructure for automated EHR data extraction and analysis. We will (1) develop a high performance, externally validated and user centric NLP- enabled algorithm for extraction of complex TJA-specific data elements from the structured and unstructured text of the EHR, (2) validate the algorithm externally in multiple EHR platforms and hospital settings, and (3) conduct a demonstration project focused on prediction of prosthetic joint infections using data elements collected by the NLP-enabled algorithm. Our overarching goal is to develop valid, open source and portable NLP-enabled data collection and risk prediction tools and disseminate them widely to hospitals participating in regional and national TJA registries. This research is significant as it leverages strong data resources and expertise to tackle the pressing need for high quality data and accurate prediction models in TJA. Automated data collection and processing capabilities will lead to an upsurge in secondary use of EHR to advance scientific knowledge on TJA risk factors, healthcare quality and patient outcomes. Accurate prediction of high risk patients for prosthetic joint infections will guide prevention and treatment decisions resulting in significant health benefits to TJA patients. The research is innovative because TJA-specific bioinformatics technology will shift TJA research from current under-powered, single-center studies to large, multi-center registry-based observational studies and clinical trials. Our deliverables have the potential to exert a sustained downstream effect on future TJA research, practice and policy.