Project Summary/Abstract

The routine operation of the US healthcare system produces an abundance of electronically stored data that capture patient care as it is delivered outside controlled research environments. It is widely acknowledged that these data could inform future treatment choices and improve care and outcomes for all patients in the very system that generates them. Because these data reflect care as it is actually delivered, and because electronic healthcare databases now cover millions of patients, it is critical to strengthen the rigor of their analysis. Our group has previously developed an analytic approach to reduce bias in analyses of routine-care databases, and it has proven effective in more than 50 empirical research studies across a range of topics and data sources. However, this approach cannot currently incorporate free-text information recorded in electronic health records, such as clinical notes and reports. This limitation has left a large amount of rich patient information underused in clinical research. We therefore aim to adapt and refine a set of established natural language processing algorithms that identify and extract useful information from clinical notes and reports in electronic health records, and to incorporate this information into our validated analytic approach for balancing the background risks of different comparison groups, a key step in ensuring fair comparisons between therapeutic options. To test this integrated and augmented approach, we will implement it in simulation studies, where the performance of the new analytic methods can be evaluated and improved in a controlled but realistic fashion. In addition, we will assess the performance of the new approach in 8 practical studies comparing medical or surgical treatments that are highly relevant to patients.
To ensure the highest level of data completeness and quality, we have linked multiple healthcare utilization (claims) databases spanning 2007 to 2016 with 3 electronic health record systems, one each in Massachusetts, North Carolina, and Texas. These linked data will allow us to test the newly integrated approach across a variety of care delivery systems and data environments, which will be highly informative for applying our products in real-world settings.