Title: Improving comparative effectiveness research through electronic health records continuity cohorts
PI: Joshua Lin, MD, MPH
Abstract
Epidemiologic analyses of health care data can provide critical evidence on the effectiveness and safety of therapeutics in the routine care setting, because clinical trials often exclude frail and older patients, who are the primary consumers of most medications. Electronic health record (EHR) databases contain rich clinical information vital for many comparative effectiveness studies and have been increasingly used for drug research. There are currently more than 50 EHR-based research networks in the US. It is thus critical to understand how we can conduct valid comparative clinical studies with EHR data. However, other than a few highly integrated plans, most US EHR systems do not comprehensively capture medical encounters across the care continuum and may miss substantial amounts of information. Exposures, co-morbidities, and health outcomes that are recorded at a clinic or hospital outside of a given EHR system are invisible to the investigator, which leads to misclassification or complete omission of essential variables. While such issues are pervasive, no prior study has quantified the magnitude of the resulting bias or shown how to remedy it when linkage to additional data is not feasible. To address this knowledge gap, we have combined longitudinal claims data from Medicare with EHR patient data from a large multi-center health care system as a 'gold standard' setup, in which the claims data comprehensively capture medical information across care settings and provider systems and the EHR provides the necessary clinical data.
We will (1) use these 'gold standard' data to identify 'EHR continuity cohorts' for whom the EHR system captures a high proportion of all encounters, and evaluate whether misclassification or omission of a list of variables essential to comparative effectiveness research is substantially reduced within versus outside the EHR continuity cohort; (2) develop strategies to identify the EHR continuity cohort based on a set of proxy indicators available in typical EHR databases, and validate the candidate prediction rules internally in a sample within the given EHR and externally using a second EHR system that is also linked to Medicare claims data; (3) assess research validity and generalizability in the EHR continuity cohorts in several empirical studies; and (4) develop structured recommendations on how to conduct comparative effectiveness research using high-validity EHR continuity cohorts in an EHR system without linked claims data, and make our program publicly available to facilitate future research using EHR-based research networks.