ABSTRACT
Cardiovascular disease (CVD) disparities across racial/ethnic and socioeconomic gradients are exacerbated by barriers to receiving guideline-based primary or secondary preventive treatment, such as appropriate blood pressure, dyslipidemia, and type 2 diabetes treatment.1–6 Most patients not receiving guideline-based treatment are insured and have seen a primary care provider in the past year.1 To reduce CVD disparities by better targeting disease prevention and treatment, healthcare administrators and county departments of public health have begun pooling data resources across healthcare and public health systems, such as across clinics, emergency rooms and hospitals, pharmacies, laboratories, and administrative datasets.7–9 The idea behind pooling such datasets is to better identify the persons most in need and to direct targeted interventions to them. Solano County, California, a medium-sized, diverse, low-income county, has developed one of the first and most comprehensive health information exchanges (HIEs), including: (i) electronic health record data from all emergency rooms, hospitals, primary care and specialty clinics, and care facilities in the county; (ii) labs from all laboratory service providers; (iii) prescription details from all pharmacies; (iv) validated social determinants of health surveys administered in clinics; and (v) administrative datasets, including welfare, disability, housing, and geocoded neighborhood features. While several other counties are following suit to develop large, secure HIEs across healthcare and public health systems, a key challenge remains: how to cheaply, accurately, and rapidly analyze HIEs to identify which persons should be targeted for interventions. Without reliable, user-friendly, cost-effective, and generalizable data analysis programs, counties are unable to use the massive data at their disposal to address preventable causes of morbidity and mortality.
The objective of this application is to apply our unique machine-learning innovations to develop open-source programs that enable counties to identify persons at high risk for preventable CVD events and deaths. We will test the hypothesis that electronic health record data alone are insufficient to provide accurate risk prediction for preventable CVD events and deaths, and that key survey and administrative data providing information on social determinants of health will improve identification of high-risk patients. To test our hypothesis, we will develop and validate open-source, generalizable programs to: (Aim 1) rapidly identify persons in need of improved primary and secondary prevention of CVD, by systematically comparing the performance of three alternative machine learning approaches to reading HIE data against that of human clinician chart reviewers; and (Aim 2) perform multi-level risk assessment by automatically calibrating and validating models of CVD event risk, utilization, and cost to HIE data, to identify the added value of administrative and social determinants data as compared to clinical or claims-based data alone. Our work will produce generalizable software tools that counties across the country can use to analyze HIE data and reduce preventable CVD disparities.