Abstract
Cardiovascular disease (CVD) treatment is often guided by risk stratification tools (to decide whom to treat) and randomized controlled trials (to decide which treatments to select). Prior CVD research reveals two major obstacles to improving our treatment approach: (i) longitudinal cohort data are unavailable for recalibrating risk stratification tools for local-area estimation (by zip code), or for people with major CVD-promoting comorbidities (e.g., chronic kidney disease); and (ii) the average treatment effect in randomized trials can be highly inaccurate when projected onto individuals who differ from the "average" participant in a trial. CVD risk stratification and treatment effect estimation can be improved and personalized if we overcome a critical barrier to progress: correctly estimating risk and treatment effect from new, large participant data repositories, which have greater population size and include patients with more comorbid conditions than common cohort studies, and which permit development of personalized risk/benefit prediction tools from individual-level data. Our prior studies show that we can critically advance the field by applying novel statistical learning methods to these data, to address: (i) false positives from multiple testing; (ii) the reliance on standard regressions that cannot account for non-linear, complex interactions between factors; and (iii) the need to identify the optimal approach among many alternative statistical learning methods. We propose to apply our work in these areas to (Aim 1) develop CVD risk stratification tools for patient groups that are inadequately represented in common cohort studies. We will enhance CVD risk stratification to include local-area adjustment (by zip code) and major comorbid conditions affecting CVD risk (e.g., chronic kidney disease).
We will additionally (Aim 2) develop personalized treatment effect prediction tools to guide decisions for CVD therapies with high potential benefit and risk, for therapies where individual participant data from trials are available. We have obtained the individual participant data from the large randomized trials that reveal wide variations in CVD risk reduction and in increases in serious adverse event risk from three treatment classes: non-vitamin K antagonist oral anticoagulants, intensive blood pressure treatment, and sodium-glucose co-transporter 2 inhibitors for diabetes. Our preliminary research shows that traditional regression methods cannot distinguish which patients are most likely to benefit from or be harmed by such therapies, but our statistical learning methods can. Finally, we will (Aim 3) develop open-source tools to improve the ability of researchers to choose an optimal statistical learning approach for their dataset and problem. While numerous statistical learning methods have been proposed in the literature, a key problem for biomedical scientists without access to RCT data is: which method should I use to estimate treatment effects from observational data? Building on our innovative approach to identifying the optimal inference method for observational data, we will construct an open-source tool to compare methods, identifying which method most often results in optimal treatment decisions that minimize error and maximize performance on standardized metrics.