Problematic prescription opioid use, defined as nonmedical use, misuse, or abuse of opioid medications, has reached epidemic levels in the US. Prescription opioid overdose deaths more than quadrupled from 1999 to 2015. Efforts by health care systems and payers to combat the epidemic are impeded by a lack of accurate and efficient methods for identifying the individuals most at risk of problematic opioid use and overdose. As a result, interventions are applied broadly, which makes them burdensome for patients and expensive for payers. Payers currently define high risk and target interventions (e.g., pharmacy lock-in programs) based on individual risk factors, such as high opioid dosage, that prior studies identified using traditional statistical approaches. However, these traditional approaches have significant limitations, especially when handling large datasets with numerous variables, multi-level interactions, and missing data. Moreover, those studies focused on identifying risk factors rather than predicting an individual patient's risk. Machine learning offers an alternative: it handles complex interactions in large datasets, uncovers hidden patterns, and yields precise prediction algorithms that in many cases outperform those developed with traditional methods. Machine learning is widely used in applications ranging from fraud detection to cancer genomics but has not yet been applied to the opioid epidemic. Accordingly, the proposed study will apply machine learning to develop prediction algorithms that more accurately identify patients at high risk of problematic opioid use and overdose, using data sources that are readily available to payers and health care systems. The project will build on existing academic-state partnerships to apply novel machine learning approaches to administrative claims data for all Medicaid beneficiaries in Pennsylvania (PA) and Arizona (AZ).
In AZ, the project will also link Medicaid data to electronic health records, which capture clinical information not available in administrative data (e.g., lab results, pain severity), and to death certificate data on fatal overdose. These data, covering 2007-2016, will be used to achieve two specific aims: (1) to develop and validate two separate prediction algorithms, one identifying patients at risk of problematic opioid use and one identifying patients at risk of opioid overdose; and (2) to compare the accuracy of prediction algorithms that integrate clinical data with Medicaid claims against that of algorithms based on claims alone in identifying patients at risk of problematic opioid use and opioid overdose. The machine learning approaches will include random forests and TreeNet (a stochastic gradient boosting implementation), along with representative classification trees, and the predictive ability of these algorithms (e.g., misclassification rates) will be compared with that of traditional statistical models. Given the high prevalence of mental health and substance use disorders (~50%) and of opioid use (>20%) among Medicaid enrollees, and the lack of adequate prediction algorithms, Medicaid is an ideal setting for the proposed project. These analyses will provide the partnering Medicaid programs with information and tools they can use to target interventions to prevent problematic opioid use and overdose more precisely.
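To make the planned comparison of predictive ability concrete, the following is a minimal sketch of how a misclassification rate might be computed for competing models on held-out patients. Because overdose is a rare outcome, the sketch also reports sensitivity. The function names, the three "models", and the ten-patient labels are hypothetical illustrations, not study data or the study's actual evaluation code. In practice the predicted labels would come from fitted random forest, TreeNet, and regression models.

```python
from typing import Sequence


def misclassification_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of patients whose predicted label (1 = high risk) differs from the observed outcome."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return errors / len(y_true)


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of patients with the outcome (label 1) whom the model flags as high risk."""
    flagged = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not flagged:
        raise ValueError("no positive cases in y_true")
    return sum(flagged) / len(flagged)


# Hypothetical held-out outcomes for 10 patients (1 = overdose observed); NOT study data.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
# Hypothetical predicted labels from a random forest and a logistic regression baseline.
forest_pred = [1, 0, 0, 1, 0, 1, 0, 0, 0, 0]
logit_pred = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
# A trivial model that never flags anyone, included to show why error rate alone can mislead.
never_pred = [0] * len(y_true)

if __name__ == "__main__":
    for name, pred in [("random forest", forest_pred),
                       ("logistic", logit_pred),
                       ("never flag", never_pred)]:
        print(f"{name}: misclassification={misclassification_rate(y_true, pred):.2f}, "
              f"sensitivity={sensitivity(y_true, pred):.2f}")
```

With these toy labels, the never-flag model has a lower misclassification rate (0.30) than the logistic baseline (0.40) while catching none of the three cases. This is why a rare outcome such as overdose calls for reporting sensitivity, or similar class-specific measures, alongside the overall misclassification rate.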