Project Summary: The cyclical and heterogeneous nature of many substance use disorders highlights the need to adapt the type or dose of treatment to the specific and changing needs of individuals. This proposal is motivated by the Extending Treatment Effectiveness of Naltrexone (ExTENd) trial, a sequential multiple assignment randomized trial (SMART) designed to find a personalized rescue treatment for those who do not respond to initial Naltrexone. One of the main challenges in this trial is the large number of variables available for consideration when making treatment decisions at each stage. This has made it virtually impossible for investigators to fully explore the possibility of building high-quality treatment strategies from the data. Our overarching aim is to address this challenge by developing new statistical methods and applying them to the ExTENd trial data. A SMART is a multi-stage trial that can inform the design of a dynamic treatment regime (DTR), which formalizes an individualized treatment plan in which the current treatment decision can depend on a patient's past medical and treatment history. An optimal DTR is one that maximizes a specified health outcome of interest. Q-learning can be used with data from both SMARTs and observational studies to estimate an optimal DTR. However, as with other model-based approaches, model misspecification can seriously affect the results and lead to the identification of suboptimal DTRs. The potential for misspecification increases with the number of variables that may influence treatment decisions, e.g., through incorrect assumptions about the relationship of variables to the outcome, the inclusion of unimportant variables, or the exclusion of important ones. These features represent the main analytical challenges for the ExTENd trial.
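To make the Q-learning step concrete, the following is a minimal sketch of standard two-stage Q-learning with linear working models, written in Python on simulated data. It is not the method proposed here and does not use ExTENd data. The data-generating model, the function names (simulate_trial, two_stage_q_learning), the -1/+1 treatment coding, and the simplification that every participant is re-randomized at the second stage are all assumptions made for illustration.

```python
import numpy as np


def simulate_trial(n, seed=0):
    """Simulate a hypothetical two-stage randomized trial (illustrative only)."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)                    # baseline covariate
    a1 = rng.choice([-1.0, 1.0], size=n)       # stage-1 treatment, randomized 1:1
    x2 = 0.5 * x1 + rng.normal(size=n)         # intermediate covariate
    a2 = rng.choice([-1.0, 1.0], size=n)       # stage-2 treatment, randomized 1:1
    # Assumed outcome model: larger y is better; treatment effects depend on covariates.
    y = (x1 + x2
         + a1 * (0.3 - 0.8 * x1)
         + a2 * (0.5 + 1.0 * x2)
         + rng.normal(size=n))
    return x1, a1, x2, a2, y


def fit_ols(design, response):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(design, response, rcond=None)[0]


def two_stage_q_learning(x1, a1, x2, a2, y):
    """Backward-induction Q-learning with linear working models.

    Returns (psi1, psi2): intercept and slope of the estimated treatment
    contrast at stage 1 (in x1) and stage 2 (in x2).
    """
    ones = np.ones_like(y)

    # Stage 2: regress y on the history (main effects) and a2 * (1, x2).
    main2 = np.column_stack([ones, x1, a1, a1 * x1, x2])
    blip2 = np.column_stack([ones, x2])
    theta2 = fit_ols(np.column_stack([main2, a2[:, None] * blip2]), y)
    beta2, psi2 = theta2[:5], theta2[5:]

    # Pseudo-outcome: predicted outcome under the best stage-2 treatment.
    # With a2 in {-1, +1}, max over a2 of a2 * c equals |c|.
    pseudo = main2 @ beta2 + np.abs(blip2 @ psi2)

    # Stage 1: regress the pseudo-outcome on (1, x1) and a1 * (1, x1).
    main1 = np.column_stack([ones, x1])
    theta1 = fit_ols(np.column_stack([main1, a1[:, None] * main1]), pseudo)
    psi1 = theta1[2:]
    return psi1, psi2


def recommend(psi, x):
    """Estimated rule: give treatment +1 when the estimated contrast is positive."""
    return np.where(psi[0] + psi[1] * x > 0, 1, -1)


x1, a1, x2, a2, y = simulate_trial(5000, seed=0)
psi1, psi2 = two_stage_q_learning(x1, a1, x2, a2, y)
print("stage-1 contrast coefficients:", psi1)
print("stage-2 contrast coefficients:", psi2)
print("stage-2 recommendation for x2 = 1.0:", recommend(psi2, np.array([1.0])))
```

In this simulation the treatment-by-covariate interactions are correctly specified by construction. With many candidate variables or a nonlinear relationship to the outcome, the same regressions can be misspecified, which is the difficulty described above.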
We propose a new approach to Q-learning that leverages machine learning methods to reduce the chance of misspecifying the relationship between the expected outcome and a given set of variables. We will also develop a variable selection technique specifically designed for Q-learning that enables investigators to select the important variables from a long list of candidates (e.g., genetic and demographic information, medical history over time) when estimating an optimal DTR. In both settings, we will develop new methods for conducting valid inference (e.g., confidence intervals and p-values), including when there exist patients for whom treatment is neither beneficial nor harmful at a given decision stage (i.e., when an important technical assumption, "uniqueness", is violated). Finally, we will develop easy-to-use, publicly available software in the R language that implements our methods. This will allow re-analysis of the ExTENd trial data with the goal of constructing a DTR that improves upon the current rescue treatment strategy for those who do not respond to initial Naltrexone. It will also provide an expandable platform that will assist researchers in developing new optimal DTRs for patients suffering from alcoholism and other substance use disorders.