Treatment of cancer is an ongoing process during which clinicians make a series of therapeutic decisions over the course of the disease. There is increasing interest in identifying the overall strategy of sequential decisions that leads to the most beneficial clinical outcomes, where each decision may be predicated on complex information on the patient up to that point. However, current cancer clinical trials evaluate only the therapeutic options available at a single decision point, mostly in a one-size-fits-all manner. Attempts to synthesize information from several isolated trials conducted at different milestones in the disease are problematic, because the best treatment at any one decision point may not be best when placed in the context of the entire decision process, owing to possible delayed effects of past treatments on the efficacy of future treatments. Dynamic treatment regimes are formal algorithms for sequential decision-making that use accrued information on the patient at each decision point in an evidence-based manner to determine the next step of treatment. Viewing cancer treatment strategies as dynamic treatment regimes, and applying analytical reinforcement learning methods from computer science that provide a principled framework for identifying the optimal such regime, offers the potential to revolutionize how cancer treatment is viewed and to effect a paradigm shift in the design and conduct of cancer clinical trials. The four specific aims of this project seek to catalyze this advance by studying these issues for the first time in the cancer treatment context. The first aim will evaluate various learning methods to establish the best techniques for use in developing optimal dynamic treatment regimes for cancer, and the second will focus on a specific version of this methodology for settings in which clinicians are interested in finding the best regime among a particular set of regimes.
The third aim will develop new methods for making formal statistical inference on regimes developed based on data, which have been heretofore unavailable owing to the theoretical complexity of the problem. In the fourth aim, methods for the design of so-called sequentially randomized trials for the specific purpose of developing dynamic treatment regimes, including determination of sample sizes that will ensure identification of the best regimes from among those in the trial, will be developed. Coupling trial design with learning methods for analysis, a new model, the clinical reinforcement trial, will be developed and applied to designing studies to identify optimal regimes for non-small cell lung cancer and other cancers. Collectively, these aims will result in high-impact, new methodology that will allow individualization of therapy to the patient over time.