The ability to rapidly match the right patients to the right treatments at the right time is critical to ensuring patients receive high-quality care. The vast majority of machine learning applications in healthcare focus on diagnosing or stratifying patients for a particular outcome. In contrast, reinforcement learning (RL) aims to learn how clinical states (i.e., sets of signs, symptoms, and test results) respond to specific sequences of treatments, with the goal of optimizing clinical outcomes. RL does not aim to diagnose directly; instead, it infers the diagnosis from a patient's response to specific treatments, in many ways mimicking how clinicians operate in practice. This proposal will develop a novel clinician-in-the-loop RL framework that analyzes electronic health record (EHR) clinical time-series data to support physician decision-making, iteratively providing physicians with the estimated outcome of potential treatment strategies. Our topic of focus for this work is the evaluation and treatment of patients hospitalized with acute dyspnea (shortness of breath) and signs of impending respiratory failure. Acute dyspnea is an ideal condition for an RL approach, since it can be due to three overlapping conditions: congestive heart failure, chronic obstructive pulmonary disease, and pneumonia. Determining optimal treatment for these patients is clinically difficult, as a patient's presentation is frequently ambiguous, rapidly changing, and often due to multiple causes. Inappropriate treatment may occur in up to a third of patients, leading to increased mortality. While developing this RL framework, we will also develop methods to learn more useful representations of high-dimensional clinical time-series data to improve the efficiency of RL model training.
In addition, given the challenges of working with observational health data, we will develop new methods for evaluating learned policies and new theory to better understand the limitations of RL using observational data. The project has four aims: 1) create a shareable, de-identified EHR time-series dataset of 35,000 patients with acute dyspnea, 2) develop techniques for exploiting invariances in tasks involving clinical time-series data to improve the efficiency of RL model training, 3) develop and evaluate an RL-based framework for learning optimal treatment policies for acute dyspnea, and 4) prospectively validate the learned treatment model. This research will result in new techniques for learning representations from time-series data and will study both the theoretical and practical limitations of RL using observational clinical data, leading to key advancements in ML and RL for clinical care. The tools developed for clinical decision support in this proposal have the potential for high impact because they can generalize beyond the problem studied here to other conditions, laying the groundwork for clinical systems that directly impact society by aiding in the timely and appropriate treatment of patients.