Abstract

The prefrontal cortex is thought to play a crucial role in cognitive flexibility, in part by updating a person's expectations about the external world and the likely consequences of candidate actions based on the feedback gained from past actions. Deficits in this form of cognition occur in multiple psychiatric conditions in which the prefrontal cortex is implicated. Despite much research, the mechanisms by which prefrontal neural circuits contribute to flexible decision-making and switches in cognitive strategy remain unclear. We will examine these issues using reinforcement learning theory, which specifies the optimal strategies for selecting future actions given a subject's history of actions taken and rewards received. We will first gather the largest set of multi-neuronal recordings ever taken in the prefrontal cortex, and then use reinforcement learning theory to analyze the data and deduce the circuit mechanisms by which the prefrontal cortex stores and updates its internal beliefs about the external world and the likely results of future actions. Past studies in behaving animals have found evidence for individual prefrontal cells that, on average, encode information related to cognitive strategy and action selection, but with limited data it has not been possible to identify how prefrontal circuits maintain and update this information over the course of multiple decisions, actions and outcomes. To collect sufficient data and create better models of prefrontal circuits, we will use a miniature microscope that enables us to monitor large neural ensembles in active mice. Our goals are to: (1) Develop and validate an experimental paradigm for imaging the concurrent dynamics of hundreds of prefrontal cells in mice flexibly switching between two different strategies of spatial navigation.
Our pilot data show that mice can perform the task well, that prefrontal activity is crucial for strategy-switching, and that the prefrontal cortex contains cells whose dynamics appear to signal estimates of the optimal strategy. We will verify that mice can follow bona fide navigation strategies and not just memorize spatial paths that yield reward. We will also confirm that the prefrontal cells stay healthy and have normal activity patterns throughout the multi-day experiment. (2) Use reinforcement learning theory to analyze our large datasets and create neural circuit models of how the prefrontal cortex stores and updates its beliefs to guide future actions. Using the theory, we will first create observer-actor models of mouse behavior. We will then apply supervised and unsupervised methods of data analysis to assess whether prefrontal neural ensembles encode task-related, abstract variables such as belief and value. Using our observer-actor models and analyses of neural dynamics, we will train recurrent neural network models to solve the strategy-switching task. The resulting circuit models of reinforcement learning will then yield testable predictions about how the mice and prefrontal cells should behave when we modify the task. Overall, our study will address key unanswered questions about prefrontal function and seek to attain a mechanistic understanding of how prefrontal circuits contribute to flexible decision-making.
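
To make the notion of an observer-actor model concrete, the following is a minimal illustrative sketch and not the model proposed in this project. It assumes a two-strategy task in which exactly one strategy is rewarded at any time and the rewarded strategy can switch without warning. The function names and the parameters `p_reward_correct`, `p_reward_wrong` and `hazard` are hypothetical and were chosen only for illustration. The observer applies Bayes' rule to update its belief after each trial, and the actor simply follows the strategy that is currently believed more likely to be rewarded.

```python
def update_belief(belief, chosen, rewarded,
                  p_reward_correct=0.9, p_reward_wrong=0.1, hazard=0.05):
    """Observer: return the updated P(strategy 0 is the rewarded one).

    belief   -- prior probability that strategy 0 is currently rewarded
    chosen   -- strategy used on this trial (0 or 1)
    rewarded -- whether the trial ended in reward
    All default parameter values are assumptions made for illustration.
    """
    def likelihood(rewarded_strategy):
        # Probability of the observed outcome if `rewarded_strategy`
        # is the strategy that currently yields reward.
        p = p_reward_correct if chosen == rewarded_strategy else p_reward_wrong
        return p if rewarded else 1.0 - p

    # Bayes' rule over the two hypotheses.
    joint0 = belief * likelihood(0)
    joint1 = (1.0 - belief) * likelihood(1)
    posterior = joint0 / (joint0 + joint1)

    # Allow for an unsignaled switch of the rewarded strategy
    # before the next trial.
    return posterior * (1.0 - hazard) + (1.0 - posterior) * hazard


def choose_strategy(belief):
    """Actor: follow the strategy currently believed to be rewarded."""
    return 0 if belief >= 0.5 else 1


# Example: belief drifts toward strategy 1 after repeated unrewarded
# trials with strategy 0.
belief = 0.5
for _ in range(3):
    belief = update_belief(belief, chosen=0, rewarded=False)
print(choose_strategy(belief))
```

In a sketch of this kind, the scalar `belief` is an example of the abstract, task-related variables that one could look for in neural ensemble activity.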