Project abstract

According to the standard reinforcement learning framework, the basal ganglia implements estimation of long-term future reward and the control of actions to maximize future reward. Dopamine (DA) plays a central role by providing the learning signal (reward prediction error, or RPE) that guides updating of reward predictions and the action policy. Despite its success, the reinforcement learning framework has been challenged from a number of directions. Some studies have suggested that DA encodes reward predictions themselves, rather than reward prediction errors, and other studies have suggested that DA may play a role in invigorating action selection independently from its contribution to learning. A major goal of this project is to develop a reinforcement learning theory of basal ganglia function that addresses these challenges, and more broadly presents a unifying view of how learning, probabilistic inference, and action selection work together to produce adaptive behavior. Our theoretical innovation can be divided into three components. First, we argue that cortical inputs to the striatum encode a probability distribution over hidden states, known as the belief state. Second, we argue that striatal projection neurons transform this input through a set of basis functions, whose purpose is to facilitate reward prediction. The synaptic weights that parametrize these predictions are updated based on the DA RPE signal. Third, we argue that action selection circuits in the dorsal striatum use probabilistic information about rewards to implement uncertainty-guided exploration.
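
The three components can be illustrated with a minimal sketch. It is not the project's model: the abstract does not specify the basis functions, the learning rule's parameters, or the exploration rule. The sketch assumes a discrete Bayes filter for the belief state, an identity basis (features equal to the belief probabilities), a temporal-difference (TD(0)) update driven by the RPE, and an upper-confidence-bound rule as one possible form of uncertainty-guided exploration. All function names and parameter values are hypothetical.

```python
import math

def update_belief(belief, transition, likelihood):
    """Component 1: belief state over hidden states (discrete Bayes filter).
    transition[i][j] = P(next state j | state i); likelihood[j] = P(obs | state j)."""
    n = len(belief)
    predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
    unnorm = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def features(belief):
    """Component 2: basis functions applied to the belief state.
    Assumption: identity basis; the abstract does not say which basis is used."""
    return list(belief)

def value(weights, phi):
    """Linear reward prediction from the feature vector."""
    return sum(w * f for w, f in zip(weights, phi))

def td_update(weights, phi, reward, phi_next, alpha=0.1, gamma=0.9):
    """Update weights from the RPE: delta = r + gamma * V(b') - V(b)."""
    rpe = reward + gamma * value(weights, phi_next) - value(weights, phi)
    new_weights = [w + alpha * rpe * f for w, f in zip(weights, phi)]
    return new_weights, rpe

def ucb_choice(means, variances, bonus=1.0):
    """Component 3: uncertainty-guided exploration.
    Assumption: pick the action maximizing mean + bonus * standard deviation."""
    scores = [m + bonus * math.sqrt(v) for m, v in zip(means, variances)]
    return max(range(len(scores)), key=scores.__getitem__)

# Usage: one belief update, one TD step, one action choice.
b0 = [0.5, 0.5]
b1 = update_belief(b0, [[0.9, 0.1], [0.1, 0.9]], [0.8, 0.2])
w, delta = td_update([0.0, 0.0], features(b0), 1.0, features(b1))
action = ucb_choice([1.0, 0.9], [0.0, 0.25])
```

In this sketch the uncertain action (index 1) is chosen despite its lower mean, because its variance raises its score; with `bonus=0.0` the rule reduces to greedy selection.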