The ability to predict future outcomes increases the fitness of the animal. Decades of research have shown that dopamine neurons broadcast reward prediction error (RPE) signals—the discrepancy between actual and predicted reward—to drive learning to predict future outcomes. Recent studies have begun to show, however, that dopamine neurons are more diverse than previously thought. In this review, we summarize a series of our studies demonstrating unique properties of dopamine neurons projecting to the posterior “tail” of the striatum (TS) in terms of anatomy, activity, and function. Specifically, TS-projecting dopamine neurons are activated by a subset of negative events, including threats from a novel object; send prediction errors for external threats; and reinforce avoidance behaviors. These results indicate that there are at least two axes of dopamine-mediated reinforcement learning in the brain—one learning from canonical RPEs and another learning from threat prediction errors. We argue that the existence of multiple learning systems is an adaptive strategy, allowing each system to be optimized for its own needs. The compartmental organization of the mammalian striatum resembles that of a dopamine-recipient area in insects (the mushroom body), pointing to a principle of dopamine function conserved across phyla.
Learning to predict future outcomes is critical for driving appropriate behaviors. Reinforcement learning (RL) models have successfully accounted for such learning, relying on reward prediction errors (RPEs) signaled by midbrain dopamine neurons. It has been proposed that when sensory data provide only ambiguous information about which state an animal is in, it can predict reward based on a set of probabilities assigned to hypothetical states (called the belief state). Here we examine how dopamine RPEs and subsequent learning are regulated under state uncertainty. Mice are first trained in a task with two potential states defined by different reward amounts. During testing, intermediate-sized rewards are given in rare trials. Dopamine activity is a non-monotonic function of reward size, consistent with RL models operating on belief states. Furthermore, the magnitude of dopamine responses quantitatively predicts changes in behavior. These results establish the critical role of state inference in RL.
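The non-monotonic dopamine response described above can be illustrated with a minimal sketch of an RL model operating on a belief state. All numbers here are assumptions for illustration (two trained reward states with made-up mean sizes and Gaussian noise), not the paper's fitted parameters: the model infers which state it is in from the observed reward size, computes the predicted value under that posterior, and reports reward minus prediction.

```python
import math

# Hypothetical parameters: two trained states with small and large rewards.
STATE_MEANS = [2.0, 8.0]   # assumed mean reward per state (arbitrary units)
SIGMA = 1.0                # assumed noise on the perceived reward size
PRIOR = [0.5, 0.5]         # equal prior over the two states

def gaussian(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def belief_state_rpe(r):
    """RPE when predicted value is computed under the posterior belief.

    The observed reward size is used to infer the hidden state, and the
    prediction is the belief-weighted average of the cached state values.
    """
    likelihoods = [gaussian(r, mu, SIGMA) for mu in STATE_MEANS]
    unnorm = [l * p for l, p in zip(likelihoods, PRIOR)]
    z = sum(unnorm)
    posterior = [u / z for u in unnorm]
    predicted = sum(b * mu for b, mu in zip(posterior, STATE_MEANS))
    return r - predicted

# An intermediate reward is partly attributed to the large-reward state,
# so the RPE rises and then falls as reward size grows: non-monotonic.
rpes = [belief_state_rpe(r) for r in [2.0, 4.0, 5.0, 6.0, 8.0]]
```

In this sketch a reward slightly above the small-state mean yields a large positive RPE, while a reward midway between the two states yields a near-zero RPE because the belief splits evenly, reproducing the qualitative non-monotonicity reported above.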
Midbrain dopamine neurons are well known for their role in reward-based reinforcement learning. We found that the activity of dopamine axons in the posterior tail of the striatum (TS) scaled with the novelty and intensity of external stimuli, but did not encode reward value. We demonstrated that the ablation of TS-projecting dopamine neurons specifically inhibited avoidance of novel or high-intensity stimuli without affecting animals’ initial avoidance responses, suggesting a role in reinforcement rather than simply in avoidance itself. Furthermore, we found that animals avoided optogenetic activation of dopamine axons in TS during a choice task and that this stimulation could partially reinstate avoidance of a familiar object. These results suggest that TS-projecting dopamine neurons reinforce avoidance of threatening stimuli. More generally, our results indicate that there are at least two axes of reinforcement learning using dopamine in the striatum: one based on value and one based on external threat.
Animals make predictions based on currently available information. In natural settings, sensory cues may not reveal complete information, requiring the animal to infer the “hidden state” of the environment. The brain structures important in hidden state inference remain unknown. A previous study showed that midbrain dopamine neurons exhibit distinct response patterns depending on whether reward is delivered in 100% of trials (task 1) or in 90% of trials (task 2) in a classical conditioning task. Here we found that inactivation of the medial prefrontal cortex (mPFC) affected dopaminergic signaling in task 2, in which the hidden state must be inferred (“will reward come or not?”), but not in task 1, where the state was known with certainty. Computational modeling suggests that the effects of inactivation are best explained by a circuit in which the mPFC conveys inference over hidden states to the dopamine system.
Parenting is essential for the survival and wellbeing of mammalian offspring. However, we lack a circuit-level understanding of how distinct components of this behaviour are coordinated. Here we investigate how galanin-expressing neurons in the medial preoptic area (MPOAGal) of the hypothalamus coordinate motor, motivational, hormonal and social aspects of parenting in mice. These neurons integrate inputs from a large number of brain areas, and the activation of these inputs depends on the animal's sex and reproductive state. Subsets of MPOAGal neurons form discrete pools that are defined by their projection sites. While the MPOAGal population is active during all episodes of parental behaviour, individual pools are tuned to characteristic aspects of parenting. Optogenetic manipulation of MPOAGal projections mirrors this specificity, affecting discrete parenting components. This functional organization, reminiscent of the control of motor sequences by pools of spinal cord neurons, provides a new model for how discrete elements of a social behaviour are generated at the circuit level.
Although modified rabies viruses have emerged as a powerful tool for tracing the inputs to genetically defined populations of neurons, the toxicity of the virus has limited its utility. A recent study employed a self-inactivating rabies (SiR) virus that enables recording or manipulation of targeted neurons for months.
Dopamine neurons facilitate learning by calculating reward prediction error, or the difference between expected and actual reward. Despite two decades of research, it remains unclear how dopamine neurons make this calculation. Here we review studies that tackle this problem from a diverse set of approaches, from anatomy to electrophysiology to computational modeling and behavior. Several patterns emerge from this synthesis: that dopamine neurons themselves calculate reward prediction error, rather than inherit it passively from upstream regions; that they combine multiple separate and redundant inputs, which are themselves interconnected in a dense recurrent network; and that despite the complexity of inputs, the output from dopamine neurons is remarkably homogeneous and robust. The more we study this simple arithmetic computation, the knottier it appears to be, suggesting a daunting (but stimulating) path ahead for neuroscience more generally.
Midbrain dopamine neurons signal reward prediction error (RPE), or actual minus expected reward. The temporal difference (TD) learning model has been a cornerstone in understanding how dopamine RPEs could drive associative learning. Classically, TD learning imparts value to features that serially track elapsed time relative to observable stimuli. In the real world, however, sensory stimuli provide ambiguous information about the hidden state of the environment, leading to the proposal that TD learning might instead compute a value signal based on an inferred distribution of hidden states (a 'belief state'). Here we asked whether dopaminergic signaling supports a TD learning framework that operates over hidden states. We found that dopamine signaling showed a notable difference between two tasks that differed only with respect to whether reward was delivered in a deterministic manner. Our results favor an associative learning rule that combines cached values with hidden-state inference.
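The learning rule favored above, cached values combined with hidden-state inference, can be sketched as a TD(0) update in which value is read out over a belief distribution rather than a single observable state. The discount factor, learning rate, and state names below are illustrative assumptions, not quantities from the study:

```python
GAMMA = 0.9   # assumed discount factor
ALPHA = 0.1   # assumed learning rate

def td_update(V, belief, next_belief, reward):
    """One TD(0) update with value computed over a belief state.

    V           : dict, cached value per hidden state
    belief      : dict, probability of each hidden state now
    next_belief : dict, probability of each hidden state at the next step
    Returns the TD error (the dopamine-like teaching signal).
    """
    v_now = sum(p * V[s] for s, p in belief.items())
    v_next = sum(p * V[s] for s, p in next_belief.items())
    delta = reward + GAMMA * v_next - v_now
    for s, p in belief.items():
        V[s] += ALPHA * delta * p   # credit each state by its probability
    return delta

# With belief concentrated on one state this reduces to ordinary TD
# learning; with a split belief, credit is shared across hidden states.
V = {"reward_state": 0.0, "no_reward_state": 0.0}
delta = td_update(V, {"reward_state": 0.9, "no_reward_state": 0.1}, {}, 1.0)
```

In the deterministic task the belief stays concentrated, so this model and classical TD learning make the same predictions; the two diverge only when the hidden state must be inferred, which is the contrast exploited above.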
Our motor outputs are constantly re-calibrated to adapt to systematic perturbations. This motor adaptation is thought to depend on the ability to form a memory of a systematic perturbation, often called an internal model. However, the mechanisms underlying the formation, storage, and expression of such models remain unknown. Here, we developed a mouse model to study forelimb adaptation to force field perturbations. We found that temporally precise photoinhibition of somatosensory cortex (S1) applied concurrently with the force field abolished the ability to update subsequent motor commands needed to reduce motor errors. This S1 photoinhibition did not impair basic motor patterns, post-perturbation completion of the action, or performance in a reward-based learning task. Moreover, S1 photoinhibition after partial adaptation blocked further adaptation, but did not affect the expression of already-adapted motor commands. Thus, S1 is critically involved in updating the memory about the perturbation that is essential for forelimb motor adaptation.
Dopamine neurons are thought to encode novelty in addition to reward prediction error (the discrepancy between actual and predicted values). In this study, we compared dopamine activity across the striatum using fiber fluorometry in mice. During classical conditioning, we observed opposite dynamics in dopamine axon signals in the ventral striatum (‘VS dopamine’) and the posterior tail of the striatum (‘TS dopamine’). TS dopamine showed strong excitation to novel cues, whereas VS dopamine showed no responses to novel cues until they had been paired with a reward. TS dopamine cue responses decreased over time, depending on what the cue predicted. Additionally, TS dopamine showed excitation to several types of stimuli including rewarding, aversive, and neutral stimuli whereas VS dopamine showed excitation only to reward or reward-predicting cues. Together, these results demonstrate that dopamine novelty signals are localized in TS along with general salience signals, while VS dopamine reliably encodes reward prediction error.
Dopamine neurons encode the difference between actual and predicted reward, or reward prediction error (RPE). Although many models have been proposed to account for this computation, it has been difficult to test these models experimentally. Here we established an awake electrophysiological recording system, combined with rabies virus and optogenetic cell-type identification, to characterize the firing patterns of monosynaptic inputs to dopamine neurons while mice performed classical conditioning tasks. We found that each variable required to compute RPE, including actual and predicted reward, was distributed in input neurons in multiple brain areas. Further, many input neurons across brain areas signaled combinations of these variables. These results demonstrate that even simple arithmetic computations such as RPE are not localized in specific brain areas but, rather, distributed across multiple nodes in a brain-wide network. Our systematic method to examine both activity and connectivity revealed unexpected redundancy for a simple computation in the brain.
Dopamine is thought to regulate learning from appetitive and aversive events. Here we examined how optogenetically identified dopamine neurons in the lateral ventral tegmental area of mice respond to aversive events in different conditions. In low reward contexts, most dopamine neurons were exclusively inhibited by aversive events, and expectation reduced dopamine neurons’ responses to reward and punishment. When a single odor predicted both reward and punishment, dopamine neurons’ responses to that odor reflected the integrated value of both outcomes. Thus, in low reward contexts, dopamine neurons signal value prediction errors (VPEs) integrating information about both reward and aversion in a common currency. In contrast, in high reward contexts, dopamine neurons acquired a short-latency excitation to aversive events that masked their VPE signaling. Our results demonstrate the importance of considering context when examining representations in dopamine neurons, and uncover different modes of dopamine signaling, each of which may be adaptive for different environments.
Neurons in higher cortical areas, such as the prefrontal cortex, are often tuned to a variety of sensory and motor variables, and are therefore said to display mixed selectivity. This complexity of single neuron responses can obscure what information these areas represent and how it is represented. Here we demonstrate the advantages of a new dimensionality reduction technique, demixed principal component analysis (dPCA), that decomposes population activity into a few components. In addition to systematically capturing the majority of the variance of the data, dPCA also exposes the dependence of the neural representation on task parameters such as stimuli, decisions, or rewards. To illustrate our method we reanalyze population data from four datasets comprising different species, different cortical areas and different experimental tasks. In each case, dPCA provides a concise way of visualizing the data that summarizes the task-dependent features of the population response in a single figure.
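The core idea behind dPCA, splitting population activity into additive parts that depend on different task parameters and then finding a few axes per part, can be sketched as follows. This is a deliberately simplified stand-in, not the published algorithm: the real method optimizes separate decoder and encoder axes per marginalization, whereas here a plain SVD of each marginalized array takes that role, and the array shape and variable names are assumptions.

```python
import numpy as np

def dpca_sketch(X, n_comp=2):
    """Simplified sketch of the dPCA decomposition step.

    X has shape (neurons, stimuli, time). Activity is split into additive
    parts depending on time only, on stimulus only, or on their
    interaction, and a separate SVD is run on each part.
    """
    X = X - X.mean(axis=(1, 2), keepdims=True)   # remove each neuron's mean
    t_part = X.mean(axis=1, keepdims=True)       # condition-independent dynamics
    s_part = X.mean(axis=2, keepdims=True)       # stimulus tuning
    inter = X - t_part - s_part                  # stimulus x time interaction

    comps = {}
    for name, part in [("time", t_part), ("stimulus", s_part),
                       ("interaction", inter)]:
        flat = np.broadcast_to(part, X.shape).reshape(X.shape[0], -1)
        u, _, _ = np.linalg.svd(flat, full_matrices=False)
        comps[name] = u[:, :n_comp]              # top axes for this part
    return comps
```

Because each set of axes is fit to one marginalization, projecting the population onto them yields components tuned to a single task parameter, which is what makes the resulting single-figure summaries readable.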
Dopamine neurons are thought to signal reward prediction error, or the difference between actual and predicted reward. How dopamine neurons jointly encode this information, however, remains unclear. One possibility is that different neurons specialize in different aspects of prediction error; another is that each neuron calculates prediction error in the same way. We recorded from optogenetically identified dopamine neurons in the lateral ventral tegmental area (VTA) while mice performed classical conditioning tasks. Our tasks allowed us to determine the full prediction error functions of dopamine neurons and compare them to each other. We found marked homogeneity among individual dopamine neurons: their responses to both unexpected and expected rewards followed the same function, just scaled up or down. As a result, we were able to describe both individual and population responses using just two parameters. Such uniformity ensures robust information coding, allowing each dopamine neuron to contribute fully to the prediction error signal.
Dopamine neurons are thought to facilitate learning by comparing actual and expected reward. Despite two decades of investigation, little is known about how this comparison is made. To determine how dopamine neurons calculate prediction error, we combined optogenetic manipulations with extracellular recordings in the ventral tegmental area while mice engaged in classical conditioning. Here we demonstrate, by manipulating the temporal expectation of reward, that dopamine neurons perform subtraction, a computation that is ideal for reinforcement learning but rarely observed in the brain. Furthermore, selectively exciting and inhibiting neighbouring GABA (γ-aminobutyric acid) neurons in the ventral tegmental area reveals that these neurons are a source of subtraction: they inhibit dopamine neurons when reward is expected, causally contributing to prediction-error calculations. Finally, bilaterally stimulating ventral tegmental area GABA neurons dramatically reduces anticipatory licking to conditioned odours, consistent with an important role for these neurons in reinforcement learning. Together, our results uncover the arithmetic and local circuitry underlying dopamine prediction errors.
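Why subtraction matters here can be made concrete with a toy comparison between a subtractive and a divisive prediction-error rule; the reward and expectation values below are arbitrary illustrative numbers, and the divisive form is just one generic alternative, not a model from the paper:

```python
def subtractive(reward, expectation):
    """Prediction error as reward minus expectation (the rule supported
    above, with VTA GABA neurons supplying the subtracted expectation)."""
    return reward - expectation

def divisive(reward, expectation):
    """A generic divisive alternative, for contrast only."""
    return reward / (1.0 + expectation)

# Subtraction preserves the spacing between responses to different reward
# sizes at every expectation level; division compresses it as expectation
# grows. This spacing is one signature that distinguishes the two rules.
gap_sub = [subtractive(2, e) - subtractive(1, e) for e in (0.0, 1.0, 2.0)]
gap_div = [divisive(2, e) - divisive(1, e) for e in (0.0, 1.0, 2.0)]
```

A subtractive rule keeps the reward axis linear regardless of expectation, which is exactly the property that makes it well suited to computing the TD errors required by reinforcement learning.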