Midbrain dopamine neurons signal reward prediction error (RPE), or actual minus expected reward. The temporal difference (TD) learning model has been a cornerstone in understanding how dopamine RPEs could drive associative learning. Classically, TD learning imparts value to features that serially track elapsed time relative to observable stimuli. In the real world, however, sensory stimuli provide ambiguous information about the hidden state of the environment, leading to the proposal that TD learning might instead compute a value signal based on an inferred distribution of hidden states (a 'belief state'). Here we asked whether dopaminergic signaling supports a TD learning framework that operates over hidden states. We found that dopamine signals differed notably between two tasks that were identical except for whether reward was delivered deterministically. Our results favor an associative learning rule that combines cached values with hidden-state inference.
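The TD update and its prediction-error term can be sketched in a few lines. This is a minimal tabular TD(0) illustration, not the belief-state model tested in the abstract; the learning rate `alpha`, discount `gamma`, and the three-state setup are all illustrative choices.

```python
import numpy as np

def td_update(values, state, next_state, reward, alpha=0.1, gamma=0.9):
    """One TD(0) step: compute the RPE (actual minus expected reward,
    including the discounted value of the next state) and nudge the
    value estimate toward the target."""
    rpe = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * rpe
    return rpe

values = np.zeros(3)  # value estimates for three serial states
# Repeatedly reward the transition 1 -> 2: the value of state 1 grows,
# so the prediction error for that transition shrinks across trials.
rpes = [td_update(values, 1, 2, reward=1.0) for _ in range(50)]
assert rpes[0] > rpes[-1]  # RPE declines as the reward becomes expected
```

This declining RPE across trials is the signature classically compared against dopamine responses during conditioning.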
Our motor outputs are constantly re-calibrated to adapt to systematic perturbations. This motor adaptation is thought to depend on the ability to form a memory of a systematic perturbation, often called an internal model. However, the mechanisms underlying the formation, storage, and expression of such models remain unknown. Here, we developed a mouse model to study forelimb adaptation to force field perturbations. We found that temporally precise photoinhibition of somatosensory cortex (S1) applied concurrently with the force field abolished the ability to update subsequent motor commands needed to reduce motor errors. This S1 photoinhibition did not impair basic motor patterns, post-perturbation completion of the action, or performance in a reward-based learning task. Moreover, S1 photoinhibition after partial adaptation blocked further adaptation, but did not affect the expression of already-adapted motor commands. Thus, S1 is critically involved in updating the memory about the perturbation that is essential for forelimb motor adaptation.
Dopamine neurons are thought to encode novelty in addition to reward prediction error (the discrepancy between actual and predicted values). In this study, we compared dopamine activity across the striatum using fiber fluorometry in mice. During classical conditioning, we observed opposite dynamics in dopamine axon signals in the ventral striatum (‘VS dopamine’) and the posterior tail of the striatum (‘TS dopamine’). TS dopamine showed strong excitation to novel cues, whereas VS dopamine showed no responses to novel cues until they had been paired with a reward. TS dopamine cue responses decreased over time, depending on what the cue predicted. Additionally, TS dopamine showed excitation to several types of stimuli including rewarding, aversive, and neutral stimuli whereas VS dopamine showed excitation only to reward or reward-predicting cues. Together, these results demonstrate that dopamine novelty signals are localized in TS along with general salience signals, while VS dopamine reliably encodes reward prediction error.
Dopamine neurons encode the difference between actual and predicted reward, or reward prediction error (RPE). Although many models have been proposed to account for this computation, it has been difficult to test these models experimentally. Here we established an awake electrophysiological recording system, combined with rabies virus and optogenetic cell-type identification, to characterize the firing patterns of monosynaptic inputs to dopamine neurons while mice performed classical conditioning tasks. We found that each variable required to compute RPE, including actual and predicted reward, was distributed in input neurons in multiple brain areas. Further, many input neurons across brain areas signaled combinations of these variables. These results demonstrate that even simple arithmetic computations such as RPE are not localized in specific brain areas but, rather, distributed across multiple nodes in a brain-wide network. Our systematic method to examine both activity and connectivity revealed unexpected redundancy for a simple computation in the brain.
Dopamine is thought to regulate learning from appetitive and aversive events. Here we examined how optogenetically-identified dopamine neurons in the lateral ventral tegmental area of mice respond to aversive events in different conditions. In low reward contexts, most dopamine neurons were exclusively inhibited by aversive events, and expectation reduced dopamine neurons’ responses to reward and punishment. When a single odor predicted both reward and punishment, dopamine neurons’ responses to that odor reflected the integrated value of both outcomes. Thus, in low reward contexts, dopamine neurons signal value prediction errors (VPEs) integrating information about both reward and aversion in a common currency. In contrast, in high reward contexts, dopamine neurons acquired a short-latency excitation to aversive events that masked their VPE signaling. Our results demonstrate the importance of considering the contexts to examine the representation in dopamine neurons and uncover different modes of dopamine signaling, each of which may be adaptive for different environments.
Neurons in higher cortical areas, such as the prefrontal cortex, are often tuned to a variety of sensory and motor variables, and are therefore said to display mixed selectivity. This complexity of single neuron responses can obscure what information these areas represent and how it is represented. Here we demonstrate the advantages of a new dimensionality reduction technique, demixed principal component analysis (dPCA), that decomposes population activity into a few components. In addition to systematically capturing the majority of the variance of the data, dPCA also exposes the dependence of the neural representation on task parameters such as stimuli, decisions, or rewards. To illustrate our method we reanalyze population data from four datasets comprising different species, different cortical areas and different experimental tasks. In each case, dPCA provides a concise way of visualizing the data that summarizes the task-dependent features of the population response in a single figure.
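The core idea behind dPCA's demixing, marginalizing population activity over task parameters, can be illustrated with toy data. This sketch shows only the decomposition step (full dPCA then fits separate decoder/encoder axes per marginalization); the array shapes and random data are made up for illustration.

```python
import numpy as np

# Toy population activity: neurons x stimuli x time.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4, 50))

# Marginalize over stimuli to isolate the condition-independent
# (time-only) component; the residual is the stimulus-dependent part.
time_part = X.mean(axis=1, keepdims=True)
stim_part = X - time_part

# The marginalizations sum back to the original data exactly, so the
# total variance is partitioned across task parameters without loss.
assert np.allclose(time_part + stim_part, X)
# The stimulus-dependent part averages to zero across stimuli.
assert np.allclose(stim_part.mean(axis=1), 0.0)
```

Standard PCA would mix these two sources of variance in each component; dPCA's contribution is to find low-dimensional axes aligned with one marginalization at a time.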
Dopamine neurons are thought to signal reward prediction error, or the difference between actual and predicted reward. How dopamine neurons jointly encode this information, however, remains unclear. One possibility is that different neurons specialize in different aspects of prediction error; another is that each neuron calculates prediction error in the same way. We recorded from optogenetically identified dopamine neurons in the lateral ventral tegmental area (VTA) while mice performed classical conditioning tasks. Our tasks allowed us to determine the full prediction error functions of dopamine neurons and compare them to each other. We found marked homogeneity among individual dopamine neurons: their responses to both unexpected and expected rewards followed the same function, just scaled up or down. As a result, we were able to describe both individual and population responses using just two parameters. Such uniformity ensures robust information coding, allowing each dopamine neuron to contribute fully to the prediction error signal.
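The two-parameter description in this abstract, a shared prediction-error function that each neuron scales and offsets, can be sketched as follows. The tanh response function and all parameter values are illustrative assumptions, not the fitted functions from the study.

```python
import numpy as np

# Common prediction-error axis and a shared response function
# (assumed form; the study fit empirical functions).
rpe = np.linspace(-1.0, 1.0, 9)
population_f = np.tanh(rpe)

def neuron_response(gain, baseline):
    """A neuron's response: the shared function, scaled and offset."""
    return gain * population_f + baseline

n1 = neuron_response(gain=2.0, baseline=0.5)
n2 = neuron_response(gain=0.5, baseline=0.1)

# Because every neuron follows the same underlying function, any two
# neurons' responses are perfectly correlated: same shape, different scale.
assert np.isclose(np.corrcoef(n1, n2)[0, 1], 1.0)
```

Under this description, knowing one neuron's gain and baseline suffices to predict its entire response profile from the population function.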
Dopamine neurons are thought to facilitate learning by comparing actual and expected reward. Despite two decades of investigation, little is known about how this comparison is made. To determine how dopamine neurons calculate prediction error, we combined optogenetic manipulations with extracellular recordings in the ventral tegmental area while mice engaged in classical conditioning. Here we demonstrate, by manipulating the temporal expectation of reward, that dopamine neurons perform subtraction, a computation that is ideal for reinforcement learning but rarely observed in the brain. Furthermore, selectively exciting and inhibiting neighbouring GABA (γ-aminobutyric acid) neurons in the ventral tegmental area reveals that these neurons are a source of subtraction: they inhibit dopamine neurons when reward is expected, causally contributing to prediction-error calculations. Finally, bilaterally stimulating ventral tegmental area GABA neurons dramatically reduces anticipatory licking to conditioned odours, consistent with an important role for these neurons in reinforcement learning. Together, our results uncover the arithmetic and local circuitry underlying dopamine prediction errors.
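The distinction the abstract draws, subtraction of expectation rather than, say, divisive scaling, can be made concrete with toy numbers. Both functions and all values here are illustrative; the divisive form is just one alternative the subtractive finding rules against.

```python
def subtractive(response, expectation):
    """Expectation shifts the response down by a constant."""
    return response - expectation

def divisive(response, expectation):
    """Expectation compresses the response gain instead."""
    return response / (1.0 + expectation)

rewards = [0.0, 1.0, 2.0]
shifted = [subtractive(r, 1.0) for r in rewards]
scaled = [divisive(r, 1.0) for r in rewards]

# Subtraction preserves the spacing between reward levels (a pure
# offset), whereas division compresses it.
assert shifted == [-1.0, 0.0, 1.0]
assert scaled == [0.0, 0.5, 1.0]
```

A pure offset is what a reinforcement-learning prediction error requires: the signed difference between actual and expected reward, unchanged in slope.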
Haroush and Williams trained pairs of monkeys to play a prisoner’s dilemma game, a model of social interactions. Recording from the dorsal anterior cingulate cortex (dACC), they find neurons whose activity reflects the anticipation of the opponent’s yet-unknown choice, which may be important in guiding animals’ performance in the game.
Serotonin's function in the brain is unclear. One challenge in testing the numerous hypotheses about serotonin's function has been observing the activity of identified serotonergic neurons in animals engaged in behavioral tasks. We recorded the activity of dorsal raphe neurons while mice experienced a task in which rewards and punishments varied across blocks of trials. We 'tagged' serotonergic neurons with the light-sensitive protein channelrhodopsin-2 and identified them based on their responses to light. We found three main features of serotonergic neuron activity: (1) a large fraction of serotonergic neurons modulated their tonic firing rates over the course of minutes during reward versus punishment blocks; (2) most were phasically excited by punishments; and (3) a subset was phasically excited by reward-predicting cues. By contrast, dopaminergic neurons did not show firing rate changes across blocks of trials. These results suggest that serotonergic neurons signal information about reward and punishment on multiple timescales.
Serotonin and dopamine are major neuromodulators. Here, we used a modified rabies virus to identify monosynaptic inputs to serotonin neurons in the dorsal and median raphe (DR and MR). We found that inputs to DR and MR serotonin neurons are spatially shifted in the forebrain, and MR serotonin neurons receive inputs from more medial structures. Then, we compared these data with inputs to dopamine neurons in the ventral tegmental area (VTA) and substantia nigra pars compacta (SNc). We found that DR serotonin neurons receive inputs from a remarkably similar set of areas as VTA dopamine neurons apart from the striatum, which preferentially targets dopamine neurons. Our results suggest three major input streams: a medial stream regulates MR serotonin neurons, an intermediate stream regulates DR serotonin and VTA dopamine neurons, and a lateral stream regulates SNc dopamine neurons. These results provide fundamental organizational principles of afferent control for serotonin and dopamine.
How is sensory information represented in the brain? A long-standing debate in neural coding is whether and how timing of spikes conveys information to downstream neurons. Although we know that neurons in the olfactory bulb (OB) exhibit rich temporal dynamics, the functional relevance of temporal coding remains hotly debated. Recent recording experiments in awake behaving animals have elucidated highly organized temporal structures of activity in the OB. In addition, the analysis of neural circuits in the piriform cortex (PC) demonstrated the importance of not only OB afferent inputs but also intrinsic PC neural circuits in shaping odor responses. Furthermore, new experiments involving stimulation of the OB with specific temporal patterns allowed for testing the relevance of temporal codes. Together, these studies suggest that the relative timing of neuronal activity in the OB conveys odor information and that neural circuits in the PC possess various mechanisms to decode temporal patterns of OB input.
Hunger is a hard-wired motivational state essential for survival. Agouti-related peptide (AgRP)-expressing neurons in the arcuate nucleus (ARC) at the base of the hypothalamus are crucial to the control of hunger. They are activated by caloric deficiency and, when naturally or artificially stimulated, they potently induce intense hunger and subsequent food intake. Consistent with their obligatory role in regulating appetite, genetic ablation or chemogenetic inhibition of AgRP neurons decreases feeding. Excitatory input to AgRP neurons is important in caloric-deficiency-induced activation, and is notable for its remarkable degree of caloric-state-dependent synaptic plasticity. Despite the important role of excitatory input, its source(s) has been unknown. Here, through the use of Cre-recombinase-enabled, cell-specific neuron mapping techniques in mice, we have discovered strong excitatory drive that, unexpectedly, emanates from the hypothalamic paraventricular nucleus, specifically from subsets of neurons expressing thyrotropin-releasing hormone (TRH) and pituitary adenylate cyclase-activating polypeptide (PACAP, also known as ADCYAP1). Chemogenetic stimulation of these afferent neurons in sated mice markedly activates AgRP neurons and induces intense feeding. Conversely, acute inhibition in mice with caloric-deficiency-induced hunger decreases feeding. Discovery of these afferent neurons capable of triggering hunger advances understanding of how this intense motivational state is regulated.
Mice display robust, stereotyped behaviours towards pups: virgin males typically attack pups, whereas virgin females and sexually experienced males and females display parental care. Here we show that virgin males genetically impaired in vomeronasal sensing do not attack pups and are parental. Furthermore, we uncover a subset of galanin-expressing neurons in the medial preoptic area (MPOA) that are specifically activated during male and female parenting, and a different subpopulation that is activated during mating. Genetic ablation of MPOA galanin neurons results in marked impairment of parental responses in males and females and affects male mating. Optogenetic activation of these neurons in virgin males suppresses inter-male and pup-directed aggression and induces pup grooming. Thus, MPOA galanin neurons emerge as an essential regulatory node of male and female parenting behaviour and other social responses. These results provide an entry point to a circuit-level dissection of parental behaviour and its modulation by social experience.
Normalizing neural responses by the sum of population activity allows the nervous system to adjust its sensitivity according to task demands, facilitating intensity-invariant information processing. In this issue of Neuron, two studies, Kato et al. (2013) and Miyamichi et al. (2013), suggest that parvalbumin-positive interneurons in the olfactory bulb play a role in this process.
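The normalization described here, dividing each response by summed population activity, has a standard minimal form. This sketch assumes a semi-saturation constant `sigma` and made-up response values; it illustrates the intensity invariance the preview refers to, not the specific circuit findings.

```python
import numpy as np

def normalize(responses, sigma=1.0):
    """Divisive normalization: each response divided by the summed
    population activity (plus a semi-saturation constant)."""
    return responses / (sigma + responses.sum())

weak = np.array([1.0, 2.0, 4.0])
strong = weak * 10.0  # same tuning pattern, 10x overall intensity

# After normalization, the relative response pattern is the same at
# both intensities: information about identity survives, intensity
# is factored out.
pattern_weak = normalize(weak) / normalize(weak).sum()
pattern_strong = normalize(strong) / normalize(strong).sum()
assert np.allclose(pattern_weak, pattern_strong)
```

In the olfactory bulb, the proposal is that parvalbumin-positive interneurons supply the broadly pooled inhibition playing the role of the denominator.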