Reward/punishment learning

Memory, Attention & Decision-Making (Chapter 3. Reward and punishment-related learning; emotion and motivation)

Edmund Rolls, Oxford University

Oxford University Press (2008)

INTRODUCTION: Neuroscience and Freewill: The author views the orbitofrontal region of the prefrontal cortex as the most important region for determining the value of rewards or punishers. Objects are first represented in the visual, somatosensory and other areas of the cortex without any aspect of reward value. Reward value only arises in the orbitofrontal and the amygdala. Studies show that orbitofrontal activity correlates with the subjective pleasantness of sensory inputs rather than the actual strength of the signal. The orbitofrontal appears to project its assessment of reward preferences to the dorsolateral prefrontal, the area responsible for planning how to obtain rewards and for deciding to defer short-term rewards in favour of longer-term rewards. The orbitofrontal also projects to the basal ganglia, which appear to integrate a variety of cortical and limbic inputs in order to drive behaviour. What is not discussed, but is apparent from the discussion in this book, is that subjective emotional assessment, mainly in the orbitofrontal, plays an important part in determining behaviour, including behaviour directed at planned or longer-term goals. This contradicts the mainstream consensus, based mainly on studies of trivial actions, that behaviour arises entirely from unconscious processing.

Emotions are here considered to be the result of reinforcing stimuli described as rewards or punishers. Some stimuli are primary reinforcers, so-called because they do not have to be learned. Other stimuli are initially neutral, but become secondary reinforcers because through learning they become associated with primary reinforcers. Genes are seen to specify goals, but behaviour is driven by the reward value of the goals. This reward process is argued to be implemented in the orbitofrontal region of the prefrontal cortex and the amygdala within the subcortical limbic system.

The brain is organised first to process a stimulus to the object level, and only after that to assess its reward value. It has been found that the visual representations of objects in the inferior temporal cortex are neutral in terms of reward value, which only arises after projections to the orbitofrontal and amygdala. Similarly, representations of touch in the somatosensory cortex are without reward value until this area also projects to the orbitofrontal. Other sensory inputs such as taste and smell also get as far as the primary cortex without there being any representation of reward value.

Primary reinforcers are represented in the orbitofrontal cortex, and this region and the amygdala are also involved in learning associations between initially neutral stimuli and the primary reinforcers. Activation in the human orbitofrontal has been shown to correlate with the subjective pleasantness of a stimulus rather than the actual strength of the signal. Experiments with the sense of touch demonstrate that the orbitofrontal responds more strongly to rewarding sensations than to neutral ones. Thus the orbitofrontal responds more to the touch of velvet than to a more intense pressure from wood. With taste, the orbitofrontal can represent the reward value of a particular taste, and this activation relates to subjective pleasantness. Studies of taste in particular are seen as evidence that aspects of emotion are represented in the orbitofrontal. The orbitofrontal is also thought to be involved in the subjective experience of pain. With faces, the activation of the orbitofrontal has been found to correlate with the subjective attractiveness of a face.

The orbitofrontal is also suggested to be involved in amending responses to stimuli that used to be associated with rewards but are no longer linked to these. Orbitofrontal neurons respond to the absence of expected rewards, and this appears to be part of the mechanism for rapid reversal of behaviour that is no longer rewarding. The orbitofrontal computes mismatches between stimuli that are expected and stimuli that are obtained and changes reward representations in accord with this. Damage to the orbitofrontal impairs the ability to make such changes, and is associated with irresponsible and impulsive behaviour, and difficulty in learning which stimuli are rewarding and which are not.
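The mismatch idea in the paragraph above can be illustrated with a toy calculation. What follows is only a minimal sketch using a generic delta-rule update, not a model taken from Rolls' book. The function names, the learning rate of 0.5 and the reward values of 1.0 and 0.0 are all assumptions made for illustration. A stimulus is rewarded for several trials and then the reward is withdrawn; the expected value first rises, then falls as negative mismatches accumulate.

```python
def update_value(expected, obtained, learning_rate=0.5):
    """Move the expected reward toward the obtained reward.

    The mismatch is negative when an expected reward is omitted.
    (Generic delta rule; hypothetical helper, not the author's model.)
    """
    mismatch = obtained - expected
    return expected + learning_rate * mismatch, mismatch


def simulate_reversal(trials_before=6, trials_after=6, learning_rate=0.5):
    """Reward a stimulus (1.0) for some trials, then withdraw the reward (0.0).

    Returns one (obtained, mismatch, expected_after_update) tuple per trial,
    with values rounded to three decimals for readability.
    """
    expected = 0.0
    history = []
    outcomes = [1.0] * trials_before + [0.0] * trials_after
    for obtained in outcomes:
        expected, mismatch = update_value(expected, obtained, learning_rate)
        history.append((obtained, round(mismatch, 3), round(expected, 3)))
    return history


if __name__ == "__main__":
    for trial, row in enumerate(simulate_reversal(), start=1):
        print(trial, row)
```

In this sketch the first unrewarded trial produces a large negative mismatch, and the expected value drops quickly afterwards. Setting the learning rate close to zero makes the stored value barely change after the reward is withdrawn, which is loosely analogous to the impaired reversal described for orbitofrontal damage, though that analogy is an illustration rather than a claim from the book.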

The dorsolateral region of the prefrontal is involved with working memory and attention and the executive functions of planning many steps ahead to obtain rewards, or of deferring a short-term reward in order to obtain a larger reward in the longer term. This relates to rationality and language. The dorsolateral can reflect preferences, but in these cases the orbitofrontal reflects these preferences earlier. This is consistent with the hypothesis that expected reward value is represented in the orbitofrontal and then projected to the dorsolateral, where it can be utilised for planning, for instance planning to obtain particular rewards or assessing the deferment of short-term reward for long-term reward. The overall conclusion of the author is that ‘moral-based’ knowledge generated by rewards and punishers cannot take place without the orbitofrontal.

The orbitofrontal, amygdala, dorsolateral frontal, hippocampus and other brain areas all project to the ventral striatum and other parts of the basal ganglia. The author takes the view that areas such as the dorsolateral and orbitofrontal do the crucial processing in preparation for actions to gain rewards or avoid punishers, while the basal ganglia areas are more important for preparing action on the basis of inputs from the prefrontal areas and the amygdala. The striatum may switch behaviour in response to these inputs, and its role may be to balance out a variety of sometimes conflicting inputs from different parts of the prefrontal and the limbic areas. Behaviour may be switched quickly if a higher priority is detected in the inputs received. Integration of differing signals is seen as an important part of the function of the basal ganglia. In particular, inhibitory (GABA) activity in the striatum acts to modulate the excitatory inputs from the cortex and limbic system, and the action of dopamine may also be important in determining eventual behaviour.

The amygdala and orbitofrontal are concerned with learning associations between previously neutral stimuli and primary reinforcers. The brain is organised to process first to the level of recognising an object, and only after that does it use areas such as the orbitofrontal and the amygdala to determine expected reward values. From there the signal can be passed to output regions of the brain. These output regions include the autonomic and endocrine systems and bodily actions, but also systems such as the dorsolateral prefrontal that are involved in long-term planning and deferment of short-term rewards.

Activity in the orbitofrontal is shown to reflect changing preferences for particular stimuli. Some neurons in the dorsolateral, which is related to long-term planning, also reflect preferences for stimuli, but the orbitofrontal reflects them more quickly, and is connected to bodily responses. Rolls suggests that the reward representation created in the orbitofrontal is projected to the dorsolateral prefrontal cortex for use in planning and decisions on the deferment of short-term rewards.

