Face Perception in Context:
The Neural Representation of Value Learning in the Human Brain
Mona Zimmermann
11099119
Internship Report 2020
Scholte Lab
Abstract
The context in which we interact with other humans can influence how we perceive
their faces and ultimately how we behave towards our peers. One such context is affective
learning, which is the process of assigning value to a face according to past (rewarding or
punishing) outcomes associated with that face (Bliss-Moreau, Barrett & Wright, 2008). To
date it is unclear how exactly the processing of faces is influenced by such learning: how the
brain tracks the value of faces, and at what stage of face processing, i.e. at a perceptual or
higher level, such value tracking is evident. To address these questions this
functional MRI study used representational similarity analysis (RSA) to test how five models
of value learning relate to the neural representations of faces in 10 ROIs (number of subjects:
5). Results showed that the Rescorla-Wagner (RW) rule frequently fit the neural data best in
areas such as the MFG, SFG and PCC. Interestingly, the representational structures of two of
the value learning models were also able to explain the neural data in the visual cortex
(OCC). Results are discussed in light of recent studies involving the RW rule and hypotheses
of enhanced visual processing of beneficial faces.
Introduction
Humans are experts at perceiving faces. Within milliseconds of viewing a face we are
able to understand a person’s emotional state (De Sonneville et al., 2002; Streit et al., 2003),
can determine whether he or she is familiar or not (Dobs, Isik, Pantazis & Kanwisher, 2019)
and retrieve or make inferences about the characteristics of that person (Todorov, Gobbini,
Evans & Haxby, 2007). This rapid processing enables us to interact with our peers in a
sensible and dynamic way and is crucial for effective social communication (Jack & Schyns,
2015).
The context in which we interact with others has an effect on how we express
ourselves and perceive our peers’ faces (Barrett, Adolphs, Marsella, Martinez & Pollak, 2019;
Martinez, 2019; but see Stein et al., 2017). Here, context is defined as the conditions that
make up and influence the interactions between people. For example, the physical
environment is a rich source of information for us to interpret another person’s emotional
state (Martinez, 2019). Likewise, the knowledge we form about another person, or the
affective value we learn for that person through previous (positive/negative) experiences (i.e.
affective learning; Bliss-Moreau, Barrett & Wright, 2008), might influence the way in which
we perceive and interpret another person’s face (e.g. Suess, Rabovsky & Rahman, 2015).
Therefore, learning about another person – e.g. learning about their characteristics or forming
associations with a person through experience – can be one such form of context. The
overarching aim of the research presented here was to understand how context in the form of
affective learning influences face perception.
Behavioral and neuroimaging studies have shown that learning about another person’s
past positive or negative behaviors or associating specific experiences with a person has an
effect on the perception and processing of their face (e.g. Bliss-Moreau et al., 2008; Morel,
Beaucousin, Perrin & George, 2012; Petrovic, Kalisch, Pessiglione, Singer & Dolan, 2008;
Suess et al., 2015). For example, in a study conducted by Suess et al. (2015), subjects viewed
neutral faces and rated the valence of their facial expressions before and after associating
positive, negative and neutral behaviors with that person. Neutral faces that had been paired
with negative stories were rated as depicting significantly more negative facial expressions
than neutral faces paired with neutral stories (Suess et al., 2015). This effect illustrates that
learning about a person’s past behavior influences the perception of that person’s emotional
facial expressions (Suess et al., 2015). Similarly, studies using classical conditioning
paradigms that paired neutral faces with aversive stimuli have found both
behavioral and neural evidence for a changed processing of such faces after learning
(Petrovic et al., 2008; Visser, Scholte & Kindt, 2011). For example, Petrovic et al. (2008) found
that subjects rated faces that were paired with an aversive stimulus as less likeable after
conditioning, while rating faces that were not paired with an aversive stimulus as more
likeable after conditioning. This effect was present even in participants that did not remember
which face was paired with an aversive stimulus (Petrovic et al., 2008). Furthermore, the
neural activations in areas involved in the emotional processing of faces changed as a
function of value attributed to such faces after classical conditioning (Petrovic et al., 2008).
Overall, these findings indicate that context in the form of affective learning has an
effect on the processing of faces that is evident on both the behavioral and neural level
(Petrovic et al., 2008; Suess et al., 2015). However, the specific mechanisms by which
context influences such information processing are still poorly understood (Petrovic et al.,
2008). Firstly, it is unclear how the brain calculates and keeps track of the value associated
with a specific face (Petrovic et al., 2008). Secondly, it remains to be elucidated how such
value attribution is encoded in the brain, at what level of processing it occurs, and how it
thus influences the neural representation of faces (i.e. changes information processing).
Therefore, this study set out to answer
the following research question:
How is the value of faces computed and updated during affective learning and what
are the neural correlates of such mechanisms?
To answer this question, the multivariate analysis technique of representational
similarity analysis (RSA) was used. This method makes it possible to investigate the manner
in which the brain represents information, and to compare that representation to candidate
forms of information processing underlying affective learning and face perception
(Kriegeskorte & Kievit, 2013).
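In outline, RSA compares the pairwise dissimilarity structure of neural activity patterns (the representational dissimilarity matrix, RDM) with the dissimilarity structure predicted by a model. A minimal sketch of this logic in Python (function names and the use of Pearson correlation throughout are illustrative, not the study’s actual pipeline):

```python
import math

def pearson(x, y):
    # Pearson correlation between two equal-length vectors
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def neural_rdm(patterns):
    # neural dissimilarity of two conditions = 1 - correlation
    # between their voxel patterns (lower-triangle vector)
    return [1 - pearson(patterns[i], patterns[j])
            for i in range(len(patterns)) for j in range(i)]

def model_rdm(values):
    # model dissimilarity of two faces = absolute difference
    # of their model-predicted values
    return [abs(values[i] - values[j])
            for i in range(len(values)) for j in range(i)]

def rsa_fit(patterns, values):
    # second-order comparison: how well does the model's
    # dissimilarity structure match the neural one?
    return pearson(neural_rdm(patterns), model_rdm(values))
```

In practice a rank correlation (e.g. Spearman) is often preferred for the second-order comparison between RDMs, as it assumes only a monotonic relationship.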
Methods
The research presented here was part of the ‘FEED: Facial expression encoding and
decoding’ project and used data obtained from the reinforcement learning session of that
project.
Participants
Thirteen healthy subjects (six males and seven females) who had normal or
corrected-to-normal vision participated in this study. All subjects gave written consent and were screened
before scanning to ensure their safety. Data analysis focused on five subjects (two males,
three females) due to time constraints of the project.
Procedure
Subjects had to do a reinforcement learning task twice, once outside the scanner (the
“offline” session) and a day later in the scanner (the “online” session), in order to investigate
the potential difference between the temporally distant and immediate effects of value
learning on face perception (which is not the topic of the current study). In the online session,
subjects did the task in a 7T MRI scanner to measure the blood-oxygenation-level-dependent
(BOLD) signal.
Materials
Four faces were used as stimuli in the reinforcement learning task. The faces were
generated with a computer graphics toolbox developed by Glasgow University (Yu, Garrod
& Schyns, 2012). This toolbox contains 3D photos of scanned faces of real people and allows
one to apply the movement of different action units (AUs) of the face to these 3D
photos. AUs are (groups of) muscles that underlie specific facial movements such as raising
the upper lip (Barrett et al., 2019). In this task four faces were used for which no AUs were
activated (i.e. faces without any dynamic expression). The set of faces used in the task
differed per subject. In a rating session before the task, subjects rated each face on the
dimension of valence. The four faces that differed the least in their valence ratings were
chosen for each subject to control for this possible covariate.
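The per-subject selection of valence-matched faces could be sketched as follows (a hypothetical illustration; the spread criterion and function names are assumptions, as the report only states that the four least-differing faces were chosen):

```python
from itertools import combinations

def pick_matched_faces(valence_ratings, k=4):
    """Pick the k faces whose valence ratings differ least
    (smallest max - min spread), controlling for valence.
    valence_ratings: dict mapping face id -> rating."""
    best = min(combinations(valence_ratings, k),
               key=lambda faces: max(valence_ratings[f] for f in faces)
                               - min(valence_ratings[f] for f in faces))
    return list(best)

# illustrative ratings for six candidate faces
ratings = {"A": 4.1, "B": 4.0, "C": 6.5, "D": 3.9, "E": 4.2, "F": 7.0}
print(pick_matched_faces(ratings))  # → ['A', 'B', 'D', 'E']
```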
Task
This experiment used a “contextual two-armed bandit” reinforcement learning task, in
which participants had to learn the optimal action (out of two choices) in different “contexts”
(or “states”). The contexts, in this experiment, were cued by the presentation of the different
faces. In each trial, participants were randomly presented with one of the four faces and had
to make a choice between pressing one of two keys (associated with the right index and
middle finger). Each face was shown for 1200 ms. After each key press, participants received
feedback whether they won or lost money (1 Euro) or whether their balance remained the
same. The goal of the task was for participants to associate half of the faces with obtaining
money (‘rewarding faces’) and the other half with losing money (‘punishing faces’). Each
key was associated with a certain chance of winning money (for the rewarding faces) or
losing money (for punishing faces), which stayed the same throughout the experiment. For
example, for one of the two rewarding faces, pressing the left key was associated with
winning money in 90% of the trials and not winning anything in 10% of the trials. The other
key was associated with opposite chances: in 10% of the trials the participant would win
money, while in 90% of the trials the participant would not win anything. The goal was to
learn which key for each face would give the best outcome most of the time and ultimately to
associate two faces with positive outcomes and the other two faces with negative outcomes.
For rewarding faces the best outcome would mean winning money most of the time, while
for punishing faces this would mean maintaining one’s current balance and avoiding losing
money. For one of the two faces in each category (reward and punishment) it was harder to
learn that contingency as the chances of winning or losing were 60 to 40 instead of 90 to 10
(see Figure 1 for a schematic visualization of the task). Overall, there were two factors to
associate with the faces: valence (reward vs. punishment) and uncertainty (faces associated
with 90/10 vs. 60/40 outcome contingencies).
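The outcome structure of a single trial can be sketched as follows (the face labels and the mapping of keys to probabilities are illustrative; only the 90/10 and 60/40 contingencies and the ±1 Euro outcomes come from the task description):

```python
import random

# Assumed reward probabilities for the "best" key per face; the other
# key has the complementary probability (labels are illustrative).
CONTINGENCIES = {
    "reward_easy": {"valence": +1, "p_best": 0.9},  # win 1 Euro often
    "reward_hard": {"valence": +1, "p_best": 0.6},
    "punish_easy": {"valence": -1, "p_best": 0.9},  # avoid losing 1 Euro
    "punish_hard": {"valence": -1, "p_best": 0.6},
}

def trial_outcome(face, chose_best_key, rng=random):
    """Return the monetary outcome (+1, 0 or -1 Euro) of one trial."""
    c = CONTINGENCIES[face]
    p_good = c["p_best"] if chose_best_key else 1 - c["p_best"]
    good = rng.random() < p_good
    if c["valence"] > 0:   # rewarding face: win money or nothing
        return 1 if good else 0
    else:                  # punishing face: keep balance or lose money
        return 0 if good else -1
```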
Computational Models
Several models of value computation and tracking were developed and compared to
each other in their ability to explain the neural data. Note that, unlike most behavioral
reinforcement learning models, the models in the current study aim to estimate the value of
the stimulus (often called the state value) rather than of the actions (often called the action
value). The models included models used in the reinforcement learning literature (e.g. the
Rescorla-Wagner rule; Rescorla & Wagner, 1972) and simpler models developed for this study.
Generally, the models output value on a trial-by-trial basis as a function of the rewards,
punishments or neutral outcomes received for a specific face. Because the models share this
premise, it is not surprising that they are highly correlated (see Figure 2). Nevertheless, their
specific structures and mechanisms do differ in certain aspects, and we therefore deemed it
interesting to investigate which of the models best captures the way in which the brain
encodes and keeps track of value. The five models below were chosen because they were the
least correlated models from the initial pool.
Model 1
Model 1 operationalizes value updating and tracking as summing up rewards and
punishments received over the course of the experiment for each face (for value
developments per subject see Figure 3). Neutral outcomes are disregarded, and do not count
towards the value development:
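Model 1 can thus be sketched as a running sum over outcomes, coded +1 for rewards, −1 for punishments and 0 for neutral outcomes (a minimal illustration):

```python
def model1_value(outcomes):
    """Model 1: running value of a face = cumulative sum of outcomes,
    counting rewards as +1 and punishments as -1 and leaving the value
    unchanged on neutral (0 Euro) outcomes."""
    v, trajectory = 0, []
    for r in outcomes:          # r is +1, 0 or -1 Euro on each trial
        v += r                  # a 0 outcome does not change the value
        trajectory.append(v)
    return trajectory

print(model1_value([1, 0, 1, -1, 0]))  # → [1, 1, 2, 1, 1]
```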
Model 2
In Model 2, value is again operationalized as the sum of rewards and punishments received
for a face. However, in this model neutral outcomes do have an influence on the value
development. This influence is operationalized as a form of ‘forgetting’, in which after
receiving the neutral outcome the value is updated in the direction of the initial value of a
face (i.e. in the direction of 0). For rewarding faces this means that the value of a face goes
down, while for punishing faces the value of a face rises. This computation also incorporates
our assumption that a neutral outcome for rewarding faces might be perceived as confusing
or more negative than receiving a reward. For punishing faces this omission of punishment
however might be perceived as something rewarding in itself. Perceiving the omission of
punishment as rewarding has been previously found in another study (Seymour et al., 2005 as
cited in Petrovic et al., 2008). To set the weight of confusion following neutral outcomes, a
parameter search was conducted. The parameters were set using a form of cross-validation in
which Subject 05’s data served as ‘training set’ to set the parameters. The other subjects’ data
served as a form of independent ‘testing set’. This avoided overfitting given that the data
from different subjects are independent.
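The search procedure itself is not spelled out in the report; a generic grid-search sketch of the kind of procedure described might look as follows (the grid resolution and the scoring function are assumptions; in the study, the score would be the RSA fit on the training subject’s data):

```python
def grid_search(weights_reward, weights_punish, fit_fn):
    """Illustrative grid search: evaluate every (reward-weight,
    punish-weight) pair with fit_fn and return the best-scoring pair."""
    best, best_score = None, float("-inf")
    for wr in weights_reward:
        for wp in weights_punish:
            score = fit_fn(wr, wp)
            if score > best_score:
                best, best_score = (wr, wp), score
    return best

# toy scoring function peaking at the reported medians (-0.6, 0.35)
toy_fit = lambda wr, wp: -((wr + 0.6) ** 2 + (wp - 0.35) ** 2)
steps = [i / 20 for i in range(-20, 21)]  # -1.0 ... 1.0 in steps of 0.05
print(grid_search(steps, steps, toy_fit))  # → (-0.6, 0.35)
```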
Figure 3. Value development for Model 1 per subject.

Different parameter settings for the different regions of interest (ROIs) were tested
for their optimal outcome in the analysis. Then, for each of the two parameters, the median
of the optimal values across ROIs was taken. We are aware that this method of choosing
parameters might not be optimal, as such parameters might differ between ROIs
and/or subjects. However, we decided to take the median of the parameters to make further
analysis more interpretable and to avoid overfitting to some extent. Furthermore, not enough
data was available to do an exhaustive search of the parameter space. The final median
parameter values were -0.6 for the rewarding faces and 0.35 for the punishing
faces:
$$V = \sum_{t} R(t), \qquad
R(t) =
\begin{cases}
1, & \text{if outcome for rewarding face is } 1 \text{ Euro} \\
-0.6, & \text{if outcome for rewarding face is } 0 \text{ Euro} \\
-1, & \text{if outcome for punishing face is } -1 \text{ Euro} \\
0.35, & \text{if outcome for punishing face is } 0 \text{ Euro}
\end{cases}$$
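Under the assumption that outcomes are coded as +1, 0 and −1 Euro, Model 2 can be sketched as follows, using the medians from the parameter search as default weights:

```python
def model2_value(outcomes, face_type,
                 w_reward_neutral=-0.6, w_punish_neutral=0.35):
    """Model 2: like Model 1, but a neutral (0 Euro) outcome pulls the
    value back toward 0 ('forgetting'): -0.6 for rewarding faces and
    +0.35 for punishing faces (medians from the parameter search)."""
    v, trajectory = 0.0, []
    for r in outcomes:
        if r == 1:              # reward received
            v += 1
        elif r == -1:           # punishment received
            v -= 1
        else:                   # neutral outcome: update toward 0
            v += w_reward_neutral if face_type == "reward" else w_punish_neutral
        trajectory.append(v)
    return trajectory
```

For a rewarding face, for example, the outcome sequence [1, 0, 1] yields the trajectory [1.0, 0.4, 1.4]: the neutral outcome on the second trial lowers the value by 0.6.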
Equation 2. Equation for Model 2, where t = trial and R(t) = the outcome added to or
subtracted from the face value at trial t.

Model 3
In Model 3, the number of trials since one last saw the current face has an influence
on the weight of the new reward or
punishment one receives for that face. This model is based on the idea that not having seen a
face for a longer time might slow down the value learning for that face (i.e. leads to
forgetting, flattens the slope of value development) which is operationalized through a lower
weight assigned to the current reward or punishment received. Having seen the same face in
subsequent trials might lead to a stronger learning of the contingency of that face (i.e. the
value); the reward or punishment might therefore have a higher weight:
$$V = \sum_{t} \frac{1}{t_s} \, R(t)$$

where $t_s$ is the number of trials since the current face was last seen.
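Assuming $t_s$ is counted in trials and set to 1 on a face’s first presentation (a detail the report does not specify), Model 3 can be sketched as:

```python
def model3_value(trials, face):
    """Model 3: each outcome for `face` is weighted by 1/t_s, where
    t_s is the number of trials since that face was last seen
    (1 if shown on consecutive trials, or on first presentation).
    trials: chronological list of (face_id, outcome) pairs."""
    v, last_seen, trajectory = 0.0, None, []
    for t, (f, r) in enumerate(trials):
        if f != face:
            continue
        t_s = 1 if last_seen is None else t - last_seen
        v += r / t_s            # long gaps shrink the update weight
        last_seen = t
        trajectory.append(v)
    return trajectory

# face A seen on trials 0, 1 and 3: update weights 1, 1 and 1/2
print(model3_value([("A", 1), ("A", 1), ("B", -1), ("A", 1)], "A"))
# → [1.0, 2.0, 2.5]
```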