The Role of Facial Landmarks in the Emotional Perception of Facial Expressions

Michelle Kühn, University of Amsterdam

Abstract

Using the muscle movements of the face, so-called action units, humans can produce a variety of different facial expressions and use these to communicate emotional states to each other. While the production of these expressions can be described by combinations of such action units, these features are not directly available to observers perceiving facial expressions. It is proposed that, instead, morphological features of the face, called facial landmarks, can serve as a representational information space for decoding the meaning of facial expressions by observing how the distances between them change as the face dynamically moves. The aim of this research project was to use facial landmark distances as predictive features to model the behavioural process and neural representation involved in emotion perception from facial expressions. Specific to this project is the aim to validate these theoretically driven behavioural models with functional MRI data. Using Representational Similarity Analysis (RSA), these predictive features were correlated with brain activation data to study whether distances between landmarks can serve as representational features in the process of emotion perception based on facial expressions. The results suggest that the emotional ratings of facial expressions given by participants can be correctly predicted from landmark distances with high accuracy. However, the results from analysing the functional imaging data suggest that the mental representations underlying this perception of emotions do not seem to rely on these configural features.

1. Introduction

Facial expressions are omnipresent in our everyday lives. We see others' faces, familiar or unknown, and almost involuntarily infer all kinds of information about a person, including mental states (Cunningham, Kleiner, Bülthoff, and Wallraven, 2004), emotional expressions (e.g. Calvo and Lundqvist, 2008; Calvo and Nummenmaa, 2016), affective attributes such as social judgements of dominance, attractiveness or trustworthiness (e.g. Oosterhof and Todorov, 2009) or a person's intentions or expectations (Scarantino, 2017). To make these judgements, we utilize features of the face such as complexion, movement or morphology (Ghimire and Lee, 2013). However, while these forms of social communication are commonplace, how this message is received and understood is still not entirely clear (Barrett, Adolphs, Marsella, Martinez, and Pollak, 2019). The focus of this project is the process of perceiving emotions in facial expressions.

In the simplest form, emotional expression recognition can be understood as a simple model of communication: the face as information source encodes emotions through muscle movements, which are visually transmitted to the receiver. The observer is able to decode this message and infer the meaning (Shannon and Weaver, 1949). According to the common (or classical) view of basic emotion theory, these expressions are perceived as prototypical categories and can be reliably and universally decoded by the observer (Ekman, 1999). While this view has been considerably changed to account for a more multidimensional and culturally sensitive representation of facial expressions (Crivelli and Fridlund, 2019), the basic principle that specific facial configurations represent discrete categories of emotions is still present in scientific literature (Hutto, Robertson, and Kirchhoff, 2018) and popular culture (e.g. Emojipedia.org, 2019). As the reliability and specificity of this approach have been questioned (Barrett et al., 2019) and various opposing models have been proposed (Barrett, 2017; Tamir, Thornton, Contreras, and Mitchell, 2016; Skerry and Saxe, 2015; Crivelli and Fridlund, 2019), the focus of this analysis is on one particular approach. Basic emotion theory is grounded in configurations of facial muscle movements. Facial expressions, such as smiling or frowning, are generated by moving the muscles of the face. These muscle movements can be classified as discriminative Action Units (AUs) using the Facial Action Coding System (FACS) (Ekman and Friesen, 1978) based on their appearance. Examples of action units are shown in Fig. 1.

Figure 1. Examples of action units (AUs) of the upper face and combinations between them.


Figure 2. Example of stimulus animation. In the roughly 1.3-second clip, participants saw the facial stimuli produce the specified AUs with the given intensity (noted in the Stimulus ID). The animation starts with a neutral expression and reaches the peak expression at roughly 0.6 seconds, where the AUs are displayed at maximum intensity. After the peak frame, the expression returns back to neutral, where no AUs are active. A selection of facial landmarks is annotated to show the change in distance between the neutral and peak expression, e.g., the growing distance between the points at the top and bottom lip. This annotation was not visible to participants in the experiment.

These action units are not directly observable, as the muscles are hidden below the skin; they can only be approximated using visual features (Martinez, 2017b). One approximation is the use of facial landmarks and the distances between them. Facial landmarks describe locations in the face relative to the positions of key features of the face, such as the eyes, nose or mouth. As the face dynamically moves with its muscle movements, this configuration changes and the distance between, e.g., the corners of the mouth grows shorter or longer, as shown in Fig. 2.

This approach diverges somewhat from the classical view, as it does not suppose that the features used for generating emotional facial expressions, action units, are the same ones used for the perception of them. It rather supposes these configural features, the distances between landmarks, are used to identify emotions in the face of the transmitter using backwards inference (Neth and Martinez, 2009; Martinez, 2017b).

Previous studies within the field of computer vision have used facial landmarks to successfully classify facial expressions (Alugupally, Samal, Marx, and Bhatia, 2011) and emotions (Fabian Benitez-Quiroz, Srinivasan, and Martinez, 2016; Martinez, 2017a; Tautkute, Trzcinski, and Bielski, 2018). It is proposed that if these configural features can be used by a computer vision algorithm to classify expressions, they could be a possible representation for decoding emotion from facial expressions in human cognition as well (Martinez, 2017b; Martinez, 2017a). Support for this approach comes from multivariate neuroimaging studies showing successful classification of action units from neural patterns (Srinivasan, Golomb, and Martinez, 2016). Importantly, while configural features are present in the expressive face, a "neutral" face with no action units displayed also has its own unique configuration of landmark distances. The anatomy of each face and individual differences in control of the facial muscles (Müri, 2016) lead to unique configural features. In computer vision, these neutral configural features can be classified as an AU, although none is present. In human cognition, this would translate to perceiving emotions when none are communicated (Martinez, 2017b) and potentially offers an explanation for why social judgements of the neutral face affect emotion perception as well (Oosterhof and Todorov, 2009; Neth and Martinez, 2009).

Based on these results from modelling and neuroimaging experiments, the question is whether features other than action units, which are implicitly thought to be the building blocks of emotion perception (Jack and Schyns, 2017; Ekman, 1999), offer an alternative account. To test this, an analysis was conducted to combine behavioural and neuroimaging data using techniques from computer vision, psychology and neuroscience. The value of this approach lies in using the results from the behavioural experiment as a theoretical basis for the neuroimaging analysis. Whereas behavioural data can give insight into the observable mechanisms of cognitive functions, multiple behavioural models offer possible accounts. In order to confirm the underlying information processing, using neuroimaging data is vital. However, decoding the representation of information in the brain using multivariate neuroimaging analyses is far from straightforward. While inferences about mental representations are difficult to make, connecting decoding models to behavioural ones is a way to provide evidence for functional processing (Ritchie, Kaplan, and Klein, 2019). Taking an approach from the field of psychophysics, the aim of this analysis is to define a relationship which relates the features of physical stimuli to an observed effect in the behavioural data (Jack and Schyns, 2017). Then, these results are correlated with neuroimaging data to find a mental state that links the observed stimuli to the brain activity pattern, using multivariate analyses (Kriegeskorte and Kievit, 2013; Popal, Wang, and Olson, 2019).

2. Methods

2.1 Materials

The basis of both the behavioural and the fMRI experiment were short clips of dynamic facial expressions. The stimuli were acquired by Yu, Garrod, and Schyns (2012), who developed a toolbox dedicated to adaptably synthesizing facial expression animations. Based on real-life photographs, the authors used computer graphics techniques to create 3D models of facial expressions which can be flexibly manipulated to generate combinations of muscle movements (AUs). Using a subset of this dataset, 50 unique facial identities were selected. 22 AUs were activated with random combinations and intensities, resulting in 848 stimulus videos of 1.25 seconds (30 frames), with the AU amplitude peak at 15 frames (0.6 seconds), as shown in Fig. 2. The occurrence and intensity of AUs was counterbalanced for each facial identity, so that each action unit was presented at each intensity equally often in every face.

For the behavioural experiment, participants were presented with the stimuli on a screen and asked to rate each of the faces on the perceived emotion (fear, anger, sadness, happiness, surprise, disgust, or none at all) and the intensity (if any) of the chosen emotion (low to high), as well as on social judgements (trustworthiness, dominance, attraction), using a mouse click. Instructions were presented in Dutch. Participants rated all the facial stimuli in four sessions and received financial compensation per hour.

The design of the behavioural study follows the psychophysics approach (Jack and Schyns, 2017). By generating random but counterbalanced facial movements (specific AU combinations), no prior hypothesis about the features or their interaction is made. There is no fixed assumption on which action units elicit which emotional rating. In turn, this means the paradigm tests the emotional perception of the facial stimuli, not the correct recognition of emotions (Barrett, 2017). The six basic emotions were chosen in line with earlier studies for comparability (Ekman, 2016; Keltner, Sauter, Tracy, and Cowen, 2019) and because the facial stimuli were originally validated using the six basic emotions (Yu et al., 2012).

2.2 MRI acquisition

Functional neuroimaging (fMRI) data was acquired using a Philips 7T MRI scanner, an 8-channel multi-transmit RF coil and a 32-channel receive coil. Functional T2*-weighted sequences were acquired using 3D gradient-echo, echo planar imaging (TR = 1.317 s, TE = 17 ms, FOV = 175 x 200 x 200 mm) and a voxel size of 1.8 x 1.786 x 1.786 mm. 348 volumes were acquired for each run. For the expressive task, each participant completed three sessions of eight runs. During one additional run, subjects only viewed neutral faces.

The task in the scanner was similar to the behavioural experiment. Participants were presented with the facial stimuli on a computer screen and prompted to rate faces on social judgements using eight buttons on the provided button boxes ranging from least to most trustworthy/dominant/attractive. Importantly, for the data analysis, only the behavioural ratings were used, as participants did not rate every single trial they saw due to time constraints.

As an additional functional localizer task, participants were presented with a series of varying images of objects, such as faces, houses or body parts. These stimuli were presented in two different study designs, grouped in either an event-related or a blocked design. Participants were given a 1-back task and instructed to press any button on the provided button boxes when an identical image appeared twice in a row. At the end of each session, participants received financial compensation per hour.

In total, 13 participants (6 male, 7 female) completed the experiment. Twelve participants were included in the behavioural modeling and five participants in the RSA analysis, due to time constraints.

2.3 Procedure

Participants completed the behavioural experiment first. They rated the short clips of facial expressions in four separate sessions. Subjects rated images of neutral faces, with no expression activation, in the last session. After the behavioural experiment, participants proceeded with the fMRI experiment for six sessions: three for the presentation of the expressive and neutral faces, two mapper sessions including the functional face localizer task, and one for a reinforcement learning task which was part of a different experiment.

The first step of the data analysis was done on the behavioural data. All data analysis was performed in Python. To extract facial landmarks, two different approaches were used. Since the facial stimuli were generated using a 3D model, the facial landmarks can be extracted from this 3D mesh as described in Yu et al. (2012). To test more practical approaches, a different feature space contained landmarks extracted using commercial computer vision APIs from Google Vision and Microsoft Azure. The benefit of extracting the vertex-based landmarks is the accuracy of the landmark locations, whereas the computer vision APIs are error prone (Appendix 1). However, if the results are comparable between the vertex-based and Google landmarks, the latter would offer an attractive alternative to be used on natural stimuli such as photographs, or other pre-existing stimuli, in further experiments. Based on these landmark coordinates, the Euclidean distance between each of the landmark points was calculated using the SciPy and NumPy libraries (Virtanen et al., 2020; Walt, Colbert, and Varoquaux, 2011).
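As an illustration, a minimal sketch of this step, assuming the 50 landmarks of one stimulus are available as an array of (x, y) coordinates (the array name and the random values are purely illustrative):

```python
import numpy as np
from scipy.spatial.distance import pdist

# Hypothetical landmark array for one stimulus: 50 rows of (x, y) coordinates.
landmarks = np.random.default_rng(0).random((50, 2))

# All pairwise Euclidean distances between landmarks:
# 50 * 49 / 2 = 1225 features per stimulus, matching the count reported below.
distances = pdist(landmarks, metric="euclidean")
print(distances.shape)  # (1225,)
```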

Fabian Benitez-Quiroz et al. (2016) proposed a second measure to quantify the change in landmark distances. For their algorithm to detect facial expressions and action units, they computed the Delaunay triangulation for each face and extracted the angles within these triangles, as shown in Fig. 3.

Figure 3. Original features used by Fabian Benitez-Quiroz, Srinivasan, and Martinez (2016), showing the landmark annotation in the left panel (a) and the Delaunay triangulation and angle computation on the right (b).

Given the success of their model, a similar approach was taken in this analysis. However, Delaunay triangulation optimizes the formation of triangles by maximizing the minimum angle to avoid skinny triangles. Since every facial stimulus has the same number of landmarks but a unique configuration, this leads to a unique triangulation as well; the same triangle between three landmarks in one face may be suboptimal (e.g., too small) in another face, leading to a different configuration. To keep the computed features comparable between the stimuli, the triangles were instead formed between two landmarks and a reference point, the tip of the nose.

As shown in Fig. 4, the Euclidean distance and the angle γ describe the same change in the distance between two landmark points in different feature spaces. Using the 50 landmarks from the vertex extraction, the number of features was 1225 Euclidean distances and 3528 angles for each facial stimulus. Landmarks were extracted from either the expressive face (amplitude at the time of the "peak" frame, 0.6 s) or the neutral face (first frame at 0 s) (see Fig. 2). Importantly, the amplitude of the AUs in the neutral face was zero, which means no emotion was expressed in these stimuli.
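The exact angle computation is not spelled out beyond Fig. 4; the sketch below assumes that, for every pair of non-reference landmarks, a triangle is formed with the nose tip and all three interior angles are kept, which reproduces the reported count of 3 × 1176 = 3528 angles for 50 landmarks (variable names are illustrative):

```python
import numpy as np
from itertools import combinations

def interior_angles(a, b, c):
    """Return the three interior angles (radians) of the triangle a, b, c."""
    ab, ac, bc = b - a, c - a, c - b
    alpha = np.arccos(np.dot(ab, ac) / (np.linalg.norm(ab) * np.linalg.norm(ac)))
    beta = np.arccos(np.dot(-ab, bc) / (np.linalg.norm(ab) * np.linalg.norm(bc)))
    gamma = np.pi - alpha - beta  # the interior angles sum to pi
    return alpha, beta, gamma

# Hypothetical landmarks; index 0 is assumed to be the nose tip (reference point).
landmarks = np.random.default_rng(0).random((50, 2))
reference, others = landmarks[0], landmarks[1:]

angles = []
for i, j in combinations(range(len(others)), 2):
    angles.extend(interior_angles(reference, others[i], others[j]))

print(len(angles))  # 3 * C(49, 2) = 3528 angle features
```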

2.4 Behavioural Modeling

The computed distance features were determined for every facial stimulus and used as training data in a logistic regression classifier from the Scikit-learn library (Pedregosa et al., 2011), to investigate whether subject ratings from the behavioural experiment can be reliably predicted. Importantly, the number of classes for the prediction was restricted to the six basic emotions (anger, disgust, fear, happiness, sadness, surprise), meaning all trials rated "none" by the participants were excluded.

Prior to training, the features were standardized by de-meaning them and scaling them to unit variance. The training itself was done in a within-subject design, where one model is trained for each subject, with the distance features of the face as independent variables and the subject's ratings as the dependent variable. To prevent overfitting, a k-fold split (k = 10) was implemented for every subject. The models were evaluated using the area under the receiver operating characteristic curve (ROC-AUC) metric. This metric was chosen due to the distribution of emotional ratings.
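A condensed sketch of this within-subject pipeline in scikit-learn, assuming X holds the distance features per trial and y the subject's emotion ratings; the stratified splitting and the one-vs-rest AUC variant are assumptions, and the random data only stand in for the real features:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical per-subject data: one row of 1225 distance features per trial,
# labelled with one of the six basic emotions ("none" trials removed).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 1225))
y = rng.integers(0, 6, size=600)  # 0..5 encode the six emotions

# Standardization (de-meaning, unit variance) sits inside the pipeline so it
# is re-estimated on each training fold only.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 10-fold cross-validation, scored with a (one-vs-rest) ROC-AUC metric.
scores = cross_val_score(
    clf, X, y,
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
    scoring="roc_auc_ovr",
)
print(scores.mean())
```

Placing the scaler inside the pipeline keeps the standardization parameters estimated on the training folds, avoiding leakage into the held-out trials.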

Figure 5. Bar graph ("Distribution of ratings for all subjects") showing the frequency (as a percentage of expressive ratings) of each of the six basic emotions as perceived by the participants. Even though the generated AUs were counterbalanced to occur equally often in all stimuli, participants rated facial expressions most often as displaying happiness and least often as fear or disgust.

As shown in Fig. 5, not all emotions were perceived equally often, which leads to an imbalance in the number of observations belonging to each class. This imbalance cannot be adequately captured with traditional accuracy scores, but is taken into account by the ROC-AUC metric. The AUC scores were obtained after every fit of the model and subsequently averaged over all k folds and over all subjects, which resulted in a single, cross-validated score for each feature space. Based on the results of the behavioural modeling, the features with the highest accuracy were used in the following neuroimaging analysis.
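To make the imbalance point concrete, a small hedged sketch of a per-class (one-vs-rest) AUC computation on an imbalanced set of ratings; the labels, probabilities and class proportions are invented for illustration:

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

emotions = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

# Hypothetical held-out fold: true labels and predicted class probabilities
# (columns in the same order as `emotions`), with an uneven class distribution.
rng = np.random.default_rng(1)
y_true = rng.choice(emotions, size=200, p=[0.2, 0.05, 0.05, 0.35, 0.15, 0.2])
y_proba = rng.dirichlet(np.ones(6), size=200)

# One-vs-rest AUC per emotion: unlike plain accuracy, this score is not
# inflated by the base rate of the majority class.
y_bin = label_binarize(y_true, classes=emotions)
for k, emo in enumerate(emotions):
    if 0 < y_bin[:, k].sum() < len(y_bin):  # need both positives and negatives
        print(f"{emo}: AUC = {roc_auc_score(y_bin[:, k], y_proba[:, k]):.2f}")
```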

2.5 fMRI Preprocessing

Results included in this manuscript come from preprocessing performed using FMRIPREP version stable (Esteban et al., 2019; fMRIPrep available from: 10.5281/zenodo.852659), a Nipype (Krzysztof Gorgolewski et al., 2011; KJ Gorgolewski et al., 2017) based tool. Each T1w (T1-weighted) volume was corrected for INU (intensity non-uniformity) using N4BiasFieldCorrection v2.1.0 (Tustison et al., 2010) and skull-stripped using antsBrainExtraction.sh v2.1.0 (using the OASIS template). Brain surfaces were reconstructed using recon-all from FreeSurfer v6.0.1 (Dale, Fischl, and Sereno, 1999), and the brain mask estimated previously was refined with a custom variation of the method to reconcile ANTs-derived and FreeSurfer-derived segmentations of the cortical gray-matter of Mindboggle (Klein et al., 2017). Spatial normalization to the ICBM 152 Nonlinear Asymmetrical template version 2009c (Fonov, Evans, Mckinstry, Almli, and Collins, 2009) was performed through nonlinear registration with the antsRegistration tool of ANTs v2.1.0 (Avants, Anderson, Grossman, and Gee, 2008), using brain-extracted versions of both the T1w volume and the template. Brain tissue segmentation of cerebrospinal fluid (CSF), white matter (WM) and gray matter (GM) was performed on the brain-extracted T1w using fast (Zhang, Brady, and Smith, 2001) (FSL v5.0.9).

Figure 4. Different feature spaces used for the behavioural modeling. The two panels on the left show an example stimulus with a few landmarks and the triangles between these points. Using this feature space, the distance between two points (i.e., the tip of the nose and the bottom of the chin) is described by the angle γ. In the two right panels, pairwise distances between the landmarks are shown. In this case, the distance between the two points on the nose and chin is described by the Euclidean distance.

Functional data was motion corrected using mcflirt (FSL v5.0.9, Jenkinson, Bannister, Brady, and Smith, 2002). Distortion correction was performed using an implementation of the TOPUP technique (Andersson, Skare, and Ashburner, 2003) using 3dQwarp (AFNI v16.2.07, Cox, 1996). This was followed by co-registration to the corresponding T1w using boundary-based registration (Greve and Fischl, 2009) with six degrees of freedom, using bbregister (FreeSurfer v6.0.1). Motion-correcting transformations, the field-distortion-correcting warp, the BOLD-to-T1w transformation and the T1w-to-template (MNI) warp were concatenated and applied in a single step using antsApplyTransforms (ANTs v2.1.0) using Lanczos interpolation.

Physiological noise regressors were extracted by applying CompCor (Behzadi, Restom, Liau, and Liu, 2007). Principal components were estimated for the two CompCor variants: temporal (tCompCor) and anatomical (aCompCor). A mask to exclude signal with cortical origin was obtained by eroding the brain mask, ensuring it only contained subcortical structures. Six tCompCor components were then calculated including only the top 5% variable voxels within that subcortical mask. For aCompCor, six components were calculated within the intersection of the subcortical mask and the union of CSF and WM masks calculated in T1w space, after their projection to the native space of each functional run. Framewise displacement (Power et al., 2013) was calculated for each functional run using the implementation of Nipype.

Many internal operations of FMRIPREP use Nilearn (Abraham et al., 2014), principally within the BOLD-processing workflow. For more details of the pipeline see fmriprep.readthedocs.io/en/stable/workflows.html.

2.6 Single trial pattern estimation

To obtain the neural patterns for each facial stimulus, single-trial patterns had to be estimated. This was done by using a general linear model (GLM) with one regressor, a convolved hemodynamic response function (HRF), for each trial. However, while the single trials were spaced with an inter-stimulus interval of five seconds, the temporal duration of the HRF is much longer than that, leading to variable and correlated trial regressors (Soch, Allefeld, and Haynes, 2020; Mumford, Turner, Ashby, and Poldrack, 2012). This temporal correlation is problematic for representational similarity analysis (RSA), as correlated trials may be more similar to each other and thus have a greater effect on the fMRI pattern similarity than the experimental variable (Alink, Walther, Krugliak, van den Bosch, and Kriegeskorte, 2015). To account for this "pattern drift", the trial regressors need to be de-correlated.

As a pre-processing step, the fMRI data were high-pass filtered with a cutoff of 0.1 Hz using a discrete cosine transform set, and afterwards standardized. The noise regressors, including fMRIPrep confounds as well as physiology-derived RETROICOR regressors (Kasper et al., 2017), were column-wise concatenated and pre-processed the same way as the fMRI data. The filtered noise regressors were PCA-transformed, where only the first 50 components that explain most of the variance in the regressor set were retained. To find the optimal number of components, a k-fold cross-validated model was fit to the fMRI data using ordinary least squares (OLS). This procedure was repeated with an increasing number of components, storing the resulting r² for each fold, number of components and each voxel. Across folds, the r² was averaged. Based on this selection procedure, the number of components with the highest average r² was chosen. For each voxel separately, this optimal number of noise components was regressed out, again using OLS.
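A schematic sketch of this component-selection and denoising step, assuming Y is the filtered and standardized fMRI data (time × voxels) and confounds the filtered noise-regressor matrix; the candidate grid, the number of splits and the per-voxel aggregation are illustrative choices, not the exact settings used:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
Y = rng.normal(size=(348, 2000))          # filtered, standardized fMRI data
confounds = rng.normal(size=(348, 80))    # filtered noise regressors

# Keep the first 50 principal components of the noise regressors.
pcs = PCA(n_components=50).fit_transform(confounds)

def cv_r2(X, Y, n_splits=5):
    """Fold-averaged cross-validated r^2 per voxel for an OLS fit of Y on X."""
    r2 = np.zeros(Y.shape[1])
    for train, test in KFold(n_splits=n_splits).split(X):
        beta, *_ = np.linalg.lstsq(X[train], Y[train], rcond=None)
        resid = Y[test] - X[test] @ beta
        r2 += 1 - (resid ** 2).sum(axis=0) / ((Y[test] - Y[test].mean(axis=0)) ** 2).sum(axis=0)
    return r2 / n_splits

# Refit with an increasing number of components and keep, per voxel,
# the count with the highest fold-averaged r^2.
counts = np.arange(5, 51, 5)
scores = np.stack([cv_r2(pcs[:, :n], Y) for n in counts])
best_n = counts[scores.argmax(axis=0)]

# Regress the selected components out of each voxel (shown for voxel 0).
v = 0
X_v = pcs[:, :best_n[v]]
beta_v, *_ = np.linalg.lstsq(X_v, Y[:, [v]], rcond=None)
y_clean = Y[:, v] - (X_v @ beta_v).ravel()
```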

Following this, the neural patterns were estimated using a design matrix with one regressor for each trial including an expressive face (a "least-squares all" design; see Mumford et al., 2012) and one regressor for all other events that were not of interest (i.e., participant ratings of the stimuli). Importantly, this design matrix was filtered and standardized the same way as the fMRI data and noise regressors.

The resulting patterns were pre-whitened by multiplying the β-patterns resulting from the GLM with the square root of the covariance matrix of the design matrix. This process was repeated for every trial and the extracted patterns were saved in the N × K matrix shown in Fig. 7. The patterns were then masked for the given ROI and pairwise distances were computed as described in the following sections.
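A hedged sketch of this single-trial ("least-squares all") estimation for one run, using nilearn's design-matrix helper; the onsets, ROI size and the restriction of the whitening step to the trial regressors are illustrative assumptions:

```python
import numpy as np
import pandas as pd
from nilearn.glm.first_level import make_first_level_design_matrix
from scipy.linalg import sqrtm

# One hypothetical run: 348 volumes at TR = 1.317 s, one trial every 5 s.
tr, n_vols = 1.317, 348
frame_times = np.arange(n_vols) * tr
onsets = np.arange(0.0, n_vols * tr - 20, 5.0)
events = pd.DataFrame({
    "onset": onsets,
    "duration": 1.25,
    "trial_type": [f"trial_{i:03d}" for i in range(len(onsets))],
})

# Least-squares-all design: one HRF-convolved regressor per trial.
dm = make_first_level_design_matrix(frame_times, events,
                                    hrf_model="glover", drift_model=None)
X = dm.values

# Hypothetical (already filtered and standardized) ROI data: time x voxels.
Y = np.random.default_rng(0).normal(size=(n_vols, 400))

# Ordinary least-squares betas: one pattern per regressor.
betas, *_ = np.linalg.lstsq(X, Y, rcond=None)

# "Pre-whitening" as described above: multiply the trial-wise beta patterns
# by the matrix square root of the covariance of the trial regressors.
trial_idx = [i for i, c in enumerate(dm.columns) if c.startswith("trial_")]
W = np.real(sqrtm(np.cov(X[:, trial_idx], rowvar=False)))
patterns = W @ betas[trial_idx]   # N (trials) x K (voxels) pattern matrix
```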

2.7 Defining Regions of Interest

Regions of interest (ROIs) were chosen on the basis of previous literature. All of the anatomical regions were extracted from the cortical parcellation (Fischl et al., 2004) included in the FreeSurfer package (surfer.nmr.mgh.harvard.edu), for each of the five subjects in T1-weighted space.

Firstly, a functional region responsive to faces versus other types of stimuli was defined using the data from the face localizer task (Fig. 6). This fusiform face area (FFA), located in the inferior temporal cortex (IT), is thought to be specialized for facial recognition and the representation of facial identity (Kanwisher, McDermott, and Chun, 1997; Carlin and Kriegeskorte, 2017), and to be involved in the perception of facial expressions (Ganel, Valyear, Goshen-Gottstein, and Goodale, 2005). To exclude voxel activity outside of the IT, this functional FFA mask was then intersected with the anatomical mask of the fusiform gyrus.
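A short sketch of this ROI construction with Nilearn, assuming the localizer z-map and the anatomical fusiform mask are available as NIfTI files in the same space (the file names are placeholders):

```python
from nilearn import image

# Hypothetical inputs: a z-map from the face > other localizer contrast and a
# binary anatomical mask of the fusiform gyrus for one subject.
zmap = image.load_img("sub-01_face_localizer_zmap.nii.gz")
fusiform = image.load_img("sub-01_fusiform_mask.nii.gz")

# Threshold the localizer at z > 5 (the threshold used in Fig. 6) and keep
# only voxels that also fall inside the anatomical fusiform mask.
ffa_roi = image.math_img("(zmap > 5) & (anat > 0)", zmap=zmap, anat=fusiform)
ffa_roi.to_filename("sub-01_ffa_roi.nii.gz")
```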

Figure 6. The upper panel ("Functional FFA") shows the activation for the face localizer task with the contrast face > other; z-values were thresholded at 5. The lower panel ("Fusiform Gyrus") shows the fusiform cortex as estimated by the cortical parcellation. For the ROI selection, the functional mask was intersected with the anatomical mask to exclude activity outside the fusiform cortex.

Another ROI used for the RSA analysis was the uni- and bilateral amygdala, which is known to be involved in processes of emotional memory, but has been shown to activate in response to facial expressions as well (Wang et al., 2017; Adolphs, 2008), especially in response to fear (Costafreda, Brammer, David, and Fu, 2008). The orbital frontal gyrus has been shown to respond to angry faces in particular (Blair and Cipolotti, 2000; Vytal and Hamann, 2010; Beyer, Münte, Göttlich, and Krämer, 2015) and the superior temporal sulcus (STS) has been shown to activate to facial expressions (Fox, Moon, Iaria, and Barton, 2009) and parts of facial expressions (Greening, Mitchell, and Smith, 2018) as well as action units (Martinez, 2017b). The cerebrospinal fluid (CSF) as well as the lateral ventricles were chosen as control ROIs, as no voxel activity would be expected in these regions. Further regions are theorized to activate to discrete emotional categories (for a critical review see Lindquist, Wager, Kober, Bliss-Moreau, and Barrett, 2012), but these exceeded the scope of the project.

2.8 Representational Similarity Analysis

In classical, univariate analysis of fMRI data, the effect of experimental conditions is tested in a given ROI. Crucially, this effect is tested on individual voxels or on averaged voxel activity within a cluster (Poldrack, Mumford, and Nichols, 2011). While this type of analysis was pioneering in the field of cognitive neuroscience, its applications are limited. Univariate analysis can show whether a given selection of voxels shows higher overall activity compared to another region, or whether the given region shows stronger activity to one experimental condition versus another (Haxby, 2012; Popal et al., 2019). What univariate analyses cannot show is whether the voxels in a given region encode information differently, or to a different degree (Davis et al., 2014), or whether one region is functionally specific to the given condition (Haxby, 2012). In other words, distributed or multidimensional information cannot be studied easily.

Figure 7. Pipeline from experimental stimuli to a Representational Dissimilarity Matrix (RDM). MVP analyses assume that different conditions, in this case single trials of facial expressions (N), are represented as distinct patterns of voxel activity (K) within the brain. Using metrics such as Euclidean distance, cosine similarity or 1 minus the correlation coefficient (r), these patterns can be compared in their (dis)similarity. By computing all pairwise distances between the trials, a square N × N dissimilarity matrix (RDM) is constructed for every run. The values of this matrix are in dissimilarity space and thus not dependent on the original unit of the patterns. Image adapted from Nili et al. (2014).

To account for these shortcomings, multivariate pattern analysis (MVPA) is more and more frequently employed. The assumption of MVP analysis is that experimental variables, such as distinct conditions or single trials, are represented as neural patterns within the brain (Lewis-Peacock and Norman, 2014). If these patterns can be used as independent variables to predict experimental conditions, then they should encode some form of information about these stimuli.

However, limitations apply to this technique as well. While MVPA is good at detecting where stimuli are represented in the brain, it fails to inform how the data are represented (Popal et al., 2019; Nili et al., 2014). In regular classification applications, one can tell whether a stimulus is correctly classified or not, but it is much harder to determine what part of the stimulus drives this classification (Ritchie et al., 2019). Further, real-life stimuli often contain continuous and multidimensional information, which MVPA struggles to handle (Diedrichsen and Kriegeskorte, 2017; Dimsdale-Zucker and Ranganath, 2018). Lastly, when information can be decoded from multiple regions of the brain, teasing apart their functional specificity is a challenge for MVP analysis (Popal et al., 2019). Especially since there is a tendency to report only the best fitting model, information about how regions may encode multiple stimuli is discarded (Haxby, 2012; Dubois, de Berker, and Tsao, 2015). While there are solutions for these shortcomings, MVP analyses based on statistical classification will always need a large amount of data and take a long time to run (Carlin and Kriegeskorte, 2017).

Thus, a different flavor of MVPA involves comparing the multidimensional spaces of voxel activation data with psychological spaces without using linear classifiers (or possibly without any classifier) as an approximation for cognitive processes (Ritchie et al., 2019). This type of analysis, called representational similarity analysis (RSA), is ideal for studying continuous and multidimensional links between stimuli and their mental representations (Kriegeskorte, Mur, and Bandettini, 2008; Dimsdale-Zucker and Ranganath, 2018).

RSA lends itself to studying emotional facial expressions, as it ties into the concept of a face-space (Valentine et al., 2001). Face-space is a multidimensional similarity space, where faces are represented as coordinates within that space. The dimensions of this space are not fixed; they can be features such as the age or shape of a face, or specifics such as the distance between the eyes (Valentine, Lewis, and Hills, 2016). In this case the features would be different distance measures between landmarks. The two concepts are conceptually linked in that both RSA and face-space view features and neural activity as coordinates in a space. The following section will describe how RSA was employed in studying facial expressions.

Using RSA to compare single trials of facial expression perception

Similarly to the MVP analyses discussed previously, RSA also represents brain activity as neural patterns. As shown in Fig. 7, these patterns can be represented as N × K matrices, where N refers to the number of observations, in this case single trials of facial stimuli, and K refers to the number of voxels and their estimated activity in the particular region of interest. Using distance measures such as the Euclidean distance or 1 minus the correlation coefficient (Popal et al., 2019), these patterns can be compared in their (dis)similarity. The result of this comparison is a representational dissimilarity matrix (RDM) showing the pairwise distances between the stimuli and representing them in an N × N dimensional space. This square N × N matrix is symmetric and has a diagonal equal to zero, as the distance between a stimulus and itself is zero. The advantage of RSA is that this neural RDM can be correlated with any other matrix regardless of what the original feature space was, since all features are transformed into similarity measures (Kriegeskorte et al., 2008). Thus, RSA-type analyses compare the structure of neural activation patterns as opposed to the simple discriminability between classes of patterns (Kriegeskorte and Kievit, 2013).

To construct the neural RDMs, the brain data was masked with the specified ROI using the Nilearn package in Python (described in Abraham et al., 2014). Then, a design matrix was created with one HRF regressor for every single trial within a run. After fitting a first-level model (i.e., a "least-squares all" design; Mumford et al., 2012), the resulting patterns were stacked for each of the eight runs over three sessions, resulting in a large pattern matrix consisting of 24 runs with one row of voxel activity for each of the 848 stimuli. Subsequently, the pairwise distances between the trials in the pattern matrix were computed using the column-wise demeaned cosine distance, which is equivalent to the 1 − r distance.

This process was repeated for the distance features, where the trials of the N × K matrix from the behavioural experiment were compared using the Euclidean distance, where K now refers to the number of pairwise distances between landmarks, not voxel activity. Since the RDMs are symmetric, the actual correlations are performed on the array extracted from the upper (or lower) triangle of the matrix, called the representational dissimilarity vector (RDV). The neural RDV and feature RDVs were correlated using Spearman's r.
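A compact sketch of this comparison, assuming neural holds the masked trial patterns and features the landmark-distance features for the same trials (sizes and data are illustrative; pdist directly returns the condensed RDV, i.e., the upper triangle of the RDM):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
neural = rng.normal(size=(200, 400))      # trials x voxels (masked ROI patterns)
features = rng.normal(size=(200, 1225))   # trials x landmark distances

# Neural RDV: column-wise demeaned cosine distance between trials
# (as stated above, equivalent to the 1 - r distance).
neural_rdv = pdist(neural - neural.mean(axis=0), metric="cosine")

# Feature RDV: Euclidean distance between trials in landmark-distance space.
feature_rdv = pdist(features, metric="euclidean")

# pdist already returns the condensed vector (the RDV); squareform from
# scipy.spatial.distance would rebuild the full symmetric N x N RDM.
rho, p = spearmanr(neural_rdv, feature_rdv)
print(rho, p)
```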

Reweighting RDMs

By simply correlating the neural and feature RDVs, it is assumed that every feature contributes equally to the trialwise dissimilarity in the neural pattern. However, given how large the feature space is (with 1225 distances), it is questionable whether all of them, if any, are utilized in facial expression perception. One way to account for this is to extract every feature column from the N × K feature matrix, representing one distance feature across all trials (i.e., the distance between the eyes across all 848 trials), and to use this vector to construct an RDM. This results in a single RDV for each feature. These RDVs can now be used as independent variables in a general linear model (GLM) to assign a weight to each feature RDV using a least-squares fit. Importantly, using such a large number of features within a single model poses a risk of overfitting, so similarly to the behavioural model, these reweighted features need to be cross-validated.

Firstly, a design matrix was created with each feature RDV in one column, plus one column with a constant of 1 as an intercept. This design matrix, as well as the neural RDV, was split into train and test sets using 10-fold cross-validation. For each split, a non-negative least-squares function was fitted on the RDV features of the train set and the neural RDV data. Using the resulting β-values, the RDV features of the test set of the design matrix were reweighted and stored in a pre-allocated array. This process was repeated for all k folds until the pre-allocated array was filled (Lönn et al., 2017). This way of cross-validating ensures that the constructed RDV results from fits on different splits of the original features. This cross-validated RDV was then correlated with the neural RDV of the given ROI and subject.
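A hedged sketch of this reweighting step with a non-negative least-squares fit from SciPy; the RDV sizes are invented and the shuffling of pairs across folds is an assumption:

```python
import numpy as np
from scipy.optimize import nnls
from scipy.stats import spearmanr
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)

# Hypothetical RDVs: one column per distance feature (each column is the RDV
# built from that single feature across trials) plus the neural RDV of an ROI.
n_pairs, n_features = 2000, 50            # illustrative sizes
feature_rdvs = rng.normal(size=(n_pairs, n_features)) ** 2
neural_rdv = rng.normal(size=n_pairs) ** 2

# Design matrix: feature RDVs plus a constant intercept column.
X = np.column_stack([feature_rdvs, np.ones(n_pairs)])

# 10-fold cross-validated reweighting: fit non-negative least squares on the
# training pairs, then reweight the held-out pairs with the fitted betas.
reweighted = np.empty(n_pairs)
for train, test in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    betas, _ = nnls(X[train], neural_rdv[train])
    reweighted[test] = X[test] @ betas

# The cross-validated, reweighted RDV is then compared to the neural RDV.
rho, p = spearmanr(reweighted, neural_rdv)
print(rho, p)
```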

3. Results

3.1 Behavioural modeling

The results of the behavioural experiment are presented in Fig. 8. As expected, better performance is obtained from expressive faces (peak frame condition), but surprisingly, using only the neutral face, classification of emotional ratings is still above chance for most subjects for happiness and anger. The results also show that the Euclidean distance features outperform the features based on the angle between landmark points. The scores of combinations of feature spaces are mostly driven by the accuracy of the Euclidean distance features. The results of the behavioural experiment using the landmarks detected by the Google Vision API are reported in Appendix 5.

3.2 RSA

The Spearman correlations and p-values between the neural RDMs and feature RDMs are shown in Fig. 9. While it was expected that facial identity (corresponding to the neutral face condition) is represented in the FFA (Kanwisher et al., 1997) and facial expressions (peak face condition) are represented in the STS (Fox et al., 2009; Martinez, 2017b), none of these assumptions could be corroborated by the results, as the correlations are close to zero and the associated p-values do not survive (Bonferroni) multiple-comparison correction. Whereas some particular ROIs show a (close to significant) effect, the ventricles, which were chosen as a control region, show a similar effect, indicating that the effect in the ROI is likely a false positive.


Figure 8. Results of the behavioural modeling experiment ("AUC Scores Per Subject, 10-fold averaged, vertex features"). The different panels show the accuracies for each combination of feature space (expression: peak frame or neutral; feature: Euclidean distance or angle; and their combinations: neutral/distance, neutral/angles, peak/angles, peak/distance, peak+neutral/distance, peak/distance+angles). The y-axis shows the AUC score for the given class (emotion) on the x-axis. Each colored point represents the 10-fold averaged AUC score for one subject. The dashed grey line represents the average AUC for all subjects and the dotted red line indicates the chance level of an item belonging to this class. The highest AUC scores were obtained using the Euclidean distance features from the expressive face.

Figure 9. Spearman's ρ (a) and p-values (b) for all subjects, obtained by correlating the neural RDM for each region of interest with the feature RDM of the given feature space.

3.3 Reweighting

Given the extent of the distance features, it was presumed that reweighting the features would improve the correlations between the features and the neural patterns. The results of the reweighting analysis are shown in Fig. 10. Similarly to the "standard" correlation between the RDMs, the Spearman correlations of the reweighting analysis are near zero, and only a few results are significant for some of the subjects, but not for all, suggesting there is no systematic effect driving these results.

Figure 10. Spearman's ρ (a) and p-values (b) for all subjects, obtained by reweighting and cross-validating every feature (every distance between two landmark points) and correlating the reweighted RDM with the neural RDM for each region of interest.

4. Discussion

This analysis investigated whether facial landmark-based representations of facial expressions are predictive of both behavioural ratings of emotions and the representational geometry of brain patterns.

In the behavioural part of the analysis, various distance features, from the neutral and the expressive face, were used in a logistic regression classifier to predict subject ratings of the six basic emotions. The behavioural model showed that distance features based on the Euclidean distance between landmark points of the expressive face reliably predict emotional ratings with high accuracies. Lower accuracy scores, but still above chance level, were achieved by using distance features of the neutral face, with no AUs displayed. While the latter result is somewhat surprising, it is in line with the underlying theory of facial landmark distances. Martinez (2017a) has proposed that this "false-positive" error of detecting an emotion (and underlying AU) is due to unusually short or long distances between landmarks in certain faces. Behavioural experiments have shown that human participants rate shorter faces, where facial features are closer together, more frequently as sad, and longer faces as angry, even if they do not actively show an emotion (Neth and Martinez, 2009). Unusually short or long distances between landmarks may thus be classified as an emotion, whether this distance is manipulated or not. It is proposed that the brain accounts for this using contextual information when presented with real-life stimuli (Martinez, 2017a). A way to account for this in a behavioural model could be to compute the difference between the distances of the neutral face and the expressive face, as sketched below. This measure would incorporate the dynamic change from neutral to emotional expression rather than the static configuration of landmarks.
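A minimal sketch of the suggested difference measure, assuming the 1225 pairwise distances have been computed once for the neutral frame and once for the peak frame of each stimulus (all arrays are illustrative):

```python
import numpy as np

# Hypothetical distance features: one row of 1225 pairwise landmark distances
# per stimulus, computed once for the neutral frame and once for the peak frame.
rng = np.random.default_rng(0)
neutral_dists = rng.random((848, 1225))
peak_dists = neutral_dists + rng.normal(scale=0.05, size=(848, 1225))

# Proposed feature: the change in each landmark distance from neutral to peak,
# capturing the dynamic movement rather than the static configuration.
delta_dists = peak_dists - neutral_dists
```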

The next step of the analysis was to verify the findings of the behavioural model by investigating whether landmark distances also serve as representation in the brain. The results with correlations around zero and p > 0.05 do not allow such a conclusion. This contrast between the behavioural and neuroimaging results can have multiple reasons.

For one, only very few ROIs were investigated, given the limited extent of this analysis. It is possible that neural patterns do encode landmark distances, just not within the given ROIs. In a large meta-analysis, Lindquist et al. (2012) found that ROIs thought to be involved in emotional processing also show activation in a variety of other tasks, suggesting that emotion perception may emerge from multiple 'basic' psychological processes. This suggests that testing regions involved in attentional or motivational processes in a more network-based approach may lead to stronger results.

Furthermore, experimental factors have to be taken into account. While the temporal correlation between the single-trial predictors was accounted for with the de-noising procedure discussed in the methods section, the regressors are still only very short in time, and thus have a lower signal-to-noise ratio and limited statistical power, which may lead to low correlations. Considering the general structure of the analysis, landmark distances were proposed as a feature space to decode emotions from facial expressions, but this feature space was not explicitly tested against an approach based on action units. While this analysis was not possible due to time constraints, future studies should also focus on comparing classical models with alternative ones.

Another concern of multivariate investigations of mental representations is that the information which is used by a classifier to make predictions may be partially or completely inaccessible to the brain (Ritchie et al., 2019). Even if it can be shown that behaviour can be reliably predicted from landmark distances, this does not show that the information used by the classifier is structured in a way that is usable by the brain. Simply put: "[...] the problem is typically not a lack of information or noisy information, but that the information is badly formatted" (DiCarlo and Cox, 2007, p. 335). Previous studies have shown that expression perception is not solely based on face morphology or geometry, but also on appearance-based features, such as the texture of the face (Ghimire and Lee, 2013). Whereas facial viewpoints (the same configuration of landmarks from a different perspective) could be decoded from neural activity in primate brains using MVPA, researchers failed to decode facial identity (Dubois et al., 2015). In human studies using facial landmarks to predict facial expressions, Gabor filters centered around these landmarks serve as an approximation of skin texture. The combination of both feature spaces results in accurate predictions (Fabian Benitez-Quiroz et al., 2016).

These attempts show that there is a multidimensional and temporal feature space of facial expressions. Different feature dimensions will need to be studied, in combination, to shed light on the underlying mental processes involved in facial expression perception.

5. Conclusion

This research project has shown that a statistical classifier based on Euclidean distance features of facial landmarks can reliably predict participants' responses from a behavioural experiment in which the six basic emotions had to be assigned to short clips of dynamic facial expressions. Surprisingly, these predictions are still above chance when using the distance features from the non-expressive face, showing no AU activation. This suggests that the unique configuration of morphological features of the face contributes to the perception of emotion and that the perception of facial identities and expressions is linked. While the behavioural data suggest that these landmark distances may be used as a mental representation, the neuroimaging analysis could not corroborate these results. The results suggest that these distance features may not be directly employed by the brain to perceive emotion in facial expressions, or at least not in the investigated ROIs. Future research will have to compare more multidimensional and temporal feature spaces to study facial emotion perception.

References

Abraham, A., Pedregosa, F., Eickenberg, M., Gervais, P., Mueller, A., Kossaifi, J., . . . Varoquaux, G. (2014). Machine learning for neuroimaging with scikit-learn. Frontiers in neuroinformatics, 8, 14.

Adolphs, R. (2008). Fear, faces, and the human amygdala. Current opinion in neurobiology, 18(2), 166–172.


Alink, A., Walther, A., Krugliak, A., van den Bosch, J. J., & Kriegeskorte, N. (2015). Mind the drift-improving sensitivity to fmri pattern information by accounting for temporal pattern drift. bioRxiv, 032391.

Alugupally, N., Samal, A., Marx, D., & Bhatia, S. (2011). Analysis of landmarks in recognition of face expressions. Pattern Recognition and Image Analysis, 21(4), 681–693.

Andersson, J. L., Skare, S., & Ashburner, J. (2003). How to correct susceptibility distortions in spin-echo echo-planar images: Application to diffusion tensor imaging. Neuroimage, 20(2), 870–888.

Avants, B., Anderson, C., Grossman, M., & Gee, J. (2008). Symmetric normalization for patient-specific tracking of longitudinal change in frontotemporal dementia. Med Image Anal, 12, 26–41.

Barrett, L. F. (2017). The theory of constructed emotion: An active inference account of interoception and categorization. Social cognitive and affective neuroscience, 12(1), 1–23.

Barrett, L. F., Adolphs, R., Marsella, S., Martinez, A. M., & Pollak, S. D. (2019). Emotional expressions reconsidered: Challenges to inferring emotion from human facial movements. Psychological science in the public interest, 20(1), 1–68.

Behzadi, Y., Restom, K., Liau, J., & Liu, T. T. (2007). A component based noise correction method (compcor) for bold and perfusion based fmri. Neuroimage, 37(1), 90–101.

Beyer, F., Münte, T. F., Göttlich, M., & Krämer, U. M. (2015). Orbitofrontal cortex reactivity to angry facial expression in a social interaction correlates with aggressive behavior. Cerebral cortex, 25(9), 3057–3063.

Blair, R. J., & Cipolotti, L. (2000). Impaired social response reversal: A case of 'acquired sociopathy'. Brain, 123(6), 1122–1141.

Calvo, M. G., & Lundqvist, D. (2008). Facial expressions of emotion (kdef): Identification under different display-duration conditions. Behavior research methods, 40(1), 109–115.

Calvo, M. G., & Nummenmaa, L. (2016). Perceptual and affective mechanisms in facial expression recognition: An integrative review. Cognition and Emotion, 30(6), 1081–1106.

Carlin, J. D., & Kriegeskorte, N. (2017). Adjudicating between face-coding models with individual-face fmri responses. PLoS computational biology, 13(7), e1005604.

Costafreda, S. G., Brammer, M. J., David, A. S., & Fu, C. H. (2008). Predictors of amygdala activation during the processing of emotional stimuli: A meta-analysis of 385 pet and fmri studies. Brain research reviews, 58(1), 57–70.

Cox, R. W. (1996). Afni: Software for analysis and visualization of functional magnetic resonance neuroimages. Computers and Biomedical research, 29(3), 162–173.

Crivelli, C., & Fridlund, A. J. (2019). Inside-out: From basic emotions theory to the behavioral ecology view. Journal of Nonverbal Behavior, 43(2), 161–194.

Cunningham, D. W., Kleiner, M., Bülthoff, H. H., & Wallraven, C. (2004). The components of conversational facial expressions. In Proceedings of the 1st symposium on applied perception in graphics and visualization (pp. 143–150).

Dale, A. M., Fischl, B., & Sereno, M. I. (1999). Cortical surface-based analysis: I. Segmentation and surface reconstruction. Neuroimage, 9(2), 179–194.

Davis, T., LaRocque, K. F., Mumford, J. A., Norman, K. A., Wagner, A. D., & Poldrack, R. A. (2014). What do differences between multi-voxel and univariate analysis mean? How subject-, voxel-, and trial-level variance impact fmri analysis. Neuroimage, 97, 271–283.

DiCarlo, J. J., & Cox, D. D. (2007). Untangling invariant object recognition. Trends in cognitive sciences, 11(8), 333–341.

Diedrichsen, J., & Kriegeskorte, N. (2017). Representational models: A common framework for understanding encoding, pattern-component, and representational-similarity analysis. PLoS computational biology, 13(4), e1005508.

Dimsdale-Zucker, H. R., & Ranganath, C. (2018). Representational similarity analyses: A practical guide for functional mri applications. In Handbook of behavioral neuroscience (Vol. 28, pp. 509–525). Elsevier.

Dubois, J., de Berker, A. O., & Tsao, D. Y. (2015). Single-unit recordings in the macaque face patch system reveal limitations of fmri mvpa. Journal of Neuroscience, 35(6), 2791–2802.

Ekman, P. (1999). Basic emotions. Handbook of cognition and emotion, 98(45-60), 16.

Ekman, P. (2016). What scientists who study emotion agree about. Perspectives on psychological science, 11(1), 31– 34.

Ekman, P., & Friesen, W. (1978). Facial action coding system: A technique for the measurement of facial movement. Consulting Psychologists Press, Palo Alto.

Emojipedia.org. (2019). Emoji people and smileys meanings. https://emojipedia.org/people/.

Esteban, O. [Oscar], Markiewicz, C. J., Blair, R. W., Moodie, C. A., Isik, A. I., Erramuzpe, A., . . . Snyder, M., et al. (2019). Fmriprep: A robust preprocessing pipeline for functional mri. Nature methods, 16(1), 111–116.

Fabian Benitez-Quiroz, C., Srinivasan, R., & Martinez, A. M. (2016). Emotionet: An accurate, real-time algorithm for the automatic annotation of a million facial expressions in the wild. In Proceedings of the ieee conference on computer vision and pattern recognition (pp. 5562–5570).

Fischl, B., Van Der Kouwe, A., Destrieux, C., Halgren, E., Ségonne, F., Salat, D. H., . . . Kennedy, D., et al. (2004). Automatically parcellating the human cerebral cortex. Cerebral cortex, 14(1), 11–22.


Fonov, V., Evans, A., Mckinstry, R., Almli, C., & Collins, D. (2009). Unbiased nonlinear average age-appropriate brain templates from birth to adulthood. neuroimage 47 (suppl. 1), s102.

Fox, C. J., Moon, S. Y., Iaria, G., & Barton, J. J. (2009). The correlates of subjective perception of identity and expression in the face network: An fmri adaptation study. Neuroimage, 44(2), 569–580.

Ganel, T., Valyear, K. F., Goshen-Gottstein, Y., & Goodale, M. A. (2005). The involvement of the "fusiform face area" in processing facial expression. Neuropsychologia, 43(11), 1645–1654.

Ghimire, D., & Lee, J. (2013). Geometric feature-based facial expression recognition in image sequences using multi-class adaboost and support vector machines. Sensors, 13(6), 7714–7734.

Gorgolewski, K. [KJ], Esteban, O., Burns, C., Zeigler, E., Pinsard, B., Madison, C., et al. (2017). Nipype: A flexible, lightweight and extensible neuroimaging data processing framework in python. 0.13.1. Zenodo.

Gorgolewski, K. [Krzysztof], Burns, C. D., Madison, C., Clark, D., Halchenko, Y. O., Waskom, M. L., & Ghosh, S. S. (2011). Nipype: A flexible, lightweight and extensible neuroimaging data processing framework in python. Frontiers in neuroinformatics, 5, 13.

Greening, S. G., Mitchell, D. G., & Smith, F. W. (2018). Spatially generalizable representations of facial expressions: Decoding across partial face samples. Cortex, 101, 31–43.

Greve, D. N., & Fischl, B. (2009). Accurate and robust brain image alignment using boundary-based registration. Neuroimage, 48(1), 63–72.

Haxby, J. V. (2012). Multivariate pattern analysis of fmri: The early beginnings. Neuroimage, 62(2), 852–855.

Hutto, D. D., Robertson, I., & Kirchhoff, M. D. (2018). A new, better bet: Rescuing and revising basic emotion theory. Frontiers in psychology, 9, 1217.

Jack, R. E., & Schyns, P. G. (2017). Toward a social psychophysics of face communication. Annual review of psychology, 68, 269–297.

Jenkinson, M., Bannister, P., Brady, M., & Smith, S. (2002). Improved optimization for the robust and accurate linear registration and motion correction of brain images. Neuroimage, 17(2), 825–841.

Kanwisher, N., McDermott, J., & Chun, M. M. (1997). The fusiform face area: A module in human extrastriate cortex specialized for face perception. Journal of neuroscience, 17(11), 4302–4311.

Kasper, L., Bollmann, S., Diaconescu, A. O., Hutton, C., Heinzle, J., Iglesias, S., . . . Pruessmann, K. P., et al. (2017). The physio toolbox for modeling physiological noise in fmri data. Journal of neuroscience methods, 276, 56–72.

Keltner, D., Sauter, D., Tracy, J., & Cowen, A. (2019). Emotional expression: Advances in basic emotion theory. Journal of nonverbal behavior, 1–28.

Klein, A., Ghosh, S. S., Bao, F. S., Giard, J., Häme, Y., Stavsky, E., . . . Chaibub Neto, E., et al. (2017). Mindboggling morphometry of human brains. PLoS computational biology, 13(2), e1005350.

Kriegeskorte, N., & Kievit, R. A. (2013). Representational geometry: Integrating cognition, computation, and the brain. Trends in cognitive sciences, 17(8), 401–412.

Kriegeskorte, N., Mur, M., & Bandettini, P. A. (2008). Representational similarity analysis - connecting the branches of systems neuroscience. Frontiers in systems neuroscience, 2, 4.

Lewis-Peacock, J. A., & Norman, K. A. (2014). Multi-voxel pattern analysis of fmri data. The cognitive neurosciences, 512, 911–920.

Lindquist, K. A., Wager, T. D., Kober, H., Bliss-Moreau, E., & Barrett, L. F. (2012). The brain basis of emotion: A meta-analytic review. The Behavioral and brain sci-ences, 35(3), 121.

Lönn, G. et al. (2017). Representational similarity analysis with multiple models and cross-validation in magnetoencephalography.

Martinez, A. M. (2017a). Computational models of face perception. Current directions in psychological science, 26(3), 263–269.

Martinez, A. M. (2017b). Visual perception of facial expressions of emotion. Current opinion in psychology, 17, 27–33.

Mumford, J. A., Turner, B. O., Ashby, F. G., & Poldrack, R. A. (2012). Deconvolving BOLD activation in event-related designs for multivoxel pattern classification analyses. Neuroimage, 59(3), 2636–2643.

Müri, R. M. (2016). Cortical control of facial expression. Journal of comparative neurology, 524(8), 1578–1585.

Neth, D., & Martinez, A. M. (2009). Emotion perception in emotionless face images suggests a norm-based representation. Journal of vision, 9(1), 5–5.

Nili, H., Wingfield, C., Walther, A., Su, L., Marslen-Wilson, W., & Kriegeskorte, N. (2014). A toolbox for representational similarity analysis. PLoS computational biology, 10(4), e1003553.

Oosterhof, N. N., & Todorov, A. (2009). Shared perceptual basis of emotional expressions and trustworthiness impressions from faces. Emotion, 9(1), 128.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., . . . Dubourg, V., et al. (2011). Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12, 2825–2830.

Poldrack, R. A., Mumford, J. A., & Nichols, T. E. (2011). Handbook of functional MRI data analysis. Cambridge University Press.

Popal, H., Wang, Y., & Olson, I. R. (2019). A guide to representational similarity analysis for social neuroscience. Social Cognitive and Affective Neuroscience, 14(11), 1243–1253.

Power, J. D., Mitra, A., Laumann, T. O., Snyder, A. Z., Schlaggar, B. L., & Petersen, S. E. (2013). Methods to detect, characterize, and remove motion artifact in resting state fMRI. Neuroimage, 84, 320–341.

Ritchie, J. B., Kaplan, D. M., & Klein, C. (2019). Decoding the brain: Neural representation and the limits of multivariate pattern analysis in cognitive neuroscience. The British journal for the philosophy of science, 70(2), 581–607.

Scarantino, A. (2017). How to do things with emotional expressions: The theory of affective pragmatics. Psychological Inquiry, 28(2-3), 165–185.

Shannon, C. E., & Weaver, W. (1949). The mathematical theory of communication. University of Illinois Press.

Skerry, A. E., & Saxe, R. (2015). Neural representations of emotion are organized around abstract event features. Current biology, 25(15), 1945–1954.

Soch, J., Allefeld, C., & Haynes, J.-D. (2020). Inverse transformed encoding models – a solution to the problem of correlated trial-by-trial parameter estimates in fMRI decoding. Neuroimage, 209, 116449.

Srinivasan, R., Golomb, J. D., & Martinez, A. M. (2016). A neural basis of facial action recognition in humans. Journal of Neuroscience, 36(16), 4434–4442.

Tamir, D. I., Thornton, M. A., Contreras, J. M., & Mitchell, J. P. (2016). Neural evidence that three dimensions organize mental state representation: Rationality, social impact, and valence. Proceedings of the National Academy of Sciences, 113(1), 194–199.

Tautkute, I., Trzcinski, T., & Bielski, A. (2018). I know how you feel: Emotion recognition with facial landmarks. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops (pp. 1878–1880).

Tustison, N. J., Avants, B. B., Cook, P. A., Zheng, Y., Egan, A., Yushkevich, P. A., & Gee, J. C. (2010). N4ITK: Improved N3 bias correction. IEEE transactions on medical imaging, 29(6), 1310–1320.

Valentine, T. et al. (2001). Face-space models of face recognition. Computational, geometric, and process perspectives on facial cognition: Contexts and challenges, 83–113.

Valentine, T., Lewis, M. B., & Hills, P. J. (2016). Face-space: A unifying concept in face recognition research. Quarterly Journal of Experimental Psychology, 69(10), 1996–2019.

Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., . . . Bright, J., et al. (2020). SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nature methods, 17(3), 261–272.

Vytal, K., & Hamann, S. (2010). Neuroimaging support for discrete neural correlates of basic emotions: A voxel-based meta-analysis. Journal of cognitive neuroscience, 22(12), 2864–2885.

Walt, S. v. d., Colbert, S. C., & Varoquaux, G. (2011). The NumPy array: A structure for efficient numerical computation. Computing in science & engineering, 13(2), 22–30.

Wang, S., Yu, R., Tyszka, J. M., Zhen, S., Kovach, C., Sun, S., . . . Chung, J. M., et al. (2017). The human amygdala parametrically encodes the intensity of specific facial emotions and their categorical ambiguity. Nature communications, 8(1), 1–13.

Yu, H., Garrod, O. G., & Schyns, P. G. (2012). Perception-driven facial expression synthesis. Computers & Graphics, 36(3), 152–162.

Zhang, Y., Brady, M., & Smith, S. (2001). Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE transactions on medical imaging, 20(1), 45–57.


Appendix

Appendix 1

[Figure 11: scatter plots of the numbered facial landmark annotations; left panel: "3D based landmarks" (50 points), right panel: "Google Vision based landmarks" (34 points)]

Figure 11. Comparison of facial landmark annotations. Left: landmarks based on the 3D-generated stimuli; right: landmarks detected in the image by the Google Vision API. Overall the performance of the Google API is good, but some landmarks (i.e., the points on the eyebrows and the bottom of the chin) may not be precisely or consistently placed across stimuli.
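For illustration, the configural features compared here can be computed in the same way for both landmark sets: given a matrix of landmark coordinates for one stimulus (2D for the Google Vision annotations, 3D for the generated stimuli), the pairwise Euclidean distances between all landmarks form the feature vector. The following sketch is a minimal, hypothetical example using the NumPy and SciPy packages cited above; the variable names and the random placeholder coordinates are not taken from the actual analysis pipeline.

import numpy as np
from scipy.spatial.distance import pdist

def landmark_distance_features(landmarks: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between all landmarks, as a flat vector.

    `landmarks` has shape (n_landmarks, n_dims): n_dims = 2 for the
    image-based (Google Vision) annotations, 3 for the 3D-based ones.
    """
    return pdist(landmarks, metric="euclidean")

# Example: 34 image-based landmarks yield 34 * 33 / 2 = 561 distance features.
rng = np.random.default_rng(0)
coords_2d = rng.uniform(0, 500, size=(34, 2))      # placeholder (x, y) coordinates
features = landmark_distance_features(coords_2d)   # shape: (561,)
print(features.shape)

The same function applies unchanged to (x, y, z) arrays, which is why the two landmark sets can be compared directly despite their different dimensionality.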

[Figure 12: "AUC Scores Per Subject (10-fold averaged, vertex features)"; six panels (x-axis: anger, disgust, fear, happiness, sadness, surprise; y-axis: AUC score, 0.45–0.95): "neutral, distance", "neutral, angles", "peak, angles", "peak, distance", "peak+neutral, distance", "peak, distance+angles"; legend: mean AUC of all subjects and individual subjects (sub-01 to sub-10, sub-12, sub-13)]

Figure 12. Performance of the Google-based landmarks in the behavioural experiment. The overall performance is worse compared to the landmarks shown in Fig. 8, especially in the "neutral" conditions. This may be due to the slight inaccuracy and inconsistency of the landmark annotations, but notably these models were also trained on fewer features (34 landmarks were annotated by the Google API, as opposed to 50 for the 3D-based ones). Further, the Google-based landmarks are annotated on the image itself and can only return locations in image coordinates (x, y), whereas the 3D-based landmarks also incorporate the depth of the face (x, y and z coordinates). Still, the Euclidean distance features of the Google landmarks achieve high AUC scores, especially for happiness. So while this technique has limitations, it is a viable alternative for real-life photographs or pre-existing stimuli.
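The per-emotion AUC scores summarized in Figure 12 follow the same logic as the main analysis: a binary (one-vs-rest) classifier is trained on the distance features and evaluated with 10-fold cross-validation. The sketch below is a hypothetical illustration using scikit-learn (cited above); the arrays X and y_happiness are random placeholders rather than the study's data, and the exact classifier and preprocessing of the original analysis are not reproduced here.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(240, 561))             # placeholder: stimuli x distance features
y_happiness = rng.integers(0, 2, size=240)  # placeholder: 1 = happiness, 0 = other expression

# 10-fold cross-validated ROC-AUC, averaged across folds (cf. Figure 12).
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y_happiness, cv=cv, scoring="roc_auc")
print(f"mean AUC: {scores.mean():.2f}")

Repeating this procedure per emotion and per feature set (distances, angles, or both; neutral or peak frames) yields the grid of scores shown in the figure.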
