
Expectancy and Music Appreciation

A Literature Review of Computational, Cognitive, and Neuroscientific Approaches

Milan Simas, July 22nd, 2018

Thesis: MA Arts and Culture (Musicology), University of Amsterdam

1st Examiner: Dr. Benjamin G. Schultz
2nd Examiner: Dr. Makiko Sadakata


Abstract

In this thesis, the relationship between expectations and emotions in music is examined through computational, cognitive, and neuroscientific approaches. The IDyOM model (Pearce & Wiggins, 2012) is introduced as a representation of statistically learned pitch predictions in monophonic melodic sequences. Reaction-time studies underline the internal representations of melodic and harmonic expectancy, with neuroimaging studies further showing distinct event-related potentials (ERPs) for violations of expectancy. The dynamic attending theory (Jones, 1976; Jones & Boltz, 1989) provides a neural basis for temporal predictions in music, further discussed through examples of temporal irregularities commonly found in music performance (syncopation, non-integer ratio subdivisions of a beat, tempo variation). Physiological and self-reported evidence show the effects of violations of expectancy on emotions, as suggested by the ITPRA theory (Huron, 2007). These observations can be interpreted by considering musical events as “rewarding”, a concept further discussed through neuroimaging studies focusing on brain structures that are associated with reward evaluation and prediction during music listening. Finally, tentative evidence for generally preferred levels of predictability is presented, with methodological suggestions to further empirically test this idea. Overall, the literature suggests distinct predictive cognitive and neural processes for melodic, harmonic, and rhythmic expectations, while a unified model of musical predictions that includes all three components is still lacking.


Table of contents

Abstract
Introduction
1. Models of Musical Prediction Mechanisms
1.1 Statistical Learning
1.2 The Information Dynamics of Music Model (IDyOM)
1.3 The Dynamic Attending Theory
1.4 Considering Individual and Cultural Factors
2. Musical Expectations and Emotions
2.1 The ITPRA Theory
2.2 The Prediction Effect
3. Predictions and the Brain
3.1 General Theory
3.2 The Reward System
3.3 Neuroscientific Research on Musical Predictions
3.4 Reassessing the Prediction Effect: New Interpretations
4. Expectancy Violations
4.1 Response-Time Studies
4.2 Neuroimaging Studies
4.3 Violations of Temporal Expectancy
4.4 Expectancy Violation and Emotion
5. Towards an Optimal Level of Predictability
Conclusion


Expectancy and Music Appreciation

The experience of listening to music, like other arts such as dance and cinema, is a process that unfolds through time. Unlike looking at a painting or a photograph, music is enjoyed by listening and reacting to a complex string of events, some more salient than others. The ability to enjoy music implies that listeners can perceive, consciously or not, meaningful relationships in the organization of pitches and durations, considered the foundation of music perception (Dowling, 2010). From these relationships, listeners are able to infer future musical events. A repeated melody, lyrics known by heart, an unexpected interval leap, or a sudden modulation all have the potential to interact with a listener's expectations, either by confirming or breaching them. The ability of these expectations to affect emotions was first explored by Meyer (1956) and further theorized by Huron (2006). Other researchers have considered expectancy as the principal conveyor of meaning and emotion in music (Cooper & Meyer, 1960; Lerdahl & Jackendoff, 1977; Monelle, 1992), and a central component of human behaviour (Kveraga, Boshyan, & Bar, 2007).

In this thesis, I aim to investigate the interdisciplinary sources of evidence and their interpretations, as models or theories, that seek to further the understanding of expectancy in music. I examine the underlying cognitive and neural processes involved in musical expectation, and explore how these processes differ across musical components such as melody, rhythm, and harmony. Furthermore, I bring together evidence from multiple disciplines to discuss how musical expectancies elicit emotions in listeners. Theoretical and computational models of music perception are discussed, as well as theories of general predictive processing. Empirical evidence from behavioural and neural studies is addressed to assess the validity of these theories and models. Finally, I discuss physiological, neural, and self-reported evidence for the potential of violated expectancies to elicit different emotional reactions in listeners, as well as tentative evidence for an optimal balance between confirmed and violated expectancies. By combining these approaches, I wish to demonstrate how musical expectation, mediated by different cognitive abilities and specific brain structures, can be better understood.

1. Models of Musical Prediction Mechanisms

A living organism's ability to process information from the surrounding environment is crucial to learn and predict potential future events. Although certain behaviours are innate, as in the case of automatic reflexes or species-specific behaviours (Hoy, 1974), a large part of human experience is dictated by previous events that are likely to occur again. Thus, expectation and anticipation provide an important evolutionary advantage for survival. In this first part, I discuss what cognitive mechanisms are considered to be the basis of musical predictions, how they are applied in models, and how individual differences may interact with these mechanisms.

1.1 Statistical Learning

The study of learning has provided insight into the various ways human beings learn to build expectations of everyday events. The ability to filter, perceive, categorize, and recall information supports prediction mechanisms that can modulate behaviour in reaction to a future event (Saffran, Aslin, & Newport, 1996). An important development in this field is the theory of statistical learning, which suggests that human beings extract statistical regularities from the environment, often unconsciously and unintentionally (Saffran et al., 1996). This phenomenon was notably observed in the context of the learning of word boundaries in speech. Babies were initially exposed to streams of nonsensical syllables that included recurring syllable pairs, mimicking real-life word boundaries. During the second part of the test, two types of stimuli were presented. The first contained invented words built from the same syllables, without using the statistically recurring syllable combinations from the first part. The second type of stimulus used the same syllables and their statistically prevalent combinations as previously heard. Using a head-turning paradigm and sustained visual fixation, Saffran et al. observed that the babies' attention was directed towards the first type of stimulus, which did not contain the statistical syllable combinations heard during the exposure phase, thus showing that changes in common syllable combinations were noticed.

The observed results provide an interpretation of how speech signals are translated into word categories at a young age. The children were not provided with any other word-boundary markers during the experiment, in contrast to real-life situations where pauses, intonation, and visual cues can all be used to infer where speech boundaries occur. This suggests that statistical learning is part of a larger ensemble of learning mechanisms that co-operate to filter, process, and store valuable information for future use. Saffran (2003) later expanded the concept of statistical learning in language acquisition to other levels of language, such as syntax. Thus, word-boundary learning based on statistical associations allows new and larger units of meaning to be exposed to the same process, as in the case of word order. For example, in the same way that sy and la (as in the word syllable) are more likely to follow each other than sy and pu in the English language, the words "the" and "chair" are more likely to appear together than "the" and "learn" in the presented order. Following this, statistical learning can be defined as a multi-level learning process that is not limited to the language domain. Infants have been observed to be able to discern word boundaries segmented by intonation (Saffran, Johnson, Aslin, & Newport, 1999) and to learn visual patterns (Fiser & Aslin, 2002; Kirkham, Slemmer, & Johnson, 2002). Although it was first observed in the context of language acquisition, it is unlikely that statistical learning is a product of language first and foremost, as it has also been observed among primates and rats (Newport, Hauser, Spaepen, & Aslin, 2004; Toro & Trobalón, 2005).
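The transitional probabilities tracked in these experiments are simple conditional frequencies. As a purely illustrative sketch (the toy syllables, stream construction, and function name below are my own, not the original stimuli or analysis), the computation at the heart of such studies can be written in a few lines of Python:

```python
import random
from collections import Counter

def transitional_probabilities(stream):
    # P(b | a) = count(a, b) / count(a), over adjacent syllable pairs.
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

# Toy three-syllable "words" concatenated in random order with no pauses,
# mimicking the continuous streams used in word-segmentation studies.
words = [("bi", "da", "ku"), ("pa", "do", "ti"), ("go", "la", "bu")]
stream = [syll for _ in range(300) for syll in random.choice(words)]

tps = transitional_probabilities(stream)
print(tps.get(("bi", "da"), 0.0))  # within-word pair: probability 1.0
print(tps.get(("ku", "pa"), 0.0))  # across a word boundary: roughly 1/3
```

Within-word pairs approach a probability of 1, while pairs spanning a word boundary hover near 1/3; it is this drop that the infants appear to detect.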

Finally, statistical learning suggests that learning is constrained by complex, multi-level unconscious calculations commonly shared among humans (Newport & Aslin, 2000). This provides a competing explanation to Chomsky's Universal Grammar theory, which posits that the similarities in worldwide linguistic structures stem from innate linguistic rules. Instead, statistical learning and its limitations can be argued to be a driving force shaping language structure, along with limits in memory and attention. In this view, components of language that promote learning efficiency are likely to have persisted across languages and their development through time, with linguistic structures being considered a product of cognitive constraints (Saffran, 2003). Statistical learning has also been studied in the context of music (e.g., Daikoku, Yatomi, & Yumoto, 2015; François & Schön, 2011; Paraskevopoulos, Kuchenbuch, Herholz, & Pantev, 2012). In the following section, a computational model of melodic expectancy based on this theory is described and discussed.

1.2 The Information Dynamics of Music Model (IDyOM)

As music involves multiple musical components unfolding through time (pitch, harmony, rhythm) and according to specific structures (scales, tonal centres, rhythmic patterns), it renders itself particularly approachable from the standpoint of statistical learning. Expectations of future musical events can also act as tools for artists to convey emotional meaning. In Emotion and Meaning in Music (1956), Leonard Meyer suggests that the manipulation of expectations can convey emotional meaning through tension and release, anticipation, surprise, and deception. Testing this approach, Pearce and Wiggins (2012) provided a computational model of expectation based on statistical learning, aiming to mimic similar processes, initially applied to the perception of melodic sequences. Efforts to employ information theory have previously been made in music analysis (Cohen, 1962) and composition (Ames, 1987, 1989; Hiller, 1970; Hiller & Isaacson, 1959). The implication-realization (I-R) model (Narmour, 1990, 1992) proposes that melodic pitch expectations are categorized as simple processes extracted from pitch sequences, such as duplication (G-G-G) and registral return (C-G-C). However, if statistical learning were to account entirely for the process of melodic pitch expectation, Narmour's classification of the different prediction mechanisms would be deemed unnecessary (Pearce & Wiggins, 2012).

The Information Dynamics of Music (IDyOM) model is based on empirical evidence suggesting that melodies are learned unconsciously and unintentionally (Oram & Cuddy, 1995; Saffran et al., 1996; Saffran et al., 1999). This type of musical learning has been studied using harmonic (Ponsford, Wiggins, & Mellish, 1999), metrical, and non-metrical patterns (Schultz, Stevens, Keller, & Tillmann, 2013). The model uses Markov chains, stochastic systems for probability forecasting, at several hierarchical levels. A Markov chain assumes that the likelihood of a future outcome depends on the states leading up to the current one (Rabiner & Juang, 1986). The IDyOM model contains two parts, as it takes into account both a corpus of melodies from which the presented melody is drawn and the melody itself to produce likelihood estimates. Although the two parts are structurally identical, the long-term model (LTM), which takes melodies from the corpus into account, strives to imitate the listener's long-term memory of past experiences outside of the immediate, real-time perception of a musical event. The short-term model (STM) calculates predictions solely on the basis of the single melody chosen. Both models analyze data in light of temporal unfolding, that is, after each note is sequentially played. Similarly to hypothesized human development, the model undergoes optimization by first establishing relationships based on absolute pitch, then discarding it in favour of the relationships between pitches. Human beings have been observed to use absolute pitch in infancy, with most individuals adopting relative pitch through the course of development (Saffran & Griepentrog, 2001).
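To make the LTM/STM division concrete, the following is a minimal sketch assuming simple first-order transitions over absolute MIDI pitches. The class and function names are illustrative; the published model uses variable-order, multi-feature predictors combined by an entropy-based weighting scheme rather than this naive mixture:

```python
from collections import Counter, defaultdict

class BigramPitchModel:
    """First-order Markov model over absolute MIDI pitches (a drastic
    simplification of IDyOM's variable-order, multi-feature models)."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, melody):
        for prev, nxt in zip(melody, melody[1:]):
            self.counts[prev][nxt] += 1

    def prob(self, prev, nxt, alpha=1.0, vocab=128):
        # Additive smoothing keeps unseen continuations above zero.
        c = self.counts[prev]
        return (c[nxt] + alpha) / (sum(c.values()) + alpha * vocab)

def combined_prob(ltm, stm, prev, nxt, w=0.5):
    # Naive weighted mixture of corpus-based (LTM) and piece-based (STM)
    # estimates of the next pitch's likelihood.
    return w * ltm.prob(prev, nxt) + (1 - w) * stm.prob(prev, nxt)

# The LTM is trained once on a corpus; the STM is retrained as the
# current melody unfolds, note by note.
corpus = [[60, 62, 64, 65, 67], [67, 65, 64, 62, 60]]  # toy melodies
ltm = BigramPitchModel()
for melody in corpus:
    ltm.train(melody)

heard_so_far = [60, 62, 64]
stm = BigramPitchModel()
stm.train(heard_so_far)

print(combined_prob(ltm, stm, prev=64, nxt=65))  # P(next note = F4)
```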

According to Pearce and Wiggins (2012), the required output of the model is never given prior to the model's calculations, resulting in purely learning-based calculations. This is argued to support the bottom-up view implied in statistical learning, in which top-down adjustments to the model to produce more accurate outcomes are not considered (Pearce & Wiggins, 2012). The model relies solely on a variety of features of a melody, including tonic pitch, scale degree, metrical level, and note duration, which could nonetheless be considered pre-conceived categories representing the most salient features for human perception. Pearce instead considers these features objectively relevant characteristics of sound that are part of bottom-up processing.

Using the concept of entropy, defined as the calculated uncertainty of an outcome based on average information content (Shannon, 1948), the model can be used as a measure of how unexpected or surprising melodic pitches may be for listeners (Pearce & Wiggins, 2012). Comparative results show that the model not only outperforms Schellenberg's (1997) simplified version of the implication-realization (I-R) model (Narmour, 1990), but also has higher accuracy than human predictions (Pearce & Wiggins, 2012). This can be attributed to the model's capacity to recall past information with perfect accuracy. Human memory does not have this feature, as it may contain errors at the level of perception, encoding, or recall of past events. Therefore, the IDyOM model only partly mimics human statistical learning, since information retrieval, the memory mechanism upon which pitch probabilities are calculated, remains largely misrepresented. Previous efforts have been made in this direction, with the introduction of variable memory length in machine learning (Ron, Singer, & Tishby, 1996). Furthermore, a recent study established a significant relationship between the unpredictability of pitches and their memorization. Agres, Abdallah, and Pearce (2018) used a probe-tone paradigm to assess how higher unpredictability of pitches affected confidence in the recall of previously heard notes. The level of unexpectedness was modulated by the level of stimulus complexity, establishing that less predictable note sequences are harder to memorize. These findings would provide a sensible addition to the IDyOM model if implemented as proposed by the researchers. This would allow not only a statistical model representing cognitive processes, but also a more accurate account of how the information on which the calculations are based is memorized in relation to stimulus complexity. Furthermore, these results are in line with the perceptual fluency/attributional model (Bornstein & D'Agostino, 1994), which proposes that the effective recall of a past stimulus tends to generate positive emotions when it next occurs. As familiarity increases ease of processing, familiar information tends to generate more positive emotions through the mere-exposure effect, which is discussed later.
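For reference, the two Shannon (1948) quantities at play here can be written out explicitly. In the standard usage adopted by the IDyOM literature, the unexpectedness of an event is its information content and the uncertainty of a context is the entropy of the predictive distribution:

```latex
\mathrm{IC}(e \mid c) = -\log_2 p(e \mid c),
\qquad
H(c) = \sum_{e \in \mathcal{E}} p(e \mid c)\,\mathrm{IC}(e \mid c)
     = -\sum_{e \in \mathcal{E}} p(e \mid c)\,\log_2 p(e \mid c)
```

A pitch assigned low probability by the model thus carries high information content, which is the model's operationalization of surprise.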


The IDyOM model has some limitations, leaving space for further developments. One aspect is the nature of the information itself. The model is only adapted to monophonic melodies, and has not yet been applied to harmony and rhythm. Extending it to other musical components would allow a more accurate representation of the process of musical expectation, especially as all musical components occur simultaneously and are likely to modify predictions. Nonetheless, the IDyOM model has been applied to meter perception (van der Weij, Pearce, & Honing, 2017), suggesting promising future applications.

1.3 The Dynamic Attending Theory

Cognitive anticipation mechanisms allow the prediction of musical events, such as pitch in the context of melodies. By nature, music's temporal unfolding allows rhythms to divide time into regular amounts, with multiple simultaneously occurring periodicities. These regularities are perceived by the listener, and are processed to predict subsequent events in time. Temporal expectancies play an important role in the prediction of all musical events, considering that pitch and harmony are inherently time-bound. Yet, although the organization and subdivision of temporal events is conventionally written in musical notation (4/4 representing four equal quarter-note subdivisions of a bar, for example), the perception of salient periodicities may vary among listeners (Jones, 2009). The presence of stronger and weaker beats may draw attention to different levels of periodicity and create mental constructions of how they should be organized. Studies suggest that perceived regularities afford greater ease in processing sounds from the environment (Arnal & Giraud, 2012) and are preferentially processed (Lange, 2009). Furthermore, temporal expectations should be considered a valuable tool for facilitating the perception and processing of information for acoustical events, as well as in other sensory modes.

Multiple models of musical perception have been proposed to account for the ability of listeners to perceive and anticipate rhythm. For example, Povel and Essens (1985) suggested that an "internal clock" was influenced by meter, synchronizing to perceived regularities in music. However, it is the dynamic attending theory (Jones, 1976; Jones & Boltz, 1989) that is discussed here. The theory suggests that temporal regularities found in music entrain neural oscillations that adapt and synchronize to direct attentional resources towards future, regularly occurring events (Barnes & Jones, 2000). In turn, focused attention to specific events facilitates information processing when in phase with a perceived rhythm (Bauer, Jaeger, Thorne, Bendixen, & Debener, 2015). The entrainment process suggests that neural oscillators are flexible: when out of phase, these oscillations may speed up or slow down gradually to match external periodicities (Barnes & Jones, 2000). This furthers the study and theorization of attention, as it suggests that attention can be allocated to regular, expected events. The neural oscillators are additionally defined as memories of time differences, from which comparisons are made to anticipate the next event onset.
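The gradual speeding up and slowing down described above can be caricatured as a two-parameter error-correction process. The sketch below is only illustrative, assuming a single oscillator with made-up phase and period gains; it is not the oscillator formulation of Large and Jones (1999):

```python
def entrain(onsets, initial_period, alpha=0.5, beta=0.1):
    """Adaptive oscillator with phase (alpha) and period (beta) correction.

    After each observed onset, the prediction error nudges the oscillator's
    period (gradual speeding up / slowing down) and shifts the phase of the
    next predicted onset, mimicking entrainment of attentional rhythms.
    """
    period = initial_period
    prediction = onsets[0] + period
    errors = []
    for onset in onsets[1:]:
        error = onset - prediction            # positive: event came late
        errors.append(round(error, 3))
        period += beta * error                # period correction
        prediction += period + alpha * error  # phase-corrected prediction
    return errors

# Isochronous stimulus every 600 ms; the oscillator starts too fast (500 ms).
onsets = [i * 0.6 for i in range(12)]
print(entrain(onsets, initial_period=0.5))    # errors shrink towards zero
```

Run on an isochronous stimulus slower than the oscillator's starting period, the prediction errors shrink towards zero, which is the entrainment behaviour the theory describes.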

The dynamic attending theory makes a clear distinction between statistically-based and dynamic expectations (Barnes & Jones, 2000). Statistically-based expectations are considered to be founded upon the long-term assessment of the relative frequency of an event to predict the likelihood of it happening again. This implies an emphasis on a large-scale musical context for increasing accuracy. Particularly relevant events are used as cues or anchor points that listeners can use as temporal reference points. As with statistical learning, the dynamic approach also applies to the visual domain (Downing, 1988; Posner, 1980). Dynamic expectations, the second type, arise when auditory sequences and patterns allow the deduction of future events that have not yet been perceived in the local context. By understanding a sequence, individuals may expect a forthcoming event without prior exposure to it, as in guessing which letter comes after the letter F, or which number follows the sequence 2, 4, 6.

The dynamic attending theory has since been supported by dynamic attending models (Large, 1994; Large & Kolen, 1994; McAuley, 1995) and behavioural studies (Barnes & Jones, 2000; Jones, Moynihan, MacKenzie, & Puente, 2002; Large & Jones, 1999). In their study, Large and Jones (1999) showed that time-judgment accuracy for a standard duration between the onsets of two consecutive sounds (inter-onset interval, or IOI) varied according to whether it was larger than, smaller than, or equal to an induction sequence presented prior to it. Results suggest an inverted-U shape of expectancy-judgment accuracy, with the apex being the accurate prediction of the standard IOI when equal to the preceding IOIs. This has been described as the expectancy profile, which reflects increasing temporal-judgment accuracy as interval deviations from a sequence approach 0 ms. A follow-up study (Barnes & Jones, 2000) was designed similarly, but with no rhythmic context: a sequence of only two intervals had listeners judge the length of the second interval with regard to the first. Surprisingly, an expectancy profile was also noted, although not as clearly as the inverted U observed in the six other experiments in the same study. Other experiments presented in Barnes and Jones (2000), along with further behavioural (Jones et al., 2002) and neurophysiological studies (Nozaradan, 2014; Will & Berg, 2007; Zanto, Snyder, & Large, 2006), support neural entrainment as a feature involved in temporal expectancy. A more recent addition to the dynamic attending theory was proposed by Jones (2008), addressing the possibility of multiple co-occurring neural oscillations being perceptually bound together into a distinct unit or metric cluster. The metric binding hypothesis provides the trained listener with greater flexibility in the division of attentional resources, with the possibility of shifting attention between different periodic events. The duration of co-occurring oscillations, their phase, and their ratio (3:2, 4:1) allow the attentional oscillations to become more strongly bound through time. Jones presents the hypothesis as a learning mechanism which, through enculturation, allows increasing familiarity with particular metrical categories. This hypothesis, along with the dynamic attending theory, provides a neural basis for the ability to synchronize to rhythm, found in humans but also among non-human species (Schachner, Brady, Pepperberg, & Hauser, 2009). Through neural entrainment, the synchronization to external auditory events allows future expectancies to be built upon time-interval comparisons.

1.4 Considering Individual and Cultural Factors

Although similar cognitive and neural prediction mechanisms can be assumed to be shared across human beings, the impact of the environment and cultural context may entail small variations in the predicted outcomes of events. Among others, research has pointed to age and cultural background as factors potentially modulating musical expectations (Eerola, Louhivuori, & Lebaka, 2009). Evidence for variations in expectations on the basis of these factors is presented here, although musical training has also been presented as relevant to the study of musical expectancy (Corrigall & Schellenberg, 2015a).

Early cross-cultural studies comparing melodic expectancies between Western and Chinese listeners, and between Western and Indian listeners, have suggested that cultural context does not provide a clear distinction between listener expectancies (Castellano, Bharucha, & Krumhansl, 1984; Krumhansl, 1995). Similarities in results between groups tend to support a bottom-up approach to unfamiliar music. Subjects facing statistically different note relationships and tonal hierarchies than they were accustomed to had no difficulty adapting to the new stimuli. This tends to undermine previous accounts of implicitly integrated musical systems and hierarchies, as posited in the implication-realization model (Narmour, 1990, 1992). With this in mind, the IDyOM model, which is solely based on statistical learning, can be argued to follow these early cross-cultural studies on musical expectations, especially when using only the local musical context (STM) for prediction calculations. The ability to use data from one melody alone (STM), as opposed to an entire selected corpus (LTM), for calculating predictions resembles the cross-cultural research results, in which individuals unfamiliar with a musical style produced similar predictions to individuals who were familiar with it as a result of their culture.

Bharucha's (1987) distinction between schematic and veridical expectations may provide a further interpretation of these results. Expectations can be formed either on the basis of the general structure of a musical piece (its harmonic progression, the melody's rhythm) or, conversely, on past experiences of the specific piece. In light of this, the cross-cultural studies mentioned seem to suggest that, with limited exposure to a certain musical style, individuals may use the local musical context, in this case a melody, and infer its structure to generate predictions, rather than using memories of previously heard, similar music.

Eerola, Louhivuori, and Lebaka (2009) examined melodic expectations in yoiks, a musical tradition found among the Sami people of Northern Finland. The three groups tested were individuals from Sami culture, musically trained Finnish students, and Western musicians. Although similarities in results were observed, a few limitations deserve mentioning. First, yoiks mostly use the Western pentatonic scale, which contains five of the seven notes of the major scale, potentially rendering Western listeners relatively familiar with the melodic content presented, in contrast to other foreign tuning systems, such as Arabic maqams or Indian thats. Second, listeners were provided with practice trials before the test, therefore giving further musical context beyond the test itself. This may reinforce the process of statistical learning in the local musical context, resulting in higher accuracy of pitch predictions. Consequently, lower exposure to the musical context would perhaps provide a better account of the listeners' internalized musical structures based on their own respective cultures.

Indeed, a cross-cultural study on musical expectation by Carlsen (1981) asked listeners from the United States, Germany, and Hungary to sing the next anticipated notes after a short musical context. This context, however, consisted only of song beginnings, allowing subjects less opportunity to create predictions based on the musical context presented. Perhaps unsurprisingly, the results showed that using melodic beginnings as stimuli yielded different results for each nationality, contradicting results from other studies using larger melodic contexts. Significant differences in expectancy patterns were found between all three nationality-based groups, with the United States and Hungary showing the most differences in melodic expectations. Addressing the results, Carlsen suggests that cultural differences are likely diminishing over time, as the gradual expansion of telecommunication networks in growing international musical markets may create a homogenization of exposure to global musical diversity, resulting in increasingly similar expectancy patterns in individuals from different cultural backgrounds. The study's date, in comparison to the other studies containing conflicting results, seems to confirm this, although some have taken this into consideration in their experimental methods. In Kessler, Hansen, and Shepard (1984), the tested listeners were recruited through unconventional methods: searching for individuals isolated from Western musical influence on the island of Bali, the researchers asked locals where, in their opinion, the most remote village was. Upon arrival, they asked again, and did so repeatedly until they found the most remote village, where listeners were least likely to have been exposed to Western music. Although results between Western and non-Western listeners were largely similar, the response strategies were argued to be different: pitch height and repeated pitch occurrence were mostly used to infer the relatedness of a probe tone to a musical context when the music was not common to the listener's own culture.

Aside from culture-based expectancies, listeners themselves unconsciously modulate their expectations not only throughout an event, but also through development (Corrigall & Schellenberg, 2015b). Schellenberg, Purdy, Adachi, and McKinnon (2002) compared melodic expectancies between adults and children aged 5, 8, and 11 years old. Testing 60 participants across age groups, the researchers found age-specific trends: adults developed new types of expectancies based on increasingly complex melodic features. When asked to sing a note that they believed was likely to follow a presented melodic sequence, adults showed a preference towards pitches that created smaller intervals with the previously heard pitch. Along with pitch proximity, pitch reversal was equally employed by adults: a change in pitch direction after a large interval was often anticipated. This common feature of sung melodies has been argued to be created by biological constraints of the vocal apparatus and has equally been observed in bird song (Gill & Purves, 2009; Tierney, Russo, & Patel, 2011). In contrast, 5-year-olds showed limited signs of pitch-reversal-based expectancies, whereas 8- and 11-year-olds showed increasing consideration for these statistically salient features, suggesting increasing accuracy of melodic anticipatory skills with age. According to Schellenberg et al. (2002), this developmental shift may be attributed to (a) increasing exposure through time, (b) the development of general perceptual and cognitive capabilities, or (c) the transfer of auditory skills from early language learning to music perception. Young infants are generally able to perceive only a crude, general melodic contour, without distinction between interval sizes of similar directions (Trehub, Schellenberg, & Hill, 1997). Among 4- and 6-year-olds, for example, interval changes that alter the melodic contour direction are more likely to be perceived than intervals that do not (Morrongiello, Trehub, Thorpe, & Capodilupo, 1985; Pick et al., 1988). Therefore, melodic perception becomes increasingly refined throughout development, narrowing from general contour to interval size to multiple-interval grouping (Schellenberg et al., 2002). As mentioned earlier, differences in cognitive capacities between developmental stages may be predictors of musical expectancy. As memory capacity, speed, and efficiency develop through childhood (Case, Kurland, & Goldberg, 1982; Cowan, 1997), and are central to the formation of experience-based expectations, children should show lower accuracy in pitch prediction compared to adults. Memory for pitch has been tested comparatively between multiple age groups ranging from 6-year-olds to adults, with pitch memorization increasing proportionally with age (Keller & Cowan, 1994).

The perception of harmony similarly develops with increasing accuracy throughout childhood, effectively contributing to and modifying younger listeners' musical expectations. According to Jonaitis and Saffran (2009), tonal hierarchy and harmony are internalized through the implicit learning of statistical properties in music at an early age. Furthermore, the ability to perceive key membership develops at an earlier stage than knowledge of harmony (Corrigall & Schellenberg, 2015b). In response-time studies in harmonic contexts, children from ages 6 to 11 show implicit knowledge of harmony by making faster judgments of musical features, such as timbre, sung vowel sound, and consonance, when a feature is performed within the context of a tonic chord rather than another chord function (Schellenberg, Bigand, Poulin-Charronnat, Garnier, & Stevens, 2005).

Similarly, children's development of rhythm synchronization, or entrainment, implies early rhythmic anticipation mechanisms that are based on different developmental stages of rhythm perception. Studies reveal that infants aged only 2 to 5 months can distinguish between two simple rhythms (Chang & Trehub, 1977; Demany, McKenzie, & Vurpillot, 1977). Considering the suggestion that temporal regularity allows greater perceptual fluency of musical elements (Drake & Bertrand, 2001), the early development of rhythm perception may contribute to greater accuracy in musical expectation, especially during infancy. In fact, neurophysiological measurements suggest that beat perception already occurs in newborns (Winkler et al., 2009). The ability to synchronize to a beat only develops around 2 years of age, when cognitive (attention, memory) and motor (muscle control, coordination) skills are sufficiently developed to physically mimic temporal regularities (Drake, Jones, & Bharucha, 2000). Beat synchronization at this age can only occur for a small range of tempi, with children's preferred tempo gradually slowing through time and the ability to synchronize to different beats increasing (Corrigall & Schellenberg, 2015b). Interestingly, the social setting has a strong influence on beat-synchronization accuracy, with a human drumming partner positively affecting results compared to an audio recording or a robot-like drum machine giving visual cues (Kirschner & Tomasello, 2009).

In sum, research on musical predictions has provided mixed support for culture-driven expectations. The study of musical universals (Savage, Brown, Sakai, & Currie, 2015) further suggests that foundational elements of music, such as the use of rhythm, discrete pitches, and seven or fewer scale degrees within an octave, are shared among almost all cultures. However, human development has been observed to play an important role in differences in musical prediction. This factor provides an example of the ways in which different variables can modify musical expectancy. Additionally, statistical learning has shown how exposure is integrated and processed into probabilistic calculations of future events, along with a computational model pertaining specifically to pitch predictions. Finally, temporal expectancy was addressed through the dynamic attending theory and the IDyOM model.

2. Musical Expectations and Emotions

To adapt to environmental conditions, living organisms have acquired various physical abilities throughout evolution, from flying to retracting into a shell to the ability to see at night. In The Expression of the Emotions in Man and Animals (1872), Charles Darwin provided an evolutionary account of affective behaviour, defining emotions as mechanisms encouraging adaptive behaviour that evolve and mutate through time, just like physical abilities. Accordingly, emotions interact with a variety of cognitive mechanisms, including anticipation, to direct behaviour. This evolutionary and functional approach to emotions poses difficulties when applied to music-induced emotional states: the increase in the likelihood of survival from experiencing pleasure in eating nutritious food, for example, is much more evident than in the case of music. Nonetheless, different evolutionary theories explaining music's potential for pleasure have been proposed, such as parent-infant bonding (Dissanayake, 2008), mate selection (Miller, 2000), social cohesion (Cross, 2001), and beneficial play (Honing, 2011). Leaving this debate aside, accounts of how music-induced emotions are produced in real time may provide relevant additions to our understanding of the relationship between expectations and emotions.

Meyer (1956) proposed that tensions from dissonant chords and melodies, when prolonged, shortened, or displaced further than expected, have the potential to elicit anticipation-based emotions, such as anxiety or surprise. This was later supported by Steinbeis, Koelsch, and Sloboda (2006), who found increases in subjective and physiological responses to more unexpected harmonic progressions. Juslin and Västfjäll (2008) proposed a theoretical framework that captured the various mechanisms, such as episodic memory and emotional contagion, by which emotions may arise in the context of music, but also in other modalities, such as visual art. Included in these mechanisms is musical expectancy, which may elicit an emotional reaction by being either confirmed or violated (Juslin, 2013).

2.1 The ITPRA Theory

Huron (2007) offered a theory of the different psychological processes involved in expectation, positing that expectations can be translated into a sequence of affective and physiological responses (imagination, tension, prediction, reaction, appraisal) that serve different functions, as summarized here.

Imagination, the first step in the ITPRA sequence, entails the imagining of a future event and the hypothetical emotional consequences it implies. By expecting to be drenched at the end of the day, one may choose to bring an umbrella to work. By projecting, an individual can assess the potential danger or opportunity the future may afford. This mechanism is strongly related to motivation, as it allows the pursuit of long-term rewards by partly experiencing what their outcome would produce emotionally. Huron (2007) underlines the neural basis of this mechanism through a study by Bechara et al. (1994), in which damage to the prefrontal cortex rendered patients able to feel negative or positive emotions as a consequence of an event, but unable to anticipate the emotional consequences of future events. Events subjectively associated with negative affect (negative valence) were thus impossible to preemptively avoid, resulting in poor financial health, reckless behaviour, and limited social relationships. The intricacies of how the brain generates these expectations and calculates their affective value are further developed in the next chapter, which provides an overview of the main brain structures involved in predictions and the specific neurochemistry involved in their interactions.

Following the recognition of a hypothetically expected event, the next step in the sequence is physiological preparation for it. This mechanism, named the tension response, generally translates into physiological and emotional consequences, such as dilated pupils, increased heart rate, tense muscles, and heightened awareness (Huron, 2007). These reactions are considered manifestations of heightened attention and arousal levels, associated with states of concentration and wakefulness. They may seem obvious in the context of an anticipated negative outcome (e.g., coming across a bear during a walk in the woods), but they also apply to more complex situations where the assessment of danger is harder to discern. The timing of increased attention and arousal can be understood as an effort at energy conservation: because these responses are taxing for the body and mind, physiological arousal and attention are rapidly increased only just prior to the event, since maintaining them at constantly high levels would be too energetically costly. Here, a parallel deserves mentioning. The dynamic attending theory, mentioned earlier, precisely supports this view. The synchronizing of attentional oscillations to perceived temporal regularities may convey the best example of how the temporal unpredictability of an event can be decreased. This allows more attention to be allocated to other events, and promotes energy-conservation efficiency. With this in mind, rhythm perception can be viewed as having adaptive value, considering it concentrates attention on specific points in time, freeing up the potential to react to other, more unexpected events with greater speed and accuracy.

Following a hypothetical event, three responses are distinguished as having the potential to evoke emotions. First, the prediction response reinforces prediction accuracy by rewarding correct expectations and discouraging inappropriate ones. According to Huron (2007), this translates into accurate predictions carrying positive emotional valence, and inaccurate predictions carrying negative emotional valence. This statement is supported by only limited references. For this reason, the following chapter's summary of neuropsychological research on predictions will help better assess whether or not prediction accuracy carries affective value.

Finally, the response to the event itself first contains the reaction response, an immediate, largely unconscious response, for example in the form of a sudden movement, occurring less than 150 milliseconds after the event itself. The second response, the appraisal response, is a conscious, slower assessment of the consequences of a given event. As it appears after the reaction response, there may be differences between the two, as in the example of giving a friend a scare by hiding behind a door and jumping out. The initial response assumes the worst outcome by flexing muscles, closing the eyes, and perhaps even jumping. A later assessment of the situation entails the recognition of the individual as a harmless friend, resulting in a sense of restored safety.

The ITPRA theory's distinction between the various mechanisms through which expectations have the potential to elicit different emotions is joined by an additional proposal, the prediction effect, which suggests that predictions confirmed to be accurate during the appraisal stage have the potential to induce pleasure. This proposal is built upon the perceptual fluency/attributional model (Bornstein & D'Agostino, 1994), which suggests that the ease of processing information from a stimulus is positively valenced and misattributed to the stimulus itself. Research findings on the relationship between familiarity and appreciation are described next, to provide better context for the prediction effect.

The Mere-Exposure Effect

In the 19th century, researchers observed that individuals tended to prefer objects they had seen before over unfamiliar ones. The theory was further developed by Zajonc (1968), who observed that novel stimuli first elicit a fear reaction and avoidance in all organisms, before familiarity gradually diffuses this first impression. Aptly named the mere-exposure effect, Zajonc's theory attempted to explain how individuals who were presented with words, images, and geometric shapes seemed to prefer those that had benefitted from the greatest amount of exposure.


Further experiments showed that the mere-exposure effect also applied to unhatched chicken eggs (Rajecki, 1974). Two different tones were presented to two different sets of fertilized eggs. Upon hatching, chicks consistently showed preference for the tone they had been exposed to before hatching. The distinction between conscious recognition and preference for stimuli was further examined by Zajonc and led to his claim that "preferences need no inferences", suggesting that affective judgements may occur prior to recognition (Zajonc, 1980). Although the application of the mere-exposure effect may seem obvious for marketing purposes, mixed results have been observed for the exposure to products and its positive effect on customer judgement (Brooks & Highhouse, 2006). Marketing research suggests that exposure breeds ambivalence: on one hand, a familiar brand can be judged positively by the consumer who either saw or bought a similar product; on the other, repeated exposure to ads for the same brand may be interpreted as a lack of reputation that must be compensated by a strong visual presence.

Although empirical evidence for the mere-exposure effect has produced mixed results in commercial applications, a meta-analysis of 208 studies from 1968 to 1989 revealed that the effect is robust and reliable (Bornstein, 1989). Important variables influencing the effect include stimulus complexity and the delay between exposures, which are addressed later. Bornstein examined the mere-exposure effect in relation to conscious recognition of stimuli, providing a more in-depth interpretation of the cognitive processes involved in the preference for familiarity, although unconscious exposure had been shown to produce stronger mere-exposure effects (Murphy, Monahan, & Zajonc, 1995). According to the perceptual fluency/attributional model (Bornstein & D'Agostino, 1992), repeated exposure to stimuli leads to greater perceptual fluency: the ease with which listeners are able to recognize an object is judged as positive, although this impression is attributed to the object itself rather than to the recognition process involved in its perception. As an example, finding oneself singing the lyrics of a familiar song playing on the radio may produce positive judgements about the song itself, even though, upon hearing it previously, it was not particularly appreciated. The step from "I know this song!" to "I like this song!" is argued to be mediated by the efficiency with which our explicit memory is able to make connections with previous, similar experiences.

Taken to an extreme, Bornstein and D'Agostino's standpoint would mean that positive judgement of a stimulus increases as exposure does, although multiple later studies produced contradictory results. In a study investigating the varying level of appreciation for painting reproductions over the course of repeated exposures, Zajonc, Shaver, Tavris, and Van Kreveld (1972) observed an initial increase, followed by a decrease, in positive judgements after only a few exposures. Furthermore, the observed curve did not depend on initial judgements: both highly and lowly rated paintings after the first exposure underwent a similar inverted-U-shaped function. This curvilinear relationship has been studied by Berlyne (1970), who proposed a two-factor model in which the evaluation of a stimulus varies according to its arousal potential. The arousal, in turn, partly depends on the individual's prior acquaintance with the stimulus, along with other variables, such as its perceived complexity. A novel stimulus, for example, initially generates increasing interest through multiple exposures. This first part of the inverted-U-shaped curve is attributed to the gradual dissipation of an initial aversion to novel stimuli, shifting gradually into familiarity. After repeated exposure, the stimulus becomes too familiar, and its capacity to arouse diminishes, resulting in boredom, also known as satiation. This suggests that stimuli producing moderate levels of arousal should in turn generate the most positive judgements. The curve's apex is considered to bear the characteristics of being neither too familiar nor too complex (Szpunar, Schellenberg, & Pliner, 2004).
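As a purely illustrative sketch of the two-factor idea (the functional forms, names, and parameter values below are my own, not Berlyne's formalism), an inverted U emerges whenever a quickly saturating positive factor is summed with a slowly accumulating negative one:

```python
import math

def liking(exposures, reward_peak=2.0, reward_rate=1.0, satiation_rate=0.3):
    """Toy two-factor account of the Wundt curve: a positive factor that
    saturates quickly (initial wariness giving way to familiarity) plus a
    negative factor that accumulates slowly (satiation)."""
    positive = reward_peak * (1 - math.exp(-reward_rate * exposures))
    negative = satiation_rate * exposures
    return positive - negative

for n in range(11):
    print(n, round(liking(n), 2))
# Ratings rise over the first exposures, peak, then decline: an inverted U.
```

Printed over ten exposures, the toy ratings rise, peak after a couple of exposures, and then decline, mirroring the exposure-liking curves reported above.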

Initially applied to general stimuli, the inverted-U-shaped curve (or Wundt curve) has also been studied in the context of music (Bonnel, Gaudreau, & Peretz, 1998; Brentar, Neuendorf, & Armstrong, 1994; Hargreaves, 1984; Nunes et al., 2015). Szpunar et al. (2004) examined the effect of attention to the stimulus as a variable affecting the exposure effect in music appreciation. Participants were separated into two groups: those in the first group were required to count the number of notes in an excerpt, producing undivided attention towards the auditory stimulus. By contrast, the second group was asked to listen intently to a story in one headphone while ignoring the same musical excerpts sent to the other headphone, thus producing low attention to the tones. Results showed that an inverted-U-shaped curve only occurred in the focused listening condition, whereas distracted listening produced a gradual increase in liking ratings across exposures, without any sign of a peak followed by a decrease. The researchers concluded that the lack of explicit memory observed in the incidental listening group may explain the results: without proper memorization and recall of the elements previously heard, satiation cannot occur. This supports the perceptual fluency/attributional model (Bornstein, 1989), which suggests that the mere-exposure effect is stronger when explicit memory is least involved.

Another relevant finding from the study concerns the nature of the stimulus. Three types of stimuli were used to represent varying levels of complexity. This variable was tested in light of Bornstein's observation that the mere-exposure effect is heavily modulated by the complexity of the perceived stimulus (Bornstein, 1989). The first recordings were multiple synthesizer-generated series of tones of equal duration and onset distance that were not associated with the Western major or minor scale system.

The second type of recording consisted of real-life orchestral excerpts. This condition was deemed closer to the real-life experience of music, and was expected to produce the highest liking ratings across multiple exposures, due to its complexity. Indeed, although the Wundt curve was observed in all three conditions, the biggest increase and subsequent decrease in liking was observed in this condition, suggesting that increasingly complex musical excerpts provoked an increasingly pronounced curvilinear relationship between exposure and appreciation. The third condition was designed to have intermediate ecological validity, as a middle ground between conditions 1 and 2. To do so, recordings from the first condition were used, but played by multiple different instruments (cello, flute, horn, oboe, piano, or violin); whether the tones came from real acoustic instruments or from electronic sources (synthesizer, software) was not mentioned.

Szpunar et al. (2004) provide insight into the variables that modulate how exposure affects listeners' judgement of music. Nonetheless, some weaknesses deserve mentioning. First, the differing nature of the three types of musical excerpts makes their comparison questionable: from MIDI-produced note series to orchestral recordings, the length of exposure time varied, which could have important effects on the results. Furthermore, to produce a distracted, or incidental, listening condition, subjects were asked to focus on a spoken-voice stimulus in one ear, while the musical stimulus was heard in the other at 50% lower amplitude. The music excerpt's lowered amplitude could potentially have hindered the ability to clearly perceive the tone series, especially within the context of dual, competing recordings sent to opposite ears.

In more recent years, the relationship between exposure and musical appreciation has been revisited, showing that the inverted-U-shaped model is compatible with more contemporary interpretations. A review of 57 studies on the mere-exposure effect conducted in 2017 found that 87.7% of the studies supported the model, with the others presenting either partial (8.8%) or contradictory (3.5%) results (Chmiel & Schubert, 2017). Rather than reviewing existing findings, other researchers have used Berlyne's proposed variables involved in music appreciation, such as novelty/familiarity and complexity, to present a larger, more inclusive model of music appreciation (Hargreaves, 2012). Hargreaves proposed the reciprocal-feedback model, which includes social, cultural, musical, and psychological contexts as part of a multicomponent account of musical appreciation. This approach has the advantage of producing a more realistic account of how everyday judgements about music are made, but only vaguely explains the interactions between the multiple contextual variables presented.

Although most studies on the mere-exposure effect applied to music use full songs or musical excerpts, as in the studies mentioned above, liking as a function of repetitive features within a song has been studied less. However, Nunes, Ordanini, and Valsesia (2015) examined the impact of word repetition within songs as a predictor of commercial and popular success. Following the principle that perceptual fluency is hedonically marked (Reber, Schwarz, & Winkielman, 2004), songs containing higher levels of lyrical repetition were expected to be perceived more fluently and were thus more likely to be appreciated. Analyzing the most popular songs between 1958 and 2012, Nunes and colleagues made statistical comparisons of over two thousand songs, concluding not only that higher word repetition within a song raises its likelihood of reaching the #1 position on the Billboard charts, but that every additional chorus within a song augments its likelihood of reaching #1 on the singles chart by 14.5%. In light of the inverted-U-shaped curve observed in the studies mentioned above, these numbers are likely to experience a ceiling effect, although the study only mentioned this without any description of the average number of choruses or amount of lyrical repetition needed to observe it. Furthermore, a negative correlation was also established between lyrical repetitiveness and the average time for a song to rise to #1 on the charts. Although the Billboard charts are used to measure relative song success, ratings are not defined by popularity alone: the number of downloads and the amount of radio airtime are factors used to determine a song's position, reflecting individual appreciation only indirectly. Within these limitations, the Billboard charts provide historical, in-depth access to the most generally successful Western pop songs since their creation in 1958. Nunes and colleagues (2015) showed that the mere-exposure effect can be observed on multiple levels: the effect of repeated exposure on appreciation applies not only to songs themselves, but also to a song's internal features, such as words and lyrical structure.
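Nunes and colleagues used their own repetitiveness measures; purely as a hypothetical stand-in, a crude repetition score of the kind such analyses rely on can be computed as the fraction of word tokens that repeat an earlier token:

```python
def repetition_score(lyrics):
    """Fraction of word tokens that repeat an earlier token — a crude
    stand-in for the repetitiveness measures used in such studies."""
    words = lyrics.lower().split()
    seen, repeats = set(), 0
    for word in words:
        if word in seen:
            repeats += 1
        seen.add(word)
    return repeats / len(words) if words else 0.0

print(repetition_score("na na na na hey hey hey goodbye"))  # 0.625
```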

Contributions from neurological research provide supporting evidence for the inverted-U shape described previously. The reduction in neural response to familiar and predictable stimuli allows individuals to perceive change in the environment and has been studied in human and non-human animals (Grill-Spector, Henson, & Martin, 2006; Ponnath et al., 2013). Focusing on the frog auditory midbrain, Ponnath and colleagues demonstrated that the decrease in sensitivity to auditory stimuli occurred at the level of localized phasic cells. Stimulus-specific adaptation has been studied under various conditions and involves multiple parts of the brain (Grill-Spector, Henson, & Martin, 2006). This supports Berlyne's (1974) suggestion that arousal varies as a function of exposure, among other factors. Inhibited neuronal activity, observed through hemodynamic responses and fMRI scans, provides physiological evidence for a gradual decrease in arousal levels through exposure. This decrease can be interpreted as the second, descending part of the inverted-U-shaped curve.

2.2 The Prediction Effect

Through a brief review of the main findings linking exposure and appreciation, the limited conditions under which the mere-exposure effect can be observed have been described. At first, exposure and liking were proposed to be simply positively correlated (Zajonc, 1980), a view later contradicted by the effects of over-exposure (Bornstein, 1989). Variables such as attention level (attentive or incidental listening) and stimulus complexity were observed to influence the effect of exposure on liking (Szpunar et al., 2004). The inverted-U curve observed across a large number of studies suggests that the hypothesis is robust and reliable (Chmiel & Schubert, 2017). Finally, stimulus-specific adaptation, a neural phenomenon in which living organisms adapt to predictable, repeated stimuli to increase the ability to detect change in the environment, further demonstrates how increased familiarity decreases auditory sensitivity not only among humans, but also among non-human species. The evidence for the limits of the positive effect of exposure on appreciation supports the proposal that the perception of novel and unexpected stimuli is an equally valid topic to address within the context of music appreciation, and it is here that Huron's prediction effect comes in. As mentioned earlier, this is a reformulation of the perceptual fluency/attributional model proposed by Bornstein and D'Agostino (1994). In Huron's view, the appreciation of more familiar stimuli does not depend directly on the frequency of their occurrence, but rather on the accuracy of their prediction. The prediction is rewarded, and the reward is further misattributed to the stimulus. This is included in the last section of the ITPRA theory, where the prediction response assesses prediction accuracy and either reinforces fulfilled predictions or punishes mistaken ones (Huron, 2007). The author provides different musical examples in which prediction accuracy provides a better explanation than exposure for the potential to induce positive emotions in a musical context.

Firstly, pitch preference differs from pitch frequency of occurrence. In other words, the most frequently occurring pitch in Western songs, the dominant according to Aarden (2003), is not preferred by listeners as much as the tonic. According to the prediction effect, this distinction is attributable to the tonic's higher predictability rather than to its rate of occurrence (Huron, 2007). Furthermore, the “pleasantness” of the tonic differs depending on harmonic context: used as a passing note, it sounds more unstable and unpredictable than in a tonal cadence. This suggests that the tonal context of a pitch may be more relevant than its repeated occurrence for preference judgments.

Second, the effect of repeated exposure on response time is argued to provide evidence for the prediction effect. Studies have shown that repeated visual stimuli increase ease of processing and hence lower participants' response times (Huron, 2007). Such is not the case for pitch sequences, however, where repetition has the opposite effect: intervals shorten response time, whereas repeated notes lengthen it (Aarden, 2003). This is attributed to the statistical observation that pitch repetitions are rarer than pitch interval sequences in Western music. Therefore, according to the author, these results suggest that listeners process pitch information using schematic predictions of tonal hierarchy rather than repeated exposure. This is further supported by the observation that cadences resolving to tonally unstable chords can still provide a sensation of pleasure, even though they do not arrive at a stable chord. The relationships between tones and chords, rather than their rates of occurrence, determine their “pleasantness”.

Thirdly, Huron suggests that the prediction effect provides a plausible explanation for the appreciation of rhythm. The ability to predict temporal regularities is argued to be positively valenced, from percussion to pitches to harmony. Events occurring on downbeats are deemed more pleasant because their moment in time is predictable. In this view, melody and rhythm respectively carry the what and the when of musical predictions, as suggested by music theorist John Roeder (Huron, 2007). The displacement and suppression of salient beats can be seen as manipulations of the expected timing of events, much like harmonic manipulations such as the deceptive cadence or an extended V-I cadence.

Huron uses the distinction between exposure-based and schematically based predictions to support his theory. However, this produces some inconsistencies. Aside from the very limited empirical evidence supporting claims such as listener preference for tonic pitches and pleasure derived from cadences resolving to tonally unstable chords, the author's distinction between the two types of prediction may be overstated. As the process of learning tonal and metrical hierarchies is unconscious, exposure remains the basis from which these mental schemas may emerge. The ability to accurately predict the next pitch in a melody does not depend solely on how many times it has previously occurred, but on a larger series of complex factors, including tonal context, harmonic context, and the preceding melodic sequence. Therefore, even though a tonic pitch is preferred to a dominant pitch despite occurring less often statistically, this is only a limited interpretation of how the mere-exposure effect may apply to music. For example, in the same context, the overwhelming presence of tonic chords compared to dominant ones, the high expectancy of a dominant cadence resolving to a tonic chord, and leading tones repeatedly followed by a tonic may all be considered statistical regularities that are rooted in exposure and that may affect preference. Hence, limiting the scope of the mere-exposure effect solely to the probability distribution of pitches in a melody is bound to fail in explaining pitch preferences, as other factors may also interact in the implicit learning of patterns and regularities in music.

Moreover, some event predictions, even those made with almost 100% accuracy, may never afford any type of emotional response. Predicting that the sun will rise tomorrow or that one's refrigerator door will always open are not expectations whose accuracy is met with satisfaction. As predictions vary in their potential for pleasure, evidence from research in the field of neuroscience may provide a more in-depth assessment of theories linking emotion and expectancy. Accordingly, the next section covers the neural processes involved in predictions, from the brain regions concerned to their interaction via neurotransmitters such as dopamine, and, more importantly, how these mechanisms may explain the relationship between musical predictions and pleasure.

3. Predictions and the Brain

Over the past two decades, neuroscientific research has developed in conjunction with a growing academic interest in the central role predictions play in directing behaviour (Bubic, Cramon, & Schubotz, 2010). As predictions can direct attention towards specific objects and events, they are argued to constitute not only a reacting mechanism but also an acting one (Bubic et al., 2010). The ability to foresee hypothetical outcomes allows individuals to act according to the best possible outcomes and to pursue goals (Hommel, Müsseler, Aschersleben, & Prinz, 2001). Furthermore, by assessing an event's outcome, associations are made between actions and results, generating an increasingly accurate representation of the environment through what is called reinforcement learning (Zacks, Kurby, Eisenberg, & Haroutunian, 2011). As prediction errors appear to be implicated in the process of learning, understanding how future events are recognized, how their value is judged, and how appropriate reactions are calculated may explain certain behaviours, such as music listening. In this section, different types of predictions are first distinguished, before research establishing links between predictions and brain structures is summarized.
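As a minimal illustration of this idea, the sketch below implements a simple delta-rule update, in which an expectation moves toward each observed outcome in proportion to the prediction error. It is a generic Rescorla-Wagner-style example, not a model drawn from the studies cited above; the learning rate and reward sequence are illustrative.

```python
def rescorla_wagner(rewards, alpha=0.1):
    """Minimal delta-rule sketch of learning from prediction errors:
    the value estimate v moves toward each observed outcome in
    proportion to the prediction error. Values are illustrative."""
    v = 0.0
    history = []
    for r in rewards:
        error = r - v          # prediction error: outcome minus expectation
        v += alpha * error     # expectation updated toward the outcome
        history.append(round(v, 3))
    return history

# A stimulus first paired with reward, then unexpectedly unpaired:
print(rescorla_wagner([1, 1, 1, 1, 0, 0, 0, 0]))
```

Each surprising outcome produces a large error and hence a large update, while fully predicted outcomes produce no update at all, which is the sense in which prediction errors drive learning.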

3.1 General Theory

An essential component of predictions is the set of characteristics of the predicted event itself, from which the evaluation of its potential consequences is inferred. More specifically, in an overview of theory and empirical research on predictions and the brain, Bubic et al. (2010) distinguish five features. The Domain refers to whether the perceived event is experienced through sensory perception or cognitively. This opposition can be considered part of a continuum, and music could be argued to span both domains: although its source remains auditory, implied structures such as metrical and tonal hierarchy suggest that more abstract cognitive processes may be used to infer future musical events. The Level makes a distinction between implicit and explicit calculations, both mentioned in earlier sections. A third feature concerns how concrete the prediction is: the outcome of a predicted event may be more or less tangible, depending on its specificity. For example, predicting that a car will come to a halt at a stop sign has two potential outcomes: either it stops or it does not. In contrast, losing one's reading glasses entails future consequences without precise knowledge of when the next problematic situation will arise. The Timescale refers to short- and long-term predictions, which have been found to be processed in different brain regions: the premotor cortex directs short-term predictions, defined as predictions likely to require immediate motor activation, while the prefrontal cortex manages temporal predictions of various lengths, including long-term predictions, defined as predictions that do not necessarily require immediate action (Schacter, Addis, & Buckner, 2007). Finally, the prediction Type entails two different ways of inferring predictions from regularities: the estimated time of an event's occurrence may be inferred either from the number of times it has previously occurred or from its place in a known sequence of events. Although distinguished, both types of prediction rely on creating associations between past experienced events.

The ability to perform multiple predictions simultaneously is a central component of prediction mechanisms. These predictions may pertain to different events and, therefore, to varying levels of the aforementioned features. According to Pezzulo, Butz, and Castelfranchi (2008), expectations may be organized through hierarchical predictive systems, a concept that applies well to music. This closely resembles IDyOM's computational approach to musical predictions, which derives probability distributions from over ten statistical features of melodies; these probabilities coalesce and may all influence predictions, as sketched below.
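As a rough illustration of how several such distributions can coalesce, the sketch below combines two hypothetical per-feature predictions over the same pitch alphabet using an entropy-weighted geometric mean, so that more confident models carry more weight. This is a simplified sketch of the multiple-viewpoint idea, not IDyOM's actual implementation; the feature names and probability values are invented.

```python
import numpy as np

def combine_viewpoints(distributions, bias=1.0):
    """Combine per-feature predictive distributions over the same pitch
    alphabet with an entropy-weighted geometric mean. Models with lower
    entropy (more confident predictions) receive larger weights.
    Illustrative sketch only, not IDyOM's exact procedure."""
    distributions = [np.asarray(d, dtype=float) for d in distributions]
    weights = []
    for d in distributions:
        entropy = -np.sum(d * np.log2(d + 1e-12))
        max_entropy = np.log2(len(d))
        weights.append((max_entropy / (entropy + 1e-12)) ** bias)
    combined = np.ones_like(distributions[0])
    for d, w in zip(distributions, weights):
        combined *= d ** w
    combined **= 1.0 / sum(weights)
    return combined / combined.sum()  # renormalize to a distribution

# Two hypothetical feature models predicting the next scale degree (1-7):
pitch_model = [0.30, 0.05, 0.20, 0.05, 0.25, 0.05, 0.10]
interval_model = [0.40, 0.10, 0.10, 0.10, 0.20, 0.05, 0.05]
print(combine_viewpoints([pitch_model, interval_model]))
```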

Moreover, co-occurring predictions may come into conflict, as they may be related to different cognitive systems. In Ritter, Sussman, Deacon, Cowan, and Vaughan (1999), participants were presented with a visual stimulus followed by a standard tone. In the unpredictable condition, the visual stimulus was identical on all trials, and some deviant tones replaced the standard tones (20% probability); participants were required to respond to this deviation by pressing a key. In the predictable condition, every deviant tone was paired with a deviant image presented 450 ms earlier, so that participants were warned by a visual stimulus prior to hearing the deviant tone. By enabling participants to anticipate when a deviant tone was likely to occur, the researchers were able to observe how higher- and lower-order cognitive systems created seemingly opposite predictions. Although participants were aware of the time of occurrence and the nature of the deviant tone, they nonetheless showed activity in brain regions related to auditory expectation violation. Both predictions were measured through their respective event-related potentials (ERPs): the P3 component showed that participants anticipated the deviant tones, while the mismatch negativity (MMN) component was elicited as a reaction to violations of the perceived sequence pattern. This indicates that higher- and lower-order cognitive systems can be prepared for seemingly opposite outcomes. The results suggest that violations of expectancy may occur independently of an individual's awareness of future musical events, a factor that should be considered when choosing music that is familiar to listeners as stimuli in neuroimaging experiments focusing on musical expectancy.
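The two conditions can be summarized schematically as below, following only the design details given in the text (20% deviants; a 450 ms visual warning in the predictable condition). Trial counts, labels, and data structures are illustrative, not taken from the original study.

```python
import random

random.seed(1)

def make_trials(n=100, p_deviant=0.2, predictable=True):
    """Sketch of the two conditions described for Ritter et al. (1999):
    ~20% deviant tones; in the predictable condition each deviant tone
    is preceded by a deviant image 450 ms earlier. Illustrative only."""
    trials = []
    for _ in range(n):
        deviant = random.random() < p_deviant
        image = "deviant" if (predictable and deviant) else "standard"
        tone = "deviant" if deviant else "standard"
        trials.append({"image": image, "tone": tone, "soa_ms": 450})
    return trials

trials = make_trials()
print(sum(t["tone"] == "deviant" for t in trials), "deviant trials out of 100")
```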

A simple approach to understanding the interaction of brain structures in predictive processing is to view predictions as high-priority signals that increase the efficiency of processing in a given region (Rees & Frith, 1998). In other words, the biased competition theory of selective attention suggests that predictive processing affords anticipated events priority in processing over other events of lower relevance (Beck & Kastner, 2009). Predictions are thus considered bias signals communicated from expectation-producing brain regions to other regions, lowering the thresholds needed to direct attention and increasing the signal-to-noise ratio, which allows further ease of processing (Brunia, 1999). This interpretation finds support in studies showing similar activation for the prediction and the perception of somatosensory stimuli in the somatosensory cortex (Carlsson, Petrovic, Skare, Petersson, & Ingvar, 2000). Top-down processing thus involves communication with brain regions involved in perception and their activation prior to the incoming stimulus. With this in mind, predictions have the potential to involve a large number of functional parts of the brain. For example, the rustling of leaves in a park at dusk may activate not only the auditory and visual cortices, but also the amygdala, which is involved in fear responses and may in turn activate the medulla oblongata, a part of the brain stem responsible for controlling heart rate.
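The bias-signal idea can be caricatured in signal-detection terms: if a prediction lowers the decision criterion for an expected event, weaker sensory evidence suffices to detect it. The toy sketch below is only a caricature of this threshold-lowering account; the function, parameter names, and numbers are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def hit_rate(prior, snr=1.0, base_criterion=1.0, n=100_000):
    """Toy signal-detection sketch: a predictive 'bias signal' (prior)
    lowers the decision criterion, so weaker evidence is detected."""
    evidence = rng.normal(loc=snr, scale=1.0, size=n)  # noisy target evidence
    criterion = base_criterion - prior                 # prediction lowers threshold
    return np.mean(evidence > criterion)

# Same stimulus strength, expected vs. unexpected:
print("expected target:  ", hit_rate(prior=0.5))
print("unexpected target:", hit_rate(prior=0.0))
```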

The different functional brain regions implicated in predictions can be distinguished by the timescale of the prediction. Aside from the sensory cortices mentioned earlier, the thalamus, the motor system, and the prefrontal cortex play a particular role in short-term predictions (Bubic et al., 2010). The prefrontal cortex is also involved in longer-term predictions, in conjunction with the hippocampus and other brain regions with a functional role in imagination, such as the lateral parietal and temporal lobes (Schacter et al., 2007). Furthermore, the ventral striatum has been observed to play an important role in reward prediction, a topic addressed later in relation to neural evidence for musical predictions.

3.2 The Reward System

The concept of reward is central to the neuroscientific study of predictions. By assessing the reward value of a prospective event, behaviour can adapt accordingly, depending on whether the prospected outcome appears enjoyable or potentially fatal. Studies suggest that the orbitofrontal cortex (OFC), a region of the frontal lobe, is responsible not only for cognitive processing and decision making, but also for establishing the value of future rewards, a crucial mechanism for associative learning (Takahashi et al., 2011). The OFC produces a value signal that can be stored in working memory to motivate and guide behaviour towards obtaining these rewards through long-term planning (Wallis, 2007). Using fMRI, Howard and Kahnt (2018) further established how the OFC determines reward value and how this value is modified and updated by dopaminergic midbrain neurons. Hungry participants performed a reinforcement learning task that encouraged them to associate visual cues with food odours. These associations were subsequently violated by random pairings of visual and odour stimuli, and brain activity was measured. The results suggest that midbrain signals to the OFC were strongest for previously unpaired stimuli and correlated with changes in reward-identity representations in the OFC. Although multiple brain regions are implicated in prediction errors, such as the lateral prefrontal cortex and the posterior parietal cortex, this study provides a neural basis for reward evaluation and its real-time updating by the midbrain. A notable observation was the lack of activity in the striatum, a critical part of the reward system that has been found active in neuroimaging studies involving predicted rewards (Klein-Flugge, Hunt, Bach, Dolan, & Behrens, 2011). The researchers interpret this as a distinction between the time and the nature of the reward, a division in which the striatum may respond more to temporal predictions (when the event will occur) than to its identity (the what of the event).
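Loosely inspired by the design just described, the sketch below updates an identity-specific expectation: a visual cue comes to predict a particular odour, and an identity prediction error shifts the expectation when the pairing is violated. All names, probabilities, and the learning rate are hypothetical; this is a caricature of identity-based updating, not the model used by Howard and Kahnt (2018).

```python
def update_expectation(expected, observed, alpha=0.2):
    """Move the expected identity distribution toward the observed outcome
    in proportion to the identity prediction error. Illustrative only."""
    return {odour: p + alpha * ((odour == observed) - p)
            for odour, p in expected.items()}

expected = {"sweet": 0.5, "savoury": 0.5}  # cue initially uninformative
for outcome in ["sweet"] * 5 + ["savoury"]:  # trained pairing, then a violation
    expected = update_expectation(expected, outcome)
    print({k: round(v, 2) for k, v in expected.items()})
```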
