
What’s in a Sound? Trigger Specificity and Acoustics in Misophonia


Academic year: 2021



Iza Korsmit

Project Title: Misophonia: Extreme aversive responses to specific sounds
EC: 42
Period: November 15, 2016 – July 21, 2017
Student Name: Iza Korsmit
Student Number: 6048994
Supervisor: Romke Rouw
Co-Assessor: Marte Otten
Research Institute: Psychology, Brain and Cognition, UvA
Education: MSc in Brain and Cognitive Sciences, Cognitive Science, University of Amsterdam


Abstract

This study set out to learn more about the recently discovered condition misophonia, which is characterized by extreme negative emotional reactions to certain sounds (Jastreboff & Jastreboff, 2001), such as apple eating, pen clicking, or lip smacking. Our goals were to contribute to the clinical picture of misophonia, to test whether people with misophonia are sensitive to trigger sounds only or to sounds in general, and to identify the acoustic properties of those trigger sounds. In Part 1 of this study, we developed a new questionnaire – Miso-PPD – to test the severity of misophonia. In Part 2, we found that misophonia was characterized by increased perceived unpleasantness in reaction to misophonia triggers, but not to normally aversive or neutral sounds. In Part 3, we found that some acoustic features clearly distinguished aversive sounds from neutral and misophonia sounds, as we expected. Importantly, we found that certain features that were strongly present in aversive sounds were only weakly present in misophonia sounds, even more weakly than in neutral sounds. We suggest that the absence of certain sound characteristics makes other sound characteristics that are unpleasant to people with misophonia even more noticeable and unpleasant. Future research could test this suggestion with a larger and more diverse set of sounds, and could extend this line of research by testing different acoustic features. To conclude, we found that people with misophonia, based on self-report, are selectively sensitive to misophonia triggers only. Although social and semantic factors likely play a role in causing a misophonic response, trigger sounds are also characterized by the absence of certain acoustic features.


Introduction

Misophonia: A Clinical Description

Misophonia is literally translated from ancient Greek as ‘hatred of sound’ and is also known as ‘selective sound sensitivity’. It was first described by Jastreboff and Jastreboff (2001). Misophonia is characterized by negative emotional reactions and autonomic arousal in response to specific sounds. This is not a fear reaction, which occurs in phonophobia, but rather a reaction of irritation, anger, or panic. In addition, reports indicate that the emotional response is accompanied by physical reactions like chest pain, tense muscles, or sweating (Cavanna, 2014). The most common triggers are sounds produced by other people, like eating, pen clicking, whistling, or lip smacking. In contrast, people with misophonia are often not triggered when they produce the sounds themselves, and report mimicry as a coping mechanism (Edelstein, Brang, Rouw, & Ramachandran, 2013). People with misophonia worry about losing control and show some obsessive-compulsive traits. However, although misophonia shows some similarities with other disorders (e.g., posttraumatic stress disorder, obsessive compulsive disorder, and antisocial personality disorder), none of these can fully explain its symptoms (Schröder, Vulink, & Denys, 2013). Misophonia can have large negative effects on daily functioning and social life, due to avoidance of possible triggers. People with misophonia are also often reluctant to report their condition, because of a fear of stigma. The disorder is not yet recognized by the Diagnostic and Statistical Manual of Mental Disorders (5th edition; DSM-5; American Psychiatric Association, 2013), and clinicians are often unaware of its existence.

A recent functional and structural MRI study (Kumar et al., 2017) on the brain basis of misophonia found that people with misophonia showed a greater BOLD response in the anterior insular cortex (AIC) than controls when presented with a trigger. The AIC is the center of the salience network, which plays a major role in the perception of interoceptive signals and emotion processing. They also found heightened connectivity between the AIC and emotion-related brain areas (vmPFC, PMC, hippocampus, and amygdala). In addition, they found that trigger sounds caused a heightened heart rate and galvanic skin response, mediated by AIC activity. Finally, a body-consciousness questionnaire showed that misophonia subjects perceived their bodies differently from control subjects. These results indicate that misophonia is a disorder in which abnormal salience is attributed to specific sounds, with a heightened perception of internal bodily states. The specificity and acoustic characteristics of the sounds that trigger this response are the subjects of this study.


Edelstein et al. (2013) describe that people with misophonia are not triggered by sounds they make themselves, and more specifically, that they are mostly triggered by people close to them. Sounds made by children or animals will not evoke a misophonic response either. This suggests that it is not the sound itself, but the social and semantic associations that go along with it, that trigger misophonia. However, these results are based on self-report, which is a subjective measure. People with misophonia might not have reliable insight into the reason a sound triggers them. For example, people with misophonia might still be triggered by sounds produced by children, animals, or themselves, but to a lesser degree. The sound qualities themselves, for example specific timbral or temporal characteristics, could therefore still play a causal role. To provide a more objective insight into the relevance of the sound itself, a more careful study of the specific characteristics of misophonia triggers is necessary. It will help to define the disorder and point to directions for future studies and interventions, e.g., by influencing the tolerance for certain timbral and temporal acoustic properties.

One can take many different approaches in analyzing characteristics of sounds. At the most basic level, one looks at the purely physical properties of the sound, without taking into account how humans process them. This would mean looking at frequency spectrums or average loudness (e.g., Johansson, Bergbom, Waye, Ryherd, & Lindahl, 2012). The physical properties would be a good starting point to get a picture of trigger sounds, but would not consider what humans can and cannot actually hear. For example, we might find that misophonia triggers have a certain frequency range in common, but this might not lie in the range of frequencies that humans are able to perceive. One could also analyze how the human cochlea processes sound, or how sound characteristics are represented in the auditory cortex (e.g., Arnal, Flinker, Kleinschmidt, Giraud, & Poeppel, 2015; Shammai, 2003). This would lead to more psychologically relevant findings, but would also require some assumptions to be met and parameters to be determined, in order to model human hearing and decide what is our region of interest. For example, to create a modulation power spectrum (MPS) of the triggers, we would need to determine the expected range of frequency and temporal modulations of our sounds. Only when determining a specific region, can one extract useful information from the MPS. We are not able to formulate such expectations about the misophonia triggers at this moment.

Instead, we will look at acoustic features that require very few assumptions, because acoustic qualities of misophonia triggers have, to the best of our knowledge, not been studied before. Therefore, we cannot formulate many assumptions beforehand (e.g., in what frequency range will the unpleasant sound qualities of misophonia sounds fall?). We will look at ‘lower-level’ basic qualities of the frequency spectra of the triggers, and some ‘mid-level’ psychoacoustic features that to some extent model human hearing, but don’t require many assumptions. Most of the features we will use come from the field of Music Information Retrieval (MIR; Downie, 2003). This field of study was developed in areas of computer and information sciences and therefore has a strong computational foundation. Many of the acoustical qualities described in MIR are either on a very musical level (e.g., tonal mode recognition) or are psychologically difficult to interpret because of how they are computed (e.g., MFCCs model the non-linear tonotopic scaling of the human cochlea, and the logarithmic response of hair cells in the basilar membrane, but also contain features added to improve the computational performance in machine learning, with no consideration of their cognitive relevance; Aucouturier & Bigand, 2013). Nevertheless, a large body of MIR research remains psychologically relevant and applicable to natural, non-musical sounds. Of the nine features that we will use in this study, seven come from MIR, as will be described in our methods section. The first three features are distributional frequency descriptions that describe the sounds’ timbre: flatness, kurtosis, and skewness of the spectrogram. The next three features also describe timbre: attack time, event density, and fluctuation. The final three features specifically measure the unpleasantness of sounds: loudness, sharpness, and roughness.

The first three features are basic distributional features of the power-frequency spectrum. The power-frequency spectrum is obtained through a Fast Fourier Transform (FFT) and will show us, for the entire sound fragment, the amplitude of each frequency band. Attack time is a timbre measure that averages the time from the onset of a tone, until its amplitude peak. The longer this attack time, the more gradual a tone appears. Event density measures how many tones, or events, on average appear in a certain time frame. The higher the event density, the more ‘busy’ something sounds. Fluctuation is a rhythmic measure that shows, for each frequency Bark band, what is its rhythmic periodicity. Bark bands divide the frequency spectrum into bands where each band is of perceptual equidistance, i.e., taking into account how the human cochlea processes sound (Lartillot & Toiviainen, 2007).
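To make the three distributional features concrete, the following is a minimal sketch, not the toolbox implementation used in this study. It assumes textbook definitions: flatness as the ratio of the geometric to the arithmetic mean of the power spectrum, and skewness and kurtosis as the third and fourth standardized moments of the spectrum treated as a distribution over frequency bins. The function names `power_spectrum` and `spectral_shape` are hypothetical, and a naive DFT stands in for an FFT to keep the example dependency-free.

```python
import cmath
import math


def power_spectrum(signal):
    """Power per non-negative frequency bin via a naive DFT (O(n^2), short clips only)."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_shape(power, eps=1e-12):
    """Return (flatness, skewness, kurtosis) of a power spectrum.

    Assumed definitions (illustrative, not taken from the thesis):
    flatness = geometric mean / arithmetic mean of the power values;
    skewness and kurtosis = standardized 3rd and 4th moments over bin index.
    """
    total = sum(power) + eps
    weights = [p / total for p in power]
    centroid = sum(i * w for i, w in enumerate(weights))
    variance = sum((i - centroid) ** 2 * w for i, w in enumerate(weights))
    std = math.sqrt(variance) + eps
    skewness = sum((i - centroid) ** 3 * w for i, w in enumerate(weights)) / std ** 3
    kurtosis = sum((i - centroid) ** 4 * w for i, w in enumerate(weights)) / std ** 4
    geometric = math.exp(sum(math.log(p + eps) for p in power) / len(power))
    arithmetic = sum(power) / len(power) + eps
    return geometric / arithmetic, skewness, kurtosis


# A single click has a flat spectrum; a pure tone concentrates power in one bin.
click = [1.0] + [0.0] * 63
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
click_flatness = spectral_shape(power_spectrum(click))[0]
tone_flatness = spectral_shape(power_spectrum(tone))[0]
print(click_flatness > tone_flatness)
```

Under these definitions a noise-like sound yields a flatness near 1 and a tonal sound a flatness near 0, which is why flatness is often read as a noisiness measure.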

Another method for studying sound characteristics comes from the research on sound quality, with features of loudness, sharpness, and roughness (Fastl, 2006). These features are of special interest, because they can be taken as measures of unpleasantness (Ellermeier, Zeitler, & Fastl, 2004), and thus might be related to the perceived unpleasantness in misophonia triggers. For example, normally aversive sounds should score high on one or all of these three features, but misophonia triggers could also share these qualities, perhaps not as strongly as normally aversive sounds. Loudness is a measure of the perceived volume of a sound, expressed in sones. Sones take some perceptual features into account and estimate the perceived loudness of a sound. For example, the spectral broadening of the excitation within the cochlea causes masking effects, where higher frequencies are masked by lower frequencies. In addition, there is a temporal decay of perceived loudness. Consequently, depending on the frequency distribution and the temporal development of a sound, sounds that are objectively equally loud can be perceived as unequally loud. Sharpness is a tone color. High sharpness is characterized by a high relative presence of high frequencies. It contributes to the perceived ‘power’ of a sound, but too much sharpness will make a sound be perceived as aggressive. The final psychoacoustic feature is roughness, which concerns the temporal modulation characteristics of a sound. Roughness occurs at temporal modulations with a maximum frequency of 70 Hz. It arises as a result of masking, because in temporal modulations of sound there is a decay in psychoacoustic excitation in the hearing system. This phenomenon is otherwise known as ‘beating’ and is also related to the perception of harmonic dissonance. The rougher a sound, the more dissonant or coarse it sounds.

Current Study: Research Questions and Hypotheses

The aim of this study is to contribute to the clinical picture of misophonia and analyze the acoustic qualities of common misophonia triggers, and how they compare to normally aversive and neutral sounds. This study is executed in three parts. In Part 1, we further elaborate on the clinical picture of misophonia with an online questionnaire with questions about misophonia symptoms, demographics, and comorbidities. This questionnaire will also provide data on the severity of misophonic complaints for the participants we tested in the following experiments. In Part 2, we study the specificity of misophonic responses, by comparing the subjective evaluations of sounds from three different categories; misophonia triggers, normally aversive sounds, and neutral sounds. Normally aversive sounds have been studied more widely, and are well known sounds like nails on a chalkboard, metal grinding, or screaming (Kumar, Forster, Bailey, & Griffiths, 2008). In Part 3, we study what the acoustic differences are between misophonia, aversive, and neutral sounds and how this relates to the subjective unpleasantness of those sounds, as found in Part 2. To describe the acoustic qualities of sound, we will look at the features (described above) of misophonic, normally aversive, and neutral sounds.

There are three possible scenarios concerning the acoustic features of misophonia triggers. Firstly, if a misophonic response is elicited merely by the social and semantic associations of a sound, we would not expect the acoustic qualities of misophonia triggers to be different from neutral or aversive sounds. Only the unpleasantness ratings of misophonia triggers would then show a distinction between subjects with or without misophonia. This finding would be consistent with the self-report findings by Edelstein et al. (2013). Secondly, however, if misophonia is rather a condition of a heightened sensitivity to all sounds, we would expect the acoustical properties of misophonia triggers to be qualitatively (but perhaps not quantitatively) similar to the acoustical properties of aversive sounds, especially for the features of loudness, sharpness, and roughness, which are well-established unpleasantness measures (Ellermeier et al., 2004). In addition, we would then expect subjects with misophonia to rate all sounds as more unpleasant than control participants would. Thirdly, misophonia triggers may show a distinctive set of acoustic properties that distinguishes them from both aversive and neutral sounds. This would indicate a heightened sensitivity to specific sounds, possibly interacting with the social and semantic connotations of those sounds. This should then also be apparent in the unpleasantness ratings, where only the misophonia triggers are rated more negatively by the subjects with misophonia (i.e., there is specific sensitivity, instead of sensitivity to all sounds). This latter finding would, at least in part, contradict the idea that people with misophonia are purely triggered by sounds due to their social and semantic connotations.

Part 1: Clinical Description of Misophonia

Experiment 1

This experiment was set up with two goals in mind: to contribute to the knowledge on the clinical picture of misophonia, and to obtain a quantification of the severity of misophonic complaints, on which we can base participant selection for the following experiments. Although a few questionnaires and scales to measure misophonia severity were already in existence, we felt that they were either difficult to interpret for the participants (MAS-1; Dozier, 2015), or did not encompass the full range of symptoms of misophonia (A-MISO-S; Schröder, Vulink, & Denys, 2013). We named our new questionnaire Misophonia: Psychology, Physiology, and Daily Life (Miso-PPD). The items as used in our questionnaire can be found in Appendix A.

This experiment also featured an ASMR questionnaire. ASMR (Autonomous sensory meridian response; Barratt & Davis, 2015) was found by a previous study (Rouw & Erfanian, 2017) to be related to misophonia, and thus we also aimed to further analyze this relationship. This, however, was not the main goal of this study, and is not featured in later experiments. The ASMR questionnaire and its results are described in Appendix B.

Methods

Participants. On the first, very short questionnaire, we received 355 responses. Based on this first questionnaire we contacted participants that indicated they had misophonia. The second, longer, questionnaire was filled out by 232 participants, including control participants without misophonia. In this group, there were 40 males and the mean age was 27 (SD = 11). All the described experiments were approved by the ethics commission of the Psychology department of the University of Amsterdam.

Materials. The first, short, questionnaire only contained a few questions about demographics, contact information, short descriptions of misophonia and ASMR, and a question whether the participant experienced misophonia and/or ASMR.

The second questionnaire contained specific questions about misophonia and ASMR. We developed Miso-PPD partly based on existing scales (MAS-1; Dozier, 2015; A-MISO-S; Schröder et al., 2013) and partly on newly formulated items, to encompass the full picture of symptoms and daily life consequences of misophonia. All items were scored on a 5-point Likert scale where 1 equaled “Strongly disagree” and 5 equaled “Strongly agree”. The items that were summed for the weighted misophonia score are annotated with an asterisk in Appendix A. All questionnaires were sent out with Qualtrics¹.

Procedure. The two questionnaires were distributed via social media (the Dutch Misophonia support group on Facebook), via the newsletter of the Dutch association for misophonia, and the University of Amsterdam Psychology student recruitment website. Participants who indicated to have misophonia and/or ASMR on the first questionnaire, were sent the second questionnaire. Students at the recruitment website received the first and second questionnaire simultaneously.

Results

Gender differences. An independent t-test with gender as independent and weighted Miso-PPD score as dependent variable showed no differences between men and women. The Kolmogorov-Smirnov test showed that the distribution for the group of women was not normal. The non-parametric Mann-Whitney U test also showed no significant differences between men (M = 33.5, SD = 14.3) and women (M = 35.8, SD = 14.3; p = .35). In addition, although there were more women in our misophonia group (those with a weighted score above 40; 86% female), there were also more women in our control group (those with a weighted score below 30; 80% female). Overall, a Pearson’s chi-squared test therefore showed no significant difference in the gender distribution between the misophonia and control groups.

¹ qualtrics.com/

Age differences. For the age differences we performed an independent t-test with age as dependent and participant group (misophonia or control) as independent variable. Levene’s test indicated that the assumption of homogeneity of variance was violated for the misophonia (weighted Miso-PPD score above 40) and control (weighted Miso-PPD score below 30) groups, p < .001. Without equal variances assumed, an independent t-test showed significant differences in age between the misophonia and control group, t(175.5) = -4.9, p < .001. The misophonia group had a mean age of 31.4 years (SD = 12.2), the control group had a mean age of 23.8 years (SD = 8.9). There was also a significant correlation between the age of the participants and their weighted Miso-PPD score (r = .29, p < .001). This correlation also includes the participants who had a score between 30 and 40 and were thus not assigned to the misophonia or control group.
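The fractional degrees of freedom in the age comparison are what one would expect from a t-test that does not assume equal variances, commonly computed as Welch's t-test with the Welch-Satterthwaite approximation. The sketch below shows that computation from summary statistics. The function name `welch_t` is hypothetical, and the group sizes in the usage lines are placeholders, since the sizes of the two groups are not reported in this section.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Works from summary statistics (mean, SD, n) of two independent groups
    and does not assume equal variances.
    """
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Means and SDs as reported above (control first, then misophonia);
# n = 100 per group is a HYPOTHETICAL placeholder, not a reported value.
t, df = welch_t(23.8, 8.9, 100, 31.4, 12.2, 100)
print(f"t({df:.1f}) = {t:.2f}")
```

With equal SDs and equal group sizes the approximation reduces to the familiar n1 + n2 - 2; the more the variances differ, the further the degrees of freedom drop below that value.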

This finding may be due to the fact that a large number of our control participants were recruited through the psychology student database, which could have lowered the mean age of the control group. On the other hand, looking solely at participants assigned to the misophonia group, there was also a significant, but slightly smaller, correlation with age (r = .25, p < .05), and looking solely at participants assigned to the control group, there was a significant negative correlation with age (r = -.32, p < .001).

Comorbidities. The weighted Miso-PPD score is significantly correlated with having a psychological diagnosis in general (point-biserial r = .26, p < .001), having generalized anxiety disorder (GAD; point-biserial r = .13, p < .05), having obsessive compulsive disorder (OCD; point-biserial r = .21, p < .001), using medication (point-biserial r = .16, p < .05), and having hyperacusis (point-biserial r = .32, p < .001). There were no correlations between the weighted or total Miso-PPD scores and the scores on the ASMR questionnaire (Pearson’s r_weighted = -.04; r_total = .01).
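A point-biserial correlation relates a binary variable (e.g., diagnosis present or absent) to a continuous score, and is numerically identical to a Pearson correlation with the binary variable coded 0/1. The following sketch illustrates the standard formula; the function name `point_biserial` and the toy data are purely illustrative and are not study data.

```python
import math


def point_biserial(binary, scores):
    """Point-biserial correlation between a 0/1 variable and continuous scores.

    r = (M1 - M0) / SD * sqrt(n1 * n0 / n^2), with SD the population standard
    deviation of all scores. Equivalent to Pearson's r with 0/1 coding.
    """
    n = len(scores)
    group1 = [s for b, s in zip(binary, scores) if b == 1]
    group0 = [s for b, s in zip(binary, scores) if b == 0]
    n1, n0 = len(group1), len(group0)
    mean1 = sum(group1) / n1
    mean0 = sum(group0) / n0
    mean_all = sum(scores) / n
    sd = math.sqrt(sum((s - mean_all) ** 2 for s in scores) / n)
    return (mean1 - mean0) / sd * math.sqrt(n1 * n0 / n ** 2)


# Toy example (NOT study data): diagnosis flag vs. questionnaire score.
has_diagnosis = [0, 0, 0, 1, 1, 1]
toy_scores = [1, 2, 3, 4, 5, 6]
print(round(point_biserial(has_diagnosis, toy_scores), 3))
```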

Conclusion

Contributing to the clinical picture of misophonia, these results show that there were no differences between men and women in the severity or presence of misophonia, based on our newly developed Miso-PPD questionnaire. There was, however, a correlation of misophonia severity with age, suggesting that older participants experience more severe misophonia symptoms. Finally, we found that misophonia was correlated with diagnoses of GAD, OCD, hyperacusis, and taking medications, but not with ASMR.

Part 2: Subjective Unpleasantness

Experiment 2

This experiment was part of an fMRI study that aimed to uncover the structural and connectivity brain differences between subjects with and without misophonia, and to study the functional brain activity in response to different types of stimuli. For detailed information about the fMRI study, see Appendix C. In this study, we look at the unpleasantness ratings that were given to the different stimuli during the scanning procedure. Our goal was to determine whether subjects with misophonia are more sensitive than controls to all sounds, or specifically to misophonia triggers alone.

Methods

Participants. The participants were a selection of the participants from Experiment 1. They were selected based on their weighted Miso-PPD score. Those with a weighted score minimum of 40 and aged between 18 and 60 years old were asked to participate in our fMRI experiment. Control participants were mostly brought in via the misophonia participants (partners, relatives, or friends) or via the student recruitment website. The control participants all had a weighted maximum score of 30 and were also aged between 18 and 60 years old. Most misophonia participants were keen to participate, motivated by the wish to know more about misophonia. Respondents who did not want to participate in our fMRI study, were either too busy or did not feel comfortable with lying in the fMRI scanner and/or listening to possible triggers. This resulted in 45 participants, 24 with misophonia. All participants received either a financial compensation, or credits (for the Psychology students). The misophonia group had a mean weighted Miso-PPD score of 51 (SD = 6), mean age of 31 (SD = 11), and consisted of 20 females. The control group had a mean weighted Miso-PPD score of 18 (SD = 5), mean age of 29 (SD = 11), and consisted of 15 females.

Materials.

Stimuli. The stimuli that were rated by our participants during the fMRI experiment were 6 videos with misophonia sounds, 6 videos with aversive sounds, and 6 videos with neutral sounds. The stimuli were presented on a Cambridge Research Systems BOLD screen (32 inch, 1920x1080, 120 Hz refresh rate). Audio was presented through MR Confon MkII+ headphones. The misophonia sounds consisted of someone eating an apple, eating a sandwich, eating chips, chewing gum, sniffling, and clearing his throat. The videos were all recorded in our own lab, in a silent room, with a Superlux E531/BCS microphone and Canon LEGRIA mini X camera. The normally aversive sounds consisted of nails on a chalkboard, a fork scratching a plate, screaming girls, squeaking Styrofoam, an alarm clock, and a fire alarm. These videos were not recorded by ourselves, but collected via YouTube. The selection of aversive sounds was based on the paper by Kumar, Forster, Bailey, and Griffiths (2008). The neutral sounds consisted of someone shaking a bottle, moving a zipper up and down, cutting a newspaper, ripping a newspaper, peeling an apple, and squeezing a wet towel. All these videos were also recorded by ourselves, in the same way as the misophonia stimuli. Some neutral sounds were selected by matching them with the misophonia sounds, based on an apparent similarity in rhythm or timbre (e.g., newspaper cutting/ripping vs. eating apple/chips) or shared associations (e.g., apple eating vs. apple peeling). All the videos were 12 seconds long and the videos recorded by ourselves all featured the same male actor. The sound levels of all videos were normalized to 80 dB peaks, to ensure that the sound was audible over the fMRI machinery sounds.

Ratings. After each video, the participant was asked “How did this sound make you feel?” Responses were given with a button box of the Current Designs Fiber Optic Response Pads (fORP) model (HHSC-2x4-C). The participant used two buttons with which they could give their answer on a scale from -5 (extremely negative) to 5 (extremely positive). With the left button (index finger) they could move left on the scale towards -5, with the right button they could move right on the scale, towards 5.

Exit interview. In the exit interview, we asked the participants which of the sounds they heard evoked the most negative affect, if any sounds were not audible enough, and whether they had a misophonic response. They were asked about the perceived unpleasantness of the fMRI machinery sounds and the overall unpleasantness of the trigger sounds they heard in the fMRI scanner. In addition, we asked them more specific questions about their daily triggers and what their life was like with misophonia. The control participants were asked far fewer questions, covering only the unpleasantness and the audibility of the sounds during scanning.

Procedure. The experiment consisted of three blocks, each with 18 trials (6 misophonia, 6 aversive, 6 neutral). During the blocks, EPI scans were made. After each block, there was a short break in which the scanner stopped and we could communicate with our participant. Each trial started with a short text indicating which video would be presented (e.g., eating an apple; 2 sec). Then the 12-second video played. After this video, there was an empty screen for a jittered duration between 1 and 5 seconds (3 x 1 sec, 4 x 2 sec, 4 x 3 sec, 4 x 4 sec, 3 x 5 sec). After this, the rating screen appeared and the participants were able to give their rating with the two response buttons, moving the slider on the screen between -5 and 5. This rating screen lasted for 6 seconds each time. The last screen of the trial was again empty, with a similarly jittered duration between 1 and 5 seconds. The stimuli were presented in two randomized orders, to control for order effects.
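The jitter scheme described above fixes how often each blank-screen duration occurs within a block of 18 trials. A minimal sketch of how such a schedule could be generated follows. It assumes the fixed set of durations is simply shuffled per block, which is not stated in the text, and the names `JITTER_COUNTS` and `jitter_schedule` are hypothetical.

```python
import random

# Duration in seconds -> number of trials per block, as listed in the procedure.
JITTER_COUNTS = {1: 3, 2: 4, 3: 4, 4: 4, 5: 3}


def jitter_schedule(seed=None):
    """Return 18 blank-screen durations in random order.

    Assumption (not from the source): the fixed multiset of durations is
    shuffled, so every block has the same total blank time.
    """
    durations = [sec for sec, count in JITTER_COUNTS.items() for _ in range(count)]
    rng = random.Random(seed)
    rng.shuffle(durations)
    return durations


schedule = jitter_schedule(seed=2017)
print(len(schedule), sum(schedule) / len(schedule))
```

Because the counts are symmetric around 3 seconds, the mean blank duration is 3 seconds regardless of the order in which the durations are drawn.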

Results

Gender differences. An independent t-test was performed to test differences in average ratings (dependent) between men and women (independent). However, the Kolmogorov-Smirnov test showed that the distribution of ratings for the group of men was not normal, which may be due to the low number of male subjects (N = 10). A non-parametric Mann-Whitney U test with gender as independent and average rating as dependent variable, showed that there were no significant differences between men and women (p = .14) in their average total ratings.

Order differences. Another independent t-test was performed to test the difference in average ratings (dependent) in presentation orders (A or B). However, the Kolmogorov-Smirnov test showed that the distribution of ratings for the second order of stimuli presentation (order B), was not normal, which also may be due to the low number of participants that followed this order (N = 9). The assumption of homogeneity was again violated. The results of the independent t-test with equal variances not assumed and the non-parametric Mann-Whitney U test both show no significant differences between the two orders of stimuli presentation (order A, M = -0.76, SD = 1.03; order B, M = -1.36, SD = 0.97; parametric, t(41) = -1.1, p = .30; non-parametric, p = .44).

Block differences. We tested whether ratings differed across the blocks with a repeated measures ANOVA, with average rating as dependent and block as independent variable. The assumption of sphericity was violated, χ²(2) = 24.72, p < .01. Therefore, Greenhouse-Geisser corrected tests are reported (ε = .70). The results show that the ratings were influenced by the block in which the stimuli appeared, F(1.4, 61.2) = 12.09, p < .001. The average rating decreased as new blocks were presented (M_B1 = -.72, SE_B1 = .15; M_B2 = -.93, SE_B2 = .15; M_B3 = -.98, SE_B3 = .14).
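The Greenhouse-Geisser correction multiplies both degrees of freedom of the repeated measures F-test by the estimated epsilon, which is why the reported degrees of freedom are fractional. The sketch below applies this to a one-way design. The function name `greenhouse_geisser_df` is hypothetical, and the usage line assumes that all 45 participants of this experiment entered the block analysis.

```python
def greenhouse_geisser_df(epsilon, levels, subjects):
    """Greenhouse-Geisser corrected degrees of freedom for a one-way
    repeated measures ANOVA with `levels` conditions and `subjects` participants."""
    df_effect = epsilon * (levels - 1)
    df_error = epsilon * (levels - 1) * (subjects - 1)
    return df_effect, df_error


# Three blocks, rounded epsilon of .70, and an ASSUMED 45 participants.
df_effect, df_error = greenhouse_geisser_df(0.70, levels=3, subjects=45)
print(round(df_effect, 1), round(df_error, 1))
```

Under these assumptions the sketch gives roughly 1.4 and 61.6; the small gap to the reported error degrees of freedom would be consistent with epsilon having been rounded to two decimals in the text.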

Misophonia Correlation. Both the weighted (Pearson r = -.63, p < .001) and the total (Pearson r = -.63, p < .001) Miso-PPD scores were significantly correlated with the ratings by the misophonia participants of the misophonia stimuli. However, the weighted (Pearson r= -.06) and the total (Pearson r = -.08) Miso-PPD scores did not correlate with the neutral stimuli. The weighted (Pearson r = -.24) or total (Pearson r = -.24) Miso-PPD scores also did not correlate with the aversive stimuli.

(13)

Rating Differences. Finally, to test our hypothesis about the rating interaction between the stimulus categories and the participant groups, we performed a factorial mixed ANOVA, with ratings as dependent variable, and participant group (misophonia, control) and stimulus category (neutral, aversive, misophonia) as independent variables. The assumption of sphericity was violated, χ²(2) = 20.13, p < .01. Therefore, Greenhouse-Geisser corrected tests are reported (ε = .72) for the repeated measures effect (stimulus category). The ANOVA shows a significant main effect of stimulus category on the ratings, F(1.3, 62.3) = 50.88, p < .001, and a main effect of the participant group, F(1, 43) = 12.03, p < .05. Most importantly, there was a significant interaction between stimulus category and participant group, F(1.4, 62.3) = 21.08, p < .05, ηp² = .20. Simple contrasts show that control and misophonia participants differed significantly in their ratings in the misophonia stimulus category compared with the aversive stimulus category, F(1, 43) = 22.93, p < .001, ηp² = .35, but not in the neutral stimulus category compared with the aversive stimulus category, F(1, 43) = 0.09, p = .77 (see Figure 1).

Figure 1. Mean ratings for control and misophonia participants on the three different stimulus categories, with 95% confidence interval.
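Partial eta squared can be recovered from an F statistic and its degrees of freedom, which offers a quick consistency check on reported effect sizes. The sketch below uses the standard identity ηp² = (F · df_effect) / (F · df_effect + df_error); the function name `partial_eta_squared` is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Simple contrast reported above: F(1, 43) = 22.93.
print(round(partial_eta_squared(22.93, 1, 43), 2))  # prints 0.35
```

For the misophonia vs. aversive contrast this reproduces the reported value of .35.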

Experiment 3

Although the stimuli were clearly audible to our participants in Experiment 2, we wanted to test the reliability of our findings by testing the same group of participants on the same set of stimuli, but in a different environment. Therefore, we contacted our participant group from Experiment 2 and asked them to fill out an online questionnaire with the same stimuli. Although an online study also has its downsides due to little control over the participants’ environment, we theorized that reliable results should show similar rating trends in both Experiment 2 and 3.

Methods

Participants. Out of 45 participants, 31 filled out the questionnaire. These were 20 misophonia participants, with a mean age of 32 (SD = 11.1) and consisting of 16 females; and 11 control participants, with a mean age of 27 (SD = 9.8) and consisting of 9 females.

Materials. The sound stimuli were exactly the same as in the fMRI study. Yet, we had little control over the volume of the sounds that the participants would have on their laptops. We added a few demographic questions, and also three specific questions about the participants’ misophonia: whether they were triggered by anyone, or only people close to them; whether they were also triggered by sounds they made themselves; and whether they did their best to not produce their own trigger sounds themselves. A previous study found that people with misophonia don’t tend to be triggered when they produce their triggers themselves (Edelstein et al., 2013). However, our exit interview showed that some people are triggered by themselves, or do their utmost best to not produce their own triggers. Therefore, we decided to specifically ask for that in this questionnaire.

Procedure. This questionnaire was also sent out with Qualtrics. The participants from the fMRI study were contacted through e-mail and were asked to fill in the extra questionnaire. To be able to provide some consistency, we asked the participants to sit in a room with few or no distractions, listen to the sounds with their headphones on, and adjust the volume to ensure they could hear everything properly.

Analysis. To test the differences between the ratings inside the scanner and the ratings online, and a possible interaction with participant group (misophonia or control participants), we performed a factorial mixed ANOVA. Then, we performed another factorial mixed ANOVA with ratings as dependent variable, and participant group and stimulus category as independent variables.

Results

fMRI vs. Online Study. First, we tested the differences between the overall ratings from the fMRI and the online questionnaire that followed, and its interaction with the participant group, with a factorial mixed ANOVA. There was a significant main effect of the ratings from the fMRI vs. ratings online, F(1, 29) = 10.48, p < .05, ηp² = .27, but no interaction effect with participant group, F(1, 29) = 2.00, p = .17. Overall, the stimuli in the online study were rated more negatively than in the fMRI scanner (see Figure 2).

Online Rating Differences. Secondly, we tested whether the ratings from the online questionnaire showed a similar pattern as the ratings from the fMRI study. A factorial mixed ANOVA tested the difference in ratings (dependent) between stimulus categories (independent) and participant groups (independent). There was a significant interaction between stimulus category and participant group, F(2, 58) = 26.92, p < .001, ηp² = .47. Contrasts show that misophonia and control participants did not differ in their ratings of aversive compared to neutral stimuli, F(1, 29) = 2.05, p = .16, but did differ in their ratings of misophonia compared to neutral stimuli, F(1, 29) = 29.29, p < .001, ηp² = .50 (see Figure 3).

Added Misophonia Questions. These questions were posed to the misophonia participants in this experiment, and to the misophonia participants from Experiment 4. We calculated correlations between the three items and the misophonia score, and found that only the item “I will be triggered, regardless of how well I know the person producing my trigger” showed a significant correlation with the weighted Miso-PPD score, r = .38, p < .05. The distribution of the responses in Figures 4, 5, and 6 shows that overall, the misophonia participants try to avoid producing their triggers themselves, and that they will be triggered regardless of how well they know the person producing their trigger. However, for the question of whether participants were also triggered by themselves, responses were more divided and geared towards “Completely disagree”.

Experiment 4

After testing the reliability of the unpleasantness rating trends in different environments (fMRI vs. online), we wanted to test the reliability of these trends with a different set of sound stimuli. In Experiments 2 and 3, our misophonia and neutral sound stimuli were recorded in our own lab, but the aversive stimuli were obtained from YouTube. This may be a source of confounding: the aversive stimuli may differ from the other two categories simply because they came from YouTube. The stimuli from YouTube overall contained more environmental sound not relevant to the sound stimulus itself (e.g., the fire alarm stimulus was accompanied by people talking). For this experiment, a completely new set of sounds was obtained from YouTube. Although recording all of our own stimuli would have allowed us more control over the sound quality, the videos from YouTube are more


Figure 2. Mean overall ratings from the fMRI study and the online questionnaire, for the control and misophonia participants, with 95% confidence interval.

Figure 3. Mean ratings for control and misophonia participants on the three different stimulus categories, in the online questionnaire, with 95% confidence interval.


Figure 4. Frequency of responses on the item “I always try to not produce my triggers myself”.

Figure 5. Frequency of responses on the item “I will be triggered, regardless of how well I know the person producing my trigger”.

Figure 6. Frequency of response on the item “I will also be triggered if I produce my trigger myself”.


Methods

Participants. We targeted a new group of participants with a new set of stimuli. The questionnaire was filled out by 33 people. These participants were recruited from the misophonia support group on Facebook, so the sample might contain some of the same participants as in the previous experiments. However, due to the continuously growing number of members (1473 members on July 5th, 2017), we expect that there is a sufficient proportion of new participants in our sample. Of these 33, 19 had a weighted misophonia score above 40 and were classified by us as having misophonia (mean score of 51, SD = 4.9), and 9 were classified as having no misophonia, because they had a weighted misophonia score below 30 (mean score of 22, SD = 6.1). The remaining participants had scores between 30 and 40 and were excluded from further analysis (N = 5). Misophonia participants were on average 39 years old (SD = 13.5), consisting of 18 females; control participants were 32 years old on average (SD = 13.6), consisting of 7 females.
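The grouping rule described above can be sketched as a small function. This is an illustration only: the function name and labels are hypothetical, and treating scores of exactly 30 or 40 as excluded is an assumption based on the wording “above 40”, “below 30”, and “between 30 and 40”.

```python
def classify_participant(weighted_score, control_below=30, misophonia_above=40):
    """Assign a participant to a group from the weighted Miso-PPD score."""
    if weighted_score > misophonia_above:
        return "misophonia"
    if weighted_score < control_below:
        return "control"
    # Scores from 30 up to and including 40 are left out of further analysis.
    return "excluded"

# Hypothetical scores, including the two group means reported above.
groups = [classify_participant(s) for s in (51, 22, 35)]
```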

Materials.

Stimuli. The sound stimuli for this experiment were all collected from YouTube, but were of the exact same categories as in the fMRI study (eating an apple, eating a sandwich, chewing gum, etc.), except for nose sniffling and fork on plate. We were not able to find those sounds on YouTube, but replaced them with similarly sounding nose breathing and metal grinding sounds, respectively. All videos were cut to 12 seconds. The volume of the sounds was not normalized to an 80-dB level, as was done with the previous stimuli. We found with our previous set of stimuli that this normalization resulted in a high volume of white noise for the stimuli that had to be amplified. This, in turn, influenced our feature measures greatly (e.g., an increase in loudness, roughness, and sharpness). The volume of a few of our new stimuli was attenuated, however, because they were unrealistically and/or painfully loud (cutting paper, screaming, and metal grinding). This was done with Audacity². A calibration tone was added to the stimulus set, to have more influence on the volume level on each participant’s computer device. This was a simple sinusoid tone obtained from YouTube. As a final control measure, we asked the participants at the end of the questionnaire whether they had listened to the sounds on their headphones or not.

Questionnaire. The participants were given the Miso-PPD (see Experiment 1), with three questions added. These were the same questions that were added in Experiment 3, about triggers produced by other people and themselves. We again asked them some demographic questions and questions about their misophonia specifically.

2 audacityteam.org/

Ratings. Each sound stimulus was rated the same as in the previous study, going from -5 (extremely negative) to 5 (extremely positive).

Procedure. After some demographic questions and an explanation of misophonia, participants were presented with the misophonia questionnaire. After these questions, the participants were presented with the calibration tone. Participants were asked to adjust the volume on their computer to the point where the sound was loud and clear, but not painful. Then, the participants were presented with a block of the 18 videos, in randomized order, with a rating question after each video. In a second block, the same sound stimuli were presented, without video but with titles of what the sounds were, again in randomized order. The order of seeing videos and then sounds, or vice versa, was counterbalanced by switching the order of presentation when we reached half of our expected number of participants.

After rating the videos and sounds, the participants who indicated having misophonia were presented with more detailed questions about their misophonia.

Results

Order Differences. To test whether the two orders (first only audio and then video, or first video and then audio; independent) resulted in different rating patterns (dependent) and if this interacted with participant group (misophonia or control; independent), we performed a factorial mixed ANOVA. There were no significant differences between the orders overall, F(1, 23) = 1.16 , p = .29, and there was also no interaction between order and participant group, F(1, 23) = .033, p = .86. Therefore, for the further analysis we averaged over the sound and video ratings, resulting in single average ratings for each stimulus.

Rating Differences. With these averaged rating data, we performed a factorial mixed ANOVA to test whether ratings (dependent) showed differences for the participant groups (independent) and/or the stimulus category (independent). The assumption of sphericity was violated, χ²(2) = 6.61, p < .05, for the stimulus variable. Therefore, Greenhouse-Geisser corrected tests are reported (ε = .81) for the repeated measures effect (stimulus category). There was a significant interaction between stimulus category and participant group, F(1.6, 42.0) = 4.53, p < .05, ηp² = .15. Contrasts show that the interaction was significant comparing the misophonia and aversive stimuli, F(1, 26) = 7.72, p < .05, ηp² = .23, but not comparing aversive with neutral stimuli, F(1, 26) = .61, p = .44 (see Figure 7).


Figure 7. Mean ratings for control and misophonia participants on the three different stimulus categories, with the new stimuli, with 95% confidence interval.

Conclusion

In Experiment 2 of this study, we found no gender differences between the unpleasantness ratings of the sounds. We did find a block effect, suggesting that every time someone is presented with the same trigger again, it will be rated more negatively. This finding extended to Experiment 3, where we found that the same stimuli were also rated more negatively than in Experiment 2. This, however, could also be caused by the new environmental setup. It is arguably easier to hear the unpleasant sounds at home than in the fMRI scanner, and they would thus be rated more negatively.

Most importantly, in all three experiments (2, 3, and 4), we found similar rating patterns for the different stimulus categories. Whereas misophonia and control participants rated similarly on the aversive and neutral stimuli, it was only on the misophonia stimuli that the misophonia participants gave more negative ratings than the control participants. This suggests that people with misophonia are selectively sensitive to misophonia triggers only, and not to all sounds in general, based on subjective unpleasantness ratings.

Finally, with our extra questionnaire items we found that, based on self-report, people with misophonia are not triggered by their own sounds, but do try to avoid producing their own trigger sounds. They are also triggered by anyone, regardless of how well they know them. This final finding was also correlated with the severity of misophonia. Trigger sounds produced by people with misophonia themselves do not lead to a misophonia response, which suggests that it is not solely the sound of the trigger that causes a response.

Part 3: Acoustic Characteristics

Experiment 5

In our final experiment, we look at the acoustic characteristics of all the different sound stimuli. We used the set of nine features described in our introduction: the flatness, kurtosis, and skewness of the frequency spectrogram, to describe its distribution; the attack time, event density, and fluctuation of the stimuli, to describe the timbre of the sounds; and finally, the loudness, sharpness, and roughness of the stimuli, to measure unpleasantness. All but loudness and sharpness originated in MIR. The goals of this experiment were threefold. Our first goal was to test the acoustical differences between the three stimulus categories (misophonia/aversive/neutral). Our second goal was to study the differences between the first (Experiment 2 and 3) and the second (Experiment 4) set of stimuli, so we could test whether the specific sound stimulus or the category of the sound was more relevant (e.g., do the two eating apple sounds score similarly on the features?). Our third and final goal was to relate these acoustical characteristics to the ratings obtained in Experiment 2, 3, and 4.

Methods

Participants. To correlate the sound characteristics of the stimuli with their respective ratings, we used the rating data from all previous experiments. This resulted in a total participant group of 109 (Nexp2 = 45; Nexp3 = 31; Nexp4 = 33). This consisted of 68 participants with misophonia, with a mean age of 34 (SD = 11.8), and 44 females. There were 41 control participants, with a mean age of 29 (SD = 11.2), and 29 females.

Materials.

Stimuli. We used the stimuli from Experiment 2, 3 and 4 for our acoustical analysis. These were 36 stimuli: 12 misophonia (6 self-recorded [Set 1], 6 from YouTube [Set 2]), 12 aversive (from YouTube), and 12 neutral (6 self-recorded [Set 1], 6 from YouTube [Set 2]). We used the original sounds that were not normalized to 80 dB. We found that the amplification of sounds to 80 dB created a lot of white noise in some of our stimuli, which would confound the acoustical analysis.

Software. We used MATLAB code created by Jan van Balen as posted on GitHub, to extract measures of loudness, sharpness, and roughness³. This code made use of the MIRtoolbox (Lartillot, Toiviainen, & Eerola, 2008) and the Auditory toolbox (Slaney, 1998). The remaining features were also extracted with the MIRtoolbox. All statistical analyses were performed in MATLAB.

Procedure. The spectrograms of the stimuli were computed with the mirspectrum function of the MIRtoolbox, which performs a fast Fourier transform on the audio files. With this spectrum, we measured the kurtosis (function mirkurtosis), skewness (function mirskewness), and flatness (function mirflatness). To measure attack time, event density, and fluctuation, we used the mirattacktime, mireventdensity, and mirfluctuation functions, respectively. Loudness and sharpness were computed with the barkfeatures function, with sharpness, described by Peeters in the Cuidado project (Peeters, 2004), as the specific loudness of Bark bands. Loudness is the total loudness in sones. Roughness is computed with the mirroughness function from the MIRtoolbox, which computes the peaks of the frequency spectrum and takes the average of all dissonance between all possible pairs of peaks.
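To make the three spectrum descriptors concrete, the sketch below computes flatness, skewness, and kurtosis from a magnitude spectrum that is assumed to be already available. It is not the MIRtoolbox implementation and its values need not match that toolbox exactly; the function names are hypothetical. Flatness is taken as the ratio of the geometric to the arithmetic mean of the spectrum, and skewness and kurtosis as the standardized third and fourth moments of the spectrum read as a distribution over frequency.

```python
import math

def spectral_flatness(mags):
    """Geometric mean divided by arithmetic mean of a magnitude spectrum."""
    if any(m <= 0 for m in mags):
        raise ValueError("magnitudes must be strictly positive")
    log_mean = sum(math.log(m) for m in mags) / len(mags)
    return math.exp(log_mean) / (sum(mags) / len(mags))

def spectral_skewness_kurtosis(freqs, mags):
    """Skewness and kurtosis of the spectrum, treated as a distribution over frequency."""
    total = sum(mags)
    weights = [m / total for m in mags]  # normalise magnitudes to sum to 1
    centroid = sum(f * w for f, w in zip(freqs, weights))
    variance = sum((f - centroid) ** 2 * w for f, w in zip(freqs, weights))
    third = sum((f - centroid) ** 3 * w for f, w in zip(freqs, weights))
    fourth = sum((f - centroid) ** 4 * w for f, w in zip(freqs, weights))
    return third / variance ** 1.5, fourth / variance ** 2

# Hypothetical spectra: a perfectly flat one and one dominated by a single bin.
flat = spectral_flatness([1.0] * 8)
peaked = spectral_flatness([1e-6] * 7 + [1.0])
skew, kurt = spectral_skewness_kurtosis([1.0, 2.0, 3.0], [1.0, 2.0, 1.0])
```

A noise-like spectrum gives a flatness near 1, whereas a spectrum dominated by a single component gives a flatness near 0; a spectrum that is symmetric around its centroid has a skewness of 0.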

Analysis. For each acoustic feature, we performed three one-way ANOVAs to measure the effect of stimulus category on the feature: once with all 36 stimuli, to test the overall effect, and twice separately, with the stimuli from Experiment 2 and 3 (Set 1) and with the stimuli from Experiment 4 (Set 2), to test whether there were any differences between the sound sets. With these final two tests, we measure whether the effect is dependent on the specific sound example we chose (e.g., apple eating from Set 1 or apple eating from Set 2), or can be attributed to the characteristics of that trigger in general (i.e., independent of which sound example you choose). Finally, we analyzed the correlations between the ratings and feature values of the 36 stimuli, with Pearson correlation.
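The one-way ANOVA used for each feature can be sketched as follows. This is a generic textbook computation under the assumption of equal variances, not the MATLAB code used in this study, and the function name and toy values are hypothetical. With three categories of 12 stimuli each, it yields the degrees of freedom (2, 33) that appear in the results for the full stimulus set.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of feature values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Toy feature values for three categories (hypothetical numbers).
f_value, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```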

Acoustic Analysis

Loudness. For all 36 sound stimuli, the Levene’s test of homogeneity was significant (p < .05). Thus, in the case of significant ANOVA effects, contrasts without equal variances assumed are reported. There was a significant effect of stimulus category on the loudness of the stimulus, F(2,33) = 35.21, p < .001, η² = .68. Planned contrasts, without equal variances assumed, show that aversive stimuli had higher loudness than misophonia and neutral stimuli, t(12.42) = -6.18, p < .001. In addition, misophonia stimuli had lower loudness than neutral stimuli, t(21.08) = 4.33, p < .001. See Figure 8 for a visualization of the ordered categorization for all stimuli separately.
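For a one-way ANOVA, the effect size η² can be recovered from the F value and its degrees of freedom, which offers a quick consistency check on reported values. The sketch below uses this standard identity; the function name is hypothetical.

```python
def eta_squared_from_f(f_value, df_effect, df_error):
    """Eta squared for a one-way ANOVA: SS_between / SS_total, expressed through F."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Loudness effect over all 36 stimuli, as reported above.
eta_sq = eta_squared_from_f(35.21, 2, 33)
print(round(eta_sq, 2))  # 0.68
```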

Looking at Set 1, Levene’s test of homogeneity was violated (p < .05). There was a significant effect of stimulus category on the loudness of the sounds from Set 1, F(2, 15) = 20.30, p < .001, η² = .73. Planned contrasts, without equal variances assumed, show that aversive sounds had higher loudness than both misophonia and neutral sounds, t(5.5) = -4.50, p < .05, and misophonia sounds had lower loudness than neutral sounds, t(8.3) = 4.67, p < .05. For Set 2, Levene’s test of homogeneity was also violated (p < .05). There was also a significant effect of stimulus category on loudness, F(2, 15) = 13.68, p < .001, η² = .64. Planned contrasts, without equal variances assumed, show that aversive sounds scored higher on loudness than misophonia and neutral sounds, t(5.7) = -4.0, p < .05, but in this case misophonia and neutral sounds were not significantly different from each other, t(9.6) = 2.0, p = .08.

Figure 8. Loudness ordered from smallest (left) to largest (right) for each stimulus, with misophonia stimuli coloured blue, aversive coloured red, and neutral coloured green.

Roughness. For the analysis with all 36 stimuli, the Levene’s test of homogeneity was significant, p < .05, and thus the assumption was violated. There was a significant effect of stimulus category on the roughness of the stimuli, F(2,33) = 17.34, p < .001, η² = .51. Contrasts show that aversive stimuli were significantly rougher than misophonia and neutral stimuli, t(11.51) = -4.27, p < .05. Misophonia and neutral stimuli were not significantly different from each other, t(17.07) = 1.98, p = .07. See Figure 9 for an ordered categorization per stimulus.


For Set 1, Levene’s test of homogeneity was significant (p < .05). There was a significant effect of stimulus category on roughness of the sound stimuli, F(2, 15) = 8.22, p < .05, η² = .52. Planned contrasts show that aversive stimuli scored higher on roughness than misophonia and neutral stimuli, t(5.4) = -3.0, p < .05, but misophonia and neutral stimuli did not score significantly different from each other, t(6.3) = 1.55, p = .17.

For Set 2, Levene’s test of homogeneity was also significant (p < .001). There was a significant effect of stimulus category on roughness of the sound stimuli, F(2, 15) = 7.91, p < .05, η² = .52. Planned contrasts show that aversive stimuli scored higher on roughness than misophonia and neutral stimuli, t(5.1) = -2.85, p < .05, but that misophonia and neutral stimuli did not score differently, t(10.0) = 1.16, p = .27.

Figure 9. Roughness ordered from smallest (left) to largest (right) for each stimulus, with misophonia stimuli coloured blue, aversive coloured red, and neutral coloured green.

Sharpness. With all 36 stimuli, there was a marginally significant effect of stimulus category on the sharpness of the stimuli, F(2,33) = 3.33, p = .05, η² = .17. Contrasts show that aversive stimuli were sharper than both misophonia and neutral stimuli, t(33) = -2.58, p < .05. However, misophonia and neutral stimuli were not significantly different, t(33) = -0.12, p = .91. See Figure 10 for a categorization per stimulus.


For Set 1, there was a significant effect of stimulus category on the sharpness of the sound stimuli, F(2, 15) = 5.9, p < .05, η² = .43. Planned contrasts show that aversive stimuli scored higher on sharpness than misophonia and neutral stimuli, t(15) = -2.93, p < .05, but misophonia and neutral stimuli were not different from each other, t(15) = 1.79, p = .09.

For Set 2, there was no significant effect of stimulus category on sharpness of the stimuli, F(2, 15) = 1.79, p = .20.

Figure 10 shows that the values of sharpness are close together for all stimuli. In addition, the figure shows some clear differences between Set 1 and 2. For example, Eating Sandwich from Set 1 is in the lower half of scores, whereas Eating Sandwich from Set 2 is in the higher half. This discrepancy is also found in other stimuli pairs. This suggests that the sharpness of the sound is more dependent on the specific audio clip we chose, instead of the trigger it represents.

Figure 10. Sharpness ordered from smallest (left) to largest (right) for each stimulus, with misophonia stimuli coloured blue, aversive coloured red, and neutral coloured green.

Spectrogram Flatness. Looking at all 36 stimuli, the Levene’s test of homogeneity was significant (p < .05), and thus the assumption of homogeneity was violated. There was no significant effect of stimulus category on the spectrogram flatness, F(2,33) = 0.06, p = .95. There was also no significant effect of stimulus category on the spectrogram flatness of Set 1, F(2, 15) = 0.56, p = .58.

There was, however, a significant effect of stimulus category on the spectrogram flatness of Set 2, F(2, 15) = 7.0, p < .05, η² = .48. Planned contrasts show that the aversive stimuli are significantly lower in flatness than both the misophonia and neutral stimuli, t(15) = 3.73, p < .05, but the misophonia and neutral stimuli are not significantly different from each other, t(15) = 0.28, p = .79.

Figure 11 shows a visualisation of all the sound stimuli ordered and categorized by colour. This figure also shows that the same sounds, but from different sets, can be on separate ends of the spectrum. Nails on chalkboard from Set 1 are on the higher end, whereas nails on chalkboard from Set 2 are on the lower end of the flatness spectrum. The same can be seen for, e.g., apple peeling, fire alarm, and eating chips. In other words, the flatness of the spectrogram seems to depend more on the specific audio file we chose than on the sound it represents.

Figure 11. Spectrogram Flatness ordered from smallest (left) to largest (right) for each stimulus, with misophonia stimuli coloured blue, aversive coloured red, and neutral coloured green.


Event Density. There was a significant effect of stimulus category on the measure of event density, F(2,33) = 14.54, p < .001, η² = .47. Contrasts show that the aversive stimuli had higher event density than both misophonia and neutral stimuli, t(33) = -4.83, p < .001. In addition, misophonia stimuli had lower event density than neutral stimuli, t(33) = 2.41, p < .05, r = .39.

There was also a significant effect of stimulus category on the event density of Set 1, F(2, 15) = 6.27, p < .05, η² = .84. Planned contrasts show that event density was higher for aversive stimuli than both misophonia and neutral stimuli, t(15) = -3.33, p < .05, but that there were no significant differences between misophonia and neutral stimuli, t(15) = 1.22, p = .24.

For Set 2, there was also a significant effect of stimulus category on event density, F(2, 15) = 10.16, p < .05, η² = .58. Planned contrasts show that event density was higher for aversive stimuli than for both misophonia and neutral stimuli, t(15) = -3.72, p < .05, and lower for misophonia than for neutral stimuli, t(15) = 2.55, p < .05.

See Figure 12 for a visualization of how event density separates the categories from each other. The figure shows quite a clear distinction between the three stimulus categories. The two misophonia sounds from Set 1 (Eating Sandwich and Throat) that are on the higher end, and the neutral sound from Set 1 (Zipper) that is on the lower end of the spectrum, may have resulted in an absence of contrast between the misophonia and neutral stimuli in the one-way ANOVA from Set 1. With the ANOVAs and the figure below, we are more inclined to say that misophonia stimuli are indeed also lower in event density than neutral stimuli.

Spectrogram Kurtosis. For all 36 stimuli, the Levene’s test of homogeneity was significant (p < .05), and thus, the assumption of homogeneity was violated. There was no significant effect of stimulus category on the spectrogram kurtosis, F(2,33) = 1.93, p = .16. Levene’s test for homogeneity was also violated for Set 1 (p < .05). There was no significant effect of stimulus category on the spectrogram kurtosis of Set 1, F(2, 15) = 1.02, p = .38. There was also no significant effect of stimulus category on the spectrogram kurtosis of Set 2, F(2, 15) = 0.99, p = .40.

Spectrogram Skewness. With the full dataset, the Levene’s test of homogeneity was significant (p < .05). There was no significant effect of stimulus category on the spectrogram skewness, F(2,33) = 0.52, p = .60. The Levene’s test of homogeneity was also violated for Set 1 (p < .05) and there was no significant effect of stimulus category on spectrogram skewness of Set 1, F(2, 15) = 0.36, p = .71. Similarly, there was also no effect of stimulus category on spectrogram skewness of Set 2, F(2, 15) = 0.28, p = .76.


Figure 12. Event density ordered from smallest (left) to largest (right) for each stimulus, with misophonia stimuli coloured blue, aversive coloured red, and neutral coloured green.

Attack Time. For the analysis of all sound stimuli, there was no significant effect of stimulus category on the timbre measure of attack time, F(2,33) = 2.34, p = .11. There was also no significant effect of stimulus category on attack time of Set 1, F(2, 15) = 1.63, p = .23. In addition, there was no significant effect of stimulus category on attack time of Set 2, F(2, 15) = 1.20, p = .33.

Fluctuation. Finally, with all sound stimuli, there was no significant effect of stimulus category on the measure of fluctuation, F(2,33) = 0.50, p = .61. For Set 1, Levene’s test of homogeneity was violated (p < .05), and there was no significant effect of stimulus category on fluctuation of the sounds in Set 1, F(2, 15) = 0.69, p = .52. There was also no significant effect of stimulus category on fluctuation of the sounds in Set 2, F(2, 15) = 0.83, p = .46.

Correlations.

Aversive stimuli. See Table 1 for an overview of all the correlations of the features with the ratings. There are several significant positive correlations of the unpleasantness ratings with acoustic features. Note, however, that our rating scale went from negative (-5) to positive (5), and thus that a positive correlation indicates a relation to more positive ratings. For example, looking at the loudness and roughness features, the positive correlation indicates that the louder and rougher the stimulus, the more positive the rating of the (aversive) stimulus. This is very unexpected, since loudness and roughness are indicators of unpleasantness.

The positive correlations of kurtosis and skewness are less surprising, because they have not been previously associated with unpleasant sounds, or with any of our ANOVA findings.

Misophonia and Neutral Stimuli. For the other two stimulus categories, there were no significant correlations of the ratings with the acoustic features.

Table 1.

Correlation matrix of all the features with the ratings, split on misophonia/control/all subjects and aversive/misophonia/neutral stimuli.

               Aversive Stimuli        Misophonia Stimuli      Neutral Stimuli
Feature        Miso    Con     All     Miso    Con     All     Miso    Con     All
Loudness       .76**   .47     .67*    .35     .01     .14     -.05    .03     .00
Roughness      .79**   .55     .71**   .43     .22     .36     -.51    -.23    -.49
Sharpness      .55     .50     .55     .17     -.02    .07     -.37    .15     -.21
Flatness       -.30    .01     -.17    .07     .38     .29     .07     -.02    .08
Event Density  .15     .19     .18     -.01    .14     .06     -.35    -.16    -.34
Kurtosis       .59*    .66*    .63*    -.26    .04     -.11    .14     -.08    .09
Skewness       .60*    .63*    .63*    -.18    .11     -.03    .28     -.06    .21
Attack Time    .15     -.07    .06     .17     .18     .19     .38     .09     .33
Fluctuation    -.40    -.22    -.33    .25     .23     .33     -.49    .24     -.44

Significant correlations in bold.

* Correlation is significant at .05 level (two-tailed). ** Correlation is significant at .01 level (two-tailed).
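The significance markers in Table 1 can be checked by converting each correlation to a t statistic with n - 2 degrees of freedom. The sketch below assumes that each correlation is computed over the 12 stimuli of one category (so df = 10), which is an assumption on our part rather than something stated with the table; the function names are hypothetical, and the critical values are the standard two-tailed t-table values for df = 10.

```python
import math

# Two-tailed critical t values for df = 10 (standard t-table values).
T_CRIT_DF10 = {0.05: 2.228, 0.01: 3.169}

def t_from_r(r, n):
    """t statistic for testing a Pearson r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def significance_marker(r, n=12):
    """Return '**', '*', or '' for a correlation over n = 12 stimuli (df = 10)."""
    if n != 12:
        raise ValueError("critical values are only tabulated here for n = 12")
    t = abs(t_from_r(r, n))
    if t >= T_CRIT_DF10[0.01]:
        return "**"
    if t >= T_CRIT_DF10[0.05]:
        return "*"
    return ""

# Loudness row, aversive stimuli: misophonia, control, and all participants.
markers = [significance_marker(r) for r in (0.76, 0.47, 0.67)]
```

Under this assumption the sketch reproduces the markers shown in the table, for example two asterisks for r = .76 and none for r = .55.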

Acoustic Features. See Table 2 for the correlations of the features with each other, taken over all sound stimuli. Results show that loudness, roughness, sharpness, and event density were all strongly correlated with each other. Skewness and kurtosis of the spectrogram, which are both measures of the spectrogram distribution, were also correlated with each other. Finally, fluctuation of the sound stimulus was correlated with sharpness and spectrogram flatness.


Table 2.

Correlation matrix of the four features that showed significant ANOVA results.

               Loudness      Roughness     Sharpness     Flatness      Event Density Kurtosis      Skewness      Attack Time   Fluctuation
Loudness       1             .88**         .46**         -.11          .68**         .29           .22           -.21          .03
Roughness                    1             .59**         -.19          .65**         .30           .18           -.13          .13
Sharpness                                  1             .05           .49**         .21           .07           -.13          .37*
Flatness                                                 1             .17           -.10          -.10          -.17          .35*
Event Density                                                          1             .10           -.01          -.20          .32
Kurtosis                                                                             1             .94**         -.06          -.15
Skewness                                                                                           1             -.02          -.19
Attack Time                                                                                                      1             -.13
Fluctuation                                                                                                                    1

Significant correlations in bold.

* Correlation is significant at .05 level (two-tailed). ** Correlation is significant at .01 level (two-tailed).

Conclusion

Roughness showed the most reliable results of all the features. Aversive stimuli were the roughest, and misophonia and neutral stimuli were similarly the least rough, consistently over both stimulus sets.

Loudness also showed reliable results, with aversive stimuli as the loudest category. In the case of all stimuli or Set 1 only, neutral stimuli were also louder than misophonia stimuli, but in the case of Set 2 neutral and misophonia stimuli were equally loud. However, we suggest that this latter finding is due to a few outliers in Set 2, and that misophonia triggers appear to also be less perceptually loud than neutral sounds.

Sharpness, which was expected to be related to perceived unpleasantness, did not show very reliable results. We found small effect sizes for all stimuli and Set 1, with aversive stimuli as the sharpest, and misophonia and neutral equally least sharp. For Set 2, however, we found no effect. Figure 10 also showed that the individual stimuli were not clearly categorized in the three stimulus categories.


Spectrogram flatness showed an effect for Set 2, but not for Set 1 or all stimuli. We argue that the findings from Set 2 are due to the specific audio files chosen, and not due to the acoustic qualities of the sounds the stimuli ought to represent.

Event density showed more reliable results, with aversive stimuli as the densest and misophonia stimuli as the least dense, looking at all stimuli or Set 2. For Set 1 we also showed an effect of stimulus category, but no differences between misophonia and neutral stimuli. Looking at the individual stimuli, however, we do argue that misophonia stimuli are less dense than neutral stimuli. Event density was also correlated with roughness, sharpness, and loudness. This indicates that event density can also be seen as an unpleasantness measure. The low event density of the misophonia stimuli asks for further examination. It may be due to the type of neutral and misophonia stimuli we selected. The finding may also indicate that misophonia sounds contain fewer sound events, which makes it easier to focus on the specific trigger sound at hand. One would, however, then also expect a correlation with the ratings on the misophonia stimuli.

The results of the correlations of the acoustic features with the unpleasantness ratings at first seem quite conflicting. One explanation for the positive correlations of the ratings with the features that are generally thought to represent unpleasantness (loudness and roughness) is that our measurement was not reliable enough. The subjective rating scale is sufficient to find differences between stimulus categories, but perhaps not between individual stimuli. It may be easier, for someone with misophonia, to indicate on a scale that a misophonia sound is more unpleasant than a normal aversive sound, than it is to indicate whether one aversive sound is more unpleasant than another. The very low number of significant correlations also suggests that these results were not very consistent.
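As an illustration of the kind of per-stimulus correlation discussed here, the following is a minimal sketch that relates one acoustic feature to the mean unpleasantness rating of each stimulus. It assumes a Pearson coefficient, which may differ from the coefficient used in this study. The function name `pearson_r` and all values are hypothetical and are not data from this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for five stimuli within one category (illustration only)
loudness = [10.2, 14.8, 9.1, 17.5, 12.0]    # one feature value per stimulus
mean_rating = [3.1, 4.0, 2.7, 4.4, 3.9]     # mean unpleasantness per stimulus

r = pearson_r(loudness, mean_rating)
print(f"r = {r:.2f}")
```

With only a handful of stimuli per category, such a coefficient is unstable, which is consistent with the caution above about comparing individual stimuli rather than categories.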

Discussion

This study had three aims. The first was to contribute to the clinical picture of misophonia. The second was to analyse whether people with misophonia are sensitive to sounds overall, or whether they are sensitive only to particular sounds, as the misophonia synonym ‘selective sound sensitivity’ implies. Finally, the third aim was to study what the acoustic characteristics are of the sounds that trigger misophonia. We tested whether measures of unpleasantness (roughness, loudness, and sharpness) were also related to misophonia triggers (but perhaps to a lesser degree), and whether other timbral measures could distinguish aversive, neutral, and misophonia sounds from each other.


In Part 1 of this study, for the first aim, we created a new questionnaire to measure the severity of misophonia: the Miso-PPD. Based on this questionnaire, we found comorbidities of generalized anxiety disorder (GAD), obsessive compulsive disorder (OCD), and hyperacusis, which is partly in agreement with previous findings (Edelstein et al., 2013; Schröder, Vulink, & Denys, 2013). We also found that age correlated with misophonia severity, which corroborates the finding by Rouw and Erfanian (2017).

Regarding the second aim, in Part 2, we found that based on ratings, people with misophonia are sensitive only to their triggers and not to sounds in general. Specifically, the misophonia triggers were experienced as more unpleasant by people with misophonia. In the case of heightened sensitivity to all sounds, people with misophonia would have also experienced aversive and neutral sounds as more unpleasant than people without misophonia. Our subjective unpleasantness findings all show rating patterns similar to the findings by Kumar et al. (2017), where only the misophonia triggers were rated more negatively by the misophonia participants than by controls, instead of a stronger negative rating of all stimuli in general. The findings by Kumar et al. (2017) were also corroborated by physiological measurements (heart rate and galvanic skin response), which provides additional objective evidence that people with misophonia are triggered by specific sounds only, instead of all sounds in general.

However, regarding the third aim, to uncover the acoustic characteristics of those specific sounds, our findings were less straightforward. We hypothesised that the three measures of loudness, roughness, and sharpness would be related to the perceived unpleasantness of aversive sounds (Ellermeier et al., 2004; Fastl, 2006), and perhaps also, but to a lesser degree, to the unpleasantness of misophonia triggers. The rest of the features could possibly show some timbral characteristics of misophonia triggers.

Firstly, we found that aversive sounds differed from both misophonia and neutral sounds most consistently on the three features of roughness, loudness, and event density. Thus, loudness and roughness, but not sharpness, could reliably detect unpleasantness in aversive sounds. We did not expect event density to be related to the unpleasantness of aversive sounds, but given our consistent findings across stimulus sets, we argue that it can also be considered a measure of unpleasantness.

Interestingly, misophonia sounds were found to be perceptually less loud and to have fewer events than neutral sounds in some cases. We did not expect that the lack of certain features would be related to the unpleasantness of misophonia stimuli. This finding could be a result of the specific sound stimuli we chose in those sets, and show no relation with
