Description of Flute Tone Quality: Semantic Labelling of Low-Level Features

Aleksandra MICHALKO

Supervisor: dr. Ashley Burgoyne


Abstract

Many professional flute players pay close attention to the tone quality they produce. Previous studies on musical timbre helped to define several spectral and temporal acoustic features that best correlate with the perceptual representations of timbre; however, they focused predominantly on the comparison of isolated notes of different musical instruments. Studies regarding perceptual discrimination within the tone quality of a single musical instrument are rather few. This piece of research aims at matching low-level features of flute sound to higher-level (linguistic) descriptions of flute timbre. It explores 1) the correlations between descriptors used by listeners and the acoustic features of the sound, and 2) whether there are significant differences in the description of flute tone among individuals and groups with different expertise levels. The work presented comprises three experiments: Experiment 1, to extract and meaningfully group acoustic features of flute sounds; Experiment 2, to establish a set of vocabulary that refers to differences in sounds within a flute tone palette; and Experiment 3, to label various flute tones with verbal descriptors and correlate them with the previously extracted acoustic features. The findings suggest the existence of significant correlations between specific verbal descriptors and low-level features of the flute timbre.


Acknowledgements

The research included in this dissertation could not have been performed without the assistance, patience, and support of many individuals. I would like to extend my gratitude first and foremost to my thesis advisor, dr. Ashley Burgoyne, for mentoring me over the course of my graduate studies. Thank you for introducing me to Computational Musicology and statistical analysis, for your tutoring, and for your patience with all my questions.

I would additionally like to thank dr. Makiko Sadakata for the valuable advice and feedback.

Finally, I would like to extend my deepest gratitude to Manuel Gigena, and to my family and friends, who supported me and were always there for me.


Contents

1. Introduction

2. Experiment 1: Computational Analysis. Extraction of Low-Level Acoustic Features
   2.1. Introduction
   2.2. Methods and Materials
        2.2.1. Recording Sessions
        2.2.2. Participants
        2.2.3. Recording Procedure
        2.2.4. Input and Pre-processing
        2.2.5. Low-Level Feature Extraction
        2.2.6. Factor Analysis
   2.3. Results
   2.4. Discussion

3. Experiment 2: High-Level Features, Semantic Labelling
   3.1. Introduction
   3.2. Methods and Materials
        3.2.1. Participants
        3.2.2. Stimuli
        3.2.3. Procedure
   3.3. Results
   3.4. Discussion

4. Experiment 3
   4.1. Introduction
   4.2. Materials and Methods
        4.2.1. Online Survey
        4.2.2. Stimuli
        4.2.3. Music Background Questionnaire
        4.2.4. Participants
        4.2.5. Procedure
        4.2.6. Statistical Analysis
   4.3. Results
        4.3.1. Factors and Vocabulary
        4.3.2. Different factors described by the same words
   4.4. Discussion

5. Conclusion

Appendices
   A. Flute Material
   B. Informed Consent Form
   C. Results of Logistic Regressions Presented in Log Odds
   D. The General Table of the Frequency of Every Adjective


1. Introduction

The famous and respected piano teacher Heinrich Neuhaus wrote: "Music is a tonal art. (. . . ) Since music is a tonal art, the most important task, the primary duty of any performer is to work on tone" (Neuhaus, 1978). Indeed, many active musicians spend hours of their daily practice on sound exercises (Toff, 2012). During their musical education, as well as their professional career, flutists are trained to pay special attention to the tone quality they produce. Tone Development Through Interpretation for the Flute (1986) by M. Moyse, Melodies for Developing Tone and Interpretation (2008) by R. Winn and the Trevor Wye Practice Book for the Flute: Book 1, Tone (2013) by T. Wye are some of the many books concerned with the development of flute sound. But what does the development of a good tone quality mean? What defines a good tone quality?

Research on musical timbre discovered that there exist "fixed" acoustic features that help a listener distinguish the sound of the flute from that of other instruments, such as the oboe, trombone or violin. However, which acoustic elements distinguish the sound of one flute player from another? Which acoustic features can be controlled by a flutist and make it possible to create a specific quality of sound recognizable as belonging to a specific performer? Which acoustic features are universal for all flutists and which ones tend to vary among flute players? Those and many other questions have been partly addressed and answered by the studies of Coltman (1971, 2006) and Fletcher (1975). The flute is an aerophone, which means that the sound is produced by causing a jet of air to vibrate within a tube, without the use of membranes or strings (Randal, 1999). Contrary to a widespread belief among many flutists, the material of which the flute is made does not affect the harmonic content and acoustic features of a tone (Widholm et al., 2001; Coltman, 1971). Studies by Coltman (1971, 2006), Fletcher (1975) and Billington (2000) point to differences in the acoustic features of the sounds of various flutists, even though there is a standard combination of harmonics for musical instruments. The direction of the jet above or below the edge of the embouchure hole and the blowing pressure play a crucial role in determining the harmonic content and the dynamics of a flute sound (Coltman, 2006; Widholm et al., 2001). Moreover, the body of a player works as a resonator for the flute, which again implies a reliance of the acoustic content of the flute tone on the physiology of the player (Dickens et al., 2007; Billington, 2000; Wilcocks et al., 2006; Fletcher, 1975). Indeed, after years of practice and flute lessons, flutists are able to consciously control the speed of the air, the jet offset and their body in order to obtain various tone "colors" to a greater or lesser degree. The skill to manipulate the harmonic content of a sound is especially useful for creating special effects required by composers of contemporary music pieces (De Wetter-Smith, 1978). For instance, the comparative study of vibrato by Fletcher et al. (2001) suggests that vibrato slightly changes the harmonic components of the flute sound.

In general, musical timbre is defined as a multidimensional entity which serves as one of the primary perceptual vehicles for sound source recognition and contributes to the understanding of a sound's perceptual attributes, which can either vary continuously, e.g. brightness and roughness, or be discrete or categorical, e.g. the set of "fixed" features which distinguishes a clarinet sound from a flute sound (McAdams, 2013). Most studies in musical timbre research focused on the discrimination, categorization and identification of isolated notes of various musical instruments outside any musical context. However, little is known about the perceptual discrimination of nuanced timbre changes within a single musical instrument (Loureiro et al., 2004; McAdams, 2013; Alluri and Toiviainen, 2010).

This research is concerned with the cognition of flute tone quality by flute players and non flute players. It explores 1) the possible correlations between descriptors used by listeners and the acoustic features of the sound, and 2) whether there are significant differences in the description of flute tone among individuals and groups with different expertise levels. In order to answer those questions, empirical data will be combined with a computational analysis of sound features. This approach has been used by many researchers involved in studies on musical timbre and tone quality (Stepanek, 2004; Grey, 1977; Alluri and Toiviainen, 2010). It has proved to be the most effective and practical method for measuring, controlling and separating the signal's physical properties from what is constructed by cognition and cultural constraints within experimental stimuli (Aucouturier and Bigand, 2013; Alluri and Toiviainen, 2010). For instance, this approach helped to discover correlations between the perceptual sound descriptions of professional flutists and the harmonic spectrum of a flute sound (Yorita and Clements, 2015). Moreover, it showed significant differences in the preferences for flute tone quality between professional flutists and other musicians (Yorita and Clements, 2015).

This study consists of three experiments. In the first experiment (Chapter 2), all audio samples will be collected and a computational analysis of the audios' acoustic features will be performed. Sound features such as roll-off, entropy, high-energy/low-energy ratio, spectral centroid and Mel-frequency cepstral coefficients will be extracted, analyzed and grouped into meaningful clusters. The second experiment (Chapter 3) will help to establish a frame of reference, i.e. a set of vocabulary used by professional flutists that refers to differences in sounds within a flute tone palette. In the third experiment (Chapter 4), by means of an online survey, participants will be presented with a set of previously selected audio material from which perceptual data will be collected. The objective is to create a reference frame which will facilitate the task for participants not familiar with the description of flute tone quality by means of the Spontaneous Verbal Description (SVD) method (Stepanek, 2002). Furthermore, regression analyses will be performed to check for correlations between perceptual components and acoustic features of the flute sound. Finally, Chapter 5 summarizes the findings of this study and presents suggestions for further research.

2. Experiment 1: Computational Analysis. Extraction of Low-Level Acoustic Features

2.1. Introduction

Research on timbre combining empirical and computational approaches goes back to the 1970s. Its focus was predominantly on finding the mental representations of the stimuli, i.e. the meaningful timbre spaces and the categorization of the sounds of various instruments (Grey, 1977; McAdams et al., 1995; Alluri and Toiviainen, 2010).

Due to the multidimensional character of timbre, multidimensional scaling was one of the methods most used in those experiments. Multidimensional scaling algorithms map the subjective distance relationships into a geometric space (Grey, 1977; McAdams et al., 1995). The subjective distance relationship is calculated from previously gathered perceptual data, which usually takes the form of perceptual similarity judgments between the pairs in a set of stimuli (Grey, 1977; McAdams et al., 1995). A few studies used self-organizing maps (SOM) in order to create the "physical" timbre spaces that would define the correlations between the audio features and the perceptual timbre spaces (Alluri and Toiviainen, 2010; Cosi et al., 1994; Loureiro et al., 2004). Those previous investigations helped to define several spectral and temporal acoustic features that best correlate with the perceptual representations of timbre: spectral centroid (McAdams et al., 1995), which is interpreted as a measure of perceived "brightness" (Alluri and Toiviainen, 2010); spectral flux; log-attack time; spectral irregularity; Mel-Frequency Cepstral Coefficients (MFCC); zero-crossing rate; entropy; and spectral roll-off, to name a few. Although investigations of timbre mostly included studies on monophonic timbre (Alluri and Toiviainen, 2010; Grey, 1977; McAdams et al., 1995; Loureiro et al., 2004), they focused predominantly on the comparison of isolated notes of different musical instruments outside any musical context (Grey, 1977; McAdams et al., 1995). The studies regarding perceptual discrimination within the tone quality of a single musical instrument are rather few (although this is not an exhaustive list): violin (Stepanek, 2004; Lukasik, 2005); clarinet (Loureiro et al., 2004); organ pipe (Disley et al., 2006); saxophone (Nykänen and Johansson, 2003; Lukasik, 2005); and flute (Yorita and Clements, 2015).
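
To make the MDS step concrete, here is a minimal sketch in R; the dissimilarity matrix below is invented for illustration and stands in for averaged pairwise perceptual similarity judgments between four stimuli.

    # Toy dissimilarity matrix standing in for averaged pairwise
    # perceptual similarity judgments between four stimuli
    d <- as.dist(matrix(c(0, 2, 5, 6,
                          2, 0, 4, 5,
                          5, 4, 0, 1,
                          6, 5, 1, 0), nrow = 4))

    # Classical (metric) MDS: map the subjective distances into a 2-D space
    space <- cmdscale(d, k = 2)
    plot(space, xlab = "Dimension 1", ylab = "Dimension 2")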

In this experiment the physical source of the sound is known and fixed; hence, a set of low-level sound characteristics is needed. The choice of appropriate acoustic features is of crucial importance, and their selection will vary depending on the task and the modeling approach. In the following sections, I will discuss the steps of the computational analysis which was carried out in order to extract the acoustic features of flute sound and organize them into meaningful clusters. All those steps and processes were needed to create the online survey, which will be used in the final empirical experiment.

The general pipeline of the extraction and analysis of the audio features is shown in Figure 1.

Figure 1: Pipeline. Adapted from ‘Music Information Retrieval: Flute Sound Acoustic Features,’ by A. Michalko, 2019, p. 4.

2.2. Methods and Materials

2.2.1. Recording Sessions

Five recording sessions were scheduled between January and February 2019 in London, United Kingdom and Leuven, Belgium. Each recording session lasted around 20-30 minutes. The recordings were made in different environmental settings due to the different locations of the flute players (United Kingdom and Belgium). Although the researcher did not have complete control over the environmental setting, the data collected were more realistic, since in real-life situations listeners are exposed to a myriad of music recordings made in various acoustical circumstances (Yorita, 2014; Yorita and Clements, 2015).

All recordings were made in acoustically treated rooms with the same sound recording equipment, i.e. a ZOOM H1. During every recording session, the recorder was positioned horizontally at a 3-meter distance from the flute player. Every recording session was made in one take and saved in .WAV format.

2.2.2. Participants

Five flutists (4 female, mean age = 30) took part in the recording sessions. Two of them were Polish, two Spanish and one Italian. The instruction language was English. The flute players were professionally trained at either the Royal College of Music London in the United Kingdom or the LUCA School of Arts Campus Lemmensinstituut in Leuven, Belgium. They are all active musicians and had at least 10 years of musical training. The recordings were made on a Miyazawa 925 Silver flute, a Marteau Silver flute and a Sankyo CF701 Silver flute.

The participants did not receive any financial reimbursement, but they were offered a small snack when the recording session was over.

2.2.3. Recording Procedure

Each flutist was provided with the informed consent form and music sheet on the day of the recording. The music sheet consists of whole notes with a pitch range between C4 and A6.

This study is limited to the production of the flute sound in a "traditional classical western music performance". It excludes sonorities regularly used in contemporary music performances, such as flutter tonguing, pitch bending, glissando or multiphonics (Dickens et al., 2007), as well as traditional techniques from other cultures. However, it includes sound examples with vibrato (either strictly controlled by a player or not), as vibrato is a part of the flute's traditional classical western performance, even though it may alter acoustic features of the flute tone (Fletcher et al., 2001). The flute players were instructed to play at one intensity level, mezzo-forte (mf), to repeat each note with and without vibrato, and to play each note for 5 to 6 seconds. They could repeat the notes in case they were unsatisfied with the output. In order to feel comfortable and relaxed, they were reminded about the anonymity of the experiment and the possibility to stop the recording at any moment. The music sheet and the consent form are in Appendix A and B respectively.

2.2.4. Input and Pre-processing

The final products of the recording sessions were five audio files in .WAV format, which served as input for the further analysis. The recorded audio had to be pre-processed, since all the recording sessions were done in one take for the comfort and fluent progression of the sessions. The pre-processing was done in the Audacity digital audio editor. The recordings were cut into short audio files and sorted into separate files according to the pitch of the note. Studies on the relationship between pitch and timbre show that timbre differences between two sounds can substantially affect the pitch perception of the sound (Vurma et al., 2011; Sethares, 2005). For this reason, in order to exclude pitch interference in the choice of higher-level descriptors, the audio files were equalized for pitch.

However, because the original attack, decay and fade phases were preserved, the recordings vary in duration (ca. 4-8 seconds). The aim was to preserve the highest possible ecological validity of the sounds.

As a result, 460 audio tracks were created, each representing one note. 417 audio tracks (out of 460) were selected for computational analysis in the MATLAB environment.


2.2.5. Low-Level Feature Extraction

Audio feature extraction lies at the basis of a substantial proportion of audio processing, music information retrieval, audio effect design and audio synthesis tasks (Moffat et al., 2015). It underpins sound applications (e.g. SHAZAM) as well as programs for speech or music emotion recognition and machine learning tasks (Yang et al., 2008). Due to its robust and indispensable utilization, various software packages and programming languages were developed to optimize and facilitate audio feature extraction, e.g. MIRtoolbox (Lartillot, 2018), the Timbre Toolbox (Peeters et al., 2011), Meyda or YAAFE (Moffat et al., 2015). In this study, the audio features were extracted using MIRtoolbox, a MATLAB toolbox for audio processing. Thanks to its user-friendly syntax, it allows for easy offline extraction of high- and low-level audio features (Moffat et al., 2015). The low-level timbre features were extracted with the function mirfeatures. This function breaks audio down into short overlapping frames and allows a large set of features from different fields (e.g. timbre, dynamics) to be computed simultaneously. The features are returned in a structure array (Lartillot, 2018). In total, 890 features were extracted from 417 sound samples, inter alia standard deviations of Mel Frequency Cepstrum Coefficients (MFCC), spectral flux, roughness, zero-crossing rate and spectral centroid. These were exported to .txt files and further computed in the R environment.
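
As an illustration of the hand-over from MATLAB to R, a minimal sketch of the import step; the file name and the metadata column names below are assumptions for illustration, not the actual ones used in this study.

    library(tidyverse)

    # Hypothetical export: one row per audio sample, one column per extracted
    # feature, plus metadata columns (note name, octave, file name)
    features <- read_tsv("mir_features.txt")

    # Separate the metadata from the numeric feature columns
    meta   <- features %>% select(note, octave, file)
    values <- features %>% select(where(is.numeric))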

2.2.6. Factor Analysis

Feature selection is the process of selecting a subset of relevant, informative features for model construction. It may serve general data reduction (to increase an algorithm's speed and efficiency), reduction of the feature set (to make interpretation easier), or performance improvement (to improve predictive accuracy) (Jimenez-del Toro et al., 2017; Guyon et al., 2008).

In timbre studies which focus on correlations between semantic labelling and audio features, usually multidimensional scaling (Grey, 1977; McAdams et al., 1995; Stepanek, 2004), SOM (Whitman, 2003) or semantic decomposition was applied (Lukasik, 2005; Whitman, 2003). Some studies also used step-wise regression or sequential feature selection, which are mostly used in research on predicting listeners' mood and emotions for a particular audio sample through the use of audio features (Alluri and Toiviainen, 2010; Yang et al., 2008; Zacharakis et al., 2011). Those methods use all features as input and, through iterative processing, select the features that provide the best output.

This study uses exploratory factor analysis (EFA) with the minimal-residuals method in order to identify the latent relational structure among all extracted audio features and, as a consequence, to reduce them to fewer factors. 417 observations with 894 variables from the acoustic feature extraction were used. Each observation corresponds to a single sound and each variable refers to a single extracted audio feature (with the exception of 4 variables that record the name of the note, e.g. do; the octave, e.g. 5; the row name; and the name of the audio file, e.g. do201). All computational processes were done in R using the psych and tidyverse packages.

Before the actual factor analysis, the Very Simple Structure (VSS) criterion was applied. The VSS is an index of goodness of fit for factor solutions of increasing rank. It helps to determine the optimal number of factors by considering increasing levels of factor complexity (c, i.e. the number of factors on which an item loading may differ from zero, up to a pre-specified value) (Revelle and Rocklin, 1979). The VSS plot (Figure 2) suggests that 4 factors would be optimal for conducting the EFA.

Figure 2: Very Simple Structure: index of goodness of fit for factor solutions. Adapted from ‘Music Information Retrieval: Flute Sound Acoustic Features,’ by A. Michalko, 2019, p. 7.

As suggested by the VSS and parallel analysis (PA), the appropriate number of factors in the exploratory factor analysis is four. Nevertheless, as a check, the EFA was conducted for 3, 4 and 5 factors. As expected, the exploratory factor analysis with 4 factors yielded the best results. As the last step of the factor analysis, the scores were meaningfully ordered by the name of the note, the octave and the corresponding name of the audio file.
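
A minimal sketch of these two steps with the psych package, assuming `values` is the numeric feature matrix from the import sketch above:

    library(psych)

    # VSS: goodness-of-fit criterion for factor solutions of increasing rank
    vss(values, n = 8, fm = "minres")

    # EFA with minimal residuals and four factors, as suggested by the VSS
    efa <- fa(values, nfactors = 4, fm = "minres")

    # Inspect the strongest loadings per factor (cf. Table 1)
    print(efa$loadings, cutoff = 0.70, sort = TRUE)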


Table 1: The factor analysis of extracted audio features. Adapted from ‘Music Information Retrieval: Flute Sound Acoustic Features,’ by A. Michalko, 2019, p. 9.

Results of the factor analysis - top 13 of each factor.

Variable                                                 loading    h2     u2    com

Factor MR1:
tonal hcdf1 HarmonicChangeDetectionFunction Std             0.87   0.79   0.21   1.07
spectral mfcc2 Mel-Spectrum Std 32                          0.85   0.73   0.27   1.03
spectral mfcc2 Mel-Spectrum Std 39                          0.84   0.78   0.22   1.24
tonal mode1 Mode Std                                        0.84   0.74   0.26   1.09
spectral mfcc2 Mel-Spectrum PeriodEntropy 39                0.83   0.70   0.30   1.03
spectral mfcc2 Mel-Spectrum Std 35                          0.83   0.71   0.29   1.12
spectral mfcc2 Mel-Spectrum PeriodEntropy 37                0.83   0.70   0.30   1.06
spectral mfcc2 Mel-Spectrum Std 37                          0.83   0.76   0.24   1.25
spectral mfcc2 Mel-Spectrum PeriodEntropy 40                0.81   0.67   0.33   1.08
spectral mfcc2 Mel-Spectrum Std 38                          0.80   0.73   0.27   1.31
tonal mode2 Keystrength Std 4                               0.80   0.66   0.34   1.04
spectral mfcc2 Mel-Spectrum Std 31                          0.80   0.64   0.36   1.00

Factor MR2:
timbre spectralflux1 Spectralflux Mean                      0.85   0.77   0.23   1.15
fluctuation peak1 Fluctuation PeakPosMean                   0.80   0.72   0.28   1.26
spectral ddmfcc1 Delta-Delta-MFCC Std 4                     0.78   0.62   0.38   1.07
spectral ddmfcc1 Delta-Delta-MFCC Std 3                     0.78   0.60   0.40   1.00
spectral ddmfcc1 Delta-Delta-MFCC Std 6                     0.77   0.61   0.39   1.05
spectral ddmfcc1 Delta-Delta-MFCC PeriodAmp 3               0.77   0.61   0.39   1.08
spectral ddmfcc1 Delta-Delta-MFCC PeriodAmp 1               0.76   0.61   0.39   1.09
spectral ddmfcc1 Delta-Delta-MFCC Std 7                     0.75   0.58   0.42   1.08
timbre spectralflux1 Spectralflux Std                       0.75   0.69   0.31   1.56
spectral ddmfcc1 Delta-Delta-MFCC PeriodAmp 8               0.75   0.58   0.42   1.04
spectral ddmfcc1 Delta-Delta-MFCC Std 2                     0.75   0.57   0.43   1.11
spectral ddmfcc1 Delta-Delta-MFCC PeriodAmp 4               0.75   0.58   0.42   1.08
spectral ddmfcc1 Delta-Delta-MFCC PeriodAmp 11              0.75   0.56   0.44   1.01

Factor MR3:
spectral mfcc2 Mel-Spectrum Slope 10                        0.93   0.87   0.13   1.01
spectral mfcc2 Mel-Spectrum Slope 16                        0.92   0.86   0.14   1.01
spectral mfcc2 Mel-Spectrum Slope 15                        0.92   0.86   0.14   1.01
spectral mfcc2 Mel-Spectrum Slope 9                         0.92   0.86   0.14   1.01
spectral mfcc2 Mel-Spectrum Slope 25                        0.91   0.84   0.16   1.01
spectral mfcc2 Mel-Spectrum Slope 12                        0.90   0.82   0.18   1.01
spectral mfcc2 Mel-Spectrum Slope 39                        0.90   0.82   0.18   1.01
spectral mfcc2 Mel-Spectrum Slope 11                        0.90   0.81   0.19   1.01
spectral mfcc2 Mel-Spectrum Slope 40                        0.90   0.82   0.18   1.01
spectral mfcc2 Mel-Spectrum Slope 14                        0.90   0.82   0.18   1.02
spectral mfcc2 Mel-Spectrum Slope 38                        0.90   0.82   0.18   1.01
spectral mfcc2 Mel-Spectrum Slope 26                        0.90   0.80   0.20   1.00
spectral mfcc2 Mel-Spectrum Slope 8                         0.90   0.81   0.19   1.01
spectral mfcc2 Mel-Spectrum Slope 13                        0.89   0.81   0.19   1.01

Factor MR4:
tonal chromagram peak1 Chromagram PeakPosMean              -0.93   0.88   0.12   1.01
tonal chromagram centroid1 CentroidOfChromagram Mean       -0.93   0.88   0.12   1.02
spectral dmfcc2 MFCC Mean 1                                 0.89   0.81   0.19   1.07
spectral mfcc1 MFCC Mean 1                                  0.89   0.81   0.19   1.07
spectral mfcc2 Mel-Spectrum Mean 4                          0.86   0.78   0.22   1.04
spectral irregularity2 Spectrum PeakPosMean                -0.83   0.70   0.30   1.03
spectral dmfcc1 Delta-MFCC Slope 1                         -0.83   0.68   0.32   1.01
spectral ddmfcc2 Delta-MFCC Slope 1                        -0.83   0.68   0.32   1.01
spectral roughness2 Spectrum PeakPosMean                   -0.83   0.69   0.31   1.03
spectral mfcc2 Mel-Spectrum Mean 2                          0.82   0.70   0.30   1.09
timbre zerocross1 Zero-crossingrate Mean                   -0.79   0.62   0.38   1.02
spectral mfcc2 Mel-Spectrum Mean 3                          0.78   0.62   0.38   1.03
spectral mfcc2 Mel-Spectrum Mean 1                          0.77   0.64   0.36   1.15


Table 2: The factor loadings of the factor analysis: SS loadings and inter-factor correlations

              MR1     MR2     MR3     MR4
SS loadings   55.30   53.79   44.11   44.97
MR1            1.00   -0.06   -0.01    0.01
MR2           -0.06    1.00   -0.01   -0.04
MR4            0.01    0.04    0.05    1.00
MR3           -0.01   -0.01    1.00    0.05

2.3. Results

The factor analysis reduced and grouped the acoustic audio features into four factors. Table 1 shows the top 13 observations for each factor. The audio features grouped within the first factor come from the spectral and tonal fields: standard deviations of MFCC2 spectrum 31, 32, 35, 38 and 39; the standard deviation of mode1; the period entropy of MFCC2 spectrum 37, 39 and 40; and the standard deviation of mode2 keystrength. Mel Frequency Cepstral Coefficients (MFCC) are a set of features which model the spectral shape of a sound. They approximate a psychoacoustic representation of the spectral content, as the MFCC frequency bands are equally spaced on the mel scale (Lartillot, 2018). For these reasons they can be considered perceptually motivated (Weihs et al., 2016). Factor 1 entails predominantly MFCC standard deviations and period entropy, which refer to the degree of disorder or uncertainty in the given features. Thus, it could be described as a measure of the perceived openness or dispersion of the flute sound: with more of Factor 1, a sound would be perceived as unfocused, open and airy, while with less of Factor 1 a sound would qualify as closed, focused, small or edgy. For those reasons, in the further sections I will refer to Factor 1 as the Dispersion factor.
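
The kind of quantity that loads on this factor can be sketched in a few lines of R. The tuneR package's melfcc function computes MFCCs; the file name below is hypothetical.

    library(tuneR)

    # MFCCs of one recorded note: melfcc() returns a frames x coefficients matrix
    wave  <- readWave("do201.wav")             # hypothetical file name
    mfccs <- melfcc(wave, sr = wave@samp.rate)

    # Per-coefficient standard deviations over time, the kind of
    # "MFCC Std" feature that loads highly on Factor 1
    apply(mfccs, 2, sd)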

The second factor consists of features related to the timbre and spectral fields: the mean and standard deviation of spectral flux 1; the PeakPosMean of peak1 fluctuation; standard deviations of delta-delta MFCC 2, 3, 4 and 6; and the period amplitude of delta-delta MFCC 1, 3, 4, 8 and 11. This suggests a correspondence with fluctuation and the speed of vibrato, since spectral flux measures the amount of spectral change between consecutive signal frames. In the MIRtoolbox 1.7.1 user's manual it is described as the distance between the spectra of successive frames (Lartillot, 2018). Moreover, deltas and delta-deltas, also known as differential and acceleration coefficients, give information about the trajectories of the MFCC coefficients over time. The presence of the PeakPosMean of peak1 fluctuation further supports this conjecture. In the next sections, I will refer to this factor as the Vibrato factor.
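
To make the definition concrete, a minimal base-R sketch of spectral flux as the distance between successive frame spectra (the frame and hop sizes are illustrative, not the values MIRtoolbox uses):

    # `x` is a numeric vector of audio samples
    spectral_flux <- function(x, frame = 1024, hop = 512) {
      starts <- seq(1, length(x) - frame, by = hop)
      # Magnitude spectrum of each frame (columns = frames)
      mags <- sapply(starts, function(s) Mod(fft(x[s:(s + frame - 1)]))[1:(frame / 2)])
      # Euclidean distance between each pair of successive frame spectra
      sqrt(colSums((mags[, -1] - mags[, -ncol(mags)])^2))
    }

    # Delta coefficients: frame-to-frame differences of an MFCC matrix
    # (rows = frames); applying diff() twice gives the delta-deltas
    delta <- function(m) diff(m)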

The acoustic features of the third factor come from the spectral field and refer predominantly to the slopes of MFCC2 spectrum 8, 9, 10, 11, 12, 13, 14, 15, 16, 25, 26, 38, 39 and 40. A slope suggests an increasing change in one direction, which in the case of the MFCC spectra might refer to changes in the energy distribution of the MFCCs across time. For this reason, Factor 3 might describe the perceived unsteadiness, wobbliness, irregularity or instability of the flute sound. In the next sections, I will refer to this factor as the Unsteadiness factor.

The fourth and last factor consists of features from the spectral, tonal and timbre fields: the PeakPosMean of chromagram peak1; the means of chromagram centroid1, DMFCC2 1, MFCC1 1 and MFCC2 mel-spectrum 4; the PeakPosMeans of the irregularity2 spectrum and the roughness2 spectrum; and the slopes of DDMFCC1 delta-MFCC 1 and DDMFCC2 delta-MFCC 1. The spectral centroid represents the centre of gravity of the rate-map (Peeters et al., 2011). The zero-crossing rate is a measure of the frequency content of a signal: it counts the number of times the signal crosses the X-axis and is extracted directly from the time-domain representation. Roughness estimates the sensory dissonance related to the beating phenomenon that arises whenever a pair of sinusoids is close in frequency (Lartillot, 2018). This factor is the only one that also has negative loadings. For instance, it correlates negatively with acoustic features which measure the variability of fluctuations, i.e. spectral dmfcc1 Delta-MFCC Slope 1 and spectral ddmfcc2 Delta-MFCC Slope 1. Factor 4 corresponds to the distribution of energy across the lower and upper frequencies of the sounds. Grey (1977) interpreted it as spectral energy distribution, which in the literature has also been called brightness (Ferrer and Eerola, 2011). Normally, the scale goes from lower to upper frequencies (denoting an upward movement). However, in this case, an increase on the scale denotes an increase of the lower frequencies; in other words, the scale goes from upper to lower frequencies. For this reason, in the following sections, I will refer to this factor as Darkness instead of Brightness.
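
The two textbook definitions above translate directly into R. A minimal sketch (generic implementations, not MIRtoolbox's exact routines), assuming `x` is a numeric vector of audio samples and `sr` the sampling rate in Hz:

    # Zero-crossing rate: proportion of sign changes in the time domain
    zero_crossing_rate <- function(x) sum(abs(diff(sign(x)))) / (2 * length(x))

    # Spectral centroid: centre of gravity of the magnitude spectrum
    spectral_centroid <- function(x, sr) {
      mag   <- Mod(fft(x))[1:(length(x) / 2)]            # magnitude spectrum
      freqs <- seq(0, sr / 2, length.out = length(mag))  # bin frequencies
      sum(freqs * mag) / sum(mag)                        # weighted mean
    }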

The ordered data from the factor analysis are presented in Figure 3. Only Factor 4, the Darkness factor, shows a clear pattern which corresponds to the pitch height of the tones. The tendencies of the other factors may be less transparent, as they do not describe the distribution of spectral energy. For instance, Factor 1, the Dispersion factor, represents the degree of dispersion of the sound (Alluri and Toiviainen, 2010), and the features in Factor 2, the Vibrato factor, refer to the fluctuation, and thus the speed and form, of the vibrato, which are spread across the notes. The third factor, the Unsteadiness factor, refers to the increasing changes in the sounds' MFCC spectra. In other words, these factors are independent of pitch height and their patterns will not be explicitly visible in this graph.

Figure 3: Audios grouped according to factor analysis results. Adapted from ‘Music Information Retrieval: Flute Sound Acoustic Features,’ by A. Michalko, 2019, p. 11.


2.4. Discussion

This experiment helped to identify and meaningfully group the acoustic features which are crucial for differentiating the flute sounds in this set of audio samples. It determined four factors, which correspond to 1) the degree of dispersion of the flute sound, 2) the fluctuation and speed of the vibrato, 3) increasing changes in the sounds' MFCC spectra and 4) the distribution of energy across the lower and upper frequencies. The output of the pipeline, i.e. the audio recordings sorted according to the factor analysis, is the final product of this experiment and will be used in the following experiments.

This experiment, however, is not without limitations. First, MIRtoolbox is widely known and used among researchers for its good performance reports (Moffat et al., 2015). Nevertheless, most of these reports were made by the developers of the software itself (Kumar et al., 2015). A recent study by Kazazis et al. (2017) suggests that MIRtoolbox does not perform equally well on different sound sets. Moreover, the studies of Kumar et al. (2015) show a higher accuracy for string instruments than for brass instruments. How this relates to wind instruments, and the flute in particular, still needs to be determined as, to my knowledge, there are no studies on this topic yet.

Second, since the experiment used natural flute sounds without modifying their acoustic features (with the exception of pitch equalization), each sound consists of features grouped in all four factors. Although the collected flute sounds vary substantially in the degree and mixture of those factors, it is impossible to exclude or suppress specific factors without computational manipulation. Indeed, a trade of internal validity, i.e. the presence of confounding in an experiment (Sturm and Wiggins, in press), for ecological validity had to be made. Because the main research question investigates the cognitive response of human beings to natural flute tone quality, I decided not to manipulate the sounds, even though the study might encounter possible confounds and, in consequence, might see its internal validity questioned.

Finally, the results of the factor analysis are based solely on correlations among variables, which can produce factors whose interpretation is challenging. Another disadvantage is that the interpretation of the results is subjective and highly dependent on the researcher. In order to diminish the subjectivity of a single investigator, an examination panel consisting of 3 professional flutists was created to verify the conjectures of the researcher and to establish the list of semantic descriptors based on the collected audio files. The following section discusses the choice of the semantic descriptors for the cognitive experiment, which is based on the results obtained from this factor analysis.

3. Experiment 2: High-Level Features, Semantic Labelling

3.1. Introduction

The vocabulary used to describe timbre has previously been studied in a number of different contexts. There are works that analyze the semantic space derived from vocabulary describing timbre in general (von Bismarck, 1974; Kendall and Carterette, 1993; Disley et al., 2006), as well as vocabularies that refer to one particular sound source, for instance the violin (Stepanek, 2002; Lukasik, 2005), flute (Yorita and Clements, 2015), saxophone (Nykänen and Johansson, 2003) and organ (Disley et al., 2006). There are only a few works interested in exploring the vocabulary used by non-specialists (Sarkar et al., 2007; Ferrer, 2012).

The descriptors most frequently used in timbre research are taken from sensory domains other than the auditory one (Zacharakis et al., 2011; Zbikowski, 2002). For instance, brightness and color come from the visual domain, and soft and hard from the somatosensory domain.

Previous studies that looked for correlations between semantic labels and acoustic features used either rating methods or a verbal description of the timbre stimuli (Alluri and Toiviainen, 2010). The commonly used rating methods are bipolar scales and Verbal Attribute Magnitude Estimation (VAME). VAME uses scales which quantify the applicability of each adjective, e.g. dark vs. not dark, whereas bipolar scales place antonyms at the extremes of a scale, e.g. soft vs. hard (Kendall and Carterette, 1993; Alluri and Toiviainen, 2010). However, some researchers opt for Spontaneous Verbal Description of Sounds (SVD) in order to establish a list of descriptors without limiting respondents to a finite set of words (Stepanek, 2004). In this experiment the SVD will be used in order to create a set of words that refers to the description of flute tone quality. It was chosen for several reasons: 1) there is no extensive taxonomy or database of descriptors for flute tone quality, as opposed to the violin (Lukasik, 2005); 2) previous studies on musical timbre focused mostly on general descriptors of sounds, whose sets of adjectives might therefore not be sufficiently nuanced to differentiate between subtle changes in the tone palette of a single instrument (Loureiro et al., 2004); 3) the semantic labels used in studies which focus on individual instruments vary depending on the sound source and cannot be directly applied to this experiment. To my knowledge, there is only one study, by Yorita and Clements (2015), that investigates descriptors of flute tone quality. Nevertheless, since it explores only the vocabulary regarding changes in the harmonic spectrum, its terminology list might be incomplete.

The objective of this experiment is to establish a terminology which refers to the flute timbre and will be used in the following experiment, and which will also help 1) to reduce the cognitive load for all participants (Honing and Ladinig, 2008) and 2) participants who are not familiar with tasks involving the description of a flute sound to complete the task (Yorita and Clements, 2015).

The following section describes the Methods and Materials used in this experiment and is followed by the Results and Discussion sections.

3.2. Methods and Materials

3.2.1. Participants

Three professional female flutists took part in this experiment. Two of them also participated in the previous experiment. The mean age was 28, and each of them has at least 13 years of playing experience. Two of the participants are Polish and one Italian. Each has a high proficiency in English and received a higher musical education in English. As a result, they are familiar with the terms which are commonly used to describe flute sounds in English. Once again, the participants did not receive any financial reimbursement.

3.2.2. Stimuli

46 single flute notes were chosen from the data collected in the previous experiment. Three criteria were applied: 1) the notes had either a high positive or a high negative value in a single factor; 2) they had relatively low values in the other factors; 3) the notes with high positive and negative values in a single factor had the same pitch height.

An example of this selection is shown in Table 3. Fa322 and fa307 are of the same pitch height, F6. Fa322 has a high positive value and fa307 a high negative value in Factor 1 (Dispersion), but both have relatively low values in the other factors. A similar situation is shown by sol205 and sol217, with the exception that they correspond to the pitch height of G5 and exemplify high positive and negative values of Factor 3 (Unsteadiness). A sketch of how this selection could be expressed in code is given after Table 3.

In order to facilitate the task and create a frame of reference for the participants, the 46 single flute sounds were grouped into pairs according to the above criteria, resulting in 23 pairs.

Table 3: Factor values, selected examples

note      factor 1: Dispersion   factor 2: Vibrato   factor 3: Unsteadiness   factor 4: Fullness
fa322      2.41013630             0.52657152          0.247050464             -0.98605694
fa307     -2.22057268            -0.06444022         -0.09246145              -1.0724797
sol205     0.22777610            -0.39590242          3.052248990              0.60812438
sol217     0.80185115             1.5824068          -1.12028614              -0.39956445
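
A minimal dplyr sketch of such a selection; the data frame, column names and thresholds are hypothetical, chosen only to illustrate the three criteria:

    library(tidyverse)

    # `scores` holds one row per note: its pitch and four factor scores
    high_dispersion <- scores %>%
      filter(abs(dispersion) > 2,         # criterion 1: extreme in one factor
             abs(vibrato) < 0.6,          # criterion 2: low in the others
             abs(unsteadiness) < 0.6)

    # Criterion 3 then pairs a high-positive with a high-negative note of
    # the same pitch height (e.g. fa322 and fa307, both F6)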

3.2.3. Procedure

The experiment was carried out online. After reading and signing the consent form, each participant was provided with a two-part questionnaire made in an Excel sheet. English was the instruction language. In the first part, respondents answered demographic questions such as age, sex, years of musical experience, the highest obtained musical education and English language proficiency. The second part of the questionnaire consisted of the names of the audio samples, e.g. Fa322, Fa307, with blank spaces next to them. The blank spaces were used to write down the adjectives and expressions for the corresponding audio sample. The audio files were shared with all respondents via Google Drive. The participants could listen to the samples as many times as they wanted, and could also write down an unlimited number of adjectives. Respondents did not have access to each other's answers.

3.3. Results

38 adjectives were used for the description of the 23 flute sound pairs, as shown in Table 4. The words are ordered according to their frequency, from highest to lowest. The most frequent words are with vibrato, without vibrato, centered and airy; each of them was used more than 20 times. The least frequent words are clear, covered, flat, inactive, piercing, relaxed, unsteady and vivid; they were each used 3 times.

The words most frequently used for sounds related to the Dispersion factor are with vibrato, centered, nasal, without vibrato, airy and active. The words predominantly used for the factor's high positive values are airy and open, and for its high negative values centered, nasal and focused.

The audio samples which corresponded to high positive and negative values of the Vibrato Factor were mostly described by with vibrato and without vibrato respectively.

The Unsteadiness factor was mostly described as with vibrato, airy, without vibrato, open, bright, with energy and sharp. However, it is not clear which words align best with the positive and negative values of this factor.

The audio samples which corresponded to high positive and negative values of the Darkness factor were mostly described using without vibrato, sharp, steady, with vibrato, airy, with predominance of high overtones and with predominance of low overtones. With predominance of lower overtones and with predominance of higher overtones corresponded to the high positive and negative values respectively.


Table 4: List of adjectives describing flute tone quality

adjective                     total
 1. with vibrato                 41
 2. without vibrato              40
 3. centered                     22
 4. airy                         21
 5. active                       14
 6. full                         14
 7. nasal                        14
 8. open                         12
 9. sharp                        12
10. focused                      11
11. dark                         10
12. pred low overtones            9
13. steady                        9
14. edgy                          8
15. bright                        7
16. rich                          7
17. with energy                   7
18. metallic                      6
19. small                         6
20. squeezed                      6
21. clean                         5
22. intense vibrato               5
23. pred high overtones           5
24. slow vibrato                  5
25. unfocused                     5
26. closed                        4
27. pushed                        4
28. wide vibrato                  4
29. without energy                4
30. wobbly                        4
31. clear                         3
32. covered                       3
33. flat                          3
34. inactive                      3
35. piercing                      3
36. relaxed                       3
37. unsteady                      3
38. vivid                         3

3.4. Discussion

The objective of this experiment was to provide a set of words that describes the flute tone palette based on natural flute sounds, as well as to create a reference frame to facilitate the task for participants who are not familiar with the description of flute tone quality. As the final product of this experiment, a set of 38 descriptors was created.

Additionally, with the exception of the Unsteadiness factor, the results of this experiment line up with the names suggested for the acoustic-feature factors by the researcher in the previous experiment. The Dispersion factor is described with the words airy and open when the degree of dispersion is highly positive, and centered, nasal and focused when it is negative. Similarly, the Vibrato factor is referred to as with vibrato when it has high fluctuation and as without vibrato when it has very low fluctuation. The Darkness factor, i.e. the spectral energy distribution factor, coincides with the description of its high positive and negative values as with predominance of lower overtones and with predominance of higher overtones respectively. Only the Unsteadiness factor did not yield a clear alignment.

In recent studies on timbre and semantic labelling, researchers point out the challenge of discussing tone qualities with different groups of people (Porcello, 2004). They criticize previous studies for focusing mostly on the terminology used by people closely involved with the particular sound source and ignoring the vocabulary of other people (Ferrer, 2012). As a solution, they propose various online approaches which could help in discovering the studied vocabulary, such as the analysis of tags from last.fm (Ferrer and Eerola, 2011), the Freesound database (Sarkar et al., 2007) or specially designed game applications (Turnbull et al., 2007). Nevertheless, even though this approach might have greater ecological and external validity than the traditional methods used in laboratory settings, to my knowledge it has so far focused mostly on collecting general terminology which refers to timbre, emotions and genre categorization, and not on the nuanced differences within the tone quality of a single instrument (Ferrer, 2012; Turnbull et al., 2007). Therefore, this approach is not applicable to this piece of research. Moreover, Sarkar et al. (2007) suggest that some of the strong correlations between certain words and timbre depend on a respondent's musical and cultural background. Indeed, the external validity of this experiment could be put into question: the correlations found here might be specific to those 3 participants and need not represent universal correlations between words and timbres used by all professional flutists (Sturm and Wiggins, in press). Furthermore, only four of the ten most frequent descriptors used by flutists in the study of Yorita (2014) coincide with the top ten of this study, i.e. airy, full, open and focused. On the other hand, the focus of the Yorita (2014) study was exclusively on the harmonic spectrum of the flute sound, and this might be the reason why only four of the top ten descriptors from both studies overlap. Interestingly enough, the results of this experiment suggest that the adjectives airy, full, open and focused refer to the measure of perceived openness, i.e. the dispersion of the flute sound. Nevertheless, whether they refer to the same acoustic features of the audio samples used in both studies still needs to be determined.

Certainly these adjectives may have slightly different meanings for different listeners depending on their listening biographies (Margulis et al., 2009), their cultural background and language variation. Indeed, a degree of ambiguity is inevitable in any study where language semantics are involved.

The objective of this experiment was to create a list of adjectives which flutists use when communicating about flute timbre. In the following experiment, this set of descriptors is used as a frame of reference for all participants in the online survey.

4. Experiment 3

4.1. Introduction

Many people engaging with music and musical activities often use language in order to communicate various musical concepts. The relationship between language and music has been extensively studied from linguistic, musicological, psychological, neurological and biological perspectives (Patel, 2010; Bernstein, 1976; Grey, 1977; Berkowitz, 2016; McAdams et al., 1995; Zbikowski, 2002; Roncaglia-Denissen et al., 2016; Ayotte et al., 2002). An extensive body of research suggests that the way we speak about an entity does not only reflect the metalingual function of language (i.e. the establishment of mutual agreement on the meaning of a particular word between interlocutors), but also the way we perceive this entity and act upon it (Lakoff, 1993; Jakobson, 1960). Hence, the words which we use to describe a musical timbre might reflect our conceptual understanding of it. In the literature, musical timbre is defined as a multidimensional entity which serves as one of the primary perceptual vehicles for sound source recognition and contributes to the understanding of a sound's perceptual attributes, which can either vary continuously, e.g. brightness and roughness, or be discrete or categorical, e.g. the set of "fixed" features which distinguishes a clarinet sound from a flute sound (McAdams, 2013). Also, the results of Barthet et al. (2010) suggest that changes in timbre, together with changes in dynamics and tempo, account for the expressiveness of music performances. Indeed, many studies aimed to explore the perceptual and cognitive properties of musical timbre by examining linguistic choices (Grey, 1977; McAdams et al., 1995). Moreover, the work of Zacharakis et al. (2015) suggests that verbal descriptions of musical timbre can convey a considerable amount of perceptual information. The studies and methods regarding the perception and cognition of musical timbre that combine language semantics and acoustic features have been partly addressed in the previous chapters, i.e. Chapters 2 and 3. As previously noted, little is known about the semantic associations with a flute sound's acoustic characteristics since, to my knowledge, no previous studies focusing on the perception of nuanced changes in flute tone quality have been found. For these reasons, the current experiment will explore the correlations between the verbal description of flute sounds and their acoustic features. It will address the following questions: do changes in a flute sound's acoustic features influence the choice of vocabulary which we use to describe it? If yes, is there a systematic underlying pattern? Does the level of expertise affect the choice of words we use to describe a particular flute sound? Since the source of the sound is well known, i.e. the flute, the study investigates possible underlying correlations between verbal descriptors and a set of the sound's perceptual attributes (McAdams, 2013). The current experiment builds on the results obtained in the previous experiments, using the audio tracks and results of the computational analysis from Experiment 1 and the list of 38 descriptors from Experiment 2. It consists of two stages: an online survey with a listening test and a statistical analysis.

The online survey was made in order to reach a broader and demographically varied audience. Besides its numerous advantages (addressed in the following sections in more detail), the online survey design also brought some limitations. For instance, in order to minimize the dropout of potential respondents, the actual experiment should not last more than 15 minutes (Honing and Ladinig, 2008). As a consequence, neither multidimensional scaling (MDS) nor dissimilarity rating techniques could be used, as they require more time and involve a larger cognitive load to complete the task (Yorita, 2014). Similarly, neither a bipolar scale nor the Verbal Attribute Magnitude Estimation (VAME) technique was used; instead, participants were asked to choose from the previously established list of descriptors at least one word and a maximum of 4 words to describe the sound which they heard, without giving it a rating. The second stage of the study, the statistical analysis, investigated the possible underlying patterns and correlations between a flute sound's acoustic features and the data collected from the online survey. Section 4.2 describes the materials and methods used in the experiment, and section 4.3 presents the results obtained from the collected data and the statistical analysis. Finally, section 4.4 discusses the observed findings and proposes directions for further research.

4.2. Materials and Methods

4.2.1. Online Survey

In order to reach a larger and more demographically diverse number of participants, the online survey was launched using Qualtrics software. Other advantages of using an online survey in music cognition studies are the minimization of performance according to social desirability, the elimination of biases and Pygmalion effects, and its substantial external validity (Honing and Ladinig, 2008). Besides the disadvantage mentioned in the Introduction (see 4.1), another limitation of the use of online surveys for this piece of research could be the researcher's lack of control over the reproduction setup. Bad quality headphones or speakers might significantly affect the responses of the participants. In order to minimize responses affected by a low-quality reproduction setup, the following preventive steps were taken: conversion of the audio samples to MPEG-4, to guarantee minimum loading time and optimal sound quality for online survey experiments (Honing and Ladinig, 2008); a question concerning the reproduction setup; and written encouragement to use headphones. On the other hand, Honing and Ladinig (2008) argue that letting participants complete the experiment in an environment in which they normally listen to music, e.g. with environmental noise or low-quality headphones, might positively influence the ecological validity of the results.

4.2.2. Stimuli

The selection of a small set of samples grouped into pairs was done in order to minimize degradation of feedback quality (Yorita, 2014; Yorita and Clements, 2015) and to prevent the participants from dropping out, i.e. to ensure a high completion rate for the online survey (Honing and Ladinig, 2008). The stimulus pairs were chosen by applying the same criteria as in Experiment 2 (see 3.2.2 Stimuli): 1) the notes had either a high positive or a high negative value in a single factor; 2) they had relatively low values in the other factors; 3) the notes with high positive and negative values in a single factor had the same pitch height. Additionally, the stimuli were equalized for pitch height.

As a consequence, 19 pairs of flute sounds were selected. Each pair was combined with two descriptor lists (one list per sound) from Experiment 2 (see Table 4) and a question about the perceived difference between the two sounds, e.g. How much of a difference do you hear between these two sounds? Respondents had to choose at least one and at most four descriptors per sound. One of the criticisms regarding the use of a predefined vocabulary argues that the list of given descriptors does not always correspond to the words that participants would choose (Zacharakis et al., 2014). For this reason, at the end of the actual test, participants could propose additional descriptors which in their opinion were missing from the given list.

4.2.3. Music Background Questionnaire

Respondents were provided with 28 questions concerning their musical background and familiarity with the flute sound. Instead of a bipolar division between musicians and non-musicians, the continuous-scale design of the Goldsmiths Musical Sophistication Index (Gold-MSI) was used to assess participants' level of musical expertise (Müllensiefen et al., 2014). The Gold-MSI is a self-report inventory and test battery which measures overt and covert engagement with music and the musical skills of individuals (Müllensiefen et al., 2014). The Gold-MSI questionnaire is independent of musical preferences for specific styles of music; it is multi-faceted and includes different aspects of musical sophistication, e.g. it differentiates between active musical engagement, self-reported perceptual abilities, self-reported singing abilities and self-reported emotional engagement with music (Müllensiefen et al., 2014). The maximum obtainable score was 182. The mean for all respondents was 110; for flutists it was 121 and for non-flutists 107.

4.2.4. Participants

There were no restrictions on the age, nationality or gender of the participants. The only requirement was a sufficient knowledge of English, as it was the language used for the instructions. 55 participants (mean age 32; 30 female) took part in the survey. 22 reported being at university, 16 in full-time employment, 7 self-employed, 3 retired, 4 in part-time employment, 2 at school and 1 a homemaker.

One of the objectives of this piece of research is to investigate whether there is a difference in tone quality perception between flutists and non-flutists. Based on the Gold-MSI responses, participants were divided into two groups: flutists (11) and non-flutists (44). There is no significant difference in musical sophistication between flutists and non-flutists. However, the exposure to flute sound and music varies substantially between the two groups. The studies of Margulis et al. (2009) suggest that the human auditory system and the mechanism of perceptual-neural plasticity are sensitive to long-term exposure and listening biography, and elicit different neurological responses to stimuli depending on the instrument of expertise (flutists vs. violinists). In both cases, one could argue that long-term exposure to a particular kind of music and musical experience plays a crucial role, influencing neural tuning and the mechanism of perceptual-neural plasticity (Margulis et al., 2009).


For those reasons, an additional computational analysis was carried out in order to check for the effect of (non)familiarity with the flute sound and flute repertoire. 21 non-flutists reported being familiar with the flute repertoire and 23 did not. Nevertheless, since the analysis did not yield many significant differences, this path was not pursued further. However, it is also possible that more participants are needed in order to see significant effects of non-flutists' familiarity with the flute sound.

4.2.5. Procedure

The test consisted of four parts: an informed consent form, which had to be read and signed by the participant prior to the experiment; three tryout examples, giving participants time to adjust the audio volume and become familiar with the form of the experiment; the actual test; and the music background questionnaire, with 3 questions concerning demographic information about the respondents. The language of instruction was English. The actual test had 20 questions. 19 questions each consisted of one pair of flute sounds, two descriptor lists (one per sound) and a question about the perceived difference between the two sounds. Participants could choose a minimum of 1 and a maximum of 4 descriptors per sound. The question about the perceived difference had 3 possible answers: no difference, small difference and large difference. The 20th question gave all participants the opportunity to add their own adjectives, if they wished to do so. All participants were strongly encouraged to use headphones, and they could listen to the sound examples as many times as they wanted. The experiment lasted approximately 30 minutes, but there was no time limit for the survey's completion.

4.2.6. Statistical Analysis

Once all data were collected, they were analysed in the R environment using the packages lme4, car, tidyverse, jtools, emmeans and xtable. A multinomial logistic regression was conducted for each descriptor, thus 38 times. Multinomial logistic regression is a form of multiple regression with a categorical outcome variable and predictor variables that may be continuous or categorical (Field et al., 2012). Subsequently, for each model an Anova with Wald Chi-squared tests was conducted. Finally, the most significant findings were validated using the emmeans package and its emtrends function, which estimates and compares the slopes of fitted lines given a fitted model (Lenth, 2019).
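
To make the pipeline concrete, a minimal sketch for a single descriptor is given below. The data frame and column names (responses, airy, Dispersion, Group) are assumptions for illustration, and the per-descriptor model is sketched here as a binomial (used/not used) logistic regression; this is not the author's original script.

library(car)       # Anova() with Wald chi-squared tests
library(emmeans)   # emtrends() for slopes of fitted lines

# responses: one row per participant x sound, with a 0/1 column per
# descriptor, the mean-centred factor score and the participant group
model <- glm(airy ~ Dispersion * Group,
             data = responses, family = binomial)

# Anova with a Wald chi-squared test for each term
Anova(model, test.statistic = "Wald")

# Estimated marginal means of linear trends: the slope of Dispersion
# per group, followed by a pairwise comparison of the two slopes
trends <- emtrends(model, ~ Group, var = "Dispersion")
summary(trends)
pairs(trends)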

Additional logistic regressions were also conducted in order to control for random effects. Overall, either there was no interaction between the variables or the interaction was very small and insignificant. For this reason, the output of the logistic regressions with random effects was not explored further.
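
A control of this kind might look as follows (a sketch assuming a random intercept per participant; the column name Participant is hypothetical):

library(lme4)

# Same fixed effects as above, plus a random intercept per participant
model_re <- glmer(airy ~ Dispersion * Group + (1 | Participant),
                  data = responses, family = binomial)
summary(model_re)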

4.3. Results

The results are reported in the form of odds ratios and their confidence intervals (Table 8). Continuous variables (the Dispersion, Unsteadiness, Vibrato and Darkness factors) are mean-centred. The results are presented and reported as suggested in Discovering Statistics Using R (Field et al., 2012).
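
As a point of reference (a worked example added here, not a result from the study): an odds ratio is the exponentiated regression coefficient, OR = exp(b). A coefficient of b = 0.63, for instance, yields OR = exp(0.63) ≈ 1.88, i.e. the odds of the descriptor being chosen increase by a factor of roughly 1.9 per unit increase in the mean-centred factor.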

The beta values and standard errors are included in Appendix C. A general table of the frequency of each adjective can be found in Appendix D.

4.3.1. Factors and Vocabulary

Dispersion

Flutists are more likely to use words airy, closed, edgy, intense vibrato, squeezed, unfocused, unsteady, wide vibrato, with vibrato and wobbly to describe a flute sound if the Dispersion factor has a high value.

The major positive effects are observed with the words airy (b = 0.63, SE = 0.20) and wobbly (b = 0.71, SE = 0.27), for which the odds ratios are 1.89 and 2.03 respectively. The Anova reveals significant effects for both adjectives, i.e. for airy (χ² = 29.24, p < 0.001) and wobbly (χ² = 8.72, p < 0.01). The estimated marginal means of linear trends shown in Table 5 and Figures 4a and 4d present an upward trend in the use of these descriptors as the Dispersion factor increases: an increase of about 0.045 per unit increase in the Dispersion factor is expected for the word airy, and of about 0.027 per unit for the word wobbly. The pairwise comparison of estimated marginal means of linear trends does not show significant differences between flutists and non-flutists for either adjective.

As the Dispersion factor increases, flutists become less likely to use the words centered, clean, dark, flat and without vibrato to describe a flute sound.

The major negative effect is observed with the word dark: as the Dispersion factor increases, the odds ratio for a flutist using the word dark (b = -1.35, SE = 0.26) is 0.26. Equivalently, the odds of a flutist using the word dark when the Dispersion factor decreases are 3.846 times the odds of using it when the Dispersion factor increases. However, as the participant changes from flutist to non-flutist, in combination with the Dispersion factor increasing, the odds for non-flutists to use the word dark (b = 1.15, SE = 0.29) increase by a factor of 3.15. In other words, non-flutists are more likely to use the word dark to describe a flute sound as the Dispersion factor increases.
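
To illustrate the arithmetic behind these figures (a worked example, not original text): exp(-1.35) ≈ 0.26 gives the flutists' odds ratio, its reciprocal 1/0.26 ≈ 3.85 gives the odds ratio in the decreasing direction, and exp(1.15) ≈ 3.16 gives the flutist-to-non-flutist interaction, matching the reported 3.15 up to rounding.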

A negative effect is also observed with the word centered: as the Dispersion factor increases, the odds ratio for a flutist using the word centered (b = -0.18, SE = 0.13) is 0.84. The estimated marginal means of linear trends shown in Table 5 and Figure 4b present a downward trend in the use of the word centered as the Dispersion factor increases: a decrease of about 0.024 and 0.021 per unit increase in the Dispersion factor is expected for flutists and non-flutists respectively. The Anova revealed a significant effect for flutists using the word centered with the Dispersion factor increasing (χ² = 10.60, p < 0.01), but not for non-flutists. It did, however, reveal a significant contrast between flutists and non-flutists using the word centered (χ² = 15.40, p < 0.001) with the Dispersion factor increasing: the odds ratio for flutists using this descriptor is 0.51. On the other hand, the pairwise comparison of estimated marginal means of linear trends does not show a significant difference between flutists and non-flutists (z.ratio = -0.138, p > 0.9), which may be because both groups follow the same downward pattern.

Vibrato

Flutists are more likely to use the words full, intense vibrato, open, rich, sharp, slow vibrato, steady, vivid, wide vibrato, with energy and with vibrato to describe a flute sound if the Vibrato factor increases. The major positive effects are observed with the words with vibrato (b = 1.11, SE = 0.20), wide vibrato (b = 0.99, SE = 0.24) and intense vibrato (b = 1.38, SE = 0.21), where the odds ratios are 3.04, 2.68 and 3.97 respectively.

For all three descriptors the Anova yielded significant effects for flutists using these words to describe a flute sound in combination with the Vibrato factor increasing: with vibrato (χ² = 319.50, p < 0.001), wide vibrato (χ² = 106.55, p < 0.001) and intense vibrato (χ² = 185.54, p < 0.001). The estimated marginal means of linear trends shown in Table 6 and Figure 4g present an upward trend in the use of these descriptors as the Vibrato factor increases: an increase of about 0.047 for with vibrato, 0.0253 for wide vibrato and 0.077 for intense vibrato per unit increase in the Vibrato factor.

As the Vibrato factor increases, flutists become less likely to use the words closed, clean, covered, flat, inactive, small, squeezed, unsteady, without energy, without vibrato and wobbly to describe a flute sound.

The major negative effects are observed with the words without vibrato (b = -1.51, SE = 0.23), small (b = -1.21, SE = 0.27) and inactive (b = -1.24, SE = 0.28), where the odds ratios are 0.22, 0.30 and 0.29 respectively. The Anova reveals significant effects for all three descriptors.

However, as the Vibrato factor increases, the odds for non-flutists to use the word inactive (b = 0.77, SE = 0.30) increase by a factor of 2.16. Thus, non-flutists are more likely than flutists to use the word inactive to describe a flute sound as the Vibrato factor increases.

A similar effect is observed with the word small (b = 0.84, SE = 0.28), where the odds increase by a factor of 2.33; hence, non-flutists become more likely than flutists to use the word small to describe a flute sound.

Finally, in combination with the Vibrato factor increasing, the change in the odds for flutists using the word wobbly (b = -0.53, SE = 0.26), compared to not using it, is 0.59. As the participant changes from flutist to non-flutist, in combination with the Vibrato factor increasing, the change in the odds of using the word wobbly compared to not using it is 3.20 (b = 1.16, SE = 0.27). The post hoc pairwise comparison of estimated marginal means of linear trends showed significant differences between flutists and non-flutists in the usage of the descriptor wobbly with the Vibrato factor increasing (z.ratio = -4.989, p < .0001). The estimated marginal means of linear trends shown in Table 6 and Figure 4h present opposite trend directions for flutists and non-flutists: downward and upward respectively. This may suggest that non-flutists use this word to describe sound fluctuations, whereas flutists do not ascribe this word to changes in a sound's fluctuations. In other words, non-flutists are more likely to use the word wobbly at positive and high values of the Vibrato factor, unlike flutists, who would rather use it to describe a flute sound with low and negative values of the Vibrato factor.

Unsteadiness

Flutists are more likely to use the words inactive and squeezed to describe a flute sound if the Unsteadiness factor increases.

As the Unsteadiness factor increases, flutists become less likely to use the words open, with predominance of high harmonics and with vibrato to describe a flute sound.

The major positive effect is observed with the word squeezed (b = 0.47, SE = 0.19), with an odds ratio of 1.61. However, the Anova does not confirm its significance.

The major negative effect is observed with the word with vibrato (b = -0.33, SE = 0.14), where the odds ratio for flutists is 0.72. The Anova revealed a significant effect of flutists' use of the descriptor with vibrato with the Unsteadiness factor (χ² = 185.54, p < 0.001).

Darkness

As the Darkness factor increases, flutists are more likely to describe a flute sound with the words dark, centered, covered, flat, focused, inactive, sharp, steady, with the predominance of the low harmonics, without vibrato and without energy. The major positive effects are observed with the words dark (b = 1.30, SE = 0.26) and with the predominance of the low harmonics (b = 0.53, SE = 0.20), where the odds ratios for flutists using these words are 3.68 and 1.70 respectively.

The Anova revealed significant results for both descriptors, i.e. dark (χ² = 101.02, p < 0.001) and with predominance of low harmonics (χ² = 42.35, p < 0.001). The estimated marginal means of linear trends shown in Table 7 and Figures 5b and 5e present an upward trend in the use of both descriptors as the Darkness factor increases: the use of the first descriptor by flutists increases by about 0.032 per unit increase in the Darkness factor, and that of the second by about 0.031 per unit.

Flutists are less likely to use the words airy, intense vibrato, metallic, piercing, small, squeezed, vivid, with energy and with vibrato to describe a flute sound if the Darkness factor increases.

The major negative effects are observed with the words vivid and piercing: as the Darkness factor increases, the change in the odds for flutists using the word vivid (b = -0.80, SE = 0.27) and piercing (b = -1.12, SE = 0.34), compared to not using them, is 0.45 and 0.33 respectively. The Anova revealed significant results for both descriptors, i.e. vivid (χ² = 115.07, p < 0.001) and piercing (χ² = 180.22, p < 0.001). The estimated marginal means of linear trends shown in Table 7 and Figures 5f and 5c present a downward trend in the use of both descriptors as the Darkness factor increases: the use of the first descriptor by flutists decreases by about 0.025 per unit increase in the Darkness factor, and that of the second by about 0.024 per unit.

However, when the participant changes from flutist to non-flutist, in combination with the Darkness factor increasing, the change in the odds of using the word sharp (b = -0.89, SE = 0.18), compared to not using it, is 0.41. In other words, as the Darkness factor increases, non-flutists become less likely than flutists to use the word sharp to describe a flute sound. The change in the odds for flutists to use the word sharp (b = 0.07, SE = 0.16) is 1.07. The post hoc pairwise comparison of estimated marginal means of linear trends showed significant differences between flutists and non-flutists in the usage of the descriptor sharp with the Darkness factor increasing (z.ratio = 4.739, p < .0001). The estimated marginal means of linear trends shown in Table 7 and Figure 5d present opposite trend directions for flutists and non-flutists: slightly upward and downward respectively. In other words, flutists are more likely to use the word sharp at positive and high values of the Darkness factor, while non-flutists would rather use it to describe a flute sound with low and negative values of the Darkness factor. This might suggest that non-flutists use the word sharp to describe a flute sound with a predominance of high harmonics, whereas flutists do not associate this word with the distribution of energy across the lower and upper frequencies of the flute sound.

Also, non-flutists are more likely to use the word closed (b = 0.52, SE = 0.27) as the Darkness factor increases, i.e. the odds of using it, compared to not using it, increase by a factor of 1.68.

4.3.2. Different factors described by the same words

Airy

The use of the word airy is significant for two factors: Dispersion and Darkness. On the one hand, it is more likely to be used as the Dispersion factor increases; on the other hand, it becomes less likely as the Darkness factor increases. The post hoc estimation and comparison of slopes of fitted lines shown in Tables 5 and 7 and Figures 4a and 5a depict these trends: the usage of airy increases by about 0.045 per unit increase in the Dispersion factor and decreases by about 0.015 per unit increase in the Darkness factor. A pairwise comparison of flutists and non-flutists for the Dispersion and Darkness factors in combination with the word airy did not show any significant differences (p > 0.3 and p > 0.9 respectively), suggesting that both groups follow the same trends in both situations. The results suggest that a flute sound should have a high value on the Dispersion factor, but a low one on the Darkness factor, in order to be described as airy.

Dark

The opposite is observed with the word dark. It is more likely to be used as the Darkness factor increases and less likely as the Dispersion factor increases. The post hoc estimation and comparison of the slopes of fitted lines shown in Tables 7 and 5 and Figures 5b and 4c depict an upward trend for the word dark with the Darkness factor increasing and a downward trend with the Dispersion factor increasing. The pairwise comparisons of flutists and non-flutists for the Dispersion and Darkness factors in combination with the word dark show a significant difference for the Dispersion factor (p < 0.0004): the odds for flutists using the descriptor dark with the Dispersion factor increasing, compared to non-flutists, are 0.973. Nevertheless, flute players would still be more likely to use the word with the Dispersion factor decreasing rather than at its positive and high values. The results for the Darkness factor are not significant, suggesting that both groups follow the same trend.

In other words, a flute sound should have high values on the Darkness factor, but low values on the Dispersion factor, in order to be described as dark.

Small

The use of the word small is significant for two factors: Vibrato and Darkness. In both cases, the descriptor small is less likely to be used as the Vibrato and Darkness values increase. The post hoc estimation and comparison of the slopes of fitted lines shown in Tables 6 and 7 present downward trends in both situations. Moreover, the pairwise comparisons of flutists and non-flutists for the Vibrato and Darkness factors in combination with the word small show significant differences: in both cases, flutists are less likely than non-flutists to use small with the Vibrato (p < 0.03) and Darkness (p < 0.02) factors increasing.

In other words, a flute sound is more likely to be described as small at low values of the Darkness and Vibrato factors.

Finally, the major differences in word choice for describing a flute sound are observed with piercing, with predominance of low harmonics and with vibrato. As the participant changes from flutist to non-flutist, the change in the odds of using the descriptor with predominance of low harmonics is 0.27; in other words, the odds of flutists using this descriptor, compared to not using it, are 3.7 times the odds for non-flutists. For piercing and with vibrato, the corresponding changes in the odds, as the participant changes from flutist to non-flutist, are 3.66 and 2.63 respectively.


[Figure 4: Graphics of Emtrends. Panels: (a) airy, Dispersion; (b) centered, Dispersion; (c) dark, Dispersion; (d) wobbly, Dispersion; (e) inactive, Vibrato; (f) small, Vibrato; (g) with vibrato, Vibrato; (h) wobbly, Vibrato.]

[Figure 5: Graphics of Emtrends, continuation. Panels: (a) airy, Darkness; (b) dark, Darkness; (c) piercing, Darkness; (d) sharp, Darkness; (e) with predominance of low harmonics, Darkness; (f) vivid, Darkness.]
