A mutual information analysis of the relationship between critical review scores and album sales

Academic year: 2021

A Mutual Information Analysis of the Relationship between

Critical Review Scores and Album Sales

by

Benjamin Rogerson

BSc, University of Amsterdam, 2015

Bachelor Thesis

BSc Economics and Business

University of Amsterdam

June 2015

Abstract

The aim of this paper was to determine the extent to which there exists a relationship between critical review scores and album sales. Because data availability on album sales was low, reliable chart information from Billboard was used instead. Three attributes of album chart behaviour were selected for the analysis: peak chart position, length of time in the charts and gradient of album chart trajectory. A mutual information analysis was then carried out to assess the relation between these three variables and the numerical review scores of seven review sources. The mutual information calculations were coded and run in the MATLAB Integrated Development Environment. The findings of this paper are that although all review sources indicated meaningful informational relations with album sales behaviour, Pitchfork displayed the outright strongest relation. Further parametric analysis with financial data is suggested to strengthen this claim.

Table of Contents

List of Tables and Formulas

Section 1: Introduction to the Study

Section 2: Literature Review

2.1. Music Industry Research

2.2. Statistical Methods Research

Section 3: Methodology

3.1. Variables

3.1.a. Sales Data

3.1.b. Critic Review Data

3.2. Data Collection

3.3. Statistical Modelling

3.3.a. Overview

3.3.b. Mutual Information and Related Statistics

3.3.c. Coding

Section 4: Analysis

4.1. Results

4.2. Evaluation

4.2.a. Peak Chart Position

4.2.b. Length of Time in Charts

4.2.c. Gradient of Album Trajectory through the Charts

Section 5: Conclusion

List of Formulas, Graphs and Tables

Formula 1. Mutual Information

Formula 2. Global Correlation Coefficient

Formula 3. Relative Mutual Information

Formula 4. Covariance

Graph 1. Cumulative Distribution of Length of Time in Charts

Table 1. Source Variables

Table 2. Chart Information

Table 3. Review Information

Table 4. Mutual Information Values

Table 5. Global Correlation Coefficient Values

Table 6. Relative Mutual Information Values

Section 1: Introduction to the Study

With the rise of the global information age, the music market has undergone a dramatic transition. Global album sales reached a historic high after the introduction of the Compact Disc format boosted sales in the 1980s and 1990s, but global recorded music sales have declined considerably since the turn of the millennium (Ingham, 2015). New music market innovations and the significant impact of the Internet on consumer behaviour have considerably altered the way the modern music market functions. For example, following the introduction of iTunes in 2001, Apple dominated the digital music market in the early part of the century (Apple Insider, 2013).

Since then, the introduction of streaming services, such as Spotify in 2011, has been a further contentious development in the music market (Thomes, 2013, p. 81). Alongside these legal innovations, a number of illegal downloading and file sharing formats have also had an important impact on the market since the Internet came to prominence (Waldfogel, 2010, p. 306). Since the introduction of the MP3 sharing platform Napster in 1999, illegal downloading has come to be considered a major threat to the financial success of the music market worldwide (Hong, 2013, pp. 297–324). Ultimately, these developments have produced a market that is rarely stable and frequently subject to dramatic change. Consequently, consumer behaviour in the music market today is more unpredictable than ever.

Aside from the effects of these innovations, the Internet has also greatly enhanced consumer access to information, though research on this topic is largely absent from the literature on the music market since 2000. In order to develop the understanding of the music consumer in today’s market, this paper will focus on the relationship between information and music sales. Specifically, the focus will be on the role of critics and their review scores in album sales. It is the hypothesis of this paper that critical reception has an impact on consumer behaviour in the music market, and is likely to be one variable which can, in part, develop our understanding of what drives consumer purchases. Consequently, the central research question of this paper is to what extent album sales behaviour is related to critical reviews.

The analysis will take place using statistical data, in the form of a mutual information analysis of the relationship between critical review scores and album sales. The integrated development environment used for this analysis will be the Matrix Laboratory (MATLAB) interface. All analysis subsequent to the data collection process will take place using this software.

The paper will continue with a literature review, after which the methodology for the analysis will be discussed. Following this, the results of the mutual information analysis carried out in the MATLAB IDE will be presented. These results will then be examined, and the paper will close with a summarising conclusion.

Section 2: Literature Review

2.1. Music Industry Research

Research literature concerning the behaviour of the music market is limited. In recent times, most research has dealt with assessing the rise in illegal downloading, which became prominent with the launch of Napster in 1999 (Hong, 2013, p. 297). Generally speaking, this research indicates a negative impact of illegal downloading and file sharing on global music sales. In addition, some research has focused on the positive externalities for music sales that arise with the information age (Peitz & Waelbroeck, 2005, p. 907; Hendricks & Sorensen, 2009, p. 324).

With regard to the effects of illegal file sharing, various studies indicate a negative relationship between illegal activity and global music sales. A study by Hong (2013, p. 321) gave a good indication of the effect of illegal MP3 downloads from Napster on music sales in the United States. He concluded that file sharing explained 20% of the decline in music sales since awareness of Napster became widespread in 1999. This is in line with the majority of studies exploring the relationship between illegal downloading and music sales.

Further research focused on newer means of file sharing on the Internet. Oberholzer-Gee and Strumpf (2007, pp. 38-40) focused specifically on the relationship between peer-to-peer (P2P) file sharing networks and music sales. They concluded that the rise in awareness and efficiency of P2P tools since 2007 is likely to coincide with a strong reduction in music sales in the future. This conclusion is in line with the findings of Zentner (2006, p. 87), who found that P2P file sharing reduces the probability of a consumer purchasing an album by approximately 30%.

Aside from the widely accepted view that illegal file sharing has a negative impact on music sales, a number of researchers also claim that the Internet has generated some positive externalities that can benefit the revenues of the music industry. For example, Peitz and Waelbroeck (2005, p. 907) argue that the rise in sampling improves buyer tastes by providing a more diverse market of music, which is more accommodating to specific consumer preferences. In addition, Hendricks and Sorensen (2009, pp. 365-368) focus on the link between consumer information and album sales, and in particular on the impact of new album releases on the respective artist’s prior releases. They find that a new release by an artist has a positive impact on the sales of prior albums, and conclude that information availability for consumers has a role to play in the skewness of music sales. This is closely linked to the analysis of this paper, which focuses on information availability by observing the impact of critical review data on album sales. Though these positive externalities are valid for developing the debate on modern music consumer behaviour, it is still widely held that the negative impact of online “theft” on sales outweighs the potential benefits of positive externalities (Hui & Png, 2003, pp. 19-20).

Before explaining how this research is specifically relevant for the direction of this paper, some preliminary conclusions can be drawn from the aforementioned literature and the experience of this research paper. Ultimately, the quantity of research on the music market in recent times is limited. There are a few possible explanations, which also have important implications for the approach and results of this research paper. Firstly, there appears to be a low availability of accessible data concerning global music sales. Nielsen SoundScan is the industry leader in this field. Its data draws on 39,000 global retail points monitored across 19 countries (Nielsen, n.d.), and is widely cited as the academic standard for music industry measurement. Though Nielsen releases an annual report with general aggregated information on global music sales, detailed information (i.e. sales broken down by album) is difficult to acquire without the costly purchase of access to Nielsen music data. This limits the scope of research in this field and may be a reason for the limited literature on this topic.

Secondly, the modern music market is under constant and rapid change, driven mainly by the frequent introduction of new music market innovations. The first of this kind was the rise of iTunes in 2001, which led to Apple’s domination of the digital music market in the subsequent years (Apple Insider, 2013). Apple has since seen its market share drop after Spotify and other streaming services entered the frame in 2011. The implication of these frequent innovations is that it is difficult to assess the music market over a stable time period before its conditions change drastically again. Thus, it can be assumed that music market analysis will continue to face difficulties in the future, given the limited availability of reliable music sales data and the sporadic nature of new technological innovations.

This research paper will develop music industry consumer analysis by focusing on the relationship between critical review information and album sales in the digital era, specifically between 2000 and 2015. Though illegal downloading will not enter into the analysis of this paper, it does have an impact on album sales and consumer behaviour, which may be important for interpreting the results. The role of positive externalities in music sales is also important, most significantly the role of information and its relationship with music sales. It must be stressed that the scope of the analysis is limited by the availability of music sales data, though the method used to overcome this issue is elaborated in the methodology section of this paper.

2.2. Statistical Methods Research

The statistical analysis in this paper will follow the form of a mutual information analysis. There are several strengths to this kind of analysis, which make it superior to other common procedures, such as the linear regression model. The literature surrounding mutual information will be discussed in brief now.

Mutual information is a statistical concept that measures the dependence between two variables (Learned-Miller, 2013, p. 4). The metric measures how much information knowledge of one variable communicates about another. Two main advantages of mutual information are widely cited by researchers. Firstly, the measure does not assume any underlying theoretical probability distribution (Dionisia, 2006, p. 4). Models that do impose such a distribution can understate the information that one variable contains about another, so the level of information measured between variables is often underestimated.

Secondly, mutual information is able to measure both linear and non-linear dependencies (Dionisia, 2006, p. 1). This again improves the accuracy and scope to which the informational content between two variables can be measured.

Dionisia, Menezes and Mendes (2003, pp. 1-31) assessed the strengths of using mutual information analyses in economic research. They concluded that the mutual information and global correlation coefficient are very efficient measures for testing and evaluating relationships, because of the two strengths that have already been discussed. The analysis of this paper is closely related to the financial mutual information analyses presented by these authors over the past twenty years.

The method for this paper, therefore, will take the form of a mutual information analysis. It was selected because it does not assume an underlying probability distribution and because it can account for non-linear dependencies. It is the belief of this paper that linear relations are insufficient for analysing album sales behaviour through the charts. The formulas and further information regarding mutual information are presented in the statistical modelling section of the methodology.

Section 3: Methodology

The methodology section will consist of three categories. Firstly, the variables used in the analysis section of this paper will be defined and the reason for their selection will be developed. Secondly, there will be a description of the data collection process. Lastly, the statistical method used for modelling the data using MATLAB programming software will be outlined.

3.1. Variables

3.1.a. Sales Data

The connection of this paper to economics is its focus on sales behaviour in the music market. More precisely, the focus will be on music album sales. As was discussed in the literature review, there is a low availability of music sales data broken down by artist and album. Furthermore, sales data is rarely broken down into intervals shorter than a year. This paper will use data derived from Nielsen SoundScan, widely considered the industry leader in music sales analysis, to form the dataset for the analysis.

Nielsen SoundScan is an information and sales tracking system which compiles data from more than 39,000 retail outlets globally (Nielsen, n.d.). Though the actual data of the system is exclusively available to subscribers and music industry professionals, Billboard use the SoundScan data to generate the rankings for their ‘Top Album Sales Chart’ on a weekly basis. As this data is informed by a reliable source and freely available, this chart data will be used to reflect album sales in the analysis section of this paper. Chart ranking information, from 1 to 100, will be used to indicate sales information on a weekly basis over a fifteen-year period, from 2000 to 2015.

Though this implies that a reliably informed data set can be acquired, a number of limitations arise when using chart information rather than direct sales information. Most significantly, chart data is non-parametric: rankings record only the order of albums, not the size of the sales gap between them. This reduces the sensitivity of the analysis to the finer details that would be reflected in direct sales information. Rankings still reflect sales data in this case, but only at a relative level. On the other hand, this can also be seen as a benefit. Notably, chart data simplifies the model, because large disparities in sales values between chart points are absent from the analysis.

Additionally, ranking data is limited to the top 100 sellers on the Billboard platform. Thus the findings of our analysis can only be applied to the behaviour of top album sales, and cannot be generalised to the entirety of the music market.

3.1.b. Critic Review Data

Critical reviews across a variety of media often consist of an analytical text combined with a numerical score, which reflects the critic’s assessment of the album. The existence of this numerical value, often on a scale of 1-5, 1-10 or 1–100, makes comparisons across review sources relatively simple. For the purposes of this paper, the numerical score will be used as an indicator of the level of critical acclaim associated with each album.

Critical scores were extracted from a variety of sources. The selection criteria for these mostly consisted of the availability of source data and the reputation of the source. For this analysis, only reputable sources and those with a high number of available review scores of chart albums (greater than 200) were selected for analysis. In addition, several review sources were selected to extend the scope of the analysis. The selected sources are as follows:

• Independent

• Independent on Sunday

• NME

• Observer

• Pitchfork

• Rolling Stone

• Telegraph

There are some possible drawbacks in the use of these review sites. Firstly, the agenda and context of each source affect the number of reviews and the types of albums it reviews. Pitchfork, for example, focuses on alternative and non-mainstream music, and thus its coverage of chart albums is limited. The Telegraph, on the other hand, is more mainstream, so its reviews can be expected to cover more of the albums that enter the charts. This factor may have important implications for the results, as the sample of review scores applying specifically to chart albums varies across the sources.

Secondly, numerical review scores are scaled differently across the sources. At the Independent, for example, albums are scored from 1-5 (integers), whereas at Pitchfork scores run from 1.0-10.0 (one decimal place). Thus the resolution of these scores differs significantly across sites, which may have an impact on the strength of their relation with chart ranking data.

3.2. Data Collection

The process of data collection of chart data and review scores occurred in two stages: stage one was the extraction of data; stage two, the data clean up.

First of all, an Application Program Interface (API) was created for each website in order to scrape the required data across multiple web pages. The API was generated using the import.io bulk data extraction tool, which enables users to generate structured tables out of complex website information using a list of identically structured hyperlinks. This automated the lengthy process of extracting the appropriate data from each site, resulting in the following variables:

Table 1. Source Variables.

| Source | Variable 1 | Variable 2 | Variable 3 | Variable 4 |
| --- | --- | --- | --- | --- |
| Billboard | Date of Chart | Album | Artist | Ranking |
| Review Websites | Date of Review | Album | Artist | Review Score |

After the data had been extracted from the appropriate websites using the API, it was imported into Excel as .csv files. The data then underwent significant clean up, whereby the album and artist information across the sites was reconciled into an identical format and each album was given a unique key for analysis. The scores of each website were translated from their respective scales to a common 100-point grading system.
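The rescaling step can be sketched as follows. This is a minimal Python illustration rather than the Excel and MATLAB workflow used in this paper, and it assumes a simple proportional mapping (score divided by the maximum of its scale, multiplied by 100); the exact conversion rule is not stated here, and the function name and scale labels are hypothetical.

```python
# Hypothetical maximum score for each native scale (an assumption for illustration).
SCALE_MAX = {"five_point": 5.0, "ten_point": 10.0, "hundred_point": 100.0}

def to_hundred_point(score, scale):
    """Map a score from its native scale onto a common 0-100 scale."""
    max_score = SCALE_MAX[scale]
    if not 0 <= score <= max_score:
        raise ValueError("score lies outside its native scale")
    return 100.0 * score / max_score

print(to_hundred_point(4, "five_point"))    # a 4/5 review
print(to_hundred_point(7.3, "ten_point"))   # a 7.3/10 review
```

Under this mapping a five-point source can only produce a handful of distinct values, whereas a one-decimal ten-point source can produce many more, which is one way the difference in score resolution between sources carries through to the common scale.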

Subsequent to this, those pieces of music that were reflected in the charts but did not constitute new album releases were removed. This meant removal of the following items of data:

• Compilations

• Greatest hits collections

• Reissues

• Soundtracks

As the analysis section will only focus on the relationship between critical review scores and chart information, those albums which were un-reviewed by any source were removed from the data set. The remaining data set therefore consisted only of new album releases between 2000 and 2015 subject to a review by at least one of the sites. The overview of the data set utilised in the analysis section is summarised in the following tables (Mean Score, SD, Skew and Kurtosis statistics were calculated in the MATLAB IDE).

Table 2. Chart information

| Variable | Count |
| --- | --- |
| Chart Dates | 746 |
| Artists | 1165 |
| Albums | 2200 |

Table 3. Review Information

| Source | Album Reviews | Mean Score | SD | Skew | Kurtosis |
| --- | --- | --- | --- | --- | --- |
| Ind. | 310 | 68.0 | 16.9 | -0.500 | 3.16 |
| Ind. Sun. | 241 | 65.4 | 17.8 | -0.076 | 2.57 |
| NME | 350 | 69.8 | 13.9 | -0.876 | 3.57 |
| Observer | 211 | 62.8 | 15.8 | -0.362 | 2.88 |
| Pitchfork | 311 | 64.6 | 18.5 | -0.863 | 3.35 |
| Rolling S. | 248 | 69.1 | 9.6 | -0.469 | 4.72 |
| Telegraph | 328 | 75.5 | 16.9 | -0.330 | 2.73 |

3.3. Statistical Modelling

The statistical modelling section consists of three sections. Firstly, an overview of the calculation process will be presented. This will be followed by a section elaborating on the role of the mutual information statistic in the analysis. Lastly, the coding process used in the MATLAB IDE is described.

3.3.a. Overview

The mutual information statistics for the data set were calculated in the MATLAB Integrated Development Environment. A description of mutual information and the reason for its selection in this analysis are outlined in the following section (3.3.b.).

The mutual information values were calculated for the following characteristics of album chart behaviour:

• Peak chart position

• Length of time contained within chart

• Gradient of album trajectory through the chart

In order to calculate the gradient of album trajectory through the chart, an approximate distribution was required to formulate the minimum mean square error cost function. A cumulative distribution of the length of time in the charts was used to establish the best approximation for the album chart trajectory. After examination, a linear decay was selected as the appropriate model for the analysis. The results of this are displayed in the following graph.

Graph 1. Cumulative Distribution of Length of Time in Charts for All Albums

The mutual information of the above three chart characteristics, with the numerical review scores, was then calculated. In addition to this, the mutual information between each of the three chart characteristics was also calculated.

3.3.b. Mutual Information and Related Statistics

Mutual Information

The mutual information of two discrete random variables is described in the following equation:

Formula 1. Mutual Information

$$
I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}
$$

(Dionisia et al., 2003, p. 4)

Where p(x,y) is the joint probability of the random variables x and y, while p(x) and p(y) are marginal probability functions. (Note that it is defined here as the sum for all possible x and y in the distributions.)

The pairs of random variables in this research comprise several measures of the albums’ performance, as well as their ratings on several review sites. Mutual information measures the reduction in uncertainty about one variable given knowledge of the other (Dionisia et al., 2003, p. 3). It ranges from a minimum of zero, i.e. no reduction in uncertainty, through to positive infinity (2003, p. 5). A value of zero implies that the variables are independent. It is also a symmetric measure, such that I(X;Y) = I(Y;X).
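The definition can be illustrated with a short plug-in estimate from paired discrete observations. This is a Python sketch rather than the MATLAB toolbox used in this paper; the function name and the toy data are hypothetical, and probabilities are estimated simply as relative frequencies.

```python
import math
from collections import Counter

def mutual_information(xs, ys, base=2.0):
    """Plug-in estimate of I(X;Y) from paired discrete observations."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must be paired")
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each (x, y) pair
    count_x = Counter(xs)          # marginal counts of x
    count_y = Counter(ys)          # marginal counts of y
    total = 0.0
    for (x, y), c in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        total += (c / n) * math.log(c * n / (count_x[x] * count_y[y]), base)
    return total

# Hypothetical toy data: review scores and peak positions for six albums.
scores = [60, 60, 80, 80, 100, 100]
peaks = [50, 50, 10, 10, 1, 1]
print(mutual_information(scores, peaks))   # each score fixes the peak here
print(mutual_information(peaks, scores))   # same value, by symmetry
```

In the toy data each score determines the peak exactly, so the estimate equals the entropy of the scores; swapping the arguments gives the same value, and two unrelated variables would give zero.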

The advantage of using mutual information as the measure of the relationship between the variables, rather than other statistical measures, is that it doesn't assume that the relationship between the variables is linear, and weights each data point based on how much it reveals about the underlying uncertainty in the data (Dionisia et al., 2006, p. 1).

This is perhaps clearest when considered by example. For a given site it might be the case that a 7/10 review was very common, whilst a 2/10 was very uncommon. Consequently, each data point for 7/10 reviews tells us less about another variable on average, because they come up so frequently that their behaviour is well characterised. By contrast, a 2/10 review informs us about an uncommon situation, and consequently carries more information per data point.
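One way to make this intuition concrete is the self-information (surprisal) of a single outcome s, which is larger for rarer outcomes. The probabilities below are hypothetical and chosen only to give round numbers; they are not estimated from the data set.

$$
h(s) = -\log_2 p(s), \qquad
p(7/10) = \tfrac{1}{4} \Rightarrow h = 2 \text{ bits}, \qquad
p(2/10) = \tfrac{1}{64} \Rightarrow h = 6 \text{ bits}
$$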

There are few disadvantages to using mutual information as a measure of the relationship between two variables, though the accuracy of the estimate is low when the distribution of the random variables is not well characterised. Additional data points for a given distribution therefore change the calculated mutual information only by improving the accuracy of its estimate.

Global Correlation Coefficient

In order to simplify direct comparisons between the mutual information values, a standardised measure of mutual information was calculated. This value is known as the global correlation coefficient, and is given by the following formula:

Formula 2. Global Correlation Coefficient (Standardised Mutual Information)

$$
\lambda(X;Y) = \sqrt{1 - e^{-2 I(X;Y)}}
$$

(Dionisia et al., 2003, p. 5)

This value lies between 0 and 1, and is thus comparable with the linear correlation coefficient r. It portrays the overall dependence, both linear and non-linear, between X and Y.
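The transformation in Formula 2 can be sketched in a few lines of Python (an illustration, not the MATLAB code used in this paper; the function name is hypothetical). The bit-valued figure from Table 4 is passed in unchanged, since doing so appears to reproduce the corresponding entry of 0.965 in Table 5; whether the formula is intended to take bits or nats is not stated here.

```python
import math

def global_correlation(mi):
    """Global correlation coefficient: sqrt(1 - exp(-2 * I))."""
    if mi < 0:
        raise ValueError("mutual information cannot be negative")
    return math.sqrt(1.0 - math.exp(-2.0 * mi))

print(round(global_correlation(0.0), 3))    # independent variables
print(round(global_correlation(1.336), 3))  # peak vs. length, Table 4 value
```

Because the exponential term shrinks quickly, the coefficient saturates towards 1 for mutual information values above roughly two, which is worth keeping in mind when comparing the attribute-to-attribute entries with the much smaller review-score entries.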

Relative mutual information

The relative mutual information (RMI) value was also calculated. This is somewhat equivalent to the coefficient of determination (R²) in linear analyses, and can be used to measure the percentage of variation in one variable that is explained by the other. (H(Y) is defined as the total entropy of variable Y; its calculation is automated in MATLAB.)

Formula 3. Relative Mutual Information

$$
\text{Relative Mutual Information} = \frac{I(X;Y)}{H(Y)}
$$

(Dionisia et al., n.d., p. 27)
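As a minimal sketch of Formula 3, the following Python snippet estimates H(Y) from relative frequencies and divides a given mutual information value by it. The function names and the sample are hypothetical, and the mutual information must be expressed in the same log base as the entropy.

```python
import math
from collections import Counter

def entropy(values, base=2.0):
    """Plug-in Shannon entropy H(Y) of a discrete sample."""
    n = len(values)
    return -sum((c / n) * math.log(c / n, base) for c in Counter(values).values())

def relative_mutual_information(mi, values_y, base=2.0):
    """RMI = I(X;Y) / H(Y); mi must use the same log base as the entropy."""
    h_y = entropy(values_y, base)
    if h_y == 0:
        raise ValueError("H(Y) is zero, so the ratio is undefined")
    return mi / h_y

# Hypothetical sample: four equally likely peak positions, so H(Y) is 2 bits.
peaks = [1, 5, 20, 60]
print(entropy(peaks))
print(relative_mutual_information(0.5, peaks))  # 0.5 bits out of 2 bits
```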

Covariance

In order to indicate the direction of the relationship, a covariance calculation was also generated using the following formula:

Formula 4. Covariance

$$
\operatorname{Covar}(X, Y) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n - 1}
$$

(Keller, 2012, p. 127)
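Formula 4 translates directly into code. The Python sketch below uses hypothetical data in which higher review scores are paired with numerically lower chart ranks, so the sign of the result shows the direction of the relationship.

```python
def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 denominator of Formula 4."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: scores rise while rank numbers fall.
scores = [50, 60, 70, 80]
ranks = [40, 30, 20, 10]
print(sample_covariance(scores, ranks))  # negative: opposite directions
```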

3.3.c. Coding

The data was imported into MATLAB’s Integrated Development Environment using the native Excel file importer. Each set of data was stored as an array. Artist and album title were stored as strings in cell arrays, while release dates, chart positions and ratings were stored as double-precision floating-point vectors. Dates were indexed to integer values to simplify later code, such that the date of the first chart is index 1, the next (one week later) is index 8, and so on through to the present (henceforth referred to as end, following MATLAB conventions).

For each album, its performance on the charts was stored as a vector running from the first to the last date index (1:end). Where an album was not on the charts at a given time point, that time point was recorded as a NaN value, which allows MATLAB to treat it as an empty space in the array. Once the album’s performance has been logged in this manner, one can calculate the peak chart position as the maximum value in the vector and the time in the charts as the number of non-NaN values in the vector, both of which can be calculated using built-in functionality and logical indexing:

• peakInChart = max(albumChartIndex(~isnan(albumChartIndex)));

• timeInChart = sum(~isnan(albumChartIndex));
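The same NaN-based bookkeeping can be sketched in Python. This is an illustration under stated assumptions, not the MATLAB code used in this paper: the function name and data are hypothetical, and because it is not stated here whether position 1 is stored as the smallest or the largest value, the sketch returns both extremes. If rank 1 denotes the top of the chart, the peak is the minimum.

```python
import math

def chart_summary(positions):
    """Return (lowest value, highest value, weeks charted) for one album.

    positions holds one entry per chart date; NaN marks weeks off the chart.
    """
    charted = [p for p in positions if not math.isnan(p)]
    if not charted:
        raise ValueError("album never charted")
    return min(charted), max(charted), len(charted)

nan = float("nan")
# Hypothetical album: enters at 12, climbs to 3, then drops out.
lowest, highest, weeks = chart_summary([nan, 12, 5, 3, 9, 40, nan, nan])
print(lowest, highest, weeks)
```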

The gradient of the albums’ trajectories through the charts, approximated as a linear decay from an initial peak, was calculated using the fitlm function, which computes a data structure corresponding to a linear fit of the data, calculated using a minimum mean square error cost function. This cost function minimises the squared difference between the actual data value and that predicted by the linear fit, and was chosen for its low computational processing cost. The function automatically ignores all NaN values within the data. For a given album, the code is approximately as follows:

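• weekIndex = (1:numel(albumChartIndex))'; % predictor: position in time

• linearFit = fitlm(weekIndex, albumChartIndex(:)); % response: chart position

• albumGradient(itx1) = linearFit.Coefficients{2,1}; % slope estimate

• albumLength(itx1) = linearFit.NumObservations; % non-NaN observations used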
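For readers without MATLAB, the least-squares slope can be sketched in plain Python using the closed-form solution. This is an illustrative stand-in for fitlm, not the code used in this paper; the function name and data are hypothetical, and the week index is assumed to run 1, 2, 3 and so on.

```python
import math

def chart_gradient(positions):
    """Least-squares slope of chart position against week index, skipping NaN.

    Returns (slope, number of observations used).
    """
    pairs = [(week, pos) for week, pos in enumerate(positions, start=1)
             if not math.isnan(pos)]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two charted weeks")
    mean_w = sum(w for w, _ in pairs) / n
    mean_p = sum(p for _, p in pairs) / n
    sxx = sum((w - mean_w) ** 2 for w, _ in pairs)
    sxy = sum((w - mean_w) * (p - mean_p) for w, p in pairs)
    return sxy / sxx, n

nan = float("nan")
# Hypothetical album falling ten places per week, with one missing week.
slope, n_obs = chart_gradient([5, 15, nan, 35, 45])
print(slope, n_obs)
```

The missing week is dropped rather than interpolated, so the slope is fitted to the four observed points only.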

For each variable, cumulative distribution functions were calculated using MATLAB’s cdf function. The properties of these distributions were then calculated using MATLAB’s mean, std, skewness and kurtosis functions.
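The summary statistics can be sketched in Python from the central moments. This is an illustration with hypothetical data; it assumes the non-excess definition of kurtosis (a normal distribution gives 3), which seems consistent with the values near 3 in Table 3, and the population-moment form of skewness, but the exact estimator settings used in this paper are not stated.

```python
import math

def describe(values):
    """Mean, sample SD (n - 1), skewness and non-excess kurtosis of a sample."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n   # fourth central moment
    if m2 == 0:
        raise ValueError("all values are identical")
    return {
        "mean": mean,
        "sd": math.sqrt(m2 * n / (n - 1)),
        "skew": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,   # a normal distribution gives 3
    }

# Hypothetical review scores on the 100-point scale.
stats = describe([40, 60, 60, 80])
print(stats)
```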

The mutual information was then calculated using the open access Information Toolbox created by Mo Chen (2010, 2012), developed for MATLAB’s IDE. The mutual information calculation in this toolbox calculates the mutual information of two numerical arrays. NaN values are not accepted and consequently these are removed by logical indexing, like so:

• valid = ~isnan(pX) & ~isnan(pY); % keep only albums with both values present

• mutualInfo = mutualInformation(pX(valid), pY(valid));
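One point worth care when removing NaN values is that the two arrays must stay aligned album by album: filtering each array by its own mask can leave them with different lengths or pair up values from different albums. A minimal Python sketch of pairwise removal follows; the function name and data are hypothetical.

```python
import math

def drop_incomplete_pairs(xs, ys):
    """Keep only positions where both xs[i] and ys[i] are present (not NaN)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    kept = [(x, y) for x, y in zip(xs, ys)
            if not (math.isnan(x) or math.isnan(y))]
    return [x for x, _ in kept], [y for _, y in kept]

nan = float("nan")
# Hypothetical scores and peak positions for four albums, two with gaps.
scores, peaks = drop_incomplete_pairs([70.0, nan, 85.0, 60.0],
                                      [12.0, 3.0, nan, 40.0])
print(scores, peaks)
```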

Chen’s implementation capitalises on the benefits of linearised coding in MATLAB’s compiler. The distributions are re-expressed as cumulative distributions across the range of their discrete entries. A sparse matrix is then generated for the joint probability of the two variables, and this is used to calculate the mutual information as per the equation above.

Chen’s entropy function was then used to calculate the entropy of each distribution. Subtracting the mutual information from this entropy gives the conditional entropy of one variable given the other, H(Y|X) = H(Y) − I(X;Y) (Chen, 2012).

Section 4: Analysis

4.1. Results

4.1.a. Mutual Information

The mutual information statistics were calculated using logarithms to base 2, and are thus expressed in bits. The calculated mutual information values are as follows:

Table 4. Mutual Information Values (Bits)

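MI: columns Peak, Length and Grad are chart attributes; columns Ind to Tel are review scores.

| Attrib. | Peak | Length | Grad | Ind | Ind S. | NME | Obs | P4k | RS | Tel |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Peak | X | 1.336 | 1.589 | 0.111 | 0.152 | 0.158 | 0.129 | 0.497 | 0.115 | 0.107 |
| Length | X | X | 2.433 | 0.102 | 0.108 | 0.109 | 0.079 | 0.375 | 0.092 | 0.069 |
| Gradient | X | X | X | 0.113 | 0.122 | 0.127 | 0.101 | 0.424 | 0.094 | 0.074 |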

4.1.b. Global Correlation Coefficient

In order to make direct comparisons between the values, a standardised measure for mutual information was calculated. This value is known as the global correlation coefficient. This portrays the strength of overall dependence both linear and non-linear between X and Y. The values are displayed in the following table:

Table 5. Global Correlation Coefficients

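λ: columns Peak, Length and Grad are chart attributes; columns Ind to Tel are review scores.

| Attrib. | Peak | Length | Grad | Ind | Ind S. | NME | Obs | P4k | RS | Tel |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Peak | X | 0.965 | 0.979 | 0.447 | 0.512 | 0.520 | 0.476 | 0.794 | 0.453 | 0.439 |
| Length | X | X | 0.996 | 0.430 | 0.441 | 0.443 | 0.383 | 0.726 | 0.409 | 0.358 |
| Gradient | X | X | X | 0.450 | 0.466 | 0.474 | 0.427 | 0.756 | 0.415 | 0.372 |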

4.1.c. Relative Mutual Information

The proportion of variation in one variable explained by the other is the relative mutual information. The values are in the following table:

Table 6. Relative Mutual Information

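RMI: columns Peak, Length and Grad are chart attributes; columns Ind to Tel are review scores.

| Attrib. | Peak | Length | Grad | Ind | Ind S. | NME | Obs | P4k | RS | Tel |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Peak | X | 0.233 | 0.404 | 0.023 | 0.035 | 0.033 | 0.025 | 0.180 | 0.024 | 0.022 |
| Length | X | X | 0.662 | 0.024 | 0.028 | 0.027 | 0.019 | 0.156 | 0.023 | 0.018 |
| Gradient | X | X | X | 0.043 | 0.050 | 0.126 | 0.023 | 0.165 | 0.033 | 0.013 |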

4.1.d. Covariance

To indicate the direction of the relationship in the evaluation, the covariance for the data set was also calculated:

Table 7. Covariance

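Covar: columns Peak, Length and Grad are chart attributes; columns Ind to Tel are review scores.

| Attrib. | Peak | Length | Grad | Ind | Ind S. | NME | Obs | P4k | RS | Tel |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Peak | X | 203.50 | 1.90 | -9.10 | 13.30 | -31.40 | 15.50 | 12.50 | 8.70 | -0.55 |
| Length | X | X | 2.76 | -6.04 | 23.76 | -21.56 | 10.01 | -22.9 | 5.25 | 32.5 |
| Gradient | X | X | X | -0.01 | 0.40 | -0.08 | 0.36 | 0.21 | 0.02 | 0.18 |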

4.2. Evaluation

The evaluation will take place by assessing the attribute variables one at a time in the following order: Peak chart position; Length of time in the charts and Gradient of chart trajectory.


4.2.a. Peak Chart Position

The peak chart position displayed positive mutual information values for all review sites. The highest value was for Pitchfork, at 0.4974 bits. To compare the sources we use the weighted mutual information (λ, the global correlation coefficient of Table 5), which accounts for the size of each sample and scales the strength of the mutual information between 0 and 1.

Pitchfork was the outright leader in mutual information for peak chart position among the review sites. Its λ was a strong 79%, compared with roughly 44% to 52% for the other sources, which is still a considerable level. This implies that numerical review scores contain a relatively high level of information concerning the peak position of album releases.

The relative mutual information indicates explanatory power: it divides the mutual information value by the individual entropy of peak chart position. It is important to note that this measure is only loosely related to the weighted mutual information. While all other review sources had an RMI of approximately 2% to 4%, 18% of the peak chart position information was explained by Pitchfork.

Furthermore, the gradient and length attributes showed strong weighted mutual information with peak chart position, at around 97%. This implies that these chart characteristics carry a large amount of information about peak chart position. The explanatory power of gradient and length with respect to peak position was also considerable, at 40% and 23% respectively.

4.2.b. Length of Time in Charts

In general, the mutual information values for the length of time in the charts were lower for all review sites than those for the peak position. The sample sizes were, of course, the same.


Pitchfork was again the strongest source, with a λ value of 73%. The other review sites displayed values of approximately 41% to 44%, with the exception of the Observer and the Telegraph, at around 38% and 36% respectively. The results thus imply a weaker, but still sizeable, level of information concerning the length of time in the charts compared with the peak position.

The RMI again showed Pitchfork as the greatest explanatory variable, at around 16%, compared with much lower values of around 2% to 3% for the other sources.

Length and gradient displayed a high mutual information value of 2.43 bits. In addition, the λ (99.6%) and RMI (66.2%) between these two were also very high. It is thus logical to conclude that the length of time spent in the charts is highly related to the gradient of the trajectory through the charts.

4.2.c. Gradient of album trajectory through the charts

The MI values for the gradient with the review scores generally fell between those for the length and the peak. Pitchfork was again the outright informational leader, with a λ of 76%. The other sites displayed λ values of around 45%, with the exception of the Telegraph, at 37%.

Pitchfork's explanatory power was 16.5% for the gradient of trajectory through the charts. For most of the other sources the values were at or below approximately 5%; the exception in Table 6 is NME, at 12.6%.


Section 5: Conclusion

The aim of this paper was to determine the extent to which there exists a relationship between critical review scores and album sales. Because data availability on album sales was low, reliable chart information was utilised from Billboard. Three attributes of album chart behaviour were selected for the analysis section: Peak chart position, length of time in the charts and gradient of album chart trajectory. A mutual information analysis then took place to assess the relation between these three variables and the numerical review scores of seven review sources.

Numerical review scores were found to contain the most information regarding the peak position. The gradient was the second strongest variable that could be explained by review scores, and the length was last. It is important to note that this does not reflect causality and is simply a measure of the extent to which one variable contains information regarding the other. In addition, the difference between the three chart characteristics was relatively small, approximately 5% for the global correlation coefficient.

A major finding of this paper is that the Pitchfork online review medium displayed very strong mutual information values with all three album chart attributes. Though the other sources also contained information, their values were not as high as those for the Pitchfork scores. It can thus be concluded that a relationship exists between review scores and album sales, with Pitchfork as the best explanatory source for chart behaviour.

One major weakness of this paper is that the sales data used were chart-based and non-parametric rather than direct sales figures. The findings could be developed by analysing the relation between the Pitchfork scores and direct sales data, in units, for the albums reviewed. This could help to understand the finer details of the relationship between album sales and numerical review scores.


In addition, the mutual information analysis did not include any form of statistical testing. Though there are methods of statistical testing that can be used on mutual information, the programming required to create the reference distribution was beyond the time frame of this project. For example, Dionisia et al. (2006, p. 4) suggested a dependence test using the global correlation coefficient. Performing this test, however, requires complex programming to simulate a model distribution for the data. This area could be extended in subsequent analyses of this type.
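
One simple way to obtain a reference distribution is a permutation test, sketched below. This is a generic substitute and not the test proposed by Dionisia et al.: one variable is shuffled repeatedly to break any dependence, and the observed mutual information is compared with the shuffled values. The function names, the number of permutations and the toy data are all assumptions made for illustration.

```python
import math
import random
from collections import Counter

def entropy(values):
    """Shannon entropy in bits of a sequence of discrete observations."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired discrete samples."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def permutation_p_value(x, y, n_permutations=500, seed=0):
    """Share of shuffles whose MI is at least the observed MI (with a +1 correction)."""
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    observed = mutual_information(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)          # shuffling y breaks any link between x and y
        if mutual_information(x, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)

# Toy data (hypothetical): 40 albums in four score bins, with chart bins that copy them
score_bins = [i % 4 for i in range(40)]
chart_bins = list(score_bins)
p_value = permutation_p_value(score_bins, chart_bins)
print(f"p = {p_value:.4f}")
```

A small p-value indicates that the observed mutual information is unlikely under independence. Applying this to the thesis data would require the binned observations for each review source, which are not reproduced here.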

The scope of the analysis of review scores could also be broken down into further categories, but again this depends upon data availability. Suggested areas are: a breakdown by music format (e.g. digital, vinyl, compact disc); the relation between reviews, sales and illegal downloading; and individual market analysis (e.g. cross-country comparisons).


Section 6: Bibliography

Chen, M. (2012). In Information Theory Toolbox. MATLAB Central. Retrieved from http://de.mathworks.com/matlabcentral/fileexchange/35625-information-theory-toolbox.

Dionisia, A., Menezes, R. and Mendes, D. A. (2003). Mutual Information: a dependence measure for nonlinear time series, Manuscript first draft, 1-36.

Dionisia, A., Menezes, R. and Mendes, D. A. (2003). Entropy-Based Independence Test, 1-36.

Dionisia, A., Menezes, R. and Mendes, D. A. (n.d.). Information Theory: A Key to Analyse Statistical Dependences in Financial Time Series, Financas, 1-8.

Global digital music market. (2013). In Apple Insider. Retrieved from http://appleinsider.com/articles/13/06/20/apples-itunes-accounts-for-75-of-global-digital-music-market-worth-69b-a-year.

Hendricks, K. and Sorensen, A. (2009). Information and the Skewness of Music Sales, Journal of Political Economy, 117(2), 324-369.

Hui, K. L. and Png, I. (2003). Piracy and the legitimate demand for recorded music. Contributions to Economic Analysis & Policy, 2(1), 1-21.

Hong, S. H. (2013). Measuring the Effect of Napster on Recorded Music Sales: Difference-In-Difference Estimates Under Compositional Changes, Journal of Applied Econometrics, 28, 297-324.

Ingham, T. (2015). In Music Business Worldwide. Retrieved from http://www.musicbusinessworldwide.com/global-record-industry-income-drops-below-15bn-for-first-time-in-history/.

Keller, G. (2011). Managerial Statistics. International Edition, Cengage Learning, chapter 4, 97-158.

Learned-Miller, E. G. (2013). Entropy and Mutual Information. Department of Computer Science, 1-4.

Music Sales Measurement. (2012). In Nielsen Soundscan. Retrieved from http://www.nielsen.com/us/en/solutions/measurement/music-sales-measurement.html.

Oberholzer-Gee, F. and Strumpf, K. (2007). The Effect of File Sharing on Record Sales: An Empirical Analysis. Journal of Political Economy, 115(1), 1-42.


Peitz, M. and Waelbroeck, P. (2006). Why the music industry may gain from free downloading: The role of sampling, International Journal of Industrial Organization, 24, 907-913.

Thomes, T. P. (2013). An economic analysis of online streaming music services, Information Economics and Policy, 25, 81-91.

Waldfogel, J. (2010). Music file sharing and sales displacement in the iTunes era. Information Economics and Policy, 22, 306-314.

Zentner, A. (2006). Measuring the Effect of File Sharing on Music Purchases, Journal of Law and Economics, 49(1), 63-90.
