Academic year: 2021

“The Effect of Volume, Valence and Dispersion on the Usefulness of an Online Consumer Rating Among Search and Experience Goods”

Niek Agema

R.A.N. (Niek) Agema
University of Groningen, Faculty of Economics and Business


Management Summary

This study examines the effects of the three main dimensions of online consumer ratings (volume, valence, and dispersion) on the usefulness of an online consumer rating. In addition, the degree of information seeking behavior prior to a purchase decision is researched for both experience goods and search goods. To capture the relative importance of each dimension for the usefulness of an online consumer rating, a Choice Based Conjoint Analysis is performed, for which an online questionnaire was administered (N=262).

This study finds that volume and valence are the most important dimensions in determining the usefulness of an online consumer rating. The effect of volume is relatively straightforward, while the effect of valence is somewhat ambivalent. The tendency of volume is positive, since the perceived usefulness of online consumer ratings increases with the number of ratings. However, the marginal effect of an extra rating is larger for less rated items.

In general, the valence of ratings is preferred to be positive rather than neutral/average or negative. However, a substantial part of the participants derives usefulness from a negative rating as well, although that usefulness likely discourages purchase of the product.

Finally, this study does not detect differences in the usefulness of ratings between search goods and experience goods. Moreover, contrary to expectations, information seeking behavior is found to be more pronounced for search goods than for experience goods.


Table of Contents

1. Introduction... 6

2. Theoretical Framework ... 8

2.1. Word of Mouth ... 8

2.1.1. User-generated information ... 8

2.1.2. Online Word of Mouth ... 9

2.2. Online Consumer Ratings ... 9

2.2.1. Volume of OCR ... 11

2.2.2. Valence of OCR ... 12

2.2.3. Dispersion of OCR ... 13

2.3. Product Characteristics ... 14

2.4. Conceptual Framework ... 17

3. Methodology ... 18

3.1. Data Collection ... 18

3.1.1. Stimuli ... 18

3.1.2. Choice task design ... 20

3.2. Choice Based Conjoint Analysis ... 21

3.2.1. Model Specification and Estimation ... 22

3.2.2. Model Fit ... 22

3.2.3. Covariates Related to Class Membership ... 23

3.2.4. Validation ... 23

3.3. Independent Samples T-test ... 23

4. Results ... 25

4.1. Descriptive Statistics... 25

4.2. Choice Based Conjoint Analysis for Experience Goods ... 26

4.2.1. Preference Function ... 26

4.2.2. Aggregated Model ... 27

4.2.3. Model Selection for Segment Level Interpretation ... 28

4.2.4. Interpretation on Segment Level ... 29

4.2.5. Class Characteristics ... 31

4.2.6. Validity ... 33

4.3. Choice Based Conjoint Analysis for Search Goods ... 33

4.3.1. Preference Function ... 33

4.3.2. Aggregated Model ... 34

4.3.3. Model Selection for Segment Level Interpretation ... 35

4.3.4. Interpretation on Segment Level ... 36

4.3.5. Class Characteristics ... 38

4.3.6. Validity ... 40

4.4. Comparison of Analyses on Aggregate Level ... 41

4.5. Independent Samples T-test ... 42

4.6. Hypotheses Testing ... 44

5. Discussion, Implications and Limitations ... 45

5.1. Discussion ... 45

5.1.1. Volume ... 45

5.1.2. Valence ... 45

5.1.3. Dispersion ... 46

5.1.4. Product Characteristics ... 46

5.1.5. Information Seeking Behavior ... 46

5.2. Implications ... 47

5.3. Limitations and Further Research Opportunities ... 48

6. References ... 50

7. Appendices ... 55

7.1. Appendix 1 – Questionnaire ... 55


1. Introduction

Electronic Word-of-Mouth, provided by online consumer reviews or ratings, is available on the Internet for a large number of product categories. Approximately 68 percent of online shoppers read at least four reviews before making a purchase (Godes & Silva, 2012). For both online and offline purchases, consumers frequently access consumer reviews or ratings before buying products and services (Chen & Xie, 2008). As such, online consumer reviews and ratings have become important as an independent and non-negligible resource of information to consumers. Although reviews and ratings are often used in the same context, these concepts are different: a review provides a qualitative assessment of one’s product experience, while a rating is a rather quantitative evaluation (Sridhar & Srinivasan, 2012). Often, web stores offer consumers the opportunity to evaluate products both in the form of a numerical star rating (usually ranging from 1 to 5 stars) and via open-ended comments (Mudambi & Schuff, 2010). This study focuses on the usefulness of ratings, since ratings are more suitable for aggregation and hence for implementation in a Choice Based Conjoint Analysis, which will be executed in this study.

The relevance of Word-of-Mouth (WOM) communication is widely confirmed in marketing research. A consumer’s purchase decisions can be influenced by the opinions of others (Chen, Wang & Xie, 2010). However, the traditional WOM phenomenon has been transformed into various types of electronic Word-of-Mouth communication, due to the technological developments of the past decade. With the Internet’s growing popularity, online consumer ratings have become an important resource for consumers seeking to discover product quality (Zhu & Zhang, 2010).

Online consumer ratings (OCR) have been an extensively addressed topic in recent literature. Whereas there is a fair amount of agreement regarding the impact of OCR, the literature is far less clear with respect to the dimensions of which these ratings consist. Certain papers, like Chevalier & Mayzlin (2006), found that the valence (whether a rating is positive or negative) of ratings is the most important dimension of OCR, while others (e.g. Liu, 2006) found that the valence of OCR does not matter in explaining sales, but that the number (volume) of ratings does. A third dimension that is supposed to be a main driver of OCR is dispersion. Dispersion, or variance, is found to have a significant impact on sales growth, according to Clemons, Gao & Hitt (2006).


These contradictory findings suggest disagreement about the main dimension of OCR. Since these papers have been executed among different product categories, one might consider that the main dimension of OCR varies among product categories. This paper distinguishes search goods and experience goods, according to consumers’ ability to obtain product quality information prior to purchase. Search goods are products for which a consumer has the ability to obtain information about product quality prior to purchase. In contrast, experience goods are products that require sampling or purchase in order to assess product quality (Nelson, 1970, 1974). According to Huang, Lurie & Mitra (2009), fundamental differences exist in the type of information consumers seek for these different types of products and, consequently, in their purchase behavior. Therefore, it is expected that the valuation of an online consumer rating is moderated by the characteristics of a product.

Based on this, the following research question is formulated:

“How do volume, valence, and dispersion affect the usefulness of an online consumer rating among different product categories?”

In this study, the dimensions of online consumer ratings will be researched for both search goods and experience goods. Since the uncertainty in purchasing experience goods (for which the quality cannot be determined prior to consumption) is traditionally higher, it is expected that the desire to make use of OCR is larger for experience goods than for search goods.

The dataset for this study will be collected via an online questionnaire and analyzed by executing a Choice Based Conjoint analysis, to investigate the contribution of each dimension to the usefulness of an online consumer rating at both the aggregate and segment level.

From a seller’s point of view, this research could be very interesting. Although there is a high level of agreement regarding the utility of OCR, the existing literature is less clear in determining which dimensions affect the usefulness of ratings and which product characteristics might moderate this usefulness. This paper may indicate under what conditions a seller benefits from facilitating OCR and allowing its consumers to post their comments on its own website (Chen & Xie, 2008).


2. Theoretical Framework

2.1. Word of Mouth

Word of mouth (WOM) is defined by Liu (2006) as “informal communication among consumers about products and services.” The words ‘informal’ and ‘among consumers’ express that WOM is seen as communication by and for consumers, without the company playing a role in this communication process. Goldenberg, Libai & Muller (2001) showed that a consumer’s decision making process is strongly influenced by WOM. Thus, consumers’ purchase decisions can be influenced by others’ opinions (Chen, Wang & Xie, 2010). Prior research also showed that the effectiveness of WOM depends on the strength of ties between the recommendation source and the consumer who makes a decision (Zhu & Zhang, 2010). Strong ties have proven to be more valuable than weak ties, since “a strong tie increases the opportunity one has to communicate one’s needs and to assess others’ available resources” (Van den Bulte & Wuyts, 2007). Hence, WOM is interesting for managers, because it has proven to be an important driver of consumer behavior. So although a company does not play an active role in the WOM process, WOM has proven to be important for companies in monitoring consumer attitudes, since WOM marketing can be more cost-effective than traditional marketing activities (Trusov, Bucklin & Pauwels, 2009).

2.1.1. User-generated information

The relevance of WOM communication is widely confirmed in marketing research. Webster (1970) found that in some cases, the information provided by marketers is sufficient for a consumer’s purchase decision. However, as the amount of perceived risk in a purchase increases, it is more likely that a consumer will seek additional information from other people who have had experience with the products and who are seen as more objective and trustworthy than commercial sources like marketers. Hence, information obtained from other users, via informal channels, may be an attempt to correct for perceived shortcomings in formal communications (Webster, 1970). More recent studies also find that WOM comprises informal advice between consumers, which is usually interactive, swift and lacking in commercial bias (East, Hammond & Lomax, 2008).


User-generated information is thus generally perceived as objective and honest. Since this information transfer is not influenced by a firm, user-generated information is perceived to be more credible than seller-generated information (Dellarocas, 2003). Thus, in spite of the plurality of channels through which a consumer may receive information, informal and interpersonal communication is very important (Godes & Mayzlin, 2004).

2.1.2. Online Word of Mouth

Over the past decade, the traditional WOM phenomenon has been transformed into various types of electronic Word of Mouth (eWOM) communication, due to technological developments. eWOM communication shares several characteristics with traditional WOM, but differs on several dimensions as well. First, traditional WOM takes place in private conversations, which traditionally have been difficult to observe. In contrast, due to the rise of online customer-to-customer (C2C) conversations, these conversations are more observable than traditional WOM (Godes & Mayzlin, 2004; Lee, Park & Han, 2008). Second, unlike traditional WOM, in which the influence is typically limited to a local social network, the impact of eWOM can reach far beyond the local community, due to the Internet (Chen & Xie, 2008). Third, eWOM has become an important resource for consumers seeking to discover product quality. However, credibility and trustworthiness are harder to determine in eWOM, since the sources of information in eWOM typically have little or no prior relationship with the information seeker (Xia & Bechwati, 2008). Hence, in an online environment, tie strength is usually much weaker because recommendations or warnings come from strangers.

The popularity of eWOM is of great importance, since it may reflect the potential future demand for a product (Zhang et al., 2010). In this context, Godes & Mayzlin (2004) investigated the utility of online communities in explaining the sales process occurring offline. They found that people make offline decisions based on information obtained online. The reciprocity of offline and online WOM is clarified in the finding of Zhu & Zhang (2010), who stated that 24 percent of Internet users access online reviews before an offline purchase. Online consumer reviews have thus become a major source of information for consumers and marketers regarding product quality.

2.2. Online Consumer Ratings


Since an online consumer review is written by consumers who have “purchased the product, it contains their experiences, opinions, and evaluations” (Lee, Park & Han, 2008). A review has a dual function: it provides information about products and services, and serves as a recommendation (Zhang et al., 2010).

The relevance of online consumer reviews is investigated by Schlosser (2011), who stated that almost 58 percent of consumers prefer sites where reviews and/or ratings are available, and nearly all (98%) consumers declare to read reviews and/or ratings before purchasing. Thus, online consumer reviews have become a non-negligible source of information to consumers.

An important advantage of online consumer reviews over traditional WOM is their visibility. Traditional WOM, exchanged in private conversations, is difficult to observe. As a result, marketers and researchers have relied on consumer recall or have inferred the process of information exchange from aggregated data. As mentioned before, the rising usage of online consumer reviews enables the observation of C2C interactions in measuring WOM (Godes & Mayzlin, 2004).

The difference between reviews and ratings is that a review provides a qualitative assessment of one’s product experience, while a rating is a rather quantitative evaluation (Sridhar & Srinivasan, 2012). In numerous leading web stores, like Amazon.com, Bol.com, and Apple’s iTunes store, the average rating is prominently displayed to convey information about product evaluations.

Web stores often offer consumers the opportunity to evaluate products both in the form of a numerical star rating (usually ranging from 1 to 5 stars) and via open-ended comments (Mudambi & Schuff, 2010). In addition, consumers can often give their opinion about the degree of usefulness (e.g. thumbs up or down) of each review. The usefulness of the sum of reviews, summarized in a rating, has received less attention, despite the fact that average ratings are prominently displayed in web stores. Three dimensions of ratings receive attention in determining the usefulness of a rating: volume, valence and dispersion (Dellarocas & Narayan, 2006; Moe & Trusov, 2011). Prior research on these ratings, like the paper of Sridhar & Srinivasan (2012), focuses on the degree to which existing ratings influence consumers’ own rating behavior. This study specifically aims at the usefulness of ratings and the characteristics (volume, valence, and dispersion) that determine the usefulness of the average rating.


Dispersion can be interpreted in several ways, but will be considered as statistical variance in this study. In the upcoming sections, these three dimensions of OCR will be elaborated upon.

In addition, since prior studies have been executed among many different product and service categories, the disagreement on the (most) important dimension(s) of OCR might be caused by the researched product category or its characteristics. This will be set out in the next chapter.

2.2.1. Volume of OCR

In a study of WOM for movies, Liu (2006) found that most of the explanatory power of reviews comes from the volume of WOM rather than the valence of WOM. The volume of WOM thus plays an informative role by increasing the degree of consumer awareness and the number of reached consumers in the market. Therefore, the volume of OCR offers significant explanatory power for future revenues. In contrast, Chevalier & Mayzlin (2006) found that the reaction to an OCR is stronger for items that have fewer existing ratings, since online ratings are more informative when there is less product coverage. Hence, the volume of OCR has proven to be a predictor of future sales among certain product categories, but the magnitude of the effect of an extra rating is larger for less rated items.

Godes & Mayzlin (2004) suggest that the more conversations there are about a product, the more likely it is that somebody will be informed about it, subsequently leading to higher sales. In other words, the volume of OCR is related to consumer behavior, awareness and market outcomes. According to Zhu & Zhang (2010), popular products tend to receive a larger number of reviews, and a larger number of reviews usually makes a product more trustworthy. They refer to Kirby (2000), who states that one non-expert might not be reliable, but if 90 percent of non-experts agree on a product, it is probably worth trying. Thus, the volume of ratings is associated with the trustworthiness and credibility of an average rating; the more individual ratings a product has received, the more useful the total rating is considered to be.
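The diminishing marginal effect of volume described above can be sketched with a toy model. The logarithmic usefulness curve below is purely an illustrative assumption (it is not the model estimated in this study); it merely shows how one extra rating adds more perceived usefulness to a sparsely rated item than to a heavily rated one.

```python
import math

def perceived_usefulness(n_ratings: int) -> float:
    """Toy usefulness curve: usefulness increases with volume, but with
    diminishing marginal returns (the log form is an assumption)."""
    return math.log1p(n_ratings)

# Marginal effect of one extra rating at low vs. high volume
low_volume_gain = perceived_usefulness(11) - perceived_usefulness(10)
high_volume_gain = perceived_usefulness(1001) - perceived_usefulness(1000)

print(low_volume_gain > high_volume_gain)  # True: the pattern hypothesized in H1b
```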


Taking the foregoing arguments into account, it is expected that the volume of ratings has a positive effect on the usefulness of an online consumer rating; the more often a product is rated, the more useful the average rating is considered to be. Nevertheless, the magnitude of the effect of an additional rating on the usefulness of that OCR is expected to be smaller the more ratings an item already has. These expectations result in the following hypotheses:

H1a: The volume of ratings has a positive effect on the usefulness of the average online consumer rating.

H1b: The magnitude of the effect of an additional individual rating on the usefulness of the average rating is larger for less rated items.

2.2.2. Valence of OCR

The valence of online consumer reviews describes whether the information is positive or negative. This distinction is important, because both in an offline setting (Arndt, 1967) and in an online setting (Chen, Wang & Xie, 2010) it has been found that positive and negative information affect consumer behavior in different ways. Positive reviews give either a direct or an indirect recommendation for product purchase, while negative reviews may involve product denigration or discouragement. Regarding the expected quality of a product, Liu (2006) therefore found that positive ratings enhance expected quality, whereas negative ratings reduce it. Although there is unanimity about the division of valence into positive and negative ratings, disagreement exists about the magnitude of the impact of positive ratings on the one hand, and negative ratings on the other hand.

With respect to the degree of impact of positive versus negative word-of-mouth, East, Hammond & Lomax (2008) found that the impact of positive WOM is in general larger than that of negative WOM, since consumers’ purchase probability before receiving positive or negative WOM is below 50 percent, leaving more room for positive WOM to increase than for negative WOM to decrease their purchase probability. In contrast, Lee, Park & Han (2008) found that a high proportion of negative online consumer reviews elicits a conformity effect, whereby both high- and low-involved consumers tend to conform to the perspective of reviewers. Moreover, Chevalier & Mayzlin (2006) found that the (negative) impact of a 1-star rating is larger than the (positive) impact of a 5-star review, and Ahluwalia, Burnkrant & Unnava (2000) found that a consumer pays more attention to negative ratings than to positive ratings in his product evaluation.


Prior research thus mainly addressed the influence of positive and negative ratings on purchase intention and product evaluation. This study aims at the influence of positive and negative ratings on the usefulness of an average rating. In other words, do consumers pay more attention to positive or to negative ratings in determining the usefulness of an online consumer rating? Since prior papers found that both positive and negative ratings may affect a consumer’s valuation of an OCR, it is expected that a quadratic effect exists, whereby both positive and negative ratings have a positive effect on the usefulness of an OCR. Neutral or average ratings, on the other hand, are expected to barely influence the usefulness of an OCR.

H2a: Positive ratings have a positive effect on the usefulness of an online consumer rating.

H2b: Negative ratings have a positive effect on the usefulness of an online consumer rating.

2.2.3. Dispersion of OCR

While the influence of the other dimensions of OCR (volume and valence) is widely investigated, dispersion is less established. Both the research on this dimension and its operationalization are less developed than those of volume and valence. The interpretation of dispersion is in fact ambiguous, and it is measured in a variety of ways (Moe & Trusov, 2011). Some papers define dispersion as “the extent to which product-related conversations are taking place across a broad range of communities” (Godes & Mayzlin, 2004), whereby communities are described in terms of the degree of heterogeneity of the population in which a product is discussed. Less dispersed WOM thus has, according to Godes & Mayzlin (2004), less impact, since it is discussed within a narrow and homogeneous population. The theory behind this reasoning lies in the network literature and goes back to Granovetter (1973), who found that WOM spreads quickly within communities, but slowly across them.

The other way of interpreting dispersion is related to statistical variance (Moe & Trusov, 2011), i.e. the variance of the ratings across the answer categories. For example, an average rating of three out of five stars can be reached by (1) 50 one-star ratings and 50 five-star ratings, or (2) 100 three-star ratings. Although these distributions are equal in average and volume, the effect on the perceived usefulness of the rating might be different. In that vein, Clemons, Gao & Hitt (2006) found that the dispersion of ratings is positively correlated with sales growth. In fact, they argue that the dispersion of ratings is as important as the mean of ratings in predicting the growth in sales.
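The two distributions in the example above can be compared directly in terms of statistical variance. The short sketch below (standard-library Python only) confirms that both reach the same three-star average at the same volume, while their dispersion differs sharply.

```python
from statistics import mean, pvariance

# Two rating distributions with the same average (3 stars) and the
# same volume (100 ratings), but very different dispersion.
u_shaped = [1] * 50 + [5] * 50   # (1) 50 one-star + 50 five-star ratings
uniform = [3] * 100              # (2) 100 three-star ratings

print(mean(u_shaped) == mean(uniform) == 3)     # True: identical averages
print(pvariance(u_shaped), pvariance(uniform))  # variance 4 vs. 0
```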


This study researches the effect on a rating’s usefulness via the quantitative approach, using the total set of ratings, rather than the qualitative approach with consumers’ individual reviews. Taking both the practical and theoretical arguments into account, it is decided to elaborate on the statistical approach of dispersion in this study. Regarding the preferred distribution of dispersion with respect to sales growth, Clemons, Gao & Hitt (2006) found that a more dispersed population provides more value to the company: “it is more important to have some customers who love you, than a huge number of customers who merely like you”. Thus, a rating structure in which the variance is more dispersed (via a statistical U-shape or inverted-U-shape distribution) is preferred over an equal distribution of ratings. This results in the hypothesized effect of dispersion on the usefulness of an OCR in this study:

H3: A more dispersed rating has a positive effect on the usefulness of an online consumer rating.

2.3. Product Characteristics

Easily accessible online consumer ratings exist for numerous products and services (books, movies, television shows, electronics, restaurants, hotels, airlines, cleaning services, bicycles, plumbers, doctors, summer camps, mobile calling plans, etc.) (Godes & Silva, 2012). Several studies have attempted to identify the relationship between (the dimensions of) OCR and product sales, resulting in contradictory findings. Research on the impact of the different dimensions of OCR (volume, valence, and/or dispersion), and on the influential and/or predictive power of OCR on sales or consumers’ purchase intention, has resulted in different outcomes.

Regarding the influence of OCR on sales, Chevalier & Mayzlin (2006) argued that online consumer ratings significantly influence sales. In contrast, Chen et al. (2004) found that OCR do not influence sales, but only serve as predictors of sales. They align their research with Eliashberg & Shugan (1997), who found that consumer reviews serve as predictors rather than influencers of product sales. Nevertheless, these studies have been executed among different product categories; one might consider that the most important dimension of OCR varies among different product categories. Therefore, this study focuses on the moderating effect of product characteristics that might affect the importance of the different dimensions of ratings and thereby the usefulness of an OCR.


First, a distinction can be made between utilitarian and hedonic products: utilitarian products are goods whose consumption primarily serves a “functional or practical task”, and hedonic products are characterized as goods “whose consumption is primarily characterized by an affective and sensory experience of aesthetic or sensual pleasure” (Cheema & Papatla, 2010; Dhar & Wertenbroch, 2000). Second, Nelson (1970; 1974) made a distinction between search goods and experience goods. Search goods are defined as goods that are dominated by product attributes for which complete information about the product can be acquired prior to purchase. Experience goods, on the other hand, are characterized by product attributes that cannot be known until the purchase has occurred and the product has been used (Klein, 1998). The distinction between search goods and experience goods is described in many other papers. Especially in recent years, due to the enormous growth in online purchases, many studies adopted, or criticized, the distinction between search and experience goods. In that vein, Moon, Chadee & Tikoo (2008) found that the major limitation of online shopping is that consumers cannot physically experience a product at the time of purchase. Therefore, they concluded that the Internet is more appropriate for selling search goods than experience goods. In contrast, Huang, Lurie & Mitra (2009) found that due to the use of the Internet, the traditional distinctions between search and experience goods have been reduced. However, they endorse that experience goods involve greater depth (time per page) and lower breadth (total number of pages) of search than search goods. In addition, they concluded that the presence of online consumer reviews has a greater effect on search and purchase behavior for experience goods than for search goods.

People are more likely to rely on C2C interactions (like OCR) when they are less aware of or knowledgeable about products (Libai et al., 2010). This lack of knowledge or information leads to an increase in perceived risk and uncertainty for consumers (Hsieh et al., 2005). This uncertainty can be reduced by referring to eWOM information, like online consumer reviews or ratings that contain objective and thus trustworthy advice from other, experienced users (Park & Lee, 2009). Therefore, one might expect that online consumer ratings are more important in the purchase decision of products about which a consumer feels uncertain.

Since a lack of awareness or knowledge can occur in the purchase decision of both hedonic and utilitarian products, it is expected that the distinction between search and experience goods is better suited for determining differences between product types. In fact, the distinction between search and experience goods is based on the degree of knowledge prior to a purchase. Thus, in this study the distinction of product types based on Nelson (1970; 1974) will be adopted.


This aligns with the distinction between search and experience goods used in this study. That is, consumers are found to be more likely to refer to eWOM (including OCR) for high-risk purchases. Since the product characteristics of experience goods are difficult to observe until consumption (Zhu & Zhang, 2010), it is expected that consumers are more likely to rely on credible and trustworthy information from others, conveyed by the volume of ratings, for experience goods than for search goods. In addition, it is expected that the importance of positive ratings is larger for experience goods, since they contribute to reducing uncertainty with respect to a purchase. Meanwhile, it is expected that negative ratings are more important with respect to search goods, since consumers who are already interested in purchasing a product are more likely to browse for critical evaluations they were not aware of than for confirmation of the opinions they already hold.

Besides, experience goods are often purchased only once, and are associated with higher risk than goods that are purchased frequently. Since this uncertainty is traditionally higher for experience goods (for which the quality cannot be determined prior to consumption), it is expected that the tendency to search for information before an online purchase is larger for experience goods than for search goods.

H4A: The volume of ratings is more important for experience goods than for search goods.

H4B: Positive ratings are more important for experience goods than for search goods.

H4C: Negative ratings are more important for search goods than for experience goods.

H4D: The tendency to rely on online consumer ratings is greater for experience goods than for search goods.


2.4. Conceptual Framework

From the theoretical framework, the conceptual model for this study can be derived. The conceptual model is displayed in figure 1.

Figure 1 - Conceptual Model

For clarity, the hypotheses formulated based on the theoretical expectations are summarized in table 1.

Hypothesis:

1a: The volume of ratings has a positive effect on the usefulness of the average online consumer rating.

1b: The magnitude of the effect of an additional individual rating on the usefulness of the average rating is larger for less rated items.

2a: Positive ratings have a positive effect on the usefulness of an online consumer rating.

2b: Negative ratings have a positive effect on the usefulness of an online consumer rating.

3: A more dispersed rating has a positive effect on the usefulness of an online consumer rating.

4a: The volume of ratings is more important for experience goods than for search goods.

4b: Positive ratings are more important for experience goods than for search goods.

4c: Negative ratings are more important for search goods than for experience goods.

4d: The tendency to rely on online consumer ratings is greater for experience goods than for search goods.


3. Methodology

In this study, two independent Choice Based Conjoint (CBC) analyses will be executed. The purpose of these CBC analyses is to derive the relative importance of the attributes (volume, valence, and dispersion) and the preferred levels of each attribute for online consumer ratings (OCR) among different product types (experience versus search goods). The data for this study is collected via an online questionnaire (appendix 1). In addition, an Independent Samples T-test will be conducted to test the hypothesized difference in consumers’ desire to take advantage of online consumer ratings for different product types.
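As background, the choice rule that typically underlies a CBC analysis is the multinomial logit model: each rating profile receives a total utility equal to the sum of its attribute-level part-worths, and the probability of choosing a profile is proportional to the exponent of its utility. The sketch below illustrates this mechanism; the attribute levels and part-worth values are hypothetical placeholders, not the estimates reported later in this study.

```python
import math

# Hypothetical part-worths for illustration only; in the actual CBC
# analyses these values are estimated from respondents' choices.
part_worths = {
    "volume":     {"10 ratings": -0.4, "100 ratings": 0.1, "1000 ratings": 0.3},
    "valence":    {"2 stars": -0.5, "3 stars": 0.0, "4.5 stars": 0.5},
    "dispersion": {"low variance": 0.2, "high variance": -0.2},
}

def utility(profile: dict) -> float:
    """Total utility of a rating profile = sum of its part-worths."""
    return sum(part_worths[attr][level] for attr, level in profile.items())

def choice_probabilities(profiles: list) -> list:
    """Multinomial logit choice rule: P_i = exp(V_i) / sum_j exp(V_j)."""
    exp_u = [math.exp(utility(p)) for p in profiles]
    total = sum(exp_u)
    return [e / total for e in exp_u]

choice_set = [
    {"volume": "1000 ratings", "valence": "4.5 stars", "dispersion": "low variance"},
    {"volume": "10 ratings", "valence": "2 stars", "dispersion": "high variance"},
]
probs = choice_probabilities(choice_set)
print(probs[0] > probs[1])  # True: the stronger profile is chosen more often
```

In the actual analyses, the part-worths are estimated from respondents' observed choices across the choice tasks, rather than assumed as above.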

3.1. Data Collection

This study focuses on the extent to which the different dimensions of OCR affect the usefulness of a rating. Therefore, an online survey is conducted to collect data on (individual) consumer preferences in online consumer ratings and on demographic characteristics of the respondents.

In total, 298 people participated in this study. Participants were contacted via social media (Facebook, Twitter, and LinkedIn), e-mail and direct personal conversations. Incompletely filled-in surveys were excluded manually, resulting in 262 completed surveys. This number of participants is sufficient according to Hair et al. (2010): a sample size of at least 200 is found to provide an acceptable margin of error in the usual applications of Choice Based Conjoint Analysis. Ideally, for identifying segments (latent classes) among the respondents, the recommended sample size is at least 200 for each group (Hair et al., 2010). Although latent classes will be identified in this study, the current sample size is considered acceptable given the scope and feasibility of this study. For this questionnaire, items from standard scales were used to measure daily internet usage, online purchase frequency1 (Hong & Kim, 2012) and information seeking behavior (Flynn, Goldsmith & Eastman, 1996). Besides, respondents are asked to fill in their age, gender, postal code, income level2, education, household size and job. These variables might be used as (active) covariates in the conjoint analyses.

3.1.1. Stimuli

The dimensions of online consumer ratings (volume, valence, and dispersion) are defined as attributes for the Conjoint Analysis for both product types (search and experience goods). As

1

For measuring ‘daily Internet use’ and ‘online purchase frequency’, a pre-test (N=12) was conducted to improve the standard scale. After the pre-test, the range of the standard scales was extended.

2


mentioned in the theoretical framework, volume (the total number of ratings), valence (the average rating measure), and dispersion (the variance in ratings) are found to be dimensions that determine the usefulness of a rating. The distinction between search and experience goods is based on the degree of product knowledge prior to a purchase.

Product type is not considered as an attribute, to prevent ratings for different products from being combined within a choice set. Hence, the product types are separated and will be tested in two independent Conjoint Analyses. Based on the attributes and levels displayed in table 2, twenty-seven (3x3x3) profiles can be distinguished for each product type.

Attribute Levels

Volume 10 50 90

Valence 1.8 3.0 4.2

Dispersion U-shaped Inverted U Equal

Table 2 - Attributes and levels

The profiles were developed using Microsoft Paint. The design of the profiles is based on the lay-out of online consumer ratings by leading companies such as Amazon.com, Bol.com and Apple’s iTunes Store. Barring the manipulations in the attributes, all profiles are identical (figure 2).


The products chosen for experience goods and search goods are a book and a digital photo camera, since these products are relevant examples for experience and search goods (Huang, Lurie & Mitra, 2009; Nelson, 1970, 1974).

Next, the profile presentation has to be chosen. Three methods of presentation are common in conjoint analysis: the full-profile method, pairwise comparison presentation, and trade-off presentation. For this study, the full-profile method is most appropriate, since the number of attributes used in this study is limited. Advantages of the full-profile method are that relatively few judgments per respondent are required, that a fractional design (each respondent judges a fraction of all possible stimuli) is possible, and that all attributes are shown simultaneously, which is most realistic (Hair et al., 2010).

3.1.2. Choice task design

Thereafter, the profiles have to be translated into choice sets, for which Sawtooth Software SSI Web is used. Sawtooth determines the best number of stimuli per respondent. In this case, Sawtooth suggested using two versions, three profiles per choice set (columns), five choice sets per respondent and one fixed choice set, resulting in an efficiency of 0.9818 (table 3). The fixed choice set is shown to every respondent and serves as a holdout sample, which is used to test the predictive validity. This construct will be discussed later in this chapter.

Attribute    Level               Frequency   Actual   Ideal    Efficiency
Volume       10 ratings          10          0.4513   0.4472   0.9818
             50 ratings          10          0.4513   0.4472   0.9818
             90 ratings          10          0.4513   0.4472   0.9818
Valence      1.8 stars           10          0.4513   0.4472   0.9818
             3.0 stars           10          0.4513   0.4472   0.9818
             4.2 stars           10          0.4513   0.4472   0.9818
Dispersion   U-shaped            10          0.4513   0.4472   0.9818
             Inverted U-shaped   10          0.4513   0.4472   0.9818
             Equal               10          0.4513   0.4472   0.9818

Table 3 - Choice task design


One version of the questionnaire starts with choice sets 1-6 for search goods and ends with choice sets 7-12 (including a fixed choice set) for experience goods. The order of showing stimuli is reversed in the other questionnaire version, to ensure that each product type is shown first just as often and to prevent a difference in respondents’ knowledge or concentration for one product type over the other. The total number of choice sets shown to each respondent is twelve, which is acceptable according to Johnson and Orme (1996), who argue that up to twenty tasks can be used without facing a decrease in reliability.

In CBC experiments, often a no-choice (none-) option is included. The major advantage of including a none-option is that a more realistic experiment is obtained (Vermeulen et al., 2008). Forcing respondents to make a choice between inappropriate choice options might lead to biased parameters (Dhar, 1997). However, this paper investigates a hypothetical situation in which a consumer (respondent) already decided to buy a book or a photo camera. It is assumed that in each choice set a respondent should be able to choose a rating combination which is appropriate (in positive or negative sense) to them. Therefore, no none-option is incorporated in this CBC experiment.

3.2. Choice Based Conjoint Analysis

“Conjoint Analysis is a multivariate technique developed to understand how respondents develop preferences for any type of object, based on the premise that consumers evaluate the value (utility) of an object by combining the separate amounts of value provided by each attribute” (Hair et al., 2010). In other words, Conjoint Analysis determines the relative importance consumers allocate to a set of attributes, and the utilities consumers attach to the level of attributes. To study consumer preference based on utilities, objects are considered as scores on a set of attributes; the utility of a product equals the sum of the utilities of the attribute levels (Wierenga, 2008). To be successful in defining utility, the object in terms of its attributes and all relevant values for each attribute has to be described (Hair et al., 2010).


Besides the aggregated level, the results of the conjoint analyses will be interpreted at the segment level, to define latent classes in the population for each product type. For this purpose, LatentGOLD software will be used to estimate latent class models.

3.2.1. Model Specification and Estimation

Since the utility of a product (in this case: an online consumer rating) equals the sum of the utilities of the attribute levels, the formula to determine utility (U) per segment (j) is:

Uj = β1jvolume + β2jvalence + β3jdispersion
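To make this additive rule concrete, the utility of a single rating profile can be sketched as the sum of the part-worths of its levels. The part-worth values below are hypothetical illustrations, not estimates from this study:

```python
# Part-worth utilities per attribute level for one hypothetical segment j.
# All numbers are illustrative placeholders, not results of this study.
part_worths = {
    "volume":     {10: -0.77, 50: 0.10, 90: 0.67},
    "valence":    {1.8: -0.41, 3.0: -0.42, 4.2: 0.83},
    "dispersion": {"U-shaped": 0.05, "Inverted-U": 0.07, "Equal": -0.12},
}

def utility(profile):
    # U_j = sum of the part-worths of the levels shown in the profile
    return sum(part_worths[attr][level] for attr, level in profile.items())

# Utility of a profile with 90 ratings, a 4.2-star average and an
# inverted-U dispersion: 0.67 + 0.83 + 0.07
u = utility({"volume": 90, "valence": 4.2, "dispersion": "Inverted-U"})
```

In the actual estimation these part-worths are derived from the observed choices; the sketch only shows how a profile’s total utility is composed.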

There are three types of relationships within each attribute: linear, quadratic and part-worth shaped. The linear model is the simplest; just a single part-worth (regression coefficient) has to be estimated, which is multiplied by the level’s value to arrive at a worth for each level. The separate part-worth form, on the other hand, requires separate estimates for each level (Hair et al., 2010).

Obviously, the attribute dispersion cannot be treated as linear beforehand, since the levels of dispersion are nominally scaled. However, one might consider treating the attributes volume and valence as linear, since these attributes are ordinally scaled in the survey. Concerning valence, based on prior papers (East, Hammond & Lomax, 2008; Chevalier & Mayzlin, 2006), it is hypothesized that both negative and positive ratings provide a positive utility. It is expected that the part-worths of this attribute have a quadratic relationship; thus, part-worths will be estimated for every level of valence. Regarding volume, the part-worths per latent class will be plotted to conclude whether a linear or a part-worth function is most appropriate.

The dependent variable in a CBC analysis is the choice - which item is selected from a set of alternatives - a respondent makes. The choice a respondent makes is assumed to be the rating a consumer finds the most useful out of the available alternatives. The attributes (volume, valence, and dispersion) are the explanatory variables. The relative importance of each attribute can be calculated by dividing the largest part-worth difference within an attribute by the sum of the largest differences of all attributes (Hair et al., 2010).
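The importance calculation itself is simple arithmetic. The sketch below applies it to the aggregate part-worths reported for experience goods in table 7; the function is an illustration, not the software’s internal routine:

```python
def relative_importance(part_worths):
    # Importance of an attribute = its largest part-worth difference,
    # divided by the sum of the largest differences of all attributes.
    ranges = {attr: max(pw) - min(pw) for attr, pw in part_worths.items()}
    total = sum(ranges.values())
    return {attr: rng / total for attr, rng in ranges.items()}

# Aggregate part-worths for experience goods (table 7)
aggregate = {
    "volume":     [-0.765, 0.099, 0.666],
    "valence":    [-0.413, -0.416, 0.830],
    "dispersion": [-0.118, 0.069, 0.048],
}
importance = relative_importance(aggregate)
# importance["volume"] is roughly 0.50, matching the ~50% in table 7
```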

3.2.2. Model Fit

In order to choose the most appropriate number of segments, model fit is taken into account. Model fit can be assessed by classification (how well cases can be assigned to segments) and likelihood (how well the model parameters fit the data). Since the absolute fit – in terms of R² and the Akaike Information Criterion (AIC) – increases with the number of classes and added variables, it is important to look at the relative fit, which is corrected for the number of model parameters. Therefore, the Bayesian Information Criterion (BIC) or the Consistent Akaike Information Criterion (CAIC) is preferred for identifying the optimal number of segments (Jedidi, Jagpal & DeSarbo, 1997), since these criteria penalize the log-likelihood for the number of parameters.
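Assuming the standard definitions of these criteria (BIC penalizes -2·LL with ln(N) per parameter, CAIC with ln(N) + 1, where N = 262 respondents), the values in table 8 can be reproduced; this is a sketch, not LatentGOLD’s internal routine:

```python
import math

def bic(log_likelihood, n_params, n_obs):
    # Bayesian Information Criterion: penalty of ln(n) per parameter
    return -2 * log_likelihood + n_params * math.log(n_obs)

def caic(log_likelihood, n_params, n_obs):
    # Consistent AIC: slightly stronger penalty of ln(n) + 1 per parameter
    return -2 * log_likelihood + n_params * (math.log(n_obs) + 1)

# 1-class model for experience goods (table 8), N = 262 respondents
b = bic(-1065.9796, 6, 262)    # ~2165.369, as reported in table 8
c = caic(-1065.9796, 6, 262)   # ~2171.369
```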

3.2.3. Covariates Related to Class Membership

There might be additional independent variables that are predictive of the dependent variable (the profile a respondent chooses in a choice set). These independent variables (covariates) can bring information from outside the choice tasks to the model and thereby improve the estimation of the part-worths (Orme & Johnson, 2009). Based on the significance levels of these covariates and their effect on overall model fit, it will be determined whether adding covariates to the model is appropriate.

3.2.4. Validation

The results of the CBC analysis will be validated both internally (via the predicted cases in LatentGOLD) and externally (via the hold-out sample). The survey contains a hold-out task (figure 3) for both product types. In contrast to the estimation set, which is used to calculate the part-worth functions for the attribute levels, these hold-out tasks are used to assess the reliability and validity of the model (Malhotra, 2010) by determining the hit-rate of the model. The hold-out task for both product types is identical, except obviously for the kind of product shown.

Figure 3 - Hold-out choice task

3.3. Independent Samples T-test


Half of the participants (questionnaire version 1; N=132) is asked to answer six questions concerning the degree to which they take advantage of online consumer ratings when purchasing experience goods. The other half (version 2; N=130) is asked the same questions regarding search goods (table 4). For these questions, items from a standard scale by Flynn, Goldsmith & Eastman (1996) are used. Items are scored on a 5-point Likert scale ranging from strongly disagree to strongly agree. Before conducting an Independent Samples T-test, the Cronbach’s alpha of this data will be calculated. Cronbach’s alpha is a measure of internal consistency reliability, used to find out whether it is appropriate to take the mean of a question section. A Cronbach’s alpha of 0.6 or higher indicates sufficient internal consistency reliability (Malhotra, 2010).

When I consider buying a [search good], I always look at OCR at the product page.

When choosing a [search good], OCR are not important to me.*

I feel more comfortable buying a [search good] when I have consulted OCR.

I don’t like to consult OCR before I buy a [search good].*

I rarely consult OCR about what [search good] to buy.*

I like to consult OCR before I buy a [search good].

* Questions 2, 4, and 5 require reverse scoring.

Table 4 - Questions concerning information seeking behavior for search goods
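As a sketch of this reliability check, Cronbach’s alpha (including the reverse scoring of the negatively worded items) can be computed in a few lines of pure Python; the respondent scores below are hypothetical:

```python
from statistics import variance

def reverse_score(score, low=1, high=5):
    # Reverse-code a negatively worded item on a 5-point Likert scale
    return high + low - score

def cronbach_alpha(rows):
    # rows: one list of item scores per respondent (after reverse scoring)
    k = len(rows[0])
    items = list(zip(*rows))
    sum_item_var = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum_item_var / total_var)

# Hypothetical scores of four respondents on three items
data = [[4, 5, 4], [2, 2, 3], [5, 4, 5], [3, 3, 3]]
alpha = cronbach_alpha(data)   # above the 0.6 threshold in this toy example
```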


4. Results

The data for this study is collected via a questionnaire (appendix 1). In this section, demographic characteristics and the results of the Choice Based Conjoint (CBC) analyses will be provided. After the interpretation of the CBC analyses for both product categories, the comparison of the categories (on aggregated level) will be elaborated. The hypotheses are tested for both CBC analyses, on both aggregated and segment level. For clarity purposes, the hypothesis testing will be performed in the final section of this chapter.

4.1. Descriptive Statistics

In this study, 262 respondents participated. Of this population, 146 respondents (55.7%) are men and 116 respondents (44.3%) are women. The majority of respondents live in the area of Deventer (58.4%) or Groningen (16.8%), based on their postal code. The questionnaire starts with two questions regarding daily internet use and online purchase frequency (table 5). The mode for internet usage is 1-3 hours per day; the median is at 3-5 hours per day. For online purchase frequency, both the mode and median are at 4-6 online purchases per year.

Internet Use Online Purchase Frequency

# % # %

< 1 hour per day 8 3.1 < 2 times per year 28 10.7

1 - 3 hours per day 106 40.5 2 - 4 times per year 39 14.9

3 - 5 hours per day 79 30.2 4 - 6 times per year 101 38.5

5 - 7 hours per day 45 17.2 (almost) monthly 72 27.5

> 7 hours per day 24 9.2 > 1 time per month 22 8.4

Table 5 – Internet Use and Online Purchase Frequency


Household income      #     %
< 10K per year        44    16.8
10-20K per year       23    8.8
20-30K per year       19    7.3
30-40K per year       37    14.1
40-50K per year       33    12.6
> 50K per year        47    17.9
Unknown               59    22.5

Household size        #     %
1 person              85    32.4
2 persons             53    20.2
3 persons             34    13.0
4 persons             60    22.9
5 persons             19    7.3
> 5 persons           11    4.2

Education             #     %
No education          0     0.0
Primary school        0     0.0
VMBO                  12    4.6
HAVO/VWO              12    4.6
MBO                   63    24.0
HBO                   99    37.8
WO                    76    29.0

Job                   #     %
Scholar               9     3.4
Student               89    34.0
Working               140   53.4
Unemployed            16    6.1
Retired/Disabled      8     3.1

Table 6 - Demographic statistics

Another meaningful demographical statistic is the age of respondents. Figure 4 shows how age is distributed among respondents. The mean age of the participants in this study is approximately 32 years old. Half of the respondents are between 20 and 29 years old.

Figure 4 - Age distribution

4.2. Choice Based Conjoint Analysis for Experience Goods

4.2.1. Preference Function

In this CBC analysis, three variables are used: volume, valence, and dispersion. First, the preference function (linear, quadratic, or part-worth) for each attribute has to be determined. Regarding valence and dispersion, it is already stated that a part-worth function is most appropriate. For volume, one might consider treating the parameters as a linear function. To find the most appropriate preference function, the parameters for each of the three segments3 are plotted in figure 5.

Figure 5 - Parameters per segment (experience goods)

Figure 5 shows that the parameters for class 2 are more or less linear. For class 1, the difference between a volume level of 50 and 90 is limited compared to the difference between a volume level of 10 and 50. The parameters for class 3 are decreasing; a volume of 10 ratings is preferred over 50 or 90 ratings. Consequently, treating volume as a linear function is inappropriate, since the parameters of the different segments go in different directions, whereas a linear function provides just a single part-worth, which is multiplied by the level’s value to arrive at a part-worth for each level of volume. Thus, all variables will be considered as part-worths; a parameter will be estimated for each level of each variable.

4.2.2. Aggregated Model

The comparison of the relative importance of the different metrics of online consumer ratings will take place at the aggregated level. Thus, before defining classes, the importance of the attributes at the aggregated level is considered. The part-worths and importance of each attribute are summarized in table 7.

3 The choice for three segments will be elaborated later in this chapter



Dispersion is found to have no significant effect on consumers’ utility (p-value = 0.065; Wald = 5.4648). Due to this finding, the parameters regarding dispersion have to be interpreted with caution.

Volume is found to be the most important attribute (50%). A volume of ten ratings has the largest negative influence on the aggregated utility, whereas ninety ratings provide a positive influence on utility.

Attribute    Level        Estimated part-worth   Range of part-worths   Relative importance
Volume       10           -0.765**               1.4313                 49.98%
             50            0.099**
             90            0.666**
Valence      1.8          -0.413**               1.2456                 43.50%
             3            -0.416**
             4.2           0.830**
Dispersion   Equal        -0.118*                0.1868                 6.52%
             Inverted-U    0.069
             U-shape       0.048
* Z-value > 1.96; ** Z-value > 2.58

Table 7 - Parameters and importance on aggregate level for Experience goods

The attribute valence has a relative importance of 43.5%. Both 1.8- and 3-star ratings have a negative influence on respondents’ utility. In contrast, 4.2-star ratings provide the most positive influence on the aggregated utility.

Although the effect of dispersion is insignificant, it indicates a direction. In that context, it is found that the effect of both inverted-U-shaped and U-shaped dispersed ratings is slightly positive, while equally dispersed ratings are valued negatively.

4.2.3. Model Selection for Segment Level Interpretation


To find the best fitting model, multiple models are estimated, in which the demographic variables are used as covariates to predict class membership. However, treating online purchase frequency, gender, and education as active covariates leads to a worse model fit (in terms of BIC and CAIC) compared to a model without covariates (appendix 2). Therefore, all variables are considered as inactive covariates.

Based on several model fit metrics, summarized in table 8, the optimal number of segments can be chosen. Given the relative fit (BIC and CAIC), one might consider choosing a four-class model. However, the four-class solution is insignificant (p > 0.05). Based on the p-value, classification error, R², BIC and CAIC criteria, three classes are therefore considered to be most appropriate.

Model            LL           BIC(LL)     CAIC(LL)    Npar   L²         df    p-value    Class.Err.   R²(0)    R²
1-Class Choice   -1065.9796   2165.3692   2171.3692   6      541.6715   256   1.30E-22   0.0000       0.2877   0.2871
2-Class Choice   -986.9638    2035.1793   2046.1793   11     383.6399   251   1.40E-07   0.0581       0.4525   0.4519
3-Class Choice   -952.3410    1993.7755   2009.7755   16     314.3944   246   0.0021     0.0830       0.5262   0.5257
4-Class Choice   -924.8884    1966.7121   1987.7121   21     259.4892   241   0.20       0.1063       0.6004   0.5998
5-Class Choice   -912.8418    1970.4606   1996.4606   26     235.3961   236   0.50       0.1296       0.6758   0.6752

Table 8 - Model fit for different classes

4.2.4. Interpretation on Segment Level

Now that it is decided to use a three-class solution, the interpretation of the part-worths per segment can take place. Since LatentGOLD does not provide p-values per part-worth, the corresponding Z-values (* > 1.96, ** > 2.58) are used to determine the significance (at a *95% or **99% confidence level) per part-worth.

                        Overall sample   Segment 1   Segment 2   Segment 3
Volume       10         -0.7653**        -1.2470**   -1.4259**    0.1355
             50          0.0992*          0.4786**    0.0876     -0.0564
             90          0.6660**         0.7684**    1.3383**   -0.0791
Valence      1.8        -0.4133**        -2.1317**    0.5084**   -0.5130*
             3          -0.4162**        -0.0571     -0.8773**    0.4385*
             4.2         0.8295**         2.1888**    0.3689**    0.0744
Dispersion   Equal      -0.1175*         -0.0957     -0.0957     -0.0957
             Inverted-U  0.0693           0.0017      0.0017      0.0017
             U-shape     0.0482           0.0940      0.0940      0.0940
* Z-value > 1.96; ** Z-value > 2.58

Table 9 - Part-worths per segment (experience goods)


Figure 6 – Relative importance of attributes per segment (experience goods)

The class solution contains two large classes, class 1 (46.5% of the population) and class 2 (45.0%), and one smaller class, class 3 (8.3%). Dispersion is found to differ significantly neither between classes (Wald statistic = 6.5933; p-value = 0.36) nor within classes (Wald(=) statistic = 2.7400; p-value = 0.6) and is therefore considered ‘class independent’.

Hereafter, segment profiles will be described in terms of attribute importance, preference levels and class characteristics. Although no demographic variables are treated as active covariates to predict the classes, these variables can be used to describe segments. The demographic characteristics per segment are summarized in table 10.

                            Segment 1   Segment 2   Segment 3
Internet use
< 1 hour per day            3.77%       2.01%       4.34%
1 - 3 hours per day         43.45%      36.92%      42.07%
3 - 5 hours per day         26.29%      33.08%      35.53%
5 - 7 hours per day         18.14%      17.37%      11.60%
> 7 hours per day           8.35%       10.62%      6.45%
Online purchase frequency
< 2 times per year          12.13%      8.32%       14.50%
2 - 4 times per year        12.99%      16.76%      15.55%
4 - 6 times per year        39.28%      37.58%      39.44%
(almost) monthly            27.98%      26.48%      29.62%
> 1 time per month          7.62%       10.87%      0.89%
Gender
Male                        48.76%      62.74%      57.49%
Female                      51.24%      37.26%      42.51%
Age
Mean                        33.97       29.28       34.72
Education
VMBO                        6.08%       3.37%       2.80%
HAVO/VWO                    5.33%       2.66%       9.71%
MBO                         28.71%      14.97%      42.84%
HBO                         39.12%      38.56%      27.78%
WO                          20.76%      40.44%      16.86%
Household income
10-20K per year             8.94%       9.34%       5.43%
20-30K per year             7.85%       4.98%       14.68%
30-40K per year             13.34%      15.88%      9.90%
40-50K per year             17.72%      8.09%       8.27%
> 50K per year              17.51%      18.65%      16.79%
Unknown                     23.29%      17.53%      41.54%
Job
Scholar                     3.11%       4.27%       1.22%
Student                     26.61%      42.41%      31.20%
Working                     59.19%      46.83%      55.64%
Unemployed                  6.94%       4.90%       7.57%
Retired/Disabled            4.16%       1.59%       4.38%

Table 10 - Demographic characteristics per segment (experience goods)

4.2.5. Class Characteristics

Segment 1: Confirmation seekers

Based on the relative importance of the attributes (figure 6) and the part-worths per attribute level (table 9), segment 1 can be described as ‘confirmation seekers’. Valence is the most important attribute (66.2%) in this segment, whereby negative ratings (1.8 stars) account for the largest negative influence on utility, and positive ratings (4.2 stars) provide the most positive influence on utility for this segment. The attribute volume has a relative importance of 30.8%; a volume of 90 ratings, as well as a volume of 50 ratings, provides a positive contribution to the utility of confirmation seekers. A 3-star rating does not have a significant influence on the utility of the respondents in this segment. The dispersion of ratings plays a negligible role for confirmation seekers (2.9%).

Since positive valence is the most important attribute level, it is assumed that confirmation seekers prefer a 4.2-star rating, except when this is accompanied by the smallest volume of ratings.

With respect to the demographic characteristics of confirmation seekers (table 10), it is found that this segment is slightly female dominant (51.2%). The mean age is 34 years. Both daily internet use and online purchase frequency have more or less the same distribution as in the total population. The majority of respondents is intermediate (MBO) or college (HBO) educated, and is currently working (59.2%).

Segment 2: Critical considerers


For critical considerers, volume is the most important attribute, whereby a volume of 10 ratings provides the most negative influence on utility. On the other hand, a volume of 90 ratings provides the most positive contribution to a critical considerer’s utility. The effect of 50 ratings is positive but insignificant, which means that it does not have a significant influence on the utility within this segment.

The relative importance of valence is 32%. It is striking that this is the only segment in which a quadratic (ideal-point) relationship can be distinguished: both negatively and positively valenced ratings have a positive influence on the utility function, while medium valenced ratings affect utility negatively. The relative importance of dispersion is negligible for critical considerers (4.3%).

The respondents in this segment are called critical considerers, since they consider only a volume of 90 ratings to be acceptable in terms of credibility and reliability. Besides, they seek negative rather than positive ratings, from which it is derived that they prefer critical ratings, which might temper their appreciation of a product, over ‘praising’ ratings that enlarge it. Critical considerers will often choose the largest possible volume of ratings, except when this is accompanied by an average (3.0-stars) valence level.

Concerning the demographic characteristics of critical considerers, table 10 shows that this segment is male dominant (62.7%). It is the ‘youngest’ segment, given a mean age of 29, and almost 80% are highly educated (college or university). This segment contains the most people who are currently studying, which probably explains the relatively high percentage of respondents with a low income (less than 10K per year) in this segment4. Furthermore, critical considerers have, on average, the highest online purchase frequency and daily internet usage.

Segment 3: Inattentive choosers

The third segment can be described as ‘inattentive choosers’, based on their somewhat random distribution of preferences. Although valence is the most important attribute (figure 6) for this segment (70.2%), the largest part-worth difference for valence is only 0.95 (compared to a part-worth difference of 4.32 in the first segment). Inattentive choosers thus have no distinct preference for any of the attributes or levels. Only the effects of 1.8- and 3-star ratings are significant, whereby inattentive choosers prefer a medium valenced rating (table 9). Since the negative influence of low valenced ratings surpasses the positive effect of medium valenced ratings, low ratings affect the utility of this segment more strongly in a negative sense than medium ratings do in a positive sense.


All other parameters do not have a significant influence on the utility of the respondents in this segment. Probably, this is related to the fact that dispersion is more important in this segment than in the other segments (14.0%).

The demographic characteristics, displayed in table 10, show that most inattentive choosers are male (57.5%). The mean age is 35 years, which makes this the ‘oldest’ segment. Almost half of the inattentive choosers are lower (secondary school (VMBO)) or intermediate (MBO) educated.

4.2.6. Validity

After arriving at the optimal number of segments, the results can be validated both internally and externally. The internal validity can be calculated via the prediction table in LatentGOLD, by summing the well-predicted cases and dividing this sum by the total number of cases. For this Conjoint Analysis, the internal validity is (415 + 324 + 267) / 1310 = 76.79%.

The external validity is calculated via the hold-out sample used in the questionnaire. This results in a hit-rate of 58.78%, which is an improvement in comparison with a random selection of 33% (for three options). The hit-rates per stimulus in the hold-out set are summarized in table 11.
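Both validity figures follow from the same hit-rate arithmetic; the sketch below uses the counts reported above and in table 11:

```python
def hit_rate(correct, total):
    # Share of cases for which the predicted choice equals the observed one
    return sum(correct) / sum(total)

# Internal validity: well-predicted cases per class out of 1310 observations
internal = hit_rate([415, 324, 267], [1310])       # ~0.7679
# External validity: correct predictions per hold-out stimulus (table 11)
external = hit_rate([128, 20, 6], [162, 85, 15])   # ~0.5878
```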

Choice predicted: 1 2 3

Volume: 50 50 50

Valence: 4.2 1.8 3.0

Dispersion: Inverted-U Inverted-U Inverted-U

Total: 162 85 15

Correct: 128 20 6

Percentage: 79% 24% 40%

Table 11 - External validity (experience goods)

4.3. Choice Based Conjoint Analysis for Search Goods

4.3.1. Preference Function


Analogous to the analysis for experience goods, the preference function for each attribute has to be determined first. To find the most appropriate preference function, the parameters for each of the three segments5 are plotted in figure 7.

Figure 7 - Parameter per segment (search goods)

Figure 7 shows that the parameters for class 1 are more or less linear. For class 2, the difference between a volume level of 50 and 90 is limited compared to the difference between a volume level of 10 and 50. The parameters for class 3 are decreasing; a volume of 10 ratings is preferred over 50 or 90 ratings. Consequently, treating volume as a linear function is inappropriate, since the parameters of the different segments go in different directions, whereas a linear function provides just a single part-worth, which is multiplied by the level’s value to arrive at a part-worth for each level of volume. Thus, all variables will be considered as part-worths; a parameter will be estimated for each level of each variable.

4.3.2. Aggregated Model

The comparison of the relative importance of the different metrics of online consumer ratings will take place at the aggregated level. Thus, before defining classes, the importance of the attributes at the aggregated level is considered. The part-worths and importance of each attribute are summarized in table 12.

5 The choice for three segments will be elaborated later in this chapter



Attribute    Level        Estimated part-worth   Range of part-worths   Relative importance
Volume       10           -0.7764**              1.4516                 53.93%
             50            0.1012*
             90            0.6752**
Valence      1.8          -0.3588**              1.164                  43.24%
             3            -0.4026**
             4.2           0.7614**
Dispersion   Equal        -0.0446                0.0761                 2.83%
             Inverted-U    0.0315
             U-shape       0.013
* Z-value > 1.96; ** Z-value > 2.58

Table 12 - Parameters and importance on aggregate level for search goods

Dispersion is found to have no significant effect on consumers’ utility (p-value = 0.65; Wald = 0.8647). Due to this finding, the parameters regarding dispersion have to be interpreted with caution.

Volume is found to be the most important attribute (54%). A volume of ten ratings has the largest negative influence on the aggregated utility, whereas ninety ratings provide a positive influence on utility.

The attribute valence has a relative importance of 43.2%. Both 1.8- and 3-star ratings have a negative influence on respondents’ utility. In contrast, 4.2-star ratings provide the most positive influence on the aggregated utility.

Although the effect of dispersion is insignificant, it indicates a direction. In that context, it is found that the effect of both inverted-U-shaped and U-shaped dispersed ratings is slightly positive, while equally dispersed ratings are valued negatively.

4.3.3. Model Selection for Segment Level Interpretation

To find the best fitting model, multiple models are estimated. First, the covariates (Internet use, online purchase frequency, age, gender, education, income, household size, and job) are used to predict class membership. For the modeling of search goods, gender and education are found to be significant (p < 0.05). The other variables do not significantly differ across segments, though they can be used to describe segments.


Based on several model fit metrics, summarized in table 13, the optimal number of segments can be chosen. Given the relative fit (BIC and CAIC), one might consider choosing a three- or four-class model. Since the classification error for the three-class solution is much lower than for the four-class solution, it is decided that using three classes is most appropriate.

Model            LL           BIC(LL)     CAIC(LL)    Npar   L²         df    p-value    Class.Err.   R²(0)    R²
1-Class Choice   -1090.8264   2215.0629   2221.0629   6      570.7266   256   4.80E-26   0.0000       0.2644   0.2611
2-Class Choice   -998.4721    2058.1960   2069.1960   11     386.0180   251   8.80E-08   0.0359       0.4431   0.4408
3-Class Choice   -964.6512    2018.3960   2034.3960   16     318.3762   246   0.00130    0.0642       0.5079   0.5058
4-Class Choice   -948.6094    2014.1541   2035.1541   21     286.2926   241   0.02400    0.1060       0.5674   0.5655
5-Class Choice   -936.0594    2016.8958   2042.8958   26     261.1926   236   0.12000    0.1305       0.6107   0.6090

Table 13 - Model fit for different classes

4.3.4. Interpretation on Segment Level

Now that it is decided to use a three-class solution, the interpretation of the part-worths per segment can take place. Since LatentGOLD does not provide p-values per part-worth, the corresponding Z-values (* > 1.96, ** > 2.58) are used to determine the significance (at a *95% or **99% confidence level) per part-worth.

Attribute    Level       Overall sample  Segment 1   Segment 2   Segment 3
Volume       10          -0.7764**       -1.945**    -0.9458**    0.0577
             50           0.1012*         0.3024*     0.4132**   -0.0312
             90           0.6752**        1.6522**    0.5326**   -0.0264
Valence      1.8         -0.3588**        0.6900**   -2.2889**   -0.5096*
             3           -0.4026**       -1.0366**    0.1653      0.3872*
             4.2          0.7614**        0.3465**    2.1237**    0.1225
Dispersion   Equal       -0.0446         -0.0022     -0.0022     -0.0022
             Inverted-U   0.0315         -0.0485     -0.0485     -0.0485
             U-shape      0.0130          0.0504      0.0504      0.0504
*Z-value > 1.96; **Z-value > 2.58

Table 14 - Part-worths per segment (search goods)


Figure 8 – Importance of attributes per segment (search goods)

The class solution contains two large classes, class 1 (45.4% of the population) and class 2 (42.3%), and one smaller class, class 3 (12.2%). Dispersion is found to differ significantly neither between classes (Wald statistic = 9.2555; p-value = 0.16) nor within classes (Wald(=) statistic = 6.0469; p-value = 0.20), and is therefore considered 'class independent'.
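The reported Wald p-values are consistent with chi-square reference distributions whose degrees of freedom follow the usual latent-class convention (an assumption here: df = 3 classes × 2 free dispersion parameters = 6 for the overall test, and df = 2 × 2 = 4 for the equality test):

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with even df (closed form)."""
    assert df % 2 == 0, "closed form shown only for even df"
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= (x / 2) / k
        total += term
    return math.exp(-x / 2) * total

print(round(chi2_sf(9.2555, 6), 2))  # overall Wald -> ≈ 0.16
print(round(chi2_sf(6.0469, 4), 2))  # Wald(=)      -> ≈ 0.20
```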

The first striking feature is that the segments in this analysis appear similar to the segments distinguished in the first analysis6. Segment 1 (confirmation seekers) for experience goods shows more or less the same attribute preferences as segment 2 for search goods, and so on. To check whether the respondents in segment 1 for experience goods are the same as those in segment 2 for search goods, the class memberships across both product types are compared for all respondents. This shows that 54.1% of the respondents assigned to segment 1 (or 2) for experience goods are assigned to segment 2 (or 1) for search goods. With respect to segment 3 (inattentive choosers), 31.1% of the respondents who are assigned to segment 3 at least once are allocated to this segment in both (segment-level) analyses.
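The membership comparison can be sketched as follows; the membership lists here are hypothetical stand-ins (the real analysis uses the modal class assignments from LatentGOLD):

```python
# Hypothetical modal class assignments, one label per respondent
exp_class    = [1, 2, 1, 3, 2, 1]  # experience-goods analysis
search_class = [2, 1, 1, 3, 1, 2]  # search-goods analysis

# Share of segment-1/2 members (experience goods) that land in the
# mirrored segment (2/1) for search goods
eligible = [(e, s) for e, s in zip(exp_class, search_class) if e in (1, 2)]
swapped = sum(1 for e, s in eligible if (e, s) in {(1, 2), (2, 1)})
print(f"{swapped / len(eligible):.1%} swap between segments 1 and 2")  # 80.0% for this toy data
```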

Hereafter, the segment profiles are described in terms of attribute importance, preference levels, and class characteristics. Although no demographic variables are treated as active covariates to predict the classes, these variables can be used to describe the segments. The demographic characteristics per segment are summarized in table 15.
