
Faculty of Economics and Business

MSc Thesis Econometrics

The Influence of Car Brand and other Product Characteristics

on the Price of New Passenger Cars

Author: Dominic Bersee BSc, 5934419
Supervisor: Dr. J.C.M. van Ophem
29 November 2013


Abstract

Previous studies of hedonic price functions for cars focused on technical specifications rather than on the influence of brand name. In this study, the influence of both brand name and several product characteristics on the price of new passenger cars is quantified and analyzed. This is done by estimating three types of regression models that include brand name dummy variables/fixed effects in addition to basic technical specifications as explanatory variables. Brand name, which is tantamount to the exclusivity and image of a car brand, is expected to affect the price of a new passenger car.

The dataset consists of the technical specifications and standard prices of more than 3,200 cars. This data is measured annually from 2006 to 2013. The estimated models comprise multiple linear regression, ridge regression and fixed effects. A statistical procedure to detect the number of near linear dependencies between technical specifications is performed, in order to lower the degree of multicollinearity. Hence, a selection of the specifications that represent the essential features of a car is made.

The estimation results show that brand name, in addition to product characteristics, has an evident influence on the price of new passenger cars. It turns out that consumers pay up to 100% or more for the brand name of exclusive brands compared with less expensive brands. Some inconsistencies between the estimates of the models are due to differences in estimation technique.


Acknowledgements

This academic paper forms the concluding part of the degree of Master of Science in Econometrics at the University of Amsterdam. It focuses on the relationship between the price of new passenger cars and several observable characteristics of cars. The research for this thesis was conducted under the guidance of Hans van Ophem.

In the first place, I wish to express my gratitude to the University of Amsterdam, and in particular the Faculty of Economics and Business, for giving me the opportunity to complete my academic education. My gratitude especially goes to Hans van Ophem for taking the time to be my thesis supervisor and for his assistance and feedback during my studies.

In the second place, I am grateful to DrivenData for providing me with the data on new passenger cars. It would have been impossible for me to perform this kind of research without relevant car data.

Finally, my appreciation goes out to my family and friends for their trust, encouragement and care during my academic education. I would particularly like to mention my parents for their moral and financial support. My family and friends have undoubtedly contributed to the interesting and informative time during my master's degree programme.


Contents

1 Introduction

2 Literature Review

3 Data Description
3.1 The DrivenData Automotive Database
3.2 Descriptive Statistics
3.3 Data Structure and Selection Methods

4 Hedonic Pricing: The Regression Models
4.1 Linear Regression
4.2 Ridge Regression
4.3 Fixed Effects with an Unbalanced Panel

5 The Estimation Results
5.1 Interpretation and Estimation Assessment
5.2 Recommendations

6 Conclusion

References

A Linearity and Heteroscedasticity

B BKW Procedure for UK data 2006-2013

C Linear and Ridge Regression Estimates


1 Introduction

A quote can be very expressive and enlightening. The one below expresses a consumer's feelings about automobiles and is therefore considered an appropriate introduction to this thesis.

“Money may not buy happiness, but I’d rather cry in a Jaguar than on a bus.”

Françoise Sagan, French playwright, novelist and screenwriter

Naturally, this statement does not apply to every car buyer, because consumers have different views about the utility of a car. In any case, car manufacturers try to weigh a customer's willingness to pay against the cost of producing a car in order to maximize their profit.

It is also possible to argue the other way around. The price of a new passenger car is affected by many factors. First, a car manufacturer faces all kinds of costs to create a car, of which development and production costs are particularly important. These costs are related to numerous product characteristics of a car, each of which represents a certain value to its customers. The second factor is the number of potential customers of a specific model relative to the production quantity. Third, the competitiveness of a brand's sale price compared with similar cars of other brands plays a role in the pricing of a new passenger car. The latter two factors are related to the exclusivity and image of a car brand, which can be referred to as brand name. These main determinants of the price of a modern new car have not yet been quantified in as much detail as in this study.

Previous studies are either outdated or not comparable to the research in this thesis. Griliches (1961) was one of the first to investigate the relationship between automobile prices in the US and various dimensions of an automobile, for 1937, 1950, and 1954 through 1960. However, he used only a few explanatory variables, and the modern automotive market is totally different from the market of more than fifty years ago.

Arguea and Hsiao (1993) wrote the article that is most comparable to the research in this thesis in terms of regression analysis. They examined the econometric issues in estimating hedonic price functions, using US automobile demand as an example. They stated that it is possible to combine economic theory with econometric techniques to analyze the implicit market for characteristics where no direct transactions are observed, although many issues remain. Selecting representative characteristics for differentiated products is the most important issue they discussed. Passenger cars have changed considerably over the years, though. The latest developments show an emphasis on alternative power sources, safety measures and electrical equipment. From a different perspective, the worldwide financial crisis has had an undeniable impact on the automotive market, especially on car sales (Sturgeon and Van Biesebroeck, 2009), as has the emergence of the Asian market (Sturgeon and Florida, 2000). Moreover, the fixed effects model has not previously been used to estimate the price of new passenger cars with panel data on cars.


More recently, Kuiper (2008) performed a multiple regression analysis on used car prices. Explanatory variables such as body style, cylinders, liters (a measure of engine size), number of doors and dummy variables for some luxury features correspond to the explanatory variables used in this research. However, the mileage (odometer reading) of a car is one of the most important explanatory variables in Kuiper's study, and it is irrelevant for new cars.

An appropriate method for decomposing the item under study into its constituent characteristics is the so-called hedonic pricing method. This method is based on the principle that the price of a marketed good is affected by certain external environmental or observable factors that can raise or lower the “base” price of that good. A requirement is that the composite good being valued can be reduced to its components and that the market values those components. I use real automotive panel data in combination with econometric techniques so that clear results can be obtained and conclusions drawn. The classical regression assumptions, including a linear relationship between the explanatory variables and the dependent variable, independent errors, homoscedasticity, and so forth, are adopted.

The structure of this thesis is as follows. Section 2 contains a literature review, which is followed by a data description in section 3. A brief discussion of the data source and the gathering of the data can be found in subsection 3.1, the descriptive statistics in subsection 3.2 and two types of data selection in subsection 3.3. Subsequently, section 4 describes the three models (subsections 4.1 to 4.3). Section 5 briefly presents and discusses the estimation results, and an interpretation and assessment of the estimates and some recommendations can be found in subsections 5.1 and 5.2, respectively. Finally, the conclusion is given in section 6.

2 Literature Review

The introduction briefly discussed three studies on the pricing of cars. The first was Griliches (1961), who investigated the relationship between automobile prices in the US and various dimensions of an automobile. In the second, Arguea and Hsiao (1993) estimated a hedonic price function using panel data on US car characteristics. The third and last was Kuiper (2008), who performed a multiple regression analysis on used car prices. This section gives a more comprehensive overview of these and other related studies.

Returning to Griliches (1961), his main purpose was to investigate a relatively old, simple and straightforward method of adjusting measured car prices and price indexes for quality change. Furthermore, he wanted to find out (1) whether this method is feasible and operational, and (2) whether the results are promising and different enough to warrant the extra investment. He sought to derive implicit specification prices from cross-sectional data on the prices of various “models” of the particular item (in this case a car), and used these to price the time series change in specifications of the chosen (average or representative) item. He concluded that all the apparent increases in car prices between 1950 and 1960 can be explained by quality improvements. The hedonic price index, which is a price index adjusted for changes in the quality of goods, actually fell during 1950-1960. Triplett (1969) gave further consideration to the upward bias of price indexes caused by quality change. He found that the hedonic measures employed by Griliches indicate negligible quality improvement in automobiles and provide no substantiation for the belief in an upward quality bias in the Consumer Price Index (CPI) for the 1960-65 period. However, biases in the hedonic indexes themselves limit their validity as measures of quality change.

In the 1970s, Cowling and Cubbin (1972) explained the derivation of a price index for the UK car industry and compared the movements of this index with those of other indices describing year-to-year changes in prices. This was followed up by Cubbin (1975), who looked at some aspects of pricing behaviour in the UK private motor car assembly industry in 1956-1968, where quality and quality change are thought to be particularly important. He improved techniques for devising quality-adjusted prices and developed price-cost margins. In 1976, Ohta and Griliches took up a few limited topics in the analysis of automobile prices. They focused on the role of “makes” or “brands” in explaining price differentials among different models of automobiles. They also examined what additional information could be derived from analyses of used car prices, and what gains, if any, could be had from using performance instead of physical (specification) characteristics in defining the relevant attributes of a commodity.

Atkinson and Halvorsen (1984) applied a hedonic procedure to estimate the effects of the gasoline price on the demand for automobile attributes and fuel efficiency. This procedure involved the direct application of a comparative statics analysis, circumventing the problems of identification and severe multicollinearity that affected previous hedonic studies. Their results indicated that the changes in automobile attributes induced by increases in the price of gasoline substantially increase fuel efficiency. Feenstra (1988) investigated the quality change in Japanese car and truck imports over 1979-1985. He found substantial upgrading in Japanese car imports, with ambiguous quality change in trucks. Moreover, half of the nominal increase in car prices over 1980-1985 was explained by quality improvement.

Bajic (1993) treated the automobile as a differentiated product and used a two-step approach to estimate a system of structural demand equations for automobile attributes. The results he obtained are statistically sound and consistent with theoretical expectations, which he argues should be taken as an indication of the validity and robustness of his procedure. Also in 1993, Arguea and Hsiao published the article that has the most in common with this thesis. They considered some econometric issues associated with the characteristics approach, which simplifies complex market structures where many differentiated products interact into a smaller number of homogeneous attributes. These econometric issues include the choice of functional form for the hedonic price equation, the data requirements for identifying market demand and supply of characteristics, and the practical method for selecting characteristics to represent differentiated products. Silver (1996) wrote a survey of the theoretical and empirical literature on the hedonic price model. He sought to draw attention to the importance of understanding and accounting for quality changes, because an analysis of time series or cross-section price data without practical tools to adjust for quality changes may be misleading. He therefore described a theoretical framework based on identifying quality in terms of bundles of characteristics, and related this theory directly to practical empirical issues. The treatment of these issues, such as omitted variable bias, multicollinearity, functional form and weights, is grounded in hedonic theory.

Murray and Sarantis (1999) have employed a complete set of panel data on UK car characteristics to estimate a hedonic car price model. This enabled them to examine price differences between various car models in terms of variations in individual car characteristics. They also paid greater attention to the specification of the hedonic price model than previous studies, as shown by the wide range of diagnostics reported. A second objective of the paper was to utilise the estimates of the hedonic price model to construct a hedonic price index for cars, which allowed them to investigate the increase of car prices due to quality and non-quality factors. They concluded that price differences between car models can indeed be explained by variations in individual car characteristics. In addition, they highlighted the importance of differences in pricing policies and strategic factors between car manufacturers.

In the article by Andersson (2005), a hedonic regression technique was used to estimate the value of traffic safety, using information from the Swedish automobile market. The results showed that the market price of an automobile is negatively correlated with its inherent risk level, i.e. Swedish car consumers pay a safety premium for safer cars. Kuiper (2008) discussed the construction of a multivariate regression model to predict the retail price of 2005 General Motors (GM) cars. More precisely, data collected from Kelley Blue Book for several hundred of these used GM cars were used to develop a multivariate regression model to determine car values. These values are based on a variety of characteristics, such as mileage, make, model, engine size, interior style, and presence of cruise control.

The remainder of this thesis consists of original quantitative research, starting with the data description in section 3. Occasionally, references are made to the studies in this literature review.

3 Data Description

The dataset in this thesis consists of both technical data and standard prices of new passenger cars on sale in the UK. It was updated annually from 2006 to 2013. Some information about its source and the gathering of the data is given in subsection 3.1. Subsection 3.2 outlines some descriptive statistics on the car brands, prices and specifications. Finally, in subsection 3.3, the data structure is discussed and two types of data selection are proposed. The aim is to construct samples that are more diversified and suffer relatively less data loss when the parameters of the hedonic pricing models are estimated.


3.1 The DrivenData Automotive Database

The DrivenData Automotive Database contains technical specifications of every new passenger car on sale in the UK. All new models are listed, excluding special or limited editions. This covers more than 3,200 cars and 72 variables per car, ranging from engine details and performance measures to CO2 emissions and service intervals. The data is monitored, updated and validated daily by DrivenData through contact with manufacturers to ensure accuracy. Hereafter, this dataset will be called the UK data.

3.2 Descriptive Statistics

A selection of descriptive statistics relevant to this research covers car brand, prices and technical specifications. First, Figure 1 shows a boxplot of car prices per brand. To help detect brand-specific fixed effects, this boxplot gives an overview of the price distribution per brand in the UK data from 2006 to 2013. Figure 1 shows that in the UK market, 67.8% of all car brands have a price range within £5,000 to £80,000. Furthermore, the variance increases as manufacturers operate in a higher price segment. Manufacturers with a median price of £20,000 ± £10,000 have an average standard deviation of £5,652.41, compared with an average standard deviation of £27,858.53 for manufacturers with a median price of £50,000 or higher.
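The segment comparison above can be sketched as follows. This is a minimal illustration that rests on several assumptions. Prices are grouped per brand. "Standard deviation" is taken to mean the sample standard deviation of a brand's prices. The low segment is taken to be a median price between £10,000 and £30,000. The brand names and prices are hypothetical and are not taken from the UK data.

```python
from statistics import mean, median, stdev

def segment_dispersion(prices_by_brand, low=(10_000, 30_000), high_min=50_000):
    """Average within-brand (sample) SD for a low and a high median-price segment.

    prices_by_brand: dict mapping brand -> list of car prices (GBP).
    Brands whose median price falls in neither segment are ignored.
    """
    low_sds, high_sds = [], []
    for prices in prices_by_brand.values():
        med, sd = median(prices), stdev(prices)
        if low[0] <= med <= low[1]:
            low_sds.append(sd)
        elif med >= high_min:
            high_sds.append(sd)
    return mean(low_sds), mean(high_sds)

# Hypothetical brands and prices, for illustration only.
prices_by_brand = {
    "BrandA": [12_000, 15_000, 18_000, 21_000],
    "BrandB": [14_000, 19_000, 22_000, 26_000],
    "BrandC": [45_000, 60_000, 80_000, 120_000],
}
low_sd, high_sd = segment_dispersion(prices_by_brand)
print(f"average SD, median £10k-£30k: £{low_sd:,.2f}; median >= £50k: £{high_sd:,.2f}")
```

Applied to the full UK data, the same grouping would give the two averages quoted in the text, provided the segment bounds above match the ones the thesis used.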

[Boxplots of car price (£0 to £320,000) for each brand in the UK data, from Alfa Romeo to Westfield]

Figure 1: Car price per brand.

With panel data, it is also possible to look for year-specific fixed effects. A violin plot, which combines a box plot with a kernel density plot, shows to what extent such year-specific fixed effects exist. In addition, the kernel density plot shows the distribution of car prices (the dependent variable in the regression models). Figure 2 contains this year-specific violin plot, which shows that there are hardly any year-specific fixed effects. Although the average car price increases every year, the effect of time on prices is nevertheless considered negligible. From this point on, it is assumed that the average price increase is due to improved quality of the cars and that time has no fixed effect[c]. Next, the shape of the violins can be analyzed. The widest parts of the violins are at the left (low-price) end, which means that the majority of prices are below average and the remaining minority consists of upward outliers. This right skew suggests that the hedonic price regression should take a semi-logarithmic functional form, so that the dependent variable better satisfies the assumptions underlying the regression. In addition to the shape of these violin plots, the arguments in Murray and Sarantis (1999, p. 9) for using a semi-log form in hedonic price regressions are followed.
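The right skew visible in the violin plots, and the way a logarithm reduces it, can be checked numerically. This is a minimal sketch that assumes the moment (population) definition of skewness. The ten prices are hypothetical and only mimic a distribution with a few expensive outliers.

```python
from math import log

def skewness(xs):
    """Moment skewness: mean cubed deviation divided by the cubed (population) SD."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    third = sum((x - m) ** 3 for x in xs) / n
    return third / var ** 1.5

# Hypothetical prices (GBP): most cars cheap, a few expensive outliers.
prices = [8_000, 10_000, 12_000, 15_000, 18_000, 22_000, 30_000, 45_000, 90_000, 250_000]
raw = skewness(prices)
logged = skewness([log(p) for p in prices])
print(f"skewness of price: {raw:.2f}, of log(price): {logged:.2f}")
```

In this example the log transformation shrinks the skewness considerably. It does not remove it entirely, which is consistent with the semi-log form being a better fit rather than a perfect one.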

[Violin plots of car price (£0 to £300,000) per year, 2006 to 2013]

Figure 2: Violin plots of car prices per year.

To give a first impression of the regression models in section 4, a simple linear regression of car prices on engine power is performed. Since Court (1939), engine power, like weight and length, has been a major numerical “quality” variable in hedonic price regressions on cars. Using data from 2013, Figure 3 plots both car prices and the logarithm of car prices against engine power, together with fitted trend lines. At first sight, the relationship between the two variables becomes more linear after the logarithmic transformation. With untransformed car prices, a linear relationship is not evident, particularly on the right-hand side of the plots. In order to perform a simple regression which “both characterizes the data and meets the conditions required for accurate statistical inference” (Cohen and Cohen, p. 233), the semi-log form is preferred.
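The comparison in Figure 3 can be reproduced in miniature by a simple least squares regression of price, and of log(price), on engine power. This is a sketch with hypothetical prices, not the 2013 UK data. In the semi-log form, a slope b means that one extra kW raises the price by roughly 100*(exp(b) - 1) percent.

```python
from math import exp, log

def simple_ols(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, R squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return intercept, slope, 1 - ss_res / ss_tot

# Hypothetical cars: engine power (kW) and price (GBP).
power_kw = [55, 80, 110, 150, 200, 300]
price = [9_500, 13_000, 18_500, 27_000, 46_000, 130_000]

_, _, r2_raw = simple_ols(power_kw, price)                   # linear form
a, b, r2_log = simple_ols(power_kw, [log(p) for p in price])  # semi-log form
print(f"R^2 linear: {r2_raw:.3f}, R^2 semi-log: {r2_log:.3f}")
print(f"semi-log: one extra kW ~ {100 * (exp(b) - 1):.2f}% higher price")
```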

Figure 3: UK data, 2013.

(a) Price (£) vs. engine power (kW), with fitted exponential function

(b) Log(price) vs. engine power (kW), with fitted linear function (R² = 0.7582)

[c] Murray and Sarantis (1999) also concluded that price differences between car models can be explained by variations in individual car characteristics.


Table 1 shows some descriptive statistics of the technical specifications. The first specification is a series of seven dummy variables for body style. The second is a series of three dummies for transmission type. The dummy variable “automatic optional” means that a car has a standard manual transmission, but an upgrade to automatic is available at extra cost. The third specification is wheel drive, and the fourth and last specification with dummies is fuel type. The UK data originally contains nine fuel types, six of which, such as (partly) electric drive and LPG, are relatively rare. These six are aggregated into a dummy called “alternative fuel type” in Table 1. Furthermore, note that cars with fuel type “unleaded” can also run on super unleaded. The same does not hold for cars with fuel type “super unleaded”, which are generally sportier versions. Engine power, weight and length[d] are the three major numerical “quality” variables in hedonic price regressions on cars. Among them, engine power has a coefficient of variation (CV) more than twice that of weight, and the difference between the CVs of engine power and circumference is even greater. The regression lines in Figure 3 show a relatively high R-squared of 0.7582. Together with its high degree of variability, this makes engine power a very important explanatory variable.

Table 1: Descriptive statistics (UK data).

Variable                                   Mean        SD
hatchback                                  0.35        *
saloon                                     0.14        *
estate                                     0.17        *
people carrier                             0.10        *
4 x 4                                      0.09        *
coupe                                      0.07        *
open car                                   0.08        *
manual transmission                        0.57        *
automatic                                  0.28        *
automatic optional                         0.15        *
front-wheel drive                          0.67        *
rear-wheel drive                           0.18        *
four-wheel drive                           0.15        *
unleaded                                   0.47        *
super unleaded                             0.03        *
diesel                                     0.48        *
alternative fuel type                      0.01        *
no. of gears                               5.81        0.74
fuel consumption (mpg)                     47.09       12.27
range (km)                                 962.98      229.37
engine power (kW)                          122.00      64.26
torque (lb·ft)                             211.38      94.20
power-to-weight ratio (W/kg)               80.25       34.23
no. of standard luxury features            17.61       3.32
no. of n/a standard luxury features        4.72        2.84
warranty time (years)                      3.15        0.68
warranty time (miles)                      66318.24    16113.32
luggage capacity seats up (litres)         403.45      149.58
number of seats                            4.88        0.90
kerb weight (kg)                           1488.43     314.27
circumference (m)                          12.49       0.90
total number of observations               32286

* For dummy variables, the standard deviation (SD) equals √(mean(1 − mean)).
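The coefficient-of-variation comparison and the dummy-variable footnote can be checked directly from the means and standard deviations in Table 1. This is a small arithmetic sketch. Only the numbers come from the table, and the dictionary and function names are illustrative.

```python
from math import sqrt

# (mean, SD) pairs copied from Table 1.
table1 = {
    "engine power (kW)": (122.00, 64.26),
    "kerb weight (kg)": (1488.43, 314.27),
    "circumference (m)": (12.49, 0.90),
}
# Coefficient of variation = SD / mean.
cv = {name: sd / m for name, (m, sd) in table1.items()}
for name, value in cv.items():
    print(f"{name:20s} CV = {value:.3f}")

def dummy_sd(p):
    """SD of a 0/1 variable whose mean (share of ones) is p."""
    return sqrt(p * (1 - p))

print(f"hatchback (mean 0.35): SD = {dummy_sd(0.35):.3f}")
```

The ratio of the engine power CV to the kerb weight CV is about 2.5, which supports the "more than twice" statement in the text.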

[d] Length is replaced by circumference (m) as the last major numerical “quality” variable in this research, because the overall size of a car is considered more important than its length.


3.3 Data Structure and Selection Methods

The cars in the UK data are organized as follows. Each car has a name that consists of three parts, each of which gives more specifics about the type of car. An example is “Alfa Romeo 147 1.6 Twin Spark Turismo 3dr”. The first part of this name is the manufacturer name “Alfa Romeo”, followed by the model type “147”, and finally the version of the car, “1.6 Twin Spark Turismo 3dr”. The version typically describes the engine type, the gearbox, any specific technology that distinguishes the car from others (e.g. the “quattro” technology of Audi or the “BlueMotion” technology of VW) and the number of doors. Every car also has an identification code, which is unique within each annual cross-section. Table 2 provides an illustration of this data structure.

Table 2: The manufacturer/model/version structure of a car.

Manufacturer        Model      Version       Identification code
manufacturer one    model a    version i     ONEA002
                               version ii    ONEA004
                    model b    version i     ONEB001
                               version ii    ONEB002
                               version iii   ONEB003
                               version iv    ONEB005
manufacturer two    model a    version i     TWOA001
                               version ii    TWOA003
                    model b    version i     TWOB002
                    model c    version i     TWOC006
                               version ii    TWOC007
                               version iii   TWOC009

All observations can be used in the regression models. In the linear regression model, this has no drawbacks. In the fixed effects model, however, 18.13% of all observations are lost[e], because these cars are observed only once during 2006-2013. It frequently occurs that a car is marketed in a given year, receives a small update or facelift during that year for some reason (e.g. teething problems), and continues under another identification code the following year. Reducing this loss of data by creating a new identification code of “manufacturer+model+version” is counterproductive[f]. However, the relative data loss can be reduced by applying a selection method. Such selection methods can also serve as a specification check on the estimation results that use all observations. The selection can take place either at car version level (entailing a data loss[g] of 3.59%) or at car model level (a data loss[g] of 0.72%). Assumptions have to be made about which car models or versions to select, but these assumptions are debatable and can be modified. Two different selection methods at car version level are described below and used in subsection 4.3.

[e] If the cars are linked by their identification code through 2006-2013.

[f] Using this new “code” to link the cars annually through the eight-year period results in a data loss[g] of 23.75%.

[g] $\text{Data loss} = \sum_i v_{i1} \Big/ \sum_i \sum_{j=1}^{8} v_{ij}$, where $v_{ij} = j$ if car version $i$ occurs $j \in \{1, 2, \ldots, 8\}$ times through 2006-2013 and $v_{ij} = 0$ otherwise.
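The data-loss measure in footnote g can be computed from the identification codes of each annual cross-section. This is a minimal sketch based on two assumptions. First, each year is given as a list of identification codes. Second, a version counts as "observed j times" when its code appears in j different years. The yearly code lists below are hypothetical and borrow the format of Table 2.

```python
from collections import Counter

def data_loss(occurrences):
    """Footnote g: sum_i v_i1 / sum_i sum_j v_ij.

    occurrences: number of years (1 to 8) in which each car version i is observed,
    i.e. the j for which v_ij = j. Versions seen only once are the observations
    that a fixed effects model cannot use.
    """
    total = sum(occurrences)                       # all observations
    lost = sum(1 for n in occurrences if n == 1)   # versions observed only once
    return lost / total

def data_loss_from_codes(yearly_codes):
    """yearly_codes: one list of identification codes per annual cross-section."""
    counts = Counter(code for codes in yearly_codes for code in set(codes))
    return data_loss(list(counts.values()))

# Hypothetical three-year panel using Table 2-style codes.
yearly_codes = [
    ["ONEA002", "ONEA004", "ONEB001"],
    ["ONEA002", "ONEB001", "TWOA001"],
    ["ONEA002", "TWOB002"],
]
print(f"data loss: {data_loss_from_codes(yearly_codes):.1%}")  # 3 of 8 observations -> 37.5%
```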


First, one can select the cheapest version of each car model, under the assumption that these versions are by far the most sold and therefore the most representative of each car model. The remaining, more expensive versions are omitted. This method discards a lot of data, but it substantially reduces the relative data loss.

The second selection method is similar to the first, except that the version with the median price is selected. The assumption here is that every customer is sure about which manufacturer and model to choose and, without further consideration, chooses the version with the median price.
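Both selection methods can be sketched as a group-by over manufacturer and model. This illustrative implementation relies on assumptions that the text does not state. For models with an even number of versions, the lower of the two middle prices is taken as the "median" version, so that an actual version is selected. Ties are resolved by whichever matching version comes first. The function name and the records are hypothetical.

```python
from collections import defaultdict
from statistics import median_low

def select_versions(cars, rule="cheapest"):
    """Keep one version per (manufacturer, model).

    cars: iterable of (manufacturer, model, version, price) tuples for one year.
    rule: "cheapest" keeps the lowest-priced version; "median" keeps the version
          priced at the lower median of that model's versions.
    """
    by_model = defaultdict(list)
    for manufacturer, model, version, price in cars:
        by_model[(manufacturer, model)].append((price, version))
    selected = {}
    for key, versions in by_model.items():
        if rule == "cheapest":
            price, version = min(versions)
        elif rule == "median":
            target = median_low([p for p, _ in versions])
            price, version = next(v for v in versions if v[0] == target)
        else:
            raise ValueError(f"unknown rule: {rule}")
        selected[key] = (version, price)
    return selected

# Hypothetical versions of two models of one manufacturer.
cars = [
    ("one", "a", "i", 15_000), ("one", "a", "ii", 18_000),
    ("one", "b", "i", 20_000), ("one", "b", "ii", 22_000),
    ("one", "b", "iii", 25_000), ("one", "b", "iv", 31_000),
]
print(select_versions(cars, "cheapest"))
print(select_versions(cars, "median"))
```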

4 Hedonic Pricing: The Regression Models

In this section, three different regression models are presented. These models measure the influence of car brand and several product characteristics through their regression coefficients. The benefits, drawbacks and mutual trade-offs of these models are also discussed.

4.1 Linear Regression

The first model is a straightforward multiple linear regression, estimated by the method of ordinary least squares (OLS). This standard cross-section regression model can be written in vector notation as

y = α + Xβ + ε, (1.1)

where y is the n × 1 vector of observations on the dependent variable, α is the intercept, X is the n × k regression matrix, β is the k × 1 vector of regression coefficients, and ε is the n × 1 error vector. Here n and k denote the number of observed cars and the number of explanatory variables, respectively.
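Model (1.1), with a semi-log dependent variable and a brand dummy, can be estimated by ordinary least squares as below. This sketch uses simulated data and assumes log(price) as the dependent variable, as motivated in subsection 3.2. It includes a single hypothetical "premium brand" dummy, and the intercept α is handled by prepending a column of ones to X. Because the dependent variable is in logs, a dummy coefficient b corresponds to a price difference of 100*(exp(b) - 1) percent. A value of b ≈ 0.69 therefore means roughly a doubling of the price.

```python
import numpy as np

# Hypothetical sample: log(price) depends on engine power and a premium-brand dummy.
rng = np.random.default_rng(42)
n = 500
power = rng.uniform(50, 300, n)      # engine power (kW)
premium = rng.integers(0, 2, n)      # 1 = hypothetical exclusive brand
log_price = 9.0 + 0.008 * power + 0.69 * premium + rng.normal(0, 0.1, n)

# y = alpha + X beta + eps: a column of ones carries the intercept alpha.
X = np.column_stack([np.ones(n), power, premium])
coef, *_ = np.linalg.lstsq(X, log_price, rcond=None)
alpha, b_power, b_premium = coef

print(f"alpha = {alpha:.3f}, b_power = {b_power:.4f}, b_premium = {b_premium:.3f}")
# Semi-log interpretation of a dummy coefficient.
print(f"implied brand premium: {100 * np.expm1(b_premium):.1f}%")
```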

The two major assumptions made in this regression model (with standard estimation techniques) concern weak exogeneity and linearity. Weak exogeneity is assumed, as the explanatory variables are determined by factors outside the model and are not contaminated with measurement errors. The assumption of linearity, which means that the dependent variable y_i (i.e. element i of vector y) is a linear combination of the regression coefficients and the explanatory variables, also holds. Previous research[h] has shown that a linear relationship is plausible, and the residual plots also indicate that linearity is an acceptable assumption. These residual plots can also be used to detect the presence of (severe) heteroscedasticity. On average, the absolute value of the studentized residuals increases slightly as ŷ_i increases, which indicates some heteroscedasticity. Therefore, the estimated standard errors in this regression are heteroscedasticity-consistent (i.e. robust to heteroscedastic disturbances). The independence of errors is examined with the Durbin-Watson statistic, which indicates slight positive serial correlation. Nevertheless, independent errors are assumed in order to delimit the scope of this research. The main focus is on the last assumption, the absence of severe to perfect multicollinearity, which is also one of the main issues in previous studies of hedonic car price models.
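The heteroscedasticity-consistent standard errors and the Durbin-Watson statistic mentioned above can be computed by hand as follows. The text does not say which variant of robust standard errors is used, so this sketch assumes White's HC0 sandwich estimator. Statistical packages provide the same quantities, for example statsmodels' `OLS(...).fit(cov_type="HC0")` and `statsmodels.stats.stattools.durbin_watson`. The Durbin-Watson statistic depends on the order of the observations, so for cross-section data it is only meaningful for a chosen ordering. The simulated data are hypothetical.

```python
import numpy as np

def ols_hc0(X, y):
    """OLS coefficients with White (HC0) heteroscedasticity-consistent standard errors.

    X must already contain a column of ones if an intercept is wanted.
    """
    X, y = np.asarray(X, float), np.asarray(y, float)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    meat = X.T @ (X * resid[:, None] ** 2)   # sum_i e_i^2 x_i x_i'
    cov = XtX_inv @ meat @ XtX_inv           # sandwich estimator
    return beta, np.sqrt(np.diag(cov)), resid

def durbin_watson(resid):
    """DW = sum (e_t - e_{t-1})^2 / sum e_t^2. Values near 2 mean no first-order
    autocorrelation; values below 2 point to positive serial correlation."""
    resid = np.asarray(resid, float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Hypothetical data whose error spread grows with engine power (heteroscedastic).
rng = np.random.default_rng(1)
n = 300
power = rng.uniform(50, 300, n)
log_price = 9.0 + 0.008 * power + rng.normal(0, 0.0005 * power)
X = np.column_stack([np.ones(n), power])
beta, robust_se, resid = ols_hc0(X, log_price)
print("coefficients:", beta.round(4), "HC0 s.e.:", robust_se.round(5))
print("Durbin-Watson:", round(durbin_watson(resid), 3))
```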

[h] Practically all articles in the Literature Review section.


The question is which explanatory variables to select. As stated in Arguea and Hsiao (1993), using a large number of explanatory variables gives a more thorough description of a car as a differentiated product. However, using a large number of components and specifications is inadvisable, for the following reasons:

• It is believed that the majority of car consumers are not aware of every component or specification and look, more narrowly, at (and pay for) general quality aspects[j] such as comfort, durability, economy, manoeuvrability, performance, safety and luxury;

• More homogeneous attributes and aggregated specifications have great explanatory power when many cars and brands are analyzed at the same time;

• Many characteristics are highly collinear due to technical or other considerations, which leads to unstable or even nonsensical estimates of the regression coefficients;

• Not every specification or technical detail of every car is disclosed by a manufacturer or observed by data collectors, for several reasons.

However, reducing the number of components and specifications too much could lead to a poor representation of the essential features of a car. In order to arrive at an appropriate representation of cars, a statistical procedure originally suggested by Belsley, Kuh and Welsch (1980) is used. This procedure reduces the dimensionality of the variates, based on the condition index measure together with the variance decomposition method. The first step in this procedure is the singular value decomposition of the regression matrix:

$$X = UDV', \qquad (1.2)$$

where U is an n × k matrix and both D and V are k × k matrices. Furthermore, U'U = V'V = I_k, and D is a diagonal matrix with the singular values φ_j (j = 1, 2, ..., k) of X on the diagonal; these are the square roots of the eigenvalues of X'X.

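Subsequently, the variance-covariance matrix of β̂ can be written as

$$\operatorname{var}(\hat\beta) = \hat\sigma^2_\varepsilon (X'X)^{-1} = \hat\sigma^2_\varepsilon V D^{-2} V' \;\Rightarrow\; \operatorname{var}(\hat\beta_i) = \hat\sigma^2_\varepsilon \sum_{j=1}^{k} \frac{\upsilon_{ij}^2}{\phi_j^2}, \qquad i = 1, 2, \dots, k, \qquad (1.3)$$

where υ_ij denotes the (i, j)-th element of V. Belsley, Kuh and Welsch (1980) then define the k condition indexes

$$\kappa(X) = \phi_{\max}/\phi_j, \qquad j = 1, 2, \dots, k. \qquad (1.4)$$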

Large values of κ(X) are associated with near linear dependencies among the columns of X. As a rule of thumb, moderate to severe multicollinearity is suggested to exist for condition indexes larger than 30. Finally, to detect which specific linear dependencies exist, the proportion of the variance of the i-th regression coefficient associated with the j-th component of its decomposition is calculated:

$$\pi_{ji} = \delta_{ij}/\delta_i, \qquad i, j = 1, 2, \dots, k, \qquad (1.5)$$

where δ_ij = υ²_ij/φ²_j and δ_i = Σ_{j=1}^{k} δ_ij. These k × k variance decomposition proportions are collected in the variance decomposition matrix Π. This matrix Π, together with the k condition indexes, is summarized in Table 3.

[j] Murray and Sarantis, 1999.


Table 3: The BKW procedure.

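$$\begin{array}{c|cccc}
\kappa(X) & \operatorname{var}(\hat\beta_1) & \operatorname{var}(\hat\beta_2) & \cdots & \operatorname{var}(\hat\beta_k) \\ \hline
\phi_{\max}/\phi_{\max} & \pi_{11} & \pi_{12} & \cdots & \pi_{1k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\phi_{\max}/\phi_{\min} & \pi_{k1} & \pi_{k2} & \cdots & \pi_{kk}
\end{array}$$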

Belsley et al. (1980) propose that unacceptably high dependencies between two or more explanatory variables exist if the corresponding variance decomposition proportions (in the same row of Table 3) are larger than 0.50, combined with κ(X) > 30.
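The computations in (1.2)-(1.5) can be sketched with NumPy's singular value decomposition. This is a minimal illustration on simulated data, not the thesis implementation; scaling the columns of X to unit length before the decomposition follows the usual BKW recommendation but is an assumption here, and `bkw_diagnostics` is a hypothetical helper. The rows of the returned matrix `Pi` correspond to the rows of Table 3, with the condition indexes in ascending order.

```python
import numpy as np

def bkw_diagnostics(X):
    """Hypothetical helper: condition indexes and variance decomposition proportions.

    Returns kappa (length k, ascending) and Pi (k x k) with Pi[j, i] = pi_ji, the share
    of var(beta_i) associated with the j-th singular value, following (1.2)-(1.5)."""
    Xs = X / np.linalg.norm(X, axis=0)                    # assumption: unit-length columns
    _, phi, Vt = np.linalg.svd(Xs, full_matrices=False)   # Xs = U D V', phi descending
    V = Vt.T                                              # V[i, j] = upsilon_ij
    kappa = phi.max() / phi                               # (1.4)
    delta = V ** 2 / phi ** 2                             # delta_ij = upsilon_ij^2 / phi_j^2
    Pi = (delta / delta.sum(axis=1, keepdims=True)).T     # (1.5), rows = components j
    return kappa, Pi

# Simulated regressors: intercept, two nearly collinear columns and one independent column
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])

kappa, Pi = bkw_diagnostics(X)
print(np.round(kappa, 1))
print(np.round(Pi, 2))
```

In this simulated example the last row combines a large condition index with proportions close to one for the two nearly collinear columns, which is the pattern the threshold rule above is designed to detect.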

4.2 Ridge Regression

In a linear regression, multicollinearity can reduce the precision of the OLS estimates, so that the coefficients are unstable and/or have the wrong sign. Hoerl and Kennard (1970) suggested ridge regression as an alternative to OLS, in particular when multicollinearity exists. Ridge regression penalizes the size of the regression coefficients. The ridge estimator is defined as the value of β that minimizes

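$$(y - X\beta - \alpha)'(y - X\beta - \alpha) + \lambda\beta'\beta \qquad (2.1)$$

instead of minimizing

$$\varepsilon'\varepsilon = (y - X\beta - \alpha)'(y - X\beta - \alpha) \qquad (2.2)$$

as in the OLS method. Minimizing (2.1) leads to the ridge estimator, which (with y and the columns of X centred, so that the unpenalized intercept α drops out) is given by

$$\hat\beta_{\text{ridge}} = (X'X + \lambda I_k)^{-1} X'y, \qquad (2.3)$$

compared to the OLS estimator

$$\hat\beta_{\text{OLS}} = (X'X)^{-1} X'y, \qquad (2.4)$$

which is the result of minimizing (2.2). Note that the only difference between (2.3) and (2.4) is the addition of the parameter λ down the diagonal of X'X. This parameter represents the ridge regression penalty, which has the effect of shrinking the estimates towards zero:

$$\hat\beta_{\text{ridge}} \to 0 \text{ as } \lambda \to \infty \quad\text{and}\quad \hat\beta_{\text{ridge}} \to \hat\beta_{\text{OLS}} \text{ as } \lambda \to 0.$$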
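As a minimal sketch of (2.3) and its two limits (simulated, centred data; `ridge` is a hypothetical helper), the code below checks that λ = 0 reproduces the OLS estimator (2.4) and that the estimates shrink towards zero as λ grows.

```python
import numpy as np

def ridge(X, y, lam):
    """Hypothetical helper: ridge estimator (2.3); lam = 0 gives the OLS estimator (2.4).

    X and y are assumed to be centred, so the unpenalized intercept drops out."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

rng = np.random.default_rng(2)
n = 100
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)
X = X - X.mean(axis=0)   # centre the regressors
y = y - y.mean()         # centre the response

b_ols = np.linalg.solve(X.T @ X, X.T @ y)
for lam in [0.0, 1.0, 100.0, 1e6]:
    print(lam, np.round(ridge(X, y, lam), 4))
```

Solving the linear system instead of inverting X'X + λI_k is numerically preferable but gives the same estimator.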

When comparing different kinds of regression models, there are typical trade-offs involved. This is also the case here: the ridge estimator introduces a bias towards zero (downward for positive coefficients), but the variance of the estimate is reduced. The bias and variance of the ridge estimator are given by

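$$\operatorname{bias}(\hat\beta_{\text{ridge}}) = -\lambda Q\beta \quad\text{and}\quad \operatorname{var}(\hat\beta_{\text{ridge}}) = \sigma^2 Q X'X Q, \qquad (2.5)$$

where Q = (X'X + λI_k)^{-1}.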
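The bias and variance in (2.5) combine into the total mean squared error, MSE = trace(var) + bias'bias. The sketch below evaluates this for a few values of λ, using simulated, nearly collinear regressors and hypothetical true values of β and σ² (the helper names are also hypothetical). Because OLS has a very large variance in this setting, a small λ > 0 already yields a lower MSE than OLS, illustrating the trade-off described above.

```python
import numpy as np

def ridge_bias_var(X, beta, sigma2, lam):
    """Hypothetical helper: bias and covariance matrix of the ridge estimator, (2.5)."""
    k = X.shape[1]
    XtX = X.T @ X
    Q = np.linalg.inv(XtX + lam * np.eye(k))
    bias = -lam * Q @ beta
    var = sigma2 * Q @ XtX @ Q
    return bias, var

def ridge_mse(X, beta, sigma2, lam):
    """Hypothetical helper: trace of the covariance matrix plus the squared length of the bias."""
    bias, var = ridge_bias_var(X, beta, sigma2, lam)
    return np.trace(var) + bias @ bias

# Two nearly collinear regressors; beta and sigma2 are hypothetical true values
rng = np.random.default_rng(3)
n = 50
z = rng.normal(size=n)
X = np.column_stack([z, z + 0.05 * rng.normal(size=n)])
beta = np.array([1.0, 1.0])
sigma2 = 1.0

mse_ols = ridge_mse(X, beta, sigma2, 0.0)
for lam in [0.01, 0.1, 1.0, 10.0]:
    print(f"lambda={lam}: MSE={ridge_mse(X, beta, sigma2, lam):.3f} (OLS: {mse_ols:.3f})")
```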

To select λ, an appropriate value of this parameter can be estimated by generalized cross-validation (GCV). The GCV estimate λ̂ is the minimizer of

$$V(\lambda) = \frac{\frac{1}{n}\,\lVert (I_n - H(\lambda))\,y \rVert^2}{\left[\frac{1}{n}\operatorname{trace}(I_n - H(\lambda))\right]^2}, \qquad (2.6)$$

where H(λ) = X(X'X + λI_k)^{-1}X'. The “existence theorem”, proven by Hoerl and Kennard (1970), states that there always exists a value λ > 0 for which the mean squared error (MSE) of the ridge estimator is smaller than the MSE of the OLS estimator. In section 5 it is checked whether the GCV estimates λ̂ are indeed such that MSE(β̂_ridge) < MSE(β̂_OLS).
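A minimal sketch of the GCV criterion (2.6), evaluated on a grid of λ values that mirrors seq(0, 0.2, 0.001), the grid used for Figure 4 in section 5. The data are simulated and centred, `gcv` is a hypothetical helper, and forming the full n × n matrix H(λ) is only practical for moderate n.

```python
import numpy as np

def gcv(X, y, lam):
    """Hypothetical helper: generalized cross-validation criterion V(lambda), (2.6)."""
    n, k = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(k), X.T)   # H(lambda), n x n
    resid = y - H @ y                                          # (I_n - H(lambda)) y
    return (resid @ resid / n) / (np.trace(np.eye(n) - H) / n) ** 2

# Simulated, centred data with two nearly collinear regressors
rng = np.random.default_rng(4)
n = 80
z = rng.normal(size=n)
X = np.column_stack([z, z + 0.05 * rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 1.0, -0.5]) + rng.normal(size=n)
X = X - X.mean(axis=0)
y = y - y.mean()

grid = np.linspace(0.0, 0.2, 201)          # same grid as seq(0, 0.2, 0.001) in R
values = np.array([gcv(X, y, lam) for lam in grid])
lam_hat = grid[np.argmin(values)]
print("GCV estimate of lambda:", lam_hat)
```

At λ = 0 the criterion reduces to (RSS/n)/((n − k)/n)², because trace(I_n − H(0)) = n − k for OLS.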

4.3 Fixed Effects with an Unbalanced Panel

Thus far, only cross-section regression models have been presented, although the data actually comprise panel data. To exploit the benefits of panel data, such as modeling both static and dynamic relationships, the fixed effects model is also proposed. The fixed effects model can be written as

$$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}, \qquad (3.1)$$

where y_it is the observed dependent variable of car i at time t, α_i is the unobserved time-invariant individual effect, x'_it is the time-variant 1 × k regression vector, β is the k × 1 vector of regression coefficients and ε_it is the error term.

The benefit of the fixed effects model is that it includes individual-specific effects in addition to the explanatory variables of a linear (or ridge) regression. These individual-specific effects can be interpreted as the value of specific car models, corrected for the explanatory variables that are yet to be selected. Subsequently, these individual-specific effects can be aggregated to brand-specific effects by averaging the individual-specific effects for each brand. In this way, the influence of car brand is examined in a completely different way than in the previous models. A downside of using fixed effects for brand name influence instead of dummy variables (as in the linear regression) is that the fixed effect terms capture not only brand name but every time-invariant influence on price that is not explained by the selected explanatory variables.

The UK data form an unbalanced panel, because not every car is observed in every year. The mechanics of fixed effects estimation with an unbalanced panel are not much more difficult than with a balanced panel. Exactly the same procedure is followed, with the same fixed effects estimator

$$\hat\beta_{\text{FE}} = (X'ZX)^{-1}X'Zy, \qquad (3.2)$$

where Z = diag(A_1, A_2, ..., A_n). Instead of A_i = I_T − e_T e'_T/T for balanced panels, in this case A_i = I_{T_i} − e_{T_i} e'_{T_i}/T_i is used, where e_{T_i} is a T_i × 1 vector of ones and T_i is the number of years in which car version i exists.
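A minimal NumPy sketch of (3.2) for an unbalanced panel, on simulated data with two hypothetical brands "A" and "B". Subtracting each car's own mean over its T_i observations is the same as premultiplying by A_i, so Z never has to be formed. The sketch also recovers the individual effects α̂_i = ȳ_i − x̄_i'β̂ and averages them per brand, as described above. A car observed only once has an all-zero demeaned row and therefore contributes nothing to β̂. The helper `within_estimator` is hypothetical.

```python
import numpy as np

def within_estimator(X, y, ids):
    """Hypothetical helper: fixed effects (within) estimator (3.2) for an unbalanced panel.

    Demeaning each car's rows by its own mean equals applying A_i = I - e e'/T_i."""
    Xd, yd = X.astype(float), y.astype(float)   # astype returns copies
    for u in np.unique(ids):
        rows = ids == u
        Xd[rows] -= X[rows].mean(axis=0)
        yd[rows] -= y[rows].mean()
    beta = np.linalg.solve(Xd.T @ Xd, Xd.T @ yd)
    # Individual effects: alpha_i = mean(y_i) - mean(x_i)' beta
    alpha = {u: y[ids == u].mean() - X[ids == u].mean(axis=0) @ beta for u in np.unique(ids)}
    return beta, alpha

# Simulated unbalanced panel: 60 cars of hypothetical brands "A" and "B", each observed 1-4 years
rng = np.random.default_rng(5)
T = rng.integers(1, 5, size=60)
ids = np.repeat(np.arange(60), T)
brand = np.where(np.arange(60) < 30, "A", "B")
X = rng.normal(size=(len(ids), 1))
alpha_true = np.where(brand == "A", 1.0, 2.0) + 0.1 * rng.normal(size=60)
y = alpha_true[ids] + 0.5 * X[:, 0] + 0.05 * rng.normal(size=len(ids))

beta, alpha = within_estimator(X, y, ids)
# Aggregate the individual effects to brand-specific effects by averaging per brand
brand_effect = {b: np.mean([a for car, a in alpha.items() if brand[car] == b]) for b in ("A", "B")}
print("beta:", beta, "brand effects:", brand_effect)
```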

An important issue with an unbalanced panel is determining why the panel is unbalanced. The two panels used in this thesis are unbalanced because of a process called “attrition”: the entry and exit of cars over time. If this attrition happens randomly, it is acceptable to estimate the model. If this is not the case, there is a missing data problem, which results in biased and inconsistent estimates[l]. The reason for cars entering and dropping out of the dataset must not be correlated with the error term. In the UK data, there are various reasons for attrition:


• As stated earlier in this thesis (p. 10), it frequently occurs that a car is marketed in a given year, gets a small update or facelift during that year for some reason, and the car continues under another identification code the following year;

• Car models disappear because their production stops, which is the result of decreasing popularity and hence decreasing sales;

• Car models disappear because of a changing vision or policy of a manufacturer;

• Car models disappear because of the disappearance of a manufacturer (because of bankruptcy or buyout by another manufacturer);

• Car models disappear because only a predetermined number is produced over a relatively short period (generally 1-3 years). This involves the more exclusive and expensive cars for a relatively small group of customers.

The vast majority of the cars are “attrited” due to the first two reasons, and in particular because cars continue under another identification code. It is assumed that the first reason is not correlated with the error term, i.e. the attrition happens randomly, such that this crucial condition, required for accurate statistical inference, is met.

5 The Estimation Results

The first challenge is to determine the number of explanatory variables and to select the variables with the most explanatory power. The list of eligible variables can be found in Table 1. This list includes dummy variables for body style, transmission type, wheel drive and fuel type, and also fourteen non-dummy variables. The statistical BKW procedure, which results in the condition index measure κ(X) and the variance decomposition matrix Π, is performed for these fourteen non-dummy variables. The dummy variables from Table 7 are excluded, because the BKW procedure is not designed for this type of variable. The results can be found in Appendix B. In these tables, the rows where κ(X) > 30 (i.e. where moderate to strong relations exist) are analyzed. By locating high variance decomposition proportions in the rows where κ(X) > 30, the variables between which near linear dependencies exist are identified. Here, variance decomposition proportions are considered high if π_ij > 0.25, instead of π_ij > 0.50 as in Arguea and Hsiao (1993). This procedure is followed for every year.

The number of variables for which a linear relationship is detected is never less than 9[m]. Given this number, a set of 13 − 9 = 4 representative variables is selected: engine power (kW), kerb weight (kg), circumference (m) and number of standard luxury features. The somewhat arbitrary choice of these variables is based on their recurring appearance in previous studies[n] and on common sense. The first three specifications relate to the major numerical “quality” variables in hedonic price regressions on cars. The fourth variable is also used in Murray and Sarantis (1999), where its estimated coefficient was positive and significant at the 5% level in all periods. The condition indexes and variance decomposition proportions for these four selected variables in 2013 are also computed; see Table 4. High dependencies still exist: the two rows where κ(X) = 437 and κ(X) = 1192 each contain high variance decomposition proportions for two variables. However, reducing the number of explanatory variables even further could lead to a poor representation of the essential features of a car. Therefore, these four non-dummy variables are retained, and no other non-dummy explanatory variables are included in the regression models.

[m] For 2006 and 2007, the number of variables for which a linear relationship is detected equals 8, but there the variable “number of seats” is omitted, because these data are not available.

Table 4: BKW procedure for the four selected variables 2013.

| κ(X) | engine power (kW) | kerb weight (kg) | circumference (m) | no. of std. luxury features |
|---:|---:|---:|---:|---:|
| 1 | 0.00 | 0.02 | 0.00 | 0.00 |
| 27 | 0.95 | 0.08 | 0.00 | 0.00 |
| 437 | 0.02 | 0.74 | 0.03 | 0.39 |
| 1192 | 0.04 | 0.16 | 0.97 | 0.61 |

• The number of cars is reduced from 5224 to 4537 due to missing data.
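The selection rule described above can be applied directly to the Table 4 values. The sketch below (plain Python; `flag_dependencies` is a hypothetical helper) reports, for each row with κ(X) > 30, the variables whose variance decomposition proportion exceeds the chosen threshold, and only counts a row when at least two variables are involved, since a near linear dependency needs two or more columns. With the 0.25 threshold used in this thesis both high-index rows are flagged; with the 0.50 threshold of Belsley et al. only the row with κ(X) = 1192 is.

```python
# Table 4 (2013): condition index and variance decomposition proportions per row
variables = ["engine power (kW)", "kerb weight (kg)", "circumference (m)",
             "no. of std. luxury features"]
table4 = [
    (1, [0.00, 0.02, 0.00, 0.00]),
    (27, [0.95, 0.08, 0.00, 0.00]),
    (437, [0.02, 0.74, 0.03, 0.39]),
    (1192, [0.04, 0.16, 0.97, 0.61]),
]

def flag_dependencies(rows, names, kappa_min=30, pi_min=0.25):
    """Hypothetical helper: for each row with kappa > kappa_min, list the variables whose
    variance decomposition proportion exceeds pi_min (only if two or more are involved)."""
    flagged = {}
    for kappa, props in rows:
        if kappa <= kappa_min:
            continue
        involved = [name for name, p in zip(names, props) if p > pi_min]
        if len(involved) >= 2:
            flagged[kappa] = involved
    return flagged

print(flag_dependencies(table4, variables))               # threshold 0.25, as in this thesis
print(flag_dependencies(table4, variables, pi_min=0.50))  # threshold 0.50, as in BKW
```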

On top of these four variables, dummy variables are added for three specific body styles: estate, coupé and convertible. The hypothesis is that a car with one of these three body styles is in general more expensive than hatchback, saloon, people carrier and four-wheel drive cars after correction for weight and circumference. The extra cost of an estate reflects its additional practicality. The extra cost of a coupé and a convertible reflects the additional attention paid to their design and appearance. On top of this, extra cost for a convertible arises from the technology involved in creating a convertible.

A dummy is also added for the fuel type diesel, because diesel cars are relatively more expensive than cars running on unleaded petrol, which cannot be explained by the other explanatory variables. Finally, because the emphasis of this thesis is on the influence of car brand, dummy variables for brand name are added to the regression matrix. The reference brand is Perodua, a Malaysian car company with the lowest average car price[o] between 2006 and 2013.
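Including a dummy for every brand alongside the intercept would make the columns of X perfectly collinear, which is why one reference brand is omitted. A minimal sketch of the dummy construction in plain Python (the helper `brand_dummies` and the four example cars are hypothetical), where each brand coefficient is then measured relative to Perodua:

```python
def brand_dummies(brands, reference="Perodua"):
    """Hypothetical helper: 0/1 dummy columns for every brand except the reference brand."""
    levels = sorted(set(brands) - {reference})
    rows = [[1 if b == level else 0 for level in levels] for b in brands]
    return levels, rows

# Hypothetical sample of four cars
levels, rows = brand_dummies(["Ford", "Perodua", "BMW", "Ford"])
print(levels)  # ['BMW', 'Ford']
print(rows)    # [[0, 1], [0, 0], [1, 0], [0, 1]]
```

A Perodua car has zeros in every dummy column, so its price level is absorbed by the intercept.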

In the ridge regression, parameter λ is determined by minimizing V(λ). This function decreases from λ = 0 until it reaches a minimum; after this point it increases, with an increasingly steep slope. Figure 4 provides a graphical illustration of V(λ) for 2013. This behaviour is observed for all years.

Figure 4: Plot of V(λ) for 2013 with λ = seq(0, 0.2, 0.001) (horizontal axis: grid index; vertical axis: GCV value V(λ)).


To provide a brief overview of the regression estimates, the annually averaged regression coefficients of both the linear and the ridge regression are given in Table 5. The car brands in this table are sorted by the size of their averaged regression coefficient. For the annual linear and ridge regression estimates, see Appendix C. The heteroscedasticity-consistent standard errors of the linear regression can be found in Table 18. The standard errors of the ridge regression are omitted, because they are not very meaningful for strongly biased estimates such as ridge estimates. Goeman et al. (2012, p. 18) argue that reporting a standard error of a penalized estimate can give a mistaken impression of great precision, completely ignoring the inaccuracy caused by the bias.

Table 5: Annually averaged regression estimates.

| Variable | β̂ (OLS) | β̂ (ridge) | Variable | β̂ (OLS) | β̂ (ridge) |
|---|---:|---:|---|---:|---:|
| Rolls-Royce | 1.381∗∗∗ | 1.376 | Honda | 0.329∗∗∗ | 0.324 |
| Morgan | 1.369∗∗∗ | 1.364 | Alfa Romeo | 0.319∗∗∗ | 0.315 |
| McLaren | 1.353∗∗∗ | 1.350 | Toyota | 0.311∗∗∗ | 0.307 |
| Caterham | 1.330∗∗∗ | 1.325 | Ford | 0.306∗∗∗ | 0.301 |
| Lamborghini | 1.292∗∗∗ | 1.287 | Renault | 0.300∗∗∗ | 0.295 |
| Ferrari | 1.264∗∗∗ | 1.259 | Citroen | 0.293∗∗∗ | 0.289 |
| Maybach | 1.108∗∗∗ | 1.103 | Cadillac | 0.289∗ | 0.285 |
| Westfield | 1.104∗∗∗ | 1.099 | Saab | 0.278∗∗∗ | 0.273 |
| Aston Martin | 1.064∗∗∗ | 1.059 | Fiat | 0.277∗∗∗ | 0.273 |
| Lotus | 1.051∗∗∗ | 1.047 | Peugeot | 0.276∗∗∗ | 0.271 |
| Noble | 1.046∗∗∗ | 1.041 | Nissan | 0.260∗∗∗ | 0.255 |
| TVR | 0.829∗∗∗ | 0.823 | Mitsubishi | 0.249∗∗∗ | 0.244 |
| Porsche | 0.821∗∗∗ | 0.816 | Mazda | 0.232∗∗∗ | 0.228 |
| Ginetta | 0.765∗∗∗ | 0.759 | Skoda | 0.221· | 0.216 |
| Bentley Motors | 0.750∗∗∗ | 0.746 | Seat | 0.219∗∗ | 0.215 |
| Maserati | 0.745∗∗∗ | 0.740 | Chrysler | 0.218 | 0.213 |
| Jaguar | 0.514∗∗∗ | 0.509 | Suzuki | 0.194 | 0.189 |
| Corvette | 0.483∗∗∗ | 0.478 | Jeep | 0.176 | 0.171 |
| Mercedes-Benz | 0.481∗∗∗ | 0.477 | Daihatsu | 0.150∗∗ | 0.145 |
| Lexus | 0.480∗∗∗ | 0.475 | Hyundai | 0.135 | 0.130 |
| Audi | 0.474∗∗∗ | 0.469 | Chevrolet | 0.130 | 0.125 |
| Smart | 0.463∗∗∗ | 0.458 | MG | 0.113 | 0.109 |
| BMW | 0.449∗∗∗ | 0.444 | Kia | 0.089 | 0.085 |
| Land Rover | 0.431∗∗∗ | 0.426 | Hummer | 0.065 | 0.061 |
| Infiniti | 0.400∗∗∗ | 0.395 | Dacia | 0.064 | 0.061 |
| Volvo | 0.390∗∗∗ | 0.385 | Dodge | 0.019 | 0.014 |
| Vauxhall | 0.349∗∗∗ | 0.344 | Perodua | 0.000 | 0.000 |
| Mini | 0.349∗∗∗ | 0.344 | SsangYong | -0.043 | -0.047 |
| Subaru | 0.342∗∗∗ | 0.338 | Proton | -0.052 | -0.057 |
| Volkswagen | 0.335∗∗∗ | 0.330 | | | |
| engine power (kW) | 0.0033∗∗∗ | 0.0033 | estate | -0.002 | -0.002 |
| kerb weight (kg) | 0.0003∗∗∗ | 0.0003 | coupé | 0.091∗∗∗ | 0.091 |
| circumference (m) | 0.0818∗∗∗ | 0.0818 | convertible | 0.143∗∗∗ | 0.143 |
| no. of luxury features | 0.0264∗∗∗ | 0.0264 | diesel | 0.067∗∗∗ | 0.067 |

• Significance level ‘∗∗∗’: all p-values through 2006-2013 are < 0.001.
• Significance level ‘∗∗’: all p-values through 2006-2013 are < 0.01.
• Significance level ‘∗’: all p-values through 2006-2013 are < 0.05.
• Significance level ‘·’: all p-values through 2006-2013 are < 0.1.
• Significance level ‘ ’: at least one p-value through 2006-2013 is > 0.1.

The difference between the two regressions turns out to be minimal. In five out of eight years the MSE of the ridge regression is slightly lower, but the estimated coefficients are also slightly downward biased, as discussed in subsection 4.2. The exact numbers can be found in Table 16 (Appendix C).


The next model to estimate comprises fixed effects. The first estimation of this model contains all observations, where the cars are linked by their identification code through 2006-2013. The estimates[p] are presented in Table 6. The two selection methods, discussed in subsection 3.3, are implemented in the second and third estimations of the FE model. These estimates[p] can be found in Tables 20 and 21, respectively (Appendix D). Three car brands, Ginetta, Noble and TVR, are dropped in the estimation of the FE model, because these brands are only observed once (in 2006). Furthermore, the car brands in these tables are again sorted, this time by the size of their averaged individual-specific effect.

Table 6: Averaged fixed effects estimates.

| Variable | β̂ (FE) | SD(β̂ FE) | n | Variable | β̂ (FE) | SD(β̂ FE) | n |
|---|---:|---:|---:|---|---:|---:|---:|
| Rolls-Royce | 1.972∗∗∗ | (0.049) | 6 | Jeep | 0.393∗∗ | (0.038) | 39 |
| Maybach | 1.906∗∗∗ | (0.052) | 4 | Perodua | 0.372∗∗∗ | (0.032) | 13 |
| Ferrari | 1.538∗∗∗ | (0.049) | 9 | Alfa Romeo | 0.338 | (0.037) | 143 |
| Lamborghini | 1.457∗∗∗ | (0.052) | 16 | Mini | 0.336∗∗ | (0.035) | 75 |
| Bentley | 1.446∗∗∗ | (0.052) | 21 | Chrysler | 0.335 | (0.038) | 47 |
| Aston Martin | 1.433∗∗∗ | (0.047) | 27 | Mitsubishi | 0.332 | (0.036) | 97 |
| McLaren | 1.430∗∗∗ | (0.054) | 2 | Vauxhall | 0.330 | (0.035) | 596 |
| Maserati | 1.128∗∗∗ | (0.045) | 14 | Ford | 0.301 | (0.038) | 633 |
| Morgan | 1.105∗∗∗ | (0.039) | 15 | Toyota | 0.294 | (0.036) | 254 |
| Porsche | 1.042∗∗∗ | (0.044) | 86 | Peugeot | 0.280 | (0.037) | 459 |
| Lotus | 0.856∗∗∗ | (0.038) | 16 | Nissan | 0.276 | (0.036) | 179 |
| Land Rover | 0.805∗∗∗ | (0.040) | 115 | Citroen | 0.273 | (0.036) | 286 |
| Jaguar | 0.738∗∗∗ | (0.042) | 119 | Renault | 0.262 | (0.038) | 457 |
| Lexus | 0.734∗∗∗ | (0.038) | 64 | Hyundai | 0.251 | (0.037) | 189 |
| Caterham | 0.725∗∗∗ | (0.037) | 25 | Seat | 0.237 | (0.037) | 205 |
| Mercedes-Benz | 0.676∗∗∗ | (0.040) | 629 | Mazda | 0.235 | (0.036) | 132 |
| Infiniti | 0.666∗∗∗ | (0.039) | 43 | Skoda | 0.220 | (0.036) | 389 |
| Corvette | 0.660∗∗∗ | (0.046) | 8 | SsangYong | 0.215 | (0.039) | 25 |
| BMW | 0.635∗∗∗ | (0.039) | 719 | Chevrolet | 0.211 | (0.036) | 82 |
| Audi | 0.605∗∗∗ | (0.039) | 885 | Proton | 0.210∗ | (0.031) | 15 |
| Hummer | 0.571∗∗∗ | (0.044) | 5 | Kia | 0.207 | (0.037) | 198 |
| Westfield | 0.565∗∗∗ | (0.036) | 6 | Dodge | 0.187 | (0.038) | 22 |
| Cadillac | 0.557∗∗∗ | (0.041) | 48 | Daihatsu | 0.181 | (0.032) | 17 |
| Volvo | 0.550∗∗ | (0.039) | 473 | Suzuki | 0.154 | (0.033) | 66 |
| Saab | 0.489∗∗∗ | (0.038) | 91 | MG | 0.149 | (0.036) | 6 |
| Subaru | 0.430 | (0.036) | 65 | Smart | 0.148 | (0.039) | 19 |
| Volkswagen | 0.415 | (0.037) | 570 | Fiat | 0.123 | (0.035) | 158 |
| Honda | 0.406 | (0.037) | 189 | Dacia | 0.102 | (0.038) | 6 |
| engine power (kW) | 0.0032∗∗∗ | (0.0002) | | estate | 0.004 | (0.012) | |
| kerb weight (kg) | 0.0000· | (0.0000) | | coupé | -0.023 | (0.028) | |
| circumference (m) | 0.0400∗ | (0.0164) | | convertible | 0.0422 | (0.028) | |
| no. of luxury features | 0.0081∗∗∗ | (0.0006) | | diesel | 0.070∗∗∗ | (0.015) | |

• Significance level ‘∗∗∗’: all p-values of the cars of this brand are < 0.001.
• Significance level ‘∗∗’: all p-values of the cars of this brand are < 0.01.
• Significance level ‘∗’: all p-values of the cars of this brand are < 0.05.
• Significance level ‘·’: all p-values of the cars of this brand are < 0.1.
• Significance level ‘ ’: at least one p-value of the cars of this brand is > 0.1.

A total of five sets of regression estimates, partially presented in this section, are interpreted and reviewed in the next subsection. A number of recommendations are given in subsection 5.2.

[p] The fixed effect terms and corresponding standard errors are averaged per brand. The cluster-robust standard errors of the eight remaining explanatory variables are robust to heteroscedastic and autocorrelated disturbances. The clustering is done at the level of car models. The assumption is that the serial correlation is caused by omitted factors that are not correlated with the explanatory variables.
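The clustered standard errors described in this footnote can be sketched as follows. The formula is the standard cluster-robust (sandwich) covariance applied to within-transformed regressors and residuals; the inputs are random illustrative values rather than the UK data, the absence of a small-sample correction is an assumption (software packages often apply one), and `cluster_robust_cov` is a hypothetical helper. With every observation in its own cluster, the formula reduces to the White (HC0) covariance.

```python
import numpy as np

def cluster_robust_cov(Xd, resid, clusters):
    """Hypothetical helper: (Xd'Xd)^{-1} [sum_g Xd_g' e_g e_g' Xd_g] (Xd'Xd)^{-1}.

    Xd: within-transformed regressors; resid: within residuals; clusters: car-model id
    per row. No small-sample degrees-of-freedom correction is applied."""
    bread = np.linalg.inv(Xd.T @ Xd)
    k = Xd.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(clusters):
        rows = clusters == g
        score = Xd[rows].T @ resid[rows]        # k-vector Xd_g' e_g
        meat += np.outer(score, score)
    return bread @ meat @ bread

# Illustrative inputs: 40 car models observed 5 times, errors sharing a model-specific part
rng = np.random.default_rng(6)
n_models, T = 40, 5
clusters = np.repeat(np.arange(n_models), T)
Xd = rng.normal(size=(n_models * T, 2))
resid = np.repeat(rng.normal(size=n_models), T) + 0.5 * rng.normal(size=n_models * T)
cov = cluster_robust_cov(Xd, resid, clusters)
print("cluster-robust s.e.:", np.sqrt(np.diag(cov)))
```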


5.1 Interpretation and Estimation Assessment

The first two estimated regression models are the linear and ridge regressions. Their annually averaged estimates can be found in Table 5. The annually averaged linear regression is used as the reference for the remaining estimated models, because the significance levels of its coefficients are relatively high and, up to now, linear (cross-section) regression models have been used almost exclusively as hedonic price regressions on cars.

The estimated coefficients of both the linear and the ridge regression model can be interpreted as the β(100)% change in the price of a new passenger car for a one-unit change in x_i, ceteris paribus.
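The β(100)% reading is a first-order approximation. Assuming the dependent variable is the natural logarithm of the price (implied by this interpretation, but not shown in this section), the exact percentage change for a one-unit change in x_i, or for a dummy switching from 0 to 1, is 100(e^β − 1)%. The difference is negligible for small coefficients but large for the brand dummies, as the sketch below shows for a few Table 5 coefficients (the helper names are hypothetical).

```python
import math

def pct_change_approx(beta):
    """Hypothetical helper: approximation used in the text, a beta(100)% change in price."""
    return 100 * beta

def pct_change_exact(beta):
    """Hypothetical helper: exact percentage change implied by a log-price model."""
    return 100 * (math.exp(beta) - 1)

# Annually averaged OLS coefficients taken from Table 5
coefficients = {"engine power (per kW)": 0.0033, "diesel": 0.067, "BMW": 0.449, "Rolls-Royce": 1.381}
for name, beta in coefficients.items():
    print(f"{name}: approx. {pct_change_approx(beta):.1f}%, exact {pct_change_exact(beta):.1f}%")
```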

The coefficients can be divided into three groups: (1) the four non-dummy variables that emerged from the BKW procedure, (2) dummies for three types of body style and for diesel and (3) dummies for brand name. This third group is measured with respect to the reference brand Perodua.

It is noticeable that this reference brand does not have the lowest coefficient (i.e. the cheapest brand name), even though it has the lowest average car price. The annually averaged estimation results indicate that less has to be paid for Proton and SsangYong, although the coefficients of these two brand names are not significant at the 5% level in all eight years. Typical American brands such as Hummer (which produced trucks and sport utility vehicles (SUVs) until 2010), Jeep and Cadillac (manufacturing off-road vehicles/SUVs and luxury vehicles, respectively) have relatively low brand name coefficients. The significance levels of Jeep and Cadillac are higher than those of Proton and SsangYong. In contrast to the American brands just mentioned, German microcar specialist Smart has a relatively high brand name coefficient. From Mazda upwards, the estimated coefficients are highly significant. Also noteworthy is that British luxury car manufacturer Bentley has a relatively modest brand name coefficient despite its very high average car price of £161,291.74. Furthermore, British sports car manufacturers such as Ginetta, Westfield, Caterham and Morgan have high coefficients relative to their average car prices, compared to brands with a similar coefficient. The majority of coefficients at the top belong to the more exclusive brands, specialized in luxury and sportiness, which is fully in line with their image and average car price. Asian brands such as Kia, Suzuki, Mitsubishi, Toyota, Subaru, Infiniti and Lexus are spread widely over the list of estimated brand name coefficients.

The four non-dummy variables that emerged from the BKW procedure are all positive and highly significant, as are the dummies coupé, convertible and diesel. Only the non-positive dummy estate drops out, with an annually averaged p-value of 0.379; 2013 is the only year in which this variable is significant. These findings confirm the hypothesis that a car with the characteristics coupé/convertible and/or diesel is more expensive; for estate, with an annually averaged p-value above 0.05, the hypothesis is not confirmed.

The third, fourth and fifth estimated models are fixed effects models. The coefficients of these models can also be divided into the same three groups: (1) the four non-dummy variables that emerged from the BKW procedure, (2) dummies for three types of body style and for diesel and (3) fixed effects for brand name. Just like the coefficients in the linear/ridge regression, the coefficients in groups (1) and (2) are interpreted as the β(100)% change in the price of a new passenger car for a one-unit change in x_i, ceteris paribus. Group (3), the fixed effects for brand name, should be interpreted as the percentage increase with respect to the base price, represented by the estimated intercept. In other words, the car price increases by β(100)% after the choice of a given brand.

The analysis of the third model, fixed effects in which all cars are linked through their identification code, is as follows. The estimation results can be found in Table 6. About half of the brand name fixed effects are significant at the 1% level, which is clearly fewer than for the brand name coefficients in the linear regression. It is also worth noting that the order of the brands follows their average car price more closely than in the linear regression, although some exceptions exist. One of the biggest outliers is Perodua, with an increase of 37.2% with respect to the base price. It should also be pointed out that no brand ranked below Perodua has a fixed effect that is significant at the 1% level. The remaining brands below Perodua are insignificant, except for Mini, SsangYong and Proton. This indicates poorly estimated coefficients for the middle and lower market segments. Another outlier is Smart: its insignificant coefficient is lower than those of 53 of the 55 other brands, whereas its linear regression coefficient is lower than those of only 21 of the 58 other brands. Recall that the brands Ginetta, Noble and TVR drop out of the fixed effects model, because they are only observed once (in 2006). American truck and SUV manufacturer Hummer, on the other hand, has a relatively high coefficient compared to its linear estimate. British sports car manufacturers Westfield, Caterham and Morgan move down the list of estimated fixed effects compared to the linear estimates.

Analyzing the four non-dummy variables that emerged from the BKW procedure, it can be seen that, except for engine power, their coefficients decrease considerably compared to the linear estimates: the kerb weight coefficient decreases by 100% (although it was already very small), the circumference coefficient halves and the coefficient of the number of standard luxury features falls to less than a third. These relatively large reductions arise because the fixed effects absorb a bigger share of the car prices than the brand coefficients in the linear/ridge estimates, at the expense of the remaining variables. This is simply a consequence of using the fixed effects estimator instead of OLS to estimate the corresponding model. The significance levels of the two non-dummy variables weight and circumference deteriorate. Furthermore, the coefficients of the dummies for the three types of body style differ from those in the linear estimates. The estimated coefficients of estate and coupé even change sign compared to the linear estimates. However, all three dummies are insignificant. Finally, the coefficient of dummy variable diesel differs by less than 5% from the linear estimate and is highly significant.
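The relative changes quoted above can be checked with simple arithmetic on the reported coefficients (annually averaged OLS estimates from Table 5 and fixed effects estimates from Table 6). Note that the 100% decrease for kerb weight reflects the rounding of its FE coefficient to 0.0000 in Table 6.

```python
# Annually averaged OLS coefficients (Table 5) and fixed effects coefficients (Table 6)
ols = {"kerb weight (kg)": 0.0003, "circumference (m)": 0.0818,
       "no. of luxury features": 0.0264, "diesel": 0.067}
fe = {"kerb weight (kg)": 0.0000, "circumference (m)": 0.0400,
      "no. of luxury features": 0.0081, "diesel": 0.070}

for name in ols:
    ratio = fe[name] / ols[name]
    print(f"{name}: FE/OLS = {ratio:.2f}, relative change = {100 * (ratio - 1):+.1f}%")
```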

The fourth and fifth estimated models comprise fixed effects with the use of the two data selection methods presented in subsection 3.3. The purpose of these selection methods is to reduce the relative loss of data: the total number of observations decreases, but relatively fewer cars are observed only once[q]. The discarded data are no longer taken into account in the calculation of the data loss[g]. This reduction of relative data loss is achieved by selecting one representative version of each car model. In the fourth model, the cheapest version of every car model is selected. In the fifth model, this is the median version with regard to the car price.

[q] Recall that the fixed effect of a car model cannot be estimated by the FE estimator if it is only observed once during 2006 to 2013.
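The two selection methods can be sketched in plain Python. The grouping by car model and year, the field names, the example data and the helper `select_versions` are assumptions for illustration; in particular, how the median version is chosen when a model has an even number of versions is not specified here, and the sketch takes the lower of the two middle versions.

```python
from collections import defaultdict

def select_versions(cars, rule="cheapest"):
    """Hypothetical helper: keep one representative version per (model, year).

    cars: list of dicts with keys "model", "year", "version" and "price".
    rule="cheapest" keeps the lowest-priced version; rule="median" keeps the version
    at the median price (assumption: the lower middle version when the count is even)."""
    groups = defaultdict(list)
    for car in cars:
        groups[(car["model"], car["year"])].append(car)
    selected = []
    for key in sorted(groups):
        versions = sorted(groups[key], key=lambda c: c["price"])
        if rule == "cheapest":
            selected.append(versions[0])
        elif rule == "median":
            selected.append(versions[(len(versions) - 1) // 2])
        else:
            raise ValueError(f"unknown rule: {rule}")
    return selected

# Hypothetical versions of one car model in two years
cars = [
    {"model": "M1", "year": 2012, "version": "1.2", "price": 10000},
    {"model": "M1", "year": 2012, "version": "1.4", "price": 12000},
    {"model": "M1", "year": 2012, "version": "2.0", "price": 15000},
    {"model": "M1", "year": 2013, "version": "1.2", "price": 10500},
    {"model": "M1", "year": 2013, "version": "2.0", "price": 13000},
]
print([c["version"] for c in select_versions(cars, "cheapest")])
print([c["version"] for c in select_versions(cars, "median")])
```

After selection, each model contributes one observation per year, so it is linked across years by model rather than by version.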

For both models, the list of brand name fixed effects is in line with the linear estimates. There are fewer outliers than in the third model (i.e. fixed effects where all cars are linked through their identification code). Perodua and Smart, which are remarkable outliers in the third model, are now at around the same position in the list as in the linear regression. On the other hand, Hummer still has a relatively high brand name fixed effect, although less high than in the third model. The coefficients of the eight remaining explanatory variables (four dummies and four non-dummies) broadly correspond to the coefficients in the linear regression, where they are significant at the 5% level.

However, because the total number of observations decreases in the fourth and fifth models, the significance of the coefficients decreases compared to the third model. In both models, approximately 40% of the brand coefficients are significant at the 10% level (at least). By comparison, in the third model about half of the brand coefficients are significant at the 1% level. It can be concluded that these estimation results are more similar to the highly significant linear regression estimates than the estimates of the third model are, but their significance levels are not as high as in the third model.

There are a number of general conclusions that can be drawn from the estimation results. The most important conclusions are discussed below.

In addition to the impact of product characteristics on the price of new passenger cars, the influence of brand name is also established. This is evident from the five estimated models in this study. However, some inconsistencies between the estimates of these models are also observed. These inconsistencies are explained by differences in estimation technique and, to a lesser extent, by differences in the data.

The majority of coefficients in the linear regression estimates are highly significant and the results are what one would expect them to be. Therefore, this basic regression model is used as reference material for the other models.

A corrective procedure for dealing with near linear dependencies between explanatory variables is ridge regression. This type of regression handles the problem of small eigenvalues of the X'X matrix by adding λ to every eigenvalue, which inflates the smallest eigenvalues relative to the largest. Parameter λ is determined by GCV. In practice, this method yields virtually the same estimates as the linear regression.
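Adding λ to the diagonal of X'X shifts every eigenvalue by exactly λ, which raises the smallest eigenvalues proportionally much more than the largest and therefore reduces their ratio. A short NumPy illustration on simulated, nearly collinear data (the value λ = 0.5 is arbitrary):

```python
import numpy as np

# Simulated regressors with two nearly collinear columns
rng = np.random.default_rng(7)
n = 100
z = rng.normal(size=n)
X = np.column_stack([z, z + 0.01 * rng.normal(size=n), rng.normal(size=n)])
XtX = X.T @ X

lam = 0.5
eig = np.linalg.eigvalsh(XtX)                            # eigenvalues of X'X, ascending
eig_ridge = np.linalg.eigvalsh(XtX + lam * np.eye(3))    # eigenvalues of X'X + lambda I
# Ratio of largest to smallest eigenvalue of X'X (the square of the largest condition index of X)
print("eigenvalues:", eig, "->", eig_ridge)
print("eigenvalue ratio:", eig[-1] / eig[0], "->", eig_ridge[-1] / eig_ridge[0])
```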

The estimated coefficients in the fixed effects model in which all observations are included follow the average car price more closely than in the linear regression. There are also some notable outliers compared to the linear regression: brands such as Perodua, Hummer and Smart occupy very different positions in the list of brand name fixed effects than in the linear regression. This is an indication of poorly estimated coefficients for a large part of the car brands, because the linear regression estimates are highly significant and more in line with what one would expect them to be.

The estimated coefficients in the fixed effects models using the two data selection methods are more in accordance with the highly significant coefficients in the linear regression than those in the third model, in particular regarding the order of brands. This suggests that well-chosen data selection methods can yield good estimates of the influence of brand name and other product characteristics on car price when fixed effects are used.

A benefit of the fixed effects model is that both static and dynamic relationships can be modeled. A drawback of using fixed effects for brand name influence instead of dummy variables (as in the linear regression) is that the fixed effect terms capture not only brand name but every influence on price that is not explained by the selected explanatory variables. This makes the interpretation of the estimated fixed effect terms more complex than that of the dummy variables in the linear regression. Another drawback is that the significance of the estimated coefficients declines. A drawback of data selection is that data are discarded, which also reduces the significance of the coefficients.

5.2 Recommendations

A number of recommendations for further research can be made. The first is adding representative explanatory variables for quality aspects such as durability, economy and manoeuvrability. These quality aspects have been omitted in this study, because it turned out to be difficult to find measures for them that satisfy two basic requirements: while still capturing the intended quality aspect, the measures must not have (near) linear dependencies with the remaining explanatory variables, and they must be known for (as far as possible) all cars.

The second recommendation is to perform comparable research on the car markets of other countries. The emphasis should be on confirming or qualifying the findings of this research for other car markets.

The third recommendation is to obtain and use sales data that can be linked to the corresponding cars and their technical specifications. In this way, it is possible to examine the validity of the assumptions made in the two data selection methods. The assumption in the first selection method is that the cheapest version of each car model is by far the most sold and therefore the most representative version of that model. The assumption in the second selection method is that every customer is sure about which manufacturer and model to choose and, without further consideration, chooses the version with the median price.

The fourth and final recommendation is to perform a statistical test for potential attrition or selectivity bias. This test is developed and explained in Verbeek and Nijman (1992). It compares the fixed and random effects estimators for an unbalanced panel and for a balanced sub-panel of it. However, in this research (using the UK data), creating the balanced sub-panel leads to a substantial loss of observations over time, so the test is not performed.

6 Conclusion

In this study the influence of brand name and several product characteristics on the price of new passenger cars is both quantified and analyzed. Previous studies regarding hedonic price functions on cars focused on technical specifications rather than brand name influence. By including brand name dummy variables/fixed effects in addition to the technical specifications that represent the essential features of a car, the coefficients of brand name and product characteristics are estimated.

The panel data consists of technical specifications and standard prices of more than 3,200 new passenger cars on sale in the UK. It is measured annually from 2006 to 2013. The estimated models comprise multiple linear regression, ridge regression and fixed effects. In order to reduce the relative loss of data in the fixed effects model, two data selection methods are proposed. These selection methods serve as a specification check with regard to the estimates using all observations.

Multicollinearity is one of the main issues in hedonic price functions on cars. A statistical procedure to detect the number of linear dependencies between explanatory variables is performed. This procedure indicates that linear dependencies exist between at least nine of the thirteen basic technical specifications. Hence, four non-dummy explanatory variables are selected, in addition to dummy variables for three types of body style and for diesel fuel.
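One common way to count near linear dependencies is the condition-index diagnostic of Belsley, Kuh and Welsch (1980). Each column of the design matrix (including the intercept) is scaled to unit length, its singular values $\mu_j$ are computed, and the condition indices $\mu_{\max}/\mu_j$ that exceed a threshold are counted. The sketch below assumes this diagnostic and the frequently used threshold of 30. The source does not state which variant or threshold was used. The full procedure also inspects variance-decomposition proportions, which are omitted here, and the data are synthetic.

```python
import numpy as np


def condition_indices(X, add_intercept=True):
    """Condition indices of the column-scaled design matrix (BKW-style)."""
    X = np.asarray(X, dtype=float)
    if add_intercept:
        X = np.column_stack([np.ones(len(X)), X])
    # Scale every column to unit Euclidean length; no centering
    Xs = X / np.linalg.norm(X, axis=0)
    s = np.linalg.svd(Xs, compute_uv=False)
    return s.max() / s


def count_near_dependencies(X, threshold=30.0):
    """Number of condition indices above `threshold` (assumed value 30)."""
    return int(np.sum(condition_indices(X) > threshold))


# Synthetic example: x3 is almost exactly x1 + x2
rng = np.random.default_rng(1)
x1, x2 = rng.normal(size=200), rng.normal(size=200)
x3 = x1 + x2 + 1e-4 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
print(count_near_dependencies(X))          # one near dependency
print(count_near_dependencies(X[:, :2]))   # none once x3 is dropped
```

Dropping one variable from each detected dependency, as is done for the technical specifications here, removes the large condition index.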

The linear regression estimates prove to be highly significant, and the results are in line with expectations. The differences between the linear and ridge regression estimates are minimal. The variance in the ridge regression is slightly reduced, but the estimated coefficients are also slightly biased downward. The estimates of the fixed effects model using all observations show some notable outliers compared to the linear regression. The estimates of the fixed effects model using the two data selection methods are more in line with the linear regression estimates. However, the estimated coefficients are less significant than in the linear regression. These lower significance levels are caused by using the fixed effects estimator instead of OLS.
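The trade-off between lower variance and downward bias follows from the ridge estimator of Hoerl and Kennard (1970), $\hat\beta_{\text{ridge}} = (Z'Z + \lambda I)^{-1} Z'y$, where $Z$ holds the standardized regressors and $\lambda \ge 0$ is the penalty. The sketch below assumes standardized regressors, a centered dependent variable and an arbitrary $\lambda$. It is not the implementation used in this study (the reference list cites the R package penalized), and the data are synthetic. Setting $\lambda = 0$ reproduces OLS, and any $\lambda > 0$ shrinks the coefficient vector towards zero.

```python
import numpy as np


def ridge(X, y, lam):
    """Ridge slopes (Hoerl and Kennard, 1970) on standardized regressors.

    Standardization is an assumption of this sketch; lam = 0 gives the
    OLS slopes on the same standardized scale.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    k = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(k), Z.T @ yc)


# Synthetic, strongly collinear regressors (hypothetical data)
rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = x1 + 0.05 * rng.normal(size=100)
X = np.column_stack([x1, x2])
y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(size=100)
b_ols = ridge(X, y, lam=0.0)
b_ridge = ridge(X, y, lam=10.0)
print(b_ols, b_ridge)
print(np.linalg.norm(b_ridge) < np.linalg.norm(b_ols))  # True
```

The shrunken norm is the source of the downward bias, while adding $\lambda I$ to an ill-conditioned $Z'Z$ is what reduces the variance.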

The influence of brand name, in addition to product characteristics, on the price of new passenger cars is evident. Some inconsistencies between the estimates of these models are due to differences in estimation technique. It is advisable to expand the set of explanatory variables to give a more complete representation of the essential features of a car, but severe multicollinearity needs to be avoided. Furthermore, comparable research on the car markets in other countries should be performed.

References

[1] Andersson, H. (2005). The value of safety as revealed in the Swedish car market: an application of the hedonic pricing approach. Journal of Risk and Uncertainty, 30 (3), 211-239.

[2] Arguea, N.M. and C. Hsiao (1993). Econometric issues of estimating hedonic price functions: with an application to the US market for automobiles. The Journal of Econometrics, 56 (1), 243-267.

[3] Atkinson, S.E. and R. Halvorsen (1984). A new hedonic technique for estimating attribute demand: an application to the demand for automobile fuel efficiency. The Review of Economics and Statistics, 417-426.

[4] Bajic, V. (1993). Automobiles and implicit markets: an estimate of a structural demand model for automobile characteristics. Applied Economics, 25 (4), 541-551.

[5] Belsley, D.A., E. Kuh and R.E. Welsch (1980). Regression diagnostics: Identifying influential data and sources of collinearity. J. Wiley.

[6] BusinessDictionary.com (2013). Hedonic pricing (http://www.businessdictionary.com/definition/hedonic-pricing.html), 3 October 2012.

[7] Cameron, A.C. and P.K. Trivedi (2009). Microeconometrics: Methods and applications. New York: Cambridge University Press (eighth, revised printing, first printing in 2005).

[8] Cohen, J. and P. Cohen (1975). Applied multiple regression/correlation analysis for the behavioral sciences. Lawrence Erlbaum.

[9] Court, A.T. (1939). Hedonic Price Indexes with Automotive Examples. In: The Dynamics of Automobile Demand. New York: The General Motors Corporation.

[10] Cowling, K. and J. Cubbin (1972). Hedonic price indexes for United Kingdom cars. The Economic Journal, 82 (327), 963-978.

[11] Cubbin, J. (1975). Quality change and pricing behaviour in the United Kingdom car industry 1956-1968. Economica, 42 (165), 43-58.

[12] Feenstra, R.C. (1988). Quality change under trade restraints in Japanese autos. The Quarterly Journal of Economics, 103 (1), 131-146.

[13] Goeman, J., R. Meijer and N. Chaturvedi (2012). L1 and L2 penalized regression models (http://cran.r-project.org/web/packages/penalized/vignettes/penalized.pdf), 26 November 2012.

[14] Griliches, Z. (1961). Staff papers 3. Hedonic price indexes for automobiles: an econometric analysis of quality change. In: The Price Statistics of the Federal Government. Cambridge: NBER, 173-206.

[15] Hoerl, A.E. and R.W. Kennard (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12 (1), 55-67.

[16] Ohta, M. and Z. Griliches (1976). Automobile prices revisited: Extensions of the hedonic hypothesis. In: Household production and consumption. Cambridge: NBER, 325-398.

[17] Kuiper, S. (2008). Introduction to multiple regression: How much is your car worth? Journal of Statistics Education, 16 (3).

[18] Murray, J. and N. Sarantis (1999). Price-quality relations and hedonic price indexes for cars in the United Kingdom. International Journal of the Economics of Business, 6 (1), 5-27.

[19] Rasmussen, D.W. and T.W. Zuehlke (1990). On the choice of functional form for hedonic price functions. Applied Economics, 22 (4), 431-438.

[20] Silver, M. (1996). Quality, prices and hedonics. International Journal of the Economics of Business, 3 (3), 351-366.

[21] Sturgeon, T. and R. Florida (2000). Globalization and jobs in the automotive industry. Final report to the Alfred P. Sloan Foundation. International Motor Vehicle Program, Center for Technology, Policy, and Industrial Development, Massachusetts Institute of Technology.

[22] Sturgeon, T. and J. van Biesebroeck (2009). Crisis and protection in the automotive industry: a global value chain perspective. Effective Crisis Response and Openness: Implications for the Trading System, 91-118.

[23] Triplett, J.E. (1969). Automobiles and hedonic quality measurement. The Journal of Political Economy, 77 (3), 408-417.

[24] Verbeek, M. and T. Nijman (1992). Testing for selectivity bias in panel data models. International Economic Review, 681-703.

A Linearity and Heteroscedasticity

This appendix takes a more detailed look than subsection 4.1 at the assumptions of linearity and absence of severe heteroscedasticity in the linear regression. These assumptions can be investigated by plotting the studentized residuals against the fitted values of the linear regression. Figure 5 contains this plot for the UK data in 2013, including a red regression line. Similar plots are observed for the remaining years (2006-2012).

[Plot: "Spread-Level Plot for linear.regression.2013"; horizontal axis: Fitted values; vertical axis: Studentized residuals]

Figure 5: Spread-level plot for the linear regression in 2013.

It can be seen that the red regression line coincides with the horizontal axis (the zero line), and the majority of points in the plot are centered around 0 for all values of $\hat{y}_i$. Therefore, the assumption of linearity is considered acceptable.
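The quantities plotted in Figure 5 can be computed as follows. This is a minimal NumPy sketch that assumes externally studentized residuals (each residual divided by an error standard deviation estimated without that observation, as R's `rstudent` does) and an OLS fit with intercept. The function name and the synthetic data are hypothetical. Plotting the second returned array against the first gives a plot of the kind shown in Figure 5.

```python
import numpy as np


def studentized_residuals(X, y):
    """Fitted values and externally studentized residuals (assumed variant)
    of an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    e = y - fitted
    # Leverages h_ii: diagonal of the hat matrix X (X'X)^{-1} X'
    h = np.einsum("ij,ji->i", X, np.linalg.pinv(X))
    sse = e @ e
    # Residual variance re-estimated with observation i left out
    s2_i = (sse - e**2 / (1 - h)) / (n - p - 1)
    return fitted, e / np.sqrt(s2_i * (1 - h))


# Synthetic usage; plot t against fitted for a Figure 5-style plot
rng = np.random.default_rng(3)
X = rng.normal(size=(50, 2))
y = 10.0 + X @ np.array([0.5, -0.2]) + 0.3 * rng.normal(size=50)
fitted, t = studentized_residuals(X, y)
print(fitted[:3], t[:3])
```

Using the leave-one-out variance keeps a single large outlier from inflating its own scale, so such points stand out clearly in the plot.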

The absence of severe heteroscedasticity is a different matter. The great majority of points in the plot lie in the interval $9 < \hat{y}_i < 11$, and from approximately $\hat{y}_i > 11$ onward the spread of the points increases. Although the largest outliers are located at $\hat{y}_i \approx 9.5$, the presence of severe heteroscedasticity cannot be ruled out completely. A consequence of severe heteroscedasticity is that the OLS estimates of the variances of the regression coefficients are possibly biased. The standard errors, and hence the inferences obtained from the data analysis (such as p-values and significance levels), are then suspect. That is why the standard error estimates in the linear regression are heteroscedasticity-consistent (i.e. robust to heteroscedastic disturbances).
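Heteroscedasticity-consistent standard errors replace the conventional variance estimate $s^2 (X'X)^{-1}$ with the sandwich $(X'X)^{-1} \left( \sum_i e_i^2 x_i x_i' \right) (X'X)^{-1}$. The sketch below implements this White (HC0) form. The source does not say which variant was used (HC1 to HC3 rescale the squared residuals), so the choice of HC0 is an assumption, and the data are synthetic.

```python
import numpy as np


def ols_hc0(X, y):
    """OLS with intercept: coefficients, conventional and HC0 standard errors.

    HC0 is an assumed variant; the study may have used another one.
    """
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    se_conv = np.sqrt(e @ e / (n - k) * np.diag(XtX_inv))
    # "Meat" of the sandwich: sum over i of e_i^2 x_i x_i'
    meat = (X * (e**2)[:, None]).T @ X
    se_hc0 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_conv, se_hc0


# Synthetic data whose error spread grows with the regressor
rng = np.random.default_rng(4)
x = rng.uniform(0, 3, size=200)
y = 9.0 + 0.4 * x + (0.1 + 0.2 * x) * rng.normal(size=200)
beta, se_conv, se_hc0 = ols_hc0(x, y)
print(beta, se_conv, se_hc0)
```

The point estimates are unchanged; only the standard errors, and therefore the p-values, differ between the two columns.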
