
A model for housing prices in the Ecuadorian market


Academic year: 2021


Master’s Thesis

A Model for Housing Prices in the Ecuadorian Market

Armas, Cynthia

Student number: 10824383

Date of final version: July 3, 2015

Master’s programme: Econometrics

Specialisation: Free Track

Supervisor: Prof. dr. F. R. Kleibergen

Second reader: dhr. dr. K. J. van Garderen


Contents

1 Introduction
2 Literature Review
3 The Model
   3.1 Model Setup
   3.2 Estimation
4 Data
5 Results
   5.1 Estimated Models
   5.2 Group Specification
   5.3 General Review
6 Conclusion
A Outputs
   A.1 Estimates of the specifications described in Chapter 5
   A.2 Significant coefficients for the model suggested according to the AIC and BIC described in Chapter 5
Bibliography


Chapter 1

Introduction

The housing market has always been one of the most important economic areas for policy makers, since it reflects the wealth of a country's population. In Ecuador specifically, this market has gained importance in the production function, and rising housing prices are observed in all cities. Moreover, people decide whether to buy a house based on its characteristics, both observable and unobservable, such as the geographic location, the building materials, the number of bedrooms and bathrooms, the crime rate, the pollution level, and others. This idea is captured by the hedonic regression method: hedonic prices are the implicit prices of attributes, which are revealed to economic agents through the observed prices of differentiated products and the specific amounts of characteristics associated with them (Rosen, 1974).

In this research, the hedonic regression method is used to explain housing prices in Ecuador. A methodology for determining housing prices is needed because Ecuador has no official regulation on the matter and, therefore, homebuyers have no reference that tells them why a specific house has a given price. Also, because some attributes are unobservable, the hedonic method can suffer from omitted-variable effects that could bias the results; this thesis explains why the methodology remains valid.

With this, the main aim of this thesis is to find an accurate model that explains housing prices in the Ecuadorian market, that is, a model that shows the effect on the market price of increasing a particular attribute of a house while the other attributes remain fixed. Given that consumers maximize their utility, we analyze their marginal willingness to pay for a small change in a specific house attribute, as measured by the implicit price of that attribute in the model. As specific research questions, a model that takes into account the effects of omitted variables is developed, the usefulness of differentiating between neighborhoods is studied, and a general picture of housing prices in the three most important cities of Ecuador is given.


To this end, we use data from the yearly Edifications Survey conducted by the National Institute of Statistics and Census for 2004 to 2013, which reports the market price and the characteristics of houses in Quito, Guayaquil and Cuenca.

The remainder of this thesis is organized as follows. Chapter 2 describes the existing literature on the hedonic method and its application to housing prices. Chapter 3 describes the general specification of the model used in this thesis. Chapter 4 describes the database. Chapter 5 describes the different specifications used to obtain the most accurate model of house prices in Ecuador. Chapter 6 concludes.


Chapter 2

Literature Review

The hedonic regression method expresses the price of a heterogeneous product as a function of its characteristics. This methodology makes it possible to control for the effect that changes in the attributes of a product have on its observed price. The resulting price is the outcome of the interaction between supply and demand in the housing market, as shown in several empirical works on this topic, which develop models to determine the price of a differentiated and indivisible good under perfect competition. Hedonic models focus on markets in which a generic commodity incorporates varying amounts of different attributes. In an empirical investigation of hedonic models, one issue of interest is determining how the price of a unit of the commodity varies with the set of attributes it possesses (Rosen, 1974).

Since home buyers maximize utility and sellers maximize profit, the price that results from the interaction between supply and demand is the one that clears the market, so that the bundle of characteristics of the product, houses in this case, reflects how the price is set. Rosen (1974) shows how this equilibrium is reached and concludes that when goods can be represented as a set of attributes, observed market prices can be described and compared in terms of these characteristics.

Heckman et al. (2009) consider the case where market conditions change: when demand and supply conditions change, the hedonic price function no longer describes the general equilibrium of the hedonic market and, therefore, identification of the structural parameters is needed. Here, the role of observed and unobserved heterogeneity is the main reason to consider a nonparametric hedonic model without additive marginal product functions. Moreover, the paper argues that when the model cannot be identified from a single cross section, the use of multiple markets is enough to identify it, because variation in the observed variables across markets induces cross-market variation in the price functions.

An extensive literature exists on hedonic methods, such as De Haan and Erwin (2013), who explain different specifications for the identification of the hedonic model. However, for this thesis the papers described above provide all the theoretical background needed on the hedonic regression methods used for pricing different products. On the other hand, Epple (1987), Lehner (2011), and Franklin and Waddell (2002) use hedonic regression methods to establish prices in the housing market and address various issues concerning possible unobserved heterogeneity in the model.

Nonetheless, with regard to housing prices, this thesis follows the theoretical framework of Bajari et al. (2010), where the omitted-variable effect is handled by observing a house in different periods of time and by making the following main identifying assumptions. First, it is assumed that, in a given market, the price of a house in a given period can be written as a function of its attributes. Importantly, this first assumption includes those attributes that are observed by home buyers but omitted from the regression specification. As a consequence, the unobservable variables are priced by the market even if the researcher cannot measure them and, therefore, the residual from the hedonic regression contains information that can be used to price home attributes that are not directly observed.

Next, Bajari et al. (2010) assume a parameterization of the process that determines the dynamic evolution of the value of the omitted attribute. This assumption is related to the number of observations available for the research. In their empirical work, they consider transactions, that is, repeated sales of a house, in different periods of time. For the authors, the evolution of the residuals of the model, which capture the effect of the omitted variables, is uncertain for sellers and buyers in the market, so they assume that the omitted product attribute evolves according to a first-order Markov process. They prove that two observations are enough to capture the evolution of the omitted variables in the regression. In this thesis, 10 different periods are available for each neighborhood, so the evolution of the value of the omitted attribute is captured in the hedonic regression model.

Finally, the last identifying assumption in Bajari et al. (2010) concerns the rationality of homebuyers with respect to their predictions about the evolution of the omitted housing attribute over time. This means that homebuyers do not make systematic errors when predicting this evolution. Therefore, the difference between the observed value in a given period and the optimal forecast of this value, based on the information set available in the prior period, is uncorrelated with that prior information set.
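The last two assumptions can be illustrated with a small simulation. The sketch below assumes, purely for illustration, that the omitted attribute follows an AR(1) process (one simple instance of a first-order Markov process); the persistence `rho`, the noise scale and the sample sizes are hypothetical and are not taken from Bajari et al. (2010) or from the thesis data. Under rational expectations the forecast error should be uncorrelated with what was known in the previous period.

```python
import numpy as np

rng = np.random.default_rng(1)

rho = 0.9                      # hypothetical persistence of the omitted attribute
n_houses, n_periods = 5000, 10

# xi[i, t]: value of the omitted attribute of house i in period t (simulated).
xi = np.empty((n_houses, n_periods))
xi[:, 0] = rng.normal(size=n_houses)
eta = rng.normal(scale=0.3, size=(n_houses, n_periods))
for t in range(1, n_periods):
    xi[:, t] = rho * xi[:, t - 1] + eta[:, t]

# Optimal forecast of xi_t given period t-1 information, and its error.
forecast = rho * xi[:, :-1]
forecast_error = xi[:, 1:] - forecast

# Correlation between the forecast error and the previous-period value.
corr = np.corrcoef(forecast_error.ravel(), xi[:, :-1].ravel())[0, 1]
print(f"corr(forecast error, lagged attribute) = {corr:.4f}")
```

In this simulated setting the correlation is close to zero, which is the sense in which buyers make no systematic prediction errors.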


Chapter 3

The Model

In the following sections, the specifications of the hedonic model for the housing price in Ecuador are stated.

3.1 Model Setup

Let $x_i$ be the vector of attributes of the house and $u_i$ a scalar that captures any omitted variable in the regression that is observed by the consumer but not by the researcher; moreover, define $p_i = p(x_i, u_i)$ as the equilibrium prices and $p$ as the hedonic price function. Then, we want to estimate $p(x_i, u_i)$, the price of the house, using its characteristics $x_i$, for different periods of time $t = 1, \dots, T$. That is, we estimate the general model:

$$\ln(p_{i,t}) = \gamma_0 + x_{i,t}'\beta + u_{i,t} \tag{3.1}$$

where $i = 1, \dots, N$, $\gamma_0$ is the constant of the model, and the natural logarithm has been used to produce smooth results.

However, consistency of the estimators requires $E[u_{i,t} \mid x_{i,t}] = 0$, which is liable to fail because of the presence of omitted unobservable variables. This means that we have to deal with endogeneity in the hedonic model caused by unobserved heterogeneity, which leads to biased estimators.

This means that our true model should be written as:

$$\ln(p_{i,t}) = \gamma_0 + z_i'\gamma_1 + x_{i,t}'\beta + u_{i,t} \tag{3.2}$$

where $z_i$ are individual-specific time-invariant variables.

Nevertheless, in the case of housing markets, these individual-specific time-invariant variables (such as curb appeal, quality of the neighborhood, crime rate, closeness to schools and shopping malls, and others) are difficult to find in an official register, and it is clear that they produce variation in consumers' willingness to pay for a house.

When the correlated unobservables are time-invariant, as in this case, panel data specifications offer a way to control for the effects of omitted variables by specifying individual-specific effects, that is:

$$\ln(p_{i,t}) = \alpha_i + x_{i,t}'\beta + u_{i,t} \tag{3.3}$$

with

$$\alpha_i = \gamma_0 + z_i'\gamma_1$$

In our specific case, this means that every homebuyer in a given neighborhood perceives the same effect of the unobservable variables on the price of the house over time.
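To see why individual-specific effects help, the following sketch simulates data in the spirit of (3.2) and (3.3). All numbers are hypothetical (a single regressor, a true slope of 0.5, a neighborhood effect correlated with the regressor); only the panel dimensions, 58 neighborhoods over 10 periods, mirror the thesis. It compares pooled OLS, which omits the neighborhood effect, with a regression that includes one dummy per neighborhood.

```python
import numpy as np

rng = np.random.default_rng(0)

n_groups, n_periods = 58, 10       # panel dimensions as in the thesis
beta_true = 0.5                    # hypothetical true slope

# Time-invariant neighborhood effect, unobserved by the researcher.
alpha = rng.normal(0.0, 1.0, size=n_groups)
group = np.repeat(np.arange(n_groups), n_periods)

# The regressor is correlated with alpha, so E[u | x] = 0 fails if alpha is omitted.
x = 0.8 * alpha[group] + rng.normal(0.0, 1.0, size=group.size)
y = alpha[group] + beta_true * x + rng.normal(0.0, 0.1, size=group.size)

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Pooled OLS with a common intercept: the neighborhood effect ends up in the error.
beta_pooled = ols(np.column_stack([np.ones_like(x), x]), y)[1]

# Dummy-variable regression: one intercept per neighborhood, as in (3.3).
D = (group[:, None] == np.arange(n_groups)[None, :]).astype(float)
beta_dummies = ols(np.column_stack([D, x]), y)[-1]

print(f"pooled slope:  {beta_pooled:.3f}")
print(f"dummies slope: {beta_dummies:.3f}")
```

In this simulated setting the pooled slope absorbs part of the omitted effect, while the dummy-variable slope stays close to the value used to generate the data.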

Additionally, this study focuses on how the slopes $\beta$ change when some interactive and some aggregated terms are included in the hedonic model. In general form, this can be expressed as follows:

$$\ln(p_{i,t}) = \alpha_i x_{i,t}'\beta_a + x_{i,t}'\beta_b + u_{i,t}$$

where the sum of the $\beta$'s gives the new value of the slope for each $x$, so that the bias of the estimates can be detected. Further specifications are described in the next section.

3.2 Estimation

To estimate the model in (3.3), we first consider the model with only the quantitative independent variables available in the data set, in order to obtain their parameter values and compare them across specifications, so that we can identify whether a bias exists (and therefore deal with the omitted-variable effect). That is, we want to model:

$$\ln(p_{i,t}) = \alpha + \beta_1\,\text{Number of Rooms}_{i,t} + \beta_2\,\text{Number of Floors}_{i,t} + \beta_3\,\text{Green Area}_{i,t} + \beta_4\,\text{Total Area}_{i,t} + u_{i,t} \tag{3.4}$$

Then, we include dummy variables for the different neighborhoods in the set of regressors and check how the estimates change in this new model:

$$\ln(p_{i,t}) = \sum_{j=1}^{N} \alpha_j D_{j,it} + \beta_1\,\text{Number of Rooms}_{i,t} + \beta_2\,\text{Number of Floors}_{i,t} + \beta_3\,\text{Green Area}_{i,t} + \beta_4\,\text{Total Area}_{i,t} + u_{i,t} \tag{3.5}$$

where $j = 1, \dots, N$ indexes the dummy variables created for the 58 neighborhoods in the data set. The first term on the right-hand side of the equation, $\sum_{j=1}^{N} \alpha_j D_{j,it}$, therefore includes the fixed effect mentioned in (3.3), and the $N$ individual dummies $D_{j,it}$ equal one if $i = j$ and zero otherwise.

Additionally, some interaction terms and some aggregated variables are going to be included in the model in order to check for the bias of the estimates in model (3.4) and (3.5). All these specifications are detailed in Chapter 5.


Chapter 4

Data

The data set used in this research comes from the Edifications Survey carried out every year by the National Institute of Statistics and Census in Ecuador. The survey provides information about the geographic location of the houses, the materials used in the walls, in the foundations, in the structure, and for the curb appeal, the number of bedrooms, the number of floors, the square feet of construction, the square feet of the parking place, and the total value of the house (in dollars).

This study has been made for the three most important cities, economically, in Ecuador: Quito, Guayaquil and Cuenca, for the period of time 2004-2013; then, in the database there are 60500 observations for these cities and they are distributed as follows:

| Year | Number of Observations |
|---|---|
| 2004 | 3953 |
| 2005 | 4155 |
| 2006 | 5660 |
| 2007 | 5492 |
| 2008 | 5486 |
| 2009 | 6240 |
| 2010 | 11312 |
| 2011 | 6764 |
| 2012 | 6339 |
| 2013 | 5099 |

Table 4.1: Number of Observations for Quito, Guayaquil and Cuenca

Moreover, there exist 32 urban sectors for Quito, 13 for Guayaquil, and 13 for Cuenca which means that we are going to create 58 dummy variables for these locations.


Chapter 5

Results

5.1 Estimated Models

We first perform regressions using as independent variables the number of rooms (Rooms), the number of floors (Floors), the area of green spaces (Green Area), and the total area of the house (Total Area), all in logarithms. We do not use the Garage Area, since this variable is zero for around 99% of the observations. The specification for the first model is:

$$\ln(\text{Price of the House}) = \alpha + \beta_1 \ln(\text{Rooms}) + \beta_2 \ln(\text{Floors}) + \beta_3 \ln(\text{Green Area}) + \beta_4 \ln(\text{Total Area}) + u \tag{5.1}$$

Table 5.1 shows the estimates, the t-statistic, the p-value and the VIF for each variable of this model:

| Model | Coefficient | t Statistic | p value | VIF |
|---|---|---|---|---|
| Constant | 5.354 | 168.641 | 0.000 | - |
| Ln(Rooms) | 0.185 | 15.159 | 0.000 | 1.433 |
| Ln(Floors) | 0.278 | 22.649 | 0.000 | 1.1388 |
| Ln(Green Area) | 0.209 | 63.172 | 0.000 | 1.152 |
| Ln(Total Area) | 0.740 | 88.770 | 0.000 | 1.916 |

Table 5.1: Coefficients

VIF is the Variance Inflation Factor. Some authors consider that a VIF higher than 10 indicates a multicollinearity problem (Wooldridge (2002)); however, as Table 5.1 shows, all VIFs are below 2, so multicollinearity is not a concern here.
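For reference, the VIF of a regressor is $1/(1 - R_j^2)$, where $R_j^2$ comes from regressing that regressor on all the others. The sketch below is a minimal NumPy implementation under that definition, run on simulated columns (`a`, `b`, `c` are hypothetical and unrelated to the survey data) to show how a nearly collinear pair pushes the VIF above the threshold of 10.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (n x k, no constant column)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        # Auxiliary regression of column j on a constant and the other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef = np.linalg.lstsq(others, y, rcond=None)[0]
        resid = y - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(2)
a = rng.normal(size=2000)
b = rng.normal(size=2000)                 # independent of a
c = a + 0.1 * rng.normal(size=2000)       # nearly collinear with a
vifs = vif(np.column_stack([a, b, c]))
print(np.round(vifs, 2))
```

The independent column has a VIF near 1, in line with the values reported in Table 5.1, while the two nearly collinear columns are far above 10.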

Since all the variables are significant at the 5% significance level, and since both the price and the regressors are in logarithms, the coefficients can be interpreted as elasticities: in general, for a house in Ecuador, a 1% increase in the number of rooms increases its price by 0.185%, a 1% increase in the number of floors increases its price by 0.278%, a 1% increase in the green area increases its price by 0.209%, and a 1% increase in the total area of the house increases its price by 0.740%.
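Because (5.1) is specified in logarithms on both sides, each slope can be read as an elasticity. The short sketch below uses the coefficients of Table 5.1 to compute the exact percentage change in price implied by a given percentage change in a regressor; the helper name `pct_price_change` is introduced here only for illustration.

```python
# Coefficients of the log-log model (5.1), taken from Table 5.1.
coefs = {
    "Rooms": 0.185,
    "Floors": 0.278,
    "Green Area": 0.209,
    "Total Area": 0.740,
}

def pct_price_change(beta, pct_change_x):
    """Exact % change in price implied by a log-log slope beta
    when the regressor changes by pct_change_x percent."""
    return ((1.0 + pct_change_x / 100.0) ** beta - 1.0) * 100.0

for name, beta in coefs.items():
    small = pct_price_change(beta, 1.0)
    large = pct_price_change(beta, 10.0)
    print(f"{name}: +1% -> {small:.3f}% price, +10% -> {large:.3f}% price")
```

For small changes the result is approximately the coefficient itself; for larger changes the exact formula and the linear approximation drift apart.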

Also, for multiple regression coefficients, a joint test statistic is required in order to test $H_0: \beta_1 = \beta_2 = \dots = \beta_K = 0$. For this, the F-test is used, which can be written as:

$$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(N - (K + 1))} \tag{5.2}$$

where $SSR_r$ stands for the sum of squared residuals of the restricted model and $SSR_{ur}$ is the same for the unrestricted model. Moreover, $N$ is the number of observations, $K$ is the number of independent variables in the unrestricted model and $q$ is the number of restrictions (or the number of coefficients being jointly tested).
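Formula (5.2) translates directly into code. The sketch below implements it, together with the equivalent form in terms of $R^2$ that applies when the restricted model contains only a constant (so that $q = K$ and $SSR_r$ equals the total sum of squares). The numbers in the example are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def f_statistic(ssr_r, ssr_ur, q, n, k):
    """F statistic of (5.2): q restrictions, n observations,
    k regressors (excluding the constant) in the unrestricted model."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - (k + 1)))

def f_from_r2(r2, n, k):
    """Overall-significance F: all k slopes restricted to zero."""
    return (r2 / k) / ((1.0 - r2) / (n - (k + 1)))

# Hypothetical example: total sum of squares 100, unrestricted SSR 40,
# so R^2 = 0.6, with k = 4 regressors and n = 105 observations.
f_a = f_statistic(ssr_r=100.0, ssr_ur=40.0, q=4, n=105, k=4)
f_b = f_from_r2(r2=0.6, n=105, k=4)
print(f_a, f_b)
```

Both forms give the same value for the overall test, which is why a larger $R^2$ goes hand in hand with a larger F for a given sample size and number of regressors.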

Table 5.2 shows some test statistics, such as the F-test and the Durbin-Watson statistic, and the adjusted $R^2$ for the model presented in Table 5.1. They allow us to reject the null hypothesis: the joint test for the coefficients of the number of rooms (Rooms), the number of floors (Floors), the area of green spaces (Green Area), and the total area of the house (Total Area) rejects that they are all equal to zero. In other words, these variables do explain the dependent variable, the Price of the House. This result is also supported by the value of the adjusted $R^2$.

| Statistic | Value |
|---|---|
| F-Test | 8295.981 |
| p-value | 0.000 |
| Adjusted $R^2$ | 0.534 |
| Durbin-Watson | 1.156 |

Table 5.2: Statistics for (5.1)
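The Durbin-Watson statistic reported alongside the F-test is computed from the regression residuals as $\sum_t (e_t - e_{t-1})^2 / \sum_t e_t^2$. A minimal sketch follows, using simulated residuals (white noise and a hypothetical AR(1) series with coefficient 0.5) rather than the thesis residuals, which are not available here.

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic of a residual series, in the order given."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(3)
n = 20000
eps = rng.normal(size=n)

# Serially uncorrelated residuals: the statistic is close to 2.
dw_white = durbin_watson(eps)

# Positively autocorrelated residuals: e_t = 0.5 * e_{t-1} + eps_t.
e = np.empty(n)
e[0] = eps[0]
for t in range(1, n):
    e[t] = 0.5 * e[t - 1] + eps[t]
dw_ar1 = durbin_watson(e)

print(f"white noise: {dw_white:.3f}, AR(1) with 0.5: {dw_ar1:.3f}")
```

Values near 2 indicate no first-order serial correlation, while values well below 2 are usually read as a sign of positive correlation between consecutive residuals; the statistic depends on the order in which the observations are sorted.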

Now, we wish to analyze the difference in the price of a house when it is built in a specific geographic area. That is, we would like to know whether the parameters of the 58 dummies created for the urban sectors of Quito, Guayaquil and Cuenca are significant. To do this, we regress:

$$\ln(\text{Price of the House}) = \beta_0 + \sum_{j=1}^{57} \alpha_j D_j + \beta_1 \ln(\text{Rooms}) + \beta_2 \ln(\text{Floors}) + \beta_3 \ln(\text{Green Area}) + \beta_4 \ln(\text{Total Area}) + u \tag{5.3}$$

For this model, the neighborhood La Concepcion - Quito, which corresponds to dummy variable 58 ($D_{58}$), is taken as the baseline. Section 5.3 discusses the parameter estimates of this model.

Table 5.3 shows some test statistics, such as the F-test for the significance of the dummy variables used in this specification and the Durbin-Watson statistic, and the adjusted $R^2$ for the model presented in (5.3). They allow us to reject the null hypothesis: the joint test for the coefficients of the number of rooms (Rooms), the number of floors (Floors), the area of green spaces (Green Area), the total area of the house (Total Area), and the 57 neighborhood dummy variables rejects that they are all equal to zero. Also, the value of the F-statistic is much lower than the one in Table 5.2, while the adjusted $R^2$ increases from 0.534 to 0.562.

| Statistic | Value |
|---|---|
| F-Test | 621.100 |
| p-value | 0.000 |
| Adjusted $R^2$ | 0.562 |
| Durbin-Watson | 1.152 |

Table 5.3: Statistics for (5.3)

Moreover, we want to know how the interaction between the independent variables (Number of Rooms, Floors, Green Area, and Total Area) and the dummies changes the slopes of the independent variables, and how this changes the dependent variable, Price of the House. That is, we run the models:

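$$\ln(\text{Price of the House}) = \beta_0 + \beta_1 \ln(\text{Rooms}) + \beta_2 \ln(\text{Floors}) + \beta_3 \ln(\text{Green Area}) + \beta_4 \ln(\text{Total Area}) + \beta_5 \ln(\text{Rooms}) D_1 + \dots + \beta_{61} \ln(\text{Rooms}) D_{57} + u_1 \tag{5.4}$$

$$\ln(\text{Price of the House}) = \beta_0 + \beta_1 \ln(\text{Rooms}) + \beta_2 \ln(\text{Floors}) + \beta_3 \ln(\text{Green Area}) + \beta_4 \ln(\text{Total Area}) + \beta_5 \ln(\text{Floors}) D_1 + \dots + \beta_{61} \ln(\text{Floors}) D_{57} + u_2 \tag{5.5}$$

$$\ln(\text{Price of the House}) = \beta_0 + \beta_1 \ln(\text{Rooms}) + \beta_2 \ln(\text{Floors}) + \beta_3 \ln(\text{Green Area}) + \beta_4 \ln(\text{Total Area}) + \beta_5 \ln(\text{Green Area}) D_1 + \dots + \beta_{61} \ln(\text{Green Area}) D_{57} + u_3 \tag{5.6}$$

$$\ln(\text{Price of the House}) = \beta_0 + \beta_1 \ln(\text{Rooms}) + \beta_2 \ln(\text{Floors}) + \beta_3 \ln(\text{Green Area}) + \beta_4 \ln(\text{Total Area}) + \beta_5 \ln(\text{Total Area}) D_1 + \dots + \beta_{61} \ln(\text{Total Area}) D_{57} + u_4 \tag{5.7}$$

where $D_1, \dots, D_{57}$ are the dummy variables for each neighborhood and $u_i$ is the error term for each model, $i = 1, 2, 3, 4$.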
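In these interaction models, the slope of the interacted variable in neighborhood $j$ is the base coefficient plus the coefficient of the corresponding interaction term, and the baseline neighborhood keeps the base coefficient alone. The sketch below illustrates this with simulated data for four hypothetical neighborhoods and a single regressor; the slopes, the intercept and the noise level are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(4)

n_hoods = 4                     # small hypothetical example; the thesis uses 58
n = 4000
hood = rng.integers(0, n_hoods, size=n)
ln_rooms = rng.normal(1.0, 0.3, size=n)

# Hypothetical neighborhood-specific slopes; the last neighborhood is the baseline.
true_slopes = np.array([0.30, 0.10, 0.25, 0.20])
ln_price = 5.0 + true_slopes[hood] * ln_rooms + rng.normal(0.0, 0.05, size=n)

# Interaction regressors ln(Rooms) * D_j for every neighborhood except the baseline.
D = (hood[:, None] == np.arange(n_hoods - 1)[None, :]).astype(float)
X = np.column_stack([np.ones(n), ln_rooms, ln_rooms[:, None] * D])
coef = np.linalg.lstsq(X, ln_price, rcond=None)[0]

# Slope in neighborhood j = base slope + interaction coefficient;
# the baseline neighborhood has the base slope only.
base_slope = coef[1]
hood_slopes = np.append(base_slope + coef[2:], base_slope)
print(np.round(hood_slopes, 3))
```

The recovered slopes are close to the values used to generate the data, which is the sense in which the sum of the two coefficients gives the neighborhood-specific effect.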

Table 5.4 shows some test statistics, such as the F-test for the significance of the interactive terms included in this specification and the Durbin-Watson statistic, and the adjusted $R^2$ for (5.4), (5.5), (5.6) and (5.7). With these statistics we can see that the parameters of the regressors in these models are jointly different from zero (the p-value of the F statistic for each model is less than 5%, so we reject the null hypothesis that the parameters are equal to zero), and the adjusted $R^2$ lies between 0.55 and 0.58 for each model.

| Model | F-Test | p-value | Adjusted $R^2$ | Durbin-Watson |
|---|---|---|---|---|
| (5.4) | 600.547 | 0.000 | 0.554 | 1.152 |
| (5.5) | 606.155 | 0.000 | 0.556 | 1.156 |
| (5.6) | 674.823 | 0.000 | 0.578 | 1.165 |
| (5.7) | 611.291 | 0.000 | 0.558 | 1.149 |

Table 5.4: F-Test and adjusted $R^2$ for (5.4), (5.5), (5.6) and (5.7)

Now, we want to know how the price of a house changes if we take more than one interactive term at the same time, this is, we run two additional models:

$$\ln(\text{Price of the House}) = \beta_0 + \beta_1 \ln(\text{Rooms}) + \beta_2 \ln(\text{Floors}) + \beta_3 \ln(\text{Green Area}) + \beta_4 \ln(\text{Total Area}) + \beta_5 \ln(\text{Rooms}) D_1 + \dots + \beta_{61} \ln(\text{Rooms}) D_{57} + \beta_{62} \ln(\text{Floors}) D_1 + \dots + \beta_{118} \ln(\text{Floors}) D_{57} + u_1 \tag{5.8}$$

$$\ln(\text{Price of the House}) = \beta_0 + \beta_1 \ln(\text{Rooms}) + \beta_2 \ln(\text{Floors}) + \beta_3 \ln(\text{Green Area}) + \beta_4 \ln(\text{Total Area}) + \beta_5 \ln(\text{Green Area}) D_1 + \dots + \beta_{61} \ln(\text{Green Area}) D_{57} + \beta_{62} \ln(\text{Total Area}) D_1 + \dots + \beta_{118} \ln(\text{Total Area}) D_{57} + u_2 \tag{5.9}$$

Table 5.5 shows some test statistics, such as the F-test for the significance of the aggregated interactive terms of this specification and the Durbin-Watson statistic, and the adjusted $R^2$ for (5.8) and (5.9). We can see that the parameters of the regressors in these models are jointly significant (the p-value of the F statistic for each model is less than 5%, so we reject the null hypothesis that the parameters are equal to zero), and the adjusted $R^2$ is 0.564 for (5.8) and 0.616 for (5.9).

| Model | F-Test | p-value | Adjusted $R^2$ | Durbin-Watson |
|---|---|---|---|---|
| (5.8) | 333.285 | 0.000 | 0.564 | 1.167 |
| (5.9) | 395.153 | 0.000 | 0.616 | 1.186 |

Table 5.5: F-Test and adjusted $R^2$ for (5.8) and (5.9)

Additionally, from the results of (5.6) and (5.7), we can create groups of neighborhoods in the different cities according to the standard deviations of the estimates, which turn out to be similar among neighborhoods. Specifically, we create ten groups based on the results of these models. For (5.6) these ten groups are as follows:

E1: if the neighborhood is located in Cuenca and the standard deviation in (5.6) is 0.005.

E2: if the neighborhood is located in Cuenca and the standard deviation in (5.6) is 0.006.

E3: if the neighborhood is located in Cuenca and the standard deviation in (5.6) is 0.007.

E4: if the neighborhood is located in Guayaquil and the standard deviation in (5.6) is 0.005.

E5: if the neighborhood is located in Guayaquil and the standard deviation in (5.6) is 0.006.

E6: if the neighborhood is located in Guayaquil and the standard deviation in (5.6) is 0.007.

E7: if the neighborhood is located in Quito and the standard deviation in (5.6) is 0.005.

E8: if the neighborhood is located in Quito and the standard deviation in (5.6) is 0.006.

E9: if the neighborhood is located in Quito and the standard deviation in (5.6) is 0.007.

E10: all the remaining scenarios.
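The grouping rule listed above can be written as a small lookup. The sketch below is one possible encoding; the function name `assign_group` is hypothetical, and it assumes the standard deviations are compared after rounding to three decimals, which the text does not state explicitly.

```python
def assign_group(city, std_dev):
    """Return the group label E1..E10 for a neighborhood, given its city and
    the standard deviation of its estimate in (5.6)."""
    city_offset = {"Cuenca": 0, "Guayaquil": 3, "Quito": 6}
    level = {5: 1, 6: 2, 7: 3}            # 0.005, 0.006, 0.007 in thousandths
    key = round(std_dev * 1000)           # assumption: compare at three decimals
    if city in city_offset and key in level:
        return f"E{city_offset[city] + level[key]}"
    return "E10"                          # all the remaining scenarios

examples = [
    ("Cuenca", 0.005),
    ("Guayaquil", 0.006),
    ("Quito", 0.007),
    ("Quito", 0.010),
]
for city, sd in examples:
    print(city, sd, assign_group(city, sd))
```

Each neighborhood falls into exactly one group, so the nine dummies E1 to E9 can enter the regression with E10 as the omitted category.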

Then, we model:

$$\ln(\text{Price of the House}) = \beta_0 + \beta_1 \ln(\text{Rooms}) + \beta_2 \ln(\text{Floors}) + \beta_3 \ln(\text{Green Area}) + \beta_4 \ln(\text{Total Area}) + \beta_5 \ln(\text{Green Area}) E_1 + \dots + \beta_{13} \ln(\text{Green Area}) E_9 + u_1 \tag{5.10}$$

We follow the same procedure with the results of the interactive model in (5.7):

$$\ln(\text{Price of the House}) = \beta_0 + \beta_1 \ln(\text{Rooms}) + \beta_2 \ln(\text{Floors}) + \beta_3 \ln(\text{Green Area}) + \beta_4 \ln(\text{Total Area}) + \beta_5 \ln(\text{Total Area}) F_1 + \dots + \beta_{13} \ln(\text{Total Area}) F_9 + u_2 \tag{5.11}$$

As we can see in Figure A.2, the interaction suggested in (5.10) gives a coefficient of 0.217 for the Green Area variable, compared with 0.299 in (5.6); that is, the coefficient for this variable decreases by 0.082.

Also, Figure A.3, which shows the results for the interaction suggested in (5.11), gives a coefficient of 0.762 for the Total Area variable, compared with 0.774 in (5.7); that is, the coefficient for this variable decreases by 0.012.

Table 5.6 shows some test statistics, such as the F-test and the Durbin-Watson statistic, and the adjusted $R^2$ for (5.10) and (5.11). We can see that the parameters of the regressors in these models are jointly significant (the p-value of the F statistic for each model is less than 5%, so we reject the null hypothesis that the parameters are equal to zero), and the adjusted $R^2$ is 0.561 for (5.10) and 0.574 for (5.11).

| Model | F-Test | p-value | Adjusted $R^2$ | Durbin-Watson |
|---|---|---|---|---|
| (5.10) | 3091.870 | 0.000 | 0.561 | 1.173 |
| (5.11) | 3551.437 | 0.000 | 0.574 | 1.204 |

Table 5.6: F-Test and adjusted $R^2$ for (5.10) and (5.11)

Now, in order to know the total effect of one of the independent variables (e.g. the number of rooms) on the price of the house when another characteristic is added (e.g. the number of floors, the square meters of green area, or the square meters of total area), depending on its location, we estimate aggregated models for the different independent variables as follows:

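$$\ln(\text{Price of the House}) = \beta_0 + \beta_1 \ln(\text{Rooms}) + \beta_2 \ln(\text{Floors}) + \beta_3 \ln(\text{Green Area}) + \beta_4 \ln(\text{Total Area}) + \beta_5 (\ln(\text{Rooms}) + \ln(\text{Floors})) D_1 + \dots + \beta_{61} (\ln(\text{Rooms}) + \ln(\text{Floors})) D_{57} + u_1 \tag{5.12}$$

$$\ln(\text{Price of the House}) = \beta_0 + \beta_1 \ln(\text{Rooms}) + \beta_2 \ln(\text{Floors}) + \beta_3 \ln(\text{Green Area}) + \beta_4 \ln(\text{Total Area}) + \beta_5 (\ln(\text{Rooms}) + \ln(\text{Green Area})) D_1 + \dots + \beta_{61} (\ln(\text{Rooms}) + \ln(\text{Green Area})) D_{57} + u_2 \tag{5.13}$$

$$\ln(\text{Price of the House}) = \beta_0 + \beta_1 \ln(\text{Rooms}) + \beta_2 \ln(\text{Floors}) + \beta_3 \ln(\text{Green Area}) + \beta_4 \ln(\text{Total Area}) + \beta_5 (\ln(\text{Rooms}) + \ln(\text{Total Area})) D_1 + \dots + \beta_{61} (\ln(\text{Rooms}) + \ln(\text{Total Area})) D_{57} + u_3 \tag{5.14}$$

$$\ln(\text{Price of the House}) = \beta_0 + \beta_1 \ln(\text{Rooms}) + \beta_2 \ln(\text{Floors}) + \beta_3 \ln(\text{Green Area}) + \beta_4 \ln(\text{Total Area}) + \beta_5 (\ln(\text{Floors}) + \ln(\text{Total Area})) D_1 + \dots + \beta_{61} (\ln(\text{Floors}) + \ln(\text{Total Area})) D_{57} + u_4 \tag{5.15}$$

$$\ln(\text{Price of the House}) = \beta_0 + \beta_1 \ln(\text{Rooms}) + \beta_2 \ln(\text{Floors}) + \beta_3 \ln(\text{Green Area}) + \beta_4 \ln(\text{Total Area}) + \beta_5 (\ln(\text{Floors}) + \ln(\text{Green Area})) D_1 + \dots + \beta_{61} (\ln(\text{Floors}) + \ln(\text{Green Area})) D_{57} + u_5 \tag{5.16}$$

$$\ln(\text{Price of the House}) = \beta_0 + \beta_1 \ln(\text{Rooms}) + \beta_2 \ln(\text{Floors}) + \beta_3 \ln(\text{Green Area}) + \beta_4 \ln(\text{Total Area}) + \beta_5 (\ln(\text{Green Area}) + \ln(\text{Total Area})) D_1 + \dots + \beta_{61} (\ln(\text{Green Area}) + \ln(\text{Total Area})) D_{57} + u_6 \tag{5.17}$$

where $D_1, \dots, D_{57}$ are the dummy variables for each neighborhood and $u_i$ is the error term for each model, $i = 1, \dots, 6$.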
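One way to read the aggregated specifications, taking (5.12) as an example, is to write out what the aggregated term implies for a house in neighborhood $j$ (so that $D_j = 1$ and the other dummies are zero). The derivation below is a reading of the equation as stated, not an additional result of the thesis:

```latex
\begin{aligned}
\beta_{4+j}\,\bigl(\ln(\text{Rooms}) + \ln(\text{Floors})\bigr)\,D_j
  &= \beta_{4+j}\,\ln(\text{Rooms}\cdot\text{Floors})\,D_j, \\[4pt]
\left.\frac{\partial \ln(\text{Price of the House})}{\partial \ln(\text{Rooms})}\right|_{D_j = 1}
  &= \beta_1 + \beta_{4+j}, \\[4pt]
\left.\frac{\partial \ln(\text{Price of the House})}{\partial \ln(\text{Floors})}\right|_{D_j = 1}
  &= \beta_2 + \beta_{4+j}.
\end{aligned}
```

Under this reading, the aggregated term shifts the elasticities of both variables by the same amount $\beta_{4+j}$ in neighborhood $j$, whereas (5.8) allows a separate shift for each variable.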

Table 5.7 shows the F-test, the Durbin-Watson statistic, and the adjusted $R^2$ for (5.12), (5.13), (5.14), (5.15), (5.16), and (5.17). We can see that the parameters of the regressors in these models are jointly significant (the p-value of the F statistic for each model is less than 5%, so we reject the null hypothesis that the parameters are equal to zero), and the adjusted $R^2$ lies between 0.55 and 0.58 for each model.

| Model | F-Test | p-value | Adjusted $R^2$ | Durbin-Watson |
|---|---|---|---|---|
| (5.12) | 614.544 | 0.000 | 0.555 | 1.152 |
| (5.13) | 662.594 | 0.000 | 0.574 | 1.162 |
| (5.14) | 609.669 | 0.000 | 0.557 | 1.149 |
| (5.15) | 621.489 | 0.000 | 0.558 | 1.149 |
| (5.16) | 668.844 | 0.000 | 0.576 | 1.162 |
| (5.17) | 647.557 | 0.000 | 0.568 | 1.157 |

Table 5.7: F-Test and adjusted $R^2$ for (5.12), (5.13), (5.14), (5.15), (5.16), and (5.17)

5.2 Group Specification

As a concluding analysis, we analyze the prices of the houses through a more general group specification, which in this case refers to the three major cities considered in this thesis.

We first consider the case of the capital, Quito. When we run the same model specification as in (3.4), the number of rooms is not significant at the 5% significance level; therefore, we remove this variable from the model and run a new regression without it. According to this regression, and reading the coefficients as elasticities as in the log specification (5.1), the price of a house located in Quito increases by 0.549% for a 1% increase in the number of floors, by 0.360% for a 1% increase in the green area, and by 0.505% for a 1% increase in the total area.

Moreover, under the specification in (3.5), that is, including the dummy variables for the neighborhoods of this city, the number of rooms is again significant at the 5% significance level, but its sign is now negative.

Now, we consider the case of Guayaquil. Again, we first estimate the model specification given in (3.4). Reading the coefficients as elasticities, as in the log specification (5.1), a 1% increase in the number of rooms increases the price of a house by 0.295%, a 1% increase in the number of floors by 0.309%, a 1% increase in the green area by 0.318%, and a 1% increase in the total area by 0.926%.

Considering the neighborhoods of this city, the previous model improves: the adjusted $R^2$ increases, and the estimates for all the independent variables are significant and have the expected signs.

Finally, for Cuenca, we also get unexpected results, given that the coefficients for the number of rooms and the number of floors are negative (-0.165 and -0.055, respectively). For this city, reading the coefficients as elasticities, as in the log specification (5.1), a 1% increase in the green area of a house increases its price by 0.044%, and a 1% increase in the total area increases it by 0.910%.

Including the neighborhoods of this city in the analysis does not improve the model, given that only 2 neighborhoods have significant parameters in this estimation and therefore the adjusted $R^2$ is the same as before.

Table 5.8 shows the F-test, the Durbin-Watson statistic, and the adjusted $R^2$ for the specifications mentioned in this section.

| Model | F-Test | p-value | Adjusted $R^2$ | Durbin-Watson |
|---|---|---|---|---|
| (3.4) for Quito | 1066.282 | 0.000 | 0.435 | 1.572 |
| (3.5) for Quito | 480.371 | 0.000 | 0.600 | 1.636 |
| (3.4) for Guayaquil | 14661.841 | 0.000 | 0.801 | 1.110 |
| (3.5) for Guayaquil | 6061.476 | 0.000 | 0.821 | 1.127 |
| (3.4) for Cuenca | 1549.822 | 0.000 | 0.376 | 1.364 |
| (3.5) for Cuenca | 1036.109 | 0.000 | 0.376 | 1.365 |

Table 5.8: Statistics for different model specifications for Quito, Guayaquil and Cuenca

5.3 General Review

The main aim of this thesis is to find the most accurate model to explain the prices of houses as a function of their characteristics. This model has to take into account that there exist unobserved variables that could bias the estimates of the observed ones (and therefore make them inconsistent) due to possible endogeneity in the model.

Having explored many combinations of the hedonic model (3.1) in Section 5.1, we find that the slopes of the independent variables vary as indicated in Figure 5.1.

Figure 5.1 shows the estimates for the regressors in the different models discussed in Section 5.1. We can see that, in general, these coefficients do not differ much across the specifications, especially for the Green Area and the Total Area; however, for the Number of Rooms and the Number of Floors, there is more variation between the estimates. We also see that for the general group specification the estimates of all the variables are very different from the other models, and in some cases inconsistent with reality (regarding the sign of the estimates), so the specifications that consider the three cities separately seem to show endogeneity problems. Table A.1 also shows these estimates.

On the other hand, when we analyze the different results of the models considering the neighborhoods in each city, we can see that the interactive terms do produce important changes in the estimates of the variables and allow us to identify the differences of buying a house in a specific neighborhood of the cities. This is shown in Figures 5.2, 5.3, 5.4 and 5.5.


Figure 5.1: Coefficients for the independent variables in the different models of Section 5.1

Figure 5.2: Performance of the models for the estimates of the Number of Rooms

Moreover, Figure 5.6 shows the statistics for all the models discussed in Section 5.1. We can see that the statistics, the F-test and the Adjusted R2, are different for the models (5.1), (5.10), (5.11) and for the general group specifications. We know that large values of the F-test statistic give evidence against the null hypothesis that all the slope coefficients are zero, that is, in favor of the regression equation as a whole being significant. Note that a large F is induced by a large value of R2. Moreover, the F statistic is a measure of the loss of fit (namely, all of R2) that results when we impose the restriction that all the slopes are zero. If F is large, then this hypothesis is rejected. We clearly see that in our case, the joint test between the models tells us that models with the form of (5.1), (5.10), (5.11) will show a lower loss of fit and will therefore be more accurate to use. Special consideration must be given to the F-test of the group specifications: even though their values are higher than the others, especially for Guayaquil, we have said that the estimates might be biased and that the regressions will therefore give us inconsistent results, so that we cannot rely only on the value of this test.
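The link between the F statistic and R2 can be made explicit with a small computation. The following is a minimal sketch, assuming the standard overall-significance F test for a regression with an intercept and the unadjusted R2 (the tables in this chapter report the Adjusted R2). The function name, the sample size and the number of regressors are hypothetical and are not taken from the estimated models.

```python
def f_statistic_from_r2(r2: float, n_obs: int, n_slopes: int) -> float:
    """Overall F statistic for H0: all slope coefficients are zero.

    F = (R2 / k) / ((1 - R2) / (n - k - 1)), where k is the number of
    slope coefficients and n is the number of observations.
    """
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    dof_resid = n_obs - n_slopes - 1
    if n_slopes <= 0 or dof_resid <= 0:
        raise ValueError("need n_slopes > 0 and n_obs > n_slopes + 1")
    return (r2 / n_slopes) / ((1.0 - r2) / dof_resid)


# Hypothetical illustration: same n and k, two different values of R2.
f_low = f_statistic_from_r2(0.30, n_obs=1000, n_slopes=4)
f_high = f_statistic_from_r2(0.60, n_obs=1000, n_slopes=4)
print(f_low, f_high)
```

For a fixed sample size and number of regressors, the statistic grows with R2, which is why a high F alone cannot distinguish a well-specified model from one whose estimates are biased.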

Figure 5.3: Performance of the models for the estimates of the Number of Floors

Figure 5.4: Performance of the models for the estimates of the Green Area

Additionally, another statistic that can be used to select a model is the information criterion. Cameron and Trivedi (2005) argue that “the model with the smallest information criterion is preferred taking into account that the information criteria are log-likelihood criteria with degrees of freedom adjustment”. They say that this is because “there exists a tension between model fit, as measured by the maximized log-likelihood value, and the principle of parsimony that favors a simple model. The fit of the model can be improved by increasing model complexity. However, parameters are only added if the resulting improvement in fit sufficiently compensates for loss of parsimony”. Figure 5.7 shows the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) for all the previous models. According to this statement, the models analyzed with the group specification (Quito, Guayaquil and Cuenca separately) should be chosen. However, as said before, their estimates are biased and the parameters are inconsistent, so we cannot distinguish between the prices in different neighborhoods of each city.
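The trade-off between fit and parsimony can be sketched numerically. The example below is only an illustration under the usual definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L; the candidate labels, log-likelihood values, parameter counts and sample size are hypothetical and do not correspond to any model estimated in this thesis.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical candidates: (label, maximized log-likelihood, number of parameters).
candidates = [("small", -1250.0, 5), ("large", -1230.0, 20)]
n_obs = 2000  # hypothetical sample size

# The preferred model is the one with the smallest criterion value.
best_by_aic = min(candidates, key=lambda c: aic(c[1], c[2]))[0]
best_by_bic = min(candidates, key=lambda c: bic(c[1], c[2], n_obs))[0]
print(best_by_aic, best_by_bic)
```

In this made-up case the two criteria disagree, because the BIC penalizes each extra parameter by ln(n) rather than by 2. Neither criterion says anything about whether the estimates are consistent, which is why the criteria are combined here with the analysis of the signs of the coefficients.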

Therefore, according to the AIC and BIC of the remaining specifications treated in Section 5.1, the model described in (5.9) should be chosen. As said before, it includes the interaction terms between the quantitative variables and the neighborhoods of each city, so that it is feasible to distinguish the prices among these places.


Figure 5.5: Performance of the models for the estimates of the Total Area

Figure 5.6: Statistics for the models of Section 5.1


Chapter 6

Conclusion

Hedonic regression models constitute an accurate methodology to establish housing prices even though unobserved heterogeneity can bias the estimates of the model. Cross-market information and repeated transactions are enough to overcome the omitted-variables problem when the model is estimated.

This, together with fixed effects estimation for the unobserved attributes that are time-invariant, and for the correlated individual unobservables that are time-invariant when panel data is available, produces consistent estimators using hedonic regressions. In the specific case of house prices, we assume that the omitted variables in a specific neighborhood, such as the crime rate, the pollution level, the curb appeal of the house, and others, are the same for all the individuals.

When the model is misspecified, as in the general group specification, we get inconsistencies in the estimates of the parameters of the independent variables, such as a negative sign for the Number of Rooms variable or even its non-significance.

This research shows that the differentiation between the neighborhoods in each city is the most important factor for treating the effects of the omitted variables when panel data is available. The estimates produced with these specifications are representative and help to get rid of the bias in the model. Moreover, including the fixed effects for each neighborhood helps to show the dynamics of housing prices in a real economic market, and the approach can be extended to any other market where information is available.
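The role of neighborhood fixed effects can be illustrated with simulated data. The following is a minimal sketch and not the estimation carried out in this thesis: all numbers (a true elasticity of 0.7, the number of neighborhoods, the noise levels) are hypothetical, and the within (demeaning) transformation is used as a stand-in for including one dummy per neighborhood, which yields the same slope estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

n_neighborhoods, houses_per_nbhd = 50, 100
true_elasticity = 0.7  # hypothetical value, not an estimate from the thesis

# Unobserved neighborhood quality (e.g. safety), shared by all houses in it.
quality = rng.normal(size=n_neighborhoods)
nbhd = np.repeat(np.arange(n_neighborhoods), houses_per_nbhd)

# Larger houses cluster in better neighborhoods, so ln(area) is correlated
# with the omitted neighborhood effect.
ln_area = 4.5 + 0.5 * quality[nbhd] + rng.normal(scale=0.3, size=nbhd.size)
ln_price = (5.0 + true_elasticity * ln_area + 0.8 * quality[nbhd]
            + rng.normal(scale=0.2, size=nbhd.size))


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x with an intercept."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def demean_by_group(values, groups, n_groups):
    """Subtract the group mean from every observation (within transformation)."""
    sums = np.bincount(groups, weights=values, minlength=n_groups)
    counts = np.bincount(groups, minlength=n_groups)
    return values - (sums / counts)[groups]


pooled = ols_slope(ln_area, ln_price)  # ignores neighborhoods: biased upward here
within = ols_slope(demean_by_group(ln_area, nbhd, n_neighborhoods),
                   demean_by_group(ln_price, nbhd, n_neighborhoods))
print(pooled, within)
```

In this simulation the pooled regression attributes part of the neighborhood effect to the area of the house, while the within estimate stays close to the value used to generate the data.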

Specifically for the Ecuadorian market, the neighborhood La Concepcion, located in the north of Quito, is the most expensive one, and the square meters of green area of the house is the most significant variable influencing its price. However, Garcia Moreno, the most expensive neighborhood in Guayaquil, is in some specifications more expensive than La Concepcion, which is in fact true given the conditions of the economic market in both cities. More details about the coefficients and their interpretation are given in Section A.2, where the significant estimates of some of the interaction models are shown.


Appendix A

Outputs

A.1 Estimates of the specifications described in Chapter 5

Model        ln(Rooms)   ln(Floors)   ln(Green Area)   ln(Total Area)
(5.1)         0.185       0.278        0.209            0.740
(5.3)         0.169       0.234        0.180            0.679
(5.4)         0.509       0.248        0.191            0.704
(5.5)         0.180       0.972        0.191            0.715
(5.6)         0.177       0.223        0.299            0.694
(5.7)         0.174       0.244        0.185            0.774
(5.8)        -0.118       1.381        0.184            0.693
(5.9)         0.167       0.192        0.682            0.290
(5.10)        0.128       0.216        0.217            0.749
(5.11)        0.121       0.219        0.199            0.762
(5.12)        0.420       0.507        0.190            0.707
(5.13)        0.278       0.227        0.274            0.694
(5.14)        0.246       0.244        0.186            0.759
(5.15)        0.175       0.335        0.185            0.771
(5.16)        0.176       0.346        0.284            0.696
(5.17)        0.176       0.234        0.234            0.744
Quito1       -0.270       0.549        0.360            0.505
Quito2       -0.078       0.347        0.259            0.297
Guayaquil1    0.295       0.309        0.318            0.926
Guayaquil2    0.242       0.313        0.268            0.750
Cuenca1      -0.165      -0.055        0.044            0.910
Cuenca2      -0.164      -0.056        0.044            0.910

Table A.1: Estimates for the explanatory variables in the different models of Section 5.1

A.2 Significant coefficients for the model suggested according to the AIC and BIC described in Chapter 5

Figure A.1 shows the output for model (5.9), the one suggested by the AIC and BIC criteria, which incorporates interaction and aggregated terms in the specification. Here, it is clear that a change in the green area has a larger influence on the price of the house (its estimate is the highest one, 0.682) than a change in the number of rooms (0.167), the number of floors (0.192) or the total area (0.290).

Since the model is specified in logarithms, these estimates are elasticities: if the number of rooms increases by 1%, the price of the house increases by approximately 0.167%. If the number of floors increases by 1%, the price increases by 0.192%. Also, if the total area increases by 1%, the price increases by 0.290%.
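Because both the price and the regressors enter the model in logarithms (the dependent variable in the outputs is LN_Price), each coefficient can be read as an elasticity. The following is a small sketch of that reading, using the coefficients of model (5.9) quoted in the text; the function name and dictionary keys are hypothetical. It computes the exact percentage change in price implied by a given percentage change in a regressor, holding the other regressors fixed.

```python
def pct_price_change(elasticity: float, pct_change_x: float) -> float:
    """Exact percentage change in price implied by a log-log coefficient.

    If ln(price) = ... + elasticity * ln(x), multiplying x by (1 + p/100)
    multiplies the price by (1 + p/100) ** elasticity.
    """
    return ((1.0 + pct_change_x / 100.0) ** elasticity - 1.0) * 100.0


# Coefficients of model (5.9) as reported in the text.
coefficients = {
    "rooms": 0.167,
    "floors": 0.192,
    "green_area": 0.682,
    "total_area": 0.290,
}

for name, beta in coefficients.items():
    # Effect of a 1% and of a 10% increase in each regressor.
    print(name, pct_price_change(beta, 1.0), pct_price_change(beta, 10.0))
```

For a 1% change the exact effect is very close to the coefficient itself; for larger changes, such as 10%, the two start to diverge, so the coefficient should be read as an approximation for small changes.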

Moreover, we can see that if, for example, the house is located in the neighborhood Bellavista, in Cuenca, a 1% change in the green area of the house changes its price by 0.085% (0.682% - 0.597%), which is less than the 0.682% obtained if it were located in the neighborhood La Concepcion, in Quito.

Additionally, if the house were located in the neighborhood Garcia Moreno, in Guayaquil, a 1% change in the total area of the house would change the price by 0.345% (0.290% + 0.055%), which is more than the 0.290% obtained if it were in La Concepcion, in Quito.

The same interpretation can be made for all the neighborhoods of the cities, compared with the baseline considered for these models (which in this case is the neighborhood La Concepcion). For the other specifications, the estimates are interpreted in the same way, but they are not detailed here since the aim of this thesis is not to interpret the estimates of all the models but to study the accuracy of the specifications.
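The comparison with the baseline neighborhood amounts to adding the interaction coefficient to the baseline coefficient. The sketch below reproduces the two cases discussed in the text; the interaction values (-0.597 and 0.055) are inferred from the arithmetic quoted there rather than read directly from Figure A.1, and the function and variable names are hypothetical.

```python
def neighborhood_elasticity(baseline: float, interaction: float = 0.0) -> float:
    """Elasticity in a neighborhood = baseline coefficient + interaction term.

    The baseline neighborhood (La Concepcion) has an interaction of zero.
    """
    return baseline + interaction


# Baseline (La Concepcion) elasticities of model (5.9), from the text.
green_area_base = 0.682
total_area_base = 0.290

# Interaction coefficients inferred from the arithmetic in the text (assumption).
bellavista_green = neighborhood_elasticity(green_area_base, -0.597)
garcia_moreno_total = neighborhood_elasticity(total_area_base, 0.055)

print(round(bellavista_green, 3), round(garcia_moreno_total, 3))
```

A negative interaction term lowers the effect of the characteristic relative to La Concepcion, and a positive one raises it.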


Figure A.2: Coefficients for model (5.10)

Coefficients (Model 1)

              Unstandardized        Standardized                       95.0% Confidence       Collinearity
              Coefficients          Coefficients                       Interval for B         Statistics
Term          B        Std. Error   Beta           t          Sig.     Lower      Upper       Tolerance   VIF
(Constant)    5.409    0.032                       166.463    0.000    5.345      5.473
LN_Rooms      0.128    0.012        0.050          10.672     0.000    0.104      0.151       0.678       1.475
LN_Floors     0.216    0.012        0.083          17.771     0.000    0.192      0.239       0.693       1.443
LN_GArea      0.217    0.003        0.283          66.456     0.000    0.211      0.224       0.835       1.197
LN_Area       0.749    0.008        0.499          89.969     0.000    0.733      0.765       0.492       2.032
IST12        -0.055    0.004       -0.059         -14.948     0.000   -0.063     -0.048       0.975       1.026
IST13        -0.049    0.003       -0.059         -14.829     0.000   -0.055     -0.042       0.972       1.029
IST14        -0.029    0.003       -0.037         -9.192      0.000   -0.035     -0.023       0.917       1.091
IST15         0.067    0.005        0.048          12.323     0.000    0.056      0.077       0.989       1.012
IST16         0.062    0.006        0.044          11.237     0.000    0.051      0.073       0.989       1.011
IST17        -0.076    0.003       -0.119         -29.242     0.000   -0.081     -0.071       0.909       1.100
IST18         0.057    0.005        0.047          11.889     0.000    0.048      0.067       0.972       1.029
IST19         0.031    0.005        0.024          5.975      0.000    0.021      0.042       0.977       1.023


Figure A.3: Coefficients for model (5.11)

Coefficients (a) (Model 1)

              Unstandardized        Standardized                       95.0% Confidence       Collinearity
              Coefficients          Coefficients                       Interval for B         Statistics
Term          B        Std. Error   Beta           t          Sig.     Lower      Upper       Tolerance   VIF
(Constant)    5.415    0.032                       168.911    0.000    5.353      5.478
LN_Rooms      0.121    0.012        0.048          10.226     0.000    0.098      0.144       0.676       1.480
LN_Floors     0.219    0.012        0.084          18.317     0.000    0.196      0.243       0.692       1.446
LN_GArea      0.199    0.003        0.259          62.354     0.000    0.193      0.206       0.848       1.179
LN_Area       0.762    0.008        0.508          92.062     0.000    0.746      0.779       0.483       2.071
IST23        -0.043    0.002       -0.072         -18.345     0.000   -0.048     -0.038       0.954       1.048
IST24        -0.020    0.003       -0.029         -7.231      0.000   -0.025     -0.015       0.919       1.088
IST25         0.237    0.007        0.132          34.275     0.000    0.224      0.251       0.986       1.014
IST26         0.063    0.004        0.065          16.890     0.000    0.055      0.070       0.978       1.022
IST27        -0.065    0.003       -0.091         -22.855     0.000   -0.071     -0.060       0.931       1.075
IST28        -0.065    0.005       -0.052         -13.504     0.000   -0.074     -0.055       0.981       1.020
IST29         0.042    0.004        0.046          11.717     0.000    0.035      0.049       0.954       1.048

a. Dependent Variable: LN_Price


Bibliography

Bajari, P., Cooley, J., Kim, K., and Timmins, C. (2010). A theory-based approach to hedonic price regressions with time-varying unobserved product attributes: The price of pollution. American Economic Review, 102:1898–1926.

Cameron, A. and Trivedi, P. (2005). Microeconometrics, Methods and Applications. Cambridge University Press, 8 edition.

De Haan, J. and Erwin, D. (2013). Hedonic regression methods, in OECD, et al., Handbook on Residential Property Price Indices. Eurostat.

Epple, D. (1987). Hedonic prices and implicit markets: Estimating demand and supply functions for differentiated products. The Journal of Political Economy, pages 59–80.

Franklin, J. and Waddell, P. (2002). A hedonic regression of home prices in King County, Washington, using activity-specific accessibility measures.

Heckman, J., Matzkin, R., and Nesheim, L. (2009). Nonparametric identification and estimation of nonadditive hedonic models. Econometrica, 78:1569–1591.

Lehner, M. (2011). Modelling housing prices in Singapore applying spatial hedonic regression.

Rosen, S. (1974). Hedonic prices and implicit markets: Product differentiation in pure competition. The Journal of Political Economy, 82:34–55.

Wooldridge, J. (2002). Introductory Econometrics: A Modern Approach. Cengage Learning, 4 edition.
