
Real Estate Reimagined

An attributional analysis on pricing and marketing strategies in Californian real estate

Master Thesis Marketing Management & Marketing Intelligence
January 15, 2018

Author
Laura Mollema (S1987348)
Friesestraatweg 390
9718 NW Groningen
lauramollema@gmail.com
+31613606618

Supervisor
Prof. Dr. P.C. Verhoef (p.c.verhoef@rug.nl)

Research institute
University of Groningen
Faculty of Economics & Business

Summary

This study investigates which physical attributes and marketing effects, in Southern California’s high-end real estate market, are most important in determining the level of the listing and closing price of residential properties. Anticipating these attributes can help real estate agents to target potential buyers and sellers more effectively, thereby increasing competition and ultimately pushing up the offering price. Marketing effects in this study consist of actual marketing activities, such as support through digital media, but also the cumulative time a property is on the market. The real estate business is characterized by relationship marketing, although little activity is actually registered, measured, and monitored by agents, which results in a lack of evidence-based marketing. Additionally, an analysis of property attributes helps agents to substantiate why certain prices are valid for a particular property, contributing to the agent’s bargaining power. Hence, the gap between the stage in which the business currently resides and the ambitious stage to which it could be transformed is worth analyzing. The following research questions will be explored in this study:

‘Based on physical property attribute attractiveness and marketing effects, what insights can quantitative research provide for an effective marketing and pricing strategy that supports real estate agents and companies?’

1. What is the importance and significance of the physical property attributes and marketing effects, and in which direction do these affect the listing and closing price?

2. Taking these elements into consideration, what would be suggested for the direction of the marketing and pricing strategy?


Although the linearized version of the model that was used for estimation appeared not to be the optimal functional form, some careful conclusions can be drawn. The attributes that are most important in determining the listing and closing price are the home and lot square footage, the distance to the ocean, the type of unit (front or back), and the number of common walls. The second most important attributes are the number of full bathrooms, the number of levels, whether the property is equipped with a cooling system, noise nuisance from Pacific Coast Highway, and whether a rooftop deck or balcony is present. Unfortunately, not all hypothesized relations were significant, which leaves room for data and model improvement. With regard to the pricing and marketing strategies, the key is to integrate both strategies so that the whole is greater than the sum of its parts. This can be initiated through additional quantitative research such as choice-based conjoint analysis. The information gathered can be implemented in the pricing strategy in such a way that the attributes with the highest part-worth utilities receive more weight. More explicitly, when the number of bedrooms is highly valued by buyers, a higher listing price can be set for a property with more bedrooms than for properties that are otherwise equal, for instance in home square footage. Implementation in the marketing strategy should then result in an emphasis on the number of bedrooms.


Preface

This study was conducted by L.D.A. (Laura) Mollema, from Groningen, The Netherlands. A particular personal interest initiated the topic of this thesis. After I visited the United States in 2015, my interest in the country kept growing. While graduating from my PrMSc track in June 2016, I got an e-mail from Hanze University of Applied Sciences, where I took my bachelor’s from 2012 to 2015: my bachelor thesis had been nominated for the Commercial Prize by a leading membership organization that unites the most important companies and businesses in the district of Groningen. Out of a total of 120 marketing theses, my thesis was selected because of the highly valuable outcome of the research combined with substantiated and applicable marketing advice for HEAD, a stakeholder organization for finance professionals in the healthcare sector in the Netherlands. During the graduation ceremony of July 5th 2016, my thesis was finally announced as the prize-winning thesis, receiving a reward and a free membership of the “Commercieele Club Groningen” (CCG), handed over by one of the board members of CCG. Another consequence was that I had to give a pitch on September 12th 2016 at the CCG, on a night that was coincidentally American-themed. I pitched my future plans and how I wanted to move to Los Angeles and find my way in marketing analytics. By the end of the night one man, Chris, hinted that he knew someone in the States, Josef, who lived just south of Los Angeles and was very successful in the real estate business. Chris put me in touch with Josef after the pitch; I booked my flight to Newport Beach in October 2016 and visited Josef in April 2017. I immediately received a job offer and figured that the data provided by the agency, combined with my knowledge of analytics, could take the business to the next level.

I am therefore very grateful to my supervisor, Prof. Dr. P.C. Verhoef, who allowed me to study this topic even though I was initially assigned to his group to perform a study on customer journey touch points. I also want to thank Mr. Verhoef for his time and constructive feedback throughout the writing process. Another professor I would like to thank is Dr. K. Dehmamy, who was not my supervisor but helped me out with some puzzles in R Studio.

Table of contents

Introduction
1. Theory
1.1 An introduction to the business of (Southern California) Real Estate
1.1.1 General market of real estate
1.1.2 Agent’s interference
1.2 Conceptual model
1.2.1 Property attributes
1.2.2 Marketing activities
1.2.3 Listing and closing price
2. Design
2.1 Data
2.2 Method
2.3 Descriptives
2.4 Missings
2.5 Histograms, bar charts, scatterplots
3. Results
3.1 Specification OLS1, OLS2, OLS3
3.2 Non-normality
3.3 Endogeneity
3.4 Multicollinearity
3.5 Heteroscedasticity
3.6 Functional form
3.7 Summary
3.8 Final model specification and estimation
3.8.1 Estimation OLSF1
3.8.2 Estimation OLSF2
3.8.3 Estimation OLSF3
3.9 Validation
3.9.2 Statistical validity
4. Conclusion
4.1 Discussion
4.2 Limitations
5. Recommendations and implications
Literature
Appendix A – Map of the State of California, USA
Appendix B – Property selections
Appendix C – Linearized models in R
Appendix D – Histograms
Appendix E – Barcharts
Appendix F – Scatterplots
Appendix G1 – Residual plots and histograms OLS2, OLS3 for DataImp

Introduction

The US mortgage market essentially imploded between January and August 2007 (FT, 2017; Malpezzi, 2017). Now, about ten years later, consumer confidence in the Western world has not yet been restored to its pre-crisis levels, but it is growing nevertheless and is most promising in North America (Nielsen, 2017). Apparently, compared to previous years, people are less afraid to spend and to invest, which has affected the world economy positively. Liao, Zhao and Sing in their 2013 study also confirmed that ‘the consumption change induced by house price appreciation (in the article called the ‘housing wealth effect’) is dependent on households’ attitudes toward risk’. They indicate that ‘the estimation results suggest a significant negative relationship between the housing wealth effect and households’ risk attitudes. Households, who are less risk averse, experience greater consumption changes in response to house price appreciation’. Consequently, in relation to the (minor) increase in consumer confidence, both durable and non-durable consumer goods as well as services are being consumed at an increasing rate (Fereidouni & Tajaddini, 2015). This also entails that the housing market has been crawling back up after hitting rock bottom several years ago. In California, for instance, the market is booming with skyrocketing housing prices, and so is the competition (Collins, 2017; Khouri, 2016).

So, given what has happened in the past ten years and taking the current level of competition into account, what does this imply for the future of US residential real estate agencies? Despite the


become a real estate agent, that is, become licensed. However, only a small percentage of the agents can actually make a living out of it and an even smaller percentage becomes very successful. This raises the question: What do successful agents do differently from non-successful agents and how does this affect the path to purchase or even bigger, the real estate market as a whole? Astonishingly, little empirical research has been performed in this area; an area within an industry that has affected the world economy on a tremendous scale throughout the years.

Another element that contributes to the importance of this research is that for most people buying a house is one of the biggest purchases they make in their lifetime. It requires investing both emotionally and financially in the search for the perfect home. Research within real estate could be enriched by including concepts from psychology that are related to marketing. This will assist agents in analyzing and explaining the behavior of real estate consumers, which is currently an underrepresented field of study and is hence unknown to many agents.

From a quantitative point of view, data on 480 properties in a coastal area (Orange County) in Southern California, US, will be analyzed. This is an especially interesting area, considering that the Orange County Register (OCR) reported that the Orange County median house price has set a record at $795,000 (Collins, 2017), and that 1-in-5 Orange County home sales tops seven figures (Lansner, 2017). Also, homes in the State of California are found to be radically more expensive than in other states (Zillow, 2017). The analysis will focus on home attributes that people might consider important when buying a property, and how these affect the desirability and price of the property. Research on attribute desirability has been performed repeatedly in recent decades (Gloudemans & Miller, 1976; Isakson, 1998; Benjamin, Guttery & Sirmans, 2004). However, these studies are often limited to investigating the effects of certain physical attributes while leaving out marketing-related determinants. Also, most existing literature has global coverage or is oriented at regions culturally very dissimilar to the United States, such as Asia. Therefore, it does not allow for a focus on issues that are particular to a specific region, such as Orange County. For this study, not only will the most influential attributes be


study, several research questions and hypotheses will structure the research process. First, the research question and its two sub-questions are presented below.

‘Based on physical property attribute attractiveness and marketing effects, what insights can quantitative research provide for an effective marketing and pricing strategy that supports real estate companies?’

1. What is the importance and significance of the physical property attributes and marketing effects, and in which direction do these affect the listing and closing price?

2. Taking these elements into consideration, what would be suggested for the direction of the marketing and pricing strategy?


1. Theory

This section presents an overview of the scientific theories that are relevant for the current study. Some theories are of a more informative nature, while others will be tested and are therefore hypothesized within the scope of this research. First, some introductory pages on the business of real estate should give the reader a feeling for the issues that occupy the market. Next, a conceptual model is suggested. Subsequently, the theory that is necessary to understand the conceptual model and all the elements of the research question will be discussed.

1.1 An introduction to the business of (Southern California) Real Estate

Since real estate is not a market that is frequently highlighted in marketing literature, some introduction to and clarification of this topic is desirable. In this introduction to the business of real estate, definitions, industry dynamics, prices, market power, and an agent’s interference are discussed below, in that order. Additionally, a map of the state of California with a mark for Orange County is enclosed in appendix A.

1.1.1 General market of real estate

in determining the importance of attributes later on, both the square footage (sqft) of the building/structure and the lot will be estimated.

Even though the definition in the Business Dictionary is quite specific, it still leaves room for further distinction. First, a distinction should be made for the type of real estate, namely residential real estate. Residential real estate (or private real estate) focuses on single-family homes that serve as housing properties. The term ‘real estate’ in this study only entails residential real estate, which means that separate areas such as commercial and industrial real estate are not considered. For the current research, residential real estate will very simply be defined as ‘the lot and its attached structure, including the benefits and disadvantages gained from it’. Second, in general a market consists of demand and supply, or a buying and a selling side of the equation, where the seller and buyer roles may be implicit but are often very clear. In the residential real estate market, that is, the buying and selling of real estate, these roles are clear as well, but here both sides are accompanied by a real estate agent who mediates the buying/selling process as a third party. On the selling/owner side, the agent will aim for a good closing price for the owner, and on the buying side the agent will aim for a good deal for the buyer. As this explanation indicates, the interests promoted lie at opposite ends of a continuum. Therefore, an agent generally represents one of the two sides, but may in rare situations represent both sides, which is called dual agency (Johnson, Lin & Xie, 2015; Zucker & Zucker, 2016). Ample research has been performed on the effectiveness of dual agency, but this is not within the scope of this research. Next, for this research, three important types of prices are distinguished, namely the listing, offering, and closing price. Because these prices are common in the financial market, it should be noted that the following definitions were developed by the author to suit the real estate market.

The listing price indicates the price at which a property hits the market and which is determined by the real estate agent and the seller. Next, the offering price is the actual bid that potential buyers make on the


1.1.2 Agent’s interference

Now that the general market of real estate has been discussed, it is worthwhile to more broadly discuss an agent’s interference in the buying and selling process. So, prior to the actual discussion about attributes and prices, some studies on an agent’s interference are highly important to review.


buyer gets the opportunity to visit the property. However, before any buyers or sellers are interested, they first must be informed about the opposite side of the negotiation, meaning that potential sellers should be notified of potential buyers and vice versa. Subsequently, both parties must be contacted regularly through calls or emails to be informed about (or followed up on) recent developments or properties that just hit the market and might seem interesting. All these activities are aimed at convincing the party of interest to either list or purchase with the agent. When these activities suit the customer, it is reasonable to assume that they increase the probability of achieving either one of these two goals. Unfortunately, the data available for this study does not provide any particular insights into agent activity within the selling process. Therefore, the agent’s interference will not be conceptualized as a marketing attribute. Nevertheless, this remains an interesting field to study in the future.

1.2 Conceptual model

Prior to elaborating on the attributes of interest, the conceptual model is presented below in figure 1.1.


The model is a representation of the attributes and their assumed relations based on data availability and earlier studies, which will be discussed next. Also, every individual attribute will be clarified following definitions and outcomes of previous research. The impact (positive/negative) of the individual attributes is not conceptualized in the model, to preserve simplicity. The impacts are, however, hypothesized in the table of expectations at the end of this chapter.

1.2.1 Property attributes

Some previous studies (Bond, Seiler & Seiler, 2002; Ozdilek, 2016) have attempted to measure which physical property attributes of a home most significantly define price, but none of these have been performed with data from Southern California with the purpose of determining an accountable marketing and pricing strategy in such a high-end segment. For this particular study, the term ‘attributes’ is preferred as an indicator for measurement of the independent variables that affect either or both the listing and the closing price, depending on how they are modeled in figure 1.1.

(1) & (2) Square footage (SQFT) of home and lot

Bond, Seiler & Seiler (2002) list several potential attributes in their study ‘Residential Real Estate Prices: A Room with a View’. From age, air-conditioning, attic, basement, bathrooms, bedrooms, construction quality, fireplace, frontage, home SQFT, lot size, and roof style, they find that lot size and square footage of the home significantly affect a home’s value. This has also been confirmed by Adair, Berry & McGreal (1996) and Anderson & West (2006). Estimation of these separate attributes should however be performed with care, as these effects might be endogenous. Endogeneity could be present because of the simultaneity of the variables. Both measure the effect of size, which means that the square footages of the home and the lot explain (part of) the same variance in the listing and closing price. The home and lot square footage are hypothesized to have a positive effect on the listing and closing price.
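The overlap between the two size measures can be illustrated with a small diagnostic. The sketch below is not part of the thesis analysis: the six observations are made up for illustration, and the variance inflation factor (VIF) is a standard multicollinearity check that is swapped in here as one possible way to quantify how strongly home and lot square footage move together.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def vif_two_predictors(x, y):
    """VIF when a model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical observations (assumption, NOT data from the CRMLS sample).
home_sqft = [1200, 1800, 2400, 3100, 3600, 4200]
lot_sqft = [2500, 3000, 4100, 4800, 6500, 7000]

r = pearson_r(home_sqft, lot_sqft)
vif = vif_two_predictors(home_sqft, lot_sqft)
print(f"correlation home/lot sqft: {r:.3f}")
print(f"variance inflation factor: {vif:.1f}")
```

A VIF far above 1 would indicate that the separate coefficients for home and lot size are estimated imprecisely, which is the reason the two effects should be interpreted with care.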

(3) View and distance to ocean


views on home value and found this effect to be significant, date back to the early 1970s (Darling, 1973; Plattner and Campbell, 1978; Gillard, 1981; and Seiler, Bond and Seiler, 2001). For this analysis, three view categories are distinguished: (1) full view, (2) partial view, (3) no view. Benson, Hansen, Schwartz & Smersh (1998) created similar categories. For the distance to the Pacific Ocean, it is assumed that the smaller the distance, the more preferred the property. Benson, Hansen, Schwartz & Smersh (1998) distinguish between 0.1, 0.5, 1, and 2 miles from the ocean. Although the exact numbers in the following analysis might differ from their study, they find that “The percentage impacts for ocean view homes imply that a $200,000 home with no view would sell for $336,620 with an ocean view (a 68.31% increase) if located 0.1 miles from the water, $312,420 if located 0.5 miles from the water, $274,060 if located one mile from the water, and $251,280 if located two miles from the water with an ocean view” (Benson, Hansen, Schwartz & Smersh, 1998). Therefore, an ocean view and the distance to the ocean are worth considering when estimating the listing and closing price. An ocean view is expected to increase both prices, and an increase in the distance to the ocean is expected to have a negative effect on both prices.
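The percentage premiums implied by the quoted prices can be recomputed directly. The following sketch only uses the dollar amounts from the Benson, Hansen, Schwartz & Smersh (1998) quote; the function and variable names are illustrative.

```python
# Price of the no-view home in the quoted example.
BASE_PRICE = 200_000

# Quoted ocean-view prices, keyed by distance to the water in miles.
OCEAN_VIEW_PRICES = {0.1: 336_620, 0.5: 312_420, 1.0: 274_060, 2.0: 251_280}

def view_premium_pct(price_with_view, base_price=BASE_PRICE):
    """Percentage increase of the ocean-view price over the no-view price."""
    return round((price_with_view - base_price) * 100 / base_price, 2)

premiums = {miles: view_premium_pct(price)
            for miles, price in OCEAN_VIEW_PRICES.items()}

for miles, pct in premiums.items():
    print(f"{miles} miles from the water: +{pct}%")
```

The premium shrinks as the distance grows, which matches the hypothesized negative effect of distance to the ocean.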

(4) Front/back unit

(5) Age

Mok, Chan & Cho (1995) found that compared to other structural attributes, the age of the building is amongst the top three in explaining the variation in housing price. Hipp & Singh (2014), in their ‘Changing Neighborhood Determinants of Housing Price Trends in Southern California, 1960–2009’, mention that ‘whereas earlier research suggests that the age of housing stock is associated with home devaluation, age may be a proxy for the structural condition of a home (Franklin and Waddell 2003; Oates 1969). Li and Brown (1980) in Boston, and Goodman and Thibodeau (1995) in Dallas, found a positive coefficient for the quadratic housing age variable, implying that consumers are willing to pay a premium for homes with a historic quality’. In contrast, Anderson & West (2006), Seong-Hoon, Bowker & Park (2006) and Cameron (2006) found that age has a negative effect. Due to these conflicting findings, it is difficult to determine beforehand what effect age will have on the listing and closing price, and whether there is a significant effect at all.

(6) Number of bedrooms

Although the number of bedrooms desired by potential buyers may fluctuate with the number of members in the household, Adair, Berry & McGreal (1996) find that, objectively, the more bedrooms a property has, the higher the value of the home. This was also found by Seong-Hoon, Bowker & Park (2006). The number of bedrooms is therefore hypothesized to increase (decrease) the listing and closing price with every unit increase (decrease).

(7) Number of bathrooms

Listings in general distinguish between a (1) full bathroom, (2) ¾ bathroom, (3) ½ bathroom and (4) ¼ bathroom. A full bathroom includes a tub with shower, a sink and a toilet. A ¾ bathroom has a shower without a tub, plus a sink and a toilet. A ½ bathroom has a sink and a toilet, and a ¼ bathroom has just a sink. Each ¼ thus roughly corresponds to one separate facility. Earlier research has found that the presence of a bathroom increases the value of the property (Adair, Berry & McGreal, 1996; Anderson & West, 2006). Although the exact numbers will not allow for replication, according to the estimates of the traditional hedonic autoregressive model, the closing price of a house is expected to rise by $7,706 with an additional bathroom (Can, 1992). Since the present study distinguishes more specifically between types of

(8) Levels

The number of levels (stories) of a property is recorded in the data as a count. On the one hand, more levels can imply a disadvantage, as one might be obliged to take the stairs to enter the house or to go to another floor within the house. On the other hand, an advantage might be that the more levels a house contains, or the higher the structure of the building is, the bigger the chance of an ocean view compared to a one-story house on a flat lot. Mok, Chan & Cho (1995) confirm in their study that the number of levels is expected to increase the housing price. They also found that, together with age and compared to other structural attributes, the story level of the building is amongst the top three in explaining the variation in housing price. The number of levels is therefore not only expected to be an important determinant, it is also expected to positively affect both prices.

(9) Parking

If a parking spot is either on-site or assigned to the property (so potentially located elsewhere, such as in a central parking space), parking is indicated by the number of parking spots. If no parking spot is assigned to the property (for instance street parking), parking is indicated as zero. Adair, Berry & McGreal (1996) highlight that on-site parking adds to the value of the home. Can (1992) states that the closing price can also be increased significantly by having a two-car garage. Since the type of parking spots assigned to a property is not specified in the data, it will in general be assumed that the number of parking spots boosts the listing and closing price for every unit increase.

(10) Pool

(11) Common walls

A common wall or party wall is “A partition erected on a property boundary, partly on the land of one owner and partly on the land of another, to provide common support to the structures on both sides of the boundary” (Jacobus, 1986). In simple terms, it refers to one or more walls shared with neighbors. Previous research has found that an increasing number of common walls decreases the property value (Bailey, 1966). So, for every unit increase in the number of common walls, a negative impact on the listing and closing price is assumed.

(12) Temperature regulation

In-home climate can for instance be regulated by heating (central or fireplace) or by air-conditioning. Can (1992) found that both a fireplace and air-conditioning (AC) have a positive effect on housing price. Seong-Hoon, Bowker & Park (2006) also confirmed this effect for having a fireplace. Benson, Hansen, Schwartz & Smersh (1998) showed that heating through forced air, or even through hot water or a heat pump, is positively related to the housing price. In general, all elements of temperature regulation (fireplace, heating, cooling) are expected to positively influence both prices.

(13) Distance to Pacific Coast Highway

Properties that are too close to Pacific Coast Highway are negatively influenced by this location effect, as Adair, Berry & McGreal (1996) find that traffic noise decreases home value. So does Nelson (1982), who presented a sensitivity analysis that suggests a noise discount on the closing price. An additional interesting finding is that “highway noise does not lead to increased time on the market”. In line with this, Swoboda, Nega and Timm (2015) found “an average reduction in house prices of approximately 0.4 percent per additional decibel”. Being too far away, on the other hand, could compromise the convenience of highway access and increase commuting time (Leggett & Bockstael, 2000) and therefore decrease price. A major caveat of measuring both distances was pointed out by Gamble, Sauerlender & Langley (1974), who investigated both the effects of regional accessibility and highway-generated disturbances on property values. They determined a relationship among the Noise Pollution Level (NPL), distance from the highway, and reduced property values. Considering this relationship and the


distance to Pacific Coast Highway on the listing and closing price is hypothesized to have a positive effect; increasing the distance (decreasing the noise) increases property value.
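To give a feeling for the size of the noise discount reported by Swoboda, Nega and Timm (2015), the sketch below applies the rate of approximately 0.4 percent per additional decibel to a made-up price. Treating the discount as linear in decibels is a simplifying assumption of this illustration, as are the example price and the 10 dB difference.

```python
# Rate taken from the quote: approximately 0.4 percent per additional decibel.
DISCOUNT_PER_DB = 0.004

def price_after_noise(price, extra_decibels, rate=DISCOUNT_PER_DB):
    """Linear approximation (assumption): each extra decibel removes
    a fixed share `rate` of the original price."""
    return price * (1.0 - rate * extra_decibels)

example_price = 1_000_000   # hypothetical property price, not thesis data
extra_noise_db = 10         # hypothetical noise difference near the highway

adjusted = price_after_noise(example_price, extra_noise_db)
print(f"price without extra noise: ${example_price:,.0f}")
print(f"price with +{extra_noise_db} dB:        ${adjusted:,.0f}")
```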

(14) Cumulative Time On the Market (CTOM)

The CTOM is available through data of the MLS. Cumulative in the MLS means the total sum of the periods (in days) that a property is on the market. Properties might switch agents during their marketing time, which causes these separate periods of a property’s time on the market (TOM). Jud, Seaks & Winkler (1996) state that the degree of housing market illiquidity is most often measured by cumulative time on the market (CTOM). This phenomenon has been studied widely and a number of papers have identified determinants of CTOM. Belkin, Hempel & McLeavey (1976) show that CTOM is a negative function of the difference between listing price and closing price. It is assumed that the longer a property is on the market, the lower the closing price will be compared to the listing price. Yavas & Yang (199) very explicitly found that ‘The listing price affects how long it takes to find a buyer, and the time on the market influences the price that results from the bargaining between the seller and the buyer’. They also point out that ‘many studies fail to recognize the simultaneity problem between TOM and the selling (closing) price; TOM can affect selling price as selling price can affect TOM’. However, because of the complexity of the current model, this simultaneity effect will not be accounted for in the present study and is therefore suggested for future research. Haurin (1988) finds that typicality also contributes to the marketing time and determines that the more atypical the house, the longer the marketing time. However, this time can be decreased by an agent’s assistance (Baryla & Zumpano, 1995). Since no data on agent interference is available, and agent interference is therefore not included in the model, the CTOM is expected to negatively affect the listing and closing price.
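The definition of CTOM as the sum of separate listing periods can be made concrete with a short sketch. The dates below are hypothetical, and counting each period as the difference between its end and start date is an assumption; the MLS may count days differently.

```python
from datetime import date

def time_on_market(start, end):
    """Days on the market for one listing period (end date minus start date)."""
    return (end - start).days

def cumulative_time_on_market(periods):
    """CTOM: the sum of the days of all separate listing periods."""
    return sum(time_on_market(start, end) for start, end in periods)

# Hypothetical property listed twice, e.g. after switching agents.
listing_periods = [
    (date(2017, 1, 10), date(2017, 3, 1)),   # first agent
    (date(2017, 4, 15), date(2017, 5, 20)),  # second agent
]

tom_per_period = [time_on_market(s, e) for s, e in listing_periods]
ctom = cumulative_time_on_market(listing_periods)
print("TOM per period:", tom_per_period)
print("CTOM:", ctom)
```

The gap between the two periods, when the property was off the market, does not count toward the CTOM in this sketch.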

1.2.2 Marketing activities


listings. These can be considered marketing activities as well, since the agent is promoting the listing. However, the dataset used does not provide any integrated information on such activities.

(1) Video

No particular studies on the use of videos related to a listing have been performed yet. Since the data available for this study allows for observing such an effect, it will nevertheless be investigated in order to contribute to the field of digital marketing media in real estate. The effect of a video is assumed to be positive, as one might compare it to a virtual tour of the property, which is found by Allen, Cadena, Rutherford & Rutherford (2015) to have a positive effect on the closing price. Nevertheless, the estimated values should be interpreted carefully, as the video variable might be endogenous to some other aesthetic attributes of the property. An agency might choose not to shoot a video of a property that is in bad shape. However, since the latter effect will not be modeled, it is assumed that a video as a marketing activity contributes to the closing price.

(2) Pictures and number

Very interestingly, Allen, Cadena, Rutherford & Rutherford (2015) performed a study called “Effects of Real Estate Brokers' Marketing Strategies: Public Open Houses, Broker Open Houses, MLS Virtual Tours, and MLS Photographs”. This study is interesting because it accounts for the effects of both the presence of pictures and the number of pictures, and whether there is a difference in effectiveness in relation to estimating closing price. They find that initially the presence of pictures has a positive effect on the closing price. However, from zero up to five pictures, the effect of pictures is marginally negative in relation to the closing price and might even cancel out the initial positive effect of the presence of pictures. This means that agencies should always consider uploading at least six pictures in order to contribute to a rise of the closing price. The same research confirms that from seven up to ten pictures the closing price will increase. For the current study it will therefore be assumed that the number of pictures, when it exceeds seven, will have a positive effect on the closing price.


public open house all positively influence the closing price. In descending order, a virtual tour, an open house by a broker, and a public open house have the most influence on the closing price.

1.2.3 Listing and closing price

As explained in section 1.1.1, this study distinguishes between three types of prices. The data available for this study initially provides information on the listing price. However, through manual alterations of the data, the closing price is included as well. This adaptation might provide some relevant insights into how property attributes and marketing activities determine the closing price. The offering price is not available because it is considered confidential information and is therefore not disclosed. In addition to the working definitions of prices described in the introduction to real estate, plenty of theory has been written about what determines these prices. A highly relevant study by Bond, Seiler & Seiler (2002) reveals that ‘square footage and lot size significantly affect a home’s value’ and that having a very desirable view adds a substantial amount of money to the value of the home. The relevance of this study is high because the area that will be investigated is of particular interest to consumers because of its location and potential ocean view. Another study, by Ozdilek (2016), innovatively distinguishes between the land and building components of the property and how they affect property price, and consequently presents ‘the bases for the separability thesis and provided an empirical solution which can be used in practice’. A suggestion made by Cheung, Yau & Hui (2004) is that ‘in particular, estimated house price appreciation is usually systematically higher among properties that change hands more frequently’. They therefore suggest that ‘the determination of important factors affecting the transaction frequency or intensity of a housing unit should be a more fundamental research question’. Although transaction frequency is outside the scope of this study, the implications about the specific factors that influence frequency might be useful.

Verhoef, Neslin, and Vroomen (2007) indicate that product price is a proxy for the financial risk involved with the purchase of a product. The higher the price, the higher the financial risk for a consumer if (s)he makes a ‘wrong’ decision.

Table 1.1 Table of expectations

perspective, namely by differing the listing price and investigating how this affects the presumed value of the attributes.

1.2.4 Table of expectations

The table of expectations, presented below in table 1.1, summarizes all hypothesized relations from sections 1.2.1 to 1.2.3. This table will be repeated after estimation to verify the assumed relations.

Table of expectations

| Predictor | Hypothesized relation | Dependent variable |
|---|---|---|
| Property attributes (PA) | | |
| SQFT home | Positive | Listing price |
| SQFT lot | Positive | Listing price |
| Ocean view | Positive | Listing price |
| Distance to the ocean | Negative | Listing price |
| Unit Front/Back | Positive/Negative | Listing price |
| Age | Not specified | Listing price |
| # bedrooms | Positive | Listing price |
| # bathrooms | Positive | Listing price |
| Level | Positive | Listing price |
| Parking | Positive | Listing price |
| Pool | Positive | Listing price |
| Common walls | Negative | Listing price |
| Temperature regulation | Positive | Listing price |
| Distance to Coast Highway (Noise) | Positive | Listing price |
| Outdoor living space | Positive | Listing price |
| Marketing activities (MA) | | |
| Cumulative Time On the Market | Negative | Listing & closing price |
| Video | Positive | Listing & closing price |
| # of pictures | Positive | Listing & closing price |

Price


2. Design

To contribute to the field of research that aims to predict and assess attribute importance and marketing effectiveness and their effect on housing prices, this chapter discusses the data and methodology used. The most important data source for this research is the CRMLS database. CRMLS stands for California Regional MLS and is ‘The Nation’s Largest and Most Recognized MLS’ (CRMLS About US, 2017). The homepage offers a portal to a matrix wizard that is available only to members of realtor associations affiliated with this regional MLS. The author has been granted access to this matrix by Teles Properties (Newport Beach office), a Douglas Elliman company. This chapter describes the research approach step by step, starting with a clarification of the research type. Next, the population, sampling method, and sample size are described, followed by some indications of the representativeness of the sample. The final sections explain the data collection process and which types of analysis fit the resulting data.

2.1 Data

This explorative study has two empirical goals: to determine (1) which property attributes and (2) which marketing activities significantly affect the listing and the closing price for the specific sample. Insights generated from this analysis should provide support for a marketing and pricing strategy. Attributes that prove important should receive more attention. For this study, the ‘most important’ attributes are defined as those with the highest estimate (either positive or negative) in the output table of the data model analysis. In terms of attention: if the view contributes significantly to estimating the housing price, the view should be communicated prominently through all current and perhaps new channels operated by the real estate agency. Not included in this study are the many environmental influences that might affect price. Since the data are collected from a very particular geographical area, the expectation is that the results will not be affected by national-level economic factors, demographics, or infrastructure.


Avocado Drive to Poppy Avenue, resulting in a pentagon. The pentagon consists of 480 ‘luxury’ properties that were sold between 1/1/2014 and 31/10/2017. This timeframe was chosen specifically to ensure the recency of any effects that might be discovered; as mentioned in the attributes section on the age of a home, for instance, the perception of age has changed over the years. Special listing conditions, such as properties sold by banks or at auction, are not researched further. An overview of the data selection can be found in appendix B.

On the one hand, data on all 480 properties are available. On the other hand, a barrier to research on this subject was identified by Knight, Sirmans, Gelfand & Ghosh (1998), who found that ‘Real estate data are often characterized by data irregularities: missing data, censoring or truncation, measurement error, etc. Practitioners often discard missing- or censored-data cases and ignore measurement error’. Taking these findings, the specific nature of the dataset, and its initial size into account, no further sampling is executed at this stage. However, observations that come with additional conditions to the sale were removed from the dataset afterwards, since not every exception could be excluded during the data selection process. Examples are listings that are presented only for comparison with other properties, listings that actually sold before the timeframe of this study, a listing that turned out to be only a lot, and a front unit listing that had to be sold together with the back unit. The final listing deleted from the dataset concerns a property for which it was unclear whether the attributes presented in the listing belonged to the current or the proposed structure (a proposed structure is a new structure that will be built on the land after the current structure is removed). After removing these listings, the dataset contains n = 461 observations.

The findings of this research might also be representative of other ocean view areas with a high density of luxury properties. Think of Hawaii, Oregon, Washington on the West coast, and New York,


2.2 Method

In establishing the correct method for estimating the effect of the attributes and marketing activities on the listing and closing price, the following elements should be taken into account. As the first goal is to determine which attributes affect the listing and closing price, another distinction is necessary, namely to what extent the listing price also affects the closing price. This indicates that the effect of the listing price should be considered when estimating the impact of the other attributes on the closing price.

For the independent (predictor) variables, the purpose of the analysis is to highlight which attributes influence the price level, but also to what extent they cause a price change (the difference between listing and closing price). Considering these elements, the OLS model best fits this approach (Isakson, 1998). To account for all these effects, three OLS models are estimated. The first model (OLS1) regresses the listing price on the property attributes. The second model (OLS2) regresses the closing price on the property attributes and marketing activities, without including the listing price. The marketing activities are marked bold in OLS2. The third model (OLS3) regresses the closing price on the property attributes, marketing effects, and the listing price. The addition of the listing price in OLS3 is marked bold.

Note that variables that can take very large values compared to the other variables in the model are estimated with a log-transformation. This concerns the listing and closing price, the square footage of the home and of the lot, and the distances to the ocean and to Pacific Coast Highway. The designs of the three models are preliminary; the final models will be determined after running several checks on statistical validity.

Variable and dataset specification

| Role | Variable | Description | Scale |
|---|---|---|---|
| Dependent | LP | Listing price | Ratio |
| Dependent | CP | Closing price | Ratio |
| Independent | HSQF | Home square footage | Interval |
| Independent | OView | Ocean view, yes (1)/no (0) | Binary |
| Independent | DisOc | Shortest walking distance to the ocean | Interval |
| Independent | Unit | Sort of unit, back (2) or front (1) | Nominal |
| Independent | Age | Age of the property (year now - year built) | Ratio |
| Independent | Bed | Number of bedrooms | Ratio |
| Independent | Bfull | Number of full bathrooms (bath tub, shower, toilet, sink) | Ratio |
| Independent | B3qua | Number of ¾ bathrooms (shower, toilet, sink) | Ratio |
| Independent | Bhalf | Number of ½ bathrooms (toilet, sink) | Ratio |
| Independent | B1qua | Number of ¼ bathrooms (sink) | Ratio |
| Independent | Level | Number of levels/stories within the home | Interval |
| Independent | Parking | Number of parking spots assigned to the property | Ratio |
| Independent | PoolP | Private pool, yes (1)/no (0) | Binary |
| Independent | PoolC | Community pool, yes (1)/no (0) | Binary |
| Independent | Comwal | Number of common walls with other properties | Ratio |
| Independent | Fire | Fireplace | Ratio |
| Independent | Heat | Heating | Ratio |
| Independent | Cool | Cooling/Air conditioning | Ratio |
| Independent | DisNo | Distance to Pacific Coast Highway measured by celestial latitude | Interval |
| Independent | Roofdeck | Rooftop deck/terrace, yes (1)/no (0) | Binary |
| Independent | BalTer | Upstairs balcony/terrace/patio, yes (1)/no (0) | Binary |
| Independent | PatYar | Downstairs patio/yard (ground level), yes (1)/no (0) | Binary |
| Independent | CTOM | Cumulative Time on the Market | Ratio |
| Independent | Video | (Promotional) video, yes (1)/no (0) | Binary |
| Independent | Photo | Number of pictures added to the listing | Ratio |
| | ε | Residuals | |

Dataset: Data = 480 listings of properties

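Figures 2.1, 2.2, and 2.3 show the three multiplicative OLS models.

Figure 2.1 OLS1 model

$$
\begin{aligned}
\mathit{LP} = \alpha &\cdot \mathit{HSQF}^{\beta_1} \cdot \mathit{LSQF}^{\beta_2} \cdot \mathit{OView}^{\beta_3} \cdot \mathit{DisOc}^{\beta_4} \cdot \mathit{Unit}^{\beta_5} \cdot \mathit{Age}^{\beta_6} \cdot \mathit{Bed}^{\beta_7} \cdot \mathit{Bfull}^{\beta_8} \cdot \mathit{B3qua}^{\beta_9} \\
&\cdot \mathit{Bhalf}^{\beta_{10}} \cdot \mathit{B1qua}^{\beta_{11}} \cdot \mathit{Level}^{\beta_{12}} \cdot \mathit{Parking}^{\beta_{13}} \cdot \mathit{PoolP}^{\beta_{14}} \cdot \mathit{PoolC}^{\beta_{15}} \\
&\cdot \mathit{Comwal}^{\beta_{16}} \cdot \mathit{Fire}^{\beta_{17}} \cdot \mathit{Heat}^{\beta_{18}} \cdot \mathit{Cool}^{\beta_{19}} \cdot \mathit{DisNo}^{\beta_{20}} \cdot \mathit{Roofdeck}^{\beta_{21}} \\
&\cdot \mathit{BalTer}^{\beta_{22}} \cdot \mathit{PatYar}^{\beta_{23}} \cdot \varepsilon
\end{aligned}
$$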


OLS1 contains all variables that could be important for determining the listing price. Logically, variables that contain a ‘time-effect’, such as the time on the market and the marketing activities (Photo/Video), are excluded from this model. OLS2 includes the attributes of OLS1 plus the ‘time-effects’ CTOM, Video, and Photo. OLS3 includes all elements of OLS2 and additionally the listing price, to check whether the estimates of the other coefficients change in size, sign, or significance. For the estimation in R Studio, a linearized version of the multiplicative model has been used, which also required log-transformations of the large continuous variables, namely LP, CP, HSQF, LSQF, DisOc, and DisNo.
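As a sketch of what this linearization could look like for OLS1: the form below assumes that only the variables named above enter in logs, while the remaining predictors (counts and dummies) enter untransformed. This is an assumption made for illustration, not a reproduction of the estimation script; the set symbol $\mathcal{L}$ and the generic predictor names $X_j$, $X_k$ are introduced here only for this purpose.

```latex
\ln \mathit{LP} = \ln \alpha
  + \sum_{j \in \mathcal{L}} \beta_j \ln X_j
  + \sum_{k \notin \mathcal{L}} \beta_k X_k
  + \ln \varepsilon,
\qquad
\mathcal{L} = \{\mathit{HSQF}, \mathit{LSQF}, \mathit{DisOc}, \mathit{DisNo}\}
```

Under this form, the coefficient of a logged predictor reads as an elasticity (a 1% increase in $X_j$ goes with a change of roughly $\beta_j$% in LP), whereas for an untransformed predictor a one-unit increase goes with a change of $100 \cdot (e^{\beta_k} - 1)$% in LP.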

Note that these three models are still preliminary. Final model specification will be determined in chapter 3. Chapter 2 is concluded with a presentation of variable descriptives, a correction for missings, and a brief analysis of the visualized variable distribution based on boxplots and histograms.

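Figure 2.2 OLS2 model

$$
\begin{aligned}
\mathit{CP} = \alpha &\cdot \mathit{HSQF}^{\beta_1} \cdot \mathit{LSQF}^{\beta_2} \cdot \mathit{OView}^{\beta_3} \cdot \mathit{DisOc}^{\beta_4} \cdot \mathit{Unit}^{\beta_5} \cdot \mathit{Age}^{\beta_6} \cdot \mathit{Bed}^{\beta_7} \cdot \mathit{Bfull}^{\beta_8} \cdot \mathit{B3qua}^{\beta_9} \\
&\cdot \mathit{Bhalf}^{\beta_{10}} \cdot \mathit{B1qua}^{\beta_{11}} \cdot \mathit{Level}^{\beta_{12}} \cdot \mathit{Parking}^{\beta_{13}} \cdot \mathit{PoolP}^{\beta_{14}} \cdot \mathit{PoolC}^{\beta_{15}} \\
&\cdot \mathit{Comwal}^{\beta_{16}} \cdot \mathit{Fire}^{\beta_{17}} \cdot \mathit{Heat}^{\beta_{18}} \cdot \mathit{Cool}^{\beta_{19}} \cdot \mathit{DisNo}^{\beta_{20}} \cdot \mathit{Roofdeck}^{\beta_{21}} \\
&\cdot \mathit{BalTer}^{\beta_{22}} \cdot \mathit{PatYar}^{\beta_{23}} \cdot \boldsymbol{\mathit{CTOM}^{\beta_{24}}} \cdot \boldsymbol{\mathit{Video}^{\beta_{25}}} \cdot \boldsymbol{\mathit{Photo}^{\beta_{26}}} \cdot \varepsilon
\end{aligned}
$$

Figure 2.3 OLS3 model

$$
\begin{aligned}
\mathit{CP} = \alpha &\cdot \mathit{HSQF}^{\beta_1} \cdot \mathit{LSQF}^{\beta_2} \cdot \mathit{OView}^{\beta_3} \cdot \mathit{DisOc}^{\beta_4} \cdot \mathit{Unit}^{\beta_5} \cdot \mathit{Age}^{\beta_6} \cdot \mathit{Bed}^{\beta_7} \cdot \mathit{Bfull}^{\beta_8} \cdot \mathit{B3qua}^{\beta_9} \\
&\cdot \mathit{Bhalf}^{\beta_{10}} \cdot \mathit{B1qua}^{\beta_{11}} \cdot \mathit{Level}^{\beta_{12}} \cdot \mathit{Parking}^{\beta_{13}} \cdot \mathit{PoolP}^{\beta_{14}} \cdot \mathit{PoolC}^{\beta_{15}} \\
&\cdot \mathit{Comwal}^{\beta_{16}} \cdot \mathit{Fire}^{\beta_{17}} \cdot \mathit{Heat}^{\beta_{18}} \cdot \mathit{Cool}^{\beta_{19}} \cdot \mathit{DisNo}^{\beta_{20}} \cdot \mathit{Roofdeck}^{\beta_{21}} \\
&\cdot \mathit{BalTer}^{\beta_{22}} \cdot \mathit{PatYar}^{\beta_{23}} \cdot \mathit{CTOM}^{\beta_{24}} \cdot \mathit{Video}^{\beta_{25}} \cdot \mathit{Photo}^{\beta_{26}} \cdot \boldsymbol{\mathit{LP}^{\beta_{27}}} \cdot \varepsilon
\end{aligned}
$$

2.3 Descriptives

First, some descriptives of the dataset with n = 461 were obtained. These raw descriptives highlighted some errors in the dataset. The first was found in the minimum listing price, which showed a value of €199,500.00. This seemed odd given the median housing price of the area. After checking the original listing, the value was corrected by adding one 0, resulting in an amount of €1,995,000.00. Next, the descriptives reported a missing value for the DisOc and DisNo variables. These missings were resolved by calculating the distances from the address of the listing. Then the Unit variable reported a maximum of 3, while it should report a minimum of 0 (a back unit) and a maximum of 1 (a front unit). The error was traced in the dataset and corrected based on information from the original listing. Another error was found in the number of common walls, which showed a maximum of 5. This was corrected to 4, since one apartment building in the listings has a front door on one side, a balcony on the other, and neighbors on the remaining four sides. This means that the only common walls could be with upstairs, downstairs, and left/right side neighbors. The final error concerned the maximum number of balconies. This variable showed a maximum value of 2, whereas it will be analyzed as a binary variable. The value was corrected to 1 based on the original listing. A summary of all descriptives is presented at the end of section 2.4 in table 2.2.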
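The range checks described above can be sketched in a few lines. The snippet below is a minimal illustration in Python, not the procedure actually used in R Studio: the function name `out_of_range`, the two example listings, and the dictionary of limits are hypothetical, with the limits taken from the corrections described in this section.

```python
def out_of_range(rows, limits):
    """Return (row_index, variable, value) for every value outside its allowed range."""
    problems = []
    for i, row in enumerate(rows):
        for name, (low, high) in limits.items():
            value = row.get(name)
            # Missing values are skipped here; they are handled by imputation instead.
            if value is not None and not (low <= value <= high):
                problems.append((i, name, value))
    return problems


# Allowed ranges as described in the text (binary Unit and BalTer, at most 4 common walls).
limits = {"Unit": (0, 1), "Comwal": (0, 4), "BalTer": (0, 1)}

# Two hypothetical listings; the second contains the kinds of errors described above.
listings = [
    {"Unit": 1, "Comwal": 0, "BalTer": 1},
    {"Unit": 3, "Comwal": 5, "BalTer": 2},
]

for problem in out_of_range(listings, limits):
    print(problem)
```

Flagged values would still have to be verified against the original listing before being corrected, as was done here.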


than surrounding lots. A value of 100 sqft, however, is faulty, since the home square footage is 1768 and the property does not contain an excessive number of levels. Lastly, 42560 and 52185 sqft are also considered incorrect data points, since the lot would then equal the size of a soccer field. All incorrect values were removed from the listing and imputed with the same method as the other missings, as described in the next section.

2.4 Missings

Although some of these errors were relatively easy to verify, LSQF reported more disturbing numbers. The variable contained 101 N/A’s (missings), which is almost 22% of all observations. Therefore a preliminary boxplot and histogram were run (not reported). The descriptive values show a median of 3540 and a mean of 3974, while the histogram shows an extreme frequency peak around those values, with 50 observations of 3485 sqft and 118 observations of 3540 sqft. In other words, of the 360 observations of LSQF that are not missing, almost 47% take one of just two values. These numbers raise some suspicion about how random the missings are and about whether imputing or excluding the missings would enhance the analyses. The next question is which technique is appropriate. If the data are missing at random (MAR), ‘Listwise Deletion’ is a well-known technique for eliminating missings, but it would result in a great loss of information. Also, since no other missings are revealed in the data, the missings may not be random and may actually depend on the LSQF variable itself. If the data are not MAR but missing not at random (MNAR), the situation becomes increasingly complex. Using MAR techniques for imputation then leads to biased parameter estimates, which results in inaccurate predictions and interpretations. More ‘state-of-the-art’ methods are Multiple Imputation (MI) and Expectation Maximization (EM). A drawback of the EM method is that it does not generate standard errors. This can be resolved by using MI through the right package in R Studio. The ‘MICE’ package generates pure MI, which means that the imputed values lie exactly on the regression line. This is immediately a drawback of ‘MICE’, because data points in a dataset never all lie on one straight line. Also, ‘MICE’ assumes the data are MAR. A way to restore the ‘lost error’ is to implement hot-deck imputation. Predictive mean


following two options seem more viable. The ‘missForest’ package generates imputations based on trees, which restores the ‘lost error’, so this would be a valid option. What remains is the random hot-deck imputation method, operated through the ‘Hmisc’ package. Of the ‘missForest’ and ‘Hmisc’ packages, the latter is preferred for operational convenience. The imputation was run several times to make sure that no extreme outliers were imputed multiple times, resulting in a new dataset with imputed values for LSQF. This dataset, called ‘DataImp’, will be used for further analyses. Since the missings had to be resolved first, the table of descriptives is presented in this section.
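To illustrate the idea behind random hot-deck imputation, the sketch below replaces each missing value with a randomly drawn observed value ('donor') from the same variable. It is a minimal Python illustration of the general technique and not the ‘Hmisc’ implementation; the function name `random_hot_deck` and the lot sizes in the example are hypothetical.

```python
import random


def random_hot_deck(values, seed=None):
    """Replace each None with a value drawn at random from the observed values."""
    rng = random.Random(seed)
    donors = [v for v in values if v is not None]
    if not donors:
        raise ValueError("no observed values to draw donors from")
    return [v if v is not None else rng.choice(donors) for v in values]


# Hypothetical lot sizes (sqft) with two missing entries.
lot_sqft = [3540, None, 3485, 4200, None, 3540]
imputed = random_hot_deck(lot_sqft, seed=1)
print(imputed)
```

Because every imputed value is an actually observed value, the imputed data keep the spread of the observed data, which is the ‘lost error’ argument made above. Repeating the draw with different seeds corresponds to running the imputation several times.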

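| Variable | Mean | Median | Std. D | Min | Max |
|---|---|---|---|---|---|
| LP | 2320437.22 | 1850000 | 1566076.49 | 768000 | 14995000 |
| CP | 2228033.02 | 1825000 | 1459477.49 | 675000 | 13750000 |
| HSQF | 2019.23 | 1800 | 948.98 | 625 | 8000 |
| LSQF | 3851.21 | 3540 | 1212.44 | 871 | 10766 |
| OView | 0.73 | 1 | 0.45 | 0 | 1 |
| DisOc | 3360.13 | 3600 | 1278.59 | 80 | 7000 |
| Unit | 0.705 | 1 | 0.46 | 0 | 1 |
| Age | 26.01 | 15 | 23.84 | 0 | 108 |
| Bed | 3.082 | 3 | 0.99 | 1 | 8 |
| Bfull | 2.382 | 2 | 0.96 | 0 | 6 |
| B3qua | 0.2755 | 0 | 0.62 | 0 | 5 |
| Bhalf | 0.4664 | 0 | 0.55 | 0 | 2 |
| B1qua | 0.01735 | 0 | 0.15 | 0 | 2 |
| Level | 2.375 | 2 | 0.70 | 0 | 5 |
| Parking | 2.202 | 2 | 1.00 | 0 | 11 |
| PoolP | 0.08243 | 0 | 0.28 | 0 | 1 |
| PoolC | 0.02169 | 0 | 0.15 | 0 | 1 |
| Comwal | 0.6486 | 1 | 0.68 | 0 | 4 |
| Fire | 0.9111 | 1 | 0.28 | 0 | 1 |
| Heat | 0.9978 | 1 | 0.05 | 0 | 1 |
| Cool | 0.7028 | 1 | 0.46 | 0 | 1 |
| DisNo | 794.38 | 750 | 445.31 | 80 | 2360 |
| CTOM | 106.82 | 83 | 102.06 | 0 | 888 |
| Video | 0.1562 | 0 | 0.36 | 0 | 1 |
| Photo | 19.62 | 19 | 11.65 | 0 | 67 |
| Roofdeck | 0.5488 | 1 | 0.50 | 0 | 1 |
| BalTer | 0.744 | 1 | 0.44 | 0 | 1 |
| PatYar | 0.6985 | 1 | 0.46 | 0 | 1 |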


Reviewing table 2.2, two remarks about the descriptives are in order. First, a heating system is present in almost every property (mean = 0.9978, or 99.78%). Second, a community pool is almost always absent (mean = 0.0217, or 2.17%). Because these two variables show almost no variation, they are not relevant as predictors of the listing and closing price and should therefore be excluded from the final model.

2.5 Histograms, bar charts, scatterplots

All continuous variables in the ‘DataImp’ dataset were plotted as histograms (appendix D). Some of these show clear outliers or skewness. Although some outliers were verified to be true data points, they may nevertheless cause distribution problems at a later stage of the research. Leeflang et al. (2015), in their book ‘Modeling Markets’, state that one of the reasons why non-normality tests may indicate deviations from the normal distribution is the presence of outliers in the residuals. Especially in small data sets, the OLS estimates are sensitive to outliers, because large residuals receive a lot of weight in the least squares minimization problem. Consequently, when deviations from normality are caused by outliers, OLS results may no longer be unbiased or efficient. Next, bar charts were created for the binary and categorical variables (appendix E). These show the frequencies within each of those variables. The categorical and binary variables of the dataset are OView, Unit, PoolP, PoolC, Fire, Heat, Cool, Video, Roofdeck, BalTer, and PatYar. The bar charts show that the majority of the properties have an ocean view and that more front than back units were sold. Almost none of the properties have a private pool. Most of the properties have a fireplace, but cooling is less common. Although many properties have multiple pictures attached to their listing, videos are less common. When it comes to outdoor living space, slightly more properties have a rooftop deck than not. The majority of the properties have an upstairs outdoor living space, such as a balcony or terrace, and/or a downstairs patio or yard. Ultimately, all variables are presented in


3. Results

This chapter presents all estimations required to ultimately answer the research questions. First, three versions of the OLS model are specified preliminarily. Second, the initial models are estimated to assess overall modeling power. Next, additional checks and estimations are carried out to determine whether one or more models contain statistical discrepancies that prevent the estimation from being efficient and unbiased. Subsequently, the final models are specified, estimated, and validated, so that the estimates can be interpreted and inferences drawn that should lead to answering the research questions.

3.1 Specification OLS1, OLS2, OLS3

Since the OLS model best fits the data and the type of research, it was estimated in three versions, called OLS1, OLS2, and OLS3. The first model (OLS1) regresses the listing price on the property attributes. The second model (OLS2) regresses the closing price on the property attributes and marketing activities, without the listing price. The third model (OLS3) regresses the closing price on the property attributes, marketing activities, and the listing price. Because the descriptives showed that Heat and PoolC are not relevant predictors, they have been excluded from the models. All variables that could become zero are integrated as multipliers, to prevent LP or CP from becoming zero when one of the independent variables becomes zero. The new models are presented below.

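Figure 3.1 OLS1 model

$$
\begin{aligned}
\mathit{LP} = \alpha &\cdot \mathit{HSQF}^{\beta_1} \cdot \mathit{LSQF}^{\beta_2} \cdot \mathit{OView}^{\beta_3} \cdot \mathit{DisOc}^{\beta_4} \cdot \mathit{Unit}^{\beta_5} \cdot \mathit{Age}^{\beta_6} \cdot \mathit{Bed}^{\beta_7} \cdot \mathit{Bfull}^{\beta_8} \cdot \mathit{B3qua}^{\beta_9} \\
&\cdot \mathit{Bhalf}^{\beta_{10}} \cdot \mathit{B1qua}^{\beta_{11}} \cdot \mathit{Level}^{\beta_{12}} \cdot \mathit{Parking}^{\beta_{13}} \cdot \mathit{PoolP}^{\beta_{14}} \cdot \mathit{Comwal}^{\beta_{15}} \\
&\cdot \mathit{Fire}^{\beta_{16}} \cdot \mathit{Cool}^{\beta_{17}} \cdot \mathit{DisNo}^{\beta_{18}} \cdot \mathit{Roofdeck}^{\beta_{19}} \cdot \mathit{BalTer}^{\beta_{20}} \cdot \mathit{PatYar}^{\beta_{21}} \cdot \varepsilon
\end{aligned}
$$

Figure 3.2 OLS2 model

$$
\begin{aligned}
\mathit{CP} = \alpha &\cdot \mathit{HSQF}^{\beta_1} \cdot \mathit{LSQF}^{\beta_2} \cdot \mathit{OView}^{\beta_3} \cdot \mathit{DisOc}^{\beta_4} \cdot \mathit{Unit}^{\beta_5} \cdot \mathit{Age}^{\beta_6} \cdot \mathit{Bed}^{\beta_7} \cdot \mathit{Bfull}^{\beta_8} \cdot \mathit{B3qua}^{\beta_9} \\
&\cdot \mathit{Bhalf}^{\beta_{10}} \cdot \mathit{B1qua}^{\beta_{11}} \cdot \mathit{Level}^{\beta_{12}} \cdot \mathit{Parking}^{\beta_{13}} \cdot \mathit{PoolP}^{\beta_{14}} \cdot \mathit{Comwal}^{\beta_{15}} \\
&\cdot \mathit{Fire}^{\beta_{16}} \cdot \mathit{Cool}^{\beta_{17}} \cdot \mathit{DisNo}^{\beta_{18}} \cdot \mathit{Roofdeck}^{\beta_{19}} \cdot \mathit{BalTer}^{\beta_{20}} \cdot \mathit{PatYar}^{\beta_{21}} \\
&\cdot \mathit{CTOM}^{\beta_{22}} \cdot \mathit{Video}^{\beta_{23}} \cdot \mathit{Photo}^{\beta_{24}} \cdot \varepsilon
\end{aligned}
$$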

These models were estimated initially to make sure that all subsequent procedures for the different models are worthwhile. The individual coefficients are not examined at this stage. The general model statistics are presented in table 3.1.

Considering the p-value, which is <2.2e-16, it is safe to assume that all models have significant explanatory ability for both LP and CP compared to the null model. The multiple R2 for OLS1 indicates that 82.07% of the variance in the listing price can be explained by the predictor variables in the model. This number is relatively high, which could be caused by the large number of variables in the model. Therefore, the adjusted R2 should be interpreted. Table 3.1 shows that this value is almost equally high, namely 81.15%. For OLS2 the multiple and adjusted R2 are 82.07% and 81.01%, respectively. For OLS3 they are 99.31% and 99.27%, respectively. This preliminary analysis shows that the predictor variables of all models have sufficient explanatory power to continue the analyses. An estimation of the individual coefficients follows from section 3.8.1. Next, it is important to determine whether any factors are present that prevent the results from being unbiased and efficient. This is done through a non-normality, endogeneity, multicollinearity, heteroscedasticity, and functional form check.
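The relation between the multiple and the adjusted R2 can be checked by hand with the standard formula, adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1). The sketch below is an illustration in Python rather than the R output itself; the function name `adjusted_r2` is chosen here, and n = 430 is inferred from the degrees of freedom reported in table 3.1 (for OLS1: 21 + 408 + 1).

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared for a linear model that includes an intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Multiple R2 and number of predictors per model, as reported in table 3.1.
models = {"OLS1": (0.8207, 21), "OLS2": (0.8207, 24), "OLS3": (0.9931, 25)}

for name, (r2, k) in models.items():
    print(name, round(adjusted_r2(r2, 430, k), 4))
```

With the rounded R2 values from the table as input, this should reproduce the reported adjusted values to four decimals. The penalty grows with the number of predictors, which is why OLS2 ends slightly below OLS1 despite the same multiple R2.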

Table 3.1 OLS1, OLS2, OLS3 initial model estimation

| Model | Multiple R2 | Adjusted R2 | F-statistic | DF | RSE | p-value |
|---|---|---|---|---|---|---|
| OLS1 | 0.8207 | 0.8115 | 88.94 | 21 and 408 | 0.199 | <2.2e-16 |
| OLS2 | 0.8207 | 0.8101 | 77.26 | 24 and 405 | 0.1973 | <2.2e-16 |
| OLS3 | 0.9931 | 0.9927 | 2338 | 25 and 404 | 0.03866 | <2.2e-16 |

Figure 3.3 OLS3 model

$$
\begin{aligned}
\mathit{CP} = \alpha &\cdot \mathit{HSQF}^{\beta_1} \cdot \mathit{LSQF}^{\beta_2} \cdot \mathit{OView}^{\beta_3} \cdot \mathit{DisOc}^{\beta_4} \cdot \mathit{Unit}^{\beta_5} \cdot \mathit{Age}^{\beta_6} \cdot \mathit{Bed}^{\beta_7} \cdot \mathit{Bfull}^{\beta_8} \cdot \mathit{B3qua}^{\beta_9} \\
&\cdot \mathit{Bhalf}^{\beta_{10}} \cdot \mathit{B1qua}^{\beta_{11}} \cdot \mathit{Level}^{\beta_{12}} \cdot \mathit{Parking}^{\beta_{13}} \cdot \mathit{PoolP}^{\beta_{14}} \cdot \mathit{Comwal}^{\beta_{15}} \\
&\cdot \mathit{Fire}^{\beta_{16}} \cdot \mathit{Cool}^{\beta_{17}} \cdot \mathit{DisNo}^{\beta_{18}} \cdot \mathit{Roofdeck}^{\beta_{19}} \cdot \mathit{BalTer}^{\beta_{20}} \cdot \mathit{PatYar}^{\beta_{21}} \\
&\cdot \mathit{CTOM}^{\beta_{22}} \cdot \mathit{Video}^{\beta_{23}} \cdot \mathit{Photo}^{\beta_{24}} \cdot \boldsymbol{\mathit{LP}^{\beta_{25}}} \cdot \varepsilon
\end{aligned}
$$

3.2 Non-normality

As discussed in sections 2.3 to 2.5, the initial descriptives showed some clear outliers. One of the assumptions of OLS estimation is that the data should be normally distributed. Additionally, since these models are estimated in linearized form, the data should follow a linear pattern. If this is not the case, the estimations will be biased. Linearity and normality can be visualized by creating a QQ-plot and a histogram of the residuals, respectively, for all three models. Checking all three models is relevant because of the predictors added in OLS2 and OLS3 compared to OLS1. At first sight, the function appears to be quite linear (figure 3.4). However, the bottom left and top right corners do show some exceptional data points that could be outliers. Therefore, a histogram of the distribution of the residuals was created (figure 3.5). Residual plots and histograms of OLS2 and OLS3 are available in appendix G1.


Figure 3.5 Histogram of the residuals of OLS1 (DataImp)

Clearly, the distribution of residuals still shows outliers, especially for OLS3. These visual representations are not unexpected, since section 2.2.1 already hinted at possible non-normality. However, because all exceptional data points were verified, what else could cause these outliers? As table 2.2 shows, some very large and some very small properties are included in the dataset: for instance, properties with a home square footage of 8000, 11 parking spots, or 8 bedrooms, but also properties with 4 common walls, or a home that was built in 1908. Set against the mean and median of these variables, this diversity indicates that the dataset is not sufficiently specific or segmented. Comparing a condo in an apartment building (lower end of the market) to a huge villa (upper end of the market) could logically lead to irrelevant outcomes. Also, in listing presentations agents usually refer only to properties with comparable attributes, as mentioned in section 1.1.5. From a practical perspective, it would therefore not be legitimate to include properties in the dataset that would never be used as each other’s comparables. Hence, it is necessary to distinguish between these segments of properties. Table 3.2 below displays the more explicit specifications of the properties that will be investigated. Any properties that do not meet these requirements are removed from the dataset. This is valid because the dataset will still be sufficiently large to run the estimations.

Table 3.2 Explicit specification of the targeted properties

| Variable | Min | Max |
|---|---|---|
| HSQF | 700 | 6000 |
| LSQF | >1000 | 8000 |
| OView | 0 | 1 |
| DisOc | (set by mapping area) | |
| Unit | 0 | 1 |
| Age | 0 | 80 |
| Bed | 1 | 5 |
| Bfull | 0 | 5 |
| B3qua | 0 | 5 |
| Bhalf | 0 | 5 |
| B1qua | 0 | 5 |
| Level | 1 | 5 |
| Parking | 0 | 6 |
| PoolP | 0 | 1 |
| PoolC | 0 | 1 |
| Comwal | 0 | 2 |
| Fire | 0 | 1 |
| Heat | 0 | 1 |
| Cool | 0 | 1 |
| DisNo | (set by mapping area) | |
| Roofdeck | 0 | 1 |
| BalTer | 0 | 1 |
| PatYar | 0 | 1 |

After deleting the properties that did not meet the requirements of table 3.2, the new dataset contains 430 listings (observations) and is called ‘DataOut’. This dataset now replaces ‘DataImp’. The new distributions of the residuals for all three models are again presented in plots and histograms in appendix G2. Both show that the intensity of the outliers has decreased. By eliminating the disturbing outliers, the distribution of the data better approaches normality. However, to check whether the data are normally distributed, a Shapiro-Wilk normality test was performed on the studentized residuals of the OLS1, OLS2, and OLS3 models. The outcomes are displayed in table 3.3.

| Model | Score | p-value |
|---|---|---|
| OLS1 | W = 0.98801 | 0.001326 |
| OLS2 | W = 0.99011 | 0.005448 |
| OLS3 | W = 0.95657 | 6.062e-10 |

This test suggests that the null hypothesis that the data are normally distributed should be rejected at p<0.05 for all models. Since support has been found that the residuals of the models do not follow a normal distribution, a common follow-up procedure is bootstrapping. However, the full population of the targeted area was used for estimation in this study, so resampling is not a valid option. On the other hand, because the sample is sufficiently large (N = 430) and the same size as the population, the data will nevertheless be treated as having a normal distribution.

3.3 Endogeneity

In order to determine whether the first stage of the model has any uncovered effects, a preliminary check for endogeneity is necessary. Endogeneity can occur for instance when there is simultaneity, which means that the dependent variable and one or more of the independent variables are determined by the same (market) phenomenon. This results in the independent variable(s) being correlated to the residual (ε).

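| Model | Dep. Var. | Stud. Res. | Indep. Var. | Pearson (cor) | p-value (cor) | Correlation assumed (H0 rej.) |
|---|---|---|---|---|---|---|
| OLS1 | LP | ε | HSQF | 0.0398 | 0.41 | No |
| OLS1 | LP | ε | LSQF | 0.0069 | 0.89 | No |
| OLS1 | LP | ε | OView | -0.0007 | 0.99 | No |
| OLS1 | LP | ε | DisOc | 0.0683 | 0.16 | No |
| OLS1 | LP | ε | Unit | 0.0026 | 0.96 | No |
| OLS1 | LP | ε | Age | 0.0028 | 0.95 | No |
| OLS1 | LP | ε | Bed | -0.0012 | 0.98 | No |
| OLS1 | LP | ε | Bfull | 0.0016 | 0.97 | No |
| OLS1 | LP | ε | B3qua | -0.0059 | 0.90 | No |
| OLS1 | LP | ε | Bhalf | -0.0008 | 0.99 | No |
| OLS1 | LP | ε | B1qua | 0.0055 | 0.91 | No |
| OLS1 | LP | ε | Level | -0.0011 | 0.98 | No |
| OLS1 | LP | ε | Parking | -0.0016 | 0.97 | No |
| OLS1 | LP | ε | PoolP | -0.0026 | 0.96 | No |
| OLS1 | LP | ε | Comwal | -0.0016 | 0.97 | No |
| OLS1 | LP | ε | Fire | 0.0001 | 0.99 | No |
| OLS1 | LP | ε | Cool | -0.0009 | 0.98 | No |
| OLS1 | LP | ε | DisNo | -0.0214 | 0.66 | No |
| OLS1 | LP | ε | Roofdeck | -0.0013 | 0.98 | No |
| OLS1 | LP | ε | BalTer | 0.0011 | 0.98 | No |
| OLS1 | LP | ε | PatYar | 0.0034 | 0.94 | No |
| OLS2 | CP | ε | CTOM | 0.0042 | 0.93 | No |
| OLS2 | CP | ε | Photo | -0.0005 | 0.99 | No |
| OLS2 | CP | ε | Video | -0.0020 | 0.97 | No |
| OLS3 | CP | ε | LP | -0.0002 | 0.99 | No |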


Table 3.4 presents the correlations between all independent variables and the residual of the model. These were tested with the Pearson correlation method, for which the residuals had to be studentized. For all variables, the null hypothesis of no correlation, and thus no endogeneity, cannot be rejected. One variable that presents slightly different values is Fire. Its correlation coefficient is much smaller than the other coefficients and the p-value approaches 1. This is probably because a large number of properties have a fireplace. Similar coefficient effects were found for OView.
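The correlation check described above can be sketched on a toy example. The Python snippet below fits a simple one-predictor least-squares line and then computes the Pearson correlation between the predictor and the residuals. The data and the function names `pearson_r` and `simple_ols_residuals` are hypothetical, and raw rather than studentized residuals are used, so this illustrates the mechanics only and not the test reported in table 3.4.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simple_ols_residuals(x, y):
    """Residuals of a least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    intercept = mean_y - slope * mean_x
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


# Hypothetical predictor and outcome values.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

residuals = simple_ols_residuals(x, y)
print(pearson_r(x, residuals))
```

In this toy example the correlation is zero up to rounding error, because least-squares residuals are orthogonal to the predictor used in the fit.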

3.4 Multicollinearity

| Coefficients | VIF, OLS1 | VIF, OLS2 | VIF, OLS3 | Inflated |
|---|---|---|---|---|
| log(HSQF) | 4.321692 | 4.430078 | 5.485505 | FALSE |
| log(LSQF) | 1.162862 | 1.176305 | 1.225449 | FALSE |
| OView | 1.320246 | 1.342878 | 1.344228 | FALSE |
| log(DisOc) | 1.382275 | 1.389003 | 1.669518 | FALSE |
| Unit | 2.909205 | 2.914407 | 3.193680 | FALSE |
| Age | 2.593038 | 2.599223 | 2.605493 | FALSE |
| Bed | 3.106650 | 3.127519 | 3.131089 | FALSE |
| Bfull | 4.165304 | 4.188147 | 4.227512 | FALSE |
| B3qua | 2.434650 | 2.442800 | 2.446000 | FALSE |
| Bhalf | 1.389985 | 1.414525 | 1.416794 | FALSE |
| B1qua | 1.047703 | 1.051860 | 1.052967 | FALSE |
| Level | 2.410549 | 2.430716 | 2.459969 | FALSE |
| Parking | 1.355254 | 1.364376 | 1.364394 | FALSE |
| PoolP | 1.055536 | 1.058478 | 1.058923 | FALSE |
| Comwal | 1.822942 | 1.841660 | 2.016921 | FALSE |
| Fire | 1.152567 | 1.155556 | 1.156032 | FALSE |
| Cool | 1.656461 | 1.664351 | 1.701893 | FALSE |
| log(DisNo) | 1.144338 | 1.170762 | 1.213091 | FALSE |
| Roofdeck | 2.222959 | 2.223416 | 2.245499 | FALSE |
| BalTer | 1.292481 | 1.297393 | 1.312094 | FALSE |
| PatYar | 2.588331 | 2.600643 | 2.609218 | FALSE |
| CTOM | - | 1.096773 | 1.108066 | FALSE |
| Photo | - | 1.179145 | 1.182537 | FALSE |
| Video | - | 1.143969 | 1.143995 | FALSE |
| LP | - | - | 5.657448 | FALSE |

3.5 Heteroscedasticity

Applied to this study, heteroscedasticity means that the variance in the listing price increases as one or more of the independent variables increase along with the price. When heteroscedasticity is present, residuals are too large and the assumed linear relation might in fact not be linear. For OLS models this means that the estimates are not biased, but they are inefficient. The Rainbow test indicates whether the linear fit of the model is adequate even if some underlying relationships are not linear. When the relationships are not linear, the residuals will increase and the presence of heteroscedasticity becomes more likely. Outcomes with p-values <0.05 mean that the null hypothesis, in which equal variances are assumed, should be rejected. Applying this test to all three OLS models shows that both OLS1 and OLS2 show signs of heteroscedasticity but OLS3 does not. The next question is which variables cause the heteroscedasticity.

| Model | Rain | DF1 | DF2 | p-value |
|---|---|---|---|---|
| OLS1 | 1.3255 | 215 | 193 | 0.02296 |
| OLS2 | 1.4092 | 215 | 190 | 0.007849 |
| OLS3 | 1.2188 | 215 | 189 | 0.08158 |

More specifically, a test that compares the residuals of the model to the individual predictor variables is Levene’s test (Levene, 1960). For this study, the predictors of OLS1 are regressed on the OLS1 residuals, the additional predictors in OLS2 on the residuals of OLS2, and the same holds for OLS3. Results are displayed in table 3.7.

Table 3.7 Levene’s test OLS1, OLS2, OLS3

| Model | Coefficients | F-value | p-value |
|---|---|---|---|
| OLS1 | HSQF | 0.9282 | 0.7007 |
| OLS1 | LSQF | 1.724 | 0.0002008 |
| OLS1 | OView | 0.0135 | 0.9076 |
| OLS1 | DisOc | 2.159 | 1e-07 |
| OLS1 | Unit | 2.866 | 0.09118 |
| OLS1 | Age | 2.321 | 2e-07 |
| OLS1 | Bed | 3.291 | 0.01131 |
| OLS1 | Bfull | 0.5513 | 0.7373 |
| OLS1 | B3qua | 1.464 | 0.2121 |
| OLS1 | Bhalf | 0.9891 | 0.3728 |
| OLS1 | B1qua | 0.7813 | 0.4585 |
| OLS1 | Level | 1.092 | 0.3602 |
| OLS1 | Parking | 1.214 | 0.298 |
| OLS1 | PoolP | 0.9029 | 0.3425 |
| OLS1 | Comwal | 6.821 | 0.001213 |
| OLS1 | Fire | 0.204 | 0.6517 |
| OLS1 | Cool | 4.23 | 0.04033 |
| OLS1 | DisNo | 1.855 | 5.2e-06 |
| OLS1 | Roofdeck | 1.383 | 0.2402 |
| OLS1 | BalTer | 0.03511 | 0.8514 |
| OLS1 | PatYar | 0.2389 | 0.2389 |
| OLS2 | CTOM | 1.669 | 9.72e-05 |
| OLS2 | Photo | 1.286 | 0.1019 |
| OLS2 | Video | 0.1258 | 0.723 |
| OLS3 | LP | 2.003 | 6e-07 |
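To make the F-values in table 3.7 more tangible, the sketch below computes Levene’s W statistic (the mean-centred version) for residuals split into groups by a predictor. It is a minimal Python illustration with hypothetical residuals and an assumed grouping by a binary predictor such as OView; it does not reproduce the R procedure used for the table, and the p-value, which requires the F distribution, is left out.

```python
def levene_w(groups):
    """Levene's W statistic (mean-centred) for a list of groups of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every value from its own group mean.
    z = []
    for g in groups:
        group_mean = sum(g) / len(g)
        z.append([abs(v - group_mean) for v in g])
    z_group_means = [sum(zi) / len(zi) for zi in z]
    z_grand_mean = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand_mean) ** 2 for zi, zm in zip(z, z_group_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_group_means) for v in zi)
    if within == 0:
        raise ValueError("no variation in absolute deviations within groups")
    return (n_total - k) / (k - 1) * between / within


# Hypothetical residuals, grouped by a binary predictor (e.g. OView = 0 and OView = 1).
residuals_no_view = [-0.05, 0.02, 0.04, -0.01]
residuals_view = [-0.30, 0.25, 0.10, -0.05]

print(levene_w([residuals_no_view, residuals_view]))
```

A large W relative to the F distribution with k - 1 and N - k degrees of freedom points to unequal residual variances across the groups, which is how the small p-values in table 3.7 are read.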
