From statistical inequity to social inequality: the implementation of geographically weighted regression in automated valuation models for single family residential real estate in the Netherlands

Academic year: 2021

Bachelor Sociale Geografie en Planologie

University of Amsterdam

Luc Hermans (11016043)

8-March-2019

Mentor: dr. R.I.M. (Rowan) Arundel MSc

Date: 8-March-2019
Author: Luc Hermans
Student number: 11016043
Address: Hondecoeterstraat 7 III, 1071 LP Amsterdam

Contents

Glossary list
Abstract
§1. Introduction
§1.1. Research question
§1.2. Societal impact
§1.3. Academic impact
§1.4. Reading guide
§2. Theoretical framework and operationalisation
§2.1. The built environment characteristics and the influence on both regression types
§2.1.1. Land values
§2.1.2. Built environment characteristics
§2.2. Automated valuation models
§2.3. Sampling
§2.4. Dummy coding and including location in automated valuation models
§2.5. Multiple regression analyses and geographically weighted regression
§2.5.1. Multiple regression analyses
§2.5.2. Geographically weighted regression
§2.6. Model performance
§2.6.1. Goodness of fit
§2.6.2. Horizontal equity
§2.6.3. Vertical equity
§2.6.4. Indicator ranges
§2.7. Impact
§2.7.1. Housing wealth inequalities
§2.7.2. Housing and welfare
§2.7.3. Home ownership
§2.8. Conceptual model
§2.9. Hypotheses
§3. Methodology, data and analyses
§3.1. Research design
§3.2. Cases
§3.2.1. Montferland
§3.2.2. Haren (Groningen)
§3.2.3. Hengelo
§3.2.4. Nijmegen
§3.3. Data gathering
BOX 1 - BAG corrupt data issues
§3.5. Analyses
BOX 2 - Maximal N
BOX 3 - Assessment based on property value usable floor area vs. assessment based on property volume
§3.5.1. Descriptive statistics
§3.5.2. The MRA-based AVM
§3.5.3. The GWR-based AVM
§3.6. Model performance calculations
§3.6.1. Median assessment to sales price ratio
§3.6.2. COD
§3.6.3. PRD
§3.6.4. PRB
§4. Results
§4.1. Modifiable areal unit problem
§4.2.1. Montferland
§4.2.2. Haren (Groningen)
§4.2.3. Hengelo
§4.2.4. Nijmegen
§4.3. WOZ in policy on rent and leasehold
§4.3.1. WOZ-value and rent in both the social and the private sector
§4.3.2. WOZ-value and leasehold in Amsterdam
§5. Discussion
§6. Conclusion
Literature
Appendix A. Memo
Appendix B. Haren (Groningen)
Appendix C. Hengelo
Appendix D. Nijmegen
Appendix E. Syntax
Appendix F. Model performance calculation in Excel

Glossary list

This glossary list clarifies all abbreviations used in this thesis. All abbreviations are included, although not all are equally important. Some explanations may differ slightly from those in the main text; where this occurs, the definition in the text is leading. It is highly recommended to tear this glossary list out and keep it on hand while reading the thesis.

AICc: corrected Akaike Information Criterion

The AICc is a model performance indicator used by ArcMap to compare geographically weighted regression models. A lower AICc-score indicates a better performing model.

AVM: Automated Valuation Model

An automated valuation model (AVM) is a model that calculates a valuation based on a regression analysis. The model coefficients of the independent variables are multiplied by the values of those independent variables for the property that needs to be assessed; the sum of these multiplications forms the assessed value.

This method is commonly referred to as Hedonic Price Modelling in economic and econometric studies.

ASR: Assessment to Sales price Ratio

The assessment to sales price ratio is calculated by dividing the assessed value by the sales price. This ratio forms the cornerstone on which the ratio studies on model performance are based.

CAMA: Computer Assisted Mass Appraisal

Computer assisted mass appraisal is a collective noun for all assessments assisted by computers. While definitions of CAMA differ, this thesis uses it as a synonym for automated valuation models (see: AVM).

CBD: Central Business District

The central business district is commonly the central part of urban systems with the highest land values.

COD: Coefficient Of Dispersion

The coefficient of dispersion is a model performance indicator for horizontal equity of real estate assessment models. The calculations of the coefficient of dispersion are based on the ASR-calculations.

CV: Cross Validation

Cross validation is a model performance indicator used to compare different models. ArcMap uses cross validation to decide which model performs best.

GIS: Geographic Information System

A geographic information system is a computer application that allows users to visualise, analyse and present spatial data. A geographic information system can be used as a research method.

GWR: Geographically Weighted Regression

Geographically weighted regression is a linear regression type that takes the distances between observations into account.

GWR-based AVM: Geographically Weighted Regression based Automated Valuation Model

See GWR and AVM

IAAO: International Association of Assessing Officers

IAAO is a non-profit, educational, and research association. It is a professional membership organization of government assessment officials and others interested in the administration of the property tax.

MAUP: Modifiable Areal Unit Problem

The modifiable areal unit problem occurs when point-based data (such as transaction prices) are aggregated into higher-level polygons (such as neighbourhoods). The aggregation results in a loss of information through smoothing effects.

MPV: Model Property Value

The model property value is the outcome of an automated valuation model (see AVM). The model property value is a synonym for assessed value; however, when model property value is used, the assessed value has been produced by one of the models created in this thesis.

MRA: Multiple Regression Analyses

Multiple regression analyses is a regression type that can be either linear or non-linear.

MRA-based AVM: Multiple Regression Analyses based Automated Valuation Model

See MRA and AVM

PRB: Price Related Bias

The price related bias is a model performance indicator that tests for vertical equity. The price related bias tests for differences in assessment ratios that occur when property values double.

PRD: Price Related Differential

The price related differential is a model performance indicator that tests for vertical equity. The price related differential tests the relation between assessed and market value.

(Wet) WOZ: Wet Waardering Onroerende Zaken

Special Act for Real Estate Assessment in the Netherlands.

WOZ-value:

The WOZ-value is the assessed value in the Netherlands, used as a tax base among other purposes.

WWS: Woningwaardestelsel

The Woningwaardestelsel translates roughly to housing value system. The WWS is a point-based system that limits the amount of rent that can be charged to the tenants of a residence. Points are based on building characteristics, including the WOZ-value.
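
The ratio-based indicators in this glossary list (ASR, COD and PRD) can be illustrated with a minimal sketch. It assumes the usual formulas from the IAAO standard on ratio studies: the COD as the average absolute deviation of the ratios from the median ratio, expressed as a percentage of that median, and the PRD as the mean ratio divided by the sales-weighted mean ratio. The function name and the three example properties are hypothetical, and the calculations actually used in this thesis (section 3.6 and Appendix F) may differ in detail.

```python
from statistics import mean, median


def ratio_statistics(assessed, sales):
    """Return the median ASR, COD and PRD for paired assessed values and sales prices."""
    if len(assessed) != len(sales) or not assessed:
        raise ValueError("need two non-empty sequences of equal length")

    # ASR: assessed value divided by sales price, one ratio per sold property.
    ratios = [a / s for a, s in zip(assessed, sales)]
    median_asr = median(ratios)

    # COD (assumed IAAO formula): average absolute deviation from the
    # median ratio, as a percentage of the median ratio.
    cod = 100.0 * mean(abs(r - median_asr) for r in ratios) / median_asr

    # PRD (assumed IAAO formula): mean ratio divided by the weighted mean
    # ratio, where the weighted mean is total assessed value / total sales.
    weighted_mean = sum(assessed) / sum(sales)
    prd = mean(ratios) / weighted_mean

    return {"median_asr": median_asr, "cod": cod, "prd": prd}


if __name__ == "__main__":
    # Hypothetical example: three sales of 100,000 euros each.
    print(ratio_statistics([90_000, 100_000, 110_000], [100_000] * 3))
```

In this made-up example the median ASR is 1.0, so the model is on target on average, while the COD of roughly 6.7 shows how far individual assessments scatter around that level.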

Abstract

Since the annual appraisal of property became mandatory in the Netherlands, interest in Computer Assisted Mass Appraisal (CAMA) and automated valuation models (AVMs) has been rising. In the international field of appraisal, new ways are being explored to make these AVMs more precise and accurate. One of the more recent ways is the implementation of geographically weighted regression (GWR), as opposed to the more traditional AVMs based on multiple regression analyses (MRA). Because location is very important for the value of real estate, valuation models based on GWR are expected to give more accurate results. The research on this combination of GIS and valuation models has mostly been conducted in the United States of America, Canada and Australia.

Although it is not the first research on this combination conducted in the Netherlands, this thesis fills the gap between the literature and the practical implementation of GWR in the Dutch context. Based on the actual data of four municipalities, this thesis shows that GWR can improve the performance of valuation models. The improved performance is measured using the internationally accepted standard of the International Association of Assessing Officers (IAAO) for testing valuation results (the standard on ratio studies). However, more research is needed to make GWR suitable for real implementation by municipalities in the formal annual assessment of residential real estate under the Special Act for Real Estate Assessment.

The thesis further addresses the effect of valuations on social inequalities with regard to housing and home ownership by looking at the impact of the WOZ-value on Dutch policy.

§1. Introduction:

In 1989 the Dutch government decided to change its property tax system. The change meant a shift from taxation based on the parcel of land to taxation based on the value of real estate in economic transactions (Bervoets et al. 2016). After the political process, the Special Act for Real Estate Assessment, in Dutch known as the Wet Waardering Onroerende Zaken (Wet WOZ), became effective in 1995. A year after the act became effective, Brunsdon et al. introduced geographically weighted regression to the academic geographic literature (Wheeler and Páez 2010).

The act of 1995 determined that the valuation of real estate had to be done every four years. A five-year transition period was put between both tax systems and the first round of formal assessments was presented to taxpayers in 1997. With the four-year cycle this meant that the second round would take place in 2001, with the valuation date set two years earlier at 1-1-1999 (Gieskes and Kathmann 2010). In this period valuations were mostly carried out manually. After 2008, however, assessment had to be done annually and there was only a one-year period between the valuation date and the formal assessment notice to taxpayers. Appraising real estate manually is time-consuming, and with the transition to annual valuation came the rise of mass appraisal methods: Computer Assisted Mass Appraisal (CAMA) and automated valuation models (AVMs) (Francke 2010; Matysiak 2017). In short, an AVM is a computer-driven valuation based on building characteristics. These models save assessors time, which can be used to assess extremely heterogeneous buildings that cannot be captured by models.

AVMs mostly rely on a multiple regression analyses (MRA) based approach to explain the dependent variable through independent variables (Arentze & Stevens 2018). The “traditional” MRA-models (MRA-based AVMs) were first used in 1922 and really came into practice after World War II. MRA covers both linear and non-linear regression techniques, but both leave little room for the finer influences of location (Moore & Myers 2010). Based on the MRA a model can be built. That model predicts the property value by multiplying the property characteristics by the model coefficients; a more in-depth review of AVMs is included in the theoretical framework. Location is typically implemented through a dummy variable based on postcodes or neighbourhoods. This dummy variable takes a value of either 0 or 1, where 0 can be labelled “no” and 1 can be labelled “yes” for being in a given neighbourhood or postcode area (Cusack 2018). In this way the locational dummy variable forms a locational adjustment for the AVM.
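
As an illustration of how an MRA-based AVM turns coefficients and a locational dummy into an assessed value, consider the following minimal sketch. All variable names, neighbourhood labels and coefficient values are hypothetical and chosen for readability only; they are not the coefficients estimated in this thesis. Leaving out one neighbourhood as the reference category is a common convention assumed here, not a choice documented in the text above.

```python
def neighbourhood_dummies(neighbourhood, all_neighbourhoods):
    """Return 0/1 dummies for a property.

    The first neighbourhood in the list is the reference category and gets
    no dummy of its own (an assumed convention, see the lead-in).
    """
    return {
        f"in_{name}": 1 if name == neighbourhood else 0
        for name in all_neighbourhoods[1:]
    }


def assessed_value(coefficients, intercept, characteristics):
    """Sum of coefficient * characteristic over all variables, plus the intercept."""
    return intercept + sum(
        coefficients[name] * value for name, value in characteristics.items()
    )


if __name__ == "__main__":
    # Hypothetical coefficients in euros per unit of each variable.
    coefficients = {
        "floor_area_m2": 1500.0,
        "parcel_m2": 200.0,
        "in_B": 20000.0,   # locational adjustment for neighbourhood B
        "in_C": -10000.0,  # locational adjustment for neighbourhood C
    }
    house = {"floor_area_m2": 120, "parcel_m2": 300}
    house.update(neighbourhood_dummies("B", ["A", "B", "C"]))
    print(assessed_value(coefficients, 50000.0, house))  # 310000.0
```

The dummy for neighbourhood B simply shifts the assessed value by a fixed amount, which is why the adjustment is identical for every property inside the same postcode or neighbourhood boundary.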

Valuation of property and appeals against these valuations are not limited to the Netherlands. The International Association of Assessing Officers (IAAO) formulates standards and funds research to improve valuations all over the world. The IAAO is a great contributor to the body of research on AVMs. One of the more recent developments in the assessment industry is the introduction and further application of Geographic Information Systems (GIS), resulting in an annual conference and multiple publications on the subject. Straightforward applications of GIS, such as the visualisation of data, can help to detect errors within datasets. Visualisation can also help with communication towards the taxpayer, ultimately resulting in fewer appeals being made (Cusack et al. 2018).

A more ambitious application of GIS in the assessment of real estate is the combination of GIS and AVMs. The combination can have multiple, slightly different implementations. Recent research suggests that the implementation of geographically weighted regression can increase model performance (Bidanset et al. 2017). However, most of these studies were conducted in the United States and Canada (Moore & Myers 2010). Geographically weighted regression (GWR) is in essence a form of regression that gives more weight to geographically closer observations and thus introduces Tobler’s first law of geography into AVMs, which states that:

“everything is related to everything else, but near things are more related than distant things.” (Bidanset et al. 2017; Tobler 1970, p.236).

GWR uses coordinate points to calculate the distances between observations; these distances play a role in the resulting model. GWR will be explained in more detail in the theoretical framework.
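
The role of distance can be made concrete with a minimal sketch of a single local regression. It assumes a Gaussian distance-decay kernel with a fixed bandwidth and only one explanatory variable (usable floor area); the kernel, the bandwidth of 500 metres and all data points are illustrative assumptions and do not reproduce the ArcMap settings used in this thesis.

```python
import math


def gaussian_weights(target, locations, bandwidth):
    """Weight each observation by its straight-line distance to the target point."""
    weights = []
    for x, y in locations:
        distance = math.hypot(x - target[0], y - target[1])
        weights.append(math.exp(-0.5 * (distance / bandwidth) ** 2))
    return weights


def local_fit(target, locations, floor_area, price, bandwidth):
    """Fit price = intercept + slope * floor_area by weighted least squares,
    with the weights centred on the target location."""
    w = gaussian_weights(target, locations, bandwidth)
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, floor_area)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, price)) / total
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, floor_area))
    sxy = sum(
        wi * (xi - mean_x) * (yi - mean_y)
        for wi, xi, yi in zip(w, floor_area, price)
    )
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


if __name__ == "__main__":
    # Hypothetical data: two clusters of sales, coordinates in metres.
    locations = [(0, 0), (0, 100), (100, 0),
                 (10000, 10000), (10000, 10100), (10100, 10000)]
    floor_area = [100, 120, 150, 100, 120, 150]
    # Price per square metre is 2000 in the first cluster, 3000 in the second.
    price = [2000 * a for a in floor_area[:3]] + [3000 * a for a in floor_area[3:]]
    print(local_fit((0, 0), locations, floor_area, price, bandwidth=500))
    print(local_fit((10000, 10000), locations, floor_area, price, bandwidth=500))
```

Because the weights fall off with distance, each target location effectively gets its own coefficients: the fit centred on the first cluster recovers a slope close to 2000 and the fit centred on the second a slope close to 3000, whereas a single global regression would return one compromise value for both.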

Most research found in the literature on the implementation of geographically weighted regression in automated valuation models was conducted in the United States of America (USA), Canada and Australia. As cities differ from country to country, it is necessary to understand those differences with regard to important parts of the regression types, such as distances and building characteristics, from here on referred to as the built environment characteristics. Alongside the tracks on implementing geographically weighted regression in automated valuation models and on the built environment characteristics, emphasis is put on the social impact of improved assessment. As Piketty (2014) showed, the impact of capital on inequalities is huge, and as his critics then further emphasised, the role of housing should not be underestimated. As housing and housing wealth are important influences on society and on inequality within societies, a thorough understanding of this relation is needed to understand the influence of this thesis. Through neerslag (cause) and weerslag (impact), urban planning and urban geography influence each other continuously. To structure this, the effect of the built environment characteristics on automated valuation models is addressed first, thereafter the implementation is explained, and finally the influence of housing values and home ownership on society is emphasised.

Furthermore, it is beyond the scope of this thesis to address all societal issues that are linked with housing values. The focus of the thesis is to improve the connection between assessed value and market value through methodological improvement in Dutch municipalities. The methodology is therefore the focus and will be thoroughly explained throughout the thesis. It is because of this focus that the theory on the methodology is placed in the theory section and not in the methodology section.

Although this thesis is conducted in four different municipalities, the municipality of Montferland is taken as the example; the others follow the same methodology and their raw data can be found in the appendices. The thesis only takes single family residential real estate into account, because properties of this kind cannot share the same location in terms of x and y coordinates.

§1.1. Research question

These developments lead to the following research question, which is the focus of this thesis.

To what extent can implementing geographically weighted regression (GWR) improve the accuracy and precision of automated valuation models for assessing single family residential real estate in the Netherlands, how is this differentiated by built environment characteristics, and how does this influence housing inequity in Dutch society?

This question is broken down into five sub-questions.

- What is the impact of the built environment characteristics and administrative boundaries on model performance in the Netherlands?

- Does the application of geographically weighted regression increase the automated valuation model's goodness of fit for residential real estate in the sample of Dutch municipalities?

- Does the application of geographically weighted regression increase the automated valuation model's horizontal equity for residential real estate in the sample of Dutch municipalities?

- Does the application of geographically weighted regression increase the automated valuation model's vertical equity for residential real estate in the sample of Dutch municipalities?

- What is the societal impact, concerning housing inequity, of improved model performance of automated valuation models in the Netherlands?

§1.2. Societal impact

Even twenty years after the Special Act for Real Estate Assessment was introduced, it remains a topic of discussion. Every once in a while discussion about the act and the quality of assessments pops up in the national media; most news about the act reaches the front pages in the period when appeals can be made against the annual appraised value. Recently the Dutch Broadcasting Foundation (NOS) dedicated an article to the act, in which it was stated that mistakes are being made during the assessment process in multiple municipalities (NOS.nl 2018). These errors can lead to a valuation that does not reflect the market value of the property on the valuation date. As a result, alongside other factors such as an improving economy, appeals against WOZ-valuations have been rising since 2015, both in absolute numbers and in percentages, after years of decreasing numbers of appeals following the start of the act in 1995. Filing an appeal against an assessment has even become a way to earn money for no-cure-no-pay companies. These companies appeal against the WOZ-valuation on behalf of the taxpayer and earn money through compensation for the won case and compensation for the valuation report. In 2018, 32.4% of all appeals were made by no-cure-no-pay companies (Waarderingskamer.nl 2018). These companies have already been the topic of questions in the Dutch House of Representatives (Tweede Kamer der Staten-Generaal 2018). A good understanding of valuation theory and good communication can reduce these appeals. To indicate the importance and size of the WOZ: the costs of appraising are roughly 150 million euros and the tax income with the WOZ-value as tax base is about 11 billion euros every year (Waarderingskamer.nl 2018).

The impact of the WOZ-value will be addressed in more depth further on in the thesis. In short, the WOZ-value can be said to act as the basic divider for who can own property and for what home owners can ask in rent for the houses they do not use themselves. In recent years the WOZ-value has gained importance, and people can benefit from both high and low values. As these dual interests are growing, everybody has a stake in a fair and equitable value, which occurs when the assessed value matches the market value on the valuation date.

§1.3. Academic impact

As stated, the research on the implementation of GWR in AVMs has mainly been conducted in the USA; however, research has also been conducted in Canada and Australia (Moore & Myers 2010). Besides these studies, only two other studies were found that were conducted elsewhere: one in Portugal and one in the Netherlands, the latter being a master's dissertation in the field of Geographical Information Management and Applications (GIMA) (Bhattacharjee 2012; Oud 2017). The academic relevance of the thesis is thus found in the lack of research in the Dutch context, or indeed the European context.

One of the most relevant parts of the thesis is the combination of hedonic price modelling, which is common practice in economic studies, with valuation theory and social science, a combination that is often overlooked. As the assessed value resembles the market value when the assessment is carried out well, the assessed value can be used as a defensible indicator for a wide range of spatial social phenomena. The WOZ-value, which is the Dutch assessed value, was used as an indicator by Boterman and van Gent (2013) in research on housing liberalisation and gentrification in Amsterdam. A better understanding of how assessed values originate can therefore increase the defensibility of social science research that uses the assessed value as an indicator.

§1.4. Reading guide

The thesis can arguably be broken down into three main questions, which together form the answer to the main research question presented earlier. The theory and results sections are therefore divided into three “tracks”, which are ultimately combined into one coherent conclusion at the end of this thesis. The first track answers the first sub-question of this thesis, and the middle track covers sub-questions two to four. Finally, the last track answers the fifth and last sub-question. To be absolutely clear, the tracks cannot be read separately from each other, as they are highly intertwined, and the conclusion can only be understood if all theory is understood correctly.

The thesis follows a conventional structure, starting with the theoretical framework followed by the methodology section. Thereafter, the analysis is explained using the municipality of Montferland as an example. Subsequently, the results are presented, followed by a discussion of the thesis, and finally the conclusions are drawn.

Note:

Where this thesis speaks of vertical and horizontal (in)equity, statistical deviation from the market value in the testing set is meant. It is important that these (in)equities are not misunderstood as social (in)equalities.

§2. Theoretical framework and operationalisation:

Before starting the analysis which forms the contributing part of this thesis a few concepts need to be explained in depth. The goal of this theoretical framework is thus to build the foundation of necessary understanding of these concepts. The framework can be divided into seven concepts (next seven sections) which all build upon each other and finally result in a conceptual model in section 2.8 and three hypotheses in section 2.9.

§2.1. The built environment characteristics and the influence on both regression types

To understand the influence of the built environment characteristics on the implementation of geographically weighted regression in automated valuation, a basic introduction to land use planning and economics is needed. Secondly, the built environment characteristics are explained briefly. The focus of that paragraph is the difference between the built environment characteristics in the Netherlands and the United States: although research on the implementation of GWR in AVMs has been done outside the USA, the bulk has been done within the USA, and therefore this context is compared with the Dutch context. Subsequently, the relation between the built environment characteristics and the distances between single family residential real estate is addressed briefly. After that, the differences between planning cultures and the influence of these cultures on building characteristics are briefly explained.

As this thesis is of economic character and mainly concerned with market value, a brief introduction to how markets work is logical. The explanation remains superficial, as explaining how markets work would be a thesis in the field of economics in itself. The focus of this thesis is not to explain this but to improve the connection between market value and assessed value for single family residential real estate in the Netherlands by implementing geographically weighted regression in automated valuation models.

§2.1.1. Land values

The idea of market value is based on the theory of the invisible hand and the free market by Adam Smith. The theory states that everyone attempts to maximise their own economic profit and output (Evans 2004). The theory assumes that the market is perfectly competitive, that there are enough buyers and sellers who are fully informed about the available alternatives, and that transactions affect only the buyer and the seller (Evans 2004). Only if all of these assumptions are met can the market result in a Pareto optimum, the state of the market in which no one can improve their economic prosperity without reducing the economic prosperity of someone else (Evans 2004). Chang (2014) states that the free market does not exist, as every market is delineated by laws and subject to intervention through policies. The real estate market is no different in that respect. Laws and interventions are made by governments to keep the market sound; one of the earliest examples in the Netherlands is the Woningwet (Housing Act) of 1901, which extended the political position of housing corporations in the Netherlands (Heerikhuizen and Wilterdink 2012).

The value of land heavily influences the activities that take place at a given location. This topic has been the focus of both economists and geographers for over a century, with one of the greatest contributors, Von Thünen, publishing The Isolated State in 1826. Von Thünen's work contributes to the understanding of the connection between land values and economic activities, with a focus on agriculture (Von Thünen and Hall 1966). The basic notion that land values decline with distance from the central business district (CBD) can logically be assumed to influence the value of real estate (Danton and Himbert 2018). While land values may be highest near or within the CBD, almost all cities in the Western cultural sphere have developed, at least to some extent, into multicentric urban regions (Knox and Pinch 2010). The assumed decrease of land value with distance from the CBD brings absolute distance into the economics of land value (Coe, Kelley and Young 2013). The bid rent curve, as the theory based on Von Thünen's work is commonly referred to, applies at the scale of the single urban centre. To create a more in-depth understanding of the relationship between land values and distances, the scale of the region has to be addressed.

Image 1 shows an abstract representation of Walter Christaller's central place theory. The central place theory states that there is a hierarchy between cities in a region (King 1985). Cities that are placed higher offer more specialist goods and services, such as an opera house; these specialist goods and services may in turn lead to higher land values around the CBD of the higher placed cities. These services and goods are referred to as neighbourhood effects and can be a driver of land values within cities and municipalities (Tordoir 2015). While positive neighbourhood effects exist and may drive land values up, there are also negative neighbourhood effects, such as pollution, which might drive land values down (Evans 2004; Tordoir 2015). Cities that are placed higher in the hierarchy are often more polycentric than smaller and lower placed cities (Knox and Pinch 2010). Cities that are placed higher in the hierarchy are by assumption denser with regard to population and postcodes. This density, in combination with smaller postcode areas, is assumed to reduce the heterogeneity within a postcode area.

Furthermore, both theories addressed above are abstract and explain distance in absolute terms. The implementation of travel distance or relative distance from, for example, the CBD might explain the differences in land values better (Coe, Kelley and Young 2013; Cusack et al. 2018). However, addressing the impact of relative distance on land values would be a thesis in its own right. As this thesis focusses on the implementation of GWR in AVMs, land values will not be explained further, for reasons of complexity and volume. With the strategy of hedonic pricing, the influence of land values and the central place theory is included, as the coefficients of the building characteristics are by assumption higher in higher ranked municipalities and closer to the CBD, and decrease with distance, both relative and absolute.

§2.1.2. Built environment characteristics

To understand the differences between distances in both cultures, emphasis on the built environment characteristics is needed. Newman, Kosonen and Kenworthy (2016) distinguish three kinds of urban fabric: the walking city fabric, the transit/public transport city fabric and the automobile city fabric. Muller (2004) further breaks down the automobile urban fabric into two different urban fabrics, namely the recreational automobile era and the freeway era. The urban fabric of cities is influenced by the era in which their growth took place. Because of their different periods of growth, European cities tend to be denser than American cities.

Apart from the influence of growth periods on urban form, a more cultural difference plays a role. Cities in Europe tend to be more centralised and have a tighter structure. The geographically weighted regression as implemented in this thesis uses absolute distances; implementing relative distance could be a further improvement in the methodology for assessing real estate, but is beyond the scope of this thesis. Furthermore, the central city remains more dominant in comparison with cities in the USA, Canada and Australia (Muller 2004). This means that the pattern of land values more closely resembles the basic theory of Von Thünen (Von Thünen and Hall 1966), although European regions have become more multicentric in recent decades.

As both distance and building characteristics play an important role in the calculations of the GWR and MRA, it is necessary to keep these differences in mind while interpreting the results.

Thinking in broader terms the uniformity in building characteristics are result of either homogeneous or heterogeneous urban planning. The different cultures in urban planning thus play a role in the model performance of both the MRA and the GWR. In the Netherlands the building of Single family residential property in the twentieth century has been mostly plan based with development carried out by housing corporations (Van Der Cammen and De Klerk 2012; Heerikhuizen and Wilterdink 2012) . This culture of planning lots of houses at once all similar to each other is in great contrast with the development of single family residential property in the USA, Canada and Australia. In the new world planning is done mostly through zoning. With less restrictions people tend to self-build their houses resulting in a more

heterogeneous form of urban planning and thus more heterogeneous building characteristics, this was especially the case in earlier years (Hall 2014).

Building characteristics and postcode areas are important parts of the differences between the regression types further on in this thesis. To understand the hypotheses that follow, border errors need to be introduced. A border error occurs when an observation is not in line with the rest of the observations within the postcode area. These border errors do not occur, or are less influential, in the geographically weighted regression, as the area in which the observations are compared is fluid and changes between observations (Cusack et al. 2018).


§2.2. Automated valuation models

Within the academic world, the definition of automated valuation models (AVMs) differs. In line with choices further on, this thesis applies the definition given by the International Association of Assessing Officers (IAAO). For this thesis we consider AVM and CAMA model (Computer Assisted Mass Appraisal) as synonyms.

The literature is not unanimous on the definition of an automated valuation model (AVM). Before giving an overview of different definitions it is necessary to clarify hedonic price modelling. Hedonic price modelling is essentially the explanation of a price, the dependent variable, by different attributes, the independent variables (Kauko and d’Amato 2008). An AVM reverses this breaking down of the price: what matters in the AVM is not the decomposition itself but the estimation, which is in turn based on an earlier decomposition of prices.

As stated above, the definitions in the literature are not unanimous. Three of the most important definitions will be given in this short overview and the definition which is used in this thesis will be chosen at the end of that overview.

To start off, a very abstract definition is presented by Kauko and d’Amato (2008). They state that an AVM is an “Automated, often computerized, procedure for carrying out the task of valuating one or more properties.” (Kauko and d’Amato 2008, p.321). This definition is one of the most abstract definitions used in the literature, as the only real restriction it imposes is the automated procedure.

Secondly Bidanset states that an AVM is a computing model that estimates the value of a property through the use of mathematical equations (Bidanset 2014). While Kauko and d’Amato leave much space in their definition, Bidanset restricts AVMs to the use of mathematical equations in the estimation of the value.

Finally the IAAO states:

“An automated valuation model (AVM) is a mathematical based computer software program that produces an estimate of market value based on market analysis of location, market conditions, and real estate characteristics from information that was previously and separately collected. The distinguishing feature of an AVM is that it is an estimate of market value produced through mathematical modelling. Credibility of an AVM is dependent on the data used and the skills of the modeller producing the AVM.” (IAAO 2003)

While the two earlier mentioned definitions are somewhat broad in terms of qualifying as an AVM, the IAAO definition gives a few extra restrictions. Most importantly the IAAO definition prescribes the usage of location. Because of the more inclusive and clear boundaries set by the IAAO definition, this definition will be used in the thesis.


§2.3. Sampling

Overfitting, the state in which a model fits the sales it was built on too closely and which can indicate that the model has been tampered with, forms a great danger when creating AVMs. The data therefore needs to be divided into a training and a testing sample (IAAO 2013). Overfitting might be a result of sales price chasing, which occurs when values resulting from the model are changed individually after new information becomes available on individual sales (IAAO 2013). While no two ratio studies are the same, the data used for this thesis offers a basic divider in the transaction date. The training sample, or in-sample dataset, is formed by usable transactions (as coded by the municipalities within the datasets) that took place between 1 January 2017 and 1 July 2018. The testing sample, or out-of-sample dataset, is in turn formed by usable transactions that took place after 1 July 2018. This type of forecasting is common practice in applied economics and econometric fields of research (e.g. Balcilar et al. 2015; Aye et al. 2013). A visual representation of this timeline can be found in Figure 1.

Figure 1. Visual representation of the sampling sets (timeline markers: 01-01-2017, 01-07-2018, present day).
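The split described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used on the municipal datasets: the record layout (`id`, `date`, `usable`) and the function name `split_by_date` are hypothetical, and placing 1 July 2018 itself in the training sample is an assumption, since the text only says "between" and "after".

```python
from datetime import date

# Dates taken from the text; the record layout below is a hypothetical example.
TRAIN_START = date(2017, 1, 1)
SPLIT_DATE = date(2018, 7, 1)


def split_by_date(transactions):
    """Split usable transactions into (training, testing) samples by date."""
    training, testing = [], []
    for t in transactions:
        if not t["usable"]:
            continue  # skip transactions coded as unusable by the municipality
        if TRAIN_START <= t["date"] <= SPLIT_DATE:
            training.append(t)  # in-sample dataset
        elif t["date"] > SPLIT_DATE:
            testing.append(t)  # out-of-sample dataset
    return training, testing


# Fictitious transactions, for illustration only.
transactions = [
    {"id": 1, "date": date(2017, 3, 15), "usable": True},
    {"id": 2, "date": date(2018, 7, 1), "usable": True},
    {"id": 3, "date": date(2018, 9, 2), "usable": True},
    {"id": 4, "date": date(2018, 10, 5), "usable": False},
]
training, testing = split_by_date(transactions)
```

Splitting on time rather than at random means the testing sample only contains sales the model could not have seen, which is what makes it a forecast.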

§2.4. Dummy coding and including location in automated valuation models

Because dummy coding is a basic statistical method, it could be argued that it does not need much attention. However, as the implementation of location in automated valuation models is the core of this thesis, dummy coding is addressed in depth.

Dummy coding is the most common way to include categorical predictors with more than two categories in linear models. This is done by creating new variables with the values 0 and 1 for the categories, the value 0 indicating not being part of the category and the value 1 indicating being part of the category. The number of new variables is always one less than the number of categories that are included in the model; the group that is not coded into a dummy variable acts as the reference category. When adding the dummy variables to the model, all dummy variables of one original predictor must be placed in the same block. When a category has fewer than ten observations the dummy variable is not created, as this might heavily influence the model (Field 2018).
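As a minimal sketch of the dummy coding described above, the function below creates one 0/1 column per category except the reference category and skips categories with fewer than ten observations. The function name `dummy_code` and the example property types are hypothetical; observations in a skipped category get 0 on every dummy and therefore fall into the reference group.

```python
from collections import Counter


def dummy_code(values, reference, min_count=10):
    """Return (categories, rows): one 0/1 column per coded category.

    The reference category and categories with fewer than `min_count`
    observations get no column of their own.
    """
    counts = Counter(values)
    categories = sorted(
        c for c in counts if c != reference and counts[c] >= min_count
    )
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Fictitious property types, for illustration only.
values = (
    ["row house"] * 12
    + ["detached"] * 10
    + ["semi-detached"] * 11
    + ["corner house"] * 3  # too few observations to get its own dummy
)
categories, rows = dummy_code(values, reference="row house")
```

In this example four categories yield only two dummy variables rather than three, because the rare category is absorbed into the reference group.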

In this thesis three categorical predictors are recoded into dummy variables. The first categorical predictor, which is added in block one, is the variable for property object type. This variable has four categories; therefore three dummy variables are created. The reference category is row house, as this is the most common building type in the Netherlands (BZK 2016).

The second categorical predictor, which is included in block two of the model, is the variable for year of construction. Earlier, the years of construction were regrouped into eight categories. Seven dummy variables are created; the group for construction years between 1971 and 1980 forms the reference category, as this is the most common category of construction years (BZK 2016).

The third block contains the last categorical predictor, postcodes. This predictor brings location into the MRA-based AVM. The number of dummy variables that are created differs between municipalities. If all postcodes have at least ten observations, the number of dummy variables created is one less than the number of different postcodes in the municipality. Postcode areas with fewer than ten observations are not recoded into a dummy variable and are thus included in the reference category.

After running the MRA, an automated valuation model is created using the beta values (b-values) from the MRA. The newly created AVM is then applied to both the in-sample and out-of-sample datasets, after which the model performance can be assessed based on the assessment to sales price ratio.

In the GWR-type model all variables except for the postcodes are used. Postcodes are left out because the location part of the AVM is already taken into account through the form of regression, which uses coordinates.


§2.5. Multiple regression analyses and Geographically weighted regression

The difference between regular multiple regression analysis (MRA) and geographically weighted regression (GWR) is best explained by showing both formulas. However, to build a more thorough understanding of the GWR, an explanation of bandwidths and kernels needs to be included. First, the formulas of both the “traditional” MRA and the GWR will be given and explained using a fictitious city. Once the basic explanation of both regression types has been given, bandwidths and kernels will be explained further, but only superficially, as this is complex material and beyond the scope of this thesis.

§2.5.1. Multiple regression analyses

Moore and Myers state that current AVMs are most commonly based on linear and non-linear multiple regression analysis (MRA), with the location aspect found in the definition of an AVM above being implemented as a dummy variable (Moore & Myers 2010). This “traditional” approach has been in use since 1922 but truly took flight after World War II due to the increased availability of computing power (Moore and Myers 2010). Dutch AVMs also take the MRA approach and handle location with dummy variables (Arentze & Sanders 2018). These models based on the “traditional” MRA still qualify as automated valuation models following the definition of the IAAO as stated in section 2.2.

The MRA formula:

$$y_i = \beta_0 + \sum_{k} \beta_k X_{ik} + \varepsilon_i$$

The terms in the formula represent:

$y_i$ = the i-th observation of the dependent variable
$\beta_0$ = the model intercept
$\beta_k$ = the k-th coefficient
$X_{ik}$ = the value of the k-th variable for the i-th observation
$\varepsilon_i$ = the error term of the i-th observation (Bidanset 2014).

The output of the MRA includes estimates of beta values. These beta values can be thought of as the change in outcome with a marginal change in the predictor (Field 2018). Multiplying the beta values by the values of the predictors (independent variables) and adding the model intercept to the sum of those products gives an estimate of the property value, referred to from here on as the model property value (MPV).
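The calculation of a model property value from the regression output can be illustrated as follows. This is only a sketch: the intercept, the coefficients, the predictor names and the transaction price are invented for the example and are not results from this thesis.

```python
def model_property_value(intercept, coefficients, predictors):
    """Intercept plus the sum of each beta value times its predictor value."""
    return intercept + sum(
        coefficients[name] * value for name, value in predictors.items()
    )


# Hypothetical regression output (not estimates from the thesis data).
intercept = 50_000
coefficients = {"floor_area_m2": 1_500, "detached": 40_000}

# Hypothetical property: 120 m2, dummy variable "detached" equal to 1.
property_x = {"floor_area_m2": 120, "detached": 1}

mpv = model_property_value(intercept, coefficients, property_x)

# Assessment to sales price ratio for a hypothetical transaction price.
transaction_price = 300_000
asr = mpv / transaction_price
```

Here the MPV is 50,000 + 1,500 × 120 + 40,000 × 1 = 270,000, so the ratio of model value to the assumed transaction price is 0.9.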

The MRA-based AVM incorporates location through dummy variables. Using the coefficients produced by this regression a model can be built to assess the properties within the municipality. Through the incorporation of the location dummy this assessment model qualifies as an AVM following the definition of the IAAO.


Image 2. MRA-based AVM in the fictitious city.

Within the fictitious city the MRA-based AVM looks like the abstract map shown in image 2. All blue points are in-sample (training dataset) transactions and the red points indicate out-of-sample (testing dataset) transactions. The thin black lines indicate the different postcode areas, which are included via dummy variables. One of these postcode areas is considered the base or reference category. Three dummy variables are used to express that a transaction is in one of the other three postcode areas. The MRA produces an assessment model (MRA-based AVM) which produces a model property value (MPV) for each property. With these MPVs the model performance can be measured. The model performance indicators in the standard on ratio studies are based on the assessment to sales price ratio (ASR), which is formed by dividing the assessed value (the model property value (MPV) in this thesis) by the transaction price.

§2.5.2. Geographically weighted regression

With developments such as the increase in computing power and the increased accessibility of geographic information systems (GIS), it is only logical that geography (location) is implemented more precisely within AVMs (Bidanset 2018; Cusack et al. 2018; Moore & Myers 2010).

Geographically weighted regression (GWR) is a different regression technique, which uses coordinates that can be obtained using a GIS, most commonly through geocoding. In theory the GWR takes the effects of location into account better than MRA-based AVMs, as it is not vulnerable to the boundary value problems which can occur when using dummy variables (Moore & Myers 2010). The main difference between MRA and GWR is that MRA has one set of regression coefficients for the model as a whole, while GWR has a different set of regression coefficients for different spatial windows (Moore & Myers 2010).


The GWR formula:

$$y_i = \beta_0(x_i, y_i) + \sum_{k} \beta_k(x_i, y_i) X_{ik} + \varepsilon_i$$

The terms in the formula represent:

$y_i$ = the i-th observation of the dependent variable
$\beta_0$ = the model intercept
$(x_i, y_i)$ = the x,y coordinates of the i-th regression point (inside this pair, $y_i$ denotes the y-coordinate, not the dependent variable)
$\beta_k$ = the k-th coefficient
$X_{ik}$ = the value of the k-th variable for the i-th observation
$\varepsilon_i$ = the error term of the i-th observation (Bidanset 2014).

Before using the GWR, the weights that are used to calculate the coefficients in the GWR have to be defined. Transactions at a short distance get a high weight and transactions further away get a low weight. To calculate the exact weight used in the GWR both the bandwidth and the kernel are important. The kernel applies weights to the observations in the regression. These weights make sure closer objects are considered more important than objects further away. While a vast number of kernels exist, only two will be explained in this thesis. The formulas for both the Gaussian kernel and the bi-square kernel are given and briefly explained.

Gaussian weight kernel:

$$\omega_{ij} = \exp\left[-\tfrac{1}{2}\left(d_{ij}/b\right)^2\right]$$

Bi-square weight kernel:

$$\omega_{ij} = \begin{cases} \left[1 - \left(d_{ij}/b\right)^2\right]^2 & \text{if } d_{ij} < b \\ 0 & \text{otherwise} \end{cases}$$

The terms in the formulas represent:

$\omega_{ij}$ = the weight applied to the j-th property at regression point i
$b$ = the bandwidth
$d_{ij}$ = the geographic distance between regression point i and property j (Bidanset 2014).

Both kernels apply weights to the observations around a regression point. The most important difference is that the Gaussian weight kernel never results in 0, whereas the bi-square weight kernel allows a 0 outcome, resulting in no weight for observations beyond the borders of the bandwidth.
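The two kernels can be written out directly from the formulas above. The sketch below is for illustration only; the function names and the example distances and bandwidth (in metres) are assumptions, not settings used in this thesis.

```python
import math


def gaussian_weight(d, b):
    """Gaussian kernel: the weight decays with distance but never reaches 0."""
    return math.exp(-0.5 * (d / b) ** 2)


def bisquare_weight(d, b):
    """Bi-square kernel: the weight is exactly 0 at or beyond the bandwidth."""
    if d < b:
        return (1 - (d / b) ** 2) ** 2
    return 0.0


# Hypothetical bandwidth of 500 metres and a few example distances.
bandwidth = 500
for distance in (0, 250, 500, 1000):
    print(
        distance,
        round(gaussian_weight(distance, bandwidth), 4),
        round(bisquare_weight(distance, bandwidth), 4),
    )
```

At a distance of twice the bandwidth the Gaussian kernel still assigns a small positive weight, while the bi-square kernel assigns none.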


Image 3. GWR in the fictitious city, the dependent observation.

To show the working of the GWR the fictitious city will once again be used as an example. The point indicated by the black arrow in image 3 is the dependent point to which the GWR is applied.

The thin black circle in image 4 indicates the bandwidth and can be thought of as a dome in which hangs an upside down funnel being the kernel. The narrow opening of the funnel is placed directly above the previously indicated point applying most weight to the most nearby points. As the distance increases the weight reduces and resembles the form of a funnel.

Image 5. An abstract visualisation of a Gaussian kernel. (Source: McCaffrey 2014).

Kernels can be either fixed or adaptive. With a fixed kernel the bandwidth is a single distance, chosen as the ideal distance. An adaptive kernel instead finds the best size of the bandwidth by looking for an ideal number of neighbours (Bidanset 2014).
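One common way to picture an adaptive kernel is to set the bandwidth at each regression point to the distance of its k-th nearest neighbour. The sketch below follows that reading; it is an assumption made for illustration and not necessarily how ArcMap implements adaptive kernels. The function name and the coordinates are hypothetical, and the list of coordinates is assumed not to contain the regression point itself.

```python
import math


def adaptive_bandwidth(point, coordinates, k):
    """Distance from `point` to its k-th nearest neighbour in `coordinates`."""
    distances = sorted(
        math.hypot(x - point[0], y - point[1]) for x, y in coordinates
    )
    return distances[k - 1]


# Hypothetical neighbouring transactions around a regression point at (0, 0).
neighbours = [(0, 3), (4, 0), (0, 10), (6, 8)]
bandwidth_k2 = adaptive_bandwidth((0, 0), neighbours, k=2)
bandwidth_k3 = adaptive_bandwidth((0, 0), neighbours, k=3)
```

Under this reading the bandwidth is small where transactions are dense and large where they are sparse, whereas a fixed kernel uses the same distance everywhere.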

For this thesis ArcMap is used. ArcMap allows the use of an adaptive kernel and can find the optimal bandwidth itself using either corrected Akaike Information Criterion (AICc) scores or Cross Validation (CV) scores (desktop.arcgis.com 2018). Both of these values are model performance indicators. In short, AICc is used to compare the goodness of fit between regressions, a lower AICc score indicating a better fit (desktop.arcgis.com 2018). Cross validation is likewise a method for comparing goodness of fit that is commonly used in model studies. This thesis applies the default settings in the GWR, being a fixed kernel and a bandwidth optimization based on AICc scores (neither indicator is explained further, as this might lead to unnecessary complexity of the thesis).

The main difference between the MRA and the GWR is that the GWR produces a different set of coefficients for each observation. ArcMap produces a predicted value for each transaction, which is comparable to the model property value in the traditional, MRA-based AVM. The predicted values can again be compared with the observed transaction prices, allowing the calculation of the ASRs, which in turn allow the calculation of the model performance indicators.
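To make the idea of location-specific coefficients concrete, the sketch below fits a weighted regression with a single predictor around a chosen regression point, using Gaussian weights and a fixed bandwidth. It is a simplified illustration under stated assumptions, not a reproduction of the ArcMap tool: the function name, the coordinates, the floor areas and the prices are all fictitious.

```python
import math


def local_fit(point, observations, bandwidth):
    """Weighted least squares of price on one predictor around `point`.

    Each observation is (x_coord, y_coord, predictor, price). Weights come
    from a Gaussian kernel on the distance to `point`.
    """
    weights, xs, ys = [], [], []
    for ox, oy, predictor, price in observations:
        d = math.hypot(ox - point[0], oy - point[1])
        weights.append(math.exp(-0.5 * (d / bandwidth) ** 2))
        xs.append(predictor)
        ys.append(price)
    total = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / total
    y_bar = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope  # (local intercept, local slope)


# Fictitious city: prices are 1,000 per m2 in the west and 2,000 per m2 in the east.
observations = [
    (0, 0, 100, 100_000), (10, 0, 120, 120_000), (0, 10, 150, 150_000),
    (5000, 0, 100, 200_000), (5010, 0, 120, 240_000), (5000, 10, 150, 300_000),
]
_, slope_west = local_fit((0, 0), observations, bandwidth=500)
_, slope_east = local_fit((5000, 0), observations, bandwidth=500)
```

The same data yield a price per square metre of roughly 1,000 at the western point and roughly 2,000 at the eastern point, whereas a single global regression would return one coefficient for the whole city.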


§2.6. Model performance

To capture the concept of model performance three topics need to be addressed.

These three topics are introduced and explained in the next sections along with their indicators. The indicators outlined below are all drawn from the IAAO’s standard on ratio studies. This choice is made because this standard and the measurements defined in it are the only quantitative measurements on AVMs used worldwide. Proving these measurements right is beyond the scope of this thesis. Furthermore, there is enough practical evidence that these measurements can be used to look at model performance (e.g. Bidanset et al. 2017).

Model performance can be explained by the concepts of accuracy and precision. In the ideal situation, a model is both accurate and precise. Most models however are not ideal; the next best option is a precise but less accurate model. For accuracy the concept of goodness of fit plays a role. For precision both horizontal and vertical equity need to be explained in depth.


§2.6.1. Goodness of fit

Following the IAAO standard on ratio studies, goodness of fit will be measured by central tendency. Central tendency measures market level, an indicator for goodness of fit for the model. Measurements for central tendency are:

- Median assessment to sales price ratio
- Mean assessment to sales price ratio
- Weighted mean assessment to sales price ratio

First off, the difference between property value and model property value is that the former is given through transactions, while the latter is the estimate of the value based on the models built in this thesis. The assessment to sales price ratio is the MPV divided by the observed transaction price. The median assessment to sales price ratio is the middle ratio when all assessment to sales price ratios are sorted from high to low. As a measure, the median assessment to sales price ratio is hardly affected by outliers. The mean assessment to sales price ratio is calculated by dividing the sum of all assessment to sales price ratios by the total count of ratios. When the mean equals the median, the distribution of the assessment to sales price ratios is symmetric, as in a normal distribution. A mean that is higher than the median indicates skewness to the right. The mean assessment to sales price ratio is affected more by outliers than the median. The weighted mean assessment to sales price ratio is calculated by dividing the total assessed value by the total of the transaction prices, which is the same as dividing the mean assessed value by the mean transaction price. This measurement is also used in calculating the price-related differential (PRD). Because the median assessment to sales price ratio weights every ratio equally and is not hugely affected by outliers, it is the preferred measure of central tendency for evaluating appraisal performance, and it will be the indicator for model performance in this thesis (IAAO 2013).
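The three measures of central tendency can be computed as in the sketch below. The function name and the model property values and transaction prices are fictitious and serve only to illustrate the calculations.

```python
from statistics import mean, median


def ratio_statistics(assessed, prices):
    """Median, mean and weighted mean assessment to sales price ratio."""
    ratios = [a / p for a, p in zip(assessed, prices)]
    return {
        "median": median(ratios),
        "mean": mean(ratios),
        # total assessed value divided by total transaction price
        "weighted_mean": sum(assessed) / sum(prices),
    }


# Fictitious model property values and transaction prices.
assessed = [180_000, 200_000, 330_000]
prices = [200_000, 200_000, 300_000]
stats = ratio_statistics(assessed, prices)
```

In this example the individual ratios are 0.9, 1.0 and 1.1, so the median and mean are both 1.0, while the weighted mean is higher because the most expensive property is over-assessed.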


§2.6.2. Horizontal equity

Horizontal inequity occurs when similar properties are assessed differently while their market value is equal (Sirmans and Gatzlaff 2008; Moore & Myers 2010). When these assessed values are used for taxation, the result is that some home owners pay more property tax than others while owning almost identical houses in terms of market value. An indicator for horizontal equity is the coefficient of dispersion (COD). The coefficient of dispersion measures the average deviation from the median assessment to sales price ratio, where the assessment to sales price ratio is the assessed value divided by the market value (Sirmans and Gatzlaff 2008; IAAO 2013).

While Moore and Myers use effective measurements for both inequities more applicable measurements can be found in the IAAO’s standard on ratio studies. For measuring horizontal inequity the standard suggests the use of coefficient of dispersion (COD) which measures the average deviation from the median assessment to sales price ratio and which can be calculated following these steps:

1. subtract the median from each assessment to sales price ratio
2. take the absolute value of the calculated differences
3. sum the absolute differences
4. divide by the number of assessment to sales price ratios to obtain the average absolute deviation
5. divide by the median
6. multiply by 100 to express the result as a percentage
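The steps above can be sketched as a short function. The final multiplication by 100 expresses the COD as a percentage, which is the scale on which COD values are conventionally reported; the function name and the ratios are fictitious and for illustration only.

```python
from statistics import median


def coefficient_of_dispersion(ratios):
    """Average absolute deviation from the median ratio, as a percentage."""
    med = median(ratios)
    absolute_differences = [abs(r - med) for r in ratios]  # steps 1 and 2
    average_deviation = sum(absolute_differences) / len(ratios)  # steps 3 and 4
    return 100 * average_deviation / med  # step 5, expressed as a percentage


# Fictitious assessment to sales price ratios.
ratios = [0.9, 1.0, 1.1, 1.2, 0.8]
cod = coefficient_of_dispersion(ratios)
```

With a median of 1.0 and an average absolute deviation of 0.12, the COD in this example is approximately 12.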


§2.6.3. Vertical equity

The second form of inequity is vertical inequity. Again, vertical inequity is explained by the example of using the assessed value for taxation. Vertical inequity occurs when effective tax rates differ between properties with different market values (Sirmans and Gatzlaff 2008; Moore & Myers 2010).

Vertical inequity can take on two different forms. First, there is regressive vertical inequity, where lower-valued properties are assessed relatively higher than higher-valued properties (Sirmans and Gatzlaff 2008). Progressivity is the same situation the other way around. There are two measurements for vertical inequity, the price-related differential (PRD) and the price-related bias (PRB). Both measurements and their calculations will be explained in the paragraph on methodology. As for their values: a PRD-value below 1.00 suggests progressivity and a PRD-value above 1.00 suggests regressivity (Gloudemans 2011). As for PRB-values, a PRB-value of -0.05 indicates that assessment levels fall by 5% when values double and thus shows regressivity; a PRB-value of 0.05 indicates a rise of 5% when values double and thus shows progressivity (Denne 2015; Gloudemans 2011).

For measuring vertical inequity these two measurements need to be carried out following the standard provided by the IAAO. First of all, the price-related differential (PRD) measures vertical inequity. The PRD is calculated by dividing the mean assessment to sales price ratio by the weighted mean assessment to sales price ratio. One important note is that the PRD is greatly influenced by outliers; therefore a scatter plot of assessment to sales price ratios versus valuations can be helpful (IAAO 2013). The second measurement suggested by the standard is the coefficient of price-related bias (PRB). “The PRB is obtained by regressing percentage difference from the median assessment to sales price ratio on percentage differences in value” (IAAO 2013, p.29).
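Both measurements can be sketched as follows. The PRD follows directly from the description above. For the PRB the exact construction of the regression variables is not given in the text; the sketch assumes a commonly described formulation in which the value proxy is the average of the sale price and the assessed value divided by the median ratio, expressed in base-2 logarithms so that one unit corresponds to a doubling of value. That formulation, the function names and the data are assumptions for illustration.

```python
import math
from statistics import mean, median


def price_related_differential(assessed, prices):
    """Mean ratio divided by the weighted mean ratio."""
    ratios = [a / p for a, p in zip(assessed, prices)]
    return mean(ratios) / (sum(assessed) / sum(prices))


def price_related_bias(assessed, prices):
    """Slope of relative ratio differences on log2 of a value proxy (assumed form)."""
    ratios = [a / p for a, p in zip(assessed, prices)]
    med = median(ratios)
    y = [(r - med) / med for r in ratios]
    x = [math.log2(0.5 * (a / med + p)) for a, p in zip(assessed, prices)]
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


# Fictitious data in which cheaper properties are over-assessed.
prices = [100_000, 200_000, 400_000]
assessed = [110_000, 200_000, 360_000]
prd_value = price_related_differential(assessed, prices)
prb_value = price_related_bias(assessed, prices)
```

In this regressive example the PRD lies above 1.00 and the PRB is negative, which matches the interpretation of both indicators given in the text.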

§2.6.4. Indicator ranges

The IAAO has developed ranges for the indicators explained in the previous sections. The scores are set in a way that scoring under the ranges accepted by the IAAO might indicate sales price chasing and scores above the range are not acceptable due to too much variation from the market value.

Following the standard, the COD should be between 5.0 and 10.0 for very large municipalities (large number of transactions to be analysed), between 5.0 and 15.0 for large and mid-sized municipalities and between 5.0 and 20.0 for small or rural municipalities. For vertical equity, the PRD should be between 0.98 and 1.03 and the PRB should be between -0.05 and 0.05 (IAAO 2013, p.34). Values outside these ranges indicate unacceptable model performance; in other words, the model is not precise enough (IAAO 2013).
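A simple check of indicator values against these ranges could look like the sketch below. The ranges are those quoted above; the function name, the labels for the municipality types and the example values are hypothetical.

```python
# COD ranges per municipality type, as quoted in the text (IAAO 2013, p.34).
COD_RANGES = {
    "very_large": (5.0, 10.0),
    "large_or_mid_sized": (5.0, 15.0),
    "small_or_rural": (5.0, 20.0),
}


def within_ranges(cod, prd, prb, municipality_type):
    """Report per indicator whether the value falls inside the accepted range."""
    cod_low, cod_high = COD_RANGES[municipality_type]
    return {
        "COD": cod_low <= cod <= cod_high,
        "PRD": 0.98 <= prd <= 1.03,
        "PRB": -0.05 <= prb <= 0.05,
    }


# Hypothetical indicator values for one model.
check_mid = within_ranges(12.0, 1.01, -0.02, "large_or_mid_sized")
check_large = within_ranges(12.0, 1.01, -0.02, "very_large")
```

The same COD of 12.0 is acceptable for a mid-sized municipality but falls outside the range for a very large one.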


§2.7. Impact

As the previous section focused on the neerslag (the physical manifestation) of urban planning, this section will focus on the weerslag (the repercussion), or impact with regard to society. The first part of this section aims to give an outline of important societal phenomena related to housing and housing prices in a broad perspective. The second part of this section is about the influence of the WOZ-value on rent, in both the social and private sector, and its influence on Amsterdam’s leasehold system. The last part of this section is about the meaning of a better representation of market value in the WOZ-value in both society and the academic world.

§2.7.1. Housing wealth inequality

In the aftermath of Thomas Piketty’s (2014) Capital in the Twenty-First Century, critiques of his operationalization of capital started to grow. His conception of capital includes home ownership only as a value and not as a means of production; this means Piketty sees a change in capital through housing only as a fluctuation in house value and leaves no room for the influence of rent on capital (Galbraith 2014; Piketty 2014). But while Galbraith makes a good point, the influence of real estate as a store of wealth cannot be downplayed, as it might be inherited over generations. This, combined with access to home ownership, puts housing right in the middle of the debate on inequality (Arundel 2017; Galbraith 2014). The debate on the relationship between poverty and housing was further amplified by Matthew Desmond’s (2016) heartbreaking Evicted, which tells the stories of people who cannot achieve a steady living or home environment. Desmond (2016) tells the story of extreme poverty and housing practices that are unthinkable in the Netherlands; it does, however, put an emphasis on the social problems related to not having the security of a home. While Desmond draws a great picture of the American context, the British context is captured well by the documentary Evicted: The hidden homeless (2006), winner of a British Academy of Film and Television Arts (BAFTA) award. The Netherlands has so-called huurbescherming, which translates into renters’ protection (Hochstenbach 2018). This protects renters in the Netherlands from practices such as those displayed in Desmond’s book and the documentary mentioned above, as they cannot be forced out of their homes without a decent reason (Hochstenbach 2018). Thus, while the extremes do not, or should not, exist in the Netherlands, there are important social phenomena related to home ownership, which will be discussed below. But first a short introduction on home ownership and the labor market in the past decades is given.

§2.7.2. Housing and welfare

With an ageing population in most developed countries, pressure on traditional welfare provision has increased. This, combined with neo-liberalization, further increased the primacy of housing as a more complex phenomenon than just physical shelter in terms of economic security (Doling and Ronald 2010). Buying a home and paying off the mortgage is a way to get by on a smaller pension and can act as a government-pushed way of household saving; one might even call this a “pension in stone” (Doling and Ronald 2010). There are however a few problems with the asset-based welfare state. Firstly, it assumes that people are rational economic agents. Secondly, housing wealth is not as easily liquidated as other financial products, and thirdly, it rests on the assumption that housing values are ever increasing (Arundel 2017; Doling and Ronald 2010). The asset-based welfare system further increases the divide between people who can access home ownership and people who cannot, ultimately resulting in a class divide between renters and “rentiers” (Aalbers and Christophers 2014; Arundel 2017; Doling and Ronald 2010).

§2.7.3. Home ownership

Home ownership has not been the most popular tenure throughout history; up to World War II home ownership was not as widespread as it is today (Arundel and Doling 2017). After 1945, however, a new implementation of Keynesianism became more influential in society. This new form, called private Keynesianism, places the debt not into the hands of the government but into the hands of individuals, through mortgages used to buy homes (Aalbers and Christophers 2014; Arundel and Doling 2017). This private Keynesianism led to the popularization of home ownership as a tenure and made home ownership accessible even for households with less income. Private Keynesianism thus resulted in a growth in debts among Dutch households in the period between 1985 and 2001 (van der Schaar 2006, p.296). In the years before the global financial crisis of 2007 more and more households with a marginal income gained access to a loan to buy a house, although these loans were in hindsight too risky for these households to bear. As a result of the global financial crisis unemployment rose and many people with subprime mortgages lost their homes. This resulted in a slight decline in home ownership in the European Union. In the years after the crisis access to home ownership has worsened further, both between low and high incomes and between generations (Arundel and Doling 2017). Furthermore, after the global financial crisis of 2007 it became harder to get access to a mortgage without a stable and, at least to some extent, well-paying job. This resulted in social phenomena such as “boomerang kids” returning to their parental homes and the emergence of a “generation rent” (Arundel 2017; Arundel and Doling 2017). But while sub-prime loans were identified as a cause of the financial crisis of 2007, governments have not altered their promotion of home ownership (Dewilde and Ronald 2017).


§2.8. Conceptual model

Following this theoretical framework a conceptual model can be built which visualises the concept of the research in this thesis. First the models (MRA-based AVM and GWR-based AVM) are built using the same dataset and the models are assessed on the three indicators of model performance. Thereafter the model performances of both models are compared with each other.

Figure 2. Conceptual model.

Table 1. Operationalization table.

Concepts | Dimensions | Variables
Model performance MRA-based AVM and GWR-based AVM | Horizontal equity | Coefficient of dispersion
Model performance MRA-based AVM and GWR-based AVM | Goodness of fit | Median assessment to sales price ratio
Model performance MRA-based AVM and GWR-based AVM | Vertical equity | Price-related differential (PRD)
Model performance MRA-based AVM and GWR-based AVM | Vertical equity | Price-related bias (PRB)

[Text from the Figure 2 diagram: Model performance MRA-based AVM (horizontal equity, vertical equity, goodness of fit); Model performance GWR-based AVM (vertical equity, goodness of fit, horizontal equity).]


§2.9. Hypotheses

By interpreting the theory section above three hypotheses can be formed.

Applying the theory above to both forms of regression, hypotheses can be stated. The basic difference between the two types of regression is how they handle location. The traditional MRA handles location based on postcodes, and the geographically weighted regression handles location through the use of coordinates and thus absolute distance. With this in mind it is good to think about how the postcode areas are filled. Municipalities with a higher population per square kilometer tend to have more postcodes. The influence of housing corporations in these bigger municipalities is assumed to be higher, resulting in more homogeneous building characteristics within these postcode areas. The added value of geographically weighted regression might become meaningless because the postcodes in the traditional MRA may already capture the location aspect well. The hypothesis on the first sub-question thus is:

Hypothesis 1:

A more uniform municipality with regard to the built environment characteristics across postcode areas can be assessed better by an MRA-based AVM, ultimately making the added value of the GWR meaningless.

Hypothesis 2:

The denser the municipality is with regard to population and postcodes, the better the MRA-based AVM captures market value, thus making the added value of the geographically weighted regression meaningless. Conversely, this hypothesis states that the heterogeneous nature of less dense municipalities is better captured by the geographically weighted regression, as this is not influenced by the border errors of postcode area aggregation.

Hypothesis 3:

An improved connection between assessed value and market value results in a better assessed value. Because assessed value plays an enabling and controlling role in government policies, a better connection between assessed value and market value leads to less unjust outcomes of policy based on assessed value.


§3. Methodology, data and analyses:

To carry out this research, a few steps need to be taken. This section is therefore divided into a few different parts. First of all, a clear explanation of the research design is presented. Thereafter the cases where the research is carried out are introduced briefly. After the introduction of the cases the data gathering is explained. Following the gathering of data, the data needs to be prepared for use in different applications; this is highlighted in the next subsection. To illustrate the data preparation and the subsequent analyses the municipality of Montferland is used as an example. For privacy reasons, a fictitious set of transaction prices and model property values is used as an example for the model performance calculations. The raw data used for the analyses of the other cases can be found in the appendixes in the following order:

Appendix B: Haren (Groningen)

Appendix C: Hengelo

Appendix D: Nijmegen

A reading guide for the appendixes can be found in Appendix G.

§3.1. Research design

The thesis is quantitative and deductive in character, as it is a data-driven test of a theory (Bryman 2012). The approach is to verify the proposed hypothesis that GWR increases the model performance of automated valuation models (AVMs) for residential real estate assessment in the Netherlands. Because municipalities are responsible for carrying out the valuation under The Special Act for Real Estate assessment, they form the research units of this thesis.

The thesis set out to have a cross-sectional research design. Analysing the models for a relatively large number of municipalities at one point in time provides a decent basis for generalizing the conclusions of this research to the Netherlands as a whole. However, due to time limitations the number of municipalities is limited to four. In addition, extra effort has been put into creating tools that help to speed up the analyses in the future. These tools can be helpful either for further research or for understanding how the programs used work. The selected municipalities are sampled from the circa 100 municipalities for which data was available at the Council for Real Estate assessment (Appendix A).

Creating both an MRA-based AVM and a GWR-based AVM within a municipality requires two kinds of variables: first, the independent variables, such as object characteristics, and second, the dependent variable, the transaction price. The data collection is done in cooperation with the Netherlands Council for Real Estate Assessment. Data for about 100 municipalities are available for this research. The cases were selected at random from these 100 municipalities, while making sure that municipalities of different sizes are included.
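The thesis does not spell out how size was taken into account during the random selection. One possible reading is a stratified random draw, sketched below under that assumption. The size-class thresholds, the function names and the placeholder municipalities are all hypothetical and are not taken from the research data.

```python
import random

# Hypothetical size classes (upper bounds in inhabitants); not taken from the thesis.
SIZE_CLASSES = [("small", 25_000), ("medium", 100_000), ("large", float("inf"))]


def size_class(inhabitants):
    """Return the label of the first size class whose upper bound exceeds the count."""
    for label, upper_bound in SIZE_CLASSES:
        if inhabitants < upper_bound:
            return label


def sample_per_size_class(municipalities, seed=None):
    """Draw one municipality at random from every size class that occurs."""
    rng = random.Random(seed)  # local generator, so a seed makes the draw repeatable
    strata = {}
    for name, inhabitants in municipalities.items():
        strata.setdefault(size_class(inhabitants), []).append(name)
    # Sort the names first so the draw does not depend on dictionary order.
    return {label: rng.choice(sorted(names)) for label, names in strata.items()}


# Placeholder sampling frame standing in for the circa 100 available municipalities.
available = {
    "municipality_01": 12_000,
    "municipality_02": 19_500,
    "municipality_03": 40_000,
    "municipality_04": 85_000,
    "municipality_05": 150_000,
    "municipality_06": 210_000,
}

selected = sample_per_size_class(available, seed=42)
print(selected)
```

Fixing the seed keeps the selection reproducible, which is useful when the same cases have to be revisited in further research.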


§3.2. Cases

The municipalities that form the cases in this thesis are displayed in image 5. The situation changed on 1-1-2019, when the municipality of Haren merged with the municipality of Groningen. A brief introduction to the four cases is given in the next four subsections.

Image 5. Location of analysed municipalities within The Netherlands. Source: BAG and BRT


§3.2.1. Montferland

The municipality of Montferland is situated in the province of Gelderland, just a few kilometres east of the city of Nijmegen, as shown in image 5. The municipality is multi-centric. The two biggest concentrations of inhabitants can be found in Didam (13544 inhabitants) and 's-Heerenberg (8208 inhabitants). To display this multi-centricity, the buildings are mapped within the municipal boundaries in image 6. A total of 35130 inhabitants live in Montferland, spread over a total area of 106.63 square kilometres, resulting in a population density of 339 inhabitants per square kilometre on the 31st of May 2018 (CBS statline 2018). In the Dutch context the municipality of Montferland can be described as rural.

§3.2.2. Haren (Groningen)

The municipality of Haren is situated just south of the city of Groningen. As of the 1st of January 2019 the municipality ceased to exist after merging with the municipality of Groningen. Haren has a slightly polycentric character, with Haren itself being the biggest centre. The municipality as a whole had 19882 inhabitants on the 31st of May 2018 (CBS statline 2019). With an area of 50.7 square kilometres, this results in a population density of 392 inhabitants per square kilometre. In the Dutch context the municipality can be described as rural. The distribution of properties is visualised in image 7.

Image 6. Spreading of properties across Montferland.

Source: BAG Created by: writer.

Image 7. Spreading of properties across Haren.

Source: BAG Created by: writer.


§3.2.3. Hengelo

The municipality of Hengelo is situated in the east of the Netherlands, in the region of Twente. The municipality has a monocentric character, with some outlying development in the south-east, as shown in image 8. Hengelo had 80626 inhabitants on the 31st of May 2018. With an area of 61.83 square kilometres, this results in 1324 inhabitants per square kilometre (CBS statline 2019). Hengelo can be described as an urban municipality.

§3.2.4. Nijmegen

The municipality of Nijmegen is also situated in the east of the Netherlands, somewhat further south than Hengelo. Nijmegen is the tenth-largest city in the Netherlands and thereby the biggest municipality by number of inhabitants in this thesis. Its 176162 inhabitants on the 31st of May 2018 and its area of 57.6 square kilometres result in 3285 inhabitants per square kilometre, almost ten times the population density of Montferland (CBS statline 2019). The distribution of properties is displayed in image 9.

Image 8. Spreading of properties across Hengelo.

Source: BAG Created by: writer.

Image 9. Spreading of properties across Nijmegen.


§3.3. Data gathering

Before the analysis can start, data needs to be gathered from different sources. Two sets of data are needed: one with the characteristics of the houses and one with the geographic location of the houses. The first dataset is provided by the Netherlands Council for Real Estate Assessment (Appendix A). Although this dataset contains the addresses of the properties, it does not include any further geographical information that is usable in a GIS. While geocoding could be an option for obtaining this data, another solution can be found within the Dutch system of “basisregistraties” (base registrations). Linking the Basisregistratie WOZ with the Basisregistratie Adressen en Gebouwen (BAG) provides the needed geometry. Both datasets are linked by forming a new field in both registrations based on the postcode, house number, house letter and house number addition.
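The linking step can be illustrated with a minimal sketch, assuming the key is a simple concatenation of the four address components. The field names, the separator, the normalisation rules and the sample records below are assumptions made for illustration; they do not describe the actual WOZ or BAG schemas or the tools used in this thesis.

```python
def make_address_key(postcode, house_number, house_letter="", addition=""):
    """Build a normalised join key from the four address components."""
    parts = [
        str(postcode).replace(" ", "").upper(),  # "1234 ab" -> "1234AB"
        str(house_number).strip(),
        (house_letter or "").strip().upper(),    # missing values become ""
        (addition or "").strip().upper(),
    ]
    return "|".join(parts)


def record_key(record):
    """Build the join key for a record that uses the assumed field names."""
    return make_address_key(
        record["postcode"],
        record["house_number"],
        record["house_letter"],
        record["addition"],
    )


# Fictitious WOZ-style records (object characteristics, no geometry).
woz_records = [
    {"postcode": "1234 AB", "house_number": 10, "house_letter": "", "addition": "", "floor_area_m2": 120},
    {"postcode": "1234AB", "house_number": 12, "house_letter": "a", "addition": "", "floor_area_m2": 95},
    {"postcode": "5678CD", "house_number": 1, "house_letter": "", "addition": "bis", "floor_area_m2": 150},
]

# Fictitious BAG-style address points (geometry as placeholder x/y coordinates).
bag_points = [
    {"postcode": "1234AB", "house_number": 10, "house_letter": None, "addition": None, "x": 100.0, "y": 200.0},
    {"postcode": "1234AB", "house_number": 12, "house_letter": "A", "addition": None, "x": 101.0, "y": 205.0},
]

# Index the address points by key, then attach coordinates to each WOZ record.
bag_by_key = {record_key(point): point for point in bag_points}

linked, unmatched = [], []
for record in woz_records:
    point = bag_by_key.get(record_key(record))
    if point is None:
        unmatched.append(record)  # keep for manual inspection
    else:
        linked.append({**record, "x": point["x"], "y": point["y"]})

print(len(linked), "linked,", len(unmatched), "unmatched")
```

Normalising spacing and capitalisation before concatenating matters because the two registrations may write the same address slightly differently. Records that find no match are kept aside rather than dropped silently.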

As stated above, the dataset from the base registration WOZ is provided by the Netherlands Council for Real Estate Assessment. The dataset for the base registration BAG is gathered using the BAG extract conversie tool from GEON, a geo-information company based in the Netherlands. The application provides a way to access the raw data imported via INSPIRE. Furthermore, it provides a way to convert the data into a usable data format (shapefile). To minimize the amount of corrupt data within the BAG dataset, the options within the GEON application need to be altered so that only the number points with the status ‘Naamgeving uitgegeven’ (naming issued) are included in the newly formed shapefile. Even after adjusting the options some points may still be corrupt; Box 1 explains this problem.
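In the thesis the status filter is set through the options of the GEON application. The same selection can be expressed in a few lines, shown here as a sketch on fictitious records. The field name `status`, the record layout and the second status value are assumptions for illustration only.

```python
WANTED_STATUS = "Naamgeving uitgegeven"


def keep_issued(points, status_field="status"):
    """Keep only the number points whose status equals the wanted value."""
    return [
        point
        for point in points
        # Treat a missing or empty status as "not issued".
        if (point.get(status_field) or "").strip() == WANTED_STATUS
    ]


# Fictitious number points; the identifiers and statuses are placeholders.
number_points = [
    {"id": 1, "status": "Naamgeving uitgegeven"},
    {"id": 2, "status": "Naamgeving ingetrokken"},
    {"id": 3, "status": None},
    {"id": 4, "status": " Naamgeving uitgegeven "},
]

issued = keep_issued(number_points)
print([point["id"] for point in issued])
```

Filtering before the join with the WOZ data keeps withdrawn address points from producing false matches.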
