
Predicting the real estate market in the Netherlands using Google trends


Academic year: 2021


Thesis

Name: Jaap Stolp

Student Number: 10688242

Specialization: Economics and Finance

Field: Finance, predicting the real estate market

Number of credits thesis: 12

Title of research proposal: Predicting the real estate market in the Netherlands using Google Trends.

Abstract

This paper studies the predictive power of Google Trends search queries for the change of house prices in the Netherlands. While most economic variables are made available with a lag, Google Trends data are published with high frequency. Predicting the real estate market becomes less accurate especially during economically turbulent times. The results show that the Google Trends index does improve the in-sample fit. However, after accounting for seasonal effects, the Google Trends index no longer has a significant effect on the change of the house price index.

Keywords: Google Trends, real estate market, house prices

Statement of originality

I, Jaap Stolp, hereby declare that I wrote this thesis myself and that I take full responsibility for its contents.

I confirm that the text and the work presented in this thesis are original and that I have used no sources other than those mentioned in the text and in the references. The Faculty of Economics and Business is responsible only for the supervision up to the submission of the thesis, not for its contents.

Introduction

Since the real estate crisis of 2007 house prices have been unstable in the Netherlands. For a long time, forecasters thought that house prices were going to increase at a steady rate. However, from January 2009 to January 2014, house prices decreased due to the real estate crisis. At the time forecasters were not able to predict this. Especially in the early stage of the crisis, predicting house prices became less accurate using macroeconomic variables. This was caused by the fact that these variables were often made available to the public with a lag, which could take from weeks to months. During these times forecasters had limited data, and were therefore restricted in making an accurate assessment of the present or the future.

Google Trends has been tracking search queries since 2004, which Google made available in 2009 through a publicly accessible interface. Google Trends provides a time series index of the volume of search queries entered in Google. Unlike conventional data, Google Trends data are made available with high frequency. The data sets are updated hourly, are easily accessible and can be filtered by date, country, region and city. If we want to make better decisions in the future, we have to make sure that the data sets we use are as accurate as possible. Easy access, a short publication lag, selection of regions and cost efficiency are all important factors which determine the usefulness of data. That is why Google Trends data might be more useful than other macroeconomic variables and might help forecasters to make better decisions. Internet usage is still growing, and since Google is the most used search engine, its search queries will give a good representation of society’s interests in real time. This can offer forecasters a new data source which is applicable in many scenarios.

Other empirical studies have shown promising results. McLaren & Shanbhogue (2011), Carrière-Swallow & Labbé (2013) and Vosen & Schmidt (2011) all found that the use of Google Trends data improved the predictive power of their models. McLaren & Shanbhogue (2011) concluded that, for the UK, search term variables can outperform some existing indicators over the period since 2004. These studies show promising results for the usefulness of Google Trends data. To date, no study has tested the predictive power of Google Trends data on the housing market in the Netherlands. This thesis will try to answer this question about the predictive power of Google Trends data.

The remainder of this thesis is structured as follows. After the introduction the literature will be presented, followed by the data description and the methodology. Then the results will be presented, and any shortcomings and possible future improvements will be discussed. Finally, the thesis will be concluded.

Literature review

One of the first papers to suggest that web search data can help forecast economic statistics was written by Ettredge et al. (2005). They tried to find out whether rates of employment-related searches by internet users are associated with unemployment levels, by regressing official U.S. monthly unemployment data against web-based job search data with a lag of one week. They also compared the explanatory power of daily web search data with that of weekly U.S. data on new unemployment insurance claims. The data for the dependent variable were gathered from the Bureau of Labor Statistics, which releases the unemployment report every month. For the explanatory variable they used data from WordTracker, which contains a list of the most searched keywords on the web. The keywords Ettredge et al. (2005) used were: “job search”, “jobs”, “monster.com”, “resume”, “employment”, and “job listings”. Summing up the total number of search hits gave them the weekly job search activity. They used one-day rates and four-week moving averages of these one-day rates to create a short-term usage rate. For the long-term usage rate they used a 60-day moving average. By comparing these data to official weekly unemployment data, they were able to test the incremental usefulness. The official weekly unemployment data were taken from the Department of Labor website. They defined a short-term claim variable and a long-term claim variable using a four-week moving average and an eight-week moving average of the seasonally adjusted initial claims data. Their initial results showed that long-term search term variables outperform the short-term search term variables. It could be the case that search terms differ throughout the week, causing a bias for the short-term search term variables. However, long-term search term variables are potentially useful in forecasting monthly unemployment rates. They concluded that there exists a significant association between the job-search variables and the official unemployment data.

While Ettredge et al. (2005) focused only on unemployment, Choi & Varian (2012) tried to estimate the predictive power for four variables. They used search engine data to forecast automobile sales, unemployment claims, travel destination planning and consumer confidence, and encouraged readers to undertake their own analyses. For automobile sales they used non-seasonally adjusted data from Google Trends. They ran a simple regression and found that adding the Trucks & SUV and Automotive Insurance categories to this regression significantly improved the in-sample fit. To check whether they could improve out-of-sample forecasting, they used a rolling window forecast. The results showed that the mean absolute error was reduced by 10.5%. The results during the recession were even better: the mean absolute error was reduced by 21.5%. To test whether Google Trends data could also improve the prediction of unemployment rates, they ran a simple baseline regression on the log of initial claims without the search terms, and compared this to a model with the search terms. They found that the search terms were only marginally significant and had little impact on the in-sample fit. However, they did find that Google Trends data helped in identifying turning points. Looking more closely at four turning points, they found that the mean absolute error decreased in every period, with a particularly pronounced reduction in two of the four periods. Clearly, models using Google Trends data fit better during recessions. Choi & Varian (2012) also tested whether they could improve the prediction of travel destination planning using data from the Hong Kong Tourism Board, which publishes monthly visitor arrival statistics by country of residence. They used non-seasonally adjusted visitor and Google Trends data from the US, Canada, the UK, Germany, France, Italy, Australia, Japan and India. For all countries but Japan they found good in-sample fits. They concluded that simple seasonal autoregressive models that include relevant Google Trends variables tend to outperform models that exclude these predictors by 5% to 20%.

Extending the research of Choi & Varian (2012), Carrière-Swallow & Labbé (2013) explored whether observing internet search terms can inform forecasters about consumer behaviour in an emerging market. They used data on the volume of car sales from the Instituto Nacional de Estadistica de Chile, which includes the sales of new and used vehicles. To eliminate the seasonal effects, they used the 12-month growth rate of the data, from January 2006 to May 2010. The keywords used for the Google Trends data were: “Chevrolet”, “Hyundai”, “Nissan”, “Kia”, “Toyota”, “Suzuki”, “Ford”, “Mitsubishi” and “Mazda”, which together account for 65% of the volume of sales of the Chilean National Automobile Association. A small problem arose with Google’s sampling procedure: identical query requests return different results from day to day. To characterize this measurement error, they downloaded the series for each keyword 50 times and found a standard deviation of 5.8% and a kurtosis of over 10, which made it more difficult to reject their null hypothesis. They estimated the coefficient (beta) of each search term on car sales and computed the Google Trends Automotive Index as the fitted values from the regression, excluding the constants. To investigate whether there exists a lag between searches and purchases, they tested the in-sample predictive power with a lag of up to 30 days, and found that 18 days prior to the close of the sales period gave the smallest in-sample root mean square error. To test their null hypothesis of no Granger causality, they used nowcasting results from an in-sample estimation and two out-of-sample estimation schemes. The introduction of Google Trends data reduced the root mean square error by between 9% and 14%. Besides testing for Granger causality, they also wanted to test the hypothesis that the Google Trends Automotive Index can identify turning points in the data, using a sign test. The results are consistent with the baseline models they used. The improvement in in-sample accuracy suggests that the Google Trends Automotive Index indeed contains information beyond previous values of sales and macroeconomic variables. Carrière-Swallow & Labbé (2013) also wanted to investigate whether their index could improve nowcasting accuracy using out-of-sample estimations. They used a recursive scheme and a 24-month rolling window scheme, and found that the accuracy improved in both schemes for all models. Their results showed that models incorporating Google search results outperformed competing baseline models, in both in-sample and out-of-sample nowcasting, by up to 14%.

Barreira et al. (2013) presented a study describing the use of internet search information to achieve an improved nowcasting ability with simple autoregressive models, using data from four countries on the unemployment rate and car sales. The monthly unemployment data for Portugal, Spain, France and Italy were taken from the European Central Bank Statistical Data Warehouse website. They used seasonally adjusted data from January 2005 to August 2013. They used different search terms for each country, because a simple translation might not capture relevant searches in another country. The Google Trends data contain a measurement error: requests for the same query will differ each day, because the sample will differ. Barreira et al. (2013) created an API that collects the data over a 14-day period and calculates the average over these two weeks, similar to the method used by Carrière-Swallow & Labbé (2013). For the Google Trends data on car sales, the top 20 car brands were used as keywords, representing more than 92% of the market share, again using the API to overcome the measurement error in daily differences. The two goals were to determine whether the use of the volume of search queries improved the ability of simple models to describe the behaviour of unemployment and car sales, and the ability to nowcast unemployment and car sales, in different periods. For the baseline model they used an ARMA(p,q) model. The extended model included the Google Trends data and some of its lagged values, if significantly different from zero. A Quandt Likelihood Ratio test for structural breaks was done for both the baseline model and the extended model. Whenever a structural break was found, all data were removed with 15% trimming at the beginning and end of the sample period, and the test was repeated. The results showed that, in the case of unemployment, Google Trends data led to the improvement of nowcasts in three out of the four countries considered. For car sales they found that, in some cases, the volume of search queries helps explain the variance in car sales data, but they found little support for improved predictions. They concluded that when Google Trends variables are significantly different from zero in-sample, they tend to lead to improvements in out-of-sample predictive ability.

The study of Vosen & Schmidt (2011) introduced a new indicator for private consumption based on Google Trends search queries. Their goal was to determine the predictive power of the Google Trends data relative to the two most common survey-based indicators, namely the University of Michigan Consumer Sentiment Index (MCSI) and the Conference Board Consumer Confidence Index (CCI). They expected that Google Trends data could be useful in times of macroeconomic turbulence. This can be explained by the fact that data on private consumption for the US are published with a lag of one month, while Google Trends data are updated with high frequency. They were not able to adjust for seasonal effects, because 2 out of the 4 years of data stemmed from economically turbulent times. They therefore used year-on-year growth rates rather than levels or monthly growth rates. For their baseline model they estimated consumption growth using a simple autoregressive model. Next, they added the MCSI, the CCI or the Google Factors to the baseline model to see if the predictive power improved. Subsequently, they used an extended baseline model which also includes three macroeconomic variables, namely real personal income, interest rates and stock prices. This model was again extended with the MCSI, the CCI and the Google Factors, to check for an improvement in predictive power. The in-sample tests were done for the period from January 2005 to September 2009, while the out-of-sample tests were done over sub-periods, from January 2005 to December 2007 and from January 2008 to September 2009. Their results showed that in almost all in-sample and out-of-sample forecasting experiments conducted, the Google indicator outperformed the survey-based indicators. Although Vosen & Schmidt only had 4 years of data, their study demonstrated the enormous potential that Google Trends data already offer today to forecasters of consumer spending.

So far not much research has been done on the usefulness of Google Trends data for the real estate market. Brynjolfsson & Wu (2009) demonstrated how data on internet queries can be used to make reliable predictions about both prices and quantities, months before they actually change in the marketplace. They decided to focus on the real estate market to demonstrate how online search can be used to reveal present economic activities and predict future economic trends. This became an important topic for economists, politicians and investors because of the recent real estate bubble that caused the economic crisis. Economists, politicians and investors all try to assess the current real estate market and predict its recovery as accurately as possible. Assessing the current economic conditions can be more difficult because government data are made available with a lag of months or more. Brynjolfsson & Wu (2009) used the category “Real Estate” and the sub-category “Real Estate Agencies” for their Google Trends data from 2004 to 2009. A quarterly house price index available since 1975 was used, which was obtained from the Office of Federal Housing Enterprise Oversight. Using a simple seasonal autoregressive model, they tried to estimate the relation between the search index and real estate market indicators. First they estimated the baseline model, then added the search index to see if it could predict the sales volume and price index for the real estate market, followed by adding a lagged search index to test if it had any predictive power to forecast economic trends. They failed to detect any statistically significant relationship between the present search index and real estate sales. However, the three-month lagged “Real Estate Agencies” category was positively correlated with real estate sales. Interestingly, both the current and the past search index were positively correlated with the contemporaneous house price index. Brynjolfsson & Wu (2009) concluded that Google Trends search terms do correlate with real estate sales and the house price index, lending credibility to the hypothesis that web search can be used to predict future economic activities.

In addition to Brynjolfsson & Wu (2009), McLaren & Shanbhogue (2011) considered the potential usefulness of internet search data as economic indicators. They evaluated the usefulness of the data for two specific markets: the labour market and the real estate market. For their unemployment search terms they used: “jobs”, “jobseeker’s allowance”, “JSA”, “unemployment benefit”, “unemployed” and “unemployment”. First, a baseline model was estimated on the change in unemployment using lagged values of the change. Then they added the internet search data to compare the performance with that of the baseline model, using data from June 2004 to January 2011. Initial results suggested that internet search data can help predict changes in unemployment in the UK. An out-of-sample test was also conducted, to test the predictive power in nowcasting current unemployment data. In line with the in-sample results, the extended model outperformed the baseline model in the out-of-sample test. For real estate prices, the search terms they used were: “house prices”, “buy house”, “sell house”, “mortgage” and “estate agents”. Again a baseline model was estimated first, and compared to the extended model including the Google Trends data. For real estate prices their results were somewhat stronger. The extended model outperformed the baseline model out of sample, which indicates that the search data can improve the understanding of the current state of the real estate market. They also found evidence that these data may be used to provide additional insight on a wider range of issues which traditional business surveys might not cover.

Data description

Google Trends reports the volume of search queries with high frequency. These data are easily accessible, can be specified by geographical area and give a good representation of society’s interests in real time. Weekly data will be obtained from Google and converted to monthly data using an Excel template. The template multiplies the search index number of a given week by the number of days that week overlaps with the month in question. For example, if the first two days of a week belong to the month before the required month, the index is multiplied by five. The total sum is then divided by the number of days in that month. The Google Trends index is created as the equal-weighted average of four keywords. The keywords which will be used as search queries for the Google Trends index are “Huis te koop”, “Funda”, “Huis verkopen” and “Makelaar”. “Funda” is the best-known website in the Netherlands for searching for houses that are for sale. “Makelaar” is the Dutch word for broker, “huis te koop” means house for sale and “huis verkopen” means selling a house. These keywords will be used by Dutch agents who are looking to buy or sell a house, and will give a good representation of the Dutch interest in the real estate market in real time.
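The day-weighted conversion from weekly to monthly values and the equal-weighted keyword average described above can be sketched in Python as follows. This is a minimal illustration, not the Excel template itself: the function names `weekly_to_monthly` and `equal_weighted_index`, the assumption that each week is keyed by its first day, and all index values are hypothetical.

```python
from calendar import monthrange
from datetime import date, timedelta


def weekly_to_monthly(weekly, year, month):
    """Day-weighted monthly average of a weekly index.

    `weekly` maps the first day of each 7-day week to that week's index.
    (Keying weeks by their first day is an assumption of this sketch.)
    """
    weighted_sum = 0.0
    for week_start, value in weekly.items():
        # Count how many of the week's seven days fall in the target month.
        overlap = 0
        for offset in range(7):
            day = week_start + timedelta(days=offset)
            if day.year == year and day.month == month:
                overlap += 1
        weighted_sum += value * overlap
    # Divide the weighted sum by the number of days in the month.
    return weighted_sum / monthrange(year, month)[1]


def equal_weighted_index(keyword_values):
    """Equal-weighted average of the monthly values of several keywords."""
    return sum(keyword_values) / len(keyword_values)


# Hypothetical weekly values around September 2015 (30 days).
# The week starting 30 August has two days in August, so it gets weight 5.
weekly_funda = {
    date(2015, 8, 30): 50.0,  # 5 days in September
    date(2015, 9, 6): 60.0,   # 7 days
    date(2015, 9, 13): 60.0,  # 7 days
    date(2015, 9, 20): 60.0,  # 7 days
    date(2015, 9, 27): 50.0,  # 4 days in September
}
funda_september = weekly_to_monthly(weekly_funda, 2015, 9)

# Combine with hypothetical monthly values for the other three keywords.
index_september = equal_weighted_index([funda_september, 70.0, 40.0, 33.0])
```

The sketch only gives a meaningful monthly value when every week that overlaps the month is present in the input.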

Figure 1: Google Trends index, January 2004 to April 2016


The graph above shows the average popularity of the search queries. Barreira et al. (2013) already showed that Google Trends data exhibit some seasonality. To filter out these seasonal effects, monthly dummy variables will be added to the model. Besides the seasonal effects, some data might not be stationary. A Dickey-Fuller test will be performed to check for stationarity on all variables.
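As a rough illustration of the stationarity check mentioned above, the sketch below computes the test statistic of the simplest Dickey-Fuller regression, Δy_t = α + γ·y_{t−1} + e_t. The thesis does not state which variant of the test is used (with or without a trend or lagged differences), so this specification, the function name `dickey_fuller_t` and the simulated series are assumptions. The statistic has to be compared with a Dickey-Fuller critical value (approximately −2.86 at the 5% level for the version with a constant), not with the usual t-distribution.

```python
import math
import random


def dickey_fuller_t(y):
    """t-statistic of gamma in the regression dy_t = alpha + gamma * y_{t-1} + e_t."""
    lagged = y[:-1]
    diffs = [b - a for a, b in zip(y[:-1], y[1:])]
    n = len(diffs)
    mean_x = sum(lagged) / n
    mean_d = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (d - mean_d) for x, d in zip(lagged, diffs))
    gamma = sxy / sxx
    alpha = mean_d - gamma * mean_x
    # Residual variance with two estimated parameters (alpha and gamma).
    rss = sum((d - alpha - gamma * x) ** 2 for x, d in zip(lagged, diffs))
    se_gamma = math.sqrt(rss / (n - 2) / sxx)
    return gamma / se_gamma


# Simulated, clearly stationary AR(1) series (hypothetical data).
random.seed(1)
stationary = [0.0]
for _ in range(200):
    stationary.append(0.2 * stationary[-1] + random.gauss(0.0, 1.0))

t_stat = dickey_fuller_t(stationary)
# A statistic below the critical value rejects the unit-root hypothesis.
rejects_unit_root = t_stat < -2.86
```

In practice a statistics package would normally be used for this test, since it also handles lag selection and exact critical values.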

As Verbruggen & Kranendonk (2005) show, the interest rate, the number of households and the lagged house price change are the most important factors in determining the in-sample fit for house price changes. Mortgage loans usually have a long time to maturity, so the interest rate that will be used should have a comparable time to maturity. The 10-year Dutch government bond has a comparable time to maturity. This interest rate is linked to variable mortgage rates and personal loans. These data are available on the website of DNB. The house price index and the number of households are available from the CBS, which keeps track of all the economic statistics for the Netherlands. The number of households will be interpolated from yearly to monthly data. This will be done by calculating the growth of each year, which is then used to calculate the monthly growth using the following formulas:

$$\Delta HH_t = \ln(HH_t) - \ln(HH_{t-1}) \tag{1}$$

$$\Delta HH_{month} = \Delta HH_{year} \cdot \frac{1}{12} \tag{2}$$
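A small worked example of formulas (1) and (2) may help. The sketch below spreads the yearly log growth of the number of households evenly over twelve months; the function name `monthly_levels` and the household counts are hypothetical, and rebuilding monthly levels by compounding the monthly growth is an assumption about how the interpolated series is constructed.

```python
import math


def monthly_levels(level_start, level_end):
    """Interpolate 13 monthly levels between two consecutive yearly observations."""
    yearly_growth = math.log(level_end) - math.log(level_start)  # formula (1)
    monthly_growth = yearly_growth / 12                          # formula (2)
    # Compound the constant monthly log growth from the starting level.
    return [level_start * math.exp(k * monthly_growth) for k in range(13)]


# Hypothetical yearly household counts.
levels = monthly_levels(7_000_000, 7_070_000)
first_month_growth = math.log(levels[1]) - math.log(levels[0])
```

By construction the first and last values reproduce the two yearly observations, and every month has the same log growth.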

The gross domestic product will also be obtained from the CBS. These data are not available for each month either. To get the monthly gross domestic product, the quarterly GDP will be interpolated to monthly data in the same way as the number of households.

Figure 2: House price index, January 2004 to April 2016


Figure 2 shows the trend of the house price index from January 2004 to April 2016, with January 2010 as the reference point (index 100). It clearly shows that the mortgage crisis of 2007 had an impact on the house price index. In 2009 and 2010 house prices neither increased nor decreased. From the beginning of 2011 house prices decreased further due to the economic crisis, until the beginning of 2013.

Figure 3 shows the GDP of the Netherlands from January 2004 to January 2014. As with the house price index, the mortgage crisis caused a decrease in GDP around September 2008. GDP almost fully recovers from the mortgage crisis, but after the economic crisis of 2011 it starts to decrease slowly again.

Figure 4 shows the interest rate of the 10-year Dutch government bonds. The interest rate was fairly stable before the real estate crisis, hovering around 4%. However, after the crisis the interest rate decreased until it reached a record low in April 2015. The ECB purposely lowered the interest rate in order to stimulate the demand for money, lower inflation and reduce the effects of the economic crisis.

Figure 3: GDP (in millions), January 2004 to January 2014

Figure 4: Interest rate on 10-year Dutch government bonds, January 2004 to April 2016

Figure 5 shows the increase in households from January 2004 to January 2014. It shows a steady increase over the period of 10 years. There are, however, two points where the increase slowed slightly, namely during the mortgage crisis and the economic crisis, in 2009 and 2012 respectively. The variables used are summarized in Table 1.

Table 1

| Symbol | Description | Mean | Min. | Max. | σ | Source |
|---|---|---|---|---|---|---|
| ΔHP | House Price | 95.1777 | 84 | 107 | 6.7163 | CBS |
| ΔGDP | GDP | 134944 | 124855 | 141198 | 4393.31 | CBS |
| ΔR | Interest rate | 0.0289 | 0.0031 | 0.0473 | 0.0123 | DNB |
| ΔHH | Households | 7286118 | 6990000 | 7548000 | 164562 | CBS |
| ΔGT | Google Trends | 51.7548 | 38.3629 | 68.7500 | 5.9646 | Google |

Figure 5: Households, January 2004 to January 2014

Methodology

First, the relationship between the change in house prices and its lagged values is estimated, using a simple autoregressive model for the period from 2004 to 2014. The significant lagged variables will be used to form the baseline model (A1), where $\varepsilon_t$ denotes the white noise error term and $\Delta HP_t$ denotes the monthly change in the house price index at time t, calculated as follows:

$$\Delta HP_t = \ln(HP_t) - \ln(HP_{t-1}) \tag{3}$$

$$\Delta HP_t = \beta_0 + \sum_{i=1}^{p} \beta_i \Delta HP_{t-i} + \varepsilon_t \tag{A1}$$
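To make the baseline specification concrete, the sketch below computes log differences as in equation (3) and estimates an autoregression of the form (A1) by ordinary least squares. It is a minimal pure-Python illustration under stated assumptions: the helper names `log_diff`, `solve` and `fit_ar` are hypothetical, the thesis does not say which software was used, and the demonstration series is synthetic and noise-free so that the estimates can be checked by hand.

```python
import math


def log_diff(levels):
    """Monthly change as the difference of natural logs, as in equation (3)."""
    return [math.log(b) - math.log(a) for a, b in zip(levels[:-1], levels[1:])]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ar(series, lags):
    """OLS of series[t] on a constant and series[t - lag] for each lag in `lags`.

    Returns [beta_0, beta for lags[0], beta for lags[1], ...].
    """
    start = max(lags)
    rows = [[1.0] + [series[t - lag] for lag in lags]
            for t in range(start, len(series))]
    targets = series[start:]
    k = len(lags) + 1
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic series that follows x_t = 0.1 + 0.5 * x_{t-1} exactly.
demo = [1.0]
for _ in range(11):
    demo.append(0.1 + 0.5 * demo[-1])
intercept, slope = fit_ar(demo, lags=[1])
```

With real data, `fit_ar(log_diff(house_price_index), lags=[4, 5, 6])` would correspond to a model that keeps only the fourth, fifth and sixth lags, where `house_price_index` is a placeholder for the monthly index levels.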

A second model (A2) will be estimated using the significant lagged variables of house price index differences and the control variables: interest rate, gross domestic product and households, to test its in-sample fit on the change of house prices. These are the same variables as Verbruggen & Kranendonk (2005) used in their study. The lagged variables will be tested for significance and removed if not significant.

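$$\Delta HP_t = \beta_0 + \sum_{i=1}^{p} \beta_i \Delta HP_{t-i} + \sum_{j=1}^{q} \beta_j \Delta R_{t-j} + \sum_{k=1}^{s} \beta_k \Delta GDP_{t-k} + \sum_{l=1}^{n} \beta_l \Delta HH_{t-l} + \varepsilon_t \tag{A2}$$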

Next, the Google Trends index is added to the models to see if it increases the in-sample fit, creating the following model, where $\Delta GT_t$ denotes the monthly change in popularity of the combined search terms at time t, calculated as follows:

$$\Delta GT_t = \ln(GT_t) - \ln(GT_{t-1}) \tag{4}$$

$$\Delta HP_t = \beta_0 + \sum_{i=1}^{p} \beta_i \Delta HP_{t-i} + \sum_{m=-1}^{u} \beta_m \Delta GT_{t-m} + \varepsilon_t \tag{B1}$$

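The second extended model will also include the significant control variables found in model A2.

$$\Delta HP_t = \beta_0 + \sum_{i=1}^{p} \beta_i \Delta HP_{t-i} + \sum_{j=1}^{q} \beta_j \Delta R_{t-j} + \sum_{k=1}^{s} \beta_k \Delta GDP_{t-k} + \sum_{l=1}^{n} \beta_l \Delta HH_{t-l} + \sum_{m=-1}^{u} \beta_m \Delta GT_{t-m} + \varepsilon_t \tag{B2}$$

Because this is a time-series analysis, seasonal effects can cause a bias. To remove the seasonal effects, monthly dummy variables will be added to the extended models, with December as the omitted reference month.

$$\Delta HP_t = \beta_0 + \sum_{i=1}^{p} \beta_i \Delta HP_{t-i} + \sum_{m=-1}^{u} \beta_m \Delta GT_{t-m} + \beta_3 Jan_t + \beta_4 Feb_t + \beta_5 Mar_t + \beta_6 Apr_t + \beta_7 May_t + \beta_8 Jun_t + \beta_9 Jul_t + \beta_{10} Aug_t + \beta_{11} Sep_t + \beta_{12} Oct_t + \beta_{13} Nov_t + \varepsilon_t \tag{C1}$$

$$\Delta HP_t = \beta_0 + \sum_{i=1}^{p} \beta_i \Delta HP_{t-i} + \sum_{j=1}^{q} \beta_j \Delta R_{t-j} + \sum_{k=1}^{s} \beta_k \Delta GDP_{t-k} + \sum_{l=1}^{n} \beta_l \Delta HH_{t-l} + \sum_{m=-1}^{u} \beta_m \Delta GT_{t-m} + \beta_6 Jan_t + \beta_7 Feb_t + \beta_8 Mar_t + \beta_9 Apr_t + \beta_{10} May_t + \beta_{11} Jun_t + \beta_{12} Jul_t + \beta_{13} Aug_t + \beta_{14} Sep_t + \beta_{15} Oct_t + \beta_{16} Nov_t + \varepsilon_t \tag{C2}$$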
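The monthly dummy variables in models C1 and C2 can be built as in the short sketch below. The names `MONTH_NAMES` and `month_dummies` are hypothetical. December is left out because, with a constant in the model, including all twelve dummies would make the regressors perfectly collinear; each coefficient is then read relative to December.

```python
# Eleven dummy variables; December is the omitted reference month.
MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov"]


def month_dummies(month):
    """Return the 11 dummy values (Jan..Nov) for a calendar month 1-12."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return [1 if month == m else 0 for m in range(1, 12)]


# A July observation has only the Jul dummy switched on.
july = dict(zip(MONTH_NAMES, month_dummies(7)))

# A December observation has all dummies equal to zero.
december = month_dummies(12)
```

Each observation's dummy values would be appended to its row of regressors before estimating the model.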

Results

The Dickey-Fuller results are discussed before the regression results are analysed. Then the models without the control variables are discussed, followed by the models including the control variables. Finally the models with the monthly dummy variables are presented. The models are presented in Table 2. A Dickey-Fuller test was performed for all the variables except the monthly dummy variables; the test results are presented in the appendix. For the house price index, interest rate, gross domestic product and households a unit root is found; these variables are non-stationary. Only the Google Trends index does not contain a unit root, suggesting that the Google Trends index is stationary.

The regression of the change in the house price index on its lagged values yields three significant lags, namely the fourth, fifth and sixth month, all with a p-value of less than 0.01. The results suggest that an increase of 1% in the house price index four months ago is associated with an increase of 0.2206% in the house price index now, while an increase of 1% in the fifth and sixth lagged variables is associated with increases of 0.2532% and 0.2664%, respectively.

Model B1 is an extension of model A1, in which the Google Trends index is added to the model. Again the fourth, fifth and sixth lagged house price index variables are all significant, as is the Google Trends index. None of the lagged Google Trends variables is significant, but the contemporaneous variable (at time t) is. A 1% increase in the Google Trends index is associated with a house price index increase of 0.0167%. The addition of this variable improved the in-sample fit of the regression: the adjusted R-squared increased from 0.2097 to 0.2509. Model B1 thus explains 4.12 percentage points more of the variation in the change of the house price index than model A1.

The third model, C1, includes the monthly dummy variables in order to remove any seasonal effects. Only the dummy variables for January, February and July have a significant positive effect on the change of the house price index, suggesting that the house price index rises around 0.5% more in these months than in December. An F-test is performed to see whether the dummy variables jointly affect the in-sample fit of the model. An F-value of 0.87 is found, which is lower than the critical value of 3.86, so the null hypothesis that the seasonal dummy variables jointly have no effect cannot be rejected at the 1% level. The Google Trends index variable is removed from this model because it no longer has a significant effect. Model C1 has both a lower adjusted R-squared and a lower F-value than models A1 and B1.
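The F-test on the seasonal dummy variables can be illustrated with the standard statistic for joint restrictions, F = ((RSS_r − RSS_u) / q) / (RSS_u / (n − k)), which compares the residual sum of squares of the model without the dummies (restricted) with that of the model including them (unrestricted). The thesis does not state how its F-values were computed, so this formula, the function name `f_statistic` and the numbers below are assumptions used purely for illustration.

```python
def f_statistic(rss_restricted, rss_unrestricted, q, n, k):
    """F-statistic for q joint restrictions.

    n is the number of observations and k the number of parameters
    (including the constant) in the unrestricted model.
    """
    numerator = (rss_restricted - rss_unrestricted) / q
    denominator = rss_unrestricted / (n - k)
    return numerator / denominator


# Hypothetical residual sums of squares, not taken from the thesis.
f_value = f_statistic(rss_restricted=120.0, rss_unrestricted=100.0, q=2, n=53, k=3)
```

An F-value below the critical value means that the restricted model, here the one without the dummy variables, cannot be rejected.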

Table 2

| Model | A1 | B1 | C1 | A2 | B2 | C2 |
|---|---|---|---|---|---|---|
| ΔHP(t-1) | | | | -0.2135** (0.0877) | -0.2038** (0.0857) | -0.2514*** (0.0923) |
| ΔHP(t-2) | | | | -0.1720** (0.0838) | -0.1537* (0.0821) | -0.1791** (0.0882) |
| ΔHP(t-4) | 0.2206*** (0.0764) | 0.2477*** (0.0749) | 0.2512*** (0.0791) | 0.1787** (0.0907) | 0.2027** (0.0798) | 0.2085** (0.0842) |
| ΔHP(t-5) | 0.2532*** (0.0766) | 0.2601*** (0.0746) | 0.2690*** (0.0791) | 0.2966*** (0.0812) | 0.2916*** (0.0793) | 0.3199*** (0.0851) |
| ΔHP(t-6) | 0.2664*** (0.0762) | 0.2164*** (0.0762) | 0.2329*** (0.0791) | 0.3303*** (0.0830) | 0.2822*** (0.0832) | 0.3186*** (0.0867) |
| ΔR(t-2) | | | | 0.0231** (0.0109) | 0.0221** (0.0107) | 0.0244** (0.0117) |
| ΔGDP(t-2) | | | | 0.8230*** (0.2265) | 0.8259*** (0.2210) | 0.8770*** (0.5873) |
| ΔGT(t) | | 0.0167*** (0.0057) | | | 0.0151** (0.0060) | |
| Jan(t) | | | 0.0056** (0.0025) | | | 0.0050* (0.0027) |
| Feb(t) | | | 0.0048* (0.0025) | | | 0.0063** (0.0027) |
| Mar(t) | | | 0.0038 (0.0025) | | | 0.0053* (0.0027) |
| Apr(t) | | | 0.0036 (0.0025) | | | 0.0024 (0.0028) |
| May(t) | | | 0.0014 (0.0026) | | | 0.0014 (0.0028) |
| Jun(t) | | | 0.0031 (0.0026) | | | 0.0020 (0.0028) |
| Jul(t) | | | 0.0047* (0.0026) | | | 0.0041 (0.0028) |
| Aug(t) | | | 0.0038 (0.0025) | | | 0.0047* (0.0027) |
| Sep(t) | | | 0.0014 (0.0025) | | | 0.0023 (0.0027) |
| Oct(t) | | | 0.0035 (0.0025) | | | 0.0045 (0.0027) |
| Nov(t) | | | 0.0013 (0.0025) | | | 0.0020 (0.0027) |
| Obs. | 141 | 141 | 141 | 116 | 116 | 116 |
| Adjusted R-squared | 0.2097 | 0.2509 | 0.2017 | 0.3290 | 0.3610 | 0.3268 |
| F-value | 13.38 | 12.72 | 3.53 | 9.05 | 9.12 | 4.10 |


Model A2 is an extension of model A1, in which the control variables are added. None of the lagged household control variables is significant, so they are removed from every model. For the gross domestic product and the interest rate, the only lagged variables that are significant are the two-month lagged variables, which both have a positive effect on the change in the house price index. An increase of 1% in the gross domestic product two months ago is associated with an increase of 0.8230% in the house price index now; an increase of 1% in the interest rate two months ago is associated with an increase of 0.0231% in the house price index now. The first, second, fourth, fifth and sixth month lagged house price index variables all have a significant effect. There are, however, some contradictory effects. The first two lagged variables have a negative effect, while the other three have a positive effect. This may be caused by multicollinearity.

In model B2 the Google Trends index is added to model A2 to see whether the in-sample fit improves. The significance of the two-month lagged house price index variable decreases slightly. The control variables and the lagged house price variables all remain significant, as does the Google Trends index. A 1% increase in the Google Trends index is associated with a 0.0151% increase in the house price index. Adding the Google Trends index improves the in-sample fit: model B2 has a higher F-value than model A2 (9.12 > 9.05), and the adjusted R-squared rises from 0.3290 to 0.3610. Model B2 thus explains 3.20 percentage points more of the variation in the change of the house price index than model A2.
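The comparison between models A2 and B2 relies on the adjusted R-squared, which penalises every additional regressor. As a minimal sketch, the standard formula can be written as below; the function name and the unadjusted R-squared inputs are illustrative assumptions, not figures reported in this thesis.

```python
def adjusted_r2(r2, n_obs, n_regressors):
    """Adjusted R-squared for a model with an intercept and n_regressors slopes."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)

# Hypothetical case: one extra regressor raises R-squared from 0.37 to 0.40.
base = adjusted_r2(0.37, n_obs=116, n_regressors=7)
extended = adjusted_r2(0.40, n_obs=116, n_regressors=8)
print(round(base, 4), round(extended, 4))

# Hypothetical case: the extra regressor barely raises R-squared,
# so the penalty dominates and the adjusted value falls.
weak = adjusted_r2(0.371, n_obs=116, n_regressors=8)
print(weak < base)
```

The second case shows why the adjusted measure is used for the comparison: a regressor only improves it when the gain in fit outweighs the loss of one degree of freedom.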

Again the seasonal effects have to be accounted for, which is done by adding monthly dummy variables to the model. Model C2 shows that several months again have a positive significant effect on the change of the house price index, namely January, February, March and August. The results suggest that the house price index rises about 0.6 percentage points more in February than in December. An F-test is performed to see whether the dummy variables jointly improve the in-sample fit of the model. The F-value of 0.97 is lower than the critical value of 3.43, so the null hypothesis that the seasonal dummies have no joint effect cannot be rejected at the 1% level. The first and second monthly lags of the house price index are more significant than in model B2, while the significance of the other three lags does not change. Again the Google Trends variable has no significant effect and is therefore removed. Model C2 has both a lower adjusted R-squared and a lower F-value than models A2 and B2.
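The F-test on the monthly dummies compares a restricted model (without dummies) with an unrestricted one (with the eleven dummies, December being the reference month). A minimal sketch of that statistic follows; the function name and the residual sums of squares are illustrative assumptions, since the thesis does not report them.

```python
def nested_f_stat(rss_restricted, rss_unrestricted, n_restrictions, n_obs, n_params_unrestricted):
    """F-statistic for H0: the extra coefficients of the unrestricted model are all zero."""
    numerator = (rss_restricted - rss_unrestricted) / n_restrictions
    denominator = rss_unrestricted / (n_obs - n_params_unrestricted)
    return numerator / denominator

# Hypothetical residual sums of squares; the 11 monthly dummies are tested jointly.
f_stat = nested_f_stat(
    rss_restricted=120.0,
    rss_unrestricted=100.0,
    n_restrictions=11,
    n_obs=116,
    n_params_unrestricted=19,  # intercept + 7 regressors + 11 dummies
)
print(round(f_stat, 3))
```

The null hypothesis is rejected only when the statistic exceeds the critical value of the F-distribution with (11, 97) degrees of freedom in this hypothetical setup.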

Conclusion and Discussion

This thesis studied the potential predictive power of internet search data on the real estate market in the Netherlands by constructing a Google Trends index and testing whether this index could improve the in-sample fit of simple autoregressive models. Some limitations concerning the data remain. The gross domestic product had to be interpolated to monthly data, which might introduce bias. The keywords used for the Google Trends index are also somewhat arbitrary. Despite these limitations, the results show that models including the Google Trends index outperform the survey-based models. A positive and significant relation is found between the Google Trends index and the change in the house price index in models B1 and B2, both of which have a better in-sample fit than their counterparts without the index. This can be explained by the fact that Google Trends data give a good representation of society's interest in real time, are easily accessible and are published with high frequency. However, after accounting for seasonal effects, the Google Trends index no longer had a significant effect on the change of the house price index.

(19)

19 Reference list

Barreira, N., Godinho, P., & Melo, P. (2013). Nowcasting unemployment rate and new car sales in south-western Europe with Google Trends. Netnomics. 14. 129-165.

Brynjolfsson, E., & Wu, L. (2009). The Future of Prediction: How Google Searches Foreshadow Housing Prices and Quantities. International Conference on Information Systems. 147, 1-14. Carrière-Swallow, Y., & Labbé, F. (2013). Nowcasting with Google Trends in an Emerging Market.

Journal of Forecasting. 32, 289-298.

Choi, H., & Varian, H. (2012). Predicting the Present with Google Trends. The Economic Record. 88, 2-9.

Ettredge, J., Gerdes, J., & Karuga, G. (2005). Using web-based search data to predict macroeconomic statistics. Communications of the ACM. 48(11), 87-92.

Mclaren, N., & Shanbhogue, R. (2011). Using internet search data as economic indicators. Bank of

England Quarterly Bulletin. 2, 134-140.

Malik, T.M., Gumel, A., Thompson, L.H., Strome, T., Mahmud, M.S. (2011). “Google Flu Trends” and Emergency Department Triage Data Predicted the 2009 Pandemic H1N1 Waves in Manitoba.

Canadian Journal of Public Health. 102(4), 294-297.

Verbruggen, J., & Kranendonk, H. (2005). Welke factoren bepalen de ontwikkeling van de huizenprijs in Nederland? CPB document. 81, 1-42.

Vosen, S., & Schmidt, T. (2011). Forecasting Private Consumption: Survey-Based Indicators vs. Google Trends. Journal of Forecasting. 30, 565-578.

(20)

Appendix

Dickey-Fuller Tests

(DF1)

t-value: -2.23 > -3.12 (critical value). The null hypothesis of a unit root cannot be rejected, so the house price index is treated as non-stationary.

. reg dlnHP L1HP trend

Number of obs = 147; F(2, 144) = 4.92; Prob > F = 0.0086; R-squared = 0.0639; Adj R-squared = 0.0509; Root MSE = .00663

Source     SS            df     MS
Model      .000432505    2      .000216253
Residual   .006333846    144    .000043985
Total      .006766351    146    .000046345

dlnHP    Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
L1HP     -.0002107    .0000944     -2.23    0.027    -.0003973    -.0000242
trend    -.0000452    .0000149     -3.03    0.003    -.0000747    -.0000157
_cons    .023622      .0096129     2.46     0.015    .0046214     .0426226

. reg HP trend

Number of obs = 148; F(1, 146) = 49.87; Prob > F = 0.0000; R-squared = 0.2546; Adj R-squared = 0.2495; Root MSE = 5.8184

Source     SS            df     MS
Model      1688.36336    1      1688.36336
Residual   4942.63305    146    33.8536511
Total      6630.99642    147    45.1088192

HP       Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
trend    -.0790571    .0111947     -7.06    0.000    -.1011816    -.0569326
_cons    101.0675     .9614052     105.12   0.000    99.16739     102.9675
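The Stata commands in this appendix follow the pattern `reg dlnX L1X trend`: the log change of a series is regressed on its lagged value and a linear trend, and the t-value on the lagged term is compared with -3.12. The sketch below reproduces that regression in plain Python on simulated data. The helper names `ols` and `dickey_fuller_t`, the simulated series, and the reading of `L1X` as the lagged level are assumptions made for illustration; this is not the code used for the thesis.

```python
import math
import random

def ols(y, X):
    """OLS via the normal equations; returns coefficients and their t-statistics."""
    n, k = len(y), len(X[0])
    # Build the augmented system [X'X | I] and reduce it to [I | (X'X)^-1].
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    aug = [row + [float(i == j) for j in range(k)] for i, row in enumerate(xtx)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))  # partial pivoting
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    inv = [row[k:] for row in aug]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    t_stats = [b / math.sqrt(sigma2 * inv[i][i]) for i, b in enumerate(beta)]
    return beta, t_stats

def dickey_fuller_t(series):
    """t-statistic on the lagged level in: dln(x_t) = a + b*trend + g*x_(t-1) + e_t."""
    dln = [math.log(series[t]) - math.log(series[t - 1]) for t in range(1, len(series))]
    X = [[1.0, float(t), series[t - 1]] for t in range(1, len(series))]
    _, t_stats = ols(dln, X)
    return t_stats[2]

random.seed(1)
# Simulated random walk around 100 (unit root by construction) and a stationary series.
walk = [100.0]
for _ in range(149):
    walk.append(walk[-1] + random.gauss(0, 0.5))
noise = [50.0 + random.gauss(0, 5) for _ in range(150)]

CRITICAL = -3.12  # critical value used in this appendix
for name, s in (("random walk", walk), ("stationary", noise)):
    t = dickey_fuller_t(s)
    print(name, round(t, 2), "unit root rejected" if t < CRITICAL else "unit root not rejected")
```

A t-value below the critical value rejects the unit root, which is the decision rule applied to each series in this appendix.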

(DF2)

t-value: -5.02 < -3.12 (critical value). The null hypothesis of a unit root is rejected, so the Google Trends index is stationary.

. reg dlnGT L1GT trend

Number of obs = 149; F(2, 146) = 12.63; Prob > F = 0.0000; R-squared = 0.1475; Adj R-squared = 0.1358; Root MSE = .08644

Source     SS            df     MS
Model      .188686779    2      .09434339
Residual   1.09077847    146    .007471085
Total      1.27946525    148    .008645035

dlnGT    Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
L1GT     -.0062086    .0012357     -5.02    0.000    -.0086508    -.0037663
trend    .0001474     .0001666     0.88     0.378    -.0001819    .0004767
_cons    .3128402     .0635222     4.92     0.000    .1872984     .438382

. reg GT trend

Number of obs = 150; F(1, 148) = 4.99; Prob > F = 0.0270; R-squared = 0.0326; Adj R-squared = 0.0261; Root MSE = 5.8864

Source     SS            df     MS
Model      172.777316    1      172.777316
Residual   5128.18063    148    34.6498691
Total      5300.95795    149    35.576899

GT       Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
trend    .024786      .0110998     2.23     0.027    .0028515     .0467205
_cons    49.88349     .9660737     51.64    0.000    47.97441     51.79257

(DF3)

t-value: -0.44 > -3.12 (critical value). The null hypothesis of a unit root cannot be rejected, so the interest rate is treated as non-stationary.

. reg dlnR L1R trend

Number of obs = 147; F(2, 144) = 1.40; Prob > F = 0.2510; R-squared = 0.0190; Adj R-squared = 0.0054; Root MSE = .11779

Source     SS            df     MS
Model      .038727977    2      .019363988
Residual   1.99785946    144    .013874024
Total      2.03658743    146    .013949229

dlnR     Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
L1R      -.7261456    1.657823     -0.44    0.662    -4.002957    2.550666
trend    -.0005505    .0004732     -1.16    0.247    -.0014858    .0003848
_cons    .0463289     .0816299     0.57     0.571    -.1150188    .2076766

. reg R trend

Number of obs = 148; F(1, 146) = 490.46; Prob > F = 0.0000; R-squared = 0.7706; Adj R-squared = 0.7690; Root MSE = .0059

Source     SS            df     MS
Model      .017100729    1      .017100729
Residual   .005090586    146    .000034867
Total      .022191315    147    .000150961

R        Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
trend    -.0002516    .0000114     -22.15   0.000    -.0002741    -.0002291
_cons    .0476191     .0009757     48.81    0.000    .0456908     .0495474

(DF4)

t-value: -0.42 > -3.12 (critical value). The null hypothesis of a unit root cannot be rejected, so the number of households is treated as non-stationary.

. reg dlnHH L1HH trend

Number of obs = 120; F(2, 117) = 10.68; Prob > F = 0.0001; R-squared = 0.1544; Adj R-squared = 0.1400; Root MSE = .0002

Source     SS            df     MS
Model      8.2434e-07    2      4.1217e-07
Residual   4.5134e-06    117    3.8576e-08
Total      5.3378e-06    119    4.4855e-08

dlnHH    Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
L1HH     -6.29e-10    1.51e-09     -0.42    0.678    -3.63e-09    2.37e-09
trend    5.66e-07     7.11e-06     0.08     0.937    -.0000135    .0000146
_cons    .0051879     .010585      0.49     0.625    -.015775     .0261509

. reg HH trend

Number of obs = 121; F(1, 119) = 22339.83; Prob > F = 0.0000; R-squared = 0.9947; Adj R-squared = 0.9947; Root MSE = 12029

Source     SS            df     MS
Model      3.2325e+12    1      3.2325e+12
Residual   1.7219e+10    119    144696053
Total      3.2497e+12    120    2.7081e+10

HH       Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
trend    4679.46      31.30803     149.47   0.000    4617.467     4741.453
_cons    7000671      2200.712     3181.09  0.000    6996313      7005029

(DF5)

t-value: -0.85 > -3.12 (critical value). The null hypothesis of a unit root cannot be rejected, so the gross domestic product is treated as non-stationary.

. reg dlnGDP L1GDP trend

Number of obs = 120; F(2, 117) = 12.34; Prob > F = 0.0000; R-squared = 0.1742; Adj R-squared = 0.1601; Root MSE = .00251

Source     SS            df     MS
Model      .00015559     2      .000077795
Residual   .000737497    117    6.3034e-06
Total      .000893087    119    7.5049e-06

dlnGDP   Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
L1GDP    -5.83e-08    6.84e-08     -0.85    0.396    -1.94e-07    7.71e-08
trend    -.0000276    8.67e-06     -3.18    0.002    -.0000448    -.0000104
_cons    .0102092     .0088961     1.15     0.253    -.0074091    .0278274

. reg GDP trend

Number of obs = 121; F(1, 119) = 81.87; Prob > F = 0.0000; R-squared = 0.4076; Adj R-squared = 0.4026; Root MSE = 3395.6

Source     SS            df     MS
Model      944037175     1      944037175
Residual   1.3721e+09    119    11530282
Total      2.3161e+09    120    19301172.8

GDP      Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
trend    79.96905     8.837865     9.05     0.000    62.4692      97.46891
_cons    130065.7     621.2334     209.37   0.000    128835.6     131295.8
