
Faculty of Economics and Business

MASTER THESIS

Nowcasting with many predictors:

Predicting U.S. GDP Using Bayesian

Structural Time-Series Model

Author: Jan Sviták (student id: 10605231)
Supervisor: dr. N.P.A. (Noud) van Giersbergen
Second marker: prof. dr. H.P. (Peter) Boswijk
Academic Year: 2013/2014


Declaration of Authorship

The author hereby declares that he compiled this thesis independently, using only the listed resources and literature.

The author grants to the University of Amsterdam permission to reproduce and to distribute copies of this thesis document in whole or in part.


I would like to thank my thesis supervisor, Noud van Giersbergen, for his help with the choice of the thesis topic and his valuable comments during the process of writing. Furthermore, I am grateful to Steve Scott for very useful remarks on the estimation of the Bayesian structural time-series model and Marta Bańbura and Marek Rusnák for clarifications regarding the dynamic factor model.


Abstract

This thesis attempts to deal with the problem of nowcasting GDP using a vast number of explanatory variables. To simulate a situation comparable to nowcasting in the real world, we conduct a real-time nowcasting exercise. The issues associated with the number of predictors, dependent and explanatory variables being published at different frequencies, and the unavailability of some data in real-time datasets have commonly been solved in the literature using a version of a dynamic factor model. This approach is, therefore, also used in this thesis, and its performance is compared to the Bayesian structural time-series model which, to our knowledge, has not yet been used for nowcasting GDP. Furthermore, we are not aware of any other study incorporating a solution for mixed frequencies of time-series into this model framework.

According to the results of the conducted nowcasting exercise, the accuracy of the Bayesian structural time-series model is comparable to the dynamic factor model and, hence, it proves to be a valid alternative to frequently used factor models. Finally, when dealing with balanced datasets, the Bayesian structural time-series model outperforms the dynamic factor model.

The main contribution of this thesis is, therefore, providing evidence on the performance of the Bayesian structural time-series model in a nowcasting setting with intertemporal aggregation issues. Moreover, it adds to the limited literature dealing with real-time nowcasting and illustrates that the number of factors plays a crucial role in the precision of the dynamic factor model.

JEL Classification C01, C11, C53, C55, E37, E52

Keywords Bayesian structural time-series model, Dynamic factor model, GDP, nowcasting

Author's e-mail: jan.svitak@student.uva.nl
Supervisor's e-mail: N.P.A.vanGiersbergen@uva.nl


Contents

List of Tables
List of Figures
Acronyms
Introduction
1 General Framework
1.1 State-space models
1.2 Kalman filter
1.2.1 Kalman filter recursions
1.2.2 State smoothing
1.2.3 Treating missing values
2 Dynamic Factor Model
2.1 Dealing with mixed frequency
2.2 EM algorithm
2.2.1 M-step
2.2.2 Initialization and the E-step
3 Bayesian Structural Time-Series
3.1 Structural Time-Series model
3.1.1 Dealing with mixed frequency
3.1.2 Prior distributions of initial values
3.1.3 State smoothing
3.1.4 Posterior distributions
3.2 Spike and Slab Regression
3.2.1 Prior distributions
3.2.2 Posterior distributions
3.3 MCMC algorithm
4 Data
5 Empirical Results
5.1 Dynamic factor model
5.2 Bayesian structural time-series
5.2.1 Frequently selected predictors
6 Conclusion
Bibliography
A Data on Predictors and Their Transformations
B Derivation of the factor loadings estimator for monthly variables

List of Tables

5.1 RMSFEs of DFM nowcasts
5.2 RMSFEs of BSTS nowcasts
C.1 Numbers of factors and iterations of the EM algorithm (# of


List of Figures

5.1 Fit of the local linear trend part of BSTS
5.2 Fit of the regression part of BSTS
5.3 Overall fit of BSTS
5.4 Inclusion probabilities of 5 most frequently included variables

Acronyms

ALFRED Archival Federal Reserve Economic Data

AR Autoregressive (model)

AIC Akaike's Information Criterion

BEA Bureau of Economic Analysis

BIC Bayesian Information Criterion

BSTS Bayesian Structural Time-Series

DFM Dynamic Factor Model

ECB European Central Bank

EM Expectation-Maximisation

FED Federal Reserve

FRED Federal Reserve Economic Data

GDP Gross Domestic Product

MCMC Markov Chain Monte Carlo

ML Maximum Likelihood

QML Quasi-Maximum Likelihood

RMSFE Root Mean Squared Forecast Error

SSVS Stochastic Search Variable Selection


Introduction

"In view of all that we have said in the foregoing sections, the many obstacles we appear to have surmounted, what casts the pall over our victory celebration? It is the curse of dimensionality, a malediction that has plagued the scientist from the earliest days."

Bellman et al. (1961)

Gross domestic product is, without a doubt, one of the most important measures of the economic situation in a given geographic area. Therefore, it plays a prominent role in the decision-making processes of key policymakers and serves as an input into numerous macroeconomic models, which are used to further assess the state of the economy or provide forecasts of future developments of the economic climate. It is thus crucial to have accurate estimates of GDP available as soon as possible. Unfortunately, official GDP data are published with substantial publication lags and are, moreover, revised further in the future. Hence, researchers have been trying to come up with methods able to yield timely estimates of current GDP, i.e. nowcasts.

While nowcasting, a researcher faces several difficulties. For example, data on GDP, our dependent variable, are announced at a different frequency (quarterly) than the variables used for its modelling (monthly). Furthermore, GDP is quite often subject to sizable revisions, as documented, inter alia, by Aruoba (2008), Faust et al. (2005), Fernandez et al. (2011) or Garratt & Vahey (2006). It is natural that forecasts utilizing revised data may yield very different results than estimates based on real-time data. This has been shown e.g. by Faust et al. (2003), Kugler et al. (2005), Molodtsova et al. (2008), Orphanides (2001) and Robertson & Tallman (1998). Hence, every nowcasting exercise should utilize real-time data in order to produce reliable and relevant results. Nevertheless, as noted by Rusnák (2013), studies conducting real-time nowcasting exercises are still relatively rare. Finally, because data on macroeconomic variables are published at different times, one has to deal with datasets with missing observations at the end of the real-time sample (the so-called ragged edge) if one aims to utilize all the available information.

Two approaches addressing the aforementioned issues are applied in this thesis. Arguably the most common techniques are based on a dynamic factor model (DFM) (Geweke 1976), which is able to handle a high-dimensional data flow and avoid parameter proliferation by capturing the bulk of co-movements among the numerous variables with only a few factors forming a parsimonious model. A good forecasting performance of factor models has been documented, among others, by Stock & Watson (2002a), Boivin & Ng (2005), Forni et al. (2005) or Marcellino et al. (2003). According to Bańbura et al. (2013), the two-step approach of Giannone et al. (2008) has become a very popular nowcasting method. It was first implemented for nowcasting GDP at the Board of Governors of the Federal Reserve in 2003. It was later also used by the ECB (2008) and the Reserve Bank of New Zealand (Matheson 2007). Other applications include Lahiri & Monokroussos (2013) for the U.S., Angelini et al. (2008), Angelini et al. (2011), Bańbura & Modugno (2014) and Bańbura & Rünstler (2011) for the euro area and many others. Nevertheless, Bańbura et al. (2013) note that most of the studies utilize only revised data because of the unavailability of vintage datasets.

Doz et al. (2012) show that large dynamic factor models can also be consistently estimated using maximum likelihood by applying the Expectation-Maximization algorithm. This quasi-maximum likelihood approach, proposed for nowcasting by Bańbura & Modugno (2014), has become increasingly popular in recent studies, see e.g. Bańbura et al. (2013), Rusnák (2013) or Schumacher & Breitung (2008). Due to its efficiency, the flexibility with which it deals with missing data, the possibility to impose restrictions on parameters and its robustness to misspecification, the EM algorithm is also utilized in this thesis.

A completely different approach dealing with the so-called "curse of dimensionality" is the Bayesian structural time-series (BSTS) model, recently proposed by Scott & Varian (2013) for nowcasting economic time-series using Google Search queries. The technique combines a structural time-series model with spike-and-slab regression on predictors and Bayesian model averaging. The aim of this study is to find out how BSTS performs when used for nowcasting GDP, how it compares to the DFM and whether it can be considered a valid alternative to the factor-model-based techniques.

The thesis is structured as follows: Chapter 1 introduces the general framework of the state-space representation of the models. Chapter 2 gives details on the version of the dynamic factor model used as a benchmark. Chapter 3 explains in detail the Bayesian structural time-series model. Finally, Chapter 4 introduces the utilized data, Chapter 5 presents the empirical results from the presented models and Chapter 6 summarizes the findings.

Chapter 1

General Framework

At the beginning, it might be useful to introduce a couple of general concepts which form essential building blocks of both examined models: the state-space representation of the models and the Kalman filtering and smoothing techniques usually applied to models in state-space form.

1.1 State-space models

State-space representation is a useful way of formulating a wide range of time-series models in a unified manner, enabling the application of state-space modelling techniques. For extensive coverage of state-space models see e.g. Durbin & Koopman (2012). Under state-space models, it is assumed that the time development of the system in question depends on unobserved states $\alpha$. The model consists of two equations. The observation equation (1.1) links the observed series to the unobserved states, and the transition (or state) equation (1.2) specifies an autoregressive process for the states.

$$y_t = Z_t \alpha_t + \epsilon_t, \qquad \epsilon_t \sim N(0, H_t) \tag{1.1}$$

$$\alpha_t = T_t \alpha_{t-1} + N_t \eta_t, \qquad \eta_t \sim N(0, V_t) \tag{1.2}$$

Additionally, a distribution for the initial state may be specified as $\alpha_1 \sim N(a_1, P_1)$. Many applications of the model utilize model matrices $Z_t, H_t, T_t, N_t, V_t$ that are constant over time, so that the time subscripts may be dropped. Moreover, $N_t$ typically acts as a selection matrix determining which elements of the state vector are affected by the disturbances.


1.2 Kalman filter

The unobserved states $\alpha_t$ are usually obtained using the Kalman filter (Kalman 1960) and, eventually, smoothed using the Kalman smoother. These tools enable us to take advantage of the compact form of the models provided by their state-space representations. Denoting $y_{1:t} = y_1, \ldots, y_t$, the set of observations up to time $t \in \{1, \ldots, T\}$, the Kalman filter recursively computes a predictive distribution of the individual states $p(\alpha_{t+1}|y_{1:t})$ using the distribution of the previous state $p(\alpha_t|y_{1:t-1})$ and $y_t$. The Kalman state smoother then improves the output of the Kalman filter by utilizing all the available data and produces $p(\alpha_t|y_{1:T})$. The computations are specified below.

1.2.1 Kalman filter recursions

The standard Kalman filtering algorithm (e.g. Jazwinski (1970)) can be summarized as follows; for derivations of the stated formulae see Durbin & Koopman (2012).

$$
\begin{aligned}
v_t &= y_t - Z_t a_t & F_t &= Z_t P_t Z_t' + H_t \\
a_{t|t} &= a_t + P_t Z_t' F_t^{-1} v_t & P_{t|t} &= P_t - P_t Z_t' F_t^{-1} Z_t P_t \\
a_{t+1} &= T_t a_t + K_t v_t & P_{t+1} &= T_t P_t (T_t - K_t Z_t)' + N_t V_t N_t'
\end{aligned}
\tag{1.3}
$$

where

$$a_{t|t} = E[\alpha_t|y_{1:t}], \quad P_{t|t} = V[\alpha_t|y_{1:t}], \quad a_{t+1} = E[\alpha_{t+1}|y_{1:t}], \quad P_{t+1} = V[\alpha_{t+1}|y_{1:t}]$$

and $K_t = T_t P_t Z_t' F_t^{-1}$ is usually referred to as the Kalman gain.
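To make the recursions concrete, the following is a minimal NumPy sketch of (1.3) for time-invariant system matrices; the function and variable names are our own illustration, not part of the thesis.

```python
import numpy as np

def kalman_filter(y, Z, H, T, N, V, a1, P1):
    """Kalman filter recursions (1.3) for time-invariant system matrices."""
    n_obs, m = y.shape[0], len(a1)
    a = np.zeros((n_obs + 1, m)); a[0] = a1          # a_t = E[alpha_t | y_{1:t-1}]
    P = np.zeros((n_obs + 1, m, m)); P[0] = P1       # P_t = V[alpha_t | y_{1:t-1}]
    v = np.zeros((n_obs, y.shape[1]))                # innovations v_t
    F = np.zeros((n_obs, y.shape[1], y.shape[1]))    # innovation variances F_t
    for t in range(n_obs):
        v[t] = y[t] - Z @ a[t]
        F[t] = Z @ P[t] @ Z.T + H
        K = T @ P[t] @ Z.T @ np.linalg.inv(F[t])     # Kalman gain K_t
        a[t + 1] = T @ a[t] + K @ v[t]               # one-step-ahead prediction
        P[t + 1] = T @ P[t] @ (T - K @ Z).T + N @ V @ N.T
    return a, P, v, F

# Example: a local level model, y_t = alpha_t + eps_t, alpha_t = alpha_{t-1} + eta_t
y = np.cumsum(np.random.randn(50, 1), axis=0)
a, P, v, F = kalman_filter(y, Z=np.eye(1), H=np.eye(1), T=np.eye(1),
                           N=np.eye(1), V=np.eye(1),
                           a1=np.zeros(1), P1=1e4 * np.eye(1))
```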

1.2.2 State smoothing

To utilize all the available information, the Kalman smoother proceeds backwards through the series (as opposed to the Kalman filter). Once again following Durbin & Koopman (2012), the smoothing equations are collected below.

$$
\begin{aligned}
r_{t-1} &= Z_t' F_t^{-1} v_t + L_t' r_t & M_{t-1} &= Z_t' F_t^{-1} Z_t + L_t' M_t L_t \\
\hat{\alpha}_t &= a_t + P_t r_{t-1} & PP_t &= P_t - P_t M_{t-1} P_t
\end{aligned}
\tag{1.4}
$$

where $\hat{\alpha}_t = E[\alpha_t|y_{1:T}]$ is the smoothed state vector, $PP_t = V[\hat{\alpha}_t|y_{1:T}]$ is its variance, and $r_t$ is a weighted sum of innovations after time $t$ specified by

$$r_t = Z_{t+1}' F_{t+1}^{-1} v_{t+1} + L_{t+1}' Z_{t+2}' F_{t+2}^{-1} v_{t+2} + \cdots + L_{t+1}' L_{t+2}' \cdots L_{T-1}' Z_T' F_T^{-1} v_T$$

with variance $M_t = V[r_t]$. As there are no innovations available after time $T$, $r_T = M_T = 0$. Finally, $L_t = T_t - K_t Z_t$.

1.2.3 Treating missing values

One of the great advantages of Kalman filtering techniques in our application is their ability to easily cope with missing observations, as shown by Durbin & Koopman (2012). Let $y_t^*$ be the vector of available observations in $y_t$. Then we can define a matrix $W_t$ such that $y_t^* = W_t y_t$. When we apply the same transformation to $Z_t$, $\epsilon_t$ and $H_t$ in a straightforward manner, we arrive at a modified observation equation:

$$y_t^* = Z_t^* \alpha_t + \epsilon_t^*, \qquad \epsilon_t^* \sim N(0, H_t^*) \tag{1.5}$$

where $Z_t^* = W_t Z_t$, $\epsilon_t^* = W_t \epsilon_t$ and $H_t^* = W_t H_t W_t'$. Utilization of the modified parameters in the Kalman filter and smoother then does not affect the validity of the results.
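As a small illustration of (1.5) (our own sketch), the selection matrix $W_t$ simply keeps the rows of the system corresponding to the observed entries:

```python
import numpy as np

def reduce_system(y_t, Z, H):
    """Build y*_t, Z*_t and H*_t of (1.5) by dropping missing (NaN) entries of y_t."""
    avail = ~np.isnan(y_t)
    W = np.eye(len(y_t))[avail]       # selection matrix W_t
    return y_t[avail], W @ Z, W @ H @ W.T
```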


Chapter 2

Dynamic Factor Model

The main idea behind dynamic factor models follows from the assumption that the bulk of co-movements among the numerous macroeconomic variables can be captured by a handful of factors, which can be used to form a parsimonious model. Based on Giannone et al. (2005), this assumption seems to be realistic in the United States. Sargent et al. (1977) show that already two factors are capable of capturing a major part of the variance of important U.S. macroeconomic variables. The dynamic factor model can be specified as follows:

$$x_t = \mu + \Lambda f_t + e_t, \qquad e_t \sim N(0, R) \tag{2.1}$$

where $x_t$ is an $n$-dimensional vector of variables transformed to ensure stationarity and $f_t$ is an $r \times 1$ vector of unobserved factors. The $n \times r$ matrix of parameters $\Lambda$ is usually referred to as the matrix of factor loadings and $\Lambda_i f_t$ is sometimes called the common component of the $i$-th series. The idiosyncratic term $e_t = [e_{1,t}, \ldots, e_{n,t}]'$ is assumed to be normally distributed and uncorrelated with the factors at all leads and lags, i.e. $E[e_t f_{t-l}'] = 0$ for all $l$. Moreover, under a so-called exact factor model, $e_t$ is assumed to be cross-sectionally uncorrelated, i.e. $E[e_{i,t} e_{j,s}] = 0$ for all $s$, $i \neq j$.

Nevertheless, Stock & Watson (2002a) and Doz et al. (2012) argue that the last assumption on the cross-correlation of $e_t$ may be too restrictive, especially in the case of large models. Chamberlain & Rothschild (1984) refer to models with weakly cross-correlated idiosyncratic terms as approximate factor models. However, Doz et al. (2011) show that for $T, n \to \infty$ a quasi-maximum likelihood estimation based on a misspecified exact factor model is valid also for the approximate factor model. Hence, the estimators used in the EM algorithm are asymptotically valid in either case.

Equation 2.1 forms a so-called static model, which does not take into account any dynamics in the factors. The static factor model is considered, inter alia, in the studies of Rubin & Thayer (1982) and Stock & Watson (2002b), who develop their own EM algorithms. However, as noted by Bańbura & Modugno (2014), taking the likely dynamics of the factors into account may be of great importance, especially in applications concerning forecasting or missing data, as in our case. Therefore, we allow the factors $f_t$ to follow a VAR process of order $p$ and arrive at the model considered by the EM algorithm proposed by Bańbura & Modugno (2014):

$$f_t = A_1 f_{t-1} + A_2 f_{t-2} + \cdots + A_p f_{t-p} + u_t, \qquad u_t \sim N(0, Q) \tag{2.2}$$

where $A_1, \ldots, A_p$ are $r \times r$ matrices of autoregressive coefficients. In our application we assume a VAR(1) process following Bańbura & Modugno (2014) and denote the matrix of coefficients $A$.

As noted by Stock & Watson (2002b), for the purposes of forecasting it is also useful to model the idiosyncratic dynamics. Bańbura & Modugno (2014) point out that such an approach enables us to estimate the error component and improve the efficiency of the factor estimates in real-time applications with missing data at the end of the sample. They propose to model the error terms as an AR(1) process using the following decomposition:

$$e_{i,t} = \bar{e}_{i,t} + \xi_{i,t}, \qquad \xi_{i,t} \sim N(0, \kappa) \tag{2.3}$$

$$\bar{e}_{i,t} = \alpha_i \bar{e}_{i,t-1} + w_{i,t}, \qquad w_{i,t} \sim N(0, \sigma_i^2) \tag{2.4}$$

Both $\xi_t = [\xi_{1,t}, \ldots, \xi_{n,t}]'$ and $\bar{e}_t = [\bar{e}_{1,t}, \ldots, \bar{e}_{n,t}]'$ are assumed to be cross-sectionally uncorrelated and $\kappa$ is a very small number. $\bar{e}_{i,t}$ is then moved to the state vector along with the factors, which results in the following state-space representation:

$$x_t = \mu + \bar{\Lambda} \bar{f}_t + \xi_t, \qquad \xi_t \sim N(0, \bar{R}) \tag{2.5}$$

$$\bar{f}_t = \bar{A} \bar{f}_{t-1} + \bar{u}_t, \qquad \bar{u}_t \sim N(0, \bar{Q})$$

where

$$\bar{f}_t = \begin{pmatrix} f_t \\ \bar{e}_t \end{pmatrix}, \quad \bar{u}_t = \begin{pmatrix} u_t \\ w_t \end{pmatrix}, \quad \bar{\Lambda} = \begin{pmatrix} \Lambda & I_n \end{pmatrix}, \quad \bar{R} = \operatorname{diag}(\kappa)$$

$$\bar{A} = \begin{pmatrix} A & 0 \\ 0 & \operatorname{diag}(\alpha_1, \ldots, \alpha_n) \end{pmatrix}, \quad \bar{Q} = \begin{pmatrix} Q & 0 \\ 0 & \operatorname{diag}(\sigma_1^2, \ldots, \sigma_n^2) \end{pmatrix}$$

According to the survey by Stock & Watson (2011), there are basically three generations of estimation approaches to dynamic factor models. The first generation utilized low-dimensional parametric models estimated by Gaussian maximum likelihood and the Kalman filter. This method can yield optimal estimates of $f_t$, but the necessary non-linear optimization significantly reduces the number of series that can be handled, which is a serious drawback because, as pointed out by Bernanke & Boivin (2003), policymakers usually work with a large number of different series. Studies dealing with this generation of factor models include, inter alia, Engle & Watson (1981; 1983); Stock & Watson (1989); Sargent (1989); Quah & Sargent (1993); Mariano & Murasawa (2003); Camacho & Perez-Quiros (2010).

The second generation relies on non-parametric techniques like principal components and related methods. Results from these models suggest that large cross-sections can be successfully summarized by a few common factors, and that these can be estimated precisely enough to be treated as data in subsequent regressions (see e.g. Stock & Watson (2002a); Connor & Korajczyk (1986); Chamberlain & Rothschild (1984); Bai (2003); Bai & Ng (2006)). However, principal components are unable to deal with data irregularities.

The third generation then combines the previous approaches and applies ML estimation of the model parameters using the non-parametric estimates of the common factors. This solves the dimensionality problem of the first-generation approaches, while the utilization of state-space methods offers solutions to the problem of missing observations (see e.g. Rünstler et al. (2009); Giannone et al. (2008); Bańbura & Rünstler (2011)). The asymptotic properties of the resulting estimators are provided by Doz et al. (2011).


The most recent applications of dynamic factor models then make use of the EM algorithm due to its efficiency, its ability to handle arbitrary patterns of missing data, the possibility to conveniently impose restrictions on parameters and its robustness to misspecification. This automated approach has been used e.g. in Bańbura et al. (2013); Bańbura & Modugno (2014); Schumacher & Breitung (2008); Rusnák (2013), and the asymptotic theory is addressed by Doz et al. (2012). This most recent approach is also utilized in this thesis.

2.1 Dealing with mixed frequency

One of the obvious difficulties associated with modelling GDP is that it is announced at quarterly frequency, unlike most of the predictors, which are usually available on a monthly basis. We proceed with the solution suggested by the recent papers using the EM algorithm (Bańbura et al. (2013); Bańbura & Modugno (2014); Schumacher & Breitung (2008); Rusnák (2013)).

We follow the convention of assigning the quarterly level of GDP ($GDP_t^Q$) to the last month of the given quarter and define its monthly counterparts ($GDP_t^M$) in the following manner:

$$GDP_t^Q = GDP_t^M + GDP_{t-1}^M + GDP_{t-2}^M, \qquad t = 3, 6, 9, \ldots \tag{2.6}$$

Furthermore, we define

$$Y_t^Q = 100 \times \log(GDP_t^Q), \qquad Y_t^M = 100 \times \log(GDP_t^M) \tag{2.7}$$

Then, we construct the corresponding growth rates

$$y_t^Q = Y_t^Q - Y_{t-3}^Q, \qquad y_t^M = Y_t^M - Y_{t-1}^M \tag{2.8}$$

This results in a partially observed monthly series of GDP growth rates

$$y_t^Q = \begin{cases} Y_t^Q - Y_{t-3}^Q & t = 3, 6, 9, \ldots \\ \text{unobserved} & \text{otherwise} \end{cases} \tag{2.9}$$

The quarterly growth rates can then be expressed in terms of the monthly ones using the approximation proposed by Mariano & Murasawa (2003):

$$
y_t^Q = Y_t^Q - Y_{t-3}^Q \approx (Y_t^M + Y_{t-1}^M + Y_{t-2}^M) - (Y_{t-3}^M + Y_{t-4}^M + Y_{t-5}^M) = y_t^M + 2y_{t-1}^M + 3y_{t-2}^M + 2y_{t-3}^M + y_{t-4}^M, \qquad t = 3, 6, 9, \ldots \tag{2.10}
$$
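The weights $(1, 2, 3, 2, 1)$ in (2.10) follow from telescoping the monthly differences; a quick numerical check of that identity (our own illustration — note the approximation itself lies in treating the quarterly log-level as the sum of the monthly log-levels, while the weighting identity below is exact):

```python
import numpy as np

rng = np.random.default_rng(0)
Y = 100 * np.log(rng.uniform(90, 110, 24).cumsum())   # toy monthly log-levels Y^M
y = np.diff(Y)                                        # monthly growth rates: y[k] = Y[k+1] - Y[k]

t = 20  # any month with at least five lags available
lhs = (Y[t] + Y[t-1] + Y[t-2]) - (Y[t-3] + Y[t-4] + Y[t-5])
rhs = y[t-1] + 2*y[t-2] + 3*y[t-3] + 2*y[t-4] + y[t-5]  # weights (1, 2, 3, 2, 1)
assert np.isclose(lhs, rhs)
```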

Finally, we assume that the newly created monthly series admits the same factor model as the other monthly variables. The above approximation then implies the following form of the observation equation of the final specification of our state-space model:

$$
\underbrace{\begin{pmatrix} \tilde{x}^M_t \\ \tilde{y}^Q_t \end{pmatrix}}_{\tilde{x}_t} =
\underbrace{\begin{pmatrix}
\Lambda^M & 0 & 0 & 0 & 0 & I_{n-1} & 0 & 0 & 0 & 0 & 0 \\
\Lambda^Q & 2\Lambda^Q & 3\Lambda^Q & 2\Lambda^Q & \Lambda^Q & 0 & 1 & 2 & 3 & 2 & 1
\end{pmatrix}}_{\tilde{\Lambda}}
\underbrace{\begin{pmatrix} f_t \\ f_{t-1} \\ f_{t-2} \\ f_{t-3} \\ f_{t-4} \\ \bar{e}^M_t \\ \bar{e}^Q_t \\ \bar{e}^Q_{t-1} \\ \bar{e}^Q_{t-2} \\ \bar{e}^Q_{t-3} \\ \bar{e}^Q_{t-4} \end{pmatrix}}_{\tilde{f}_t}
+ \underbrace{\begin{pmatrix} \xi^M_t \\ \xi^Q_t \end{pmatrix}}_{\tilde{\xi}_t}
\tag{2.11}
$$

The intercept present in (2.5) is taken care of by standardizing the observations as in Bańbura & Modugno (2014). Hence, $\tilde{x}^M_t$ denotes the standardized monthly variables, forming the first $n-1$ series in our dataset, and $\tilde{y}^Q_t$ is the standardized GDP series obtained from quarterly data. Furthermore, $\Lambda^M$ and $\Lambda^Q$ are the factor loadings for the monthly and quarterly variables, respectively. $\bar{e}^M_t$, $\bar{e}^Q_t$, $\xi^M_t$ and $\xi^Q_t$ are the terms resulting from the decomposition (2.3) of the idiosyncratic terms for the monthly and quarterly variables, respectively. Hence, $\xi^M_t$ and $\xi^Q_t$ have a small fixed variance.

The transition matrix, taking into account the structure of the state vector implied by (2.11) and the autoregressive process of the error terms as in (2.4), is specified below. $\alpha^M$ is a matrix with the $n-1$ autoregressive coefficients corresponding to the monthly variables on the diagonal and $\alpha^Q$ is the autoregressive parameter for the GDP series. $w^M_t$ and $w^Q_t$ are normally distributed idiosyncratic terms with variances $\operatorname{diag}(\sigma^M_1, \ldots, \sigma^M_{n-1})$ and $\sigma^Q$, respectively.

$$
\underbrace{\begin{pmatrix} f_t \\ f_{t-1} \\ f_{t-2} \\ f_{t-3} \\ f_{t-4} \\ \bar{e}^M_t \\ \bar{e}^Q_t \\ \bar{e}^Q_{t-1} \\ \bar{e}^Q_{t-2} \\ \bar{e}^Q_{t-3} \\ \bar{e}^Q_{t-4} \end{pmatrix}}_{\tilde{f}_t}
=
\underbrace{\begin{pmatrix}
A & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
I_r & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & I_r & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & I_r & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & I_r & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \alpha^M & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \alpha^Q & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}}_{\tilde{A}}
\underbrace{\begin{pmatrix} f_{t-1} \\ f_{t-2} \\ f_{t-3} \\ f_{t-4} \\ f_{t-5} \\ \bar{e}^M_{t-1} \\ \bar{e}^Q_{t-1} \\ \bar{e}^Q_{t-2} \\ \bar{e}^Q_{t-3} \\ \bar{e}^Q_{t-4} \\ \bar{e}^Q_{t-5} \end{pmatrix}}_{\tilde{f}_{t-1}}
+
\underbrace{\begin{pmatrix} u_t \\ 0 \\ 0 \\ 0 \\ 0 \\ w^M_t \\ w^Q_t \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}}_{\tilde{u}_t}
\tag{2.12}
$$

Finally, let us denote the variances of the error terms from the above state-space representation as follows:

$$\tilde{\xi}_t \sim N(0, \tilde{R}), \qquad \tilde{R} = \operatorname{diag}(\kappa)$$

$$\tilde{u}_t \sim N(0, \tilde{Q}), \qquad \tilde{Q} = \begin{pmatrix} Q & \cdots & 0 \\ \vdots & \operatorname{diag}(\sigma^M_1, \ldots, \sigma^M_{n-1}, \sigma^Q) & \vdots \\ 0 & \cdots & 0 \end{pmatrix}$$

2.2 EM algorithm

In this section, we introduce the individual steps of the applied procedure based on the EM algorithm, initially developed by Dempster et al. (1977) as a general iterative method for problems where missing or latent data cause an intractability of the likelihood. The first step is to write the log-likelihood in terms of the missing data and latent factors as if those were known. Following Bańbura & Modugno (2014), based on the model (2.5) and the assumption of idiosyncratic components following an AR(1) process, the joint log-likelihood for the data $\tilde{X}$ and latent factors $\tilde{F} = (\tilde{f}_1, \ldots, \tilde{f}_T)$ is derived below. The parameters are collected in $\theta = (\tilde{\Lambda}, \tilde{A}, \tilde{Q})$, with $\tilde{R} = \operatorname{diag}(\kappa)$ for $\kappa$ very small and $\tilde{\Sigma}$ being the covariance matrix of the initial state $\tilde{f}_1$. We exploit the form used, inter alia, in Engle & Watson (1983), which was shown to hold by Schweppe (1965). Following Ghahramani & Hinton (1996), we can write the conditional probability densities for the data $\tilde{x}_t$ and the states $\tilde{f}_t$:

$$p(\tilde{x}_t | \tilde{f}_t) = \exp\left(-\tfrac{1}{2}(\tilde{x}_t - \tilde{\Lambda}\tilde{f}_t)' \tilde{R}^{-1} (\tilde{x}_t - \tilde{\Lambda}\tilde{f}_t)\right) (2\pi)^{-\frac{n}{2}} |\tilde{R}|^{-\frac{1}{2}}$$

$$p(\tilde{f}_t | \tilde{f}_{t-1}) = \exp\left(-\tfrac{1}{2}(\tilde{f}_t - \tilde{A}\tilde{f}_{t-1})' \tilde{Q}^{-1} (\tilde{f}_t - \tilde{A}\tilde{f}_{t-1})\right) (2\pi)^{-\frac{5k+n+4}{2}} |\tilde{Q}|^{-\frac{1}{2}}$$

Finally, we also assume a Normal density for the initial state $\tilde{f}_1$:

$$p(\tilde{f}_1) = \exp\left(-\tfrac{1}{2}\tilde{f}_1' \tilde{\Sigma}^{-1} \tilde{f}_1\right) (2\pi)^{-\frac{5k+n+4}{2}} |\tilde{\Sigma}|^{-\frac{1}{2}}$$

The joint likelihood then follows from the Markov property:

$$p(\tilde{X}, \tilde{F}) = p(\tilde{f}_1) \prod_{t=2}^{T} p(\tilde{f}_t | \tilde{f}_{t-1}) \prod_{t=1}^{T} p(\tilde{x}_t | \tilde{f}_t) \tag{2.13}$$

Taking logarithms of Equation 2.13 then yields the joint log-likelihood

$$
\begin{aligned}
l(\tilde{X}, \tilde{F}, \theta) = \text{const} &- \frac{1}{2}\log|\tilde{\Sigma}| - \frac{1}{2}\tilde{f}_1'\tilde{\Sigma}^{-1}\tilde{f}_1 \\
&- \frac{T-1}{2}\log|\tilde{Q}| - \frac{1}{2}\operatorname{tr}\left(\tilde{Q}^{-1}\sum_{t=2}^{T}(\tilde{f}_t - \tilde{A}\tilde{f}_{t-1})(\tilde{f}_t - \tilde{A}\tilde{f}_{t-1})'\right) \\
&- \frac{T}{2}\log|\tilde{R}| - \frac{1}{2}\operatorname{tr}\left(\tilde{R}^{-1}\sum_{t=1}^{T}(\tilde{x}_t - \tilde{\Lambda}\tilde{f}_t)(\tilde{x}_t - \tilde{\Lambda}\tilde{f}_t)'\right)
\end{aligned}
\tag{2.14}
$$

A straightforward maximization of the log-likelihood with respect to the individual parameters yields the estimators of the EM algorithm. For derivations, see the appendix of Bańbura & Modugno (2014). The specific expressions for the estimators are given later, as we first state the iterations forming the EM algorithm. Until convergence, it alternates between the Expectation and Maximization steps:

• In the E-step, the expectation of the likelihood is calculated using the parameter estimates from the previous iteration, i.e. we calculate the expected values of the states in our state-space model given by (2.11) and (2.12).

• In the M-step, we maximize the expected likelihood obtained from the E-step to arrive at new parameter estimates.

The two steps are iterated until convergence, defined by the stopping rule proposed by Doz et al. (2012). Let $l(\Omega, \theta(j))$ denote the log-likelihood of the available data $\Omega$ using the parameters from the $j$-th step, and define a variable indicating the magnitude of the increase in the likelihood compared to the previous iteration, i.e.

$$c_j = \frac{l(\Omega, \theta(j)) - l(\Omega, \theta(j-1))}{\frac{1}{2}\left(|l(\Omega, \theta(j))| + |l(\Omega, \theta(j-1))|\right)}$$

The algorithm is stopped after iteration $J$ when $c_J < 10^{-4}$.
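In code, this stopping rule amounts to a one-line check (our own sketch):

```python
def em_converged(loglik_new, loglik_old, tol=1e-4):
    """Stopping rule of Doz et al. (2012): c_j compares successive log-likelihoods."""
    c = (loglik_new - loglik_old) / (0.5 * (abs(loglik_new) + abs(loglik_old)))
    return c < tol
```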

The following subsections describe in detail the initialization of the whole procedure and the techniques used in the two steps.

2.2.1 M-step

In this subsection, we give the expressions for the parameters forming the model given by (2.11) and (2.12). They were derived from the likelihood (2.14); for derivations, see the appendix of Bańbura & Modugno (2014). The estimators are for iteration $(j+1)$ and they make use of the available data $\Omega$ and the parameter estimates from the previous iteration $j$, i.e. $\hat\theta(j)$. Following Bańbura & Modugno (2014), we introduce selection matrices $W^M_t$ and $W^Q_t$ similar to the one used to obtain (1.5), i.e. with zeros corresponding to missing values. As our only quarterly variable is GDP, $W^Q_t$ is a scalar, i.e. either 0 or 1.


The matrix of loadings for the monthly variables is given by:

$$
\operatorname{vec}(\Lambda^M(j+1)) = \left(\sum_{t=1}^{T} E_{\hat\theta(j)}[f_t f_t'|\Omega] \otimes W^M_t\right)^{-1} \operatorname{vec}\left(\sum_{t=1}^{T} W^M_t \tilde{x}^M_t E_{\hat\theta(j)}[f_t'|\Omega] - W^M_t E_{\hat\theta(j)}[\bar{e}^M_t f_t'|\Omega]\right) \tag{2.15}
$$

Note that we take into account that the residuals are also modelled and are part of the state vector. Hence, the loadings estimator basically treats $\tilde{x}^M_t - \bar{e}^M_t$ as the dependent variable. The sign of the last term in the expression is negative, unlike in Bańbura & Modugno (2014). To show the correctness of the above estimator we provide its derivation in the appendix.

The approximation (2.10) and the consequent observation equation (2.11) imply that $\tilde{y}^Q_t$ depends also on the first 4 lags of the factors and residuals. Therefore, we define $\tilde{f}^Q_t = (f_t', f_{t-1}', f_{t-2}', f_{t-3}', f_{t-4}', \bar{e}^Q_t, \bar{e}^Q_{t-1}, \bar{e}^Q_{t-2}, \bar{e}^Q_{t-3}, \bar{e}^Q_{t-4})'$. Consequently, the unrestricted $\tilde{\Lambda}^{ur}_Q$ can be written as

$$
\operatorname{vec}(\tilde{\Lambda}^{ur}_Q(j+1)) = \left(\sum_{t=1}^{T} E_{\hat\theta(j)}[\tilde{f}^Q_t \tilde{f}^{Q\prime}_t|\Omega]\, W^Q_t\right)^{-1} \sum_{t=1}^{T} W^Q_t\, \tilde{y}^Q_t\, E_{\hat\theta(j)}[\tilde{f}^Q_t|\Omega] \tag{2.16}
$$

To impose the desired structure of the parameters

$$\tilde{\Lambda}_Q = \begin{pmatrix} \Lambda^Q & 2\Lambda^Q & 3\Lambda^Q & 2\Lambda^Q & \Lambda^Q & 1 & 2 & 3 & 2 & 1 \end{pmatrix}$$

corresponding to the approximation (2.10), we construct a matrix $H_\Lambda$ and a vector $\kappa_\Lambda$ applying the restrictions such that $H_\Lambda \operatorname{vec}(\tilde{\Lambda}_Q) = \kappa_\Lambda$, with

$$
H_\Lambda = \begin{pmatrix}
I_r & -\frac{1}{2}I_r & 0 & 0 & 0 & 0 \\
I_r & 0 & -\frac{1}{3}I_r & 0 & 0 & 0 \\
I_r & 0 & 0 & -\frac{1}{2}I_r & 0 & 0 \\
I_r & 0 & 0 & 0 & -I_r & 0 \\
0 & 0 & 0 & 0 & 0 & I_5
\end{pmatrix}
\qquad
\kappa_\Lambda = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \\ 2 \\ 3 \\ 2 \\ 1 \end{pmatrix}
\tag{2.17}
$$

This restricted estimator also has to incorporate the effect of residuals implied by (2.11), unlike the estimators for the monthly loadings. Moreover, the alternative estimator given by Bańbura & Modugno (2014) seems to contain typos. Therefore, we consult Wu et al. (1996) and Bork (2010) to obtain a correct expression for the estimator; a convincing derivation is provided by Bork (2010). Denoting $C = \sum_{t=1}^{T} E_{\hat\theta(j)}[\tilde{f}^Q_t \tilde{f}^{Q\prime}_t|\Omega]\, W^Q_t$, we can write the restricted estimator of the non-zero elements of the observation matrix for $\tilde{y}^Q_t$ as follows:

$$
\operatorname{vec}(\tilde{\Lambda}_Q(j+1)) = \operatorname{vec}(\tilde{\Lambda}^{ur}_Q(j+1)) + C^{-1} H_\Lambda' \left(H_\Lambda C^{-1} H_\Lambda'\right)^{-1} \left(\kappa_\Lambda - H_\Lambda \operatorname{vec}(\tilde{\Lambda}^{ur}_Q(j+1))\right) \tag{2.18}
$$

The variance terms present in the expressions provided by Wu et al. (1996) and Bork (2010) cancel out in our case, as we model only a single quarterly variable and, hence, the variance is a scalar.

Furthermore, we specify the estimator for the coefficient matrix $A$ from the VAR of the factors (2.2):

$$
A(j+1) = \left(\sum_{t=2}^{T} E_{\hat\theta(j)}[f_t f_{t-1}'|\Omega]\right) \left(\sum_{t=2}^{T} E_{\hat\theta(j)}[f_{t-1} f_{t-1}'|\Omega]\right)^{-1} \tag{2.19}
$$

The covariance matrix from the factor model can be estimated as:

$$
Q(j+1) = \frac{1}{T-1}\left(\sum_{t=2}^{T} E_{\hat\theta(j)}[f_t f_t'|\Omega] - A(j+1) \sum_{t=2}^{T} E_{\hat\theta(j)}[f_{t-1} f_t'|\Omega]\right) \tag{2.20}
$$

Regarding the AR representation of the idiosyncratic terms (2.4), we can write the estimators of the autoregressive parameters and the variances of the error terms, respectively, for series $i = 1, \ldots, n$ as follows:

$$
\alpha_i(j+1) = \left(\sum_{t=2}^{T} E_{\hat\theta(j)}[\bar{e}_{i,t}\bar{e}_{i,t-1}|\Omega]\right) \left(\sum_{t=2}^{T} E_{\hat\theta(j)}[\bar{e}_{i,t-1}^2|\Omega]\right)^{-1} \tag{2.21}
$$

$$
\sigma_i^2(j+1) = \frac{1}{T-1}\left(\sum_{t=2}^{T} E_{\hat\theta(j)}[\bar{e}_{i,t}^2|\Omega] - \alpha_i(j+1) \sum_{t=2}^{T} E_{\hat\theta(j)}[\bar{e}_{i,t-1}\bar{e}_{i,t}|\Omega]\right) \tag{2.22}
$$

All variance estimators are obtained by maximizing the log-likelihood with $\theta = (\tilde\Lambda(j+1), \tilde{A}(j+1), \tilde{Q})$, i.e. we first estimate the parameter matrices and only after that the variances. Finally, the variance of the error terms from the observation equation (2.11) is specified by setting $\kappa = 0.0001$, following a hint from Marta Bańbura. The necessary conditional expectations are provided by the Kalman smoother in the E-step.

2.2.2 Initialization and the E-step

The initialization of the algorithm is based on the two-step approach proposed by Giannone et al. (2008). Following Rusnák (2013) and Giannone et al. (2008), we select the balanced part of the dataset, perform the first E-step and calculate our first estimates of the latent factors using principal components. Bańbura & Modugno (2014) perform the initialization on the dataset with missing values replaced by draws from the standard normal distribution. Nevertheless, they apply the method to a general pattern of missing data, whereas we have a balanced block of data available. Therefore, we base our analysis only on the available data and avoid any exogenous effects.

Principal Components

The principal component analysis is applied to the model of the form (2.1), only with standardized data $\tilde{X}_t$, hence with no intercept. Our goal is to arrive at estimates $\hat{f}_t$ of the $r$ latent factors $f_t$. Stock & Watson (2011) motivate the use of principal components by showing that in a weighted average of $\tilde{X}_t$ the idiosyncratic noise converges to zero and only linear combinations of the factors remain. For that, only relatively mild conditions are needed. In particular, these are:

$$\frac{1}{n}\Lambda'\Lambda \to D_\Lambda, \text{ where } D_\Lambda \text{ is an } r \times r \text{ matrix with full rank} \tag{2.23}$$

$$\operatorname{maxeval}(\Sigma_e) \le c < \infty\ \ \forall n, \text{ where } \operatorname{maxeval}(\Sigma_e) \text{ is the largest eigenvalue of } \Sigma_e = E[e_t e_t'] \tag{2.24}$$

Condition (2.23) ensures that the factors affect most of the series in $\tilde{X}_t$ and condition (2.24) limits the cross-correlation among the error terms.

Hence, we attempt to find the factors as weighted averages of the data

$$\hat{f}_t\left(\tfrac{1}{n}W\right) = \frac{1}{n}W'\tilde{X}_t$$

where $W$ is a weighting matrix normalized as $\frac{1}{n}W'W = I_r$. If $\frac{1}{n}W'\Lambda \to H$, where $H$ is an $r \times r$ matrix with full rank, and both conditions (2.23) and (2.24) hold, we can show the consistency of $\hat{f}_t\left(\tfrac{1}{n}W\right)$ as follows:

$$\hat{f}_t\left(\tfrac{1}{n}W\right) = \frac{1}{n}W'(\Lambda f_t + e_t) \xrightarrow{p} H f_t \tag{2.25}$$

because $\frac{1}{n}W'e_t \xrightarrow{p} 0$ by the law of large numbers. As $H$ has full rank, $\hat{f}_t\left(\tfrac{1}{n}W\right)$ consistently estimates the space spanned by the factors.

The principal components method offers a way to find $W$ using an approach equivalent to least squares estimation:

$$\min_{f_1,\ldots,f_T,\Lambda} \frac{1}{Tn}\sum_{t=1}^{T}(\tilde{X}_t - \Lambda f_t)'(\tilde{X}_t - \Lambda f_t) \tag{2.26}$$

We can concentrate $f_t$ out by defining its least squares estimate given $\Lambda$, i.e.

$$\hat{f}_t = (\Lambda'\Lambda)^{-1}\Lambda'\tilde{X}_t \tag{2.27}$$

The objective function can then be rewritten as

$$\min_\Lambda \frac{1}{T}\sum_{t=1}^{T} \tilde{X}_t'\left(I_n - \Lambda(\Lambda'\Lambda)^{-1}\Lambda'\right)\tilde{X}_t \tag{2.28}$$

This problem is equivalent to

$$\max_\Lambda \operatorname{tr}\left((\Lambda'\Lambda)^{-\frac{1}{2}}\Lambda'\left(\frac{1}{T}\sum_{t=1}^{T}\tilde{X}_t\tilde{X}_t'\right)\Lambda(\Lambda'\Lambda)^{-\frac{1}{2}}\right) \tag{2.29}$$

which can be further simplified to

$$\max_\Lambda \operatorname{tr}\left(\Lambda'\left(\frac{1}{T}\sum_{t=1}^{T}\tilde{X}_t\tilde{X}_t'\right)\Lambda\right) \tag{2.30}$$

With $\Lambda$ subject to the normalization $\frac{1}{n}\Lambda'\Lambda = I_r$, we can find $\hat\Lambda$ as the scaled eigenvectors of $\hat\Sigma_{XX} = \frac{1}{T}\sum_{t=1}^{T}\tilde{X}_t\tilde{X}_t'$ corresponding to its $r$ largest eigenvalues. Consequently, the first $r$ principal components of $\tilde{X}_t$ are given by $\hat{f}_t = \frac{1}{n}\hat\Lambda'\tilde{X}_t$, and these are our initial estimates of the factors, i.e. the first E-step is made. The following E-steps then use parameter estimates from the previous M-step and factor estimates for $t = 1$ as inputs for the Kalman filter and smoother, which yield improved estimates of the factors.
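A compact sketch of this principal-components initialization (our own illustration; `X` stands for the standardized balanced panel with rows indexed by time):

```python
import numpy as np

def pc_factors(X, r):
    """First r principal-component factor estimates with Lambda'Lambda / n = I_r."""
    T, n = X.shape
    Sigma_XX = X.T @ X / T                       # (1/T) sum of X_t X_t'
    eigval, eigvec = np.linalg.eigh(Sigma_XX)    # eigenvalues in ascending order
    Lam = np.sqrt(n) * eigvec[:, ::-1][:, :r]    # scaled eigenvectors of the r largest
    F = X @ Lam / n                              # f_hat_t = (1/n) Lambda' X_t
    return F, Lam
```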

Number of Factors

One issue related to the principal components analysis described above still needs to be addressed. As we can estimate as many factors as there are variables, or use as little as one single factor, we have to decide on the number of factors $r$ used in the estimation. Researchers apply quite a wide range of values. For instance, Bernanke et al. (2005), Hallin & Liška (2007) and Bai & Ng (2007) use 4 factors, Giannone et al. (2002) choose two factors, Stock & Watson (2005b) opt for 7 and 9 factors, whereas Bork (2009) and Houssa et al. (2008) utilize 8 and 9 factors, respectively. A formal approach to estimating the number of factors has been developed by Bai & Ng (2002). They specify two possible information criteria to be minimized with respect to the estimated number of factors $k$:

$$PC(k) = V_k + k\, g(n, T) \tag{2.31}$$

$$IC(k) = \log(V_k) + k\, g(n, T) \tag{2.32}$$

where

$$V_k = \frac{1}{Tn}\sum_{t=1}^{T}(\tilde{X}_t - \Lambda f_t)'(\tilde{X}_t - \Lambda f_t)$$

is the criterion minimized also during the estimation and $g(n, T)$ is a term penalizing large values of $k$, given by

$$g(n, T) = \log(\min(n, T))\,\frac{n + T}{nT}$$

Bai & Ng (2002) also show that minimization of both criteria leads to a consistent estimate of the number of factors, unlike older methods based on AIC or BIC.
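A sketch of selecting the number of factors with the criterion (2.32), reusing `pc_factors` from the snippet above (our own illustration):

```python
import numpy as np

def bai_ng_ic(X, k_max=10):
    """Number of factors minimizing IC(k) = log(V_k) + k g(n, T) of Bai & Ng (2002)."""
    T, n = X.shape
    g = np.log(min(n, T)) * (n + T) / (n * T)    # penalty term g(n, T)
    ics = []
    for k in range(1, k_max + 1):
        F, Lam = pc_factors(X, k)
        V_k = ((X - F @ Lam.T) ** 2).sum() / (T * n)
        ics.append(np.log(V_k) + k * g)
    return int(np.argmin(ics)) + 1
```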


E-step

The factors obtained using principal components are used to estimate the initial parameters of the state-space model and to provide the initial state for the Kalman filter. The E-step basically calculates the improved expectation of the likelihood (2.14) and provides the conditional expectations essential for the calculation of the M-step estimators. We illustrate how this is done on the example of one conditional expectation; all the others are obtained analogously or are subsets of this one. We start with a basic identity for the expectation of a product:

$$E_{\hat\theta(j)}[\tilde{f}_t \tilde{f}_t'|\Omega] = E_{\hat\theta(j)}[\tilde{f}_t|\Omega]\,E_{\hat\theta(j)}[\tilde{f}_t'|\Omega] + V_{\hat\theta(j)}[\tilde{f}_t|\Omega]$$

All the necessary terms are provided by the Kalman smoother (1.4), denoted $\hat\alpha_t$ and $PP_t$. Nevertheless, we need an additional recursion for the moments involving the covariance of $\tilde{f}_t$ and $\tilde{f}_{t-1}$, i.e. for the estimates of the parameters of the transition equation. This is taken from Wu et al. (1996) and is given by:

$$P_{t,t-1} = (I - P_t M_{t-1}) L_{t-1} P_t \tag{2.33}$$

The recursions (1.4) together with (2.33) were proposed by De Jong (1989), and they should provide more computationally efficient calculations compared to the version of the smoother used earlier by Shumway & Stoffer (1982). With all the necessary moments available, the estimators in the E-step can be calculated easily.


Chapter 3

Bayesian Structural Time-Series

The alternative approach to nowcasting economic time-series using many predictors has recently been proposed by Scott & Varian (2013; 2014), who use Google Search queries to nowcast economic variables for which the data are published on a less timely basis. The dimensionality problem is not solved by capturing the variance in the predictors using a few factors, as in the case of the dynamic factor model, but by variable selection and then averaging the outcomes of numerous parsimonious models. The method is referred to as Bayesian Structural Time-Series (BSTS) and it combines three techniques commonly found in the literature:

• A structural model for trend and seasonality estimated using the Kalman filter

• Spike-and-slab regression for variable selection

• Bayesian model averaging

Kalman filtering techniques as well as other components of BSTS have useful Bayesian interpretations, so they can form an integrated system.

3.1 Structural Time-Series model

To model the dynamics of the dependent variable we utilize a structural time-series model. In particular, following Scott & Varian (2013; 2014), we utilize a so-called local linear trend model, which "whitens" the series by specifying random-walk processes for the level and the trend of the series. The model is then augmented by a regression component which attempts to explain the remaining variation in the dependent variable using the available predictors. Hence, the model is given by the following expressions:

• Observation equation: $y^M_{lvl,t} = \mu_t + m_t + \varepsilon_t$ = level + regression

• State 1: $\mu_t = \mu_{t-1} + \gamma_{t-1} + \nu_t$ = random walk + trend

• State 2: $\gamma_t = \gamma_{t-1} + \zeta_t$ = random walk for trend

• State 3: $m_t = \beta' x^M_t$ = regression

$y^M_{lvl,t}$ denotes the dependent variable, in our case unobserved monthly GDP; $\mu_t$ and $\gamma_t$ are the level and the trend, respectively, and $\varepsilon_t$, $\nu_t$ and $\zeta_t$ are idiosyncratic error terms.

The model can also easily be written in a familiar state-space form as below. Scott & Varian (2013; 2014) propose adding a regression component by extending the state vector with a constant 1 and adding the parameters to the observation matrix, as such an augmentation does not significantly affect the computational efficiency of the Kalman filter. The constant in the state vector then implies that the third line of the transition equation is simply $1 = 1$.

$$
y^M_{lvl,t} = \underbrace{\begin{pmatrix} 1 & 0 & \beta' x^M_t \end{pmatrix}}_{Z_t} \underbrace{\begin{pmatrix} \mu_t \\ \gamma_t \\ 1 \end{pmatrix}}_{\alpha_t} + \varepsilon_t \tag{3.1}
$$

$$
\underbrace{\begin{pmatrix} \mu_t \\ \gamma_t \\ 1 \end{pmatrix}}_{\alpha_t} = \underbrace{\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}}_{T_t} \underbrace{\begin{pmatrix} \mu_{t-1} \\ \gamma_{t-1} \\ 1 \end{pmatrix}}_{\alpha_{t-1}} + \underbrace{\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}}_{N} \underbrace{\begin{pmatrix} \nu_t \\ \zeta_t \\ 0 \end{pmatrix}}_{\vartheta_t} \tag{3.2}
$$
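For one time step, assembling the matrices of (3.1)-(3.2) is straightforward (our own sketch):

```python
import numpy as np

def bsts_system(beta, x_t):
    """System matrices of the local linear trend plus regression model (3.1)-(3.2)."""
    Z_t = np.array([1.0, 0.0, beta @ x_t])   # level enters; regression via constant state
    T_t = np.array([[1.0, 1.0, 0.0],         # mu_t = mu_{t-1} + gamma_{t-1} + nu_t
                    [0.0, 1.0, 0.0],         # gamma_t = gamma_{t-1} + zeta_t
                    [0.0, 0.0, 1.0]])        # constant state: 1 = 1
    N = np.eye(3)                            # third disturbance is identically zero
    return Z_t, T_t, N
```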

3.1.1 Dealing with mixed frequency

We follow the convention of assigning the quarterly value of GDP to the last month of the quarter and create a partially observed monthly series of GDP. Hence, we are dealing with a temporal aggregation problem. Unlike in the case of the dynamic factor model, we work with the levels of GDP and not with growth rates. One of the solutions to the temporal aggregation problem was proposed by Harvey (1990). The quarterly level of GDP can be decomposed as a sum of its monthly counterparts, as shown by (2.6). Following a hint from Steve Scott, we construct a new variable, sometimes referred to as the Harvey cumulator, as follows:

$$y^C_t = \psi_t y^C_{t-1} + y^M_{lvl,t} \tag{3.3}$$

where

$$\psi_t = \begin{cases} 0 & \text{for } t = \delta(\tau - 1) + 1, \quad \tau = 1, \ldots, T/\delta \\ 1 & \text{otherwise} \end{cases}$$

and $y^C_t$ is the cumulative amount produced in the given quarter up to and including month $t$, $\tau$ specifies a quarter and $\delta$ indicates the number of fine intervals in one coarse interval, i.e. $\delta = 3$. Hence, $\psi_t$ is zero for the first month of the quarter and equal to one otherwise. Consequently, in the first month of the quarter the cumulator variable is equal to the monthly value, for the second month it equals the sum of the previous month and the current month, and for the third and final month of the given quarter it returns the aggregated quarterly value, i.e.

$$y^C_{3l+1} = y^M_{lvl,3l+1}$$
$$y^C_{3l+2} = y^M_{lvl,3l+1} + y^M_{lvl,3l+2} \qquad l = 0, 1, \ldots, T/\delta - 1$$
$$y^C_{3l+3} = y^M_{lvl,3l+1} + y^M_{lvl,3l+2} + y^M_{lvl,3l+3}$$
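The cumulator is easy to construct recursively (our own sketch; with $\delta = 3$ every third element returns the quarterly aggregate):

```python
import numpy as np

def harvey_cumulator(y_monthly, delta=3):
    """Cumulator (3.3): psi_t = 0 in the first month of each quarter, 1 otherwise."""
    y_c = np.zeros(len(y_monthly))
    for t in range(len(y_monthly)):
        psi = 0.0 if t % delta == 0 else 1.0
        y_c[t] = psi * (y_c[t - 1] if t > 0 else 0.0) + y_monthly[t]
    return y_c
```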

Then, we augment the state vector by the cumulator variable $y^C_t$ defined by (3.3). As we can see above, it consists of the unobserved monthly GDP levels $y^M_{lvl,t}$ specified by (3.1). These can be linked to the initial states as follows:

$$y^M_{lvl,t} = Z_t\underbrace{(T_t \alpha_{t-1} + N \vartheta_t)}_{\alpha_t} + \varepsilon_t \tag{3.4}$$

Additionally, we use $y^C_t$ as our new dependent variable. It follows that the observation equation of the new state-space representation of the model for $y^C_t$ is given by:

$$
y^C_t = \underbrace{\begin{pmatrix} 0 & 0 & 0 & 1 \end{pmatrix}}_{Z^C_t} \underbrace{\begin{pmatrix} \mu_t \\ \gamma_t \\ 1 \\ y^C_t \end{pmatrix}}_{\alpha^C_t} \tag{3.5}
$$

The transition equation for the augmented state vector $\alpha^C_t$ then follows from (3.1), (3.2), (3.3) and (3.4) and can be written as follows:

$$
\underbrace{\begin{pmatrix} \mu_t \\ \gamma_t \\ 1 \\ y^C_t \end{pmatrix}}_{\alpha^C_t} =
\underbrace{\begin{pmatrix} T_t & 0 \\ Z_t T_t & \psi_t \end{pmatrix}}_{T^C_t}
\underbrace{\begin{pmatrix} \mu_{t-1} \\ \gamma_{t-1} \\ 1 \\ y^C_{t-1} \end{pmatrix}}_{\alpha^C_{t-1}}
+ \underbrace{\begin{pmatrix} \nu_t \\ \zeta_t \\ 0 \\ Z_t \vartheta_t + \varepsilon_t \end{pmatrix}}_{\vartheta^C_t} \tag{3.6}
$$

Hence, plugging in the expressions for $Z_t$ and $T_t$, the observation and transition matrices from the state-space representation of the monthly model, we arrive at:

$$
\underbrace{\begin{pmatrix} \mu_t \\ \gamma_t \\ 1 \\ y^C_t \end{pmatrix}}_{\alpha^C_t} =
\underbrace{\begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 1 & \beta' x^M_t & \psi_t \end{pmatrix}}_{T^C_t}
\underbrace{\begin{pmatrix} \mu_{t-1} \\ \gamma_{t-1} \\ 1 \\ y^C_{t-1} \end{pmatrix}}_{\alpha^C_{t-1}}
+ \underbrace{\begin{pmatrix} \nu_t \\ \zeta_t \\ 0 \\ \nu_t + \varepsilon_t \end{pmatrix}}_{\vartheta^C_t} \tag{3.7}
$$

3.1.2 Prior distributions of initial values

To initialize the Kalman filter (see section 1.2), we need the model matrices $Z^C_t$ and $T^C_t$, an initial value and variance for the initial state $\alpha^C_1$, and the variances of the error terms from both equations. We can see that our observation equation (3.5) does not have an error term; therefore, its variance is zero. Furthermore, the elements of $Z^C_t$ are known except for the regression component, which will be addressed in the following section. Hence, we need to specify prior distributions for the mean and variance of $\alpha^C_1$ and for the variance of $\vartheta^C_t$. Following Scott & Varian (2013; 2014), we specify Normal priors for means and inverse Gamma priors for variances, which yield non-negative draws and can be made very flat with the right choice of input parameters. These choices are common in the literature (Fahrmeir et al. 2007) for Gaussian likelihoods, as they enable us to form conjugate priors (see Raïffa & Schlaifer (1961) and/or Wetherill (1961)). Conjugacy ensures that both prior and posterior distributions are from the same family. In our case, the conjugate distribution is from the normal-inverse-gamma family, i.e. the joint prior distribution for a mean $\mu$ and a variance $\sigma^2$ factors into a Normal density for $\mu$ conditional on $\sigma^2$ and an inverse-gamma density for $\sigma^2$.

The initial level $\mu_1$ and $y^C_1$ both correspond to the level of GDP in the first month. Consequently, they are set equal to each other and drawn from a prior distribution centered around the first quarterly observation of GDP divided by 3, with variance corresponding to the variance of the GDP series divided by 9, i.e.

$$\mu_1 = y^C_1 \sim N\left(\tfrac{1}{3}y^Q_{lvl,1},\ \tfrac{\sigma^2_y}{9}\right) \tag{3.8}$$

Furthermore, the initial slope parameter $\gamma_1$ has a prior distribution centered around zero, i.e.

$$\gamma_1 \sim N\left(0,\ \tfrac{\sigma^2_y}{9}\right) \tag{3.9}$$

Finally, the third element of the state vector, enabling the inclusion of the regression component, is fixed.

The vector of idiosyncratic terms $\vartheta^C_t$ follows a Normal distribution centered around zero, with a variance matrix consisting of the variances of its elements on the diagonal, i.e.

$$\vartheta^C_t \sim N(0, H) \quad \text{with} \quad H = \begin{pmatrix} \sigma^2_\nu & 0 & 0 & 0 \\ 0 & \sigma^2_\zeta & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \sigma^2_{\nu+\varepsilon} \end{pmatrix} \tag{3.10}$$

Hence, we can obtain $H$ by drawing from the prior distributions of its elements. The priors of the first two elements are specified as follows:

$$\frac{1}{\sigma^2_\nu} \sim \Gamma\left(\frac{df_1}{2}, \frac{ss_1}{2}\right), \qquad \frac{1}{\sigma^2_\zeta} \sim \Gamma\left(\frac{df_2}{2}, \frac{ss_2}{2}\right) \tag{3.11}$$

where $\Gamma(r, s)$ denotes the Gamma distribution with mean $\frac{r}{s}$ and variance $\frac{r}{s^2}$. The parameters $df_m$ and $ss_m$ can be interpreted as a prior sample size and a prior sum of squares in regression $m$, respectively. According to Scott & Varian (2013; 2014), these can be set by asking the researcher for an expected $R^2_m$ from the given regression $m$ and a number of observations worth of weight they wish to give to their expectation ($df_m$). Then $\frac{ss_m}{df_m} = (1 - R^2_m)s^2_y$, where $s^2_y$ is the variance of the response. Scaling by $s^2_y$ obviously means that our priors are data-determined. Scott & Varian (2014) admit that this is a violation of the Bayesian paradigm, but state that it has proved useful in their applications. As we do not have much information about the right form of the priors, we use the default values suggested by Scott & Varian (2014), i.e. $R^2_m = 0.5$, $df_m = 0.01$.

As both $\nu$ and $\varepsilon$ are assumed to be independent and normally distributed, the variance of their sum equals the sum of their variances. Hence, all we need to do is to draw $\sigma^2_\varepsilon$ from the prior distribution discussed in the section on spike-and-slab regression.

3.1.3 State smoothing

In the preceding subsections, we have arrived at the prior distributions for the parameters needed to initialize the Kalman filter. As specified in section 1.2, the Kalman filter provides us with the mean and variance of $\alpha^C_t$. We assume Normal distributions for the states, but it would be impetuous to draw the individual $\alpha^C_t$'s from their respective distributions, because such an approach would neglect the obvious serial correlation in the states. Therefore, we do not simply use the Kalman filtering techniques described in section 1.2 to calculate our state estimates. Instead, following Scott & Varian (2013; 2014), we use the simulation smoother developed by Durbin & Koopman (2002) to draw the whole vector $\alpha^C = (\alpha^{C\prime}_1, \ldots, \alpha^{C\prime}_T)'$, which takes into account the dynamics in the states.

We start by drawing the vector of disturbances $\vartheta^C = (\vartheta^{C\prime}_1, \ldots, \vartheta^{C\prime}_T)'$ from $N(0, I_T \otimes H)$. Then, we sample the initial state $\alpha^C_1$ as described above. Hence, once we obtain $\beta$ from the spike-and-slab regression, we can iterate forward through Equation 3.7 to obtain states and fitted values $\alpha^+$ and $y^+$, respectively. In our case, of course, the observations are the fourth element of the state vector. Furthermore, we create artificial observations $y^* = y^C - y^+$ and use these new data to obtain states $\alpha^*$. We use the forward recursions of the Kalman filter, the backward recursions of the Kalman smoother to obtain $r_t$ as defined in (1.4), and an extra forward recursion derived in Durbin & Koopman (2002), given by

$$\tilde{\alpha}^C_{t+1} = T^C_t \tilde{\alpha}^C_t + H r_t \tag{3.12}$$

The last recursion follows from one of the equations forming the standard disturbance smoother proposed by Koopman (1993). In our case it has a simple form because the matrix imposing the structure of the disturbances is an identity matrix.

This way we generate random noise with the same covariance as $p(\alpha^C|y^C)$, because we are using the same state-space model. The final smoothed state vector is then calculated as $\tilde{\alpha}^C = \alpha^+ + \alpha^*$, i.e. by adding the generated noise to the appropriate mean (Scott & Varian 2014).

3.1.4 Posterior distributions

Following Scott & Varian (2013), we specify the joint posterior distribution of the variances $\sigma^2_\nu$ and $\sigma^2_\zeta$, conditional on the observed states $\alpha^C$, as a product of two independent Gamma distributions:

$$p\left(\frac{1}{\sigma^2_\nu}, \frac{1}{\sigma^2_\zeta}\right) = \Gamma\left(\frac{df_1 + T - 1}{2}, \frac{SS_1}{2}\right) \Gamma\left(\frac{df_2 + T - 1}{2}, \frac{SS_2}{2}\right) \tag{3.13}$$

where $SS_1 = ss_1 + \sum_{t=2}^{T}(\mu_t - \mu_{t-1} - \gamma_{t-1})^2$ and $SS_2 = ss_2 + \sum_{t=2}^{T}(\gamma_t - \gamma_{t-1})^2$. The posterior distribution of $\sigma^2_\varepsilon$ will be specified in the following section, as it is linked to the regression component of the model.
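Sampling from (3.13) reduces to two independent Gamma draws (our own sketch; NumPy's `gamma` uses a shape-scale parametrization, so the scale is $2/SS_m$):

```python
import numpy as np

def draw_state_variances(mu, gamma, df1, ss1, df2, ss2, rng):
    """Draw sigma_nu^2 and sigma_zeta^2 from the posterior (3.13)."""
    T = len(mu)
    SS1 = ss1 + np.sum((mu[1:] - mu[:-1] - gamma[:-1]) ** 2)
    SS2 = ss2 + np.sum((gamma[1:] - gamma[:-1]) ** 2)
    prec_nu = rng.gamma((df1 + T - 1) / 2, 2.0 / SS1)
    prec_zeta = rng.gamma((df2 + T - 1) / 2, 2.0 / SS2)
    return 1.0 / prec_nu, 1.0 / prec_zeta
```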

3.2 Spike and Slab Regression

The spike-and-slab approach is a method for variable selection developed by George & McCulloch (1997) and Madigan & Raftery (1994). Firstly, we define a vector $\rho$ of the same length as the list of potential regressors, with zeros and ones indicating exclusion or inclusion of the corresponding predictors, respectively. The remaining variables of interest in the spike-and-slab regression are the vector of parameters $\beta$ and the variance of the error term $\sigma^2_\varepsilon$.

3.2.1 Prior distributions

The joint spike-and-slab prior can then be specified as

$$p\left(\beta, \sigma^2_\varepsilon, \rho\right) = p\left(\beta_\rho | \rho, \sigma^2_\varepsilon\right) p\left(\sigma^2_\varepsilon | \rho\right) p(\rho) \tag{3.14}$$

where $\beta_\rho$ contains only the parameters selected by $\rho$, i.e. $\beta_\rho = D_\rho \beta$, with $D_\rho$ denoting the matrix selecting the elements corresponding to the non-zero entries of $\rho$.

The marginal distribution $p(\rho)$ provides the spike, as it places positive probability mass at zero. $p(\beta_\rho | \rho, \sigma^2_\varepsilon)$ and $p(\sigma^2_\varepsilon | \rho)$ have Normal and Gamma priors, respectively, as discussed in the previous section. The input parameters may capture the prior beliefs of the researcher but often, as in our case, are specified as quite flat and not very informative, as we do not have any specific knowledge about them without the data.

For each element $\rho_n$ of the vector $\rho$ we assume a simple Bernoulli prior. Hence, the prior distribution of $\rho$ can be written as a product of Bernoullis:

$$p(\rho) = \prod_{n=1}^{N} \pi_n^{\rho_n} (1 - \pi_n)^{1 - \rho_n} \tag{3.15}$$

Hence, it assigns a probability to each of the $2^N$ possible values of $\rho$. Without any prior knowledge, it is reasonable to set all $\pi_n$'s equal to the same number $\pi$. We utilize the default value suggested by Scott & Varian (2014) and set $\pi = 0.5$. Other choices of prior distributions are possible but, as noted by Scott & Varian (2013), the Markov Chain Monte Carlo (MCMC) algorithm is robust to the specific choice of the prior on $\rho$.

Furthermore, we specify the slab part of the prior in a way analogous to our approach in the previous section. Therefore,

$$\beta_\rho | \sigma^2_\varepsilon, \rho \sim N\left(b_\rho,\ \sigma^2_\varepsilon \left(\Omega^{-1}_\rho\right)^{-1}\right) \tag{3.16}$$

where $b_\rho$ is a vector of zeros with length equal to the number of ones in $\rho$ and $\Omega^{-1}_\rho = D_\rho \Omega^{-1} D_\rho$. It is conventional to assume $\Omega^{-1} \propto X'X$, in which case the prior distribution of $\beta$ is referred to as Zellner's g-prior (Zellner 1986; Chipman et al. 2001; Liang et al. 2008). As $\frac{X'X}{\sigma^2_\varepsilon}$ is the full information matrix of the ordinary regression, it is reasonable to parametrize $\Omega^{-1} = K\frac{X'X}{N}$. This way, we place $K$ observations worth of weight on the prior mean $b$. Scott & Varian (2013; 2014) then note that, with a large number of predictors, there is a risk of perfect collinearity among the columns of $X$, which would cause $X'X$ to be rank-deficient and the priors on $\rho$, $\beta$ and $\sigma^2_\varepsilon$ improper. We guard against this potential issue by averaging $X'X$ with its diagonal:

$$\Omega^{-1} = K\,\frac{w X'X + (1 - w)\operatorname{diag}(X'X)}{N}$$

We use $K = 1$ and $w = 0.5$, following Scott & Varian (2014). Finally, we draw $\sigma^2_\varepsilon$ in the same manner as $\sigma^2_\nu$ and $\sigma^2_\zeta$ in the previous section:

$$\frac{1}{\sigma^2_\varepsilon}\Big|\rho \sim \Gamma\left(\frac{df_3}{2}, \frac{ss_3}{2}\right) \tag{3.17}$$
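A sketch of this prior information matrix (our own illustration):

```python
import numpy as np

def prior_information(X, K=1.0, w=0.5):
    """Omega^{-1} = K (w X'X + (1 - w) diag(X'X)) / N."""
    N = X.shape[0]
    XtX = X.T @ X
    return K * (w * XtX + (1.0 - w) * np.diag(np.diag(XtX))) / N
```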

3.2.2 Posterior distributions

When we combine the prior distributions with the available data, we arrive at the appropriate posterior distributions through a straightforward process of Bayesian updating (Gelman et al. 2013). While doing so, we define $y^*_t = y^C_t - \mu_{t-1} - \gamma_{t-1} - \psi_t y^C_{t-1}$, i.e. we work with the variation remaining after whitening the series with the Kalman filter, which we try to explain with our regression component. The above definition is implied by the last line of (3.7). Additionally, let $y^* = y^*_{1:T}$. Then, we can specify the posterior distributions for $\beta$ and $\sigma^2_\varepsilon$ as given in Scott & Varian (2013):

$$\beta_\rho | \sigma^2_\varepsilon, \rho, y^* \sim N\left(\tilde\beta_\rho,\ \sigma^2_\varepsilon \left(V^{-1}_\rho\right)^{-1}\right), \qquad \frac{1}{\sigma^2_\varepsilon}\Big| \rho, y^* \sim \Gamma\left(\frac{DF}{2}, \frac{SS_\rho}{2}\right) \tag{3.18}$$

with the following sufficient statistics:

$$V^{-1}_\rho = (X'X)_\rho + \Omega^{-1}_\rho, \qquad \tilde\beta_\rho = \left(V^{-1}_\rho\right)^{-1}\left(X'_\rho y^* + \Omega^{-1}_\rho b_\rho\right)$$
$$DF = df_3 + T, \qquad SS_\rho = ss_3 + y^{*\prime}y^* + b'_\rho \Omega^{-1}_\rho b_\rho - \tilde\beta'_\rho V^{-1}_\rho \tilde\beta_\rho \tag{3.19}$$

Finally, we can analytically marginalize over $\beta_\rho$ and $\frac{1}{\sigma^2_\varepsilon}$ to obtain

$$\rho\,|\,y^* \sim C(y^*)\, \frac{|\Omega^{-1}_\rho|^{\frac{1}{2}}\, p(\rho)}{|V^{-1}_\rho|^{\frac{1}{2}}\, SS_\rho^{\frac{DF}{2} - 1}} \tag{3.20}$$

where $C(y^*)$ is a normalizing constant depending on $y^*$.

As shown by Scott & Varian (2013), denoting the number of included regressors $|\rho|$ and using the parametrization of $\Omega^{-1}_\rho$ implied by Zellner's g-prior, we can derive

$$\frac{|\Omega^{-1}_\rho|}{|V^{-1}_\rho|} = \left(\frac{\frac{K}{N}}{1 + \frac{K}{N}}\right)^{|\rho|} \tag{3.21}$$

and see that (3.21) decreases with $|\rho|$; because generally $|\Omega^{-1}_\rho| \le |\Omega^{-1}_\rho + (X'X)_\rho| = |V^{-1}_\rho|$, the posterior (3.20) rewards models with few predictors and a small variance of residuals.

The draws from the above posterior distributions are done using the stochastic search variable selection (SSVS) algorithm proposed by George & McCulloch (1997). Scott & Varian (2014) note that SSVS is basically a Gibbs sampler (Geman & Geman 1984) consisting of subsequent draws of $\rho$, $\sigma^2_\varepsilon$ and $\beta$. All the posterior distributions have closed forms, as shown above, and all the inputs are known at the time the draws are needed, so the sampling can be carried out easily.
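To illustrate how such a Gibbs sweep over $\rho$ might look, the following sketch evaluates the log of (3.20) (up to the constant $C(y^*)$) from the sufficient statistics (3.19) and flips each indicator conditional on the rest; this is our own illustration, not the reference implementation of Scott & Varian:

```python
import numpy as np

def log_rho_posterior(rho, X, y_star, Omega_inv, b, ss3, df3, pi=0.5):
    """Log of (3.20) up to a constant, built from the statistics in (3.19)."""
    idx = np.flatnonzero(rho)
    if idx.size == 0:
        return -np.inf                      # keep at least one regressor in this sketch
    Xr, br = X[:, idx], b[idx]
    Om_r = Omega_inv[np.ix_(idx, idx)]
    V_inv = Xr.T @ Xr + Om_r
    beta_t = np.linalg.solve(V_inv, Xr.T @ y_star + Om_r @ br)
    SS = ss3 + y_star @ y_star + br @ Om_r @ br - beta_t @ V_inv @ beta_t
    DF = df3 + len(y_star)
    log_prior = idx.size * np.log(pi) + (len(rho) - idx.size) * np.log(1 - pi)
    return (0.5 * np.linalg.slogdet(Om_r)[1] - 0.5 * np.linalg.slogdet(V_inv)[1]
            - (DF / 2 - 1) * np.log(SS) + log_prior)

def ssvs_sweep(rho, X, y_star, Omega_inv, b, ss3, df3, rng):
    """One SSVS (Gibbs) sweep: redraw each rho_n given all the others."""
    for n in range(len(rho)):
        logp = np.empty(2)
        for v in (0, 1):
            rho[n] = v
            logp[v] = log_rho_posterior(rho, X, y_star, Omega_inv, b, ss3, df3)
        rho[n] = int(rng.random() < 1.0 / (1.0 + np.exp(logp[0] - logp[1])))
    return rho
```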

3.3 MCMC algorithm

The prior and posterior distributions derived in the previous sections are drawn from in a Markov Chain Monte Carlo algorithm. Following Scott & Varian (2014), we can specify the steps of the MCMC algorithm as follows:

• First, we draw from the prior distributions of the parameters $\rho$, $\beta$ and the variances $\sigma^2_\varepsilon$, $\sigma^2_\nu$ and $\sigma^2_\zeta$ to obtain the matrices forming the model given by (3.5), (3.7) and (3.10).

• In the second step, we use the simulation smoother developed by Durbin & Koopman (2002) to sample the whole set of states $\tilde\alpha^C$ and, hence, take into account the serial correlation between $\alpha^C_t$ and $\alpha^C_{t+1}$.

• Conditional on $\tilde\alpha^C$ and the data, we simulate the error variances $\sigma^2_\nu$ and $\sigma^2_\zeta$ from their posterior distributions.

• Next, we draw $\beta$ and $\sigma^2_\varepsilon$ using the SSVS algorithm.

• Finally, we calculate the nowcast $y^C_T$ by iterating Equation 3.7.

We repeat the above steps 11000 times and discard the first 1000 iterations as burn-in. We use a high number of repetitions to reduce the uncertainty associated with the resulting mean and median of the posterior distribution of yTC, which we report as the nowcasts produced by BSTS.


Chapter 4

Data

"One of the challenges we face as policymakers is the availability of data to assess the state of the economy in real time. Many economic data series...are subject to subsequent revisions that can be quite sizable and can alter our perceptions of the economic situation."

James Bullard, President and CEO, Federal Reserve Bank of St. Louis, 2011

To obtain our predictors, we utilize a dataset used in Stock & Watson (2005a), available in the Federal Reserve Economic Data (FRED) database¹ of the St. Louis FED. To be concrete, in order to obtain vintage data, we download the data from the Archival Federal Reserve Economic Data (ALFRED) database. The dataset includes 131 macroeconomic variables, which basically correspond to the variables used in all recent studies dealing with forecasting GDP using many predictors. Nevertheless, we do not use all of them. Some of the series completely lack recent observations; e.g. Help-Wanted Advertising In Newspapers for United States is available only till 1966, and Dividend Yield of Common Stocks On The New York Stock Exchange, Composite Index for United States was calculated only till 1969. Furthermore, we wish to work with a dataset balanced 'at the bottom', i.e. we use data available starting the second quarter of 1992. This way we obtain a relatively long time-series and at the same time still keep 123 predictors. Therefore, we still have a great number of explanatory variables, but minimize the uncertainty which necessarily follows from incorporating series with many missing observations. Even though the EM algorithm should be able to deal with a large number of missing values, the eventual gain from adding a handful of additional predictors with a lot of unobserved values is very questionable.

For both DFM and BSTS, we need to ensure stationarity of the utilized time-series. Therefore, we transform the data following the suggestions of Stock & Watson (2005a). Generally, we use first differences of logarithms for real quantity variables, second differences of logarithms for price series and first differences of levels for nominal interest rates, as is common in the literature. The complete list of the predictors and applied transformations can be examined in Appendix A.

The data on GDP for the corresponding time-span naturally comes from the same source, the ALFRED database. We aim to nowcast quarterly GDP figures for 2011, 2012 and 2013. We tried to choose as recent a period as possible. Nevertheless, we want to be able to compare the resulting nowcasts with the more reliable revised GDP data, which become available at the end of July of the following year, when a large annual revision of economic data takes place². We make a nowcast with the data available in every month, so we work with 36 datasets in total. The number of datasets is naturally the reason why we demonstrate the performance of both models on data for only three years.

²Our knowledge of the publication dates of economic data has been acquired from the website of the Bureau of Economic Analysis (BEA) of the U.S. Department of Commerce (http://www.bea.gov/index.htm).


Chapter 5

Empirical Results

In this chapter, we present the results describing the performance of both methods. As BSTS is unable to deal with missing values of predictors on its own, we also calculate nowcasts based on balanced datasets, which exclude variables available with a lag of more than one month. The data used for these nowcasts is, therefore, available during the month after the forecasted quarter but still earlier than the first official GDP estimates are published. The balanced datasets include 112 variables, so they still offer a large number of predictors. For the monthly nowcasts, we fill in the missing values using the output of the DFM, which treats both GDP and all the other variables as dependent variables in the Kalman filter and, therefore, calculates nowcasts for all of them.
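A minimal sketch of this filling step, assuming the smoothed DFM estimates have already been collected in a data frame sharing the index and columns of the predictor panel:

```python
import pandas as pd

def fill_with_dfm(panel: pd.DataFrame, dfm_fit: pd.DataFrame) -> pd.DataFrame:
    """Replace the missing predictor values at the ragged edge of the
    monthly panel with the corresponding smoothed estimates produced by
    the DFM. `dfm_fit` is assumed to share the index and columns of
    `panel`; observed values are kept as they are."""
    return panel.where(panel.notna(), dfm_fit)
```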

The performance of both models is evaluated using the root mean square forecast error (RMSFE) computed with respect to the first official GDP estimate and the revised estimate made available after the annual revision in July. The annual revisions often include changes in the definitions of the variables in question, which leads to substantial shifts in the values of revised data. For example, the revision in July 2013 included a change in the units of GDP data from chained 2005 dollars to chained 2009 dollars, which made the revised data for 2012 and earlier very different from the real-time observations. This obviously makes any attempt to forecast levels using vintage datasets without time-consuming conversions destined to fail. Therefore, we compare the quarterly growth rates implied by our forecasts and the official data, as in all referenced articles dealing with the forecasting of GDP.
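A sketch of this evaluation metric is given below; for simplicity it uses a single base level per quarter, whereas in the actual exercise each implied growth rate is computed within its own data vintage.

```python
import numpy as np

def rmsfe_growth(nowcast_levels, actual_levels, prev_levels):
    """RMSFE of implied quarterly growth rates, in percentage points.
    Both the nowcast and the official level are converted to growth
    over the previous quarter's level before the errors are taken, so
    that redefinitions of the level series (such as the 2013 switch to
    chained 2009 dollars) do not distort the comparison."""
    f = 100.0 * (np.asarray(nowcast_levels) / np.asarray(prev_levels) - 1.0)
    a = 100.0 * (np.asarray(actual_levels) / np.asarray(prev_levels) - 1.0)
    return float(np.sqrt(np.mean((f - a) ** 2)))
```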


Quantifying the precision of the individual nowcasts would, however, require some additional effort. At first glance, the variance of the MCMC draws might seem to be a simple indicator of precision. Nevertheless, at least in our application, the variances are very large and not very informative. This is probably the reason why none of the studies dealing with BSTS utilizes this information.

5.1 Dynamic factor model

If we plotted the fit of the model over the whole time-series, we would see that it (almost) perfectly coincides with the real GDP series; as the two lines cannot be distinguished from each other, we do not include the figure here. This remarkable fit is, however, implied by the structure of the model, as we include both the factors and the residuals in the state vector. The assumed very small variance of the remaining idiosyncratic term hence proves to be justified, as the differences between the two series are negligible. How the model performs when the real GDP value is unavailable remains to be seen.

We perform the nowcasting exercise using two versions of the dynamic factor model, which differ in the assumed number of factors. As mentioned earlier, two information criteria (IC (2.32) and PC (2.31)) are frequently used to determine the correct number of factors. As we have not found any evidence favoring one criterion over the other, we utilize both. Generally, IC leads to a higher number of factors (in our case 8 or 9) compared to PC (in our case 4). Table C.1 in Appendix C states the numbers of factors and the iterations of the EM algorithm used to arrive at the forecasts. Furthermore, the RMSFEs are calculated with respect to the first official estimates and the revised estimates of GDP and, as mentioned earlier, they utilize the implied quarterly growth rates. In addition, we also report forecast errors for estimates made during the first, the second, and the third month of the quarter, respectively. The growth rates used in the calculations are measured in percentage points so that the resulting RMSFEs are not too small and are, therefore, easy to compare.
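As for the choice of the number of factors: equations (2.31) and (2.32) appear earlier in the thesis and are not repeated here, so as an assumption the sketch below uses the common PC_p2 and IC_p2 variants of Bai & Ng (2002), with the factors extracted by principal components from a standardized, balanced panel.

```python
import numpy as np

def bai_ng(X, kmax=10):
    """Select the number of static factors by the PC_p2 and IC_p2
    criteria of Bai & Ng (2002); assumed here as stand-ins for (2.31)
    and (2.32). X is a T x N standardized, balanced panel."""
    T, N = X.shape
    s = np.linalg.svd(X, compute_uv=False)         # singular values of X
    ssq = s ** 2
    # V(k): average squared residual after projecting X on k components
    V = (ssq.sum() - np.cumsum(ssq[:kmax])) / (N * T)
    pen = (N + T) / (N * T) * np.log(min(N, T))    # the "p2" penalty term
    k = np.arange(1, kmax + 1)
    PC = V + k * V[-1] * pen                       # sigma2_hat taken as V(kmax)
    IC = np.log(V) + k * pen
    return 1 + int(PC.argmin()), 1 + int(IC.argmin())

# e.g. with a T x 123 panel Z of transformed predictors (illustrative):
# k_pc, k_ic = bai_ng(Z, kmax=12)
```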

Table 5.1 reports the resulting forecast errors.

Table 5.1: RMSFEs of DFM nowcasts

                               IC                        PC
  Compared to:         1st est.   revised est.   1st est.   revised est.
  Balanced datasets      0.656       0.767         0.449       0.577
  Unbalanced datasets    0.264       0.452         0.327       0.510
    1st month            0.314       0.489         0.317       0.515
    2nd month            0.271       0.448         0.325       0.494
    3rd month            0.190       0.414         0.337       0.518

We can notice that the DFM largely benefits from the additional variables in the unbalanced datasets, as the forecast errors in any month within a quarter are lower than when the balanced dataset available after the end of the quarter is used. Apparently, the loss of some predictors is not compensated by the absence of missing values in the balanced datasets, as these are seemingly dealt with efficiently within the EM algorithm framework. Furthermore, we can notice that with the lower number of predictors, the version of the DFM utilizing a lower number of factors (chosen according to PC) performs better than the larger model. On the other hand, the model with a higher number of factors is able to make the most of the additional data, as evidenced also by the decreasing forecast errors as more and more missing values at the end of the sample become known in the second and third months of a quarter. Hence, the dynamic factor model based on the IC criterion outperforms the other specification in all cases when the unbalanced datasets with a larger set of variables are used. Finally, the nowcasts are much closer to the first official estimates than to the revised figures.

5.2 Bayesian structural time-series

We first provide the reader with an illustration of how important the predictors are for the estimation of GDP dynamics and how precise the fit of the model is throughout the whole time-series. Figure 5.1 depicts the fit of the model when the regression component is excluded; it is, therefore, essentially the output of the local linear trend model. Figure 5.2 shows the dynamics of the regression component and Figure 5.3 provides us with the final output of BSTS. The fit is obtained from the estimation using the latest balanced dataset.


Figure 5.1: Fit of the local linear trend part of BSTS

We can notice that the model under this specification may lead to correct forecasts at some points, but it is unable to capture any significant deviations from the overall trend.


On the other hand, Figure 5.2 shows that the regression component moves substantially at times when deviations from the trend take place. This gives us hope that the regression component could mitigate the forecast errors made by the local linear trend model, and Figure 5.3 shows that this is, indeed, the case. Consequently, the overall fit of the model (Figure 5.3) appears to be very good.

Figure 5.3: Overall fit of BSTS

Table 5.2 reports the root mean squared forecast errors for the nowcasts in the examined period. As the MCMC algorithm of BSTS provides us with a number of forecasts, we take the mean and the median of the simulated values, so we have two alternative estimates available. The nowcasts based on the monthly unbalanced datasets then also depend on the version of the DFM utilized to obtain the estimates of the missing values of the predictors. The two versions differ in the number of factors used in the state-space model, as we use the numbers of factors implied by both criteria, IC (2.32) and PC (2.31).

Table 5.2: RMSFEs of BSTS nowcasts

                               Mean                      Median
  Compared to:         1st est.   revised est.   1st est.   revised est.
  Balanced datasets      0.350       0.529         0.345       0.515
  Filled using DFM-IC    0.341       0.518         0.340       0.518
    1st month            0.333       0.529         0.333       0.528
    2nd month            0.333       0.492         0.333       0.494
    3rd month            0.357       0.532         0.357       0.530
  Filled using DFM-PC    0.338       0.517         0.349       0.531
    1st month            0.331       0.527         0.332       0.530
    2nd month            0.334       0.497         0.348       0.526
    3rd month            0.349       0.526         0.365       0.535

As expected, BSTS also performs significantly better when compared to the first official GDP figures. After all, the official estimates likely utilize a quite similar set of information. Furthermore, the differences between forecasts based on means and medians are quite small, and the results for all three utilized types of datasets are also comparable. This might hint at a good performance of the DFM used to fill in the missing observations in the predictors. On the other hand, eventual losses caused by the possible imprecision of the filled-in values may be compensated by the larger number of available predictors in the unbalanced datasets. Finally, our results show that it is possible to arrive at precise nowcasts already during the quarter in question.

5.2.1 Frequently selected predictors

The BSTS framework provides us with a simple way to quickly assess which variables from the extensive dataset are the most important. All we need to do is calculate the fraction of MCMC draws which include the variable in question. Then, we can sort the variables according to their inclusion probabilities and obtain a clear picture of which variables are included in the majority of the simulated parsimonious models. Figure 5.4 depicts the five most frequently included variables from the balanced dataset. The horizontal axis gives the inclusion probability and the colors of the bars indicate the sign of the coefficients corresponding to the individual predictors; white hints at a positive sign whereas black implies a negative effect on GDP.
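A minimal sketch of this computation, assuming the post-burn-in coefficient draws have been stacked into an array (the zero draws mark the excluded variables):

```python
import numpy as np

def inclusion_summary(beta_draws, names, top=5):
    """Rank predictors by their posterior inclusion probability, i.e.
    the fraction of post-burn-in MCMC draws in which the coefficient is
    nonzero, and report the sign of the average coefficient (the
    white/black coding of Figure 5.4)."""
    beta_draws = np.asarray(beta_draws)            # shape: (n_draws, p)
    prob = (beta_draws != 0).mean(axis=0)          # inclusion probabilities
    sign = np.where(beta_draws.mean(axis=0) >= 0, "+", "-")
    for i in np.argsort(prob)[::-1][:top]:
        print(f"{names[i]:<40s} {prob[i]:.2f}  ({sign[i]})")
```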
