ASSESSING THE RELIABILITY OF HISTORICAL REVISIONS: The Case of the Penn World Tables.

University of Groningen Faculty of Economics and Business

Master Thesis International Economics and Business

Name Student: Marko Baas
Student ID number: 2762773
Student email: m.r.baas@student.rug.nl
Date Thesis: 18-06-2019

Table of Contents

Abstract

I – Introduction

II – The PWT: Vintages and Revisions

Data

III – Literature Review

IV – Properties of Historical Revisions

Methodology

Implementation & Results

V – State Space Representation

Abstract

I – Introduction

Much of our understanding of the relationships between economic variables such as poverty, productivity and standards of living in an international context hinges on the ability to compare countries over time. One of the main statistics that researchers use to make international comparisons is the gross domestic product (GDP), which measures the value added in an economy in a certain year and thereby allows researchers to evaluate the size of that economy and its growth over time.

The Penn World Table (PWT) has become the most widely used GDP database in empirical research: about 70% of cross-country empirical research is based on the PWT (Johnson & Papageorgiou, 2018). The PWT makes GDP statistics internationally comparable by using Purchasing Power Parities (PPPs) to convert GDP statistics expressed in national currencies to international dollars. The PPPs applied in the PWT are produced by the World Bank as part of the International Comparison Program (ICP) (World Bank, 2013). As of the latest vintage, vintage 9.1, the PWT covers 182 countries starting in 1950¹.

The accuracy of the statistics reported in the PWT and the underlying ICP is, due to the popularity of the PWT in empirical research, of great importance for our understanding of economics and, consequently, for economic policy. Therefore, the PWT and ICP statistics are periodically revised as methodologies are further refined and data are updated. Where older versions focused primarily on standards of living, starting with vintage 8.0 the PWT also reports output-side GDP, which measures the productive capacity of an economy. Vintage 8.0 can thus be considered the start of a new generation of the PWT (Feenstra et al., 2015a). Despite all of the methodological improvements, however, the impact of revisions on reported GDP statistics remains large.

To see this, consider $Y_{v,i,t}$ as a generic representation of the level of output-side real GDP per capita (henceforth referred to as RGDPO) in vintage $v = 1, 2, \dots, V$ for individual $i = 1, 2, \dots, N$ at time $t = 1, 2, \dots, T$. The size of any cumulative revision can then be defined as $\Delta Y_{i,t} \equiv \frac{Y_{V,i,t} - Y_{v,i,t}}{Y_{v,i,t}}$, with $V$ representing vintage 9.1 (with $\Delta Y_{i,t} > 0$ representing upward revisions and vice versa for downward revisions)². Figure 1 plots the Mean Absolute Revisions (MARs) and Mean Revisions (MRs) for each pair of the last three vintages (that is, vintages 8.1, 9.0 and 9.1) for the period 1960 – 2011, which is covered by all three vintages. Panel A shows the MARs and MRs only for developing economies, while Panel B shows the same for developed economies³. As can be seen by comparing the panels, revision sizes differ markedly between developed and developing economies, being much larger in general for developing nations. For example, from 1960 to 2005, developing nations were about 25% richer each year in vintage 9.1 than in vintage 8.1, while the difference is only about 15% for developed economies.

¹ Early data, especially before 1960, is mostly limited to industrialized economies.

² The cumulative revision size statistic $\Delta Y_{v,i,t}$ can then be aggregated across countries by means of Mean Absolute Revisions (MARs), calculated as $\Delta Y_{v,t} = \frac{\sum_{i=1}^{N} |\Delta Y_{v,i,t}|}{N}$, and Mean Revisions (MRs), calculated as $\Delta Y_{v,t} = \frac{\sum_{i=1}^{N} \Delta Y_{v,i,t}}{N}$. By comparing MARs and MRs, one can determine whether the revisions show a systematic pattern of downward or upward revisions over time. Furthermore, by plotting the MARs and MRs in each year, a visible trend pattern of $\Delta Y_{v,t}$ over time can be identified.
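The aggregation in footnote 2 can be illustrated with a minimal sketch. The country codes and GDP values below are hypothetical and are not taken from the PWT; the function names are likewise only illustrative.

```python
from statistics import mean


def relative_revision(old, new):
    """Relative revision (new - old) / old for one country-year."""
    return (new - old) / old


def mar_and_mr(old_vintage, new_vintage):
    """Return (MAR, MR) across countries for a single year.

    Both arguments map a country code to RGDPO per capita in that year.
    Only countries present in both vintages are used.
    """
    common = sorted(set(old_vintage) & set(new_vintage))
    revisions = [relative_revision(old_vintage[c], new_vintage[c]) for c in common]
    mar = mean(abs(r) for r in revisions)  # mean absolute revision
    mr = mean(revisions)                   # mean (signed) revision
    return mar, mr


# Hypothetical values for one year (assumption, not PWT data).
older = {"AAA": 1000.0, "BBB": 2000.0, "CCC": 4000.0}
newer = {"AAA": 1250.0, "BBB": 2200.0, "CCC": 3800.0}

mar, mr = mar_and_mr(older, newer)
print(f"MAR = {mar:.4f}, MR = {mr:.4f}")
```

When MAR and MR are close, as in Figure 1, almost all revisions share the same sign; a MAR well above the MR, as in this toy example, indicates offsetting upward and downward revisions.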

Figure 1: Mean Absolute Revisions (MARs) and Mean Revisions (MRs) for each (cumulative) revision

[Figure: Panel A (developing economies) and Panel B (developed economies)]

Notes: (1.) This graph shows the unweighted mean absolute revision (MAR) and the unweighted mean revision (MR) for each pair of PWT vintages since vintage 8.1 for the period 1960 - 2011. (2.) Panel A shows MARs and MRs for developing economies, while Panel B shows them for developed economies. (3.) MAR8191 denotes the mean absolute revision for the 8.1 to 9.1 cumulative revision. The other abbreviations follow the same notation. (4.) The distinction between developed and developing economies, as well as the countries included in the sample, is according to the Taxonomy (Table 5) in the Appendix.

Furthermore, the MARs and MRs are almost equal and generally positive, which tells us that the revisions were almost unambiguously upward. Finally, the period after 2005 is characterized by large revisions for developing economies, while this period did not get revised much differently compared to previous periods in the case of developed economies.

The descriptive statistics discussed so far paint a clear picture: revisions have a significant impact on the levels of RGDPO per capita reported in the PWT. Further, these revisions do not impact every economy equally, as developing nations appear to be more strongly affected. Since the Penn World Table forms the foundation of the majority of cross-country macroeconomic empirical research, it is critical to understand how these revisions affect the general picture of GDP per capita presented in the PWT and, consequently, how empirical results may be affected. Moreover, while it may be natural to assume that researchers should always use the latest vintage, it is not clear whether this is actually the case. The aim of this thesis is to further examine the behavior of these revisions, and to explore the possibility of combining different vintages into a composite GDP database that exploits the available information more optimally than any single vintage by itself.

if the vintages turn out not to be cointegrated, then it is not clear which vintage should be used in empirical studies, as no vintage clearly outperforms the others and each vintage may contain some information about the true evolution of GDP per capita. This brings into the picture the state-space model, which is the second aspect of the methodology employed in this thesis. Since the “true” GDP of a country is unobservable, GDP statistics are always estimates based on a limited sample. As a result, each vintage contains some measurement error (noise) that inevitably arises when one performs inference about a population based on a limited sample, regardless of how good the methodology or data collection is. Furthermore, some vintages may contain more noise than others. A state-space model will allow me to identify which vintages are the noisiest, and to use this information to create a weighted average of the three vintages, with the weights inversely related to the amount of noise in each vintage. This weighted average will be less noisy than the three individual vintages and more applicable to empirical research. Using weighted series when there is “uncertainty” about the best way to measure a variable is not unheard of in economics. The Fisher index, a price index that is calculated as the geometric mean of two other indices (Diewert, 1992), is perhaps the best example. The approach devised in this thesis is in the same spirit.
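The idea of weights that are inversely related to noise can be sketched with a static inverse-variance average. This is only an illustration of the principle, not the state-space model itself: it assumes the measurement errors of the vintages are independent and that their variances are known, and all numbers below are hypothetical.

```python
def inverse_variance_weights(noise_variances):
    """Weights proportional to 1 / variance, normalised to sum to one."""
    precisions = [1.0 / v for v in noise_variances]
    total = sum(precisions)
    return [p / total for p in precisions]


def combine(estimates, noise_variances):
    """Weighted average of the estimates using inverse-variance weights."""
    weights = inverse_variance_weights(noise_variances)
    return sum(w * x for w, x in zip(weights, estimates))


def combined_variance(noise_variances):
    """Noise variance of the weighted average (independent errors assumed)."""
    return 1.0 / sum(1.0 / v for v in noise_variances)


# Hypothetical GDP per capita for one country-year in three vintages,
# with hypothetical noise variances (assumptions, not estimates).
estimates = [9800.0, 10400.0, 10100.0]
noise_variances = [4.0, 2.0, 1.0]

weights = inverse_variance_weights(noise_variances)
composite = combine(estimates, noise_variances)
print(weights, composite, combined_variance(noise_variances))
```

Under these assumptions the composite has a smaller noise variance than even the least noisy input, which is the sense in which a weighted average can exploit the information in all vintages.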

To illustrate the relevance of this approach, I will provide a set of descriptive statistics comparing the weighted average estimated by the state-space model to the latest vintage, vintage 9.1. Due to the reduced amount of noise in the weighted average, I expect it to be significantly different from vintage 9.1 and less variable.

This thesis will be structured as follows: Section II will provide more information on the Penn World Table, what makes each vintage different and the data that will be used throughout this thesis. Section III provides a literature overview of previous research that has been done on the subject of PWT revisions. Section IV describes the cointegration test and its results, while Section V outlines the state-space model. Finally, Section VI will illustrate the impact of the weighted average by analyzing the changes to relative GDP per capita growth. Section VII concludes.

II – The PWT: Vintages and Revisions

The Penn World Table has been widely applied in macroeconomic research ever since it was first launched about 40 years ago. Besides GDP statistics, the table presents a host of other economic variables, such as employment, human and physical capital, total factor productivity (TFP) and exchange rates. The PWT is periodically updated in order to stay up to date, to incorporate newly developed methodologies and to fix issues with previous versions. The features of the latest vintages, starting with vintage 8.0, are presented in Table 1.

Table 1: Vintages of the Penn World Table

Vintage   Benchmark Year        ICP Phase / Benchmark Countries   Countries in the PWT   Release Date
8.0       2005                  7 / 146                           167                    2013
8.1       Corrections to 2005   7 / 146                           167                    2015
9.0       2011                  8 / 199                           182                    2016
9.1       Corrections to 2011   8 / 199                           182                    2019

Notes: (1.) The abbreviations ICP and PWT denote International Comparison Program and Penn World Table respectively. (2.) The number of benchmark countries is the number of countries in which price data is collected by the ICP.

Sources: Feenstra et al. (2015a, 2015b, 2016) and Inklaar & Woltjer (2019).

outside the benchmark year⁴. Most importantly, however, vintage 8.0 also introduced an entirely new estimator of GDP alongside expenditure-side GDP (henceforth, RGDPE), which had always been the focal point of the PWT. While RGDPE per capita could be seen as a measure of standards of living in a particular country, this new measure, called output-side GDP (RGDPO), is interpreted as a measure of the productive capacity of a country. The incorporation of this variable was previously unfeasible, as its computation requires quality-adjusted prices of exports and imports and the methodologies to produce these were previously unavailable (Feenstra et al., 2015a). Soon after the release of ICP 2005, on which vintage 8.0 of the PWT was based, questions about its reliability were raised, as the PPPs in ICP 2005 seemed overvalued in developing nations, making them appear poorer than previously thought (Feenstra et al., 2015b). When ICP 2011 was released and the PPPs in developing nations were revised downwards, it became clear that there was indeed a bias⁵ in ICP 2005. Inklaar & Rao (2014) therefore constructed a set of counterfactual PPPs that accounted for this bias and that, while still based on ICP 2005, were methodologically in line with ICP 2011. These counterfactuals were implemented in vintage 8.1 of the PWT⁶, which also featured the addition of several new variables related to standards of living (Feenstra et al., 2015b).

The first vintage of the PWT to directly make use of ICP 2011 rather than counterfactuals was vintage 9.0⁷. This round of the ICP featured fifteen countries that had never participated in an ICP round before, thus increasing the number of countries featured in the PWT. Revised national accounts data also became available, extending the time period covered by the PWT from 1950 - 2011 in the vintages 8.0 and 8.1 to 1950 – 2014 in vintage 9.0. Although methodologically largely equivalent to vintage 8.1 due to the fact that vintage 8.1 used counterfactual PPPs, the shift in reference year still caused a large revision, as all pre-2005 values shifted up by 12%. Post-2005 values were also significantly revised, as these were previously extrapolated from ICP 2005 but became interpolated between ICP 2005 and ICP 2011 (Feenstra et al., 2016).

⁴ See Feenstra et al. (2015a) for a complete overview of the improvements made by PWT vintage 8.0.

⁵ The bias was mainly caused by the fact that there were goods included in the global basket of goods that were normal goods in developed nations, but high-priced luxury goods in developing nations (Feenstra et al., 2016).

⁶ Note that the PWT has never directly incorporated the PPPs as developed in the ICP. PPPs are constructed using the price data gathered by the ICP, but the PPPs resulting from these prices are computed differently by the PWT than by the World Bank, see Feenstra et al. (2015a).

The revision from vintage 9.0 to 9.1 was motivated primarily by new methodologies regarding the measurement of capital that are beyond the scope of this thesis⁸. However, certain significant changes to GDP data were also made, as new price data became available for certain former USSR states and 37 European countries. Moreover, national accounts data were revised, which caused especially large revisions for African countries and increased the covered time period to 1950 - 2017 (Inklaar & Woltjer, 2019).

Data

Although it is part of the new generation of the Penn World Table, vintage 8.0 was methodologically flawed. Since newer vintages that are objectively more reliable are now available, vintage 8.0 should no longer be of interest to researchers today. Therefore, this thesis will focus on the last three vintages of the PWT, 8.1, 9.0 and 9.1. These three vintages are methodologically comparable, despite the fact that vintage 8.1 was based on an older ICP version. As output-side GDP is the main feature that sets this new generation of the PWT apart from its predecessors, I will take output-side GDP (RGDPO) per capita as the flagship variable in this thesis⁹.

As the cointegration analysis applied in Section IV requires the number of time periods to be large relative to the number of countries, it is necessary to drop certain countries for which data becomes available relatively late, in order to maximize the time dimension. In particular, I will drop all countries that have data starting after 1960, as this is the year in which the data starts for many developing nations, with earlier data available almost exclusively for developed economies. Countries that were included in vintages 9.0 and 9.1 but not in vintage 8.1 have to be excluded as well, as it would not be possible to analyze the revisions for those countries. Imposing these restrictions leaves 107 countries in the sample¹⁰. Furthermore, it is necessary that the years covered in the sample are present in each of the three vintages, so the years 2012 - 2014 had to be dropped from vintage 9.0 and the years 2012 - 2017 were dropped from vintage 9.1. This leaves 1960 - 2011 as the total time period under consideration (52 years), with each country in each vintage having data in each year.
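The selection rules described above can be sketched as follows. The data layout (a nested dictionary of vintage, country and year), the country codes and the helper names are assumptions made for illustration, and the sketch assumes that each series has no gaps between its first and last year.

```python
def balanced_sample(vintages, start_year=1960):
    """Return (countries, years) forming a balanced panel across vintages.

    vintages maps a vintage label to {country: {year: value}}.
    """
    names = list(vintages)
    # Keep only countries that appear in every vintage.
    countries = set(vintages[names[0]])
    for name in names[1:]:
        countries &= set(vintages[name])
    # Drop countries whose data starts after start_year in any vintage.
    countries = {
        c for c in countries
        if all(min(vintages[n][c]) <= start_year for n in names)
    }
    if not countries:
        return [], []
    # The common window ends at the earliest final year across vintages.
    end_year = min(max(vintages[n][c]) for n in names for c in countries)
    return sorted(countries), list(range(start_year, end_year + 1))


def series(first, last):
    """Hypothetical placeholder series covering first..last inclusive."""
    return {year: 1.0 for year in range(first, last + 1)}


# Toy example (hypothetical countries; only the year coverage mirrors the text).
toy = {
    "8.1": {"AAA": series(1950, 2011), "BBB": series(1960, 2011), "CCC": series(1970, 2011)},
    "9.0": {"AAA": series(1950, 2014), "BBB": series(1960, 2014), "CCC": series(1970, 2014),
            "DDD": series(1950, 2014)},
    "9.1": {"AAA": series(1950, 2017), "BBB": series(1960, 2017), "CCC": series(1970, 2017),
            "DDD": series(1950, 2017)},
}

countries, years = balanced_sample(toy)
print(countries, years[0], years[-1], len(years))
```

In the toy data, CCC is dropped because its data starts after 1960 and DDD is dropped because it is missing from vintage 8.1, leaving a 52-year window from 1960 to 2011.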

One of the main objectives of this thesis is to assess whether or not developing nations are affected differently by the revisions than developed economies. In order to be able to do so, I will split the sample into a developed group containing 27 countries and a developing group containing the remaining 80 nations. As the group of developing nations is still large relative to the number of time periods (80 > 52), I have split this group further according to continent. This completes the taxonomy, which is presented in Table 5 in the Appendix. Europe contains four borderline developing nations: Greece, Romania, Malta and Cyprus. In my analyses, I will add these to the developed subsample as a robustness check and call this new group “Developed large”. Moreover, as can be seen in Table 5, data prior to 1960 is available for almost all countries in the developed subsample. Therefore, as another robustness check, I will remove Hong Kong and Singapore (the only two developed countries with data starting in 1960) from the group to form a new subsample that I call “Developed long”, which extends the time period to 1954 - 2011.

⁸ See Inklaar & Woltjer (2019) for an overview of the additions to vintage 9.1.

⁹ The main results will also be replicated using expenditure-side GDP per capita.

To summarize, the dataset employed in this thesis contains data on output-side GDP per capita from PWT vintages 8.1, 9.0 and 9.1 for 107 countries over the period 1960 to 2011. The countries are divided into subgroups according to development status and continent and the full Taxonomy is presented in Table 5 in the Appendix.

III – Literature Review

Given the popularity of the PWT in academic research, it is not surprising that the impact of PWT revisions on cross-country empirical research has been extensively covered in the literature. PWT revisions are the combination of revisions to national accounts data, updated methodologies and revised prices collected by the ICP. This last component was a major influence on the revision from vintage 8.1 to 9.0, as the base year shifted from 2005 to 2011. Deaton & Aten (2017) find that PPPs in Africa and Asia were overestimated by 20-25%, consequently underestimating the purchasing power in those regions. They suggest that this was caused primarily by the way in which ICP regions were linked to each other. The countries used to do this, so-called “ring countries”, were not representative of the regions that they were supposed to represent. Consequently, the consumption basket that was used to calculate the PPPs was also not representative of the consumption patterns in developing nations, as many goods in the basket were only widely available in developed economies. This linking method was abandoned in ICP 2011 (Deaton & Aten, 2017).

Inklaar & Rao (2017) draw upon the findings of Deaton & Aten (2017) and extend them by delving deeper into the quantitative implications of the differences between ICP 2005 and 2011. They find that the ICP revision implied that the average country suddenly became 24% richer compared to the United States, which caused a major decrease in measured global inequality. This difference was found to be only 5% if India were taken as the base country rather than the United States, confirming the bias towards underestimating purchasing power in developing countries in ICP 2005 (Inklaar & Rao, 2017). The same bias was found in other databases that are based on ICP PPPs, such as the GDP database of the World Bank (Ram, 2016). This underestimation of purchasing power in the developing world had immediate implications for the number of people estimated to live in poverty. Kharas & Chandy (2014) estimate that the number of people living in absolute poverty in 2010 was 1,215 million according to extrapolations from ICP 2005. Doing the same estimation using ICP 2011 interpolated to 2010 yielded 570 million people living in poverty, less than half of the number according to ICP 2005.

inconsistencies do not appear to be resolved by more recent vintages, as shown by Chen et al. (2018), who apply functional data analysis to vintages 6.3, 7.1, 8.0 and 8.1 and find that the distribution functions of these vintages are not equal.

If newer does not necessarily mean better, then we may wonder which vintage is the best. The difficulty of determining this stems from the fact that there is no completely independent measure of GDP to compare the PWT to, as other GDP databases are based on many of the same sources of data. Pinkovskiy & Sala-i-Martin (2016) take the nighttime lights measured by satellites as an independent measure of GDP. By comparing nighttime lights to PWT vintages 7.1, 8.0 and 8.1, Pinkovskiy & Sala-i-Martin (2016) come to the surprising conclusion that vintage 7.1 significantly outperforms vintages 8.0 and 8.1, despite its reliance on national accounts data for growth rates outside of the benchmark year.

The general consensus in the literature is that it matters which vintage is used in empirical research, but also that the newest vintage may not necessarily be the best option. Moreover, the literature confirms the observation in the descriptive statistics that developing economies are more strongly affected by PWT revisions. The revision from ICP 2005 to ICP 2011 appears to have introduced large changes in the way that the historical evolution of GDP per capita is presented in the PWT. However, this has not yet been tested in an econometric framework. Furthermore, as it is unclear which vintage is the best, we may not want to choose only a single vintage in the first place. The possibility of optimally using information embedded in multiple vintages of the PWT by constructing a composite GDP per capita database is another issue that has not yet been explored. This thesis aims to fill these gaps in the literature in two ways. First, by testing for cointegration among the ICP 2005-based vintage 8.1 and the ICP 2011-based vintages 9.0 and 9.1, I will investigate whether these vintages convey the same story about the evolution of GDP per capita. Second, I will explore the possibility of optimally combining the different PWT vintages into a single composite that outperforms any individual vintage by applying a state-space model that can filter out measurement error.

IV – Properties of Historical Revisions

The first main objective of this thesis is to determine whether or not the revisions of the PWT are well behaved. I take well behaved to mean that the different vintages share a common stochastic trend, meaning that they convey the same story about the historical evolution of GDP per capita. The econometric test used to examine whether or not some time series share a common stochastic trend is a test for cointegration. Two series that are integrated of order 1 (I(1)) are said to be cointegrated when there exists a linear combination of the two series that is stationary, or in other words, integrated of order zero (I(0)). These variables are therefore in a long-run equilibrium (Maddala & Kim, 1991). If the different vintages are indeed in such a long-run equilibrium and consequently convey the same story about GDP, it should not matter which vintage is used in empirical research. Researchers may therefore be indifferent between the various vintages of the PWT.
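The definition of cointegration can be illustrated with a simple two-step residual-based check on simulated data. This is a textbook-style time series sketch, not the panel test used in this thesis: the series are simulated, the function names are illustrative, and the Dickey-Fuller regression below has no lags or deterministic terms.

```python
import random


def ols_residuals(y, x):
    """Residuals from regressing y on a constant and x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [b - intercept - slope * a for a, b in zip(x, y)]


def dickey_fuller_t(e):
    """t-statistic on rho in: e_t - e_{t-1} = rho * e_{t-1} + error."""
    lagged = e[:-1]
    diffs = [b - a for a, b in zip(e[:-1], e[1:])]
    sll = sum(v * v for v in lagged)
    rho = sum(l * d for l, d in zip(lagged, diffs)) / sll
    resid = [d - rho * l for l, d in zip(lagged, diffs)]
    s2 = sum(r * r for r in resid) / (len(diffs) - 1)
    return rho / (s2 / sll) ** 0.5


def random_walk(rng, n):
    """Simulated I(1) series: cumulative sum of standard normal shocks."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


rng = random.Random(0)
T = 500
trend = random_walk(rng, T)  # shared stochastic trend
series_a = [f + rng.gauss(0.0, 0.5) for f in trend]
series_b = [1.0 + 1.2 * f + rng.gauss(0.0, 0.5) for f in trend]  # rescaled, same trend
unrelated = random_walk(rng, T)  # independent I(1) series

t_cointegrated = dickey_fuller_t(ols_residuals(series_b, series_a))
t_unrelated = dickey_fuller_t(ols_residuals(unrelated, series_a))
print(t_cointegrated, t_unrelated)
```

A strongly negative statistic for the first pair indicates stationary residuals, that is, a stationary linear combination. In practice the statistic has to be compared with critical values tabulated for residual-based cointegration tests rather than with standard t-tables.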

yield two testable hypotheses: firstly, I hypothesize that the three latest vintages are not cointegrated in the case of developing economies. Secondly, I hypothesize that the three latest vintages are cointegrated in the case of developed economies.

Since the PWT includes a time dimension as well as a cross-sectional one, a panel data cointegration test is required to test these hypotheses. Panel data tests have become increasingly popular in recent years and developments in the field are rapid¹¹. This is because panel data tests can exploit not only the information provided by the time dimension, as in a pure time series model, but also the information embedded in the cross-sectional dimension. With this added information, the power of econometric tests can be improved (Breitung & Pesaran, 2009). However, early panel cointegration tests suffered from serious complications. Two issues that are particularly significant when looking at macroeconomic data such as the PWT are cross-sectional dependence and structural breaks. The first panel data cointegration tests were built on the assumption that the individual time series within the panel were independent of each other. Given today’s highly integrated world economy, this assumption is highly likely to be violated in GDP data (Pesaran, 2007). Furthermore, the concept of cointegration does not preclude the possibility that features of the long-run equilibrium relationship between the cointegrated variables, such as the mean and trend, can change over time. Failure to account for such changes, known as structural breaks, can affect the inference of cointegration tests due to misspecification errors (Banerjee & Carrion-i-Silvestre, 2006). Given the long time period (1960 - 2011) under consideration in the PWT, it is likely that one or more structural breaks have occurred during this time.

In order to reliably test for cointegration of RGDPO per capita in the last three vintages of the PWT, the issues of cross-sectional dependence and structural breaks can clearly not be ignored. Thankfully, recent developments in the field of panel cointegration tests have produced suitable tests that take these issues into account. A particularly powerful cointegration test was developed by Banerjee & Carrion-i-Silvestre (2015). This test allows for the presence of up to two structural breaks that may affect the level or the trend of the cointegrating vector and models cross-sectional dependence as a common factors model. These properties make it highly suitable for my application and therefore I have applied this model. I will first give a short overview of the methodology and then present the results. For a more in-depth discussion of the cointegration test, see Banerjee & Carrion-i-Silvestre (2015).

Methodology

As a starting point, Banerjee & Carrion-i-Silvestre (2015) define the following data generating process that generates $Y_{i,t} = (y_{i,t}, x'_{i,t})'$, which is an $(m \times 1)$ vector of non-stationary stochastic processes for which we are testing for cointegration (in this case, the different vintages of the PWT):

$y_{i,t} = D_{i,t} + x'_{i,t}\delta_{i,t} + u_{i,t}$ (1)

$u_{i,t} = F'_t \pi_i + e_{i,t}$ (2)

$(I - L)F_t = C(L)w_t$ (3)

$(1 - \rho_i L)e_{i,t} = H_i(L)\varepsilon_{i,t}$ (4)

$x_{i,t} = \kappa_i + x_{i,t-1} + G'_t \varsigma_i + \Xi_i(L)v_{i,t}$ (5)

$G_t = \Gamma(L)\varpi_t$ (6)

Where $i$ indexes the countries, $t$ indexes the time periods and the deterministic term $D_{i,t}$ is defined as:

$D_{i,t} = \mu_i + \beta_i t + \sum_{j=1}^{m_i} \theta_{i,j} DU_{i,j,t} + \sum_{j=1}^{m_i} \gamma_{i,j} DT_{i,j,t}$ (7)

Where, once the jth structural break has occurred, $DU_{i,j,t} = 1$ and $DT_{i,j,t}$ is equal to the time elapsed since that break, and both are equal to 0 if the break has not yet occurred. The cointegrating vector $\delta_{i,t}$ is a function of time and may differ before and after structural breaks. The common factors are modelled as the $(r \times 1)$ vectors $F_t$ and $G_t$ (with $r$ denoting the number of common factors) and their loadings as $\pi_i$ and $\varsigma_i$ respectively, with $\varsigma_i = 0$ representing the possibility of cross-section independence or $\varsigma_i \neq 0$ otherwise. The vector of common factors $F_t$ is not assumed to be I(0), meaning that it may also capture effects from outside the model, such as omitted variables. Furthermore, the common factors affecting $y_{i,t}$ and $x_{i,t}$ are assumed to be different (Banerjee & Carrion-i-Silvestre, 2015).

The framework outlined above is highly general. Structural breaks may affect either the deterministic component or the cointegrating vector. Moreover, breaks affecting the cointegrating vector do not need to occur at the same times as the breaks affecting the deterministic component. Banerjee & Carrion-i-Silvestre (2015) note that estimation of the model in its most general form as shown above requires the timing of the breaks to be known a priori, which is not the case in the context of the PWT. It is therefore necessary to impose certain restrictions on the model. Following Banerjee & Carrion-i-Silvestre (2015), I impose the restriction that the trend is stable, meaning $\gamma_{i,j} = 0$ and $\beta_i \neq 0$ in equation (7). There may still be multiple heterogeneous structural breaks that may affect the level and the cointegrating vector.
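As a worked restatement of this restriction, derived only from equation (7) and the stated condition that the trend is stable, the deterministic term reduces to a country-specific intercept, a stable linear trend and level shifts at the break dates:

```latex
D_{i,t} = \mu_i + \beta_i t + \sum_{j=1}^{m_i} \theta_{i,j} \, DU_{i,j,t},
\qquad \gamma_{i,j} = 0, \quad \beta_i \neq 0 .
```

In other words, under the restriction each break can move the level of the relationship (through $\theta_{i,j}$) and the cointegrating vector $\delta_{i,t}$, but not the slope of the trend.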

Recall that stochastic variables are cointegrated when there exists a linear combination of the variables that is stationary. In the model of Banerjee & Carrion-i-Silvestre (2015), this entails testing for stationarity of $e_{i,t}$. That is, given the common factors $F_t$ that control for cross-sectional dependence, the cointegrating vector $\delta_{i,t}$, which may be changed by structural breaks, should define the linear combination of $y_{i,t}$ and $x_{i,t}$ that is stationary if they are cointegrated. Conversely, finding that $e_{i,t}$ is I(1) would indicate that there is no cointegration and that $y_{i,t}$ and $x_{i,t}$ are divergent.

Estimation of this model consists of three components. First, the number of common factors, their loadings and the break dates have to be estimated, since these are not known a priori. This is done by writing equations (1) and (2) in first differences and then using OLS and principal components as in Bai & Ng (2004) in an iterative process, using previous estimates of the factors, loadings and break dates to get updated estimates until convergence is achieved¹².
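The principal components step on first-differenced data can be sketched as below. This is only a simplified illustration in the spirit of Bai & Ng (2004): it assumes a single common factor, uses simulated data, extracts the leading component by power iteration, and leaves out the OLS step and the iterative break-date estimation. All names and parameter values are assumptions.

```python
import random


def demeaned_differences(panel):
    """First-difference each series and remove its mean (the drift)."""
    out = []
    for row in panel:
        d = [row[t] - row[t - 1] for t in range(1, len(row))]
        m = sum(d) / len(d)
        out.append([v - m for v in d])
    return out


def leading_factor(diffs, iterations=200):
    """Leading principal component of the differenced panel (power iteration)."""
    n, t_len = len(diffs), len(diffs[0])
    loadings = [1.0 / n ** 0.5] * n
    for _ in range(iterations):
        factor = [sum(loadings[i] * diffs[i][t] for i in range(n)) for t in range(t_len)]
        loadings = [sum(factor[t] * diffs[i][t] for t in range(t_len)) for i in range(n)]
        norm = sum(v * v for v in loadings) ** 0.5
        loadings = [v / norm for v in loadings]
    factor = [sum(loadings[i] * diffs[i][t] for i in range(n)) for t in range(t_len)]
    return factor, loadings


def correlation(a, b):
    """Sample correlation between two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


# Simulated panel: one I(1) common factor plus small idiosyncratic noise.
rng = random.Random(1)
T, N = 120, 6
common, level = [], 0.0
for _ in range(T):
    level += rng.gauss(0.0, 1.0)
    common.append(level)
true_loadings = [0.5 + 0.2 * i for i in range(N)]
panel = [[lam * f + rng.gauss(0.0, 0.2) for f in common] for lam in true_loadings]

factor_diff, est_loadings = leading_factor(demeaned_differences(panel))
true_diff = [common[t] - common[t - 1] for t in range(1, T)]
fit = correlation(factor_diff, true_diff)

# Cumulate the estimated differences to recover the factor in levels.
factor_level, running = [], 0.0
for v in factor_diff:
    running += v
    factor_level.append(running)
print(round(abs(fit), 3))
```

Working in first differences means the factor can be estimated consistently whether or not the idiosyncratic errors are stationary, which is what makes it possible to test the stationarity of those errors afterwards.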

Secondly, the model of Banerjee & Carrion-i-Silvestre (2015) assumes that the idiosyncratic errors $\varepsilon_{i,t}$, $w_t$, $v_{i,t}$ and $\varpi_t$ are independent of the factor loadings $\pi_i$ and $\varsigma_i$. If this were not the case, the limiting distributions of the statistics would depend on the stochastic regressors and the presence of cross-sectional dependence would disrupt the cointegration test. This assumption must therefore be verified before interpreting the results of the cointegration test, as a rejection of this assumption would mean that the model is misspecified and it would not be possible to interpret the test statistic. Following Banerjee & Carrion-i-Silvestre (2015), I test this assumption of independence using Pesaran’s (2015) weak cross-sectional dependence (WCD) test statistic.

The third and final part of the estimation procedure concerns the estimation of the cointegration test statistic. This is done in two parts. First, individual Augmented Dickey-Fuller (ADF) tests are used to test for the stationarity of the idiosyncratic error terms $e_{i,t}$, given the common factors and structural breaks¹³. Then, these individual ADF tests are pooled to form a panel test statistic $Z_j(\lambda)$:

$Z_j(\lambda) = \frac{N^{-1/2} \sum_{i=1}^{N} ADF_{i,j}(\lambda) - \Theta_j^e(\lambda)\sqrt{N}}{\sqrt{\Psi_j^e(\lambda)}} \Rightarrow N(0,1)$ (8)

Where $\Theta_j^e$ and $\Psi_j^e$ are empirical moments of the limiting distributions of the test statistic, which Banerjee & Carrion-i-Silvestre (2015) have simulated by means of Monte Carlo simulation. The $Z_j(\lambda)$ statistic converges to a normal distribution as $N, T \to \infty$ with $N/T \to 0$. Therefore, in order to be able to interpret the test statistic, it is necessary that $N < T$. Given that $T = 52$ in the PWT and the total number of countries in the sample is 107, it is not possible to test for cointegration of the entire panel. Moreover, the developing nations need to be further divided, which I have done according to continent, as described in Section II and shown in the Taxonomy in the Appendix. The test statistic described above operates under the null hypothesis of no cointegration.
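The pooling in equation (8) amounts to standardising the sum of the individual ADF statistics. The sketch below shows only the arithmetic: the ADF values and the moments passed as `theta` and `psi` are hypothetical placeholders, not the simulated moments of Banerjee & Carrion-i-Silvestre (2015), and the left-tail p-value reflects the assumption that large negative values count as evidence against no cointegration.

```python
import math


def pooled_z(adf_stats, theta, psi):
    """Standardised pooled statistic following the structure of equation (8)."""
    n = len(adf_stats)
    numerator = sum(adf_stats) / math.sqrt(n) - theta * math.sqrt(n)
    return numerator / math.sqrt(psi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical individual ADF statistics and moments (assumptions).
adf_stats = [-2.0, -1.5, -2.5, -1.0]
z = pooled_z(adf_stats, theta=-1.0, psi=0.25)
p_left = normal_cdf(z)  # left-tail p-value under N(0, 1)
print(z, p_left)
```

With these placeholder inputs the statistic equals -3.0, which would lead to a rejection of the null hypothesis at conventional levels under the stated assumptions.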

Implementation & Results

I have applied the cointegration test as described above to real output-side GDP per capita in the latest three PWT vintages (9.1, 9.0 and 8.1). For each subsample, I have applied two specifications: one that allows for only a single structural break and another that allows up to two structural breaks. The subsamples and time periods are as described in Section II and displayed in the Taxonomy in the Appendix.

I will now turn to the discussion of Table 2, which displays the results of Banerjee & Carrion-i-Silvestre’s (2015) cointegration test. To start, the Table presents Pesaran’s (2015) WCD specification test for different values of the autoregressive correction (k). This test statistic follows a normal distribution and has the null hypothesis of independence of the idiosyncratic error terms. When I allow for only one structural break, this null hypothesis is clearly rejected for all or nearly all values of k in every subsample except South America. Since the results of the cointegration test cannot be interpreted when the specification test rejects, I have redacted the results in these cases. When allowing for up to two structural breaks, the specification test can no longer reject the null hypothesis of independent idiosyncratic errors and therefore the cointegration test results are valid. The only exception is again South America, where the WCD test statistic is significant for small values of k. Following Banerjee & Carrion-i-Silvestre (2015), I do not redact the results of the specification with two structural breaks for the South American subsample because the majority of WCD statistics are insignificant. Overall, the results of the specification test indicate that it is likely that more than one structural break occurred in the cointegration relationship of the last three PWT vintages.
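The decision rule behind the specification test can be illustrated with a minimal sketch of the basic cross-section dependence (CD) statistic of Pesaran. This is an illustration only: it omits the autoregressive correction (k) used in Table 2, the function names are hypothetical, and the residual array is assumed to hold the idiosyncratic residuals with one column per country.

```python
import numpy as np

def pesaran_cd(residuals):
    """Basic CD statistic for a T x N array of residuals (no AR correction)."""
    resid = np.asarray(residuals, dtype=float)
    T, N = resid.shape
    # Pairwise sample correlations between the N cross-section units.
    corr = np.corrcoef(resid, rowvar=False)
    # Sum the correlations above the diagonal, i.e. all pairs i < j.
    pair_sum = corr[np.triu_indices(N, k=1)].sum()
    return np.sqrt(2.0 * T / (N * (N - 1))) * pair_sum

def rejects_independence(cd_stat, critical_value=1.96):
    """Two-sided 5% decision rule for an N(0,1) distributed statistic."""
    return abs(cd_stat) > critical_value
```

Applied to the values in Table 2, `rejects_independence(5.135)` (Africa, one break, k = 0) signals dependence, while `rejects_independence(1.563)` (Africa, two breaks, k = 0) does not.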

The model makes several estimations of the locations of the structural breaks and I report the mean of these estimations in Table 2 for each subsample. The structural breaks appear to have happened at roughly the same time in each subsample, namely at the end of the 1980s and the end of the 1990s. The revisions, especially from vintage 8.1 to 9.0, tend to be larger towards the end of the time period, so it is sensible that the structural breaks in the cointegration relationship or in the level of the deterministic term happened at that time as well.

The next statistic reported by Table 2 is the percentage of individual rejections of the null hypothesis of no cointegration. That is, the percentage of countries for which no cointegration can be rejected when using the ADF type individual statistics before pooling them into a single panel data statistic. The resulting percentages are strikingly low, especially in the case of the developed nations, where only about 16% to 20% of the countries are individually cointegrated across vintages. It is possible that these results are due to the low power of time series cointegration tests that lack the cross-sectional information that panel data tests can make use of. In that case, the different vintages of the PWT would actually be cointegrated, but due to the lack of information available in time series cointegration tests, the test is making a type II error (Banerjee & Carrion-i-Silvestre, 2015). This would therefore be a clear example of why panel data cointegration and unit root tests have become increasingly popular in recent years. Nevertheless, the low percentages do give a strong indication that there are most likely many countries in the PWT, even in the industrialized world, for which the revisions significantly changed the individual stochastic trend of RGDPO per capita.

Before turning to the main test statistic of this cointegration test, the panel data test statistic $Z_j$, a word of caution about the interpretation of panel data cointegration tests is necessary. Rejecting the null hypothesis of no cointegration does not imply that all panels in the sample are cointegrated, nor can we conclude exactly which ones are cointegrated. Rather, the only conclusion can be that a significant proportion of the cross section units in the sample is cointegrated (Breitung & Pesaran, 2008). In this particular application, finding cointegration among the vintages of the PWT will therefore mean that the revisions have not changed the overall story of the evolution of GDP per capita of the subsample as a whole, but this does not mean that the revisions did not significantly change the story for some individual countries. Similarly, if we cannot reject the null hypothesis, we can conclude that not all cross section units are cointegrated, but this does not imply that there are no individual countries that are in fact cointegrated.

Having said that, I will now turn to the discussion of the $Z_j$ statistics. As expected, the statistics


Table 2: Cointegration test results

| | Africa, 1 Break | Africa, 2 Breaks | Asia, 1 Break | Asia, 2 Breaks | South America, 1 Break | South America, 2 Breaks | Developed Long, 1 Break | Developed Long, 2 Breaks | Developed Large, 1 Break | Developed Large, 2 Breaks | Developed, 1 Break | Developed, 2 Breaks |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| k = 0 | 5.135 | 1.563 | 12.166 | 0.155 | 0.559 | 3.402 | -3.305 | -1.706 | -3.079 | -0.402 | -3.582 | 0.117 |
| k = 1 | 4.852 | 1.750 | 12.275 | 0.174 | 0.161 | 3.166 | -3.452 | -1.579 | -2.848 | -0.402 | -3.621 | -0.099 |
| k = 2 | 4.717 | 0.997 | 12.511 | 0.257 | 0.048 | 3.269 | -3.583 | -1.692 | -2.711 | 0.086 | -3.559 | -0.129 |
| k = 3 | 4.920 | 0.791 | 12.547 | -0.013 | 0.068 | 3.131 | -3.665 | -1.546 | -2.895 | 0.536 | -3.444 | -0.102 |
| k = 4 | 4.617 | 0.348 | 12.369 | 0.029 | 0.085 | 2.385 | -3.292 | -1.696 | -2.740 | 0.846 | -3.342 | -0.328 |
| k = 5 | 4.297 | 0.169 | 12.183 | 0.091 | -0.081 | 2.495 | -3.226 | -2.058 | -2.234 | -0.151 | -3.363 | 0.005 |
| k = 6 | 3.787 | 0.133 | 12.016 | -0.090 | -0.145 | 1.857 | -3.097 | -1.817 | -2.264 | 0.072 | -3.342 | 0.672 |
| k = 7 | 3.590 | 0.199 | 12.042 | 0.351 | -0.378 | 1.479 | -3.150 | -1.474 | -2.133 | -0.940 | -3.346 | 0.831 |
| k = 8 | 3.899 | 0.353 | 11.815 | 0.659 | -0.482 | 1.553 | -2.925 | -1.225 | -2.041 | -0.384 | -3.366 | 1.127 |
| k = 9 | 3.937 | 0.401 | 11.902 | 0.696 | -0.552 | 1.379 | -2.732 | -1.529 | -2.097 | -0.177 | -3.126 | 1.067 |
| k = 10 | 3.832 | 0.774 | 11.279 | 0.796 | -0.865 | 1.127 | -2.608 | -1.561 | -1.936 | 0.099 | -2.854 | 0.654 |
| k = 11 | 3.769 | 1.063 | 10.671 | 0.918 | -0.893 | 1.474 | -2.528 | -1.233 | -1.546 | 0.338 | -2.357 | 0.868 |
| k = 12 | 2.372 | 1.157 | 10.819 | 0.882 | -0.981 | 1.760 | -2.318 | -0.798 | -1.472 | 0.675 | -2.732 | 0.633 |
| 1st break | 1984 | 1986 | 1987 | 1987 | 1986 | 1988 | 1977 | 1989 | 1984 | 1990 | 1983 | 1990 |
| 2nd break | | 1997 | | 2000 | | 1997 | | 2000 | | 2000 | | 2000 |
| % individual rejection at 5% sig. | 24.39% | 19.51% | 13.33% | 40.00% | 45.00% | 30.00% | 20.00% | 20.00% | 6.45% | 16.13% | 40.74% | 18.52% |
| Panel data test statistic | -0.347 | 1.702 | 2.574 | -3.043† | -1.995 | -0.995 | -0.032 | 4.450 | 2.414 | 5.243 | -1.509 | 2.905 |
| $\hat{r}_1$ | 12 | 3 | 12 | 2 | 12 | 3 | 12 | 1 | 12 | 3 | 12 | 1 |

Notes: (1.) These results were obtained by applying the methodology of Banerjee & Carrion-i-Silvestre (2015) to real output-side GDP per capita in PWT vintages 9.1, 9.0 and 8.1. (2.) This Table reports the Pesaran WCD statistic for different values of autoregressive correction (k), the mean locations of the structural breaks, the percentage of individual rejections of the null hypothesis of no cointegration, the panel data cointegration test statistic $Z_j$ and the number of common factors according to a parametric test ($\hat{r}_1$) and a non-parametric test ($\hat{r}_1^{NP}$). (3.) The Pesaran WCD statistic follows a normal distribution, meaning that its critical values at 5% significance are ±1.96. (4.) The panel data test statistics are also N(0,1) distributed and the statistics that are significant at the 5% level are denoted with a †. (6.) The results of the specifications for which the null hypothesis of independent idiosyncratic errors is rejected are redacted. (7.) Categories are according to the Taxonomy in the Appendix.


three vintages of the PWT. Again, this does not imply that there are no developed economies for which the stochastic trend did change, and, given the low percentages of individual rejections, there is reason to believe that at least some individual developed economies are in fact not cointegrated.

The results for the developing world are more mixed. I find significant statistics for the developing countries in Asia when allowing for two structural breaks and for South America when I allow for one structural break. However, this South American statistic becomes insignificant in the specification that allows for two structural breaks. The test also cannot reject the null hypothesis of no cointegration for the African subsample, confirming my prior expectations. These results lead me to the following conclusions regarding the developing nations. The developing nations in Asia appear to have been affected by the revisions in a similar manner to the developed economies: the test can reject the null hypothesis of no cointegration and therefore the stochastic trend of RGDPO per capita of developing nations in Asia as a group has not changed from vintage to vintage. Regarding the African nations, I come to the opposite conclusion, as the null hypothesis of no cointegration cannot be rejected. This means that the RGDPO per capita series in the three latest vintages of the PWT diverge and do not convey the same message about the evolution of productivity in Africa. The results of the South American subsample are not robust to changes in the specification and I therefore regard this as a borderline case where I cannot definitively conclude that the latest vintages of the PWT do or do not present the same overall evolution of productivity in South America.

Finally, Table 2 presents the number of common factors among the countries, estimated using a parametric test ($\hat{r}_1$) and a non-parametric test ($\hat{r}_1^{NP}$) [14]. Note that, in all cases, the tests detect at least one common factor. Therefore, it is clear that the different PWT vintages do not cointegrate by themselves and that the long-run relationship can only be established when taking non-stationary common factors into account (Banerjee & Carrion-i-Silvestre, 2015). For completeness, I have also applied the cointegration test to expenditure-side GDP per capita, using the same samples and specifications. The results of this test are presented in Table 6 in the Appendix and are similar to the results using output-side GDP, albeit slightly weaker, as for developed nations the null hypothesis of no cointegration can only be rejected for the largest subsample. The results and conclusions for the developing nations remain unchanged.

The results of the cointegration tests imply that researchers may be indifferent between using any of the three latest versions of the PWT when they consider panels of only developed economies or developing Asian nations, since these vintages represent the same overall evolution of output-side GDP per capita. However, this is not the case for African panels and may not be true for South American samples either. Moreover, there is strong evidence that, while revisions did not change the stochastic trend of GDP per capita for developed countries as a group, there are at least some developed countries for which the trend did change. In these cases where cointegration does not seem to be present, it is not clear which vintage of the PWT represents the “true” evolution of GDP per capita or indeed if any of the vintages alone does. All three vintages may contain some information about the true evolution of GDP per capita and, logically, it may be the case that the truth lies somewhere in between these latest vintages.

14 The difference lies in the number of lags used in the estimation of the number of common factors, which is set at $4\,\mathrm{ceil}[\min[N/T]/100]^{1/4}$ in the non-parametric estimation and determined by the Bayesian Information


For this reason, I will construct a weighted average of the last three PWT vintages that filters out as much of the measurement error as possible and optimally makes use of the available information. The next section outlines the state-space model that I have applied to achieve this.

V – State Space Representation

The cointegration analysis above concludes that, especially when one considers groups of developing nations, it is not clear which vintage of the PWT to use in empirical research. However, one is not forced to use only one vintage. Rather than using a single vintage as the representation of the true evolution of GDP per capita, we can treat true GDP per capita as an unobservable variable, with three estimations of it, namely vintages 9.1, 9.0 and 8.1 of the PWT. These estimations are based on limited samples and therefore contain some amount of measurement error. This measurement error does not necessarily imply flawed methodologies. Rather, it is a statistical property that inevitably arises when one makes inference about a population based on a limited sample. Furthermore, it may be that the amount of measurement error is not equal across vintages (Aruoba et al, 2013).

The aim of this section is to identify the amount of measurement error in each vintage and to use this information to estimate the unobservable true GDP per capita. I will then construct a weighted average of the three latest vintages that approximates this latent variable as closely as possible, with the weights thus inversely related to the amount of measurement error in each respective vintage. This weighted average will optimally make use of the available information embedded in the three latest vintages and should outperform any single vintage. Given the results of the cointegration test presented in the previous section, I can make some predictions regarding the weighted average. Since the three PWT vintages under consideration are cointegrated in the cases of developed economies and developing Asian countries, I do not expect that any one vintage is particularly more informative than the others. I therefore expect that the weights will, on average, be about 1/3 each. Such a simple average would be sufficient to average out random noise. However, since it is probable that the vintages are not cointegrated for several individual countries, I expect there to be marked differences between individual countries in the weight assigned to each vintage, with the simple average only emerging if I take the average weight assigned to each vintage over the whole group. For the African and South American subsamples, I expect there to be one or two vintages that consistently receive a higher weight. For these subsamples, the three vintages do not convey the same story about the true evolution of GDP per capita and therefore there should be a vintage that is more informative than the others. Again, I expect this pattern to only emerge in the overall average, with large differences between individual countries.
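The intuition that equally noisy vintages should receive weights of about 1/3 each can be illustrated with a simple inverse-variance weighting sketch. This is not the procedure used in this thesis: it assumes independent measurement errors with known variances, whereas the state-space model below allows the errors to be correlated, and the variances in the example are purely hypothetical.

```python
def inverse_variance_weights(error_variances):
    """Weights proportional to 1 / variance, normalised to sum to one."""
    precisions = [1.0 / v for v in error_variances]
    total = sum(precisions)
    return [p / total for p in precisions]

# Equal noise in all three vintages reproduces the simple average.
equal = inverse_variance_weights([1.0, 1.0, 1.0])

# Hypothetical case: the third vintage has a quarter of the error variance,
# so it receives four times the weight of each of the other two.
uneven = inverse_variance_weights([1.0, 1.0, 0.25])
```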

State-space models have recently been applied to reconcile different estimates of GDP [15]. These models make use of the Kalman filter, a recursive statistical method that optimally estimates the state variable at time t using all the information, present and historical, available at time t (Harvey, 1990). Aruoba et al (2013) take the measurement error perspective on GDP revisions described above and develop a state-space model that applies the Kalman filter to obtain optimal estimates of true GDP. This methodology was implemented in the context of Australian GDP [16] by Rees et al (2014), who reconcile three different estimates of Australian GDP per capita. I will begin by describing the methodology of Rees et al (2014), before applying it to the three latest vintages of the PWT, again taking RGDPO per capita as the flagship variable.

Methodology

Following the notation of Rees et al (2014), let $\Delta y_t$ denote the growth rate of unobservable true GDP per capita, which is assumed to be an AR(1) process:

$$\Delta y_t = \mu(1-\rho) + \rho\,\Delta y_{t-1} + \varepsilon_{G,t} \qquad (9)$$

Where $\mu$ is the mean growth rate of true GDP per capita, $\rho$ is the persistence of the AR(1) process and $\varepsilon_{G,t}$ denotes the normally distributed innovation in time t. The growth rate of true GDP per capita cannot be measured, but we do have three noisy measurements, namely the growth rate of output-side GDP per capita in each of the latest three vintages. This yields the following measurement equation:

$$\begin{bmatrix} \Delta y_t^{9.1} \\ \Delta y_t^{9.0} \\ \Delta y_t^{8.1} \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \Delta y_t + \begin{bmatrix} \varepsilon_{9.1,t} \\ \varepsilon_{9.0,t} \\ \varepsilon_{8.1,t} \end{bmatrix} \qquad (10)$$

Where the left-hand side represents the growth rates of RGDPO per capita in vintages 9.1, 9.0 and 8.1, with $\varepsilon_{9.1,t}$, $\varepsilon_{9.0,t}$ and $\varepsilon_{8.1,t}$ denoting their respective measurement errors in time t. Since there is a large overlap of the data underlying the three latest PWT vintages, it is likely that their measurement errors are correlated (Aruoba et al, 2013). It is therefore necessary to allow for this correlation by letting $[\varepsilon_{G,t}\ \varepsilon_{9.1,t}\ \varepsilon_{9.0,t}\ \varepsilon_{8.1,t}] \sim N(0, \Sigma)$ where:

$$\Sigma = \begin{bmatrix} \sigma_G^2 & & & \\ \sigma_{G,9.1} & \sigma_{9.1}^2 & & \\ \sigma_{G,9.0} & \sigma_{9.1,9.0} & \sigma_{9.0}^2 & \\ \sigma_{G,8.1} & \sigma_{9.1,8.1} & \sigma_{9.0,8.1} & \sigma_{8.1}^2 \end{bmatrix} \qquad (11)$$

Where the variances of the innovation to true GDP per capita growth and of the measurement errors of the three latest PWT vintages are on the diagonal and their covariances are on the lower triangle. As Aruoba et al (2013) show, this model is not identified without putting some restriction on the $\Sigma$ matrix, since different parameters can yield equal distributions of the observable variables [17]. Therefore, some restriction on the $\Sigma$ matrix has to be imposed. Following Rees et al (2014), I impose the following restriction:

$$\zeta = \frac{\mathrm{Var}(\Delta y_t)}{\mathrm{Var}(\Delta y_t^{9.1})} = \frac{\frac{1}{1-\rho^2}\sigma_G^2}{\frac{1}{1-\rho^2}\sigma_G^2 + 2\sigma_{G,9.1} + \sigma_{9.1}^2} = 0.5 \qquad (12)$$

16 Note that this is not a panel data approach and can therefore not be applied to the PWT as a whole simultaneously. I will therefore apply the model to each country in the sample individually.


The intuition behind this restriction is that the variance of the growth rate of RGDPO per capita in vintage 9.1 is twice as large as the variance of the true growth rate of GDP per capita. This restriction is arbitrary: any other value of $\zeta$ would suffice to make the model identifiable, and any value with $0 < \zeta < 1$ would be sensible (Aruoba et al, 2013).
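To make the restriction concrete, the following sketch computes the variance ratio implied by a set of parameters and backs out the measurement-error variance of vintage 9.1 that delivers a target ratio. The function names and the numerical values are hypothetical and serve only as an illustration of equation (12).

```python
def zeta(rho, var_g, cov_g_91, var_91):
    """Var(true growth) / Var(vintage 9.1 growth) implied by the parameters."""
    var_true = var_g / (1.0 - rho ** 2)  # stationary variance of the AR(1)
    var_observed = var_true + 2.0 * cov_g_91 + var_91
    return var_true / var_observed

def var_91_for_target(rho, var_g, cov_g_91, target=0.5):
    """Measurement-error variance of vintage 9.1 that yields the target ratio."""
    var_true = var_g / (1.0 - rho ** 2)
    return var_true / target - var_true - 2.0 * cov_g_91

# Hypothetical parameter values: rho = 0.5, innovation variance 3, covariance 0.5.
implied_var_91 = var_91_for_target(0.5, 3.0, 0.5)
check = zeta(0.5, 3.0, 0.5, implied_var_91)
```

With a target of 0.5, the observed growth rate in vintage 9.1 has twice the variance of true growth, whatever combination of covariance and measurement-error variance produces it.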

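Having defined the latent variable $\Delta y_t$ and its relationship with the observed variables, the model can be expressed in state-space form. Following the notation of Rees et al (2014), let $s_t = [\Delta y_t\ \varepsilon_{9.1,t}\ \varepsilon_{9.0,t}\ \varepsilon_{8.1,t}]'$, $m_t = [\Delta y_t^{9.1}\ \Delta y_t^{9.0}\ \Delta y_t^{8.1}]'$, $M = [\mu(1-\rho)\ 0\ 0\ 0]'$, $\varepsilon_t = [\varepsilon_{G,t}\ \varepsilon_{9.1,t}\ \varepsilon_{9.0,t}\ \varepsilon_{8.1,t}]'$,

$$A = \begin{bmatrix} \rho & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix}.$$

The state-space model can then be written as:

$$s_t = M + A s_{t-1} + \varepsilon_t \qquad (13)$$

$$m_t = C s_t \qquad (14)$$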
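The system in equations (13) and (14) can be passed through a textbook Kalman filter to obtain the likelihood of the observed growth rates. The sketch below is an illustration under stated assumptions rather than the estimation code used here: the function names are hypothetical, the filter is initialised at the stationary distribution of the state (which requires |rho| < 1), and `Sigma` is taken to be the full symmetric 4 x 4 covariance matrix from equation (11).

```python
import numpy as np

def build_system(mu, rho):
    """Return M, A and C as defined for equations (13) and (14)."""
    M = np.array([mu * (1.0 - rho), 0.0, 0.0, 0.0])
    A = np.zeros((4, 4))
    A[0, 0] = rho  # only true growth is persistent
    C = np.array([[1.0, 1.0, 0.0, 0.0],
                  [1.0, 0.0, 1.0, 0.0],
                  [1.0, 0.0, 0.0, 1.0]])
    return M, A, C

def kalman_loglik(obs, mu, rho, Sigma):
    """Gaussian log-likelihood of a T x 3 array of vintage growth rates."""
    obs = np.asarray(obs, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    M, A, C = build_system(mu, rho)
    # Assumption: start from the stationary mean and covariance of the state.
    s = np.array([mu, 0.0, 0.0, 0.0])
    P = Sigma.copy()
    P[0, 0] = Sigma[0, 0] / (1.0 - rho ** 2)
    loglik = 0.0
    for y in obs:
        innov = y - C @ s                  # one-step-ahead forecast error
        F = C @ P @ C.T                    # forecast error covariance
        gain = P @ C.T @ np.linalg.inv(F)  # Kalman gain (4 x 3)
        _, logdet = np.linalg.slogdet(F)
        loglik -= 0.5 * (y.size * np.log(2.0 * np.pi) + logdet
                         + innov @ np.linalg.solve(F, innov))
        s_filt = s + gain @ innov          # update with the three vintages
        P_filt = P - gain @ C @ P
        s = M + A @ s_filt                 # predict the next period
        P = A @ P_filt @ A.T + Sigma
    return loglik
```

The first row of `gain` holds the weights with which the forecast errors of the three vintages update the estimate of true growth, which is the quantity that the Kalman gains discussed in the Results refer to.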

Following Rees et al (2014), the estimation of the state-space model proceeds as follows. All parameters are collected in one vector $\Theta = (\mu, \rho, \sigma_G^2, \sigma_{G,9.1}, \sigma_{9.1}^2, \sigma_{G,9.0}, \sigma_{9.1,9.0}, \sigma_{9.0}^2, \sigma_{G,8.1}, \sigma_{9.1,8.1}, \sigma_{9.0,8.1}, \sigma_{8.1}^2)$. To estimate these parameters, the Metropolis-Hastings Markov Chain Monte Carlo (MCMC) algorithm is used. This algorithm is initiated by first maximizing the posterior distribution of $\Theta$, $p(\Theta \mid m_{1:T}) \propto p(m_{1:T} \mid \Theta)\,p(\Theta)$, and then using the inverse Hessian at the maximum to obtain the covariance matrix of $\Theta$. The procedure then continues iteratively by drawing a new vector of parameters $\Theta^* \sim \mathcal{N}(\Theta_{i-1}, c\Sigma_{i-1})$, where a proposed vector $\Theta^*$ is accepted with probability

$$\min\left\{1, \frac{p(m_{1:T} \mid \Theta^*)\,p(\Theta^*)}{p(m_{1:T} \mid \Theta_{i-1})\,p(\Theta_{i-1})}\right\}$$

and the parameter c is set such that an acceptance rate of around 25% is achieved. The Kalman filter comes into play during this iterative process, as it is needed in order to evaluate the likelihood $p(m_{1:T} \mid \Theta)$ at each draw. The iterative process continues for 50,000 draws, of which the first 25,000 are discarded.
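The sampling step can be sketched as a generic random-walk Metropolis-Hastings routine. This is a simplified illustration with hypothetical names, not the estimation code used here: the proposal covariance is held fixed, the scale `c` is supplied by the user rather than tuned to a 25% acceptance rate, and `log_post` stands for any function that returns the log of the likelihood times the prior.

```python
import numpy as np

def random_walk_metropolis(log_post, theta0, prop_cov, c, n_draws, burn_in, seed=0):
    """Draw from a posterior known up to a constant via its log density."""
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    current = log_post(theta)
    # Cholesky factor of the scaled proposal covariance c * prop_cov.
    chol = np.linalg.cholesky(c * np.atleast_2d(np.asarray(prop_cov, dtype=float)))
    draws, accepted = [], 0
    for _ in range(n_draws):
        proposal = theta + chol @ rng.standard_normal(theta.size)
        candidate = log_post(proposal)
        # Accept with probability min(1, posterior ratio), evaluated in logs.
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
            accepted += 1
        draws.append(theta.copy())
    # Discard the burn-in draws and report the overall acceptance rate.
    return np.array(draws[burn_in:]), accepted / n_draws
```

In the setting described above, `log_post` would combine a Kalman-filter evaluation of the likelihood with the log prior density, and `n_draws=50000` with `burn_in=25000` would mirror the chain length reported in the text.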

The estimation requires a set of priors, as the density of these priors over the parameter draw $p(\Theta)$ is used to initiate the algorithm. My prior concerning the mean growth rate of GDP is 2.7% per year, as this was the average annual growth rate in my sample. All other priors, which are very non-restrictive, remain the same as in Rees et al (2014) [18].

When the estimation process is complete, the model will have estimated the growth rate of true GDP per capita $\Delta y_t$, which I will call RGDPCO(M) from now on. As stated before, I will use this estimate to construct a weighted average of the observed PWT vintages that approximates RGDPCO(M) as closely as possible. This naturally leads to a non-linear optimization routine, optimizing the following function:

$$[w_1, w_2, w_3] = \arg\min_{w_1, w_2, w_3} \sum_{t=1}^{T} \left[ w_1\,\mathit{RGDPCO9.1}_t + w_2\,\mathit{RGDPCO9.0}_t + w_3\,\mathit{RGDPCO8.1}_t - \mathit{RGDPCO(M)}_t \right]^2$$

$$\text{subject to the constraints: } \quad w_1 + w_2 + w_3 = 1, \qquad 0 \le w_1, w_2, w_3 \le 1 \qquad (15)$$

Where $w_1$, $w_2$ and $w_3$ are the weights assigned to vintages 9.1, 9.0 and 8.1 respectively. RGDPCO9.1 denotes real output-side GDP per capita in vintage 9.1 and the other abbreviations follow the same notation. Equation (15) thus obtains the weights that minimize the squared difference between the weighted average of PWT vintages and the estimate of true GDP per capita, RGDPCO(M). To see how these weights will be inversely related to the amount of measurement error in each vintage, consider how RGDPCO(M) is constructed. Put simply, at each draw from the posterior distribution, the Kalman filter will extract the most information from the vintage that is the most informative, i.e. contains the least amount of error. These so-called Kalman gains will govern how similar the final estimate of RGDPCO(M) is to any one of the vintages. Should a vintage have high Kalman gains, then this vintage will be more similar to RGDPCO(M) and therefore be assigned a higher weight when optimizing equation (15). The Kalman filter does not only average over space, but also over time (Aruoba et al, 2013). A weighted contemporaneous averaging as proposed here will therefore not be the same as RGDPCO(M), but it will be the simplest way to approximate it as closely as possible.
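The problem in equation (15) is a least-squares fit over the unit simplex. The sketch below solves it with projected gradient descent; this solver is an assumption made for illustration (the routine actually used is not specified here, and a standard quadratic programming solver would serve equally well), and the function names are hypothetical. Convergence can be slow when the three series are nearly collinear.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    # Largest index for which the shifted component stays positive.
    last = np.nonzero(u - css / k > 0)[0][-1]
    tau = css[last] / (last + 1.0)
    return np.maximum(v - tau, 0.0)

def optimal_weights(vintages, target, n_iter=5000):
    """Minimise sum_t (vintages[t] @ w - target[t])**2 over the unit simplex."""
    X = np.asarray(vintages, dtype=float)  # T x 3, one column per vintage
    y = np.asarray(target, dtype=float)    # length T, the smoothed estimate
    w = np.full(X.shape[1], 1.0 / X.shape[1])
    # Step size 1/L, with L the Lipschitz constant of the gradient 2 X'(Xw - y).
    step = 1.0 / (2.0 * np.linalg.eigvalsh(X.T @ X).max())
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ w - y)
        w = project_to_simplex(w - step * grad)
    return w
```

Because the projection enforces both constraints at every step, the returned weights are non-negative and sum to one, and weights of exactly zero can occur, as they do for many countries in Table 3.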

Results

Before delving into the results of the weighting procedure, I will first illustrate the smoothing effect that the state-space model, and in particular the Kalman filter, provides. Since the Kalman filter operates in a Bayesian framework, it can identify swings in RGDPO per capita that are unlikely and probably due to measurement error, given all of the previous observations. As a result, these swings are filtered out and the estimate RGDPCO(M) looks significantly smoother, with lower peaks and higher troughs. This is shown in Figures 2 and 3, which compare RGDPCO(M) and its associated 95% confidence interval with the growth rate of RGDPO per capita in the latest PWT vintage, 9.1. As I have applied the model to all 107 countries in my sample, I cannot show the graphs for all countries here [19], so I will only showcase one developed country and one developing country. Figure 2 shows the comparison for France and Figure 3 shows the same for the Democratic Republic of Congo (DRC). Figures 2 and 3 show the smoothing effect of the Kalman filter, in particular in years where vintage 9.1 of the PWT reports especially large growth or contractions. In 1992, for example, PWT9.1 reports a growth rate of almost 30% of RGDPO per capita in the DRC. Given the observations made in the other vintages and in the previous years, the model has recognized that this is an unrealistically large growth rate and therefore RGDPCO(M) only shows a modest growth rate in that year.

Figure 2: Comparison of PWT9.1 with smoothed estimate (RGDPCO(M)) for France

Notes: (1.) The green line is the smoothed estimate of the French real output-side GDP per capita growth rate in percentages (RGDPCO(M)) and the black line is the French real output-side GDP per capita growth rate as reported in PWT vintage 9.1 (RGDPCO(91)). (2.) 95% CI denotes the 95% confidence interval around the estimate.

Figure 3: Comparison of PWT9.1 with smoothed estimate (RGDPCO(M)) for the Democratic Republic of Congo (DRC)

Notes: (1.) The green line is the smoothed estimate of the DRC’s real output-side GDP per capita growth rate in percentages (RGDPCO(M)) and the black line is the DRC’s real output-side GDP per capita growth rate as reported in PWT vintage 9.1 (RGDPCO(91)). (2.) 95% CI denotes the 95% confidence interval around the estimate.

To answer the question of how the model generated this smoothed estimate of GDP per capita growth, Figures 4 and 5 plot the Kalman gains for France and the DRC. These Kalman gains are a direct indication of which vintage contains the most information for these particular countries. The Kalman gains (denoted as KG in the Figures) are plotted on a pairwise basis by comparing the gains of each vintage with each other vintage, yielding three pairs. Intuitively, if the Kalman gains are primarily on one side of the 45 degree line that is also plotted in both Figures, it means that the gains were larger for the vintage plotted on the axis on that side of the line. Therefore, that vintage was found to be more informative than the vintage plotted on the other axis.

Figure 4: Kalman Gains (KG) comparison for France

Notes: (1.) Panel A compares the Kalman gains of PWT vintage 9.1 ($KG_{9.1}$) with PWT vintage 9.0 ($KG_{9.0}$). Similarly, Panel B compares the Kalman gains of vintages 9.1 and 8.1 and Panel C compares the Kalman gains of vintages 9.0 and 8.1. (2.) The red star and ellipse show the mean and 95% confidence interval respectively.

Figure 5: Kalman Gains (KG) comparison for the DRC

Notes: (1.) Panel A compares the Kalman gains of PWT vintage 9.1 ($KG_{9.1}$) with PWT vintage 9.0 ($KG_{9.0}$). Similarly, Panel B compares the Kalman gains of vintages 9.1 and 8.1 and Panel C compares the Kalman gains of vintages 9.0 and 8.1. (2.) The red star and ellipse show the mean and 95% confidence interval respectively.

Figure 4 shows that the Kalman gains in the case of France do not tend to favor any particular vintage, as they are centered near and around the 45 degree line. Each vintage apparently contains roughly the same amount of information about the true evolution of GDP per capita as the other vintages. This is consistent with the fact that France is a developed country, for which there is a large probability that the different PWT vintages are cointegrated. Since each vintage contributes a roughly equal amount of information to the estimation of French RGDPCO(M), I also expect that the three PWT vintages will be assigned roughly equal weights in the weighted average.


in the panel on the right. There is therefore a clear hierarchy, with vintage 9.1 containing the most measurement error, followed by vintage 9.0 and vintage 8.1 being the most informative. As a result, I expect the weights assigned to each vintage to follow this same hierarchy, vintage 8.1 thus receiving the highest weight and vintage 9.1 the lowest.

The main results of the state-space model, the optimal weights assigned to each PWT vintage under consideration, are presented in Table 3 [20]. In order to maximize the amount of information available to the Kalman filter, I have used the full time period available for each country. Returning to the cases of France and the DRC, it is reassuring that the weights line up with my expectations based on the visual inspection of the Kalman gains. Both the evenly distributed weights in the case of France and the weights that favor vintage 8.1 in the case of the DRC confirm my expectations.

Another expectation that is confirmed is that there are large differences in the weights from country to country, even within the same subgroup. While there are visible consistencies, each subgroup contains many outliers. An example is the United States, where the entire weight is placed on vintage 9.1. This is consistent with the finding of the cointegration test that there are likely many countries, even developed ones, for which the different vintages do not cointegrate.

The most striking discovery presented in Table 3 lies in the average weights per subgroup, which do not match my expectations. Surprisingly, the country-by-country weights average out to roughly the same values for each subgroup, regardless of development status or the results of the cointegration test. In each subgroup, vintage 8.1 receives on average nearly half the weight in the optimally weighted average, while the other half is evenly distributed between vintages 9.1 and 9.0. There are many countries for which one of the vintages receives a weight of zero. To control for the effect that this may have on the average weights, I also report the median weights [21]. The pattern of high weights being placed on vintage 8.1 is even more pronounced when considering the median weights. This finding is not only surprising given the results of the cointegration analysis, but also unintuitive considering the many supposed improvements made to the PWT in the revision from ICP 2005 to ICP 2011. It appears that some critical information about the evolution of GDP per capita was lost in this revision, and that therefore vintage 8.1 should not be left behind by researchers.

20 For completeness, I have also applied the state-space method to a number of countries using expenditure-side GDP per capita (RGDPE). The results are presented in Table 7 in the Appendix and are on average very similar to the results using RGDPO per capita in the case of developing economies, while developed economies distribute the weight more evenly when using RGDPE per capita.


Developed Large

| Country | 9.1 | 9.0 | 8.1 |
|---|---|---|---|
| Australia | 0.000 | 0.470 | 0.530 |
| Austria | 0.122 | 0.526 | 0.352 |
| Belgium | 0.000 | 0.417 | 0.583 |
| Canada | 0.165 | 0.045 | 0.790 |
| Cyprus | 0.341 | 0.000 | 0.659 |
| Denmark | 0.481 | 0.000 | 0.519 |
| Finland | 0.281 | 0.709 | 0.010 |
| France | 0.441 | 0.307 | 0.252 |
| Germany | 0.492 | 0.016 | 0.492 |
| Greece | 0.123 | 0.000 | 0.877 |
| Hong Kong | 0.529 | 0.000 | 0.471 |
| Iceland | 0.033 | 0.467 | 0.500 |
| Ireland | 0.634 | 0.000 | 0.366 |
| Israel | 0.275 | 0.497 | 0.229 |
| Italy | 0.647 | 0.015 | 0.338 |
| Japan | 0.504 | 0.000 | 0.496 |
| Luxembourg | 0.000 | 0.540 | 0.460 |
| Malta | 0.000 | 0.642 | 0.359 |
| Netherlands | 0.000 | 0.440 | 0.560 |
| New Zealand | 0.369 | 0.253 | 0.377 |
| Norway | 0.000 | 0.841 | 0.159 |
| Portugal | 0.000 | 0.335 | 0.665 |
| Romania | 0.155 | 0.691 | 0.154 |
| Singapore | 0.280 | 0.079 | 0.642 |
| South Korea | 0.000 | 0.266 | 0.734 |
| Spain | 0.182 | 0.022 | 0.797 |
| Sweden | 0.383 | 0.048 | 0.570 |
| Switzerland | 0.315 | 0.434 | 0.251 |
| Taiwan | 0.309 | 0.000 | 0.692 |
| United Kingdom | 0.076 | 0.343 | 0.581 |
| United States | 1.000 | 0.000 | 0.000 |
| Average | 0.262 | 0.271 | 0.467 |
| Median | 0.275 | 0.266 | 0.496 |

Africa

| Country | 9.1 | 9.0 | 8.1 |
|---|---|---|---|
| Benin | 0.470 | 0.000 | 0.530 |
| Botswana | 0.000 | 0.712 | 0.288 |
| Burkina Faso | 0.335 | 0.000 | 0.665 |
| Burundi | 0.641 | 0.211 | 0.148 |
| Cabo Verde | 0.000 | 0.636 | 0.364 |
| Cameroon | 0.275 | 0.415 | 0.310 |
| CAR | 0.000 | 0.438 | 0.562 |
| Chad | 0.481 | 0.191 | 0.328 |
| Comoros | 0.000 | 0.192 | 0.808 |
| Congo | 0.000 | 0.858 | 0.142 |
| DRC | 0.140 | 0.254 | 0.607 |
| Egypt | 0.470 | 0.000 | 0.530 |
| Equatorial Guinea | 0.339 | 0.386 | 0.275 |
| Ethiopia | 0.563 | 0.000 | 0.438 |
| Gabon | 0.444 | 0.000 | 0.556 |
| Gambia | 0.988 | 0.000 | 0.012 |
| Ghana | 0.000 | 0.308 | 0.692 |
| Guinea | 0.583 | 0.164 | 0.253 |
| Guinea-Bissau | 0.728 | 0.000 | 0.272 |
| Ivory Coast | 0.463 | 0.179 | 0.357 |
| Kenya | 0.000 | 0.618 | 0.382 |
| Lesotho | 0.139 | 0.373 | 0.489 |
| Madagascar | 0.164 | 0.109 | 0.727 |
| Malawi | 0.550 | 0.000 | 0.451 |
| Mali | 0.018 | 0.528 | 0.454 |
| Mauritania | 0.000 | 0.444 | 0.556 |
| Mauritius | 0.294 | 0.637 | 0.069 |
| Morocco | 0.236 | 0.253 | 0.511 |
| Mozambique | 0.000 | 0.510 | 0.489 |
| Namibia | 0.501 | 0.049 | 0.451 |
| Niger | 0.000 | 0.348 | 0.652 |
| Nigeria | 0.289 | 0.000 | 0.711 |
| Rwanda | 0.154 | 0.329 | 0.517 |
| Senegal | 0.235 | 0.373 | 0.392 |
| South Africa | 0.123 | 0.356 | 0.521 |
| Tanzania | 0.861 | 0.000 | 0.139 |
| Togo | 0.259 | 0.000 | 0.741 |
| Tunisia | 0.473 | 0.202 | 0.325 |
| Uganda | 0.444 | 0.000 | 0.556 |
| Zambia | 0.692 | 0.139 | 0.169 |
| Zimbabwe | 0.368 | 0.501 | 0.131 |
| Average | 0.310 | 0.261 | 0.429 |
| Median | 0.289 | 0.211 | 0.451 |

South America

| Country | 9.1 | 9.0 | 8.1 |
|---|---|---|---|
| Argentina | 0.000 | 0.594 | 0.406 |
| Barbados | 0.149 | 0.403 | 0.449 |
| Bolivia | 0.000 | 0.541 | 0.459 |
| Brazil | 0.004 | 0.490 | 0.506 |
| Chile | 0.396 | 0.000 | 0.604 |
| Colombia | 0.465 | 0.000 | 0.535 |
| Costa Rica | 0.336 | 0.269 | 0.394 |
| Dominican Rep. | 0.000 | 0.974 | 0.026 |
| Ecuador | 0.134 | 0.000 | 0.866 |
| El Salvador | 0.801 | 0.000 | 0.199 |
| Guatemala | 0.297 | 0.000 | 0.703 |
| Honduras | 0.000 | 0.528 | 0.472 |
| Jamaica | 0.691 | 0.000 | 0.309 |
| Mexico | 0.000 | 0.645 | 0.356 |
| Panama | 0.000 | 0.460 | 0.540 |
| Paraguay | 0.001 | 0.732 | 0.267 |
| Peru | 0.178 | 0.000 | 0.822 |
| Trinidad & Tobago | 0.356 | 0.000 | 0.644 |
| Uruguay | 0.400 | 0.000 | 0.601 |
| Venezuela | 0.522 | 0.000 | 0.478 |
| Average | 0.236 | 0.282 | 0.482 |
| Median | 0.163 | 0.135 | 0.475 |

Asia - Developing

| Country | 9.1 | 9.0 | 8.1 |
|---|---|---|---|
| Bangladesh | 0.335 | 0.000 | 0.665 |
| China | 0.569 | 0.432 | 0.000 |
| Fiji | 0.000 | 0.766 | 0.234 |
| India | 0.081 | 0.059 | 0.861 |
| Indonesia | 0.000 | 0.755 | 0.245 |
| Iran | 0.115 | 0.000 | 0.885 |
| Jordan | 0.000 | 0.820 | 0.180 |
| Nepal | 0.526 | 0.000 | 0.474 |
| Malaysia | 0.000 | 0.259 | 0.741 |
| Pakistan | 0.671 | 0.000 | 0.329 |
| Philippines | 0.443 | 0.000 | 0.557 |
| Sri Lanka | 0.201 | 0.183 | 0.616 |
| Syria | 0.000 | 0.233 | 0.767 |
| Thailand | 0.783 | 0.000 | 0.217 |
| Turkey | 0.111 | 0.781 | 0.108 |
| Average | 0.256 | 0.286 | 0.459 |
| Median | 0.115 | 0.183 | 0.474 |

Total Developing

| | 9.1 | 9.0 | 8.1 |
|---|---|---|---|
| Total Developing Average | 0.274 | 0.255 | 0.421 |
| Total Developing Median | 0.267 | 0.197 | 0.457 |

Notes: (1.) This table shows the optimal weights that solve equation (15) for each country and the average weights by subgroup.


VI – Implications

The weighted average of the three latest PWT vintages described above was designed to optimally combine the information about true GDP per capita embedded in these different vintages. This section will provide descriptive statistics on how this changes the GDP per capita statistics reported in the PWT relative to vintage 9.1.

The first question one may ask is whether or not the weighting scheme has significantly changed the average GDP per capita statistics. I have tested this by applying a t-test of equivalent means, which operates under the null hypothesis that the means of two variables are equal [22]. The results of the t-test are presented in Table 4, and clearly indicate that the mean of RGDPO per capita in the weighted average is not the same as the mean of RGDPO per capita in PWT vintage 9.1. Mean GDP per capita decreased from 9,305 dollars in PWT vintage 9.1 to 8,754 dollars in the weighted average, a decrease of 551 dollars or about 6%. This change is statistically significantly different from zero, as indicated by the large t-statistic and correspondingly small p-value. Another interesting statistic reported in Table 4 is that the variance of RGDPO per capita appears to have decreased due to the weighted averaging procedure, as evidenced by the decrease in standard deviation and standard error. This is consistent with the idea that the weighting scheme gives more weight to the vintage with the least amount of noise, making the resulting weighted average less noisy, thus decreasing the variance of RGDPO per capita.

Table 4: Results of the t-test of equivalent means

Variable            Mean     Standard Error   Standard Deviation   95% Confidence Interval
Weighted Average    8,754    139.79           10,426.96            8,480 to 9,028
PWT Vintage 9.1     9,305    145.82           10,877.32            9,019 to 9,591
Difference          -551     11.01            821.24               -573 to -530

t statistic: -50.08    p-value: 0.000

Notes: (1.) This table presents the results of the t-test for the equivalence of the means of the weighted average as described in Table 3 and PWT vintage 9.1 over the entire sample period. (2.) This test operates under the null hypothesis that the means of the weighted average and vintage 9.1 are equal.
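The figures in the "Difference" row of Table 4 can be cross-checked with a few lines of arithmetic. The sketch below assumes the test is a paired t-test on the observation-level differences, which is an inference from the fact that the table reports a standard deviation and standard error for the difference; the thesis does not state this explicitly. It also uses a large-sample normal approximation for the confidence interval and p-value. Small deviations from the published values are expected because the inputs are rounded.

```python
import math

# Summary statistics from the "Difference" row of Table 4 (rounded values).
mean_diff = -551.0   # weighted average minus PWT vintage 9.1
se_diff = 11.01      # standard error of the mean difference
sd_diff = 821.24     # standard deviation of the differences

# Implied number of observations, since SE = SD / sqrt(n).
n_obs = round((sd_diff / se_diff) ** 2)

# t statistic under the null hypothesis of a zero mean difference.
t_stat = mean_diff / se_diff

# 95% confidence interval, using the normal critical value 1.96
# (an approximation that is reasonable for a sample this large).
ci_low = mean_diff - 1.96 * se_diff
ci_high = mean_diff + 1.96 * se_diff

# Two-sided p-value under the normal approximation.
p_value = math.erfc(abs(t_stat) / math.sqrt(2))

print(f"implied observations: {n_obs}")
print(f"t statistic: {t_stat:.2f}")
print(f"95% CI: [{ci_low:.0f}, {ci_high:.0f}]")
print(f"p-value: {p_value:.3f}")
```

The recomputed t statistic comes out close to the reported -50.08; the remaining gap reflects the rounding of the mean difference and standard error in the table.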

The results presented in Table 4 are a clear indication that the weighting procedure has had a significant impact on the reported level of RGDPO per capita and its variance. Nevertheless, in keeping with the spirit of this thesis, I will also show how this effect differs between developed and developing nations. For each country, I have calculated the average RGDPO per capita over the entire sample period and plotted a histogram of the difference between this average in the weighted average and in PWT vintage 9.1. Panel A in Figure 6 below shows the results of this differencing for developed nations and Panel B shows the same for developing nations. Several significant differences emerge from comparing the two panels. First, in any given year, the average developed nation is 1,044 dollars poorer per capita in the weighted average than in PWT vintage 9.1. By comparison, the average decrease of 294 dollars for developing nations is substantially smaller. Furthermore, the dispersion of the difference is
