
Comparing LDA as classification method with

JEL classification codes

Jelle Jacobus Schagen

June 27, 2015

Bachelor Thesis Econometrics

Supervisors: dr. Kees Jan van Garderen and dr. Marco van der Leij


Abstract

In this paper the LDA model of Blei et al. (2003) is applied to a dataset of economic articles to create topics. These topics form the basis for predicting which JEL categories are attached to the economic articles. The predicted topics and JEL categories show significant similarities, but only a part of the dataset has been predicted correctly. This occurred because of an incomplete match between the LDA topics and the JEL categories.

Title: Comparing LDA as classification method with JEL classification codes
Author: Jelle Jacobus Schagen, jelle.schagen@student.uva.nl, 10035877
Supervisors: dr. Kees Jan van Garderen and dr. Marco van der Leij
Date: June 27, 2015

Contents

1. Introduction
2. Theory behind LDA and JEL codes
   2.1. LDA model
      2.1.1. The theory
      2.1.2. The script
   2.2. JEL codes
   2.3. Theory discussion
3. Research Methods
   3.1. Dataset
   3.2. Creating topics
   3.3. Pearson Chi square test
4. Results
   4.1. Topics created by LDA
      4.1.1. First level JEL codes
      4.1.2. Second level JEL codes
   4.2. Outcomes predictions
   4.3. Hypothesis testing
      4.3.1. JEL code correction
      4.3.2. Pearson Chi square test applied
5. Conclusion

References

A. JEL codes


1. Introduction

Whenever a thesis, like this one, or another article is published, the number of articles on the web increases. This never-ending growth makes it harder to find a relevant article when searching for something specific. To solve this problem, keywords are attached to papers when they are published. These keywords are chosen by the author and based on the subject. By searching for the appropriate keywords the requested articles can be found.

The keywords that are chosen by the author are either made up by the author or come from an existing classification system. If the keywords are made up by the author they can perfectly describe the content of his work, because he is free to choose. By using an existing system the author is restricted to set categories, and keywords are picked based on the best match with the content as judged by the author.

In this paper the Latent Dirichlet Allocation (LDA) model as published by Blei, Ng and Jordan (2003) is used for the classification of economic articles. The LDA model generates topics, which consist of related words, based on the occurrence of the words in a corpus. A good implementation of LDA as a classification method could result in an automatic classification method.


and thus increase the probability that the subject of a topic is the same as the subject of the article.

Another classification method to order economic articles is produced by the Journal of Economic Literature and is called a JEL classification code. The codes are assigned to the articles by their author and an article can contain multiple codes. This classification code consists of twenty main categories from the field of economics. The second and third level subcategories do specify the subject of an article. Because the third level of the JEL codes contain mainly the same subject as the upper second level, the focus of this paper is on the second level JEL code.

By recognizing LDA generated topics as a subcategory of the JEL code, a link between these classification methods is made. Articles with a JEL code can be compared with the topic distribution of the LDA model. If the classification methods give similar results an automatic classification system could be possible. In this paper the goal is to create this link and find an answer to the question: can the LDA model successfully predict the JEL codes of articles?

To make a comparison between LDA and JEL possible a dataset from EconLit is chosen. The dataset contains 181 thousand abstracts of economic articles. Also, the necessary information, such as year of publication and the JEL codes, are included in the dataset. The publications in the dataset come from the time period 2000 up to and including 2011.

In order to do the research, the theory behind the LDA model is presented in the theoretical framework. The theoretical framework also contains information about the different levels within the JEL classification codes. Second, the methods used for trimming data, creating topics and adjusting JEL codes to LDA-formed topics are explained. Subsequently a description of the comparison methods is given. Finally, the results are presented and tested.


2. Theory behind LDA and JEL codes

To understand the research, it is important to understand both classification methods, LDA and JEL. Therefore the structure of these methods is explained.

2.1. LDA model

2.1.1. The theory

The LDA model considers documents as a mixture of a finite number of topics, and each meaningful word in a document as allocated to one of the topics. To make the terms used clear, the same terminology as in Blei et al. (2003) will be used:

• A word is the smallest unit within this model and is denoted by w.

• A document consists of N words and can be written as d = (w_1, w_2, ..., w_N).

• A corpus is the biggest unit and is a collection of M documents, with notation D = (d_1, d_2, ..., d_M).

When LDA is applied to a corpus it allocates words to a set number of topics based on their occurrence in the corpus. It also produces a mixture of these topics over the different documents. This gives LDA useful applications, such as creating topics from news articles, as done by Xin Zhao et al. (2011) with articles from the New York Times. In their paper the topics created from the New York Times are compared with topics created by applying an adjusted LDA model to Twitter data. The Twitter-LDA model proposed by Xin Zhao et al. was adjusted to take into account the restricted length of tweets.

This paper is restricted to the original LDA model as published by Blei et al. (2003). An improvement of LDA over other text corpus models, such as the mixture of unigrams (Nigam et al., 2000), is the multinomial distribution θ over the topics in each document. Blei et al. show that the restriction to one topic per document is too restrictive to model a corpus. The pLSI model (Hofmann, 1999) does allow multiple topics in one document, but is restricted to the training set that is used: it only knows the topic distributions observed in the training set. Besides that, Blei et al. show in their comparison between LDA and pLSI that the pLSI model suffers from overfitting.

Within each topic z_n, the words w_n follow a multinomial distribution. The same word can be part of multiple topics, and for every word w_i, with i an element of the whole vocabulary of words, p(w_i | z_n) holds. Another requirement on the distribution of words is the 'bag-of-words' assumption, which states that the order of the words does not affect the LDA model.

Blei et al. (2003) created LDA as a three-step generative process for creating documents in a corpus:

1. Choose N ∼ Poisson(ζ).
2. Choose θ ∼ Dirichlet(α).
3. For each of the N words w_n:
   a) Choose a topic z_n ∼ Multinomial(θ).
   b) Choose a word w_n from p(w_n | z_n).
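As an illustration, the three steps above can be sketched in Python. The parameter values, vocabulary and topic-word probabilities in this sketch are invented for demonstration; they are not taken from the paper.

```python
import numpy as np

def generate_document(alpha, beta, vocab, zeta=50, rng=None):
    """Sketch of the LDA generative process of Blei et al. (2003)."""
    rng = rng or np.random.default_rng()
    n_words = rng.poisson(zeta)                # 1. N ~ Poisson(zeta)
    theta = rng.dirichlet(alpha)               # 2. theta ~ Dirichlet(alpha)
    words = []
    for _ in range(n_words):                   # 3. for each of the N words:
        z = rng.choice(len(alpha), p=theta)    #    a) z_n ~ Multinomial(theta)
        w = rng.choice(len(vocab), p=beta[z])  #    b) w_n from p(w_n | z_n)
        words.append(vocab[w])
    return words

# Two invented topics over a four-word vocabulary.
vocab = ["wage", "labor", "bank", "credit"]
beta = np.array([[0.45, 0.45, 0.05, 0.05],   # a "labour"-like topic
                 [0.05, 0.05, 0.45, 0.45]])  # a "finance"-like topic
doc = generate_document(alpha=np.array([1.0, 1.0]), beta=beta, vocab=vocab,
                        zeta=20, rng=np.random.default_rng(0))
```

Note that θ is drawn once per document, which is exactly what allows a single document to mix several topics, the improvement over the one-topic-per-document models discussed above.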


Two steps are important for this paper: step 2 and step 3b. Applied to the EconLit dataset, step 2 gives the distribution over the formed topics for every abstract. Step 3b gives the probabilities of appearance of certain words within a topic. For both steps, high probabilities mean a high chance of appearance and thus high relevance.

2.1.2. The script

The LDA model applied in this paper is a script in the programming language Python, produced by Hansen et al. (2014). The script is publicly available and can be adjusted. By adjusting parameters it can be fitted to the research goal and the dataset that is used. For example, one of the parameters is the number of topics created by LDA. To approximate a different number of JEL categories, the number of topics needs to be adjusted.

The script also contains two steps to clean the dataset. The first is the removal of so-called stopwords. Stopwords are words that help to build a sentence but do not carry information; examples are 'the', 'and', 'where', 'themself', etc. Removing them is necessary because stopwords do not help to describe the content of a document. The second step is the stemming of words, whereby the base of each word is taken in order to find words with the same meaning but a different conjugation. An example of a base word is 'financ', which occurs within the abstracts in the forms 'finance' and 'financing'.
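The two cleaning steps can be illustrated with a toy version. The stopword list and the crude suffix stripper below are stand-ins for the ones in the Hansen et al. (2014) script, which are far more complete (a real stemmer such as Porter's would normally be used).

```python
# Toy illustration of the two cleaning steps: stopword removal and stemming.
STOPWORDS = {"the", "and", "where", "themself", "of", "a", "in", "is"}

def stem(word):
    # Crude suffix stripping; the real script uses a proper stemmer.
    for suffix in ("ing", "es", "e", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def clean(text):
    tokens = [t.lower() for t in text.split()]
    return [stem(t) for t in tokens if t not in STOPWORDS]
```

For example, clean("the financing of finance") reduces both remaining words to the base form 'financ' mentioned above.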

2.2. JEL codes

The second classification method in this paper is the JEL classification codes as described on the website of the American Economic Association. This method uses letters to distinguish between twenty economic main categories. Some examples are 'Microeconomics' and 'Financial Economics', which give a broad description of a certain category. Second and third level subcategories are used for further specification. The Financial Economics main category, for example, consists of four subcategories:

• G0 General

• G1 General Financial Markets

• G2 Financial Institutions and Services

• G3 Corporate Finance and Governance

The number is used as a reference to the subcategory. For this paper the third level within the JEL classification codes is not relevant.

The assignment of JEL codes is done by the authors whenever they publish an article in the field of economics. Several codes can be attached to one paper to cover all of its content. The goodness of fit between the JEL codes and the subject also depends on the author's knowledge of the JEL classification system: if not all categories are known, a category that would have matched the content better can be overlooked.

2.3. Theory discussion

LDA classification is based on the occurrence of meaningful words in a document. It can therefore classify articles systematically on a statistical foundation, namely the topic distribution and the distribution over words. An advantage of such a classification method is that all articles obtain keywords based on the same systematic principles.

A disadvantage for the comparison of the LDA and JEL methods is the difference between the division into topics and the division into categories. JEL codes are a fixed set of categories, whereas the LDA topics that are formed depend on the corpus that is used and the parameters that are set. The topics are therefore not guaranteed to form the same kind of categories as used in the JEL classification. To solve this problem, JEL codes are only assigned to LDA-generated topics if similarities can be found.


3. Research Methods

The research is based on the LDA model programmed in Python by Hansen et al. (2014); whenever LDA is applied, their script is used. All steps are programmed in Python. These steps include preparing the dataset to make it suitable for the LDA model and finding JEL codes that match the LDA topics.

3.1. Dataset

The dataset used is EconLit, supplied by the American Economic Association. This dataset contains the abstracts of economic articles published from 2000 up to and including 2011. Some abstracts appear twice in the dataset, so removing the duplicates is necessary; this produces a dataset of 147,526 unique abstracts. Because not all words in an abstract carry information, stopwords are removed. The words are also stemmed to obtain their base form.
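Removing the duplicates can be done in a single pass that keeps the first occurrence of each abstract. The record layout below is a hypothetical stand-in for the EconLit format, not the actual field names.

```python
def drop_duplicate_abstracts(records):
    """Keep only the first record for each distinct abstract text.

    `records` is a list of dicts with a (hypothetical) "abstract" field.
    """
    seen = set()
    unique = []
    for rec in records:
        if rec["abstract"] not in seen:
            seen.add(rec["abstract"])
            unique.append(rec)
    return unique

sample = [{"abstract": "growth and trade"},
          {"abstract": "monetary policy"},
          {"abstract": "growth and trade"}]  # a duplicate
```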

3.2. Creating topics

To approach the second level of the JEL classification codes with LDA, a two-step method is used. In the first step the letters of the JEL codes are identified, and in the second step topics are created to match the second level of the JEL codes. When the LDA model is applied, the number of topics to be formed is chosen manually. In this paper the number of topics is chosen to be 10 more than the number of categories in the JEL classification system that is approached. This choice is based on test runs with lower and higher numbers of topics, which created less suitable topics.

In order to form topics the LDA model is applied to the cleaned dataset of unique abstracts. Because the first level of the JEL codes contains 20 main categories, the number of topics is chosen to be 30. The 20 words with the largest probability of appearance in a topic are taken as most relevant. Based on these words, one of the first level JEL categories is matched to the topic.

The output of the LDA model contains a distribution over the topics. With the assignment of first level JEL classification codes to topics, a distribution over the main categories of the JEL classification system is created. It is now possible to create subsets of the data for each main category, whereby each subset contains the abstracts that correspond for at least 10% with the main category.
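The construction of the subsets can be sketched as follows. Documents are indexed by position, the topic-to-category assignment is a hypothetical input, and, as in the paper, a document can join several subsets because the 10% threshold can be reached for more than one category.

```python
def build_subsets(doc_topic_dists, topic_to_jel, threshold=0.10):
    """Group document indices by first level JEL category.

    doc_topic_dists : per document, a dict topic id -> probability
    topic_to_jel    : topic id -> JEL letter (unmatched topics are absent)
    A document joins every category whose combined topic mass reaches
    `threshold`, so it can appear in several subsets.
    """
    subsets = {}
    for i, dist in enumerate(doc_topic_dists):
        mass = {}
        for topic, prob in dist.items():
            jel = topic_to_jel.get(topic)
            if jel is not None:
                mass[jel] = mass.get(jel, 0.0) + prob
        for jel, m in mass.items():
            if m >= threshold:
                subsets.setdefault(jel, []).append(i)
    return subsets
```

Summing the mass of all topics mapped to the same letter reflects that several of the 30 topics can be matched to one main category.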

By applying the LDA model to the subsets of data, topics are created that cover different subjects within a main category. The number of topics chosen depends on the number of second level categories in the main category, increased by 10. Again the 20 words with the largest probability of appearance are considered most relevant and are used for appointing second level JEL categories to topics.

Each abstract is assigned the second level JEL code that has the highest probability in the distribution, provided this probability is at least 20%. Otherwise the abstract is considered non-classifiable.
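This assignment rule translates directly into code; the topic-to-code mapping is again a hypothetical input.

```python
def predict_second_level(topic_probs, topic_to_code, min_prob=0.20):
    """Return the second level JEL code of the most probable topic,
    or None (non-classifiable) when no topic reaches `min_prob`."""
    best = max(topic_probs, key=topic_probs.get)
    if topic_probs[best] < min_prob:
        return None
    return topic_to_code.get(best)
```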

The originally attached JEL code of each abstract can be compared with the JEL codes that are assigned by the LDA model. If the predicted code matches one of the codes attached by the author, it is considered a correct prediction. For every subcategory the number of correct predictions is collected.

3.3. Pearson Chi square test

To test the null hypothesis that there is no significant relation between the classification methods, the Pearson Chi square test is applied. This test is designed for categorical data and measures the goodness of fit between the observed distribution and the expected distribution. The formula of the Pearson Chi square test is:

χ² = Σ_{i=1}^{n} (O_i − E_i)² / E_i,   (3.1)

with O_i the number of correctly predicted articles and E_i the expected number of correct predictions. The expected number of correct predictions is found by E_i = p × x_i, with p the probability of a correct prediction and x_i the number of predictions made for every subset.
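Formula (3.1) and the expected counts can be sketched in a few lines; the observed and expected values in the test below are invented for illustration, not taken from the tables.

```python
def pearson_chi_square(observed, expected):
    """Pearson Chi square statistic of formula (3.1)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def expected_counts(p, predictions_per_subset):
    """E_i = p * x_i for each subset."""
    return [p * x for x in predictions_per_subset]
```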


4. Results

The results from the separate steps of the research methods are presented and tested.

4.1. Topics created by LDA

4.1.1. First level JEL codes

Applying LDA to the whole dataset of 147,526 unique abstracts created 30 topics, each with the 20 words that have the largest probability of occurring. In Table 4.1 a part of the formed topics is displayed. Depending on the words within a topic, one of the 20 main categories of the JEL classification system is assigned to the topic.

Some of the topics clearly show a certain subject through the words in them, for example the topics that have been assigned 'J' and 'I'. Category J contains the subject Labor and Demographic Economics and I contains Health, Education, and Welfare. For some other topics, however, it was not possible to find a match with one of the JEL codes. One example is topic22 in Table 4.1, which consists of words that carry no economic information.

Not only could some topics not be classified, some JEL codes also found no matching topic. Categories B, M, Y and Z could not be identified within the formed topics, and therefore no subsets were formed for these categories.

J        I         K       R       D         topic22    E        F
employ   health    govern  region  market    issu       financi  countri
work     program   state   local   price     discuss    bank     trade
wage     insur     public  area    competit  articl     financ   intern
labor    care      law     hous    consum    present    credit   develop
worker   poverti   polit   urban   demand    literatur  crisi    foreign

Table 4.1.: Topics formed by the whole dataset

4.1.2. Second level JEL codes

By applying the LDA model separately to the 16 subsets of abstracts, 16 new sets of topics are created, each set with a different number of topics. In Table 4.2 some of the topics that correspond to category R are presented with their 5 most occurring words. Category R contains the subjects Regional, Real Estate, and Transportation Economics. The second level categories are:

• R0 General

• R1 General Regional Economics

• R2 Household Analysis

• R3 Production Analysis and Firm Location

• R4 Transportation Systems

• R5 Regional Government Analysis

The assignment of JEL categories to topics is based on the strongest similarities, found to the best of my knowledge, between the subject of a category and the 20 most representative words of a topic. As a result, matches with different levels of similarity have been created. This also led to the assignment of a second level JEL category to several topics, such as R2 in Table 4.2. Furthermore, some second level JEL categories did not show any similarity with a topic, and some topics did not show any similarity with any of the categories. These categories and topics therefore did not receive any assignment.

R1      R2       R2         R3        R4         R5       topic1    topic14
govern  employ   hous       firm      cost       urban    price     chang
public  work     incom      perform   demand     citi     value     increase
local   educ     household  activ     choic      area     risk      year
servic  group    low        competit  transport  spatial  return    impact
tax     labour   program    busi      suppli     land     properti  foreign

Table 4.2.: Second level topics

4.2. Outcomes predictions

Applying the LDA model for the first time produced 16 subsets of abstracts that showed similarities with one of the main categories of the JEL classification code. In Table 4.3 the number of abstracts in every category is given. On these subsets LDA is applied a second time to create topics that match the second level JEL categories.

JEL main category   Abstracts in subset
A                     6,770
C                    16,177
D                    13,660
E                    15,061
F                    12,201
G                     9,538
H                     7,539
I                     6,440
J                     8,447
K                     6,697
L                    12,048
N                    14,271
O                     4,577
P                     4,402
Q                    15,929
R                     6,186
Total               159,943

Table 4.3.: Subsets after applying LDA for the first time

For every abstract with a highest probability among the second level topics of at least 20%, a prediction has been made. These predicted abstracts are presented in Table 4.4. If the prediction turned out to be the same as one of the JEL codes attached by the author, it is labelled a correct prediction. The last column in Table 4.4 shows the number of correct predictions for every subset.

4.3. Hypothesis testing

4.3.1. JEL code correction

Because many authors attach multiple JEL classification codes to their articles, a correction term is introduced. The chance of a correct prediction is therefore not equal to p = 1/135, as would be expected for the 135 second level JEL classification codes, but this value multiplied by the average number of JEL codes attached within the EconLit dataset, which is a = (total number of allocated JEL codes)/(number of articles) = 442,951/147,526 ≈ 3.00.
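With the numbers from the text, the corrected probability works out as follows (a back-of-the-envelope check, not part of the original script):

```python
# Values taken from the text above.
n_codes = 135            # number of second level JEL codes
allocated = 442_951      # total number of allocated JEL codes in EconLit
articles = 147_526       # number of unique abstracts

a = allocated / articles  # average number of JEL codes per article, about 3.0
p = a / n_codes           # corrected probability of a correct random prediction
```

Multiplying p by the number of predictions per category then gives the expected counts E_i used in the test.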

4.3.2. Pearson Chi square test applied

Using the number of predictions and observations for every category as described in Table 4.4, the expected values E_i are calculated and presented in Table 4.5. Filling in the expected values and correct predictions from Tables 4.4 and 4.5 in formula 3.1 gives χ² = 38,351.05 over 74 different categories. For χ² with 73 degrees of freedom and α = 0.05 the critical value is 93.945. Therefore the results are significant.


Label  Predicted  Correct     Label  Predicted  Correct
A1       261        6         I3       404       35
A2       695       16         J0       191        0
C0        26        0         J1       174       56
C1      1180       71         J6        33        5
C5        58        6         J7        45        0
C7       182        4         J8        53        0
C9       317        1         K1        43        6
D1       322       15         K2       146        5
D2       170        4         L1       366       11
D3       343        4         L2        93       16
D4       244       28         L6       129       13
D5       580       27         L8        75        4
D6       118        4         N1       239        3
D7       183       23         N2       420        3
D8      1059      153         N3      1397       19
D9       139        4         N4       266        7
E0       202        0         N5       169        2
E1       227        7         N6       162        2
E2       160       12         N7       413        2
E3       287       24         N9       222        3
E5       239       63         O3        35        1
E6       262       11         O4        51        8
F1       464      144         O5        88        1
F2       228       30         P1        65        1
F3        52        6         P2        89        1
F4       436       13         P3        94        3
F5       267        6         Q0       276        8
F6       144        0         Q1       420      230
G1      1588      666         Q2       355       90
G2       259       53         Q4       615      315
G3       702       98         Q5        83       18
H0       315        0         R1       164       24
H2        86       16         R2       371       95
H4        65        0         R3        72       14
H5        38        4         R4       153       17
H7       151        2         R5       144       31
I1       465      284         Total  20741     2858
I2       112        5

Table 4.4.: Predictions and correct predictions per second level JEL category


Label  Expected correct     Label  Expected correct     Label  Expected correct
A1       5.80               F4       9.70               N1       5.32
A2      15.46               F5       5.94               N2       9.34
C0       0.58               F6       3.20               N3      31.07
C1      26.24               G1      35.32               N4       5.92
C5       1.29               G2       5.76               N5       3.76
C7       4.05               G3      15.61               N6       3.60
C9       7.05               H0       7.01               N7       9.19
D1       7.16               H2       1.91               N9       4.94
D2       3.78               H4       1.45               O3       0.78
D3       7.63               H5       0.85               O4       1.13
D4       5.43               H7       3.36               O5       1.96
D5      12.90               I1      10.34               P1       1.45
D6       2.62               I2       2.49               P2       1.98
D7       4.07               I3       8.99               P3       2.09
D8      23.55               J0       4.25               Q0       6.14
D9       3.09               J1       3.87               Q1       9.34
E0       4.49               J6       0.73               Q2       7.90
E1       5.05               J7       1.00               Q4      13.68
E2       3.56               J8       1.18               Q5       1.85
E3       6.38               K1       0.96               R1       3.65
E5       5.32               K2       3.25               R2       8.25
E6       5.83               L1       8.14               R3       1.60
F1      10.32               L2       2.07               R4       3.40
F2       5.07               L6       2.87               R5       3.20
F3       1.16               L8       1.67               Total  461.30

Table 4.5.: Expected number of correct predictions per category


5. Conclusion

The aim of this research was to make as many correct predictions as possible for JEL categories by using LDA. LDA resulted in the creation of topics, and based on the content of the topics JEL codes were assigned. Because the topics did not match the JEL classification codes perfectly, a part of the JEL categories was not covered in the LDA classification: 4 of the 20 main categories, which resulted in 61 of the total of 135 second level JEL codes missing after applying LDA to the subsets. Therefore no correct predictions could be made for articles whose attached JEL codes came from one of these 61 missing categories.

Besides the incomplete match between LDA topics and JEL categories, there was also an incomplete match between topics and abstracts. The first selection was based on a similarity of 10%. This resulted in matches between the 16 main categories and 159,943 articles, where some articles were matched with multiple categories. The biggest mismatch, however, shows up during the second similarity check, on the largest likeness of at least 20%: just 20,741 of the 147,526 articles are matched with a second level JEL category.

Of the performed predictions, 13.78% were correct, which turns out to be significant. But this raises the question whether the significance is of any importance when the number of predictions is less than one seventh of the entire dataset.


My conclusion is therefore the following. My findings show that applying LDA creates a classification method that is significantly better than a random method. However, there is an incomplete match between LDA topics and JEL categories, which results in no predictions being made for a large part of the dataset. Therefore the LDA model cannot be used to automatically classify articles in the JEL system. However, I believe a classification system could exist whereby the topics created by LDA better suit the different categories and therefore improve the number of correct predictions.


Bibliography

American Economic Association, ”EconLit”, https://www.aeaweb.org/econlit/.

American Economic Association, ”JEL Classification Code Guide”, https://www.aeaweb.org/jel/guide/jel.php.

Blei, D.M., A.Y. Ng and M.I. Jordan (2003), "Latent Dirichlet allocation", Journal of Machine Learning Research, 3, 993-1022.

Hansen, S., M. McMahon and A. Prat (2014), "Transparency and deliberation within the FOMC: A computational linguistics approach", CEPR Discussion Paper 9994.

Hofmann, T. (1999), "Probabilistic latent semantic indexing", Proceedings of the Twenty-Second Annual International SIGIR Conference.

Nigam, K., A. McCallum, S. Thrun and T. Mitchell (2000), "Text classification from labeled and unlabeled documents using EM", Machine Learning, 39(2/3), 103-134.

Xin Zhao, Jing Jiang, Jianshu Weng, Jing He, Lim Ee-Peng, Hongfei Yan and Xiaoming Li (2011), ”Comparing Twitter and traditional media using topic models”, Proceedings of the 33rd European Conference on Information Retrieval.


A. JEL codes

A General Economics and Teaching
A1 General Economics
A2 Economic Education and Teaching of Economics
A3 Collective Works

B History of Economic Thought, Methodology, and Heterodox Approaches
B0 General
B1 History of Economic Thought through 1925
B2 History of Economic Thought since 1925
B3 History of Economic Thought: Individuals
B4 Economic Methodology
B5 Current Heterodox Approaches

C Mathematical and Quantitative Methods
C0 General
C1 Econometric and Statistical Methods and Methodology: General
C2 Single Equation Models; Single Variables
C3 Multiple or Simultaneous Equation Models; Multiple Variables
C4 Econometric and Statistical Methods: Special Topics
C5 Econometric Modeling
C6 Mathematical Methods and Programming
C7 Game Theory and Bargaining Theory
C8 Data Collection and Data Estimation Methodology; Computer Programs
C9 Design of Experiments

D Microeconomics
D0 General
D1 Household Behavior and Family Economics
D2 Production and Organizations
D4 Market Structure and Pricing
D5 General Equilibrium and Disequilibrium
D6 Welfare Economics
D7 Analysis of Collective Decision-Making
D8 Information, Knowledge, and Uncertainty
D9 Intertemporal Choice and Growth

E Macroeconomics and Monetary Economics
E0 General
E1 General Aggregative Models
E2 Macroeconomics: Consumption, Saving, Production, Employment, and Investment
E3 Prices, Business Fluctuations, and Cycles
E4 Money and Interest Rates
E5 Monetary Policy, Central Banking, and the Supply of Money and Credit
E6 Macroeconomic Policy, Macroeconomic Aspects of Public Finance, and General Outlook

F International Economics
F0 General
F1 Trade
F2 International Factor Movements and International Business
F3 International Finance
F4 Macroeconomic Aspects of International Trade and Finance
F5 International Relations and International Political Economy
F6 Globalization

G Financial Economics
G0 General
G1 General Financial Markets
G2 Financial Institutions and Services
G3 Corporate Finance and Governance

H Public Economics
H0 General
H1 Structure and Scope of Government
H2 Taxation, Subsidies, and Revenue
H3 Fiscal Policies and Behavior of Economic Agents
H4 Publicly Provided Goods
H5 National Government Expenditures and Related Policies
H6 National Budget, Deficit, and Debt
H7 State and Local Government; Intergovernmental Relations
H8 Miscellaneous Issues

I Health, Education, and Welfare
I0 General
I1 Health
I2 Education and Research Institutions
I3 Welfare and Poverty

J Labor and Demographic Economics
J0 General
J1 Demographic Economics
J2 Demand and Supply of Labor
J3 Wages, Compensation, and Labor Costs
J4 Particular Labor Markets
J5 Labor-Management Relations, Trade Unions, and Collective Bargaining
J6 Mobility, Unemployment, and Vacancies
J7 Labor Discrimination
J8 Labor Standards: National and International

K Law and Economics
K0 General
K1 Basic Areas of Law
K2 Regulation and Business Law
K3 Other Substantive Areas of Law
K4 Legal Procedure, the Legal System, and Illegal Behavior

L Industrial Organization
L0 General
L1 Market Structure, Firm Strategy, and Market Performance
L2 Firm Objectives, Organization, and Behavior
L3 Nonprofit Organizations and Public Enterprise
L4 Antitrust Issues and Policies
L5 Regulation and Industrial Policy
L6 Industry Studies: Manufacturing
L7 Industry Studies: Primary Products and Construction
L8 Industry Studies: Services
L9 Industry Studies: Transportation and Utilities

M1 Business Administration
M2 Business Economics
M3 Marketing and Advertising
M4 Accounting and Auditing
M5 Personnel Economics

N Economic History
N0 General
N1 Macroeconomics and Monetary Economics; Growth and Fluctuations
N2 Financial Markets and Institutions
N3 Labor and Consumers, Demography, Education, Health, Welfare, Income, Wealth, Religion, and Philanthropy
N4 Government, War, Law, International Relations, and Regulation
N5 Agriculture, Natural Resources, Environment, and Extractive Industries
N6 Manufacturing and Construction
N7 Transport, Trade, Energy, Technology, and Other Services
N8 Micro-Business History
N9 Regional and Urban History

O Economic Development, Technological Change, and Growth
O1 Economic Development
O2 Development Planning and Policy
O3 Technological Change; Research and Development; Intellectual Property Rights
O4 Economic Growth and Aggregate Productivity
O5 Economywide Country Studies

P Economic Systems
P0 General
P1 Capitalist Systems
P2 Socialist Systems and Transitional Economies
P3 Socialist Institutions and Their Transitions
P4 Other Economic Systems
P5 Comparative Economic Systems

Q Agricultural and Natural Resource Economics; Environmental and Ecological Economics
Q0 General
Q1 Agriculture
Q3 Nonrenewable Resources and Conservation
Q4 Energy
Q5 Environmental Economics

R Regional, Real Estate, and Transportation Economics
R0 General
R1 General Regional Economics
R2 Household Analysis
R3 Production Analysis and Firm Location
R4 Transportation Systems
R5 Regional Government Analysis

Y Miscellaneous Categories
Y1 Data: Tables and Charts
Y2 Introductory Material
Y3 Book Reviews (unclassified)
Y4 Dissertations (unclassified)
Y5 Further Reading (unclassified)
Y6 Excerpts
Y7 No Author General Discussions
Y8 Related Disciplines
Y9 Other

Z Other Special Topics
Z0 General
Z1 Cultural Economics; Economic Sociology; Economic Anthropology
Z2 Sports Economics: General
