
Covid-19:

A Twitter summary analysis


Layout: typeset by the author using LaTeX.



Stefan Houben
11000341

Bachelor thesis
Credits: 18 EC
Bachelor Kunstmatige Intelligentie

University of Amsterdam
Faculty of Science
Science Park 904
1098 XH Amsterdam

Supervisor
Dhr. G. Colavizza
Institute for Logic, Language and Computation
Faculty of Science
University of Amsterdam
Science Park 907
1098 XG Amsterdam


Contents

1 Introduction
1.1 Main summarisation methods
1.2 Extractive summarisation approaches
1.3 NLP models
1.4 BERT
1.5 GPT-2

2 Dataset
2.1 Altmetrics
2.2 Thread data collection
2.3 Dataset preparation

3 Method
3.1 BERT
3.2 Logistic Regression

4 Training the model
4.1 Preparing data
4.2 The training process
4.3 Cross Validation

5 Evaluation

6 Conclusion

7 Discussion
7.1 High accuracy
7.2 Challenges with the dataset
7.3 Future work


Preface

This thesis was written under the supervision of G. Colavizza. The data collection was done in collaboration with fellow student Sarah Bosscha (11291486), and parts of the code (mainly for data collection) were also written together with her. G. Colavizza has been fully aware and supportive of this throughout the graduation project.


Abstract

As the global pandemic continues, researchers everywhere work hard to create a better understanding of covid-19 and how it spreads. Some researchers post short summaries of their work on social media platforms such as Twitter. The purpose of this paper is to find out how such Twitter summaries differ from the abstracts of papers, which normally serve as short summaries. With a labeled database of sentences extracted from Twitter summaries and paper abstracts, a model is trained to correctly classify these sentences. Features are created from sentences using a pre-trained BERT model, after which logistic regression is applied to classify the features. The resulting accuracy is 0.9016, indicating that there is a considerable difference between text from paper abstracts and Twitter summaries. This model can be useful in future work, both to create baseline summaries and to evaluate other summarisation models.


Chapter 1

Introduction

Since its outbreak, covid-19 has had the entire world in its grip. Different countries react in different ways[29], or with different lockdown methods[14][31]. Researchers and experts all over the world are continuously working on research into covid-19. The World Health Organization (WHO) has a large database which has been used in numerous studies[24][23]. According to WHO's literature database, over 30,000 papers have been submitted since the first reported case of covid-19. Usually these papers contain an abstract, which serves as a short summary of the paper. However, some of the scientists behind these papers use social media as a means to share additional brief summaries of their work. One of the most popular social media platforms is Twitter[11]. Because Twitter maintains a 140-character maximum limit per Tweet, some people post multiple Tweets in succession to create what is known as a Twitter thread. Seeing as these are, generally, short summaries uploaded to a social media platform, it is interesting to see whether they differ from papers' abstracts. And if they are indeed different, is it possible to train a model to learn these differences for future summarisation?

1.1 Main summarisation methods

In Natural Language Processing (NLP), two distinct types of summarisation are commonly used: extractive and abstractive[18]. The differences between them determine which model or approach is used in this research, so it is important to understand these differences.

Extractive summarisation involves reproducing pieces of text from the paper which are likely to contain useful information. These pieces are usually full sentences. This method requires us to define what is considered a "useful" sentence, and a model would have to be able to learn that definition. This is known as relevance classification[7]. If this is possible, a model can pick out a small subset of sentences from the original paper, which then forms a summary. The advantage of this approach is that it copies sentences from the paper, thus skipping the difficult step of generating a sentence on its own. It can be assumed that the original paper contains few to no grammatical errors in its sentences, so the extractive summary would also contain grammatically correct sentences. However, as a disadvantage, some information from the original paper is likely to be lost, and the extracted sentences may not form a coherent text.

Abstractive summarisation involves learning the information written in the paper and then generating new sentences which try to describe the same information more accurately and efficiently. This is what is commonly done when summarising: conveying the information in a smaller text while losing as little meaning as possible. However, if the model is to create its own sentences, it needs to understand the context of the text more accurately, so a more powerful information representation is necessary. One upside of this approach is that it generates summaries that are more coherent and resemble a human-written summary rather than a machine-made one. The main disadvantage is that it requires the model to write its own grammatical sentences, which remains a hard task in Natural Language Processing to this day.

1.2 Extractive summarisation approaches

There are several different approaches to extracting important information from a given text, according to Lloret[18]. These approaches are known as statistical-based, topic-based, graph-based, discourse-based, and machine learning-based. Topic-based approaches involve the use of certain words or phrases which help determine the relevance of a sentence. Phrases such as "in conclusion" or "the aim of this paper" are mentioned as examples of good indicators. Within this field, many different methods to locate these topics have been developed; however, this paper will not go into further detail on them.

The oldest approach has been used for over 60 years now. In 1958, Luhn[19] attempted to create abstracts from papers using simple variables such as term frequency to determine which words are useful to the paper. His first step was to remove stop words, such as "a" or "the", since they carry no semantic meaning[19]. After doing this, the most frequently used words that do have semantic meaning were used as a good indicator of important information. Sentences containing multiple of these words are likely useful for summarisation. This method is an example of a statistical approach. Other statistical approaches involve term frequency and inverse document frequency (tf*idf)[22]. The reasoning behind term frequency is that words that appear more frequently in a given paper compared to other papers in the collection can be important keywords. Inverse document frequency entails that words which are commonly used across the entire collection are discarded, as these give no good indication of useful information.
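This idea can be sketched briefly with scikit-learn's TfidfVectorizer; the toy corpus and the top-3 keyword cutoff below are illustrative assumptions, not data or code from this thesis.

```python
# A toy illustration of the tf*idf idea; the corpus and the top-3
# cutoff are made-up examples, not data from this thesis.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "the virus spreads rapidly in dense populations",
    "the aim of this paper is to model how the virus spreads",
    "lockdown measures reduce contact between people",
]

# Removing stop words such as "the" echoes Luhn's first step.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(corpus)

# Words scoring high in a document are frequent there but rare in the
# rest of the collection, making them candidate keywords.
terms = np.array(vectorizer.get_feature_names_out())
weights = tfidf[0].toarray().ravel()
print(terms[weights.argsort()[::-1][:3]])  # top-3 keywords of document 0
```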

The graph-based approach aims to represent the text as a network of nodes with connections to one another. Each node can represent a word or a full sentence. The topology of this graph can contain useful information for summarisation. LexRank[6] is a summarisation system which utilizes graphs to represent all candidate sentences that can be included in the summary. It does so by linking sentences which have a sufficiently high similarity. The most central nodes can then be identified and used to create a short summary.
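The following sketch illustrates the graph-based idea in the spirit of LexRank, using PageRank over a tf*idf similarity graph as a stand-in for LexRank's centrality computation; the example sentences, the 0.1 threshold, and the use of networkx are assumptions for illustration, not LexRank's actual implementation.

```python
# Link sentences whose tf*idf cosine similarity exceeds a threshold,
# then pick the most central node as the summary sentence.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "Covid-19 spreads through respiratory droplets.",
    "The virus is transmitted via droplets in the air.",
    "Our dataset contains over 30,000 papers.",
]

sim = cosine_similarity(TfidfVectorizer().fit_transform(sentences))

graph = nx.Graph()
graph.add_nodes_from(range(len(sentences)))
for i in range(len(sentences)):
    for j in range(i + 1, len(sentences)):
        if sim[i, j] > 0.1:  # similarity threshold (assumed value)
            graph.add_edge(i, j, weight=sim[i, j])

# PageRank as a stand-in for LexRank's eigenvector centrality.
scores = nx.pagerank(graph, weight="weight")
print(sentences[max(scores, key=scores.get)])  # most central sentence
```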

Discourse-based approaches involve the use of a linguistic point of view, such as exploiting discourse relations. In 1988, Rhetorical Structure Theory[20] (RST) was proposed to represent discourse through nucleus and satellite relations. In 1999, Marcu[21] used RST for a summarisation approach in which the discourse representation was used to determine the importance of different parts of text in a document. Marcu has shown that the nuclei of a text have a strong correlation with what are considered important units of that text. Using discourse relations to find the nuclei of a text can therefore be used to create extractive summaries.

Finally, the last approach uses machine learning to create summaries. There are many variations of this approach, such as binary classification[15], Hidden Markov Models[4] and Bayesian[2] methods. All of these methods are built upon the idea of creating extractive summaries, as they all attempt to rank sentences based on different variables. By then extracting the highest-ranked sentences one can create a summary. However, machine learning has progressed beyond just extractive summarisation. In 2017, Paulus[25] used a deep reinforced model for abstractive summarisation, achieving a record score of 41.16 with the ROUGE[17] evaluation algorithm.

1.3 NLP models

In NLP there are multiple models which can break down text into a feature representation for a machine learning algorithm to learn. Of course, it is important for a model to learn the semantic meaning of an entire document, rather than just single words. This is one of the largest challenges in NLP. Words can have varying meanings depending on context, so learning individual words without that context is unlikely to yield good results on large pieces of text. For summarisation, a model is required to find "useful" words or phrases, which means the model needs to process not only individual words, but their context as well. To illustrate the necessity of context: a word such as "virus" can be used to describe a computer virus, yet this same word is also commonly found in covid-19 literature. If the word "virus" alone, without any context, is considered important, the model might pick out sentences with entirely different meanings. Understanding the different meanings of a word is required for a model to recognise which sentences are or are not useful in a text.

1.4 BERT

Some state-of-the-art models in NLP can use context to learn the correct semantic meaning behind a given word. One such model is BERT, short for Bidirectional Encoder Representations from Transformers[5]. BERT is a powerful language representation model, designed to pre-train deep representations[30] on large collections of unlabeled data. It does so in a bidirectional manner, so it can take both preceding and following text into account when creating each word's representation. Due to this approach, the model is able to differentiate between the same word in different contexts. Some BERT models have been pre-trained on specific corpora, such as clinical data[1] or biomedical texts[16].
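To make this concrete, the following small sketch (not part of the thesis) compares the contextual embedding of the word "virus" from the previous section in two different sentences, using the standard public distilBERT checkpoint from the Hugging Face transformers library; the sentences are invented examples.

```python
# The same word, two contexts, two different vectors.
import torch
from torch.nn.functional import cosine_similarity
from transformers import DistilBertModel, DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertModel.from_pretrained("distilbert-base-uncased")

def word_vector(sentence, word):
    """Contextual embedding of the first occurrence of `word`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return hidden[tokens.index(word)]

v1 = word_vector("the computer virus corrupted all our files", "virus")
v2 = word_vector("the virus spreads between infected patients", "virus")
print(cosine_similarity(v1, v2, dim=0))  # noticeably below 1.0
```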

BERT has obtained its status as a powerful NLP model by achieving high scores on some of Natural Language Processing's hardest tasks, such as scoring 80.5% on the GLUE[32] benchmark and an accuracy of 86.7% on the MultiNLI[33] corpus, among others. At the time of BERT's release, these scores were the highest achieved so far. This shows that contextual information is very relevant to the aforementioned tasks.

BERT models can also be distilled to decrease the model's size. Knowledge distillation[8] is a compression technique used to train smaller models to closely reproduce the behaviour of larger models, which allows much smaller models to retain most of the original model's accuracy. There are multiple distilled versions of BERT, such as tinyBERT[12] and distilBERT[28]. The tinyBERT model is able to retain 96% of the original model's accuracy on the GLUE[32] benchmark while being 7.5 times smaller[12]. DistilBERT has 40% fewer parameters and is therefore 60% faster than the original BERT model[28]. It is shown to retain 97% of BERT's original accuracy on the GLUE[32] benchmark.
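The core of the technique can be written down compactly. Below is a minimal sketch of a distillation loss in the style of Hinton et al.[8], written in PyTorch; the temperature and weighting values are assumed, and real setups such as distilBERT's add further terms (e.g. an embedding-alignment loss).

```python
# Minimal knowledge-distillation objective: the student matches the
# teacher's softened output distribution plus the true labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-softened outputs.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard targets: ordinary cross-entropy on the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```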


1.5 GPT-2

GPT-2[26] is another powerful state-of-the-art NLP model, specialised in predicting the next word given the preceding text. It has been trained on a large dataset of 8 million web pages and contains 1.5 billion parameters. GPT-2 can perform numerous tasks, such as predicting words or even generating entire bodies of text, making it well suited for abstractive summarisation. It may currently be the most powerful model for future work, especially if abstractive summarisation turns out to be important for the task at hand.
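As a quick illustration of this next-word prediction, GPT-2 can be loaded through the Hugging Face transformers pipeline; the prompt and generation length below are arbitrary examples, not part of this thesis.

```python
# Generate a continuation from GPT-2 given a short prompt.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
out = generator("The covid-19 pandemic has shown that",
                max_length=30, num_return_sequences=1)
print(out[0]["generated_text"])
```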


Chapter 2

Dataset

To define our dataset we first need to define what exactly a Twitter thread is. A Twitter thread is a string of replies used to circumvent Twitter's low character limit in order to create a longer story, or, in this case, a summary. Some scientists use Twitter to upload summaries of their papers in this manner.

The dataset used in this research consists of both abstracts and Twitter threads. Since neither of these is readily available, data collection became a large part of the actual research.

2.1 Altmetrics

To collect data on papers, Altmetrics' own application programming interface, or API for short, was used. It requires the user to have a key, however, which can take some time to obtain. The API itself can provide information on papers, such as which journals published them, and the IDs of Twitter users and their tweets regarding the paper. Altmetrics also provides the DOIs of these papers, although one has to search for these oneself, using reliable search engines such as Google Scholar[10]. The Twitter IDs can be used to look up tweets, and therefore threads as well.
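A hedged sketch of such a query is shown below, using Altmetric's public v1 endpoint. The response field name ("cited_by_tweeters_count") and the exact key-passing convention are assumptions to check against the current API documentation, and, as noted above, a key is required for fuller access.

```python
# Query the Altmetric API for a paper's attention data by DOI.
import requests

def altmetric_by_doi(doi, api_key=None):
    url = f"https://api.altmetric.com/v1/doi/{doi}"
    params = {"key": api_key} if api_key else {}
    resp = requests.get(url, params=params, timeout=10)
    resp.raise_for_status()
    return resp.json()

data = altmetric_by_doi("10.1038/nature12373")  # example DOI
print(data.get("cited_by_tweeters_count"))      # Twitter attention, if present
```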

Now of course, not every paper has a thread on it. In fact, a lot of the Altmetrics links contained very few, or even no Twitter IDs at all. A low amount of Twitter activity signifies that there is likely no thread to be found. The number of tweets per Altmetrics link can be seen in Figure 2.1.


Figure 2.1: Number of papers by tweet count

The x-axis describes the number of tweets for a paper, the y-axis shows how many papers contain that number of tweets.

As can be seen, there are plenty of papers with little to no Twitter activity, disqualifying them from the research. Since some papers have up to 8000 Tweets, searching through these also takes up a lot of time.

Unfortunately, not every Altmetrics link actually contains a paper with an abstract. Some links lead to webpages, journals, or even just large figures of data which cannot be used for text processing or classification. Because of this, gathering data turned out to be more difficult than initially anticipated. This is a process which could be improved upon in later work.

2.2 Thread data collection

Collecting threads for analysis was done using Twitter's API and a library aimed at making the API easier to use, known as Tweepy[27]. A thread is defined as multiple tweets in a row, each a reply to the previous one. To find threads, the data collection program would, when given a tweet ID, recursively work its way back to the original tweet at the start of the thread, then determine whether it was indeed one account uploading all the tweets. If so, it would be considered a thread, but would first be checked to see whether its language was indeed English. This proved to be a challenge, as some English papers were written by scientists from non-English speaking countries, who would occasionally upload a thread in a different language. Once a thread has been identified, it is saved in a database using MySQL.
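The collection logic can be sketched as follows, written against the Tweepy v1.1-style API. The credential placeholders and the exact filtering rules are assumptions simplified from the description above, not the thesis's actual code.

```python
# Walk back from a tweet to the start of its thread, then keep only
# single-author, English threads (as described in the text).
import tweepy

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth)

def collect_thread(tweet_id):
    tweets = []
    while tweet_id is not None:
        status = api.get_status(tweet_id, tweet_mode="extended")
        tweets.append(status)
        tweet_id = status.in_reply_to_status_id  # None at the thread root
    tweets.reverse()  # oldest first

    authors = {t.user.id for t in tweets}
    if len(authors) == 1 and all(t.lang == "en" for t in tweets):
        return [t.full_text for t in tweets]
    return None  # multi-author or non-English: not a usable thread
```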


2.3 Dataset preparation

The dataset used in this research consists of abstracts of scientific papers regarding covid-19 and Twitter thread summaries of these papers. From both, full sentences are extracted. Sentences from a thread are labeled with a "1", whereas abstract sentences are labeled with a "0". The data was saved in csv format for easy future use. The resulting database consists of 729 sentences, 500 of which are thread sentences and 229 of which are abstract sentences. The average length of a sentence in a Twitter thread is 73 characters, whereas the average length of abstract sentences is 125 characters.
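The labeling step itself is small; a minimal sketch with pandas, assuming the sentences have already been extracted into two lists (the variable and file names are placeholders, not from the thesis):

```python
# Thread sentences get label 1, abstract sentences label 0.
import pandas as pd

thread_sentences = ["..."]    # sentences extracted from Twitter threads
abstract_sentences = ["..."]  # sentences extracted from paper abstracts

df = pd.DataFrame({
    "sentence": thread_sentences + abstract_sentences,
    "label": [1] * len(thread_sentences) + [0] * len(abstract_sentences),
})
df.to_csv("dataset.csv", index=False)
```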


Chapter 3

Method

Earlier we discussed the two main approaches to summarisation. In order to extract sentences for a Twitter thread, a model would have to know what separates those sentences from what is normally written in a paper. To accomplish this task, this research focuses on creating a text classifier which can learn the differences between thread and abstract sentences. This classifier could later be used for extractive summaries.

Some code for this method was adapted from jalammar's github page[9].

3.1 BERT

For the model, BERT has been chosen to create embeddings from the sentences used in abstracts and threads. A pre-trained model is used to create the embeddings; to improve computational efficiency, a distilled version of BERT is used, known as distilBERT[28]. The pre-trained distilBERT model creates features from the input text. These features are then matched with the original sentence's label, and the feature-label pairs are used to train a logistic regression model. After training, the model is tested on a test set.
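A compact sketch of this pipeline, in the spirit of the notebook cited above[9]: distilBERT turns each sentence into one fixed-size vector (the output at the [CLS] position), and scikit-learn's logistic regression classifies those vectors. The train/test variable names are placeholders for the labeled data from section 2.3.

```python
# Sentence -> distilBERT [CLS] vector -> logistic regression label.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import DistilBertModel, DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
bert = DistilBertModel.from_pretrained("distilbert-base-uncased")

def featurize(sentences):
    enc = tokenizer(sentences, padding=True, truncation=True,
                    return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state
    # The vector at position 0 ([CLS]) summarizes the whole sentence.
    return hidden[:, 0, :].numpy()

# train_sentences/train_labels etc. come from the labeled csv dataset.
clf = LogisticRegression()
clf.fit(featurize(train_sentences), train_labels)
print(clf.score(featurize(test_sentences), test_labels))
```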


Figure 3.1: Example of BERT features

3.2 Logistic Regression

To classify feature vectors as either 1 or 0, logistic regression (LR) is used. Logistic regression is a method specifically created for binary classification tasks[34]. It attempts to fit a function that can correctly predict outputs of 1 or 0 based on the given input features. In every iteration, logistic regression attempts to maximise the log-likelihood.

For a given feature vector of length $n$, logistic regression attempts to fit the following function:

$$\log\left(\frac{p}{1-p}\right) = b_0 + b_1 f_1 + b_2 f_2 + \dots + b_n f_n$$

where the $b_i$ are parameters, the $f_i$ are the values in the feature vector, and $p$ is the probability of predicting "1". Finding the maximum likelihood entails finding the optimal parameter $b_i$ for each $f_i$ in the feature vector, so as to match the vector to the correct outcome.
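Solving the log-odds equation above for $p$ gives the familiar sigmoid form, which maps the weighted feature sum to a probability between 0 and 1:

$$p = \frac{1}{1 + e^{-(b_0 + b_1 f_1 + \dots + b_n f_n)}}$$

A sentence is then typically classified as "1" when $p > 0.5$ and as "0" otherwise, which is the default threshold in most implementations, including scikit-learn's.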


Chapter 4

Training the model

4.1 Preparing data

For training and testing the model, the dataset was split into a training and a test set. The training set contains 75% of the original sentences, the test set the remaining 25%. Sentences are first tokenized and padded, using BERT's own built-in tokenizer and padding algorithm. DistilBERT then uses the prepared sentences to create features, which are linked to the label of the sentence. This creates the feature-label pairs required for logistic regression.

4.2 The training process

With the feature-label pairs at the ready, the training process can begin. The logistic regression model is trained over a varying number of iterations. After each training run, the model was tested to measure its accuracy: it has to predict the label "1" for a thread sentence and "0" for an abstract sentence. The model is then trained again, for a larger number of iterations, and the resulting accuracies are all measured.
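In scikit-learn terms this sweep amounts to varying the optimiser's iteration cap; a minimal sketch, assuming the BERT features and labels from the previous section (scikit-learn will warn about non-convergence for very low max_iter, but still returns a usable model):

```python
# Train with an increasing iteration cap and record test accuracy.
from sklearn.linear_model import LogisticRegression

for n_iter in [1, 2, 3, 5, 7, 10, 20]:  # assumed sweep values
    clf = LogisticRegression(max_iter=n_iter)
    clf.fit(X_train, y_train)            # distilBERT features + labels
    print(n_iter, clf.score(X_test, y_test))
```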

4.3 Cross Validation

Since the dataset is small, cross validation was applied to train the model five separate times, each time with different, randomised training and test sets. The average accuracy was calculated over these five separate tests to give as accurate a result as possible.
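Assuming scikit-learn was used, five independent randomised 75/25 splits can be drawn with ShuffleSplit and averaged; a minimal sketch:

```python
# Five randomised 75/25 splits, averaged, matching the procedure above.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

cv = ShuffleSplit(n_splits=5, test_size=0.25, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=7), X, y, cv=cv)
print(np.mean(scores))  # average accuracy over the five runs
```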


Chapter 5

Evaluation

Evaluation of the model was done on a test set, labeled in the same way as the training set. This test set contained 25% of the sentences in the database. First the BERT model creates features from these sentences; then the logistic regression model uses these features to predict a label, "0" or "1". The obtained accuracy is equal to the fraction of correct predictions. Training and subsequent testing were done five times in total, and average accuracies were calculated at the end to determine the optimal number of iterations for the model.

Figure 5.1: Average accuracy of five test runs using different numbers of iterations

As can be seen in the figure above, after 7 iterations the model has converged and hovers around 0.9 accuracy. To be exact, its accuracy at 7 iterations averages out at 0.9016.

Since the dataset contained 500 thread and 229 abstract sentences, a model that always guesses the majority label "1" would reach an accuracy of 500/729 ≈ 0.6859. The achieved accuracy of 0.9016 is substantially higher than this baseline.


Chapter 6

Conclusion

As shown by the model's accuracy score of 0.9016, there are clear differences between sentences extracted from Twitter threads and those extracted from abstracts. One of the larger differences is the average sentence length: 73 characters for thread sentences versus 125 for abstract sentences. The model is able to learn these differences and correctly distinguish the two. It is also worth pointing out that while the dataset is not balanced, the model still scores a much higher accuracy than a model guessing only the more common label.


Chapter 7

Discussion

7.1 High accuracy

The model's high accuracy can be explained by multiple factors. The average sentence length of abstracts is much higher than that of thread sentences, which could be a strong indicator. This difference is likely caused by Twitter's character limit for a single tweet, which is 140 characters. With this limit, it is highly unlikely that the average number of characters in tweets would reach that of the abstract sentences.

Furthermore, another large difference which BERT should be able to pick up on is formality. Abstract sentences are usually formally written, as they are part of a scientific paper. Often they are written in passive form, or some variation that avoids personal pronouns. Twitter threads are more commonly written in the first person, as shown here: "1. I believe that if #SARSCoV2 is allowed to spread uncontrolled through an entire nation, it will be an unprecedented humanitarian disaster."[3]. A scientific paper will usually not contain a sentence starting with "I believe", so this is one of the more distinguishing features of a tweet.

7.2 Challenges with the dataset

Processing the Twitter threads turned out to be more difficult than foreseen. These threads commonly contained characters which are usually not seen in scientific texts, such as "#" and "@". Ideally one would simply remove these characters from the text; however, this can create other problems. For example, "#Anti-LockdownTeam" would then be translated to "Anti-LockdownTeam", and if one were to process the text further, one might even want to remove the dash to form a single word. However, this word would now be "AntiLockdownTeam", which is not a word you would expect to find in a dictionary. It is also possible to post ASCII art on Twitter, which is likely rare in threads, but would cause problems when creating features. All of these problems need to be addressed, so the text needs thorough processing and cleaning. Future research into processing Twitter threads is required to improve datasets for further testing.
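A rough sketch of the kind of cleaning this paragraph calls for is shown below; the specific rules (keeping hashtag words, dropping mentions, URLs and non-ASCII symbols) are one possible choice for illustration, not the thesis's actual pipeline.

```python
# One possible tweet-cleaning pass with regular expressions.
import re

def clean_tweet(text):
    text = re.sub(r"https?://\S+", "", text)    # drop URLs
    text = re.sub(r"@\w+", "", text)            # drop @mentions
    text = re.sub(r"#(\w+)", r"\1", text)       # keep hashtag word, drop '#'
    text = re.sub(r"[^\x00-\x7F]+", " ", text)  # strip emoji/symbol characters
    return re.sub(r"\s+", " ", text).strip()

print(clean_tweet("1. I believe #SARSCoV2 will spread @WHO https://t.co/x"))
# -> "1. I believe SARSCoV2 will spread"
```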

Creating a more balanced dataset could also more accurately show the perfor-mance of the model.

7.3 Future work

The model resulting from this research can be used to create extractive summaries from papers by evaluating each sentence and extracting those that would be classified as thread sentences. These extractive summaries can serve as a baseline[13] for future work.

Further research and work is needed to create models for abstractive summarisation. These models could use the aforementioned extractive summaries as a baseline, using ROUGE[17] to compare their accuracies.

Furthermore, since the model is able to distinguish between sentences from papers and threads, it may also be used to evaluate thread summaries created by other models, as it can decide whether another model produces text that would fit in a Twitter thread. Future research could utilise the model created in this paper to accurately determine their program's capabilities.


Bibliography

[1] Emily Alsentzer et al. “Publicly available clinical BERT embeddings”. In: arXiv preprint arXiv:1904.03323 (2019).

[2] Chinatsu Aone, Mary Ellen Okurowski, and James Gorlinsky. “Trainable, scalable summarization using robust NLP and machine learning”. In: 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1. 1998, pp. 62–66.

[3] Carl T. Bergstrom. “"1. I believe that if #SARSCoV2 is allowed to spread uncontrolled through an entire nation, it will be an unprecedented humanitarian disaster. Every possible measure should be taken to prevent this from happening. Yet in some countries this may be unavoidable."”. In: (2020). url: https://twitter.com/CT_Bergstrom/status/1252075528711860224.

[4] John M Conroy and Dianne P O'Leary. “Text summarization via hidden Markov models”. In: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval. 2001, pp. 406–407.

[5] Jacob Devlin et al. “Bert: Pre-training of deep bidirectional transformers for language understanding”. In: arXiv preprint arXiv:1810.04805 (2018).

[6] Günes Erkan and Dragomir R Radev. “Lexrank: Graph-based lexical centrality as salience in text summarization”. In: Journal of artificial intelligence research 22 (2004), pp. 457–479.

[7] Ben Hachey and Claire Grover. “Extractive summarisation of legal texts”. In: Artificial Intelligence and Law 14.4 (2006), pp. 305–345.

[8] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. “Distilling the knowledge in a neural network”. In: arXiv preprint arXiv:1503.02531 (2015).


[9] J. Alammar. A Visual Notebook to Using BERT for the First Time. 2020. url: https://github.com/jalammar/jalammar.github.io/blob/master/notebooks/bert/A_Visual_Notebook_to_Using_BERT_for_the_First_Time.ipynb.

[10] Péter Jacsó. “Google scholar revisited”. In: Online information review (2008).

[11] Bernard J Jansen et al. “Twitter power: Tweets as electronic word of mouth”. In: Journal of the American society for information science and technology 60.11 (2009), pp. 2169–2188.

[12] Xiaoqi Jiao et al. “Tinybert: Distilling bert for natural language understanding”. In: arXiv preprint arXiv:1909.10351 (2019).

[13] Virapat Kieuvongngam, Bowen Tan, and Yiming Niu. “Automatic Text Summarization of COVID-19 Medical Research Articles using BERT and GPT-2”. In: arXiv preprint arXiv:2006.01997 (2020).

[14] Malouke Esra Kuiper et al. “The intelligent lockdown: Compliance with COVID-19 mitigation measures in the Netherlands”. In: Available at SSRN 3598215 (2020).

[15] Julian Kupiec, Jan Pedersen, and Francine Chen. “A trainable document summarizer”. In: Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval. 1995, pp. 68–73.

[16] Jinhyuk Lee et al. “BioBERT: a pre-trained biomedical language representation model for biomedical text mining”. In: Bioinformatics 36.4 (2020), pp. 1234–1240.

[17] Chin-Yew Lin. “ROUGE: A Package for Automatic Evaluation of Summaries”. In: Text Summarization Branches Out. Barcelona, Spain: Association for Computational Linguistics, July 2004, pp. 74–81. url: https://www.aclweb.org/anthology/W04-1013.

[18] Elena Lloret and Manuel Palomar. “Text summarisation in progress: a literature review”. In: Artificial Intelligence Review 37.1 (2012), pp. 1–41.

[19] H. P. Luhn. “The Automatic Creation of Literature Abstracts”. In: IBM Journal of Research and Development 2.2 (1958), pp. 159–165.

[20] William C Mann and Sandra A Thompson. “Rhetorical structure theory: Toward a functional theory of text organization”. In: Text 8.3 (1988), pp. 243–281.

[21] Daniel Marcu. “Discourse trees are good indicators of importance in text”. In: Advances in automatic text summarization 293 (1999), pp. 123–136.


[22] Victoria McCargar. “Statistical approaches to automatic text summarization”. In: Bulletin of the american society for information science and technology 30.4 (2004), pp. 21–25.

[23] World Health Organization et al. “Global prevalence of vitamin A deficiency in populations at risk 1995-2005: WHO global database on vitamin A deficiency”. In: (2009).

[24] World Health Organization et al. “Worldwide prevalence of anaemia 1993-2005: WHO global database on anaemia.” In: (2008).

[25] Romain Paulus, Caiming Xiong, and Richard Socher. “A deep reinforced model for abstractive summarization”. In: arXiv preprint arXiv:1705.04304 (2017).

[26] Alec Radford et al. “Language models are unsupervised multitask learners”. In: OpenAI Blog 1.8 (2019), p. 9.

[27] Joshua Roesslein. “tweepy Documentation”. In: [Online] http://tweepy.readthedocs.io/en/v3.5 (2009).

[28] Victor Sanh et al. “DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter”. In: arXiv preprint arXiv:1910.01108 (2019).

[29] Janice Hopkins Tanne et al. “Covid-19: how doctors and healthcare systems are tackling coronavirus worldwide”. In: Bmj 368 (2020).

[30] Ian Tenney, Dipanjan Das, and Ellie Pavlick. “BERT rediscovers the classical NLP pipeline”. In: arXiv preprint arXiv:1905.05950 (2019).

[31] Aurelio Tobías et al. “Changes in air quality during the lockdown in Barcelona (Spain) one month into the SARS-CoV-2 epidemic”. In: Science of the Total Environment (2020), p. 138540.

[32] Alex Wang et al. “Glue: A multi-task benchmark and analysis platform for natural language understanding”. In: arXiv preprint arXiv:1804.07461 (2018).

[33] Adina Williams, Nikita Nangia, and Samuel R Bowman. “A broad-coverage challenge corpus for sentence understanding through inference”. In: arXiv preprint arXiv:1704.05426 (2017).

[34] Ji Zhu and Trevor Hastie. “Kernel logistic regression and the import vector machine”. In: Advances in neural information processing systems. 2002, pp. 1081–1088.
