5.4 Binary Classification

5.4.2 Results

Table 21 reports the results of the binary classification of discussions as successful or unsuccessful, using the SVM classifier and the fine-tuned Longformer PLM. When discussions end with the finalization label, both the SVM and Longformer models perform similarly in Setting 1, with F1-Scores of 0.66 and 0.65, respectively. For this setting, where the input data consists of short texts, we also fine-tuned a BERT model, which showed comparable performance with scores of 0.66 across all evaluation metrics. In contrast, when the discussion text is used as input (Setting 2), the Longformer classifier performs better, with an F1-Score of 0.71, while the SVM classifier performs worse, with an F1-Score of 0.52. The models show roughly similar behaviour when we combine texts and classes as input (Setting 3). Surprisingly, when we remove the ending utterances with the finalization label, the performance of the Longformer classifier increases in Setting 1 and Setting 2 by 0.05 in F1-Score. In Setting 3, however, the performance drops to 0.64, showing that this way of combining texts and classes is not helpful. The Longformer model thus performs best in Setting 2 when finalization is not considered; further research is needed to understand the reasons.

One possibility is that the discussion texts contain more indicators of a discussion's success; for instance, the word "RfC" appears more frequently in successful discussions than in unsuccessful ones.

Table 21: Binary classification experiment results using SVM model as the baseline and fine-tuned Longformer PLM. Setting 1: Only classes as input data, Setting 2: Only discussion texts as input data, Setting 3: A concatenation of discussion texts and transformed classes as input data.

                 Baseline (SVM)                  Longformer
Setting          Precision  Recall  F1-Score    Precision  Recall  F1-Score

Including finalization
Setting 1        0.66       0.66    0.66        0.66       0.65    0.65
Setting 2        0.52       0.52    0.52        0.74       0.72    0.71
Setting 3        0.60       0.60    0.60        0.72       0.72    0.72

Excluding finalization
Setting 1        0.61       0.61    0.61        0.72       0.70    0.70
Setting 2        0.52       0.52    0.52        0.76       0.76    0.76
Setting 3        0.53       0.53    0.53        0.66       0.65    0.64
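The scores above combine per-class precision, recall, and F1. As a minimal illustration of how such scores are computed, the following pure-Python sketch derives macro-averaged metrics over the two classes; it assumes macro averaging and toy labels (in practice, a library routine such as scikit-learn's `classification_report` would typically be used):

```python
def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall, and F1 over the label set."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        pred_pos = sum(1 for p in y_pred if p == label)  # predicted as label
        true_pos = sum(1 for t in y_true if t == label)  # actually label
        prec = tp / pred_pos if pred_pos else 0.0
        rec = tp / true_pos if true_pos else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec); recalls.append(rec); f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# toy example: gold and predicted outcomes for four discussions
truth = ["success", "success", "failure", "failure"]
preds = ["success", "failure", "failure", "failure"]
p, r, f = macro_prf(truth, preds)
```

Macro averaging weights both classes equally, which matters here because successful and unsuccessful discussions are not guaranteed to be balanced.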

6 Conclusion

In this study, we have aimed to identify effective deliberative strategies that can be used in online discussions in order to stop them from running off the rails. We consider a deliberative argumentation strategy to be a sequence of moves that participants take to push the discussion forward [3]. In this chapter, we discuss the thesis research questions in light of the findings in Chapter 5.

To recall, the research questions are as follows:

Q1. How to identify the deliberative strategy in discussions using argumentative attributes?

Q2. How to analyze the identified strategies, distinguishing successful from unsuccessful ones?

Q3. How to predict the success of a discussion based on the used strategies?

Our source of deliberative discussions is Wikipedia talk pages, where editors discuss changes they have made or proposed, suggest new content, debate editing disagreements, or challenge cited sources in order to reach a consensus on the accuracy of an article. The themes of the discussions and disputes on Wikipedia talk pages are generally in line with the deliberation process, in which providing reasons, citing references, and attaining consensus are important actions.

We have addressed the first research question by investigating three argumentative attributes: discourse act, argumentative relation, and frame. We have developed three classifiers, one to predict each argumentative attribute of discussion utterances. The classifiers are trained on a corpus that comprises Wikipedia discussion utterances labeled for the three attributes [3]. Adopting supervised learning, alongside fine-tuning BERT and RoBERTa PLMs, we have employed the prompting approach [76] to address the lack of labeled data in some subsets of the corpus. Our experiments, however, show that the prompt-based learning paradigm does not provide any advantages in this task.
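The prompting paradigm [76] recasts classification as a cloze task: each utterance is wrapped in a template containing a masked slot, and the PLM's prediction for the mask is mapped back to a class label through a verbalizer. The following sketch shows only this template-and-verbalizer step; the template wording and label words are illustrative assumptions, not the exact ones used in our experiments:

```python
# Cloze-style prompting for discourse-act classification (illustrative).
TEMPLATE = 'Utterance: "{utterance}" The discourse act of this utterance is [MASK].'

# Verbalizer: maps the PLM's predicted word at [MASK] back to a class label.
# These label words are hypothetical examples.
VERBALIZER = {
    "question": "questioning",
    "suggestion": "recommendation",
    "closing": "finalization",
}

def build_prompt(utterance: str) -> str:
    """Fill the utterance into the cloze template."""
    return TEMPLATE.format(utterance=utterance)

def verbalize(predicted_word: str) -> str:
    """Map a predicted mask word to a label; fall back to the raw word."""
    return VERBALIZER.get(predicted_word, predicted_word)

prompt = build_prompt("Shouldn't we cite the original source here?")
label = verbalize("question")  # what the PLM might predict at [MASK]
```

In a full pipeline, a masked language model scores the candidate label words at the `[MASK]` position; frameworks such as OpenPrompt [84] automate this template and verbalizer machinery.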

Still, our best-performing models demonstrated substantial enhancement in performance compared to the baselines and the models developed in [3].

For the second research question, we used the RfC-Predecessor pairs corpus [23] consisting of 421 pairs of topical controlled discussions (both successful and unsuccessful) from Wikipedia talk pages.

Applying the best-performing classifiers to the discussions in this corpus, we extracted the most common successful and unsuccessful strategies using an efficient sequential pattern mining algorithm called PrefixSpan [80]. We carried out a deep analysis of successful and unsuccessful deliberative strategies, which demonstrates the important role of the three argumentative attributes in identifying effective deliberative strategies. Our analysis of the discourse act dimension indicates that the most frequent actions in successful patterns are "finalization" and "recommendation", while in unsuccessful patterns, "questioning" actions predominate. Regarding the argumentative relation dimension, our analysis illustrates that having more than one "attacking" utterance can negatively impact the success of a discussion. In terms of the frame dimension, making "dialogue" moves and providing "neutral" points of view are more common in successful conversations. In contrast, in unsuccessful discussions, participants tend to focus mostly on the "verifiability" of the content. In addition, we have analyzed 3-dimensional strategies in successful and unsuccessful discussions. We found that ("finalization", "support", "dialogue") is the most prevalent 3-dimensional strategy in successful deliberations. In such discussions, the "attack" label is always surrounded by productive discourse act labels such as "recommendation" or "understanding", and effective frame labels like "dialogue" or "neutral". More precisely, if an "attack" action occurs in a particular discussion, it is critical that it be followed by a support (argumentative relation) and dialogue (frame) labeled utterance to avoid discussion derailment.
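The core idea of PrefixSpan [80] is to grow frequent patterns by repeatedly projecting the sequence database on each frequent prefix. The following is a simplified sketch of that idea on discourse-act sequences (single-item events, no gap constraints, not the optimized algorithm of [80]; the discussion sequences are invented for illustration):

```python
def prefixspan(sequences, min_support):
    """Minimal PrefixSpan sketch: mine frequent sequential patterns.
    Returns a dict mapping pattern (tuple of items) -> support count."""
    results = {}

    def mine(prefix, projected):
        # Count each item's support in the projected postfix database.
        counts = {}
        for seq in projected:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1
        for item, sup in counts.items():
            if sup < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = sup
            # Project: keep the postfix after the item's first occurrence.
            new_proj = [seq[seq.index(item) + 1:] for seq in projected
                        if item in seq]
            mine(pattern, new_proj)

    mine((), sequences)
    return results

# Toy discourse-act sequences from four discussions.
discussions = [
    ["questioning", "recommendation", "finalization"],
    ["recommendation", "finalization"],
    ["questioning", "questioning"],
    ["recommendation", "understanding", "finalization"],
]
patterns = prefixspan(discussions, min_support=3)
```

Here the mined patterns include ("recommendation", "finalization") with support 3, while "questioning" falls below the support threshold, mirroring how frequent strategies are extracted from labeled discussions.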

Our findings can be useful in the development of a tool for forecasting conversation derailment [19], where the goal is to anticipate possible conflicts in online discussions before they occur. Considering the frequent successful and unsuccessful strategies that we found, such a tool could dynamically analyze the current conversation moves and provide early warnings of prospective conversation derailment. It could also assist users in following a productive discussion strategy step by step to achieve consensus and avoid failure.

For the third research question, we have attempted to predict the success of discussions based on the identified strategies, the discussion text, and a combination of text and strategies. To accomplish this, we fine-tuned the Longformer PLM [81], which has a linearly scalable attention mechanism that can handle long texts. Our results indicate that strategies identified using the three argumentative attributes (discourse act, argumentative relation, and frame) can be used to predict the success or failure of a discussion. However, the text alone achieved the best results.
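For Setting 3, the per-utterance class predictions must be serialized alongside the discussion text before fine-tuning. One straightforward way to build such a combined input is sketched below; the bracketed tag format is an illustrative assumption, not the exact serialization used in our experiments (`</s>` is the separator token of RoBERTa-style models such as Longformer):

```python
def build_setting3_input(utterances, classes, sep=" </s> "):
    """Concatenate each utterance with its predicted attribute classes.
    `classes` holds one (discourse act, relation, frame) triple per
    utterance; the '[act|relation|frame]' tag format is illustrative."""
    parts = []
    for text, (act, relation, frame) in zip(utterances, classes):
        parts.append(f"[{act}|{relation}|{frame}] {text}")
    return sep.join(parts)

combined = build_setting3_input(
    ["I suggest rewording the lead.", "Agreed, let's do that."],
    [("recommendation", "support", "dialogue"),
     ("finalization", "support", "dialogue")],
)
```

The resulting string can then be tokenized as a single long input, which is where Longformer's linearly scalable attention becomes useful.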

Future Work

In light of the findings of our study, we would like to make some recommendations for further research. In this study, we only examined the role of three argumentative attributes, namely discourse act, argumentative relation, and frame. Online user comments frequently include arguments that lack proper reasoning. To evaluate the sufficiency of the supporting evidence and, thus, the strength of the entire argument, it can be helpful to recognize such assertions as parts of an argument and to choose the proper sorts of support. Park and Cardie [48] proposed the task of classifying the verifiability of a proposition, where the proper forms of support are reason, evidence, and optional evidence. One related research direction that can be pursued in the future is to consider the verifiability attribute in identifying and analyzing deliberative strategies.

As stated in Section 2.2.2, the WikiConv corpus [9] includes 90,930,244 conversations, whereas the RfC-Predecessor corpus only comprises 842 conversations. Developing a model or pipeline that can expand the corpus while still maintaining its controlled topical property could significantly improve the results of this study. This can be achieved by creating a method to automatically identify and extract relevant conversations from the WikiConv corpus [9] or any other larger pool of data, and then applying the same annotation and classification process used in the RfC-Predecessor corpus. This would allow for a larger sample size and a more diverse set of data, which in turn would provide more robust and generalizable results. Furthermore, this would also allow researchers to analyze and compare the performance of the models over a broader range of data and to identify the variations and commonalities in the conversations.

In this study, we focused on forecasting the outcome of a discussion by identifying its strategies.

However, another potential future direction for this research could be to develop a tool that can predict the likelihood of a discussion derailing, based on the identified strategies at a specific point in the conversation. This could involve creating a model that is able to predict the success or failure of a discussion, given a portion of the conversation’s strategy up to a certain point in the discussion.

Bibliography

[1] D. Walton, “Types of dialogue and burdens of proof.,” Frontiers in Artificial Intelligence and Applications, vol. 216, pp. 13–24, 2010.

[2] A. Kittur, B. Suh, B. Pendleton, and E. Chi, “He says, she says: Conflict and coordination in wikipedia,” in Conference on Human Factors in Computing Systems - Proceedings, 04 2007.

[3] K. Al-Khatib, H. Wachsmuth, K. Lang, J. Herpel, M. Hagen, and B. Stein, “Modeling deliberative argumentation strategies on Wikipedia,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), (Melbourne, Australia), pp. 2545–2555, Association for Computational Linguistics, July 2018.

[4] S. Wolf, “Investigating jury deliberation in a capital murder case,” Small Group Research, vol. 41, no. 4, pp. 380–385, 2010.

[5] J. Fishkin, When the people speak: Deliberative democracy and public consultation. OUP Oxford, 2009.

[6] P. Aragón, A. Kaltenbrunner, A. Calleja-López, A. Pereira, A. Monterde, X. E. Barandiaran, and V. Gómez, “Deliberative platform design: The case study of the online discussions in Decidim Barcelona,” CoRR, vol. abs/1707.06526, 2017.

[7] L. Oswald, “Automating the analysis of online deliberation? A comparison of manual and computational measures applied to climate change discussions,” May 2022.

[8] N. Klemp and A. Forcehimes, “From town-halls to wikis: Exploring Wikipedia’s implications for deliberative democracy,” Journal of Deliberative Democracy, vol. 6, 09 2010.

[9] Y. Hua, C. Danescu-Niculescu-Mizil, D. Taraborelli, N. Thain, J. Sorensen, and L. Dixon, “WikiConv: A corpus of the complete conversational history of a large online collaborative community,” in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, (Brussels, Belgium), pp. 2818–2823, Association for Computational Linguistics, Oct.-Nov. 2018.

[10] M. Faruqui, E. Pavlick, I. Tenney, and D. Das, “WikiAtomicEdits: A multilingual corpus of Wikipedia edits for modeling language and discourse,” in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, (Brussels, Belgium), pp. 305–315, Association for Computational Linguistics, Oct.-Nov. 2018.

[11] S. Reese, G. Boleda, M. Cuadros, L. Padró, and G. Rigau, “Wikicorpus: A word-sense disambiguated multilingual Wikipedia corpus,” in Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), (Valletta, Malta), European Language Resources Association (ELRA), May 2010.

[12] A. Ghaddar and P. Langlais, “WikiCoref: An English coreference-annotated corpus of Wikipedia articles,” in Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), (Portorož, Slovenia), pp. 136–142, European Language Resources Association (ELRA), May 2016.

[13] D. Roy, S. Bhatia, and P. Jain, “A topic-aligned multilingual corpus of Wikipedia articles for studying information asymmetry in low resource languages,” in Proceedings of the Twelfth Language Resources and Evaluation Conference, (Marseille, France), pp. 2373–2380, European Language Resources Association, May 2020.

[14] A. Ghaddar and P. Langlais, “WiNER: A Wikipedia annotated corpus for named entity recognition,” in Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), (Taipei, Taiwan), pp. 413–422, Asian Federation of Natural Language Processing, Nov. 2017.

[15] V. Prabhakaran and O. Rambow, “A corpus of Wikipedia discussions: Over the years, with topic, power and gender labels,” in Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), (Portorož, Slovenia), pp. 2034–2038, European Language Resources Association (ELRA), May 2016.

[16] J. Im, A. Zhang, C. Schilling, and D. Karger, “Deliberation and resolution on wikipedia: A case study of requests for comments,” in Proceedings of the ACM on Human-Computer Interaction, vol. 2, pp. 1–24, 11 2018.

[17] C. De Kock and A. Vlachos, “I beg to differ: A study of constructive disagreement in online conversations,” in Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, (Online), pp. 2017–2027, Association for Computational Linguistics, Apr. 2021.

[18] J. P. Chang and C. Danescu-Niculescu-Mizil, “Trouble on the horizon: Forecasting the derailment of online conversations as they develop,” CoRR, vol. abs/1909.01362, 2019.

[19] J. Zhang, J. Chang, C. Danescu-Niculescu-Mizil, L. Dixon, Y. Hua, D. Taraborelli, and N. Thain, “Conversations gone awry: Detecting early signs of conversational failure,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), (Melbourne, Australia), pp. 1350–1361, Association for Computational Linguistics, July 2018.

[20] M. Karan and J. Šnajder, “Preemptive toxic language detection in Wikipedia comments using thread-level context,” in Proceedings of the Third Workshop on Abusive Language Online, (Florence, Italy), pp. 129–134, Association for Computational Linguistics, Aug. 2019.

[21] J. Pavlopoulos, J. Sorensen, L. Dixon, N. Thain, and I. Androutsopoulos, “Toxicity detection: Does context really matter?,” CoRR, vol. abs/2006.00998, 2020.

[22] J. P. Chang, C. Chiam, L. Fu, A. Wang, J. Zhang, and C. Danescu-Niculescu-Mizil, “ConvoKit: A toolkit for the analysis of conversations,” in Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue, (1st virtual meeting), pp. 57–60, Association for Computational Linguistics, July 2020.

[23] E. J. Schmidt, “A new controlled dataset for investigating deliberation on wikipedia,” bachelor’s thesis, Leipzig University, 2022.

[24] J. R. Searle, Speech Acts: An Essay in the Philosophy of Language. Cambridge, London: Cam-bridge University Press, 1969.

[25] A. Peldszus and M. Stede, “From argument diagrams to argumentation mining in texts: A survey,” Int. J. Cogn. Informatics Nat. Intell., vol. 7, pp. 1–31, 2013.

[26] Levin, Schneider, and Gaeth, “All frames are not created equal: A typology and critical analysis of framing effects,” Organizational Behavior and Human Decision Processes, vol. 76, no. 2, pp. 149–188, 1998.

[27] J. Lawrence and C. Reed, “Argument Mining: A Survey,” Computational Linguistics, vol. 45, pp. 765–818, 01 2020.

[28] H. Wachsmuth, M. Potthast, K. Al-Khatib, Y. Ajjour, J. Puschmann, J. Qu, J. Dorsch, V. Morari, J. Bevendorff, and B. Stein, “Building an argument search engine for the web,” in Proceedings of the 4th Workshop on Argument Mining, (Copenhagen, Denmark), pp. 49–59, Association for Computational Linguistics, Sept. 2017.

[29] M. Samadi, P. Talukdar, M. Veloso, and M. Blum, “ClaimEval: Integrated and flexible framework for claim evaluation using credibility of sources,” in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI’16, pp. 222–228, AAAI Press, 2016.

[30] L. Wang and W. Ling, “Neural network-based abstract generation for opinions and arguments,” in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, (San Diego, California), pp. 47–57, Association for Computational Linguistics, June 2016.

[31] C. Stab and I. Gurevych, “Annotating argument components and relations in persuasive essays,” in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, (Dublin, Ireland), pp. 1501–1510, Dublin City University and Association for Computational Linguistics, Aug. 2014.

[32] D. N. Walton and E. C. W. Krabbe, Commitment in Dialogue: Basic Concepts of Interpersonal Reasoning. Albany, NY, USA: State University of New York Press, 1995.

[33] P. McBurney, D. Hitchcock, and S. Parsons, “The eightfold way of deliberation dialogue,” International Journal of Intelligent Systems, vol. 22, no. 1, pp. 95–132, 2007.

[34] D. Friess and C. Eilders, “A systematic review of online deliberation research,” Policy & Internet, vol. 7, 08 2015.

[35] J. Fishkin, A. Siu, L. Diamond, and N. Bradburn, “Is deliberation an antidote to extreme partisan polarization? Reflections on “America in One Room”,” American Political Science Review, vol. 115, no. 4, pp. 1464–1481, 2021.

[36] D. Chen, A. Fisch, J. Weston, and A. Bordes, “Reading Wikipedia to answer open-domain questions,” in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), (Vancouver, Canada), pp. 1870–1879, Association for Computational Linguistics, July 2017.

[37] A. Caines, S. Pastrana, A. Hutchings, and P. Buttery, “Aggressive language in an online hacking forum,” in Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), (Brussels, Belgium), pp. 66–74, Association for Computational Linguistics, Oct. 2018.

[38] F. Wu and D. S. Weld, “Open information extraction using Wikipedia,” in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, (Uppsala, Sweden), pp. 118–127, Association for Computational Linguistics, July 2010.

[39] M. Althobaiti, U. Kruschwitz, and M. Poesio, “Automatic creation of Arabic named entity annotated corpus using Wikipedia,” in Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics, (Gothenburg, Sweden), pp. 106–115, Association for Computational Linguistics, Apr. 2014.

[40] J. A. Botha, M. Faruqui, J. Alex, J. Baldridge, and D. Das, “Learning to split and rephrase from Wikipedia edit history,” in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, (Brussels, Belgium), pp. 732–737, Association for Computational Linguistics, Oct.-Nov. 2018.

[41] C. M. E. Stab, Argumentative Writing Support by means of Natural Language Processing. PhD thesis, Technische Universität Darmstadt, Darmstadt, 2017.

[42] E. Guest, B. Vidgen, A. Mittos, N. Sastry, G. Tyson, and H. Margetts, “An expert annotated dataset for the detection of online misogyny,” in Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, (Online), pp. 1336–1350, Association for Computational Linguistics, Apr. 2021.

[43] A. X. Zhang, B. Culbertson, and P. K. Paritosh, “Characterizing online discussion using coarse discourse sequences,” 2017.

[44] C. Tan, V. Niculae, C. Danescu-Niculescu-Mizil, and L. Lee, “Winning arguments: Interaction dynamics and persuasion strategies in good-faith online discussions,” CoRR, vol. abs/1602.01103, 2016.

[45] A. Giachanou and F. Crestani, “Opinion retrieval in Twitter: Is proximity effective?,” in Proceedings of the 31st Annual ACM Symposium on Applied Computing, SAC ’16, (New York, NY, USA), pp. 1146–1151, Association for Computing Machinery, 2016.

[46] B. Felbo, A. Mislove, A. Søgaard, I. Rahwan, and S. Lehmann, “Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm,” in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2017.

[47] R. Meir and G. Rätsch, An Introduction to Boosting and Leveraging, vol. 2600, pp. 119–184. 01 2003.

[48] J. Park and C. Cardie, “Identifying appropriate support for propositions in online user comments,” in Proceedings of the First Workshop on Argumentation Mining, (Baltimore, Maryland), pp. 29–38, Association for Computational Linguistics, June 2014.

[49] J. Park and C. Cardie, “A corpus of eRulemaking user comments for measuring evaluability of arguments,” in Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), (Miyazaki, Japan), European Language Resources Association (ELRA), May 2018.

[50] J. Park, C. Blake, and C. Cardie, “Toward machine-assisted participation in erulemaking: An argumentation model of evaluability,” in Proceedings of the 15th International Conference on Artificial Intelligence and Law, ICAIL ’15, (New York, NY, USA), p. 206–210, Association for Computing Machinery, 2015.

[51] V. Niculae, Learning Deep Models with Linguistically-Inspired Structure. PhD thesis, Cornell University, 2018.

[52] E. M. Bender, J. T. Morgan, M. Oxley, M. Zachry, B. Hutchinson, A. Marin, B. Zhang, and M. Ostendorf, “Annotating social acts: Authority claims and alignment moves in Wikipedia talk pages,” in Proceedings of the Workshop on Language in Social Media (LSM 2011), (Portland, Oregon), pp. 48–57, Association for Computational Linguistics, June 2011.

[53] C. Danescu-Niculescu-Mizil, L. Lee, B. Pang, and J. M. Kleinberg, “Echoes of power: Language effects and power differences in social interaction,” CoRR, vol. abs/1112.3670, 2011.

[54] L. Wang and C. Cardie, “Improving agreement and disagreement identification in online discus-sions with a socially-tuned sentiment lexicon,” in Proceedings of the 5th Workshop on Computa-tional Approaches to Subjectivity, Sentiment and Social Media Analysis, (Baltimore, Maryland), pp. 97–106, Association for Computational Linguistics, June 2014.

[55] V. Prabhakaran and O. Rambow, “A corpus of Wikipedia discussions: Over the years, with topic, power and gender labels,” in Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), (Portorož, Slovenia), pp. 2034–2038, European Language Resources Association (ELRA), May 2016.

[56] E. Wulczyn, N. Thain, and L. Dixon, “Ex machina: Personal attacks seen at scale,” CoRR, vol. abs/1610.08914, 2016.

[57] J. Pennington, R. Socher, and C. Manning, “GloVe: Global vectors for word representation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), (Doha, Qatar), pp. 1532–1543, Association for Computational Linguistics, Oct. 2014.

[58] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[59] Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, and E. Hovy, “Hierarchical attention networks for document classification,” in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, (San Diego, California), pp. 1480–1489, Association for Computational Linguistics, June 2016.

[60] D. Kumar, R. Cohen, and L. Golab, “Online abuse detection: the value of preprocessing and neural attention models,” in Proceedings of the Tenth Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, (Minneapolis, USA), pp. 16–24, Association for Computational Linguistics, June 2019.

[61] T. Davidson, D. Warmsley, M. W. Macy, and I. Weber, “Automated hate speech detection and the problem of offensive language,” CoRR, vol. abs/1703.04009, 2017.

[62] N. Safi Samghabadi, P. Patwa, S. PYKL, P. Mukherjee, A. Das, and T. Solorio, “Aggression and misogyny detection using BERT: A multi-task approach,” in Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying, (Marseille, France), pp. 126–131, European Language Resources Association (ELRA), May 2020.

[63] A. Muti and A. Barrón-Cedeño, “A checkpoint on multilingual misogyny identification,” in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, (Dublin, Ireland), pp. 454–460, Association for Computational Linguistics, May 2022.

[64] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.

[65] Y. Kementchedjhieva and A. Søgaard, “Dynamic forecasting of conversation derailment,” CoRR, vol. abs/2110.05111, 2021.

[66] S. Bird and E. Loper, “NLTK: The natural language toolkit,” in Proceedings of the ACL Interactive Poster and Demonstration Sessions, (Barcelona, Spain), pp. 214–217, Association for Computational Linguistics, July 2004.

[67] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.

[68] J. Devlin, M. Chang, K. Lee, and K. Toutanova, “BERT: pre-training of deep bidirectional transformers for language understanding,” CoRR, vol. abs/1810.04805, 2018.

[69] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, “RoBERTa: A robustly optimized BERT pretraining approach,” CoRR, vol. abs/1907.11692, 2019.

[70] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.

[71] Z. Liu, X. Lv, K. Liu, and S. Shi, “Study on svm compared with the other text classification methods,” 01 2010.

[72] Y. Zhu, R. Kiros, R. S. Zemel, R. Salakhutdinov, R. Urtasun, A. Torralba, and S. Fidler, “Aligning books and movies: Towards story-like visual explanations by watching movies and reading books,” CoRR, vol. abs/1506.06724, 2015.

[73] T. H. Trinh and Q. V. Le, “A simple method for commonsense reasoning,” CoRR, vol. abs/1806.02847, 2018.

[74] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei, “Language models are few-shot learners,” in Advances in Neural Information Processing Systems (H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, eds.), vol. 33, pp. 1877–1901, Curran Associates, Inc., 2020.

[75] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, “Exploring the limits of transfer learning with a unified text-to-text transformer,” CoRR, vol. abs/1910.10683, 2019.

[76] P. Liu, W. Yuan, J. Fu, Z. Jiang, H. Hayashi, and G. Neubig, “Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing,” ArXiv, vol. abs/2107.13586, 2021.

[77] R. Agrawal and R. Srikant, “Mining sequential patterns,” in Proceedings of the Eleventh International Conference on Data Engineering, pp. 3–14, 1995.

[78] R. Srikant and R. Agrawal, “Mining sequential patterns: Generalizations and performance improvements,” in Advances in Database Technology — EDBT ’96 (P. Apers, M. Bouzeghoub, and G. Gardarin, eds.), (Berlin, Heidelberg), pp. 1–17, Springer Berlin Heidelberg, 1996.

[79] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M. Hsu, “FreeSpan: Frequent pattern-projected sequential pattern mining,” in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2000), pp. 355–359, 2000.

[80] J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu, “Mining sequential patterns by pattern-growth: The PrefixSpan approach,” IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 11, pp. 1424–1440, 2004.

[81] I. Beltagy, M. E. Peters, and A. Cohan, “Longformer: The long-document transformer,” CoRR, vol. abs/2004.05150, 2020.

[82] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, and J. Brew, “HuggingFace’s transformers: State-of-the-art natural language processing,” CoRR, vol. abs/1910.03771, 2019.

[83] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems (I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, eds.), vol. 30, Curran Associates, Inc., 2017.

[84] N. Ding, S. Hu, W. Zhao, Y. Chen, Z. Liu, H. Zheng, and M. Sun, “OpenPrompt: An open-source framework for prompt-learning,” in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, (Dublin, Ireland), pp. 105–113, Association for Computational Linguistics, May 2022.

[85] T. L. Scao and A. M. Rush, “How many data points is a prompt worth?,” CoRR, vol. abs/2103.08493, 2021.

[86] H. Wachsmuth, J. Kiesel, and B. Stein, “Sentiment flow - a general model of web review argumentation,” in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, (Lisbon, Portugal), pp. 601–611, Association for Computational Linguistics, Sept. 2015.