Final Thesis


University of Amsterdam

Faculty of Science

Graduate School of Informatics

Trying to count consumer’s sentiment

Author: Ilia Gkouma

Student Number: 11831774

University of Amsterdam, Faculty of Science

Thesis Master Information Studies: Business Information Systems

Final version: 9-7-2018

Supervisor: Tom Van Engers

Examiner: Frank Nack


Acknowledgements

The decision to come to Amsterdam was the wisest decision I have ever made. The experiences I had, the people I met and the stimuli I received will be forever etched in my mind and soul. This year, I not only obtained knowledge in my field of study but also changed my whole mindset as a person, and most importantly, I now know what I want to do in the coming years of my life.

During the five months of my research, I completed my thesis with the constant help of my supervisor, Prof. Dr. Tom Van Engers. As an innovative and patient supervisor, he actively provided me with help and advice. I will never forget his first lecture at the Science Park; from the first moment I knew that he would be one of the most influential people I would ever meet. I would like to thank him for everything he offered me and express my respect and gratitude to him.

Moreover, for my research, I followed an internship at the Social Media Hub of KLM Royal Dutch Airlines. I would like to thank my manager, Mrs. Carolijn Hauwert, for her constant help. She helped me adapt easily to the working environment, made me feel comfortable and guided me towards my goals. She influenced me as a person and I feel very lucky to have met her. To me, Mrs. Hauwert is the ideal manager an employee can have.

I would like to express my gratitude to my old friend Leandros Papadomanolakis, who always supports and trusts me despite the many miles that keep us apart. My thanks also go to my friend, Miss Vasiliki Kalaitzi, for sharing this wonderful year with me and for her support and advice during easy and difficult times. I would also like to thank my dear friend, Mr. Dimitris Gounaris, for his support and constant encouragement.

I would like to express my respect and thanks to my brother-in-law, Mr. Victor Lemmens, who triggered my interest and skills in IT and trusted me even during some difficult times in my life. His support and trust made me a stronger and more confident person. His trust is priceless to me.

Finally, I would like to express my gratitude to my parents and sister. With their unconditional understanding and trust, I came here and finished my studies. Without them I could never have become the person I am today. They teach me with their experience, support my decisions and do whatever they can to help me. No words can convey my love for them.

To conclude, I would like to dedicate this thesis to my father, Themis.

Ilia Gkouma 7 June, 2018 Amsterdam

Abstract

This thesis focuses on how social media affect consumers’ sentiment towards a company, which in turn impacts the customer satisfaction process. Firstly, a literature review and analysis gives thorough insights into sentiment analysis. Secondly, a sentiment analysis is conducted in which scores are measured with the help of TextBlob. Based on the results of this experiment, a statistical analysis follows in order to interpret them. The results suggest that social media sentiment can have an impact on customer satisfaction.

Keywords

Sentiment Analysis, Customer Satisfaction, Social Media, TextBlob, Natural Language Processing


Table of Contents

Acknowledgements
Abstract
Keywords
1. Introduction
1.1. Motivation
1.2. Research Question
1.3. Research Structure
2. Theoretical Background
2.1. Sentiment Analysis
2.2. Sentiment Analysis within a company
2.3. Case Study
3. Research Method
3.1. Introduction to Experiment objects
3.2. Data Collection
3.3. Sentiment Analysis
3.4. Research Hypothesis
4. Research Results
4.1. Experiment Results
4.2. Discussion
4.2.1. Conclusions
4.2.2. Limitation
4.2.3. Future Work
References
Appendix
A.1 Script for converting csv files to json
A.2 Script for Sentiment Analysis
A.3 Script for Correlation and Graphs
A.4 Script for Case Topic Study and Distribution Graphs

List of Tables

Table 1 Active User's social media accounts on January 2018
Table 2 Case volume September 2017 - March 2018
Table 3 Data fields
Table 5 Average NPS and Sentiment Score of different social media channels
Table 6 Correlation for Case Origins
Table 9 NPS Status and Sentiment Score Status of different social media channels
Table 10 Distribution details for NPS and Sentiment Score
Table 11 Statistical Assessment for NPS, New NPS and Sentiment Score

List of Figures

Figure 1 Message to customers for NPS (Source: Social Media Hub, KLM)
Figure 2 Calculation of NPS (Source: Social Media Hub, KLM)
Figure 3 Example of text processed by TextBlob in Python
Figure 4 Fluctuation trends of NPS for Case Origins
Figure 5 Fluctuation trends of sentiment scores for Case Origins
Figure 6 Sentiment Score Distribution
Figure 7 NPS Score Distribution


1. Introduction

1.1. Motivation

Nowadays, social media play a vital role in shaping people’s way of thinking, considering that more than one fourth of the global population uses them (Ahmad, 2013 & www.statistica.com). Many people regard social media such as Facebook, Twitter and WhatsApp as an integral part of their lives, and most people these days know how to use them and take advantage of them. It is therefore necessary to define social media: they are Web 2.0 Internet-based applications in which user-generated content is the fuel and the user profile is the backbone of the service.

As can be seen in Table 1, the number of social media users and accounts is huge. The enormous impact of social media stems from this scale but also from the content itself. Ordinary users create and manage the content of social media, which differentiates them from conventional internet applications. As Bertot, Jaeger and Hansen (2012) mention, user-generated content is what social media rely on. Moreover, Porter (2010) identifies that the communication pattern has been transformed from one-to-many to many-to-many. According to Kaplan and Haenlein (2010), the higher the presence on social media, the higher the influence on other people’s way of thinking and behavior.

It is undeniable that social media are used by every entity of society. Apart from individuals, every organization that wants to survive in this information age is obliged to use social media in order to communicate and interact with its customers in the best possible way. Regardless of their size or type (governments, commercial firms, industrial organizations, non-profit organizations), they all use social media to improve their relationship with customers in line with today’s expectations. Social media already have a remarkable impact on the economy, society and politics.

Social Media    Number of Active User Accounts
Facebook        2.2 billion
Twitter         330 million
WhatsApp        1.3 billion

Table 1 Active User's social media accounts on January 2018

Social media are becoming the most important media channel for companies to reach customers (Mangold & Faulds, 2009). Given the continuously increasing use of social media and their impact on people’s way of thinking, firms of all sizes pay attention to what is said about them. There is a growing need to analyze the information derived from social media. Sentiment analysis (SA) is increasingly becoming the center of research and practice because organizations need to know the true and objective opinions of people about their services and products. Over time, firms of all sizes spend more and more of their capital on finding ways to identify and analyze customers’ opinions. With sentiment analysis, they detect their weaknesses and can consequently grow stronger. Thus, “what other people think” is a key concept in the decision-making process.

The widespread use of social media has created massive amounts of textual data. That is why one of the best-known sources of big data today is user interaction on social media platforms and mobile applications (Tedeschi & Benedetto, 2015). User-generated content, such as status updates and individual opinions, is becoming a useful resource for understanding what people think about a certain brand. A brand is related to the products and services provided by an enterprise. The image of a brand is clearly shaped in the mind of the customer, and trust is a key concept. Brands need to spot and analyze large amounts of data in order to improve their reputation among customers and thereby enhance their competitiveness. As identified before, through sentiment analysis of social media users’ content a firm has the ability to:

• Detect its weaknesses and limitations (Souza, 2015)

• Discover new knowledge related to brand awareness and behavioral patterns (Micu, Geru & Lixandroiu, 2017)

• Create ad-hoc marketing campaigns and advertisements by analyzing users’ sentiments (Becker, Nobre & Kanabar, 2013)

• Manage its brand image

• Gain a competitive advantage (He, Zha & Li, 2013)

The explosive diffusion of social media streams and their impact on firms and individuals is an interesting and growing phenomenon. This research focuses on customers’ sentiment and customer satisfaction. Sentiment analysis has attracted considerable attention and is applied in many fields. For this research, a case study was carried out in the Social Media Department of KLM Royal Dutch Airlines.

1.2. Research Question

Many studies have examined the impact of social media. Erdogmus and Cicek (2012) identified the impact of social media on brand loyalty, whereas Hudson and Thal (2013) examined their impact on the consumer decision process. Moreover, other studies such as Piller, Vossen and Ihl (2010) studied the co-creation of innovation. The existing research demonstrates the remarkable impact of social media in different fields. Yet none of the previous studies has focused on the impact of social media on customer satisfaction. Given this gap, the question of this research is:

Does a correlation between social media and customer satisfaction exist?

To help answer this question, the following sub-questions were formulated:

And

Does every social media channel have the same effect on customer satisfaction?

1.3. Research Structure

To answer the research question, the first step is to give a thorough insight into the available literature. Chapter 2 provides a literature review and analysis of the concepts used within this research, together with an introduction to the case study and some of the components of the experiment. Chapter 3 describes the experiment’s objects, the details of the data and the sentiment analysis itself, and states the research hypothesis together with the dependent and independent variables. Chapter 4 describes the results of the sentiment analysis and presents a statistical analysis that aims to answer the research hypothesis; on the grounds of statistical significance, the answer to the research question is then concluded.

2. Theoretical Background

2.1. Sentiment Analysis

As mentioned before, social media have changed the way people interact with each other and with companies (Hanna, Rohm, & Crittenden, 2011; Kietzmann, Hermkens, McCarthy, & Silvestre, 2011). Incorporating social media in customer service processes is a logical progression for firms that want to expand and strengthen the communication with their customers. As Andzulis, Panagopoulos and Rapp (2012) mention, social media should be an integral part of a firm’s repertoire because it allows employees to engage customers and build social capital that would “encourage customers to interact engage and establish relationships with them”. As a result, social media can influence the customer satisfaction process. It has to be underlined that, due to the continuously increasing interaction between firms and customers, the power is shifting towards customers (Prahalad & Ramaswamy, 2004). In addition, another result of the increasing cooperation between customers and employees is the co-creation of knowledge and value, leading to equality between them (Greenberg, 2010). Thus, customer contact employees must adapt to the expectations of the customers (Hibbert, Winklhofer, & Temerak, 2012) or risk alienating or losing their customer base (Agnihotri, Dingus, Hu & Krush, 2016). For instance, a Harris Interactive report found that 82% of consumers have discontinued dealing with a company as a result of a negative experience (Right Now Technologies, 2010). Hence, social media could constitute a tool that enables positive customer experiences that meet customers’ expectations.


A recent trend in digital marketing analytics is to track and analyze consumers’ feelings and opinions about the products and services of firms, as expressed in Customer Generated Content (CGC) on social media. Technology used effectively could help firms, in this case KLM, to improve the customer satisfaction process by applying sentiment analysis to CGC derived from social media. Sentiment analysis (SA), also called opinion mining (OM), is a highly interesting research concept within the field of Information Processing (Guerrero, Olivas, Romero & Viedma, 2015). SA is a type of subjectivity analysis (Wiebe, 1994) that identifies positive and negative opinions, emotions and attitudes, based on data derived from documents, journals, magazines or social media and expressed in natural language (Wilson, 2009). SA or OM is the computational treatment of opinions, sentiments and subjectivity of text towards an entity. This entity can represent individuals, events or topics. Essentially, SA determines the polarity of a text: whether it is neutral, positive or negative. Moreover, as Guerrero et al. (2015, p. 2) mention: “the concept of SA encompasses many processes such as extraction of sentiments, sentiment classification, subjectivity classification, opinion summarization or opinion spam detection”. That is why SA can be regarded as a classification process. It has to be noted that, although the terms SA and OM are used interchangeably, some researchers believe that they have different meanings. According to them, OM extracts and analyzes people’s opinions about an entity, whereas SA identifies the sentiment people express and then classifies its polarity (Tsytsarau & Palpanas, 2012).

The main sources of data are product/service reviews found online. These reviews are important to business holders, as they can take business decisions based on the analysis of users’ opinions about their products/services. Furthermore, due to the extremely wide use of social media these days, data can also be derived from them. Many studies in the literature apply sentiment analysis to Twitter data (Souza et al., 2015). Moreover, data can be derived from journals, news, blogs and websites.

There are different approaches to SA for social media textual data. The classification techniques are: (1) lexicon-based and (2) machine learning-based. First, the lexicon-based approach relies on a sentiment lexicon, which is a collection of known words and their corresponding sentiment scores. It is further divided into dictionary-based and corpus-based approaches, which use statistical or semantic methods to find the polarity. This technique is one of the most widely used for classifying text sentiment. It does not require any training dataset, and the classification is based on the negative and positive words in a text. By analyzing a large amount of data, the high error rate can be reduced. The lexicon-based approach has some drawbacks, because it cannot deal with context-dependent words: for example, the word ‘long’ can be interpreted as positive or negative. To address this problem, researchers use a holistic lexicon-based approach. The score is computed from the polarity and strength of each word: positive words are assigned a semantic orientation score of +1 and negative words a score of -1 (Bhuta, Doshi, Doshi & Narvekar, 2014). Secondly, in contrast with lexicon-based techniques, the machine learning approach relies on well-known ML algorithms to solve SA as a regular text classification problem that makes use of syntactic and linguistic features. The machine learning approach is divided into supervised learning, which depends on the existence of labeled training documents, and unsupervised learning. According to Gautam and Yadav (2014), three supervised machine learning techniques are mostly used by researchers: (1) Naïve Bayes, (2) maximum entropy and (3) support vector machine. Naïve Bayes is the most commonly used classifier; it computes the posterior probability of a class (Medhat, Hassan & Korashy, 2014) and assigns documents to a category based on the words they contain. Maximum entropy is similar to Naïve Bayes: it also finds a distribution over classes and provides the polarity of the sentiments (Gautam et al., 2014). Furthermore, the support vector machine separates the different classes by determining linear separators in the search space (Medhat et al., 2014). The classifier is defined by finding the maximum margin between two classes, i.e. a separator that is as far as possible from any document. Maximizing the margin reduces indecisive decisions (Gautam et al., 2014).
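To make the Naïve Bayes idea more concrete, the following minimal sketch computes the posterior probability of each class from word counts. It is an illustration only and not the method used in this thesis: the training messages are hand-made, the function names are hypothetical, and add-one (Laplace) smoothing is assumed to avoid zero probabilities.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Count documents per class and words per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log posterior probability."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        # Log prior: share of training documents in this class.
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Log likelihood of the word given the class, with add-one smoothing.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, hand-made training messages (not KLM data).
training = [
    ("the crew was friendly and helpful", "positive"),
    ("great service and polite agents", "positive"),
    ("my luggage is lost", "negative"),
    ("the flight was delayed and the service was disappointing", "negative"),
]
model = train_naive_bayes(training)
print(classify("polite and friendly agents", model))
print(classify("lost luggage and delayed flight", model))
```

With such a small training set the sketch only shows the mechanics; a supervised classifier of this kind needs a much larger set of labeled documents to be useful.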

The three classification levels of Sentiment Analysis are document-level, sentence-level and aspect-level (Medhat et al., 2014). Firstly, document-level SA focuses on classifying an opinion document as expressing a positive or a negative sentiment. Secondly, sentence-level SA aims to categorize the sentiment expressed in each sentence. It first identifies whether the sentence is subjective or objective; if the sentence is subjective, it then determines whether the sentence has a negative or positive meaning. Liu (2012) mentions that there is no fundamental difference between document-level and sentence-level classification because sentences can be regarded as small documents. Nevertheless, these kinds of SA classification do not operate at the required level of detail, and thus aspect-level SA emerges. Thirdly, aspect-level SA aims to categorize sentiments in relation to specific aspects of the entities. The main step is to identify the entities and their aspects. Opinion holders can give different opinions on different aspects of the same entity, as in the following sentence:

“The agents of KLM are very motivated and polite but, unfortunately, my luggage is lost.”

2.2. Sentiment Analysis within a company

Each company automatically stores all information about services, products, customers, suppliers and transactions in a database. These are big or even massive quantities of information that cannot be used directly for analytical purposes. The data can be analyzed via various data mining algorithms in order to extract hidden regularities and classifications. However, these data are not enough to analyze customers’ behavior. Moreover, businesses require real-time information. That is why the best data source for this goal is social media, where the data is in textual form. When a user sends a message through Facebook or WhatsApp, or posts a tweet on Twitter, he or she creates valuable information that the company’s analysts can use to analyze the weaknesses of the company and the consumers’ needs and desires. Companies succeed or fail in the market because of their customers. For that reason, each company tries to meet the needs of its customers. As Bijaksic, Bevada and Markic (2014) have mentioned, the experience of successful companies shows that only those who know their customers can satisfy their needs and so create loyal customers.


In addition to the traditional instruments for collecting data, such as questionnaires, interviews or direct comments, focus is placed on social networks. Messages sent on social media accounts of a company are systematically reviewed and opinions of customers are recorded. Some examples of messages of customers that express an opinion follow:

“Last time I came to your company X the service was quite disappointing.”

“I lost my luggage during my flight with your company Y.”

“This mobile of company Ω is the best of all!”

Undoubtedly, SA is crucial for all kinds of firms, as well as for organizations such as governments, which need to know the true and objective opinions of people about their services and products (McGlohon, Glance & Reiter, 2010).

2.3. Case Study

Several airlines are among the most active companies in using social media (Social bakers, 2013). Therefore, airline companies constitute an ideal environment for researching the effect of social media on customer satisfaction. Moreover, these kinds of companies can offer a big dataset on which to apply a thorough sentiment analysis for answering the research questions. For this study, the chosen case company is KLM Royal Dutch Airlines, because it is very active on a range of online platforms: Twitter, Facebook, WhatsApp, WeChat, Kakao Talk, LinkedIn, Blog, Pinterest, Instagram and YouTube. KLM, as part of Air France KLM, is a major international player in the aviation industry and a well-known company in The Netherlands, with a general brand awareness of more than 90% (NBTC-NIPO Research, 2011). At present, KLM is considered worldwide as a frontrunner in the commercial use of social media (IFITT, 2012) with, in April 2018, 12.5 million Facebook friends and more than 2 million Twitter followers. Consumers can contact KLM “24/7” via Facebook and Twitter in 9 different languages.

KLM uses a software system, Salesforce, to handle the communication with consumers. In Salesforce, every message of a consumer and every answer of an agent is saved as a social post. A case includes every post of a certain consumer about a specific issue, i.e. the whole discussion between agent and consumer. A case can take place on different social media networks, so there is a parameter called case origin, which indicates the channel in which the conversation takes place. In addition, every case has a social persona, which is the profile of the consumer on a certain channel. A person account consists of one or more social personas, depending on which social media channels he or she uses. It is worth mentioning that every case has an NPS (Net Promoter Score), which is filled in by the consumer in response to a message such as the one shown in Figure 1.


Figure 1 Message to customers for NPS (Source: Social Media Hub, KLM)

If a consumer chooses a score from 0 to 6, he or she is classified as a detractor; if the choice is 7 or 8, he or she is passive; and if the choice is 9 or 10, the consumer is a promoter, i.e. likely to promote the company. At a higher level, the overall NPS rate is calculated as shown in Figure 2. For this research, the NPS score used is the rating each consumer selects for each case.

Figure 2 Calculation of NPS (Source: Social Media Hub, KLM)
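The classification above can be sketched in a few lines of Python. Since Figure 2 is not reproduced here, the overall score below assumes the commonly used NPS definition (percentage of promoters minus percentage of detractors); the function names and the ratings are hypothetical.

```python
def nps_category(rating):
    """Map a 0-10 rating to detractor, passive or promoter."""
    if not 0 <= rating <= 10:
        raise ValueError("NPS rating must be between 0 and 10")
    if rating <= 6:
        return "detractor"
    if rating <= 8:
        return "passive"
    return "promoter"


def overall_nps(ratings):
    """Assumed standard formula: % promoters minus % detractors."""
    categories = [nps_category(r) for r in ratings]
    promoters = categories.count("promoter")
    detractors = categories.count("detractor")
    return 100 * (promoters - detractors) / len(ratings)


# Hypothetical ratings, one per case.
ratings = [10, 9, 8, 7, 6, 3, 10, 9, 9, 5]
print([nps_category(r) for r in ratings])
print(overall_nps(ratings))  # 5 promoters, 3 detractors out of 10 -> 20.0
```

Note that the per-case rating (0 to 10) is what this research uses as the NPS score; the overall rate is shown only to clarify the terminology.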

In terms of data, KLM receives massive amounts of posts every day in several social media platforms. The volume of the data from September 2017 until March 2018, as can be seen clearly in Table 2, is remarkable. The interaction between agents and consumers is growing day by day, making the sentiment analysis itself necessary for the company in order to evaluate the customer satisfaction process.

Month            Total case volume   WhatsApp case volume   Facebook case volume   Twitter case volume
September 2017   116.340             7.385                  85.578                 13.079
October 2017     121.897             30.077                 73.871                 10.796
November 2017    122.523             52.770                 53.702                 10.564
December 2017    183.033             86.712                 73.765                 15.163
January 2018     170.155             83.850                 65.781                 14.115
February 2018    120.662             60.898                 44.348                 10.339
March 2018       150.998             79.622                 53.768                 11.188

Table 2 Case volume September 2017 - March 2018 (Source: Social Media Hub, KLM)


3. Research Method

3.1. Introduction to Experiment objects

To explore whether social media have an impact on customer satisfaction, the first step is to determine which social media networks will be used in this research. Since this is a case study within KLM, Facebook and Twitter will be used. The introductions and motivations are outlined below.

Twitter, the microblogging service created in 2006 in the United States, has gained tremendous attention and attracted a massive number of users (Kwak et al., 2010). Facebook, on the other hand, was the 3rd most popular website in March 2018 (www.wikipedia.com).

Moreover, the Facebook page of KLM was ranked as the best airline Facebook page in April 2018 (www.dreamgrow.com). KLM, as mentioned before, is active on numerous social media platforms, including these two. The choice also took into account that KLM would provide the Twitter and Facebook data.

KLM uses NPS to measure customer satisfaction. Each social post has an NPS given by the corresponding consumer, which, in this research, will be compared with the sentiment score.

3.2. Data Collection

Data were collected through Salesforce, an online customer relationship management (CRM) platform. Firstly, a report was created containing the fields listed in Table 3.

Field          Explanation
Case Number    ID of each social case
Content        The actual message in the form of text
Case Origin    Social media channel
Created Date   Date of the message
NPS Score      Net Promoter Score
NPS Status     Completed or uncompleted NPS
Type of case   There are several types of cases, such as complaints and requests
Reply speed    Calculated in minutes
Case Topic     There are numerous case topics corresponding to different departments/processes of KLM
Case Detail    Each case topic includes numerous case details about the exact issue of the message
Case Phase     The phase of the travel when the message is sent, such as post travel or day of travel
Actioned by    The KLM agent’s name
Account Name   The customer’s name

Table 3 Data fields

Moreover, the following filters were applied to the report in Salesforce in order to obtain the appropriate results:

• Case Origin equals Facebook or Twitter.

• NPS Status equals completed, because if the NPS Status is uncompleted, zero appears in the NPS Score column, which could be confused with a consumer who actually rated the service with zero.

• Photo messages are excluded.

• Messages consisting only of characters such as “?” or “!” are excluded.

• Language equals English.
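In this research the filters were applied inside Salesforce. As an illustration only, the same selection could be expressed in Python on exported records; the dictionary keys below (`case_origin`, `nps_status`, `language`, `is_photo`, `content`) are hypothetical names, not the actual Salesforce field names.

```python
def keep_post(post):
    """Return True if a post passes all report filters described in the text."""
    content = post.get("content", "").strip()
    return (
        post.get("case_origin") in {"Facebook", "Twitter"}
        and post.get("nps_status") == "Completed"
        and post.get("language") == "English"
        and not post.get("is_photo", False)
        # Exclude messages without any letter or digit, such as "?" or "!".
        and any(ch.isalnum() for ch in content)
    )


# Hypothetical example records.
posts = [
    {"content": "My bag is missing", "case_origin": "Twitter",
     "nps_status": "Completed", "language": "English", "is_photo": False},
    {"content": "?", "case_origin": "Facebook",
     "nps_status": "Completed", "language": "English", "is_photo": False},
    {"content": "Thanks for the help!", "case_origin": "WhatsApp",
     "nps_status": "Completed", "language": "English", "is_photo": False},
    {"content": "Great flight", "case_origin": "Facebook",
     "nps_status": "Uncompleted", "language": "English", "is_photo": False},
]
filtered = [p for p in posts if keep_post(p)]
print([p["content"] for p in filtered])
```

Filtering on the NPS status rather than on the score itself keeps genuine ratings of zero in the data set, which is the reason given for this filter.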


3.3. Sentiment Analysis

Typically, in social media like Twitter, a tweet is a text with some special characters such as # for hashtag. In this study, focus is placed on the sentiment that hides behind the text. To be more precise, this research takes as an input the text and the output is a number which represents the sentiment score. Currently, the most popular way to mine text in order to distill a sentiment measurement is sentiment analysis.

For this research, a Python library was used in order to obtain a reliable numerical sentiment score. TextBlob is a Python library for processing textual data, which is free and can be found online (http://textblob.readthedocs.io/en/dev/). Its default sentiment analyzer comes from Pattern, created by De Smedt and Daelemans. It is a powerful tool that is easy to use, which makes it the most suitable tool for this research. TextBlob builds on the Natural Language Toolkit (NLTK) and Pattern, two powerful natural language processing tools. In practice, TextBlob can calculate the “sentiment” of a sentence, a measurement of its emotional content. The sentiment score, also called polarity score, is a float number between -1 and 1. Typically, the number is positive if the sentence says something “good” and negative if it says something “bad”. According to the creators of Pattern, the best value to distinguish the negative from the positive is 0.1. The extreme negative sentiment is represented by -1, whereas the extreme positive is represented by 1. The simplicity of TextBlob comes from its Application Programming Interface (API). For instance, to get a sentiment score as output, a user only needs to type three lines of Python code. As can be seen in Figure 3, the text “KLM has excellent customer service!” is processed by Pattern, the sentiment analyzer of TextBlob. After the execution of this script, the result is 1. This sentiment score, called polarity, is the quantitative evaluation of the sentiment in this particular sentence; in this case, the sentence is completely positive. De Smedt and Daelemans (2012) report an accuracy of about 75% for Pattern’s sentiment analysis.

Figure 3 Example of text processed by TextBlob in Python

According to De Smedt and Daelemans (2012), Pattern is a package written in Python for multiple tasks, including sentiment analysis. Its SA is supported by a large lexicon that contains frequently used words with a corresponding sentiment score for each one. This score indicates the sentiment of the word. To be more specific, when a sentence is processed by TextBlob, it retrieves the sentiment of each word from the lexicon and then calculates the average of all words’ scores.
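The lookup-and-average mechanism described above can be sketched as follows. This is a simplified illustration, not TextBlob's or Pattern's actual implementation: the mini-lexicon and its scores are invented for the example, and averaging only over the words found in the lexicon is an assumption.

```python
import re

# Hypothetical mini-lexicon; the real lexicon is far larger and more detailed.
TOY_LEXICON = {
    "excellent": 1.0,
    "good": 0.7,
    "polite": 0.5,
    "disappointing": -0.6,
    "lost": -0.4,
}


def polarity(text, lexicon=TOY_LEXICON):
    """Average the lexicon scores of the words found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[word] for word in words if word in lexicon]
    if not scores:
        return 0.0  # no known sentiment words: treat as neutral
    return sum(scores) / len(scores)


print(polarity("KLM has excellent customer service!"))
print(polarity("Polite agents, but my luggage is lost."))
```

In this toy version a sentence that mixes a positive and a negative word ends up close to zero, which illustrates why averaged scores for real messages often fall in the neutral range.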


3.4. Research Hypothesis

Once the NPS and sentiment scores were prepared, a correlation analysis was conducted in order to answer the main research question. As soon as the sentiment score of each social post is available, the social posts and their corresponding sentiment scores are organized per case. The same is done for the NPS. The independent variable x is the sentiment score and the dependent variable y is the NPS score. The null and alternative hypotheses of this research are formulated below. The alternative hypothesis was adopted as the research hypothesis.

H0: There is no statistically significant correlation between sentiment scores and NPS scores.

Ha: There is a statistically significant correlation between sentiment scores and NPS scores.

The analysis calculates the p-value using Pearson’s method. The conclusion of the hypothesis test is derived from the comparison of p with the significance level of 0.05. If p is greater than 0.05, the null hypothesis cannot be rejected, which means that no statistically significant correlation between NPS and sentiment scores was found. Otherwise, the null hypothesis is rejected in favor of the alternative hypothesis, which means that a correlation between NPS and sentiment scores exists.

If the correlation exists, the second step of the statistical analysis is to determine its strength. The strength of the correlation is given by r, a float number between -1 and 1 that describes the linear relationship between NPS and sentiment scores: if r is positive, there is a positive correlation between them. In Pearson’s correlation, the closer r is to 1, the more strongly an increase in one variable is associated with an increase in the other.

It should be mentioned that Pearson’s method was carried out in both Python and SPSS Statistics in order to ensure the reliability of the results.

4. Research Results

4.1. Experiment Results

In order to answer the first research question, the following steps were followed. First, the number of cases per topic was counted. To identify the most crucial case topics, the top 20 topics in terms of number of cases were selected. For each case topic, both the sentiment score and the NPS were calculated. To study customer satisfaction per topic, the NPS was taken into account because it is the most objective score for customer satisfaction. The sentiment score, on the other hand, is close to neutral for all of the top 20 case topics. A distribution graph was created and each topic was classified based on its NPS and sentiment score. More precisely, the following conditions were used for the classification:

• If the NPS is between 0 and 6, the status is “Detractor”.

• If the NPS is 7 or 8, the status is “Passive”.

• If the NPS is either 9 or 10, the status is “Promoter”.

• Correspondingly, if the sentiment score is between -1 and 0, it is marked as negative.

• If the sentiment score is between 0 and 0.1 (which is not a significantly high sentiment score), the status is neutral.

• If the sentiment score is greater than 0.1, the status is positive.

The classification conditions for the NPS follow KLM’s standard classification criteria. The threshold values above were chosen in order to classify the sentiment scores. To identify the most crucial topics for customer satisfaction, the focus is placed on the NPS, since the average sentiment score is mostly neutral or slightly positive.
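The classification conditions listed above can be sketched as two small functions. The function names are hypothetical. Two points are assumptions: the “Passive” range (7 or 8) and the treatment of a score of exactly 0 as negative are both taken from appendix script A.4, while the 0.1 threshold follows the text of this section.

```python
def nps_status(nps):
    """Map an NPS rating (0-10) to its status."""
    if 0 <= nps <= 6:
        return "Detractor"
    if nps in (7, 8):  # assumption: range taken from appendix script A.4
        return "Passive"
    if nps in (9, 10):
        return "Promoter"
    raise ValueError(f"NPS rating out of range: {nps}")


def sentiment_status(score):
    """Map a sentiment score in [-1, 1] to its status."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"Sentiment score out of range: {score}")
    if score <= 0:  # assumption: exactly 0 counts as negative, as in A.4
        return "Negative"
    if score <= 0.1:
        return "Neutral"
    return "Positive"


print(nps_status(5), sentiment_status(0.04))
```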

Case Topic | Number of Cases | Average NPS | NPS Status | Average Sentiment Score | Sentiment Score Status
Existing Booking (BO) | 4267 | 7.3 | Promoter | 0.08 | Neutral
Flying Blue & Loyalty (NT) | 1421 | 8.5 | Promoter | 0.07 | Neutral
Flight Distributions (DOT) | 1325 | 5.3 | Detractor | 0.07 | Neutral
Baggage (PFT) | 1117 | 8 | Promoter | 0.1 | Positive
Options (PFT) | | 7.8 | Promoter | 0.1 | Positive
Baggage (PT) | | 5.1 | Detractor | 0.06 | Neutral
New Booking (BO) | | 7.2 | Promoter | 0.09 | Neutral
Check-in (DOT) | | 7.3 | Promoter | 0.07 | Neutral
Flying Blue & Loyalty (PT) | | 8.3 | Promoter | 0.06 | Neutral
Flying Blue & Loyalty (BO) | | 8.2 | Promoter | 0.1 | Positive
Customer Care (PT) | | 5 | Detractor | 0.04 | Neutral
Social Media (NT) | | 7.2 | Promoter | 0.1 | Positive
Airport (DOT) | | 7 | Promoter | 0.1 | Positive
Refund & Compensation (PT) | | 5.3 | Detractor | 0.05 | Neutral
About KLM (NT) | | 8.3 | Promoter | 0.15 | Positive
Marketing & Brand (NT) | | 7 | Promoter | 0.14 | Positive
Travel documents & Requirements (PFT) | | 7.5 | Promoter | 0.1 | Positive
Transit & Transfer (DOT) | | 8.3 | Promoter | 0.1 | Positive
Baggage (DOT) | | 8.3 | Promoter | 0.1 | Positive

Table 4 Case topics

In Table 4, the case topics, the number of cases, the average NPS and the NPS status can be clearly seen. It should be noted that for some topics, such as “Existing Booking (BO)”, most of the consumers are promoters, but the average NPS is 7.3, which means that there are also many “Detractors” and “Passives” who reduce the average score. In this research, the focus is placed on the topics whose average NPS is below 8, which indicates a considerable share of detractors. For that reason, the most crucial case topics that need improvement in terms of the customer satisfaction process are the following, each listed with its share of detractors (NPS status) and of negative messages (sentiment score status):

• Customer Care (PT) – 54% Detractors, 41% Negative

• Baggage (PT) – 54% Detractors, 37% Negative

• Refund & Compensation (PT) – 51% Detractors, 35% Negative

• Flight Distribution (DOT) – 51% Detractors, 36% Negative

• Marketing & Brand (NT) – 33% Detractors, 31% Negative

• Airport (DOT) – 31% Detractors, 33.3% Negative

• New Booking (BO) – 30% Detractors, 28% Negative

• Social Media (NT) – 25% Detractors, 40% Negative

• Existing Booking (BO) – 27% Detractors, 30% Negative

• Check-in (DOT) – 28% Detractors, 41% Negative

• Travel documents & Requirements (PFT) – 23% Detractors, 28% Negative

• Options (PFT) – 23% Detractors, 22.5% Negative

As can be seen from the above information, the percentages of detractors and of negative messages appear to be related. For topics such as Customer Care, the two percentages are at a similar level, which suggests that there may be a correlation between the NPS and the sentiment score, at least for negative messages.
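Percentages such as those listed above can be derived by counting the statuses per case topic. The following is a minimal standard-library sketch; the function name `status_shares` and the sample cases are hypothetical and do not reproduce the data of this research.

```python
from collections import Counter, defaultdict


def status_shares(cases):
    """Return {topic: {status: percentage}} from (topic, status) pairs."""
    counts = defaultdict(Counter)
    for topic, status in cases:
        counts[topic][status] += 1
    return {
        topic: {s: 100.0 * n / sum(c.values()) for s, n in c.items()}
        for topic, c in counts.items()
    }


# Hypothetical cases, not data from this research
cases = [
    ("Customer Care (PT)", "Detractor"),
    ("Customer Care (PT)", "Detractor"),
    ("Customer Care (PT)", "Promoter"),
    ("Customer Care (PT)", "Passive"),
    ("Options (PFT)", "Promoter"),
]

shares = status_shares(cases)
print(shares["Customer Care (PT)"]["Detractor"])
```

The same function can be applied to sentiment score statuses by passing (topic, sentiment status) pairs instead.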

Regarding the second research question, different case channels have different effects on customer satisfaction. Given the results in the case-origin table below, the sentiment scores for both Facebook and Twitter are neutral, so no observations can be derived from the sentiment analysis. The NPS rates, on the other hand, show that Facebook is more efficient in terms of customer satisfaction. It should be mentioned that 79% of the total case volume came from Facebook and only the remaining 21% from Twitter.

More precisely, when the communication takes place through Twitter, the median NPS (50% of the cases) is 8, the first quartile (25% of the cases) is 2 and the average is 6. Over the same period, the median NPS on Facebook is 9, the first quartile is 6 and the average is 7.5.

Case Origin Cases Volume Average NPS Average Sentiment Score

Facebook 11497 7.5 0.1

Twitter 3089 6 0.008


After the above results, Pearson’s method was used in order to identify if there is a correlation between NPS and sentiment scores of Twitter and Facebook data.

Case Origin P-value Pearson’s Correlation

Facebook 0.001 0.240

Twitter 0.001 0.286

Table 6 Correlation for Case Origins

As shown in Table 6, there is a correlation between the NPS and the sentiment scores for both channels. Nevertheless, the correlation is weak for both Facebook and Twitter, with the latter showing a slightly stronger correlation. In Figure 4, both Facebook and Twitter follow approximately the same trend in the NPS rates. Moreover, in Figure 5, the sentiment score trends are similar, especially after January 2018.


Figure 5 Fluctuation trends of sentiment scores for Case Origins

In order to make more accurate remarks, the percentages of each category of NPS status and sentiment score status were calculated (Table 7). As mentioned before, conclusions cannot be drawn from the sentiment score status categories, since most of the consumers are neutral or slightly positive, so the focus is placed on the three categories of NPS status. On Facebook, 53% of the consumers appear to be promoters. The consumers who communicate through Twitter, on the other hand, are divided approximately equally between detractors (41%) and promoters (38%).

Case Origin Detractors Passives Promoters Negatives Neutrals Positives

Facebook 25% 22% 53% 29% 33% 38%

Twitter 41% 21% 38% 35% 31% 34%

Table 7 NPS Status and Sentiment Score Status of different social media channels

In Figure 6, the sentiment score distribution is presented. It is clear that most of the cases range from neutral to slightly positive. The distribution of the NPS, however, is not as expected, because most of the cases are equal to 10. It should be noted that a bar graph (Figure 7) was used for the NPS distribution because the NPS takes discrete values from 0 to 10. In contrast to the NPS, the sentiment score distribution is much more neutral, which makes it even more interesting to examine the possible correlation between these two variables. Details about the distribution of the NPS and the sentiment scores are presented in the table below.

Mean Standard Deviation 25% of the cases 50% of the cases 75% of the cases

NPS 7 3.45 6 10

Sentiment Score 0.085 0.15 0 0.15
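Summary statistics of this kind (mean, standard deviation and quartiles) can be computed with the Python standard library, as in the following sketch. The function name `summarize` and the sample ratings are hypothetical. The `inclusive` method interpolates linearly between the sorted data points, and the standard deviation is the sample standard deviation.

```python
import statistics


def summarize(values):
    """Return mean, sample standard deviation and quartiles of a sequence."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
    }


# Hypothetical NPS ratings, not data from this research
ratings = [0, 2, 6, 7, 8, 9, 10, 10, 10, 10]
for name, value in summarize(ratings).items():
    print(f"{name}: {value:.2f}")
```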


Figure 6 Sentiment Score Distribution

Figure 7 NPS Score Distribution

In total, approximately 82,000 social posts (14,590 social cases) were selected from Facebook and Twitter after filtering. Of these data, 79% came from Facebook and the remaining 21% from Twitter. It is clear that consumers of KLM tend to use Facebook/Facebook Messenger more than Twitter in order to communicate with agents.

Following the sentiment analysis process, the sentiment score for each filtered Facebook message or tweet was obtained with the valuable help of TextBlob.

In order to examine whether there is a correlation between the NPS and the sentiment score, Pearson’s method was used. The correlation was computed in two ways in order to ensure the reliability of the findings: first with Python and second with SPSS Statistics. The results were exactly the same in both implementations.

As displayed in Table 9, the significance value for the correlation between the NPS and the sentiment score is p = 0.001. Because p = 0.001 is less than 0.05, the null hypothesis that there is no statistically significant correlation between the sentiment score and the NPS is rejected. The strength of this correlation, however, is r = 0.253, which shows that the correlation is significant but weak.

Because the correlation is significant but weak, the validity of the NPS was questioned. In order to test this validity, three agents of the Social Media Hub department of KLM each filled in hypothetical NPS rates. These hypothetical NPS rates referred to 350 social posts of the same case topic, Customer Care (PT). After the hypothetical NPS rates had been gathered, the homogeneity of the dataset was checked by means of Levene’s test for the homogeneity of variances. The significance value of the test was p = 0.32. Given that this value is greater than 0.05, Levene’s test is non-significant, so equal variances are assumed.
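The statistic behind Levene’s test can be sketched as follows. Several points are assumptions: the text does not state which groups were compared, so the sketch assumes one group of ratings per agent; it uses the mean-centred variant of the test; and the function name and sample ratings are hypothetical. The sketch only computes the statistic W; the p-value would be obtained from the F distribution with (k - 1, N - k) degrees of freedom.

```python
from statistics import mean


def levene_w(*groups):
    """Levene's W statistic (mean-centred variant) for k groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every observation from its own group mean
    z = []
    for g in groups:
        g_mean = mean(g)
        z.append([abs(v - g_mean) for v in g])
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical ratings by three agents, not data from this research
agent_a = [3, 5, 6, 8, 9]
agent_b = [2, 5, 7, 8, 10]
agent_c = [4, 5, 6, 7, 9]
print(f"W = {levene_w(agent_a, agent_b, agent_c):.3f}")
```

A small W, and hence a large p-value, indicates that the groups have similar variances, which is the situation reported above.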

Subsequently, Pearson’s method was applied in order to test whether there is a correlation between the average of the three hypothetical NPS rates (the New NPS) and the sentiment scores. As displayed in Table 9, the significance value p is lower than 0.05, as p = 0.006. This shows that a correlation between the New NPS and the sentiment score exists, and that r (0.372) is slightly higher than in the correlation between the NPS and the sentiment score (0.253).

Nevertheless, the correlation between the traditional NPS and the New NPS, which was filled in manually by KLM agents, is very strong, as r is equal to 0.783 and the significance value is p = 0.001 (Table 9).

Variables P-value Pearson’s Correlation

NPS – Sentiment Score 0.001 0.253

New NPS – Sentiment Score 0.006 0.372

New NPS - NPS 0.001 0.783

Table 9 Statistical Assessment for NPS, New NPS and Sentiment Score

4.2. Discussion

4.2.1. Conclusions

To answer the first research question, the case topics with the most negative NPS were identified. After the analysis, the topic “Customer Care (PT)” appears to have the highest percentage of detractors. The same percentage of detractors (54%) also applies to the topic “Baggage (PT)”. In addition, “Refund & Compensation (PT)” and “Flight Distribution (DOT)” reach 51% of detractors. These are the topics which seem to be the most problematic. KLM could focus on them, since they offer the most room for improving customer satisfaction.

Secondly, the focus is placed on the two social media channels, Facebook and Twitter. Facebook appears to be the more efficient channel for customer satisfaction, as its average NPS is 7.5. We identified a common trend between the NPS and the sentiment scores of Facebook and Twitter from December 2017 until February 2018. Moreover, the statistical analysis identified a correlation that is significant but weak.

To test the research hypothesis, the Pearson correlation and the significance value were calculated after collecting the independent variable NPS and the dependent variable sentiment score for the period from October 2017 to March 2018. To support the validity of the statistical analysis, methodological triangulation was applied, using Python and SPSS Statistics.

The Pearson correlation r and the significance value p offered sufficient evidence for the acceptance of the alternative research hypothesis. Statistical significance requires at least 95% confidence in the existence of a correlation, and the experiment of this research provides 99.9% confidence that the correlation exists. Nevertheless, this correlation is weak, while the validity of the NPS was confirmed, as described in the Experiment Results. Looking at the distribution graphs of the NPS and the sentiment scores, we would expect most of the sentiment scores to be either clearly positive or clearly negative, as the NPS rates are. However, the sentiment scores appear to be neutral to slightly positive.

In order to examine the unexpected results, line charts showing the fluctuation trends of the NPS and the sentiment scores are presented in Figures 8 and 9. At first sight, there is no correlation between the sentiment scores and the NPS. However, a closer examination of the trends shows that the two move together in the periods January 2018 – February 2018 and March 2018 – April 2018.


Figure 8 and 9 Fluctuation trends for NPS and Sentiment Scores

A possible explanation for these unexpected results is that consumers may express a different sentiment when communicating with a KLM agent than when they are asked to rate the company as a whole. For example, customers often contact KLM in order to voice complaints, especially in the post-travel phase. It is logical that, even if they are polite, the sentiment score will not be highly positive due to the nature of the complaint. Moreover, consumers may be frustrated because of lost baggage or a cancelled flight, but afterwards, when their requests are accepted by KLM, they give a high NPS. It should also be mentioned that, due to the high volume of messages, complaints and requests sometimes take time to be processed, so consumers may think that the customer service process is not as effective as they expected. For such cases, KLM could use automatic messages that keep the customer informed about the stage of his/her request.

Moreover, as mentioned in Chapter 2, the NPS is a score that shows whether consumers would recommend KLM. Sentiment scores, on the other hand, describe the sentiment of a consumer in a specific message or discussion. It is therefore logical that these scores do not represent the same “emotion”, which confirms the saying that humans are unpredictable.

4.2.2. Limitation

Firstly, as can be seen in the distribution graphs of the sentiment scores and the NPS, the sentiment scores are unexpectedly neutral on average. This might be due to the limitations of TextBlob.

Due to time constraints, only three agents of KLM were asked to assign an NPS themselves while reading conversations between agents and consumers. It is possible that, if more agents had given ratings, the correlation between the alternative NPS and the sentiment scores would have been stronger. Moreover, the sample of cases rated by the agents was very small.

4.2.3. Future Work

Inspired by this research, three suggestions are put forward. First of all, researchers who are interested in NLP problems could re-examine whether the appropriate research methods were used for the given experiment objects. It should be checked whether TextBlob is the optimal choice for the current dataset and whether the pre-processing of the data is sufficient. Other classifiers, such as Naïve Bayes, could be used. Furthermore, the relationship between the sentiment scores derived from consumers’ messages and the extent to which consumers would recommend KLM should also be studied in more depth by means of qualitative analysis.

References

Agnihotri, R., Dingus, R., Hu, M. Y., & Krush, M. T. (2016). Social media: Influencing customer satisfaction in B2B sales. Industrial Marketing Management, 53, 172-180.

Andzulis, J. M., Panagopoulos, N. G., & Rapp, A. (2012). A review of social media and implications for the sales process. Journal of Personal Selling & Sales Management, 32(3), 305-316.

Becker, K., Nobre, H., & Kanabar, V. (2013). Monitoring and protecting company and brand reputation on social networks: when sites are not enough. Global Business and Economics Review, 15(2-3), 293-308.

Bertot, J. C., Jaeger, P. T., & Hansen, D. (2012). The impact of polices on government social media usage: Issues, challenges, and recommendations. Government information quarterly, 29(1), 30-40.

Bhuta, S., Doshi, A., Doshi, U., & Narvekar, M. (2014, February). A review of techniques for sentiment analysis of Twitter data. In Issues and challenges in intelligent computing techniques (ICICT), 2014 international conference on (pp. 583-591). IEEE.

Bijaksic, S., Markic, B., & Bevanda, A. (2014). BUSINESS INTELLIGENCE AND ANALYSIS OF SELLING IN RETAIL/POSLOVNA INTELIGENCIA I ANALIZA PRODAJE U MALOPRODAJI. Informatologia, 47(4), 222.

Dijkmans, C., Kerkhof, P., & Beukeboom, C. J. (2015). A stage to engage: Social media use and corporate reputation. Tourism Management, 47, 58-67.

DreamGrow (2018). List of most popular airline facebook pages – www.dreamgrow.com [Online;accessed 5-5-2018]

Erdoğmuş, İ. E., & Cicek, M. (2012). The impact of social media marketing on brand loyalty.


Gautam, G., & Yadav, D. (2014, August). Sentiment analysis of twitter data using machine learning approaches and semantic analysis. In Contemporary computing (IC3), 2014 seventh international conference on (pp. 437-442). IEEE.

Greenberg, P. (2010). CRM at the speed of light: social CRM strategies, tools, and techniques for engaging your customers. New York, NY: McGraw-Hill.

Hanna, R., Rohm, A., & Crittenden, V. L. (2011). We’re all connected: The power of the social media ecosystem. Business horizons, 54(3), 265-273.

He, W., Zha, S., & Li, L. (2013). Social media competitive analysis and text mining: A case study in the pizza industry. International Journal of Information Management, 33(3), 464-472.

Hibbert, S., Winklhofer, H., & Temerak, M. S. (2012). Customers as resource integrators: toward a model of customer learning. Journal of Service Research, 15(3), 247-261.

Hudson, S., & Thal, K. (2013). The impact of social media on the consumer decision process: Implications for tourism marketing. Journal of Travel & Tourism Marketing, 30(1-2), 156-160.

IFITT (2012, 2018). Use of social media – www.ifitt.org [Online;accessed 1-5-2018]

Kaplan, A. M., & Haenlein, M. (2010). Users of the world, unite! The challenges and opportunities of Social Media. Business horizons, 53(1), 59-68.

Kietzmann, J. H., Hermkens, K., McCarthy, I. P., & Silvestre, B. S. (2011). Social media? Get serious! Understanding the functional building blocks of social media. Business horizons, 54(3), 241-251.

Kolchyna, O., Souza, T. T., Treleaven, P., & Aste, T. (2015). Twitter sentiment analysis: Lexicon method, machine learning method and their combination. arXiv preprint arXiv:1507.00955.

Kwak, H., Lee, C., Park, H., & Moon, S. (2010, April). What is Twitter, a social network or a news media?. In Proceedings of the 19th international conference on World Wide Web (pp. 591-600). ACM.

Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis lectures on human language technologies, 5(1), 1-167.

Mangold, W. G., & Faulds, D. J. (2009). Social media: The new hybrid element of the promotion mix. Business horizons, 52(4), 357-365.

McGlohon, M., Glance, N. S., & Reiter, Z. (2010, May). Star Quality: Aggregating Reviews to Rank Products and Merchants. In ICWSM.

Medhat, W., Hassan, A., & Korashy, H. (2014). Sentiment analysis algorithms and applications: A survey. Ain Shams Engineering Journal, 5(4), 1093-1113.

Micu, A., Micu, A. E., Geru, M., & Lixandroiu, R. C. (2017). Analyzing user sentiment in social media: Implications for online marketing strategy. Psychology & Marketing, 34(12), 1094-1100.

Obar, J. A., & Wildman, S. S. (2015). Social media definition and the governance challenge: An introduction to the special issue.

Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends® in Information Retrieval, 2(1–2), 1-135.

Piller, F., Ihl, C., & Vossen, A. (2010). A typology of customer co-creation in the innovation process.

Porter, J. (2010). Designing for the social web, ebook. Peachpit Press.

Prahalad, C. K., & Ramaswamy, V. (2004). Co-creating unique value with customers. Strategy &


Serrano-Guerrero, J., Olivas, J. A., Romero, F. P., & Herrera-Viedma, E. (2015). Sentiment analysis: A review and comparative analysis of web services. Information Sciences, 311, 18-38.

Shah, S., Ahmad, A., & Ahmad, N. (2013). Role of packaging in consumer buying behavior. International Review of Basic and Applied Sciences, 1(2), 35-41.

Smedt, T. D., & Daelemans, W. (2012). Pattern for python. Journal of Machine Learning Research, 13(Jun), 2063-2067.

Social Bakers (2013). Airlines using social media – www.socialbakers.com [Online;accessed 25-4-2018]

Statistica (2018). Active users’ social media accounts on January 2018 – www.statistica.com [Online;accessed 17-4-2018]

Tedeschi, A., & Benedetto, F. (2015, September). A cloud-based big data sentiment analysis application for enterprises' brand monitoring in social media streams. In Research and Technologies for Society and Industry Leveraging a better tomorrow (RTSI), 2015 IEEE 1st International Forum on (pp. 186-191). IEEE.

Tsytsarau, M., & Palpanas, T. (2012). Survey on mining subjective data on the web. Data Mining and Knowledge Discovery, 24(3), 478-514.

Wikipedia (2018). List of most popular websites – Wikipedia, the free encyclopedia. [Online;accessed 25-4-2018]

Wilson, T., Wiebe, J., & Hoffmann, P. (2009). Recognizing contextual polarity: An exploration of features for phrase-level sentiment analysis. Computational linguistics, 35(3), 399-433.

Appendix

A.1 Script for converting csv files to json

import csv
import json

# Column names assigned, in order, to the fields of every CSV row
fieldnames = ("Case Number", "NPS Score", "Type", "Content", "Reply speed",
              "Automated Message", "Case Origin", "Case Topic", "Case Detail",
              "Case Phase", "Created Date")

# Write one JSON object per line, so that A.2 can read the file with lines=True
with open('omdata.csv', 'r', newline='') as csvfile, \
        open('filedata.json', 'w') as jsonfile:
    reader = csv.DictReader(csvfile, fieldnames)
    for row in reader:
        json.dump(row, jsonfile)
        jsonfile.write('\n')

A.2 Script for Sentiment Analysis

from textblob import TextBlob
import pandas as pd
import numpy as np
import datetime

# Input and output files
input_file = "filedata.json"
output_file = "resdata.csv"

df = pd.read_json(input_file, lines=True)


def datafix(content):
    """Clean a message, correct its spelling and return its polarity."""
    content = TextBlob(content)
    content = content.replace("\n", " ")
    content = content.correct()
    content = content.replace("\r", " ")
    sentiment = content.sentiment.polarity
    return sentiment


# Keep only the columns that are needed
df = df[["Case Number", 'Content', "Created Date", "NPS Score", "Type",
         "Reply speed", "Case Origin", "Case Topic", "Case Detail",
         "Case Phase"]]

# Replace the text in column "Content" with its sentiment score via datafix
df.Content = df.Content.apply(datafix)

# Convert "Created Date" to datetime values
df["Created Date"] = pd.to_datetime(df["Created Date"])

# Earlier attempts at daily aggregation, kept commented out
#df['Date'] = df['Created Date'].apply( lambda df:
#datetime.datetime(day=df.day, month=df.month, year=df.year))
#df.set_index(df['Date'],inplace=True)
#df['count'].resample('D', how='sum')
#df.set_index('Created Date').groupby(pd.TimeGrouper('D')).mean().dropna()
#df = pd.groupby(df, by=[df['Created Date'].day()])
#['NPS Score'].mean()

df.to_csv(output_file, index=False)  # save

A.3 Script for Correlation and Graphs

import numpy as np
import pandas as pd
import statsmodels.api as sm
from matplotlib import pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import seaborn as sns

input_file = "lol.csv"
#output_file="linre.csv"
df = pd.read_csv(input_file)

# Encode the categorical case phase as integer codes
df["Case Phase"] = df["Case Phase"].astype('category')
df["Case Phase Num"] = df["Case Phase"].cat.codes

### plot correlations
plt.figure(figsize=(12, 10))
# "Content" holds the sentiment score produced by script A.2
corcols = ['Content', 'NPS Score', 'Reply speed', 'Case Phase Num']
corr = df[corcols].corr()  # Pearson correlation is the pandas default
sns.heatmap(corr, cmap='viridis', vmax=1.0, vmin=-1.0, linewidths=0.1,
            annot=True, annot_kws={"size": 8}, square=True)
plt.show()

########################
# split into train/test and fit the model
df2 = df.drop(columns=['NPS Score', 'Case Number', 'Created Date',
                       'Case Phase', 'Type', 'Case Origin', 'Case Topic',
                       'Case Detail'])
x_train, x_test, y_train, y_test = train_test_split(
    df2, df['NPS Score'], test_size=0.3, random_state=0
    #,stratify=df['label']
)
# x_train = x_train.reshape(-1, 1)
# y_train = y_train.reshape(-1, 1)
# x_test = x_test.reshape(-1, 1)
# y_test = y_test.reshape(-1, 1)
regr = LinearRegression()
regr.fit(x_train, y_train)
y_pred = regr.predict(x_test)

# The coefficients
print('Coefficients: \n', regr.coef_)
# The mean squared error
print("Mean squared error: %.2f" % mean_squared_error(y_test, y_pred))
# Explained variance score: 1 is perfect prediction
print('Variance score: %.2f' % r2_score(y_test, y_pred))

### Plot the Regressor with Seaborn
sns.regplot(x="Content", y="NPS Score", data=df)
plt.show()

A.4 Script for Case Topic Study and Distribution Graphs

import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
import csv

path = "apadisi3.csv"
savepath = "catop.csv"
samecols = ['NPS Score', 'Case Topic', 'Sentiment Score']
df = pd.read_csv(path)


def setNPSStatus(row):
    """Classify a case by its NPS rating."""
    if (row['NPS Score'] <= 6) & (row['NPS Score'] >= 0):
        return 'Detractor'
    elif ((row['NPS Score'] == 8) | (row['NPS Score'] == 7)):
        return 'Passive'
    elif ((row['NPS Score'] == 10) | (row['NPS Score'] == 9)):
        return 'Promoter'


def setSSStatus(row):
    """Classify a case by its sentiment score."""
    if (row['Sentiment Score'] >= -1) & (row['Sentiment Score'] <= 0):
        return 'Negative'
    elif (row['Sentiment Score'] > 0) & (row['Sentiment Score'] <= 0.25):
        return 'Neutral'
    # Upper bound made inclusive so that a score of exactly 1 is classified
    elif (row['Sentiment Score'] > 0.25) & (row['Sentiment Score'] <= 1):
        return 'Positive'


df['NPSStatus'] = df.apply(setNPSStatus, axis=1)
df['SSStatus'] = df.apply(setSSStatus, axis=1)
df.to_csv(savepath)

# Count the cases per topic for every NPS status and sentiment score status
for topic in df['Case Topic'].unique():
    df2 = df[df['Case Topic'] == topic]
    for nps in df2['NPSStatus'].unique():
        print('Topic: ' + str(topic) + ", NPSstatus: " + str(nps) + ", count: "
              + str(len(df[(df['Case Topic'] == topic) & (df['NPSStatus'] == nps)])))
    for content in df2['SSStatus'].unique():
        print('Topic: ' + str(topic) + ", SSStatus: " + str(content) + ", count: "
              + str(len(df[(df['Case Topic'] == topic) & (df['SSStatus'] == content)])))

####################### plots ###########################
print(df['Sentiment Score'].describe())
# fill=True shades the area under the curve (named shade in seaborn before 0.11)
fig = sns.kdeplot(df['Sentiment Score'], fill=True)
fig.figure.suptitle("Sentiment Score Distribution", fontsize=10)
plt.xlabel('Sentiment Score', fontsize=10)
plt.ylabel('Distribution', fontsize=10)
plt.show()

#########################################################
print(df['NPS Score'].describe())
fig = sns.countplot(x='NPS Score', data=df, color='skyblue')
plt.show()