Academic year: 2021


Making Decisions in a Big Data World: Using a Satisficing Algorithm to Efficiently Summarize and Analyze Online Textual Information for Consumers.

Christopher van der Made

Information Studies (track: Business Information Systems)
University of Amsterdam (UvA)
UvA-ID: 6070574
christophervandermade@gmail.com

Supervisor: dr. M.H.A. Koolen (UvA)

Abstract:

In today’s world, information, and in particular web-based information, is becoming increasingly important. The consumer is faced with an ever-growing stream of information, which is also more easily accessible due to the rise of the Internet. The result is often a phenomenon called information overload, in particular when individuals do not have enough time to read an entire text. In this research an attempt is made to help alleviate this problem by developing an algorithm that automatically summarizes texts by selecting their most important sentences, based on the principle of “satisficing”. Satisficing is a methodology that, instead of trying to find the optimal solution, settles for the first adequate solution. The performance of the algorithm is tested in a user study that measures reading comprehension of financial news articles from Reuters after participants either read the generated summary of an article, skimmed the article, or read the article “conventionally”. The first analysis of the user study indicates that there is no difference in comprehension between reading the summary and reading the entire text. Earlier findings, which indicated that reading comprehension is lower after skimming than after conventional reading, were also confirmed. This could mean that reading a summary is ‘good enough’ to base decisions on. However, in a second analysis, in which participants who did not finish the text in the allotted time were excluded, the same conclusion could not be reached due to the reduced sample size; more research is therefore needed to confirm the findings of the first analysis. In conclusion, the current research has produced promising results: using the developed algorithm, it has been shown that automatically generated summaries can be an efficient tool to facilitate satisficing for consumers who do not have enough time to read through an entire piece of textual information, by significantly reducing the amount of time needed to retrieve the same amount of information compared to skimming and conventional reading.

Keywords:

Satisficing, skimming, summarizing, conventional reading, algorithm, corpus, TF*IDF, NLTK, Python, textual information.

Table of contents

1. Introduction
2. Related Work
   2.1 Satisficing
   2.2 Skimming
   2.3 Text Summarization
   2.4 Visual Representation of Text
3. Algorithm
   3.1 Corpus
   3.2 Toolkit
   3.3 Algorithm in Detail
   3.4 Analysis of Tool
4. User Study
   4.1 Materials and Participants
   4.2 Protocol
   4.3 Analysis
5. Results
   5.1 Results post partial-filtering
   5.2 Results post complete-filtering
6. Discussion
7. Conclusion
8. References
   8.1 Literature
   8.2 Links/APIs used for algorithm
9. Appendix
   9.1 Source Code (Python)
   9.2 Planning
   9.3 Texts/summaries for User Study

1. Introduction

In today’s world, information, and in particular web-based information, is becoming increasingly important. Every day, the consumer is faced with a growing stream of information on which to base decisions, and this information is more easily accessible than ever due to the rise of the Internet. The reading of online text is one of the most frequent interactions between humans and computers (Duggan & Payne, 2011). The vast amount of information available on the Internet, and the easy access to it, often causes consumers to feel overwhelmed. It is also predicted that the future of reading will be mobile: an increasing number of consumers read texts on their smartphones, which have significantly smaller screens (Lippincott, 2010). Due to these phenomena, users are very likely to experience information or cognitive overload when seeking information on the Internet (Berghel, 1997). Moreover, consumers do not have the time or the capacity to read through all of the relevant textual information. For these reasons, reading online differs from reading from, for example, a book, resulting in different reading tactics, such as skimming the textual information (Duggan & Payne, 2011). By skimming a text (i.e. focusing only on the “important” sections), readers attempt to maximize the amount of information gained while also taking the search process into account; summarization is a second existing method with the same aim. Little or no research has been done on comparing skimming with the reading of summaries of texts. That is the reason for the current research, in which a possible solution to the information-overload problem will be tested against skimming and against “conventional” reading of texts.
The proposed solution is a tool that automatically summarizes texts on the basis of word frequency (details in section [3]). It will be tested with a user study to discover whether the proposed solution actually improves the process of gaining information in a short period of time in comparison to skimming and conventional reading. This has led to the following research question for the current research:

"In what way can the use of the proposed satisficing algorithm to summarize textual information, in comparison to skimming, improve the decision-making process for consumers?"

In this research automatic text summarization will be used to implement “satisficing”. Satisficing is a decision-making heuristic that entails choosing the first option that satisfies the requirements of the decision-maker, instead of the optimal choice (Moyer, 2007). This heuristic will be further elaborated in the related work section. Skimming and the reading of summaries, both being methods that do not review all information but instead attempt to review only the most important information, can therefore both be characterized as satisficing methods.

The algorithm to be developed is now briefly introduced. By comparing word frequencies in the input text to the frequencies of those words in an associated corpus (i.e. a large collection of textual information), the algorithm attempts to improve on the skimming process by extracting sentences with topical “keywords” from the input text. In this way an alternative to skimming is offered: summarizing the textual information using this satisficing algorithm. The sentences that contain the largest number of these keywords are extracted and presented to the user in the same order as they occurred in the original text. According to Bischof and Airoldi (2012) it is more accurate to summarize textual information not only on the basis of word frequency, but also on the basis of exclusivity for the specific topic. It might therefore be more reliable to change the corpus according to the topic of the text that is to be summarized. To sum up, in the current research an attempt is made to develop a satisficing algorithm that uses word frequencies to summarize textual information. The effectiveness of this new summarizing tool will then be tested against the traditional skimming of a text. To make such an algorithm accessible to consumers, the summaries will be presented in the most visually attractive and cognitively beneficial manner; for example, the font size of presented text has been shown to affect the subjective perception of text (Darroch et al., 2005). Other possibilities to present the text in a visually attractive manner will be considered as well.

The structure of the thesis is as follows. First, an extended literature review will be presented in the related work section, in which the most important terminology is elaborated: the decision heuristic “satisficing” is explained, more information is given on the technique of skimming, the status quo of text summarization is described, and the chapter ends with a section on the use of visually attractive presentations of textual information. Next, the proposed satisficing algorithm to summarize textual information will be presented. Following the algorithm section, the research methods of the user study (materials and participants, protocol and analysis) will be discussed, after which the results of the user study will be presented. Finally, the thesis will end with a discussion and conclusion section.

2. Related Work

A literature review will be conducted to gain more insight into the current topic. First, the advantages and disadvantages of satisficing will be discussed. Second, the currently used skimming technique will be discussed in more detail. Third, the status quo in the field of automatic summarization will be outlined. Finally, the use of visually attractive presentations of textual information will be briefly discussed to gain insight into the (cognitive) advantages it might lend the current tool.

Questions that will be answered in the literature review are:

- Satisficing:

o What are the advantages and disadvantages of satisficing?

o In what fields is satisficing most/best used?

- Skimming:

o What kind of difference is there between skimming/reading texts from paper and from a screen?

o What do information seekers actually try to achieve during skimming and what type of errors do they make?

o How could summaries complement “document triage”?

- Text Summarization:

o What efforts for automatic text summarization have been conducted in the past?

o What domains are interesting to research and what kind of corpora are available for those topics?

- Visual Representation of Textual Information:

o What is a visually/cognitively beneficial method of presenting the summaries to the user?


2.1 Satisficing

In a world where Big Data is increasing exponentially, it has become nearly impossible to make an optimal or best decision based on that data. As Moyer (2007) noticed:

"Optimizing, the art of finding the best choice among all choices is a luxury we can seldom afford. Instead, we settle for the first adequate solution we can find."

This is called “satisficing”: not making the best decision, but the first decision that satisfies the needs of the decision-maker and so suffices (Moyer, 2007). Simon (1995) noted that true rationality, as depicted in the field of game theory, is not possible in real life. Rationality, the weighing of all costs and benefits to optimize the gain of decisions, was an idealistic and unrealistic view of the world (Simon, 1995). As a reaction to this, an alternative model was presented: a satisficing choice model. This model was more realistic, as it incorporates the bounded rationality (i.e. the notion that it is impossible to review all options/information) that individuals have in this world (Simon, 1995). The model entails that most individuals go through their options or information in sequential order, settling for the first option that satisfies (Caplin, Dean & Martin, 2011). A consequence of this choice model is that not all information has been viewed, so it cannot be determined with certainty whether the optimal choice was made. In research by Caplin, Dean and Martin (2011) into the rationale of decision-makers during the incomplete information gathering of satisficing, it was found that participants acted more according to a rule of thumb (e.g. “making trivial decisions what information is relevant and in what order the information is viewed”) than according to a set strategy.

Byron (1998) defined the difference between satisficing and optimizing as local satisficing and global optimizing. Optimizing here was the search for the best means to the best outcome (global optimum) of all the information, whereas satisficing often results in the finding of a local optimum, due to the fact that not all information has been viewed (Byron, 1998). One could define the process of satisficing as optimization where all costs, including the cost of the optimization/search calculations themselves and the cost of getting information for use in those calculations, are considered (Simon, 1995). As a result, the eventual choice is, albeit optimal in the former sense, likely to be sub-optimal in regard to optimizing the solution itself.

Byron (1998) also remarked that there are two reasons for using a strategy while retrieving information: the first is to prospectively decide on a strategy in advance to speed up the process, and the second is to retrospectively explain search behavior to make it fit a certain pattern (Byron, 1998). In the current research skimming will be viewed as a prospective satisficing method, where readers know in advance that they do not have enough time to read a text and thus have to skim to maximize the information gained, while at the same time minimizing the duration of the retrieval process.

In research by Schwartz et al. (2002) on maximizing versus satisficing, it was even shown that individuals who desire to maximize their decisions were less happy, less optimistic and had lower self-esteem. Furthermore, the researchers showed that maximizers/optimizers were less satisfied than satisficers with consumer decisions, and that they were more likely to engage in social comparison (Schwartz et al., 2002). Apparently, maximizing the decision options and choosing the best one of those options is not always preferable, especially for the general consumer public.


To sum up, satisficing is an information retrieval or decision-making strategy that offers the most advantage when it is not possible (usually due to time limits) to review all information or options, which is often the case in a boundedly rational world (Simon, 1995). A disadvantage of satisficing is that the outcome will often be suboptimal with regard to the solution itself. Satisficing can be seen as a behavioral model that uses psychological concepts and economic theory (Kaufman, 1990). The fields in which satisficing is used therefore stretch from economics and game theory to information retrieval and mathematics.

2.2 Skimming

This decision-making strategy can also be useful for the individual consumer, for instance when reading large (online) texts on a computer. As an example, imagine a consumer who has to base a decision on a large amount of textual information, but does not have the time to read through it entirely. In such a case the consumer has to decide which sections are useful and which are not. Usually, in such cases, consumers skim a text and thereby extract partial information from it (Duggan & Payne, 2011). In doing so, consumers try to focus on the important sections of the textual information and to ignore the unimportant sections. When readers skim through a text, they scan the sentences, looking for keywords or phrases that stand out and are essential to the content of the text. Skimming enables them to significantly decrease the amount of time they spend on reading the text (Ahmed et al., 2012). Skimming can be especially useful during document triage. According to Buchanan and Owen (2008) document triage is “the critical point in the information seeking process when the user first decides the relevance of a document to their information need”. During this information seeking process, e.g. searching for academic articles, individuals make snap decisions about whether a document is relevant or not. These snap decisions have been shown to have significant error rates (Buchanan & Owen, 2008).

Skimming can be seen as a separate method of document triage, but it is often used to augment document triage (Buchanan & Owen, 2008). Skimming and document triage, both being decision-making heuristics that incorporate incomplete information gathering, can both be assumed to be forms of satisficing (Lowrance & Lea Moulaison, 2014). These satisficing methods of reading texts become increasingly time-consuming as the size of the texts increases. Moreover, when the size of texts increases, an information seeker is more likely to make errors in deciding which sections are actually of importance and which are not, which decreases the overall information gain. These phenomena emphasize the importance of a reliable satisficing method to retrieve information from textual content.

Extensive research has also been conducted on the difference between skimming (and reading) from paper and from screens. It has long been the public’s opinion that reading from paper is superior to reading from computer screens; with the improvement of screen resolutions, this opinion might not hold anymore (Holzinger et al., 2011). In a study by Holzinger et al. (2011) it was shown that 90% of the surveyed hospital personnel preferred reading from paper over reading from screens. In the same study, however, it was shown that performance showed no significant difference. Apparently, the participants’ self-reported superiority of reading from paper did not result in better performance (Holzinger et al., 2011). In another study it was also shown that the performance of skimming and reading did not differ between paper and screen representations of text (Kol & Schcolnik, 2013). However, it was shown that students who were not experienced in screen reading performed significantly worse when skimming from a screen than from paper. When students are trained to become more experienced with screen reading, though, these effects disappear (Kol & Schcolnik, 2013).


It can be concluded that the public’s opinion that screen reading is inferior to reading on paper is somewhat outdated. In a society where information is frequently needed on demand, it is often not possible to get a printed version of the requested information, which results in the use of electronic screens and the Internet to retrieve it.

2.3 Text Summarization

Summarizing is another technique used to reduce the time in which an information seeker can get a feeling of what a text is about. It does this by condensing the (textual) information into a summary (Ahmed et al., 2012). Summarization can be categorized into two types of techniques. Extractive summarization, like skimming, uses the concatenation of the most important sentences or phrases from the original document (Ahmed et al., 2012). Abstractive summarization, on the other hand, uses the main information from a text to generate a summary in newly formulated words (Nenkova, Maskey & Liu, 2011).

Automatic summarization is the creation of summaries with the use of computational algorithms (Ahmed et al., 2012). Considerable research (for an overview: Nenkova, Maskey & Liu, 2011) has been conducted in the field of automatic summarization, but little to no research has compared the effectiveness of reading automatically generated summaries to the effectiveness of skimming. The current research attempts exactly that: the use of automatic text summarization to improve the process of information retrieval.

Creating abstractive summaries requires a deep understanding of the semantics of the language, since the reorganization, merging and modification of the textual information is needed to produce a compact and fluent summary (Nenkova, Maskey & Liu, 2011). Current state-of-the-art automatic summarization algorithms are not yet able to fully interpret textual information. This is why the vast majority of current algorithms do not generate abstractive summaries, but instead generate extractive summaries based on the most important sentences (Nenkova, Maskey & Liu, 2011). Also, because sentence compression (removing irrelevant words/information from a sentence) often does not result in grammatically correct sentences, complete sentence extraction is chosen for the current tool (Nenkova, Maskey & Liu, 2011). The main objective of sentence extraction is determining which sentences are the most important for the topic of the to-be-summarized text.

The first contribution to the field of automatic sentence extraction was made by Luhn (1958), who suggested that some words in sentences are descriptive of the topic, and that the sentences containing the most of those descriptive words are the most important ones for the topic of the to-be-summarized text (Luhn, 1958). To find the most descriptive words, he proposed to use the frequency of words in the to-be-summarized text. He found that many words have a very high frequency but carry no value for the topic of the text. Luhn (1958) solved this by creating a predefined list of “stop words” (e.g. “the”, “in” and “with”). Finally, Luhn (1958) used arbitrary thresholds to determine when a word’s frequency made it count as descriptive or as a stop word. Later attempts at automatic sentence extraction have elaborated on this early work. In section [3] more literature about automatic text summarization (by sentence extraction) is offered to further explain the specifics of the current algorithm, which makes use of a more refined version of Luhn’s (1958) approach that calculates the so-called “Term Frequency * Inverse Document Frequency”, or “TF*IDF” for short (Salton & Buckley, 1988).
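Luhn’s frequency-based idea can be sketched as follows. This is an illustrative reconstruction, not Luhn’s original implementation: the stop-word list and the frequency threshold below are toy values standing in for his predefined list and arbitrary thresholds.

```python
from collections import Counter

# Toy stop-word list; Luhn (1958) used a predefined list of such words.
STOP_WORDS = {"the", "in", "with", "a", "an", "and", "of", "to", "is"}

def descriptive_words(words, threshold=2):
    """Return the words that occur at least `threshold` times and are
    not stop words, i.e. Luhn-style descriptive (topic) words."""
    counts = Counter(w.lower() for w in words)
    return {w for w, c in counts.items()
            if c >= threshold and w not in STOP_WORDS}
```

Applied to the token list of a text, this yields the candidate topic words; sentences containing many of them would then be selected for the summary.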


2.4 Visual Representation of Text

In the current section various possible visual representations of text are discussed. This is important in order to facilitate the reading of the summaries in the best possible way. In section [4.2] a conclusion will be drawn as to the best visual representation for the summaries generated by the developed tool.

In a study by Darroch et al. (2005) the effect of font size and type on the legibility, reading speed and preference of text was researched, using elderly participants. The main findings were that large fonts (14pt) were more legible, read faster and preferred over small fonts (12pt). Also, “sans serif” fonts (e.g. Arial and Verdana) were preferred over “serif” fonts (e.g. Times New Roman and Georgia). However, it was also found that a 14pt serif font facilitated the fastest reading speed (Darroch et al., 2005).

In other research, Mueller et al. (2014) studied the effects of font size on “judgments of learning (JOLs)”. Judgments of learning are subjective judgments of how well specific information is stored in memory, for example judgments of how well recently studied information will be remembered. These JOLs are important, e.g. during studying, when decisions have to be made about how long to study a certain part of the complete to-be-studied information (Mueller et al., 2014). In this study the participants had to study a set of words, make judgments on how well they would remember them, and take a quiz to actually test their memory. It was found that words presented in a large font size (48pt) received a higher JOL (i.e. were judged to be better memorized) than words in a small font size (18pt) (Mueller et al., 2014). In addition, it was found that the JOLs actually reflected the participants’ memory performance (i.e. words with a high JOL were also memorized better) (Mueller et al., 2014). The effect of font size on JOLs and the importance of JOLs for the accuracy of memory are also confirmed in other studies (McDonough & Gallo, 2012; Pillai, Katsikeas & Presi, 2012).

In a research by Bauer and Cavonius (1980) it was found that dark font colors on a light background are more legible than light font colors on a dark background. The participants were found to read more accurately with the conventional dark text on light background (Bauer & Cavonius, 1980).

3. Algorithm

3.1 Corpus

In the current section the implementation of the developed tool is discussed. To summarize textual information an algorithm was developed that uses a large corpus of English texts as a reference. This corpus is used to retrieve word frequencies that can be compared to the word frequencies in the to-be-summarized text. For the current research the Reuters corpus, containing 10,788 news documents, is used (NLTK, 2015). Reuters is a primarily financial online news agency. The Reuters corpus contains a total of 1.3 million words; for reference, the Oxford English Dictionary contains a little more than 250,000 unique words, of which 171,476 are in current use (Oxford English Dictionary, 2014). The corpus has been categorized into 90 topics, and the documents are split into sentences and words to facilitate natural language research (Reuters, 2015). The Reuters corpus is chosen for the current research because its news articles cover past topics that the participants in the user study can relate to; these articles can then be summarized for the user study.


According to literature it is more reliable to change the corpus according to the topic/type of the text that will be summarized (Bischof and Airoldi, 2012). This is the reason that the algorithm will be designed in such a way that it is ‘corpus-independent’ (i.e. the corpus can be easily changed according to the type of text that needs to be summarized). For the current research the Reuters corpus will be used and thus the type of texts that it can summarize are financial news articles from Reuters.
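The corpus-independent design can be illustrated with a small sketch: any corpus represented as a mapping from document id to word list can be plugged in, the per-document word index is stored once as JSON, and the document frequencies d(w) and the inverse-document-frequency factor used in section [3.3] are derived from it. The toy corpus below is purely illustrative; the actual tool uses the Reuters corpus via NLTK.

```python
import json
import math

# Any corpus can be plugged in as a mapping: document id -> list of words.
toy_corpus = {
    "doc1": ["the", "sugar", "price", "rose"],
    "doc2": ["the", "wheat", "harvest", "was", "poor"],
    "doc3": ["sugar", "and", "wheat", "exports", "grew"],
}

# Build the per-document word index once and store it as JSON
# (word sets are stored as sorted lists, since JSON has no set type).
index = {doc_id: sorted(set(words)) for doc_id, words in toy_corpus.items()}
index_json = json.dumps(index)

# The summarizer later reloads the index and derives d(w) and the
# inverse-document-frequency factor log(D / d(w)) from it.
loaded = json.loads(index_json)
D = len(loaded)

def idf(word):
    d_w = sum(1 for words in loaded.values() if word in words)
    return math.log(D / d_w) if d_w else 0.0
```

Swapping in a different corpus only requires regenerating the JSON index; the summarization code itself remains unchanged.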

3.2 Toolkit

Through the Python toolkit Natural Language ToolKit (NLTK), a large variety of corpora, including some more domain-specific than Reuters, is available (NLTK, 2015). Furthermore, NLTK offers numerous additional functionalities (e.g. automatically retrieving sentences and words from texts in corpora) that facilitate the development of the current algorithm. These are the reasons that Python and NLTK were chosen for the development of the current tool. Eventually, in the near future, the tool could be made web-based with the Django framework.

3.3 Algorithm in Detail

The specifics of the algorithm are now explained in more detail. Roughly, the algorithm can be described in four steps; the to-be-summarized text will from here on be referred to as the “input text”. The first step is to load and normalize the input text. The second step is to calculate the earlier mentioned TF*IDF for every word in the input text. The third step is to calculate the summed TF*IDF per sentence and determine the sentences for the summary. The fourth and final step is to print the sentences in the correct manner and order.

1. The first step is that the input text is loaded into the tool. For the current research news articles from Reuters are summarized. The specifically used articles are described in the materials section of the methods [4.1].

a. With the use of NLTK the input text is separated into sentences.

b. The entire corpus (all news documents in the corpus) is also cut into sentences. Subsequently the input text is made into a Python dictionary with the sentence index as “key” and a list of the words of that sentence as “value”.

c. Every word is converted to lower case letters. This is done because the comparing of words is case sensitive. This means that e.g. “The” will be seen as a different word as “the”. Converting all words to lower case normalizes the words and solves this problem.

d. Additionally, the total number of words in the input text is counted and subsequently the word frequencies of all words in the input text are calculated.

2. These aforementioned word frequencies are needed to calculate the so-called “Term Frequency * Inverse Document Frequency” (TF*IDF), which determines the ‘importance’ of words (Salton & Buckley, 1988). This is the second step of the algorithm. The formula to calculate the TF*IDF is:

TF*IDF(w) = c(w) × log(D / d(w))

a. In this formula c(w) is the word frequency of word w in the input text, and d(w) is the number of documents in the corpus of D documents in which the word appears (Nenkova, Maskey & Liu, 2011). The word frequencies are calculated by dividing the number of occurrences of a word w by the total number of words in the input text.

b. The implementation of the TF*IDF requires the analysis of the entire corpus. To increase the performance of the tool, a JSON (JavaScript Object Notation) file is created that stores all documents as “keys” and the words contained in each news article as “values”. A JSON file stores information in a standardized manner that facilitates easy retrieval. The number of documents that contain a specific word, d(w) in the formula, can then easily be retrieved with Python by counting the keys whose values contain that word. The total number of documents in the corpus, D, can likewise be retrieved by counting the documents. The advantage of creating a JSON file (instead of analyzing the corpus while summarizing) is that the analysis only has to be executed once per corpus.

c. The value of the TF*IDF can be interpreted as follows. Because so-called “stop words” (e.g. “the”, “in” and “with”) are likely to be very frequent both in the input text and in the corpus, their TF*IDF will be close to zero. Keywords that are representative of the topical content of the textual information are likely to be frequent only in the input text, and not in the corpus, resulting in a TF*IDF close to 1 (Salton & Buckley, 1988; Nenkova, Maskey & Liu, 2011). The TF*IDF method thereby counteracts the Zipfian distribution that words follow in textual information (Nenkova, Maskey & Liu, 2011): a few words have a very high frequency (stop words) and many words have a low frequency (keywords) (Baayen, 2001).

3. The third step is to calculate the normalized sum of TF*IDF per sentence:

a. The TF*IDF values of all words in a sentence are summed.

b. These sums are normalized for sentence length by dividing them by the total number of words in the sentence. This can be used to predict which sentences have the most value for the summary (a high normalized summed TF*IDF means a high occurrence of keywords in the sentence).

c. The sentences with the highest normalized sum of TF*IDF will be used for the summary (Yohei, 2003). This is done by sorting the sentences on the basis of the normalized summed TF*IDF, which can be done with Python’s sort function.

4. The fourth and final step is actually outputting the summary in the correct manner and order. Finally the summary of the input text has to be printed:

a. The sentences with the highest TF*IDF are stripped from parentheses, comma’s and quotes (Python’s dictionary notation) and printed in the same order as they appeared in the original input text.

b. The text will be printed in a 14 pt. serif Georgia font (based on section [2.4]). In appendix [8.1] the Python source code of the algorithm is given.
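The four steps can be condensed into the following sketch. The tokenization, the log-based IDF weighting and the function names are simplifying assumptions; the actual tool (appendix [8.1]) operates on the Reuters corpus.

```python
import math
import re

def summarize(text, corpus_df, n_docs, n_sentences=9):
    """Extractive summary: keep the n_sentences sentences with the highest
    length-normalized sum of TF*IDF, in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    tf = {w: words.count(w) / len(words) for w in set(words)}

    def tfidf(word):
        # d(w) = corpus_df.get(word), D = n_docs (from the JSON index)
        return tf[word] * math.log(n_docs / corpus_df.get(word, 1))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # normalized sum: divide by sentence length (step 3b)
        return sum(tfidf(t) for t in tokens) / max(len(tokens), 1)

    top = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    return " ".join(s for s in sentences if s in top)  # original order (step 4a)
```

The final join restores the original reading order of the selected sentences, as described in step 4a.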

If a word from the input text were not present in the corpus, other measures would be necessary. In the current situation this cannot happen, since the input text is part of the corpus. Nevertheless, a possible solution, “smoothing”, is described here to explain how this could be solved. The problem of missing terms in a corpus is common in the information retrieval field and can be resolved with the use of “smoothing” (Zhai & Lafferty, 2001). A reliable method of smoothing is the combination of curve fitting and the so-called Good-Turing estimate (Song & Croft, 1999):

$$P_{GT}(t \mid d) = \frac{(tf + 1)\, S(N_{tf+1})}{S(N_{tf})\, N_d}$$

With this technique the probability P of a term t with frequency tf in document d can be calculated, where N_tf is the number of distinct terms with frequency tf, S the fitted smoothing function and N_d the number of terms in the document. This smoothing technique solves the problem that occurs when a term in the to-be-summarized document is missing from the corpus.
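A sketch of how this estimate could be computed (the placeholder smoothing function S and the function name are assumptions; Song and Croft fit a curve through the N_r counts, which is omitted here):

```python
from collections import Counter

def good_turing_prob(term, doc_tokens, smooth=None):
    """P_GT(t|d) = (tf + 1) * S(N_[tf+1]) / (S(N_[tf]) * N_d), where N_r is
    the number of distinct terms occurring exactly r times in the document
    and N_d is the document length in tokens."""
    tf = Counter(doc_tokens)
    freq_of_freq = Counter(tf.values())  # r -> N_r
    if smooth is None:
        # Placeholder: a real implementation fits a curve S through the
        # N_r values so that S(r) is also defined for unseen r.
        smooth = lambda r: freq_of_freq.get(r, 1)
    r = tf.get(term, 0)
    return (r + 1) * smooth(r + 1) / (smooth(r) * len(doc_tokens))
```

In practice the curve-fitting step is essential, because the raw N_r counts are zero for many values of r.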


3.4 Analysis of Tool

To understand the tool and the Reuters corpus more thoroughly some basic analysis has been done on the output of the tool. First the content analysis of the sentences will be discussed. After that the descriptive statistics of the news articles will be discussed.

Content of sentences

Sentences with a high normalized sum of TF*IDF have been compared to sentences with a low normalized sum of TF*IDF. It appears that sentences with a higher TF*IDF are usually concluding sentences (e.g. “International agreements exist for sugar and wheat.”), whereas sentences with a relatively lower TF*IDF usually contain examples or quotes (e.g. “But Baron added, "No real sanctions are available for a country that doesn't stick to its obligations...””). This is in line with the concept behind the summarizing tool: concluding sentences hold more information than sentences containing examples or quotes. Therefore the summaries should contain sentences with a high normalized sum of TF*IDF, to increase the amount of information in the summary.

Descriptive Statistics

For more detailed statistics, table [1] presents the 10 largest news documents (all 1000+ words) of the corpus. The longest news document in the corpus has a length of 1515 words. The average TF*IDF of the 55 sentences of that text is 0.0088 and its average sentence length is 27.55 words. It appears that the TF*IDF summed per sentence varies around 0.01, which is understandable since, according to the Zipfian distribution, most words in a sentence are stop words and thus have a TF*IDF close to zero (Baayen, 2001). The sentence lengths vary around 28 words per sentence. This last metric might be useful for deciding the cut-off point for how many sentences should be in the summaries. A study that also used Reuters articles, albeit from 1990 (so the corpus has since changed), found that optimal summary lengths were between 85 and 90 words, or 3 to 5 sentences (Goldstein et al., 1999). These were human-created abstractive summaries that did not use sentence extraction. Sentences can be written more concisely when using abstractive summarization than when using sentence extraction (Goldstein et al., 1999). For the current algorithm, since sentences are less concise, a summary length of 9 sentences is chosen to ensure that the summary contains enough information for the readers to make good enough decisions. This is double the size of the summaries from the research of Goldstein et al. (1999), to compensate for the conciseness of abstractive summaries. When a summary is generated from the largest article (1515 words), it has a length of 207 words, an 86.34% decrease in word count compared to the original news document.

Table 1. Descriptive statistics analysis tool (ID’s with “*” are used for user study).

Doc ID Total words Average Sentence Length Average TF*IDF

11224 1515 27.55 0.0088
6657 (text A*) 1216 27.63 0.0101
5214 (text B*) 1208 30.20 0.0117
5985 1202 26.71 0.0117
2521 1122 28.76 0.0091
7135 1202 27.95 0.0114
11083 1207 33.52 0.0107
1963 1008 25.85 0.0111
4944 1088 24.77 0.0109
8746 (text C*) 1037 28.81 0.0110
AVERAGE 1180.5 28.18 0.0107


4. User Study

To test the performance of the text-summarizing tool a user study will be conducted. The user study will have 3 conditions (read summary, skim and read entire text) and the participants will be randomly divided between the conditions. The conditions will be further elaborated in the protocol section. A repeated-measures design is deliberately not chosen, to prevent sequence effects among the participants. According to Tague and Sutcliff (1992), sequence effects can be either positive (i.e. a learning effect) or negative (i.e. a fatigue effect). These sequence effects would severely affect the validity of the results of the user study (Tague & Sutcliff, 1992). As a countermeasure, participants are therefore only tested in one condition. In this section first the materials and participants will be discussed, next the protocol for the user study will be presented and finally the analysis of the results will be discussed.

4.1 Materials and Participants

Selected news documents

Length

Three news documents are chosen from the Reuters corpus with a length of over 1000 words. The length of these texts is of utmost importance, since it would be useless to feed very short texts to the summarizing tool. As mentioned in section [3.4], based on a study by Goldstein et al. (1999), it was decided that each summary consists of the 9 most important sentences. In the same research about evaluation metrics of summaries it was found that the size of a summary is independent of the document length (Goldstein et al., 1999). For the sake of the duration of the experiment, texts of 1000+ words are therefore chosen. This should ensure an order of magnitude reduction in word count of the summaries, compared to the original text.

The chosen three texts are 1216 (text A), 1208 (text B) and 1037 (text C) words in length. This should ensure a significant decrease in length for the generated summaries. The summaries, containing the 9 most important sentences, are respectively 238, 188 and 215 words in length. The word compression is respectively 80.43%, 84.44% and 79.27% in comparison to the original texts. The texts and summaries that are used for the user study can be found in appendix [8.3].
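The compression figures follow directly from the word counts; a one-line helper (the function name is illustrative):

```python
def word_compression(original_words, summary_words):
    """Percentage decrease in word count from the original text to its summary."""
    return round(100 * (original_words - summary_words) / original_words, 2)

# e.g. text A: word_compression(1216, 238)
```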

Difficulty

The texts are chosen to be similar and of the same difficulty level, which increases the validity of the user study. This is achieved by selecting three articles that share more or less the same topic. The three chosen articles are all financial articles, about Latin-American sugar prices, European beet plantings and the U.S. economy respectively. A neutral reader will judge all these financial topics as relatively difficult. This prevents a ceiling effect, where all the participants would score high in all conditions. Because the participants will not have prior knowledge about the topic, they are dependent on the text or summary to gain information.

Questionnaire

The online questionnaire will be composed of three main components. The first component will be the introduction with some basic demographic questions. The second part will be the experiment where the participants are given some more specific instructions, are presented with the text and afterwards are prompted with some questions about the text. The third part will be an expertise and manipulation check. These components will be discussed in more detail now.

Demographics

To get a better picture of the kind of participants being tested, some demographic questions (i.e. age, education and gender) will be asked. At the beginning of the online questionnaire an introduction is given, in which participants are told the goal of the research, how long the questionnaire will take, what the entry requirements are (18+, proficient in English) and that the results will be collected anonymously.

Experiment

Subsequent to the introduction the experiment starts with some more specific instructions, depending on the condition. After this, the participant is presented with the text or the summary. When the timer has passed, a reading comprehension test will be presented to the participants. The participants will be prompted to make true/false decisions on statements about the text. The method for producing these statements is based on a renowned methodology developed by Masson (1983). This method was also used by Duggan and Payne (2006, 2009) and Lowrance and Lea Moulaison (2014) and has been proven suitable for comparing skim reading with conventional reading. The method originally works by thoroughly reading the 3 news documents and extracting 36 sentences that focus on the theme of the news document. The false statements are produced by altering half of those sentences to make them incongruent with the topic of the news document. The 36 sentences are ranked on importance to the topic to determine which sentences will be falsified and which will remain true (i.e. original) statements. The 9 most important and the 9 least important sentences remain true statements. The 18 remaining statements are altered to create the false sentences. The “importance” stays unaltered; only the congruency is changed (Lowrance & Lea Moulaison, 2014).

In the current research the ranking of the sentences on importance will be based on the TF*IDF scores of the sentences. The subjective rating of importance in the research of Lowrance and Lea Moulaison (2014) was performed by linguistic students. For the current research the resources required for such an analysis were not available, and therefore the TF*IDF based ranking is chosen. This should ensure that the ranking is done objectively (in comparison to the subjective rating of importance as seen in e.g. Lowrance and Lea Moulaison (2014)). However, the 9 most important sentences, which would account for the true important sentences, would be the same as the summary. This would bias the experiment towards an advantage for the summary condition. That is why the sentences of the summary are excluded from the selection for the true/false statements. The remainder of the sentences of the text is divided into 4 sections, in descending order of importance (i.e. TF*IDF). For the sake of the duration of the experiment, in the current research only 16 sentences will be used. A selection of 4 of the first section will result in 4 true important statements, a selection of 4 of the second section will result in 4 false important, a selection of 4 of the third section will result in 4 false unimportant and a selection of 4 of the fourth section will result in 4 true unimportant statements. The participants then will have to decide whether a statement is true or false. In the summary condition this means that they have to decide, based on the knowledge they obtained from the summary, if the statements are congruent with the summary or not. Additionally, to not force the participants to guess, an “I don’t know” option is added. 
Since satisficing is about making 'good enough' choices, the summary condition does not have to outperform the conventional reading condition; the choices made only have to be as good, or almost as good, as in the conventional reading condition to be ‘good enough’. The scoring of the participants will be elaborated in section [4.2]. The statements that were used in the user study can be found in appendix [8.4].

Expertise and manipulation check

Finally some expertise questions will be asked to check whether the participants had extensive prior knowledge on the topic of the text. Such knowledge could bias the results, since someone with expertise on the topic can make correct decisions without reading the text. In addition a manipulation check will be performed to check whether the set timer correctly estimated their reading pace and whether the participants understood the presented text and questions. A manipulation check measures whether the intended effect was indeed present on the participants; furthermore it emphasizes the validity of the experiment (Cozby, 2009). In the current case the intended effect is that in the skimming condition the participants do not have the time to read through the text carefully, while in the other two conditions the participants have enough time to read the text (entire text or summary) carefully. Participants that (1) had extensive prior knowledge on the topic, (2) did not understand the text and/or (3) did not understand the questions, did not experience the intended effect and are therefore excluded from the analysis.

Participants

The participants will be contacted via email and other social media (e.g. Facebook). This ensures an efficient way of collecting a large number of participants. To increase the validity of the user study a sample size (N) of “greater than 25 to 30” per condition is desired (Hogg, Tanis & Rao, 2006). Since the participants will be divided between the three conditions, a minimum of 30 participants per condition is sought. A total N of 90 (3x30) participants is thus needed to ensure a high validity for the user study. Participants will not be given a reward for their participation but, if requested, will receive the results of the research when finished.

4.2 Protocol

The participants will be randomly assigned to 9 groups. These groups will be assigned to either reading the tool-generated summary, skimming or reading the entire text. This will be done for three different news texts: A, B or C. These texts will be presented to the participants through the online data collection software Qualtrics, which was chosen because it supports functions like a randomized block design and timers on screens. In table [2] the set-up of the grouping of the user study is presented. As depicted in table [2], groups 1, 4 and 7 will be in the summary condition, groups 2, 5 and 8 in the skimming condition and groups 3, 6 and 9 in the reading condition. The conventional reading (entire news article) condition functions as a control condition, which is included in the user study to compare the other conditions with and to increase the validity of the user study.

Table 2. Group assignment conditions user study.

Text A Text B Text C
Summary (N ≥ 30) Group 1 Group 4 Group 7

Skimming (N ≥ 30) Group 2 Group 5 Group 8

Reading (N ≥ 30) Group 3 Group 6 Group 9

Three different texts are chosen to decrease the dependency on a specific text. During analysis the three groups per condition will be taken together for the calculations. As mentioned before, the protocol for the user study will be fixed with the use of the online software Qualtrics. This software provides a way of accurately setting different times that a participant has to read for every condition. A different timer is set for every condition to ensure that, for example, in the skimming condition participants do not have the time to read the entire text, and that in the conventional reading condition participants do not have the time to memorize the text. After the timer has passed, the participants will be asked the earlier mentioned reading comprehension questions in a new window. These questions will be presented in a random order. The scores of the reading comprehension tests will be compared between conditions.

College students read on average 300 words per minute and skim on average 450 words per minute (Carver, 1992). These measures can be used to make sure that, for example, during the skim condition participants do not get enough time to read the entire text carefully. According to Carver (1992) a college student would take about 5 minutes to read a text of 1500 words, 3.3 minutes to skim the text and less than a minute to read a summary of 200 words. With these reading rates correct timers can be set for all conditions. In table [3] the times in seconds are presented for the user study.

Table 3. Time per condition for reading (in seconds), length of text and summary in parentheses.

Text A (1216 – 238) Text B (1208 – 188) Text C (1037 – 215)
Skimming 162 seconds 161 seconds 138 seconds

Summary 48 seconds 38 seconds 43 seconds

Reading 243 seconds 242 seconds 207 seconds
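The timer values in table [3] can be reproduced from Carver's (1992) rates of 300 words per minute for reading and 450 for skimming (the helper below is an illustrative sketch, not part of the tool):

```python
def allotted_seconds(word_count, words_per_minute):
    """Seconds a participant gets for a text, given a reading rate."""
    return round(word_count / words_per_minute * 60)

# Text A (1216 words): skimming at 450 wpm, reading at 300 wpm,
# and its 238-word summary at the normal reading rate.
print(allotted_seconds(1216, 450), allotted_seconds(1216, 300),
      allotted_seconds(238, 300))  # prints: 162 243 48
```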

On the basis of the earlier-mentioned findings (section [2.4]) it can be concluded that the texts should be presented in a large font (i.e. 14 pt). Furthermore a trade-off has to be made between preferred fonts (i.e. sans serif) and a font that enables fast reading (i.e. serif). Since the satisficing algorithm is meant to save readers time, the 14 pt. serif font (Georgia) is chosen. Finally, the texts will be presented in a conventional black font color on a white background to ensure reading accuracy.

The procedure of the experiment is as follows. First the participants will be briefed about the goal of the user study and receive some instructions. Then they will be asked some demographic questions. Before the reading task the participants will be given different instructions on how to read the text, to stimulate the correct type of reading. Afterwards a manipulation check will be performed to check whether, for example, the participants really had to skim and did not have time to read the whole text. An expertise check will also be performed afterwards, to check whether participants had extensive prior knowledge on the topic of the text. As mentioned before, on the basis of the expertise and manipulation questions participants may be excluded from the final analysis (e.g. if a participant did not understand the presented news article).

4.3 Analysis

The scores between reading the summary, skimming the text and reading the entire text will be compared to measure the performance of the text-summarizing tool in comparison to skimming the text. The performance will be measured by summing the total number of correct answers per participant. Correct answers for ‘important statements’ are scored with 2 points, whereas correct answers for ‘unimportant statements’ are scored with 1 point. Vice versa, incorrect answers for ‘important statements’ are penalized with -2 points, whereas incorrect answers for ‘unimportant statements’ are penalized with -1 point. Statements answered with “I don’t know” are given 0 points. This scoring method has been customized for the current research because the “I don’t know” option was added. Moreover, it has been shown that the important statements represent the ‘macrostructure’ of the text and the unimportant statements the ‘microstructure’ (Duggan & Payne, 2009). The macrostructure is more important for the overall understanding of the text (Duggan & Payne, 2009); therefore those statements are rewarded more than the unimportant statements.
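The scoring scheme can be stated compactly in code (a sketch; the function names and the representation of answers are assumptions):

```python
def score_statement(important, answer, truth):
    """+2/-2 for (in)correct important statements, +1/-1 for unimportant
    ones, 0 for an "I don't know" (idk) response."""
    if answer == "idk":
        return 0
    weight = 2 if important else 1
    return weight if answer == truth else -weight

def total_score(responses):
    """responses: (important, answer, truth) triples. With 8 important and
    8 unimportant statements the total ranges from -24 to +24."""
    return sum(score_statement(imp, ans, tru) for imp, ans, tru in responses)
```

Note that with 16 statements this yields exactly the -24 to +24 score range reported in the results section.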

The achieved score is then a measure of how well the participants understood the news article using the method (summary reading, skimming or conventional reading) of their condition. Statistical analysis (i.e. a one-way ANOVA) will show whether reading comprehension improves compared to skimming when the text-summarizing tool is used (Field, 2013). If there is no significant difference between the summary and conventional reading conditions, this would mean that reading a summary is ‘good enough’ to make true/false decisions about a Reuters news article, which would endorse the text-summarizing tool. The tool would also be endorsed if the scores of the summary condition were significantly higher than those of the skimming condition, as this would indicate that it is better to read a generated summary than to skim through the entire text. Also, for explorative purposes, two more analyses will be performed. The first will analyze the effect of the self-reported level of expertise on the test scores. The second analysis will uncover the number and type of statements that were answered with “I don’t know” (IDK responses) per condition.
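The planned one-way ANOVA reduces to comparing between-group and within-group variance; a self-contained sketch of the F statistic (in practice a statistics package reports the accompanying p-value as well):

```python
def one_way_anova_f(*groups):
    """F statistic and (between, within) degrees of freedom for k groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, (df_between, df_within)
```

The three arguments would be the score lists of the summary, skimming and conventional reading conditions.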

5. Results

In the current section the results from the user study are presented. As mentioned in section [4.2] the groups of the 3 texts of the same condition are taken together for the analysis to decrease the dependency on a certain text.

In the current survey a total of 106 participants took part. Of those participants 40% were female and 60% male. The majority of the respondents were college students or graduates between 18 and 29 years old. In tables [4] and [5] the more specific demographics are presented.

Table 4. Age groups participants user study.

Age Percentage

18-29 years 77%

30-39 years 10%

40-64 years 10%

65 years or older 4%

Table 5. Highest completed education participants user study.

Highest completed education Percentage

High school 10%

College undergraduates 26%

Trade/technical/vocational training 3%

College graduates 45%

Postgraduates 17%

In the following sections, first the analysis in which the participants are partially filtered will be discussed. After that the results of the analysis with complete filtering will be presented. The difference between the two filtering processes will be explained in further detail in those sections.


5.1 Results post partial-filtering

For the analysis of the results of the 106 conducted experiments some filtering had to be performed. First, all cases that were not fully completed were filtered out and excluded from analysis. All participants who indicated that they had extensive prior knowledge on the topic of the text were also excluded, to ensure that all participants depended on the presented text for the information needed to answer the questions. Finally, participants who strongly indicated that they did not understand the text or the reading comprehension questions were filtered out. Of the initial 106 participants, 22 cases were filtered out accordingly and 84 participants remained to be analyzed. To clarify: these were participants who had not indicated to be an expert, who understood both the text and the questions and who had completed all the components of the experiment. In section [5.2] an additional filter is applied as well: participants who strongly indicated that they did not have enough time to read or skim through the presented text are also excluded, which left only 58 participants for the last analysis. In this section the results are analyzed without this filter, because even though some participants had indicated not to have completely finished reading or skimming, they all understood the text and the associated reading comprehension questions.

Table 6. Descriptive statistics conditions post partial-filtering.

Condition N Mean Std. Deviation

Summary 29 4.34 5.479

Skimming 30 3.83 4.829

Conventional reading 25 7.84 6.349

Total 84 5.20 5.741

In table [6] the descriptive statistics are presented. The mean scores are the averaged scores of all three texts in the same condition. The minimum possible score is -24 (all statements incorrect) and the maximum score is 24 (all statements correct). Furthermore, if a participant had selected “I don’t know” for every statement, a score of 0 would be achieved (hence the explorative analyses at the end of the sections). As can be seen in table [6], the spread of the values is reasonably high (i.e. the standard deviation is of the same order of magnitude as the mean).


This reasonably high spread or dispersion (i.e. the range) can also be seen in the boxplot in graph [1]. It can also be seen that the summary condition has the lowest interquartile range. The summary condition also had one outlier (i.e. case 58), who scored considerably higher than average in that condition. Furthermore, when looking at the median, it can be observed that the skimming condition scored slightly lower than the summary condition and that the conventional reading condition scored the highest.

First it will be determined whether the data is normally distributed, next if the assumption of equality of variances is not violated, and after that it will be determined whether the observed differences are significant.

Normality test Shapiro-Wilk & Levene’s test of Homogeneity

According to the Shapiro-Wilk test, which is most appropriate in small samples (<50), none of the conditions differed significantly (alpha = 0.05) from a normal distribution (Field, 2013). The results of the Shapiro-Wilk test can be seen in table [7].

Also the assumption of equality of variances (homogeneity) is not violated. According to Levene’s test of Homogeneity there was no indication of unequal variances between the conditions (F(2,81) = 2.094, p = .130).

The fact that the data is normally distributed and that variances between conditions are not significantly unequal, even though the spread of the values is reasonably high, means that it is possible to perform a one-way ANOVA to reveal whether there is a significant difference between the conditions (Field, 2013).


Table 7. Shapiro-Wilk test for normality.

Condition Statistic df Significance

Summary .963 29 .379

Skimming .951 30 .176

Conventional reading .969 25 .629

One-Way ANOVA

There was a statistically significant difference between groups as determined by one-way ANOVA (F(2,81) = 4.100, p = .020). A Tukey post-hoc test revealed that the score on the reading comprehension test was not statistically significantly different after skimming the text (3.83 ± 4.829, p = .933) or after reading the entire text (7.84 ± 6.349, p = .060) compared to reading the summary (4.34 ± 5.479). However, skimming the text resulted in significantly lower scores than reading the entire text (p = .024), a difference in mean scores of about 4 points. This means that the conventional reading condition performed significantly better than the skimming condition, whereas there was no significantly different performance between the summary and the conventional reading condition.

Explorative Analysis: influence of expertise on scores

To test the influence of the self-reported expertise level on the scores of the experiment, a one-way ANOVA test was conducted. In table [8] the descriptive statistics are presented. As can be seen in table [8], no participant self-reported a high level (“agree” or “strongly agree”) of expertise in response to the statement “I had extensive prior knowledge on the subject of the text”. The filter for expertise was obviously removed for this analysis. Participants who indicated to “strongly disagree” with the statement were participants who emphatically did not have any extensive prior knowledge on the subject.


A Levene’s test of Homogeneity was performed to check whether the assumption was met to perform a one-way ANOVA. The normality check was already performed for the previous ANOVA and showed that the data was normally distributed. According to Levene’s test of Homogeneity there was no indication of unequal variances between the conditions (F(2,81) = 1.595, p = .209). This means the assumptions are met to perform one-way ANOVA.

Table 8. Descriptive statistics expertise levels post partial-filtering.

Condition N Mean Std. Deviation
Strongly Disagree 56 4.11 5.423

Disagree 26 6.92 5.782

Undecided 2 13.50 .707

Total 84 5.20 5.741

There was a statistically significant difference between groups as determined by one-way ANOVA (F(2,81) = 4.652, p = .012). However, a Tukey post-hoc test revealed that the scores did not differ significantly when compared pairwise. As can be seen in table [8], the “undecided” or neutral group had the highest scores and the “strongly disagree” group the lowest. This indicates that the level of expertise did indeed influence the scores on the reading comprehension test.

Explorative Analysis: number of IDK responses

Using a multiple response cross-tabulation the percentage of IDK responses per condition was determined. As can be seen in table [9], when looking at all statements, the summary condition has the highest percentage of IDK responses. The skimming condition had a slightly higher percentage of IDK responses than the conventional reading condition. These percentages did not differ significantly, however, as revealed by an unpaired samples t-test. Also, there are no large differences in the percentages between important and unimportant statements. In the summary condition the percentage for unimportant statements was slightly higher; in the other two conditions it was the other way around. These percentages did not differ significantly either, as revealed by a paired samples t-test. Apparently the participants did not know the answer to an almost equal number of important and unimportant statements.

Table 9. Percentage of IDK responses, post partial-filtering.

Condition All statements Important Unimportant
Summary 44.21% 42.86% 45.56%

Skimming 28.88% 28.75% 27.29%

Conventional Reading 28.46% 29.71% 28.45%

Total 33.50% 33.77% 33.76%

Knowing that the participants in the summary condition had not seen the sentences before, these results are not striking. When looking at the scores compared to the percentages, however, the scores of the summary condition were not the lowest. A possible explanation might be that the skimming condition made more errors (i.e. incorrect answers). These errors might have occurred because participants had seen a statement before, and therefore did not answer it with “I don’t know”, but had not correctly stored the congruity of the sentence in their memory. Also, the skimming and conventional reading conditions had a similar percentage of IDK responses, yet the scores of the skimming condition did differ significantly from the conventional reading condition. A possible explanation might be that the participants in the skimming condition incorrectly judged their knowledge of sentences, causing the lower scores. It seems as if the skimming condition was overconfident in their judgments of learning (JOL, mentioned in section [2.4]), whereas the summary condition had a more correctly assessed JOL.

5.2 Results post complete-filtering

In addition to the filters from the previous section, participants who strongly indicated that they did not have enough time to read or skim (depending on the condition) the presented text are also filtered out for the following analysis. After that filtering, only 58 of the 106 participants remained to be included in the analyses. These 26 additionally excluded participants had, however, indicated that they understood the presented text and the associated reading comprehension questions.


Table 10. Descriptive statistics conditions post complete-filtering.

Condition N Mean Std. Deviation

Summary 19 4.47 6.372

Skimming 18 4.61 4.705

Conventional reading 21 8.84 6.349

Total 58 5.97 6.170

Graph 2. Distribution of scores per condition post complete-filtering.

As can be seen in the descriptive statistics in table [10], the spread or dispersion of the values is also reasonably high (i.e. the standard deviation is of the same order of magnitude as the mean). This is also depicted in graph [2]. Similar to the first analysis, the interquartile range of the summary condition is the smallest. Moreover it can be seen that the summary condition scored the lowest and the conventional reading condition the highest; the skimming condition scored just marginally better than the summary condition. When looking at graph [2], the same order in means is also reflected in the order in medians.


Like in section [5.1] first a check for normality and test for homogeneity will be performed, afterwards a one-way ANOVA will be conducted to determine whether the observed differences are significant. Finally the same two explorative analyses as in the previous section will be performed as well.

Shapiro-Wilk test of normality & Levene's test of homogeneity

According to the Shapiro-Wilk test, which is most appropriate for small samples (< 50), none of the conditions deviated significantly (alpha = 0.05) from a normal distribution (Field, 2013). The results of the Shapiro-Wilk test are shown in table [11].

Table 11. Shapiro-Wilk test for normality.

Condition Statistic df Significance

Summary .954 19 .454

Skimming .975 18 .880

Conventional reading .972 21 .786

Furthermore, the assumption of equality of variances (homogeneity) is not violated: according to Levene's test of homogeneity, there was no indication of unequal variances between the conditions (F(2,55) = 1.669, p = .198).

The fact that the data are normally distributed and that the variances between conditions are not significantly unequal, even though the spread of the values is reasonably high, means that a one-way ANOVA can be performed to reveal whether there is a significant difference between the conditions (Field, 2013).
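As a sketch of what Levene's test computes, the function below follows the textbook definition of the statistic: the absolute deviations of each score from its group mean are fed into an ANOVA-style between/within ratio. The score lists are hypothetical; in practice a statistics package (e.g. SPSS, or `scipy.stats.levene`) would be used, and the p-value would be read from the F distribution.

```python
def levene_statistic(groups):
    """Levene's W for a list of score lists (mean-centered variant)."""
    k = len(groups)                      # number of conditions
    n = sum(len(g) for g in groups)      # total participants
    # Absolute deviations of each score from its own group mean.
    z = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum(sum((v - zm) ** 2 for v in zi) for zi, zm in zip(z, z_means))
    return ((n - k) / (k - 1)) * between / within

# Hypothetical data: a W close to 0 suggests similar variances.
print(levene_statistic([[1, 2, 6], [2, 4, 6]]))  # ≈ 0.5714
```

A small W (relative to the critical F value for the corresponding degrees of freedom) means the homogeneity assumption is not rejected.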

One-Way ANOVA

There was no statistically significant difference between the groups as determined by a one-way ANOVA (F(2,55) = 2.911, p = .063). Although the pattern of scores was similar, the significant difference was no longer present. To sum up: in this section an additional 26 participants were excluded from the analysis, which caused the significant difference in score between the skimming and conventional reading conditions, found in the previous section, to disappear.
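The F ratio reported above can be illustrated with a minimal pure-Python implementation of the one-way ANOVA computation: the between-groups mean square divided by the within-groups mean square. The score lists below are hypothetical, not the study's data, and converting F to a p-value would additionally require the F distribution (e.g. `scipy.stats.f.sf`).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)                        # number of conditions
    n = sum(len(g) for g in groups)        # total participants
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-groups sum of squares: spread of scores around their group mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical data for three conditions:
f, dfb, dfw = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F({dfb},{dfw}) = {f}")  # prints "F(2,6) = 13.0"
```

The degrees of freedom follow the same pattern as in the reported result, F(2,55): two between-groups degrees of freedom for three conditions, and N minus the number of conditions within groups.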

Explorative analysis: influence of expertise on scores

For the complete-filtering selection, the influence of the self-reported expertise level on the experiment scores was also analyzed. A one-way ANOVA was conducted to test whether there was a significant difference between expertise levels; table [12] presents the descriptive statistics. As can be seen in table [12], no participant self-reported a high level of expertise ("agree" or "strongly agree") in response to the statement "I had extensive prior knowledge on the subject of the text". As in the previous analysis, the filter on expertise was removed for this analysis. Participants who indicated "strongly disagree" firmly reported having no extensive prior knowledge of the subject.

Table 12. Descriptive statistics expertise levels post complete-filtering.

Expertise level N Mean Std. Deviation

Strongly Disagree 31 4.71 6.187

Disagree 25 6.92 5.901

Undecided 2 13.50 .707

References
