Faculty of Electrical Engineering, Mathematics & Computer Science
Detection and tracking of events using open source data
Jordy M. van der Zwan
M.Sc. Thesis
December 2020
Supervisors:
dr.ir. M. van Keulen
dr. M. Theune
Faculty of Electrical Engineering,
Mathematics and Computer Science
University of Twente
P.O. Box 217
Abstract
This research focuses on designing a generic event detection system that uses open source data. Good performance is also a requirement, so that private individuals are able to use the system as well. The system must be able to detect events in real time based on messages from a message stream. To achieve this goal, we first explore what should be considered an event by examining existing definitions and building our own definition based on the components observed in them. Second, we create an overview of the pieces of information that can be displayed to the user of the system in order to communicate an event. An event detection system was then designed that relies on a user-defined reference model supported by Named Entity Recognition. The reference model plays a key part in linking keywords that have the same meaning and in extracting meaning from the messages in the message stream. The design was evaluated on both recall and precision, using a Twitter data stream as the message stream. Taking into account the limitations of the available data, the design reached a peak recall of 80% and a precision of 66%. The design performed sufficiently and still has potential for improvement in future work.
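To make the role of the reference model more concrete, the following is a minimal Python sketch, not the implementation described in this thesis. It assumes a hypothetical user-defined reference model (`REFERENCE_MODEL`) that maps each canonical concept to the keywords and entity names that refer to it, and it takes the Named Entity Recognition output as a plain list of strings instead of calling a real NER library. The names `link_concepts` and `precision_recall` are illustrative; the second function shows event-level precision and recall when detected events are matched to ground-truth events by identifier, which is only one simple way to define a match.

```python
from __future__ import annotations

import re

# Hypothetical user-defined reference model: each canonical concept lists
# the surface forms (keywords, hashtags, entity names) that refer to it.
# The concepts and aliases are illustrative assumptions.
REFERENCE_MODEL: dict[str, set[str]] = {
    "Enschede": {"enschede", "#enschede"},
    "fire": {"fire", "blaze", "flames"},
    "police": {"police", "cops"},
}

# Invert the model once so that every alias maps to its canonical concept.
ALIAS_TO_CONCEPT: dict[str, str] = {
    alias: concept
    for concept, aliases in REFERENCE_MODEL.items()
    for alias in aliases
}


def link_concepts(message: str, entities: list[str]) -> set[str]:
    """Return the canonical concepts mentioned in a message.

    `entities` stands in for the output of a Named Entity Recognition step;
    here it is simply passed in as a list of strings.
    """
    # Lower-case word tokens, keeping a leading '#' so hashtags can match.
    tokens = re.findall(r"#?\w+", message.lower())
    candidates = tokens + [entity.lower() for entity in entities]
    # Keywords with the same meaning collapse onto one canonical concept.
    return {ALIAS_TO_CONCEPT[c] for c in candidates if c in ALIAS_TO_CONCEPT}


def precision_recall(detected: set[str], ground_truth: set[str]) -> tuple[float, float]:
    """Event-level precision and recall, matching events by identifier."""
    true_positives = len(detected & ground_truth)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


if __name__ == "__main__":
    print(link_concepts("Big blaze near #Enschede station", ["Enschede"]))
    print(precision_recall({"e1", "e2", "e3"}, {"e1", "e2", "e4", "e5"}))
```

For example, `link_concepts("Big blaze near #Enschede station", ["Enschede"])` yields the concepts `fire` and `Enschede`, and `precision_recall({"e1", "e2", "e3"}, {"e1", "e2", "e4", "e5"})` gives a precision of 2/3 and a recall of 0.5. Inverting the model into an alias lookup table once keeps the per-message cost to one dictionary lookup per token, which suits a light-weight system.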
Contents
1 Introduction
1.1 Motivation
1.2 Use cases
1.3 Challenges for event detection
1.4 Research questions
1.5 Research method
1.6 Thesis overview
2 Defining an event
2.1 Existing definitions
2.2 Problem description
2.3 Event properties
2.4 Event hierarchies
2.5 Reframing an event
2.6 Conclusion
3 Communicating an event
3.1 Informal user study
3.2 Internal information
3.3 External information
3.4 Coding systems for events
3.5 Communicating the big picture
3.6 Conclusion
4 Detecting and tracking an event
4.1 Related work
4.2 Requirements
4.3 Global design
4.4 Detailed design
4.5 Summary
5 Evaluation
5.1 Event metrics
5.2 Evaluating Recall
5.3 Evaluating precision
6 Discussion
6.1 Discussion of recall evaluation
6.2 Discussion of precision evaluation
6.3 Performance
6.4 Limitations
7 Conclusion
7.1 Answers to the research questions
7.2 Evaluation
7.3 Future work
References
Appendices
Chapter 1 Introduction
The world is generating an ever-increasing abundance of information, some of which is relevant to a user, but most of which is irrelevant. Finding the relevant information among this vast amount of data has become infeasible for humans to do manually.
This work focuses on detecting real world events that happen and are relevant to the user of the system.
Aims
The aim of this work is, firstly, to analyse the factors that need to be considered when deciding what counts as an event in the context of event detection. The second goal is to provide an overview of how these events can be represented. Based on the answers to these questions, a lightweight, generic event detection system that can be configured for multiple use cases will be designed. Although the system is not specifically aimed at Twitter data, Twitter will be used as an example in many cases because of the extensive related work on event detection using Twitter data.
1.1 Motivation
A tremendous amount of data is available on the Internet to everyone who wants to use it. The amount of openly available data is increasing, for example on social media platforms such as Twitter. The users of such platforms produce enormous numbers of messages, with the record being 143,199 tweets per second in August 2013¹. This peak differs drastically from the average of 500 million tweets per day, which translates to roughly 5,800 tweets per second. Among the messages about what people had for breakfast, more 'valuable' information is tweeted as well: journalists and others use Twitter to disseminate news about events happening in the world.
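As a worked check of the rates quoted in this paragraph, converting the daily average into a per-second rate and comparing it with the 2013 peak gives:

```latex
\[
\frac{500 \times 10^{6}\ \text{tweets/day}}{24 \times 60 \times 60\ \text{s/day}}
  = \frac{5 \times 10^{8}}{86\,400}\ \text{tweets/s}
  \approx 5\,787\ \text{tweets/s},
\qquad
\frac{143\,199}{5\,787} \approx 24.7
\]
```

The record peak was therefore roughly 25 times the average tweet rate.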
Detecting these happenings through the messages that are being sent through data