
Event Detection Using Machine Learning Classifiers in the Context of Real-World Objects

Timon Langlotz

January 25, 2016

Program: Master Software Engineering
Organization: Peerby B.V.
Company Supervisor: Eelke Boezeman

UvA Supervisor: Dr. Vadim Zaytsev


Abstract

Within this thesis, an Internet of Things framework is explored which utilizes machine learning algorithms to detect learned events in the context of real-world objects. Several areas are identified where this technology can be a valuable asset. A prototype based on an Android smart phone is developed, and results obtained with classification algorithms are presented to the reader.


Acronyms

API Application Programming Interface
CRF conditional random field
EP emerging pattern
ERP enterprise resource planning
ETHZ Eidgenössische Technische Hochschule Zürich
GPS Global Positioning System
HMM hidden Markov model
IoT Internet of Things
MIT Massachusetts Institute of Technology
RFID radio-frequency identification
RIPPER Repeated Incremental Pruning to Produce Error Reduction
SCCRF skip-chain conditional random field
WEKA Waikato Environment for Knowledge Analysis


Contents

1 Introduction
  1.1 Research Questions
  1.2 Related Work
  1.3 Outline
2 Motivating Examples
  2.1 Machine Health Checks
  2.2 Collaborative Consumption of Ownerless Products
  2.3 Smart Trash Cans
3 Background and Context
  3.1 Internet of Things
  3.2 Machine Learning
  3.3 Human Activity Recognition
  3.4 WEKA
4 Research Method
  4.1 Data Sources
  4.2 Capturing a Training Record
  4.3 Building an Event Model
    4.3.1 Time Series Based Forecasting
    4.3.2 Feature Extraction
5 Research and Results
  5.1 Experiments
  5.2 Results
    5.2.1 Multiclass Classification
    5.2.2 One-class Classification
6 Analysis and Conclusions
  6.1 Evidence
  6.2 Threats to Validity
A Appendix
  A.1 Hardware

1 Introduction

Continuous miniaturization and cost reduction of electronic devices is increasingly enabling Internet of Things (IoT) applications, i.e. the connection of physical objects to computational devices in order to create “smart” objects. Today, IoT technology is applied in various areas with various purposes. Such applications include, but are not limited to, real-time tracking of goods in logistics, resource-friendly light and climate control in residential buildings, and self-monitoring and calling for maintenance of machines in factories. In order to achieve the functionality of the aforementioned purposes in a reliable way, a certain level of context awareness is needed to identify the events which should trigger the corresponding actions [33]. While events might be intuitive for humans and easy to perform or recognize, it is difficult to formalize static program instructions that identify them. A viable solution comes in the form of machine learning algorithms. Such algorithms demonstrate their strength in this field by building models from exemplary input, allowing automated decisions to be made based on hard data [5]. Within this thesis, a framework for IoT devices is researched that utilizes machine learning algorithms in order to recognize events based on sensor data input.

1.1 Research Questions

Two research questions have been formulated and were investigated over the course of this project:

• Can we identify events in the context of real-world objects by analysis of available sensor data with machine learning algorithms?

• Which machine learning concepts are suitable to do so?

1.2 Related Work

The research that was conducted over the course of this project was largely inspired by research in the field of human activity recognition. In relevant publications in this field, sensor data, mostly from accelerometers, is used to detect human activities such as walking, running, or standing. Bussmann et al. [8] use accelerometers at the thighs, trunk, and lower arms in order to detect postures and motions of human beings. Similar to this work, features of the recorded sensor data are extracted and analyzed, but then classified based on threshold values. Lee and Mase [38] utilize an accelerometer and gyroscope in an odometry system for humans. Part of the work is a unit motion recognition system which uses the standard deviation of accelerometer data


for rule-based classification of activities. Bao and Intille [4] classify human activities with data from five accelerometers by using supervised machine learning algorithms while focusing on naturalistic, non-laboratory settings. As in this work, the authors use features of the accelerometer data and algorithms from the Waikato Environment for Knowledge Analysis (WEKA) [21], a Java machine learning software library, in order to classify 20 activities. In contrast to this, Ravi et al. [47] reduced the number of accelerometers to one and the number of activities to eight while still obtaining high recognition accuracy (mostly >90%). Finally, Kwapisz et al. [34] perform activity recognition based on smart phone accelerometer data. Again, features such as average and standard deviation were used in a supervised machine learning approach in order to classify activities with WEKA.

In this work, a more general approach is taken to recognize any form of event that exposes a typical, sensor-measurable pattern to its environment. Applications of this research exist in several fields, including machine health checking in industry, usage-based rental services, and entertainment. Some examples of these applications are detailed in Chapter 2.

1.3 Outline

After introducing the general idea of this thesis in the above section, the following chapter will provide motivating real-world examples in order to illustrate the value of this work. Chapter 3 discusses the necessary background knowledge regarding IoT, machine learning, human activity recognition, and WEKA, and gives a brief overview of these areas. In Chapter 4, the research method is outlined by describing the operating principle of the developed framework prototype. Chapter 5 presents the actual research and summarizes the gathered results. Chapter 6 concludes the thesis with an analysis of the results and the conclusions drawn from this work.

2 Motivating Examples

This chapter introduces a number of real-world scenarios from different fields in order to demonstrate the potential of the concepts introduced in this project. These theoretical scenarios are believed to yield economic and ecological advantages for individuals and organizations in the relevant areas.

2.1 Machine Health Checks

One viable application is the maintenance of machines in heavy-duty industry. Most of these machines expose typical operation patterns to their environment. Measurements such as vibration, sound, or temperature, or combinations of these, can be captured by conventional sensors. In the case of worn or damaged parts in the machine, these typical operation patterns usually change. Running a machine under these conditions may compound the damage due to higher forces on all parts and result in a deterioration of the machine's output quality.

A “smart” machine is aware of its typical behavior patterns and is capable of reporting any abnormalities to a person or system responsible for taking further measures, which in turn potentially reduces permanent damage.
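The idea above can be sketched in a few lines of Java. This is a minimal, illustrative anomaly check, not the approach developed later in the thesis: it flags an abnormality when a new sensor reading deviates strongly from the machine's learned typical behavior. The class and method names, and the three-standard-deviations threshold, are my own assumptions.

```java
// Illustrative sketch of a machine health check: a reading is reported as
// abnormal when it lies more than three standard deviations away from the
// mean of the machine's typical operation pattern. Names and the threshold
// are assumptions for illustration only.
class HealthCheck {
    static final double SIGMA_THRESHOLD = 3.0;

    /** True if the observed value deviates strongly from typical behavior. */
    public static boolean isAbnormal(double observed, double typicalMean, double typicalStdDev) {
        return Math.abs(observed - typicalMean) > SIGMA_THRESHOLD * typicalStdDev;
    }
}
```

Note that the thesis later argues against such fixed thresholds in favor of learned classifiers; this sketch only makes the notion of "deviation from a typical pattern" concrete.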

2.2 Collaborative Consumption of Ownerless Products

Collaborative consumption describes a phenomenon where participants share access to goods or services, often for reasons of sustainability, economic benefits, reputation, or enjoyment. Information and communication technologies contribute to the vast growth of this area by providing a platform for sharing communities [23] such as Peerby. Peerby is a website and mobile application that connects individuals to their neighbourhood, enabling them to borrow products if they are available and others are willing to lend them out. The concept of Peerby has proven successful: since its launch in September 2012, Peerby has grown to approximately 150,000 users with mature communities in the Netherlands, Belgium, the UK, and Germany, and is currently emerging in six USA pilot cities.

In recent years, Peerby has identified factors that hinder collaborative consumption. One of these factors is a reciprocal imbalance between borrowers and lenders, where borrowers feel guilt or shame towards lenders. Another factor is believed to be the human desire for autonomy. When borrowing a product which is the property of somebody else, people feel dependent on others. Removing ownership from certain products and paying a


small share of the product's price for using it is believed to reduce these factors. In order to make this approach work, it is necessary to know when a product is being used. Similar to machine health checks, many products expose typical operation patterns to their environment which can be utilized to detect product usage. Furthermore, it is desirable to bill the user according to the time the product is being used, and the product needs to be passed on to the next person if it has not been used for a certain amount of time. More information about the concept of ownerless products can be found in the appendix.

2.3 Smart Trash Cans

Another viable application can be found in municipal garbage collection, which is usually based on fixed schedules. However, those schedules are based on estimates of when the trash cans are believed to be full instead of the actual necessity to empty them. As a result, trash cans are rarely emptied at times when there is an actual need to do so. Potential consequences are economic and ecological disadvantages: garbage trucks collect the trash more often than necessary, causing higher expenses for the municipality as well as air pollution. Yet, during big celebrations and in the vicinity of well-attended events, trash cans fill up too fast.

“Smart” trash cans that report their fill level to the local garbage collection services have the potential to counteract these disadvantages. Proximity sensors monitor the fill level of the trash can and notify the municipal garbage collection services if it reaches a certain level. In order to avoid errors, the fill level could be monitored at three points, namely the bottom, middle, and top of the container. As an example, this could avoid detecting a full trash can where trash is merely stuck at half height. Garbage collection can then be scheduled as necessary.
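The three-point check described above can be sketched as a small predicate. This is an illustrative interpretation (names and logic are my own, not the thesis's): a genuinely full can fills from the bottom up, so a pickup is only requested when all three sensors agree, which rules out the "trash stuck at half height" false positive.

```java
// Sketch of the three-point fill-level check: a pickup is requested only
// when bottom, middle, AND top proximity sensors all report a blockage.
// Trash stuck at half height (middle blocked, bottom and top clear) does
// not trigger a pickup. Names are illustrative assumptions.
class FillLevelCheck {
    /** Each flag is true when that proximity sensor reports a blockage. */
    public static boolean needsPickup(boolean bottom, boolean middle, boolean top) {
        // A genuinely full can fills from the bottom up, so all three
        // sensors should agree before the municipality is notified.
        return bottom && middle && top;
    }
}
```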

In this chapter, possible applications of a general IoT framework have been presented. The following chapter introduces Internet of Things (IoT), machine learning, human activity recognition, and WEKA in order to supply the reader with the necessary background knowledge.

3 Background and Context

In this chapter, an overview of IoT and machine learning is provided, since these topics build the foundation for this work. However, both fields are vast research areas and cannot be covered to full extent here. The interested reader is advised to deepen her knowledge on IoT with “Internet of Things” by Weber and Weber [53] and on machine learning with “Pattern Recognition and Machine Learning” by Christopher M. Bishop [5]. These resources also serve as the primary source of the background knowledge provided in this chapter. Furthermore, the current research in human activity recognition is summarized since it serves as a reference field for this work. Lastly, WEKA [21], as well as a subset of its available algorithms that are used for classification tasks in the context of this work, is introduced.

3.1 Internet of Things

The IoT links physical objects to the virtual world by means of computational devices, sensors, software, and communication modules, thus allowing for management and monitoring of these objects. By embedding these technologies, objects are capable of perceiving their context and can communicate and interact with other devices, services, and people. “Smart” objects in turn have the potential to generate considerable added value. Collection of information on real-world objects can be automated in great detail, at low cost, and in near real-time, thus providing the ability to react to events in a quick, automated, and informed way. This yields advantages in complex and critical situations, therefore increasing the chances of man and machine to handle them successfully.

The IoT field has grown rapidly over the last years. The Auto-ID Center at the Massachusetts Institute of Technology (MIT) contributed greatly to this with their work on a cross-company radio-frequency identification (RFID) infrastructure. In 2002, Forbes Magazine coined the term “Internet of Things”, based on a quote by Kevin Ashton, co-founder and former head of the Auto-ID Center, saying “We need an internet for things, a standardized way for computers to understand the real world”. Already in 2005, the first books had the term in their title [39], and in 2008, the first scientific conference was organized by members of the University of St. Gallen, Eidgenössische Technische Hochschule Zürich (ETHZ), and MIT [41].

Porter and Heppelmann [45] describe a typical implementation of an IoT product as a technology stack of three layers consisting of software and hardware components, as depicted in Figure 1. The first layer comprises the object itself with its embedded core software, such as the operating

system, and hardware, such as microcontrollers, sensors, and actuators. The communication protocols are located on the second layer and enable communication between the product and other parties, such as a server or other products. The third layer typically holds the intelligence of the product, such as databases, software applications, analytics, and development and execution environments. Commonly, all layers are accompanied by identity and security tools to manage authentication and system access. The product may furthermore have access to external information sources, such as weather or traffic information providers, and can be connected to existing systems of a business, such as enterprise resource planning (ERP) systems.

Figure 1: The IoT Technology Stack, Porter and Heppelmann [45]. [Figure: a stack of Product Software, Product Hardware, Network Communication, Smart Product Applications, Rules/Analytics Engine, Application Platform, and Product Data Database, flanked by Identity and Security, External Information Sources, and Integration with Business Systems.]

Despite the considerable interest in this field, there is still no common understanding of what the IoT actually comprises. A great number of definitions have been proposed, typically emphasizing one of the three main aspects of the IoT: 1. the things that become connected, 2. the aspects regarding the Internet itself and its underlying technologies, and 3. the challenges that are being faced in the field, such as information gathering and processing. Fields of application are diverse and are being incorporated in various areas of everyday life, such as smart industry, smart homes, smart transport, smart health, and smart cities [55].

The aim of this thesis is to provide a framework that allows physical objects to be extended easily and intuitively so that they become smart IoT devices.

3.2 Machine Learning

Machine learning has gained increasing importance in our daily lives over the last decades. Applications include, but are not limited to, predicting strokes and seizures [30], detecting financial fraud [49], identifying spam messages in email accounts [20], and recognition of postal addresses [37]. Machine learning is a class of data-driven algorithms that allow the use of historical data to


learn from examples by extracting common patterns and transforming these generalizations into an adaptive and abstract model. In contrast, manual creation of rules or heuristics leads to a vast number of rules and exceptions to those rules, with predominantly inferior results.

Most practical applications use some form of data pre-processing so that machine learning algorithms can identify patterns more easily. In the case of letter or digit recognition, a typical pre-processing transformation is scaling all image data to the same size. Furthermore, pre-processing can reduce computation times, e.g. for a face recognition task which is usually based on high-resolution video streams. Processing a huge amount of raw image data per second might be computationally infeasible for a pattern recognition algorithm. Raw data is therefore commonly replaced by a set of features computed from the raw data.
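As a concrete illustration of replacing raw data by features, the sketch below computes the two summary features used throughout the activity-recognition literature cited in Section 1.2: the mean and standard deviation of a window of sensor samples. The class and method names are my own; the thesis's actual feature set is described in Chapter 4.

```java
import java.util.Arrays;

// Illustrative feature extraction: a raw window of accelerometer samples
// is reduced to two summary features, mean and (population) standard
// deviation. Names are assumptions, not taken from the thesis.
class Features {
    public static double mean(double[] window) {
        return Arrays.stream(window).average().orElse(0.0);
    }

    public static double stdDev(double[] window) {
        double m = mean(window);
        double sumSq = Arrays.stream(window).map(x -> (x - m) * (x - m)).sum();
        return Math.sqrt(sumSq / window.length); // population standard deviation
    }
}
```

A window of a few hundred samples thus collapses into a short feature vector, which is far cheaper for a classifier to process than the raw stream.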

Typically, machine learning is partitioned into three categories, namely supervised learning, unsupervised learning, and reinforcement learning. Supervised learning describes a task where training data comprises mappings from input vectors to output values. The desired result is a model which maps an input vector to the corresponding output, either from a finite space (e.g. “low”, “medium”, or “high” for the chance of rain) or an infinite space of values (e.g. R, the set of all real numbers).

In unsupervised learning, no output values are previously defined and input is not labeled. Here, the task of the algorithm is to find a structure in the input vectors and to group them accordingly. One application of unsupervised learning is clustering, which is used for example in image processing. Figure 2 shows the original picture of a cow on the left, and the version clustered by unsupervised learning on the right. The clustered version shows how similar areas have been grouped together, such as the shadow or the dark parts of the cow. On the other hand, the limitations of unsupervised algorithms also become clear in this picture: the grass, even though it is one coherent area in reality, is divided into multiple areas due to differences in some features such as color. Supervised learning and unsupervised learning are sometimes combined in order to increase learning accuracy by providing some fundamental knowledge about given data to the algorithm.

Finally, reinforcement learning describes a task where a certain goal needs to be reached while achieving the highest reward for that task. Based on a situation, suitable actions need to be found, which have been discovered beforehand through a trial-and-error process. The algorithm is presented a set of states and actions in sequential order and allows the device running the algorithm to interact with its environment. Typically, a decision at an earlier point additionally influences the outcome at a later point in time. An example of reinforcement learning is playing a board game where the


Figure 2: Original image (left) compared to image clustered with unsupervised learning algorithms (right) [40].

reward is victory. For this purpose, the algorithm would play against another instance of itself a large number of times, perhaps a million games. In the case of winning a game, all moves that led to victory will be credited appropriately, even though some moves will have been poor choices. Due to the large number of games, strong moves will lead to victories more often and therefore receive higher scores on average. In reinforcement learning, a trade-off needs to be made between trying out new actions (exploration) and using actions that are known to lead to a high reward (exploitation).
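The exploration/exploitation trade-off mentioned above is commonly handled with an epsilon-greedy policy, sketched below under my own naming assumptions: with probability epsilon a random action is explored, otherwise the action with the highest estimated reward so far is exploited.

```java
import java.util.Random;

// Minimal epsilon-greedy sketch of the exploration/exploitation trade-off.
// With probability epsilon a random action is tried (exploration);
// otherwise the action with the highest reward estimate is chosen
// (exploitation). All names are illustrative assumptions.
class EpsilonGreedy {
    public static int chooseAction(double[] estimatedReward, double epsilon, Random rng) {
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(estimatedReward.length); // explore a random action
        }
        int best = 0; // exploit: argmax over current reward estimates
        for (int a = 1; a < estimatedReward.length; a++) {
            if (estimatedReward[a] > estimatedReward[best]) best = a;
        }
        return best;
    }
}
```

With epsilon set to 0 the policy is purely greedy; raising epsilon trades short-term reward for discovering potentially better moves.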

Machine learning tasks can also be categorized based on the desired output. Classification tries to assign an input vector to one of a finite number of categories. Regression, on the contrary, is concerned with assigning an output from a continuous space to an input vector. Classification and regression are typically supervised machine learning problems. Thirdly, clustering describes the grouping of input vectors, where the groups are not known upfront. Density estimation is concerned with determining the distribution of data of the input vectors. Clustering and density estimation tasks are typically performed by unsupervised learning algorithms.

In this thesis, classification algorithms are used in order to recognize relevant events that smart objects are exposed to. Motion, position, and environmental sensors provide the necessary data to analyze whether a certain event has occurred, and custom software allows desired actions to be taken based on those events. As described above, determining events on the basis of rules, conditions, or threshold values is neither effective nor efficient, and was therefore eliminated as a possible approach.

3.3 Human Activity Recognition

One of the fields that motivated and inspired this project to a great extent is the field of human activity recognition. Human activity recognition, as the


name implies, aims to recognize actions and goals of a person by monitoring the person's actions and environment. Monitoring is typically performed by dedicated sensors such as accelerometers, but different monitoring approaches exist. Gu et al. [19], for example, use ambient radio signals from a WiFi network to abstract footprints of different activities based on the signal strength. While recognition of simple activities has been researched widely, recognizing complex activities remains a challenging task. Some of the challenges are listed in the following paragraph.

Many activities can be performed in a concurrent way, such as listening to music while washing the dishes. An approach for sequential activities might not recognize these activities properly. Similarly, activities can be interleaved, e.g. when answering a phone call while cooking. Furthermore, activities can be interpreted differently in different situations. Walking, for example, is part of both vacuum-cleaning and mowing the lawn. Lastly, multiple persons can perform activities in a coherent space. Here it is necessary to recognize these activities in parallel.

Kim et al. [31] list six modeling techniques that are considered the standard approach for model-based human activity recognition. The hidden Markov model (HMM) is a generative probabilistic model which is capable of generating hidden states from observed data. The main purpose of this model is to determine the sequence of hidden states that corresponds to a sequence of observed data. Furthermore, the HMM is supposed to learn model parameters from previously observed data sequences. However, HMMs are subject to several limitations, such as the representation of multiple interacting activities and the limitations implied by strict independence assumptions on observed data. These independence assumptions require that the future state only depends on the current state and that the observable variable at a given time only depends on the corresponding hidden state. Additionally, an HMM needs a significant amount of training in order to consistently recognize all potential observation sequences for a given activity.
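The core HMM task mentioned above, recovering the most likely hidden-state sequence from observed data, is classically solved with the Viterbi algorithm. The compact sketch below is a generic textbook implementation, not code from the thesis; the matrices would in practice be learned from training data.

```java
// Compact Viterbi sketch: given observation symbols, start probabilities,
// a state-transition matrix, and an emission matrix, recover the most
// likely hidden-state sequence. Illustrative, dense-matrix implementation;
// a real model would be trained from observed data sequences.
class Viterbi {
    public static int[] mostLikelyStates(int[] obs, double[] start,
                                         double[][] trans, double[][] emit) {
        int n = obs.length, k = start.length;
        double[][] logp = new double[n][k]; // best log-probability per (time, state)
        int[][] back = new int[n][k];       // back-pointers for path recovery
        for (int s = 0; s < k; s++)
            logp[0][s] = Math.log(start[s]) + Math.log(emit[s][obs[0]]);
        for (int t = 1; t < n; t++) {
            for (int s = 0; s < k; s++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < k; p++) {
                    double cand = logp[t - 1][p] + Math.log(trans[p][s]);
                    if (cand > best) { best = cand; arg = p; }
                }
                logp[t][s] = best + Math.log(emit[s][obs[t]]);
                back[t][s] = arg;
            }
        }
        int[] path = new int[n];
        for (int s = 1; s < k; s++)     // pick the best final state
            if (logp[n - 1][s] > logp[n - 1][path[n - 1]]) path[n - 1] = s;
        for (int t = n - 1; t > 0; t--) // follow back-pointers
            path[t - 1] = back[t][path[t]];
        return path;
    }
}
```

Log-probabilities are used to avoid numerical underflow on long observation sequences, which is the standard formulation of the algorithm.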

A conditional random field (CRF) provides a more flexible approach to activity recognition with regard to the non-deterministic nature of human activities, where some steps can be performed in varying order. Furthermore, it tackles some of the challenges that HMMs are facing. In contrast to the generative HMM, the CRF is a discriminative probabilistic model, describing the dependence of a hidden variable on an observed variable. Unlike the HMM, the CRF tries to find the conditional probability instead of the joint probability distribution. Additional flexibility is gained by allowing for arbitrary and non-independent relations between observed sequences and relaxation of the independence assumptions. While the CRF with its potential function provides an enhancement over HMMs, it is still

limited to sequential activities. In order to account for far-reaching (skip-chain) dependencies, a more sophisticated potential function is required.

The skip-chain conditional random field (SCCRF) extends the above-mentioned linear-chain CRF by using a more sophisticated potential function in order to allow capturing of far-reaching dependencies. Essentially, the potential function of an SCCRF is the product of multiple linear chains, thus making larger distances between variables possible. By doing so, it facilitates modeling of complex activities that are performed in a concurrent way or have interleaved sub-activities. As is to be expected, this more complex variant of the CRF comes with the trade-off of being computationally more expensive.

An emerging pattern (EP) captures significant variations between datasets and is defined as an item set whose support in dataset D2 is higher than in dataset D1. The support is calculated as the quotient of the number of instances containing a given attribute in a dataset and the total number of instances in the dataset. If the quotient of the support in D2 and the support in D1 is higher than a given threshold value, an EP is present. An exemplary area of application of EPs can be found in the classification of the edibility of mushrooms. Odorless mushrooms with large gills and a single ring have been found to be edible in 63.9% of the cases, and similarly poisonousness could be classified based on different attributes for 81.4% of the mushrooms sharing these attributes [11]. A major advantage of an EP over a CRF is that an EP does not need to have the model in the training dataset, since it is unlikely to have all possible variations of an interleaved or concurrent activity in the training data and a training dataset for complex activities is already large by itself [18].

While the previous four models mostly use supervised approaches (EP partially supports unsupervised techniques for labelling), Kim et al. list two other, mostly unsupervised, approaches to discover patterns in activities. Unsupervised approaches do not only allow tracking of pre-selected activities, but also allow insights to be gained into other habits of the humans that are observed. Furthermore, automatically discovered activities do not need to be trained, reducing the amount of preliminary work required.

Topic model based daily routine discovery uses a set of low-level activities such as walking and sitting in order to build a hierarchical activity model. Low-level activities are recognized by means of a supervised learning algorithm, while high-level activities are built from combinations of low-level activities. Activity patterns are then recognized similarly as in a bag-of-words model that is used to discover topics in a document.

Activity data pattern discovery recognizes activities from combinations of postures which can be extracted from video data. These postures can then


be used as an alphabet for a probabilistic context-free grammar, which in turn can be used to represent activities. Rules can then be extracted from sequences of postures and diverse combinations, and when combining a recognized activity such as kicking with a recognized object such as a soccer ball, even more specific activities (in this case “playing soccer”) can be recognized.

3.4 WEKA

WEKA is a data mining and machine learning open source Java software suite that had its inception in 1992. Providing access to state-of-the-art machine learning algorithms and data preprocessing tools, it enjoys great popularity among academia and industry and has a thriving community which uses and constantly extends the project. WEKA incorporates a modular and extensible architecture in order to allow for quick integration by use of a simple Application Programming Interface (API) as well as plug-in and integration automation mechanisms. It provides a number of graphical user interfaces which give access to the available functionality [21]. A time series analysis and prediction environment exists, allowing for modelling and analysis of time-dependent data points. By default, the project provides access to a large number of machine learning classifier algorithms and therefore

provides a sophisticated basis for classification tasks in this project. The algorithms can be further adjusted to the user's needs by providing custom options. Only a subset of these algorithms was selected for classification tasks in the context of this thesis; some were not capable of handling data in the chosen format or yielded obviously erroneous results. A brief description of the algorithms is given below on the basis of the WEKA software companion book [22] and information in the WEKA API documentation [42].

BayesNet Bayes Network learning which uses various search algorithms and quality measures.

HoeffdingTree An incremental, anytime decision tree algorithm which uses the Hoeffding bound in order to estimate the number of examples needed for a desired quality of an attribute [27].

IBk A k-nearest neighbour classifier [1].


KStar An instance-based classifier that uses an entropy-based distance function in order to classify a test instance on the basis of a training instance [9].

LMT Builds classification trees with logistic regression functions at the leaves [35].

LWL An instance-based learning algorithm which uses locally weighted learning [14].

Logistic Algorithm for building a classifier using a multinomial logistic regression model with a ridge estimator. Modifications were made compared to the original paper of le Cessie and van Houwelingen [36].

LogitBoost Performs classification with additive logistic regression using a regression scheme [17].

MultilayerPerceptron Builds an artificial neural network using backpropagation in order to classify test examples.

NaiveBayes Classifier that applies Bayes' theorem to assign instances to the class with the highest probability [29].

RandomForest Combines randomly grown tree predictors to vote for a class [7].

SMO A support vector classifier that implements the sequential minimal optimization algorithm by John Platt [44].

SimpleLogistic Builds a linear logistic regression model [35, 50].

J48 A Java implementation of the C4.5 decision tree algorithm [46].

PART In multiple iterations, a partial C4.5 decision tree is built. Afterwards, the “best” leaf is converted into a rule and added to a rule set [15].

RandomCommittee Classifies based on the average classification of an ensemble of randomizable base classifiers, each built from the same data but with a different random seed.

RandomSubSpace This decision tree classifier is built by generating multiple trees from a subspace of the feature vector [25].

AdaBoostM1 The AdaBoost M1 method uses a second learning algorithm, the weak learning algorithm, and calls it repeatedly with the goal of finding a hypothesis with minimal training error [16].

ClassificationViaRegression As the name implies, this algorithm performs classification by regression. This is done by building a regression model for each class value [13].

DecisionTable This algorithm classifies with decision tables as a hypothesis space [32].

OneR The OneR classifier uses the minimum-error attribute in order to predict the class value. This means that the classification is done on the basis of a single attribute, and can therefore be described as one-level decision tree classification [26].

RandomTree A decision tree algorithm that considers K randomly chosen attributes at each node.

REPTree A decision tree algorithm that builds multiple regression trees and selects the one which performs best. Pruning, i.e. reducing the size of decision trees by removing segments that have minimal impact on classification results, is performed with the mean square error as the criterion.

Bagging Bagging is a meta-algorithm that produces an aggregated predictor by generating multiple versions of a predictor. Classification is done by plurality voting [6].

JRip A propositional rule learner that implements the Repeated Incremental Pruning to Produce Error Reduction (RIPPER) algorithm by William W. Cohen [10].
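As an illustration of the bagging scheme described above (bootstrap resampling of the training set followed by plurality voting), the following pure-Python sketch combines several copies of a deliberately simple base learner. The nearest-centroid base learner and all names here are illustrative stand-ins, not WEKA's implementation:

```python
import random
from collections import Counter

def centroid_learner(train):
    # Deliberately simple base learner: per-class feature means,
    # classification by nearest centroid (squared Euclidean distance).
    by_class = {}
    for x, y in train:
        by_class.setdefault(y, []).append(x)
    centroids = {y: [sum(c) / len(xs) for c in zip(*xs)]
                 for y, xs in by_class.items()}
    return lambda x: min(centroids, key=lambda y: sum(
        (a - b) ** 2 for a, b in zip(centroids[y], x)))

def bagging(train, base_learner=centroid_learner, n_models=10, seed=0):
    # Bagging: train each model on a bootstrap resample (drawn with
    # replacement), then classify new instances by plurality voting.
    rng = random.Random(seed)
    models = [base_learner([rng.choice(train) for _ in range(len(train))])
              for _ in range(n_models)]
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]
```

The vote over several bootstrap models is what stabilizes the otherwise high-variance base predictor.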

The following section introduces the prototype of the IoT framework that was developed over the course of this thesis. In this prototype, a smart phone is utilized to collect and preprocess environmental data in order to detect previously learned events with machine learning techniques. For detection, or classification in machine learning terms, the data is sent to a different system via an Internet connection and then analyzed.



4 Research Method

This chapter covers the setup of the developed prototype as depicted in Figure 3. Starting with the collection and pre-processing of event data on the device, information is sent via an Internet connection to an API endpoint, which saves the information to a server's database. After a sufficient amount of training data has been recorded, custom software using the WEKA API is capable of building and validating a general model of the event. Future incoming data can then be analyzed to determine whether it represents a meaningful event.

Figure 3: Overview of the Experiment Setup.

4.1 Data Sources

In order to build a general model of an event, an Android-based smart phone in combination with a mobile app was chosen to collect relevant data for training and, later on, classification purposes. Modern phones are equipped with a wide range of sensors and provide a cost-efficient, highly available, and extensively documented framework for the aforementioned task. Eleven sensors have been identified as potentially interesting for this work, describing motion, position, and environmental properties in twelve variants. A brief overview of these sensors is given in the following section, based on the Android API documentation [2]. Unless stated differently, the chosen frequency resulted in one sample per approximately 150 milliseconds on the testing device, a Samsung Galaxy S4.

Accelerometer The accelerometer provides data about acceleration and gravity forces in meters per square second (m/s²) applied to the three physical device axes (x, y, and z).



Microphone Raw microphone data is pre-processed by calculating the sound pressure level in decibels (dB) of 100 millisecond audio segments with equation (1), where Lp is the sound pressure level in decibels, p the root mean square sound pressure, and p0 the reference sound pressure. The commonly used reference sound pressure in air of p0 = 20 µPa was used as a constant [48]. The sound pressure is measured in micropascal (µPa).

Lp = 20 log10(p / p0)    (1)
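Equation (1) can be turned into a small routine. The sketch below assumes the 100 ms segment has already been converted to sound pressure values in pascal; on a real Android device the raw 16-bit PCM samples would first need a calibration factor, which is omitted here:

```python
import math

def sound_pressure_level(samples, p0=20e-6):
    # Sound pressure level in dB for one audio segment, as in equation (1):
    # Lp = 20 * log10(p / p0), where p is the root mean square sound
    # pressure of the segment and p0 = 20 uPa the reference pressure in air.
    p = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(p / p0)
```

A segment with an RMS pressure of 0.02 Pa, for instance, comes out at 60 dB.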

Humidity Sensor The humidity sensor measures the relative ambient humidity in percent (%).

Light Sensor The light sensor provides the ambient illumination in lux (lx).

Global Positioning System (GPS), Cellular, and Wi-Fi Network Data Location data is acquired through the FusedLocationProvider API, which is part of the Google Play services client library. The Android development guide suggests using this API instead of the standard Android framework location API, since it provides more accurate data while also being more battery efficient [2]. The data is represented as latitude and longitude in decimal degrees (°), plus the accuracy of the location in meters (m). The frequency of reporting location data is set to be triggered approximately every 5 seconds, but can be as frequent as once every second. Since location services drain the battery considerably, some mechanisms have been added to improve battery life by turning off location requests if the phone is not exposed to any significant motion. The ActivityRecognition API, which is also part of the Google Play services client library, can identify several activities as well as a "still" state. If the still state is identified several times in a row, periodic location requests are turned off. In order to turn location requests on again, the phone needs to be exposed to significant motion again, which is detected with Android's Significant Motion Sensor.

Magnetism Sensor The magnetism sensor measures the ambient geomagnetic field for the three physical device axes (x, y, and z) in microtesla (µT), and is typically used to detect the cardinal direction relative to the device.

Pressure Sensor The pressure sensor provides data about the ambient air pressure in hectopascal or millibar (hPa or mbar, respectively).



Proximity Sensor In order to obtain data about the distance of the device to an object in front of it, the proximity sensor is used. It measures the distance in centimeters (cm) between the view screen and any object in front of it. The sensor of the device used is limited to detecting only whether an object is closer than 8.00 cm, and therefore always reports either 0.00 cm or 8.00 cm.

Gyroscope The gyroscope is used to calculate the rotation vector components of the device along the three physical axes (x, y, and z), and thereby provides the phone's orientation in space.

Temperature Sensor The temperature sensor measures the ambient room temperature in degrees Celsius (°C).

Camera The camera provides the most complex data and is therefore also pre-processed into a more abstract format. Raw Y′CBCR image data is split into the three components describing the luma (Y′) and the two chrominance (CB and CR) planes. A difference in percent (%) between the prior and the current image is then calculated by dividing the image into squares of 100 pixels, calculating the average color intensity per square, and taking the absolute difference between the average square intensities of the previous and the current image. The result of this operation is a percentage indicating the difference per component. This data is then weighted as 0.8 ∗ ∆Y′ + 0.1 ∗ ∆CB + 0.1 ∗ ∆CR, where ∆X describes the difference in percent for component X. Weights were chosen based on the importance of the components [54], but have not been analyzed for optimization opportunities. Due to performance concerns, the frequency of image data processing has been limited to two frames per second.
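The block-wise frame difference and the component weighting described above can be sketched as follows; the flat per-plane pixel lists and the normalization against the 0-255 intensity range are simplifying assumptions, not the exact implementation of the prototype:

```python
def frame_difference(prev, curr, block=100):
    # prev, curr: flat pixel-intensity lists (0-255) of ONE component plane
    # (Y', CB or CR). Average the intensity over blocks of `block` pixels,
    # take the mean absolute difference between corresponding block
    # averages, and express it as a percentage of the full 8-bit range.
    def block_means(pixels):
        return [sum(pixels[i:i + block]) / block
                for i in range(0, len(pixels), block)]
    a, b = block_means(prev), block_means(curr)
    return 100 * sum(abs(x - y) for x, y in zip(a, b)) / (255 * len(a))

def weighted_difference(d_y, d_cb, d_cr):
    # Component weighting from the thesis: 0.8*dY' + 0.1*dCB + 0.1*dCR.
    return 0.8 * d_y + 0.1 * d_cb + 0.1 * d_cr
```

Identical frames yield 0%, completely inverted frames 100%, and the luma plane dominates the weighted result.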

Each of the sources provides data which is relatively simple for computers and machine learning algorithms to analyze and process, since the data is represented as double precision floating point values. However, especially microphone and camera data could be used in a more valuable manner by utilizing machine learning algorithms dedicated to these sources, which is a subject to be considered in future work. Adding a new data source is unproblematic, as long as the new data source provides sequential data as double precision floating point values.

Some of the sensors, such as the accelerometer, provide greater value for demonstrating the concepts of this thesis, since they are applicable in a broader set of scenarios without requiring complex framework conditions. For example, in contrast to the accelerometer, the humidity sensor was not used, since no environment with a controlled humidity was available within the scope of this project. Real-world applications do however exist, such as an emergency system for a greenhouse in order to monitor the optimality of conditions.

4.2 Capturing a Training Record

In order to build a general model of an event, a set of training records is necessary as a source to derive it from. The data mining application running on an Android phone can be configured by providing an event description, a set of data sources to be used (i.e. the sensors), and a target database where the records will be stored for further analysis. The user can then select whether the event to be recorded is part of the training data, and start and stop recording the event by simply pressing a button. For the duration of the record, the selected sources provide data at the frequencies described in section 4.1. After the record has finished, a confirmation dialog is shown where the user can indicate whether the record was successful. If the user confirms, the record is sent to the selected database via a database API. From then on it can be used for training purposes or evaluated as to whether it represents an instance of a previously generated event model.
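A training record of this kind might be modeled as below; the field and method names are illustrative and do not reflect the actual schema used by the prototype or its database API:

```python
from dataclasses import dataclass, field

@dataclass
class EventRecord:
    # One recorded event instance, captured between the start and stop
    # button presses; all names here are illustrative assumptions.
    event: str                  # configured event description, e.g. "swing"
    is_training: bool           # marked by the user as training data
    confirmed: bool = False     # user confirmed the record as successful
    samples: list = field(default_factory=list)  # (timestamp_ms, source, values)

    def add_sample(self, t_ms, source, values):
        self.samples.append((t_ms, source, tuple(values)))

rec = EventRecord(event="swing", is_training=True)
rec.add_sample(0, "accelerometer", [0.1, 9.8, 0.3])
rec.confirmed = True  # only confirmed records would be sent to the database
```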

4.3 Building an Event Model

In order to build a model, the Java machine learning library WEKA [21] was utilized for the reasons provided in section 3.4. In this context, two approaches have been identified as possibly valuable.

4.3.1 Time Series Based Forecasting

In the first approach, time series based forecasting, as depicted in Figure 4, was applied by concatenating all available training records (in the given example frames t1, t2, and t3) and then predicting a number of data points equal to the average number of data points per record, i.e. the following frame (tp in the given example). The result of this approach was a series of data points which was visually clearly related to the frames it originated from. However, analysis and comparison of this data series with other records resulted in additional machine learning problems regarding sequential data. These problems can be visualized as the problem of comparing two graphs with each other and deciding whether they are similar enough to represent the same event. This naive approach was therefore discarded.



Figure 4: Concept of Time Series Based Forecasting.

4.3.2 Feature Extraction

In the second approach, features were extracted from the sensor data in order to build a classifier for events. This is a common technique in machine learning and its usefulness has been shown in related work: Bao and Intille [4] as well as Ravi et al. [47] used this approach for human activity recognition. The features mean, standard deviation, and correlation are calculated for each available data source. Correlation is however limited to the existence of at least two data series per data source (e.g. the x-axis and y-axis for accelerometer data). If this is the case, the correlation is calculated as the ratio of the covariance and the product of the standard deviations, as in equation (2).

corr(x, y) = cov(x, y) / (σx σy)    (2)

In this section, the general idea behind the research method was provided to the reader. The next section describes the conducted experiments and the results.
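The feature vector described above (mean and standard deviation per series, plus the pairwise correlation of equation (2)) can be computed as in this sketch; the use of the population standard deviation and the ordering of the features are assumptions:

```python
from statistics import mean, pstdev

def features(series_by_source):
    # series_by_source: the data series of one source, e.g.
    # {"x": [...], "y": [...], "z": [...]} for the accelerometer.
    # Emits mean and (population) standard deviation per series, then the
    # correlation of equation (2) for every pair of series.
    names = sorted(series_by_source)
    feats = []
    for n in names:
        feats += [mean(series_by_source[n]), pstdev(series_by_source[n])]
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            x, y = series_by_source[names[i]], series_by_source[names[j]]
            mx, my = mean(x), mean(y)
            cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
            sx, sy = pstdev(x), pstdev(y)
            feats.append(cov / (sx * sy) if sx and sy else 0.0)
    return feats
```

For a three-axis accelerometer this yields a nine-dimensional feature vector (three means, three standard deviations, three correlations).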



5 Research and Results

In this section the results of the research are presented to the reader. Two experiments were conducted and evaluated in a multiclass classification and a one-class classification approach. In multiclass classification, input data is mapped to one output value from a set of choices, whereas in one-class classification input values are mapped to be either a “target” or an “outlier”.

Figure 5: Walk
Figure 6: Swing
Figure 7: Wave

5.1 Experiments

For the experiments, three activities were selected as the meaningful events to detect, namely walking (Figure 5), a swing movement (Figure 6), and waving (Figure 7), with the recording device in the palm of the hand and the accelerometer selected as the data source. The swing movement was performed without repetition, walking covered a distance of approximately four meters, and waving consisted of two repetitions from point 1 to point 2. Table 1 shows the number of recorded repetitions per activity. 25 WEKA classification algorithms were selected in order to build the desired classifiers. The recorded repetitions were divided into two partitions, the first holding the first 80% and the second the remaining 20% of the total repetitions. The classifiers were built with the first partition and afterwards validated with both partitions, while only the second partition was taken into account when calculating the accuracy. Harnessing a large number of classification algorithms contributes to building a higher quality classifier, since the best one can be automatically selected from all available ones. The classification algorithms were run without setting any custom options.
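The 80/20 partitioning and the automatic selection of the best-performing classifier can be outlined as follows; the classifier-factory interface is an illustrative stand-in for training the 25 WEKA algorithms:

```python
def split_80_20(records):
    # First 80% of the recorded repetitions train the classifier; the
    # remaining 20% are held out for computing the accuracy.
    cut = int(len(records) * 0.8)
    return records[:cut], records[cut:]

def accuracy(predict, labelled):
    # Fraction of (instance, label) pairs the predictor gets right.
    return sum(predict(x) == y for x, y in labelled) / len(labelled)

def best_classifier(factories, train, holdout):
    # `factories` maps an algorithm name to a fit(train) function that
    # returns a predict function -- a stand-in for the WEKA algorithms.
    # Every candidate is trained and the most accurate one is selected.
    scored = {name: accuracy(fit(train), holdout)
              for name, fit in factories.items()}
    return max(scored, key=scored.get), scored
```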



5.2 Results

While one-class classification performed rather poorly, multiclass classification showed very good results in classification accuracy. The following sections list these results in more detail for the conducted experiments.

Activity Recorded Repetitions

Walk 101

Swing 103

Wave 107

Table 1: Overview of the Amount of Recorded Repetitions.

Activity Average Duration (in ms)

Walk 5086

Swing 3062

Wave 2568

Table 2: Average Durations per Activity.

5.2.1 Multiclass Classification

Table 3 shows the results for multiclass classification. Here, 24 algorithms correctly classified all repetitions from the remaining 20% of the instances in the evaluation set; the remaining one achieved an accuracy of over 96%. All algorithms except for OneR were capable of classifying the training repetitions with 100% accuracy. OneR made two errors during classification of the training repetitions, which can be explained relatively easily. Out of the features mean, standard deviation, and correlation, OneR only selects the feature which classifies the result most accurately. This means that if, for example, mean alone classifies the meaningful event correctly in most of the cases while the other two features perform worse, only mean will be used as the one rule to classify the meaningful event. In return, classification of training instances might be incorrect, since mean did not classify correctly in all cases.
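OneR's single-attribute strategy can be sketched as a one-level decision tree over discretised attribute values; this is a simplified illustration, not WEKA's OneR implementation (which also handles numeric attributes via binning):

```python
from collections import Counter, defaultdict

def one_r(train):
    # train: list of (attribute_vector, label) pairs with discrete
    # attributes. For each attribute, map every observed value to the
    # majority label and count the training errors of that one-level
    # rule; keep the attribute with the fewest errors. Unseen attribute
    # values classify as None.
    best = None
    for i in range(len(train[0][0])):
        buckets = defaultdict(Counter)
        for x, y in train:
            buckets[x[i]][y] += 1
        rule = {v: counts.most_common(1)[0][0] for v, counts in buckets.items()}
        errors = sum(rule[x[i]] != y for x, y in train)
        if best is None or errors < best[0]:
            best = (errors, i, rule)
    _, i, rule = best
    return lambda x: rule.get(x[i])
```

Because only one attribute survives, the rule can misclassify training instances whenever that attribute alone does not separate the classes, exactly as observed for OneR above.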

In order to obtain results for a scenario that is closer to reality, a sequence of activities was recorded in the next step. The sequence consisted of the activities "walk", "swing", and "wave" in this order. Afterwards, this sequence was divided into overlapping frames with an overlap of 66.66% per activity. The length of one frame was calculated by averaging the length of all recorded training repetitions per activity and is shown in Table 2. Each classifier was then assigned to classify all frames. The results are shown in Figure 8. The x-axis describes the time relative to the stream data and the y-axis lists the used WEKA algorithms. Horizontally to the algorithm name, the activities that were classified are marked in the corresponding color. As one can see, most algorithms correctly identified the sequence of "walk", "swing", and finally "wave".

WEKA Algorithm               Accuracy   Errors
BayesNet                     100.00%    -
NaiveBayes                   100.00%    -
Logistic                     100.00%    -
MultilayerPerceptron         100.00%    -
SMO                          100.00%    -
SimpleLogistic               100.00%    -
IBk                          100.00%    -
KStar                        100.00%    -
LWL                          100.00%    -
AdaBoostM1                   100.00%    -
Bagging                      100.00%    -
ClassificationViaRegression  100.00%    -
LogitBoost                   100.00%    -
RandomCommittee              100.00%    -
RandomSubSpace               100.00%    -
DecisionTable                100.00%    -
JRip                         100.00%    -
PART                         100.00%    -
HoeffdingTree                100.00%    -
J48                          100.00%    -
LMT                          100.00%    -
REPTree                      100.00%    -
RandomForest                 100.00%    -
RandomTree                   100.00%    -
OneR                          96.72%    2 "wave" class. as "swing"

Table 3: Classification Accuracy for Multiclass Classification.
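Slicing the recorded sequence into overlapping frames, as described above, can be sketched like this; with a 66.66% overlap the window advances by one third of the frame length per step (the frame length in samples is assumed to be derived from the average activity durations in Table 2):

```python
def frames(stream, frame_len, overlap=2/3):
    # Slice a recorded sample stream into overlapping frames of
    # `frame_len` samples. With a 66.66% overlap the window advances by
    # one third of its length per step, so each sample is covered by up
    # to three frames.
    step = max(1, int(frame_len * (1 - overlap)))
    return [stream[i:i + frame_len]
            for i in range(0, len(stream) - frame_len + 1, step)]
```

Each resulting frame is then classified independently, which is what produces the per-frame activity bands of Figure 8.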

5.2.2 One-class Classification

Figure 8: Multiclass Activity Classification per Frame.

Table 4 shows the results for one-class classification per activity. In one-class classification, the classifier algorithms are used indirectly through the OneClassClassifier WEKA class. The OneClassClassifier generates an artificial "outlier" dataset on the basis of the provided "target" dataset, thereby transforming the problem into a two-class classification problem, and then builds the classifier on this basis [24]. Overall, the accuracy of one-class classification for the evaluation data is lower compared to multiclass classification. However, some classifiers still show promising results. This changes drastically for the sequence of activities as described in the last section. Figure 9 shows the classification of the single frames from the sequence of activities. The sequence of "walk", "swing", and finally "wave" does not appear as clearly as it does for multiclass classification, and some of the classifiers do not recognize a single activity in the sequence.
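The idea behind the OneClassClassifier, generating artificial outliers around the target data and then solving a two-class problem, can be illustrated as follows; the uniform outlier box and the 1-nearest-neighbour base classifier are simplifying assumptions, not WEKA's implementation:

```python
import random

def one_class_classifier(targets, expand=0.5, seed=0):
    # Generate an artificial "outlier" dataset uniformly over a box that
    # extends the bounding box of the "target" data by `expand` times its
    # span per dimension, then solve the resulting two-class problem with
    # a 1-nearest-neighbour rule (a stand-in for a WEKA base classifier).
    rng = random.Random(seed)
    dims = len(targets[0])
    lo = [min(t[d] for t in targets) for d in range(dims)]
    hi = [max(t[d] for t in targets) for d in range(dims)]
    span = [(hi[d] - lo[d]) or 1.0 for d in range(dims)]
    outliers = [tuple(rng.uniform(lo[d] - expand * span[d],
                                  hi[d] + expand * span[d])
                      for d in range(dims))
                for _ in range(len(targets))]
    labelled = ([(t, "target") for t in targets] +
                [(o, "outlier") for o in outliers])

    def classify(x):
        dist2 = lambda p: sum((p[d] - x[d]) ** 2 for d in range(dims))
        return min(labelled, key=lambda pair: dist2(pair[0]))[1]
    return classify
```

Since the boundary is derived only from the target data plus synthetic points, it is easy to see why real "outlier" frames near the target region get confused, matching the poor sequence results reported above.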


Figure 9: One-class Activity Classification per Frame.



                             Accuracy
WEKA Algorithm               "walk"    "swing"   "wave"
KStar                        100.00%   100.00%   100.00%
RandomCommittee              100.00%   100.00%   100.00%
RandomTree                   100.00%   100.00%   100.00%
ClassificationViaRegression   92.50%   100.00%   100.00%
PART                         100.00%    96.34%    95.29%
J48                          100.00%    96.34%    95.29%
JRip                          88.75%    96.34%   100.00%
REPTree                      100.00%    87.80%    60.00%
NaiveBayes                    96.25%    97.56%    97.65%
Logistic                      95.00%    97.56%    97.65%
MultilayerPerceptron          97.56%    97.56%    96.47%
IBk                           95.00%    97.56%    97.65%
LogitBoost                    96.25%    97.56%    96.47%
RandomSubSpace                96.25%    97.56%    96.47%
HoeffdingTree                 96.25%    97.56%    97.65%
BayesNet                      93.75%    96.34%    95.29%
SimpleLogistic                95.00%    96.34%    97.65%
LWL                           95.00%    96.34%    96.47%
AdaBoostM1                    95.00%    96.34%    96.47%
DecisionTable                 93.75%    96.34%    95.29%
LMT                           92.50%    96.34%    97.65%
RandomForest                  95.00%    96.34%    97.65%
Bagging                       95.00%    95.12%    96.47%
SMO                           80.00%    86.59%    92.94%
OneR                          78.75%    74.39%    69.41%

Table 4: Classification Accuracy for One-class Classification per Activity.



6 Analysis and Conclusions

In this last chapter, the results of the conducted experiments are analyzed and conclusions are drawn. Furthermore, the research questions from Chapter 1.1 are answered based on this project's findings.

Two approaches to recognize events in the context of real-world objects have been outlined for the reader, namely multiclass classification and one-class classification. Multiclass classification achieved high accuracy in determining an activity, both for evaluation records and for the sequence of activities. All classifier algorithms that were selected in this thesis seem to be well suited for the given task, with only minor differences in accuracy. The experiments conducted during this work demonstrate the suitability of machine learning techniques for classifying frames of a sequence of activities from several well-defined, previously known activities. Features of sensor data provide an abstraction level at which machine learning approaches can be applied relatively easily while a necessary level of uniqueness between different events is maintained.

In the second approach, one-class classification was examined for its suitability to detect events in the context of real-world objects. Even though it performed well on the evaluation data, performance for a sequence of activities was poor. None of the classification algorithms that were analyzed in this thesis were able to detect the correct sequence of activities. It has been shown that one-class classification is not a reliable method for detecting events in the context of objects from the real world. A probable explanation is the fact that the classifiers were built with only "target" instances, which makes it very difficult for an algorithm to derive the exact boundaries between "target" and "outlier".

To summarize the findings of this thesis, various machine learning algorithms were found to be suitable to detect events from a well-defined set of events. However, if only a single activity or event is well defined, these machine learning techniques have not proven themselves suitable.

6.1 Evidence

In order to ensure a certain quality of the research that was conducted over the course of the project, various measures have been taken. A machine learning software suite as widely used as WEKA, written in a programming language as stable as Java, was selected to ensure a certain level of quality of the classifiers used during this project. An independent implementation of all 25 used classifiers would not have been feasible within the time frame of this project and would most likely have resulted in lower quality algorithms. Since the research in this project is closely related to activity recognition, techniques from that field could be reused and adjusted where necessary, building a strong scientific foundation for this project.

Along with these points that provide certain prerequisites for the project, results were validated rigorously to ensure that they are reproducible. In order to counteract random effects that could be caused by selecting the most optimal instances for training and evaluation of the classifier, a randomized cross-validation approach with 100 iterations has been applied to the training and evaluation process of the classifiers. For this, the total set of event instances was shuffled and then partitioned into training and evaluation instances. Table 5 shows the results of the cross-validation. Compared to the data from Table 3, a minor deterioration of the classifier quality is observable. This deterioration should be negligible in real-world scenarios, but gives some additional insight into the quality of the given classifiers for the given task. As seen earlier, OneR is the least accurate classifier in the given list due to its simplicity.
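The randomized cross-validation used for Table 5 can be outlined as repeated shuffle-and-split runs; `train_and_score` here stands in for building a WEKA classifier on the training partition and measuring its accuracy on the evaluation partition:

```python
import random

def repeated_holdout(records, train_and_score, iterations=100, seed=42):
    # Shuffle the event instances, split them 80/20 into training and
    # evaluation partitions, score the classifier on the evaluation part,
    # and average the score over all iterations (100 in the thesis).
    rng = random.Random(seed)
    data = list(records)
    scores = []
    for _ in range(iterations):
        rng.shuffle(data)
        cut = int(len(data) * 0.8)
        scores.append(train_and_score(data[:cut], data[cut:]))
    return sum(scores) / len(scores)
```

Averaging over many random splits removes the dependence on one lucky or unlucky choice of evaluation instances.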

Another point that indicates the validity of this project is the fact that all classifiers led to similarly accurate results for classification. With the accuracy being higher than 96% for all classifiers in multiclass classification, it is likely that classification will achieve good results for real-world problems. The classification results for the experiment where activities were performed in a continuous sequence showed that the approach identifies events in a realistic scenario correctly most of the time.

6.2 Threats to Validity

In the light of the results of this thesis, some threats to validity need to be mentioned. The first point to take into account is the fact that the activities were performed by a single person and recorded under laboratory-like circumstances. For the initial experiments, the recording device was held in the same orientation while recording all of the activities. It remains to be explored whether training the classifier with the recording device held in different positions only requires more training repetitions to work properly, or whether this results in losing the necessary uniqueness between different events. Secondly, confusion between similar events needs to be explored. If two events expose similar patterns to the environment, either because the activities are related (e.g. "walking" and "running") or just by coincidence, one event might be misinterpreted as an instance of the other.



WEKA Algorithm               Accuracy
BayesNet                     100.00%
ClassificationViaRegression  100.00%
HoeffdingTree                100.00%
IBk                          100.00%
KStar                        100.00%
LMT                          100.00%
LogitBoost                   100.00%
MultilayerPerceptron         100.00%
NaiveBayes                   100.00%
RandomForest                 100.00%
SMO                          100.00%
SimpleLogistic               100.00%
AdaBoostM1                    99.98%
RandomCommittee               99.97%
RandomSubSpace                99.95%
LWL                           99.92%
RandomTree                    99.90%
Bagging                       99.67%
Logistic                      99.57%
REPTree                       99.56%
J48                           99.31%
PART                          99.31%
JRip                          99.10%
DecisionTable                 98.16%
OneR                          97.74%

Table 5: Classification Accuracy for Multiclass Classification with Randomization of Instances.

use of microphone and camera data, the usefulness of the framework presented in this project for the possible applications has been shown to the reader.


References

[1] D. Aha and D. Kibler. Instance-Based Learning Algorithms. Machine Learning, 6:37–66, 1991.

[2] Android. API Guide. http://developer.android.com/guide/index.html. Online - Accessed: 2015-06-23.

[3] Aamir Nizam Ansari, Mohamed Sedky, Neelam Sharma, and Anurag Tyagi. An internet of things approach for motion detection using Raspberry Pi. In Intelligent Computing and Internet of Things (ICIT), 2014 International Conference on, pages 131–134. IEEE, 2015.

[4] Ling Bao and Stephen S. Intille. Activity Recognition from User-Annotated Acceleration Data. In Pervasive Computing, volume 3001, pages 1–17. Springer, April 2004.

[5] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[6] Leo Breiman. Bagging Predictors. Machine Learning, 24(2):123–140, 1996.

[7] Leo Breiman. Random Forests. Machine Learning, 45(1):5–32, 2001.

[8] J. B. J. Bussmann, W. L. J. Martens, J. H. M. Tulen, F. C. Schasfoort, H. J. G. van den Berg-Emons, and H. J. Stam. Measuring daily behavior using ambulatory accelerometry: The Activity Monitor. Behavior Research Methods, Instruments, & Computers, 33(3):349–356, August 2001.

[9] John G. Cleary and Leonard E. Trigg. K*: An Instance-based Learner Using an Entropic Distance Measure. In 12th International Conference on Machine Learning, pages 108–114, 1995.

[10] William W. Cohen. Fast Effective Rule Induction. In Twelfth International Conference on Machine Learning, pages 115–123. Morgan Kaufmann, 1995.

[11] Guozhu Dong and Jinyan Li. Efficient Mining of Emerging Patterns: Discovering Trends and Differences. In Proceedings of the fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 43–52. ACM, 1999.


[12] Raspberry Pi Foundation. Homepage. https://www.raspberrypi.org/. Online - Accessed: 2015-07-21.

[13] E. Frank, Y. Wang, S. Inglis, G. Holmes, and I.H. Witten. Using Model Trees for Classification. Machine Learning, 32(1):63–76, 1998.

[14] Eibe Frank, Mark Hall, and Bernhard Pfahringer. Locally Weighted Naive Bayes. In 19th Conference in Uncertainty in Artificial Intelligence, pages 249–256. Morgan Kaufmann, 2003.

[15] Eibe Frank and Ian H. Witten. Generating Accurate Rule Sets Without Global Optimization. In J. Shavlik, editor, Fifteenth International Conference on Machine Learning, pages 144–151. Morgan Kaufmann, 1998.

[16] Yoav Freund and Robert E. Schapire. Experiments with a New Boosting Algorithm. In Thirteenth International Conference on Machine Learning, pages 148–156, San Francisco, 1996. Morgan Kaufmann.

[17] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. Additive Logistic Regression: A Statistical View of Boosting. The Annals of Statistics, 28(2):337–407, 2000.

[18] Tao Gu, Zhanqing Wu, Xianping Tao, Hung Keng Pung, and Jian Lu. epSICAR: An Emerging Patterns based Approach to Sequential, Interleaved and Concurrent Activity Recognition. In IEEE International Conference on Pervasive Computing and Communications (PerCom), pages 1–9. IEEE, 2009.

[19] Yu Gu, Lianghu Quan, and Fuji Ren. WiFi-Assisted Human Activity Recognition. In 2014 IEEE Asia Pacific Conference on Wireless and Mobile, pages 60–65. IEEE, 2014.

[20] Thiago S. Guzella and Walmir M. Caminhas. A review of machine learning approaches to Spam filtering. Expert Systems with Applications, 36(7):10206–10222, 2009.

[21] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. The WEKA Data Mining Software: An Update. SIGKDD Explorations, 11(1):10–18, 2009.

[22] Mark Hall, Ian Witten, and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques, volume 3. Morgan Kaufmann Publishers, 2011.


[23] Juho Hamari, Mimmi Sjöklint, and Antti Ukkonen. The Sharing Economy: Why People Participate in Collaborative Consumption. Journal of the Association for Information Science and Technology, 2015. (Forthcoming 2015).

[24] Kathryn Hempstalk, Eibe Frank, and Ian H. Witten. One-Class Classification by Combining Density and Class Probability Estimation. In Proceedings of the 12th European Conference on Principles and Practice of Knowledge Discovery in Databases and 19th European Conference on Machine Learning, volume 5211 of Lecture Notes in Computer Science, pages 505–519. Springer, September 2008.

[25] Tin Kam Ho. The Random Subspace Method for Constructing Decision Forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8):832–844, 1998.

[26] R.C. Holte. Very Simple Classification Rules Perform Well on Most Commonly Used Datasets. Machine Learning, 11:63–91, 1993.

[27] Geoff Hulten, Laurie Spencer, and Pedro Domingos. Mining Time-Changing Data Streams. In Knowledge discovery and data mining: Proceedings of the seventh ACM SIGKDD international conference, pages 97–106. ACM, 2001.

[28] Adafruit Industries. Homepage. http://www.adafruit.com/. Online - Accessed: 2015-07-21.

[29] George H. John and Pat Langley. Estimating Continuous Distributions in Bayesian Classifiers. In Eleventh Conference on Uncertainty in Artificial Intelligence, pages 338–345, San Mateo, 1995. Morgan Kaufmann.

[30] Aditya Khosla, Yu Cao, Cliff Chiung-Yu Lin, Hsu-Kuang Chiu, Junling Hu, and Honglak Lee. An Integrated Machine Learning Approach to Stroke Prediction. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 183–192. ACM, 2010.

[31] Eunju Kim, Sumi Helal, and Diane Cook. Human Activity Recognition and Pattern Discovery. IEEE Pervasive Computing, 9(1):48–53, 2010.

[32] Ron Kohavi. The Power of Decision Tables. In 8th European Conference on Machine Learning. Springer, 1995.

[33] Hermann Kopetz. Internet of Things. In Real-Time Systems: Design Principles for Distributed Embedded Applications, pages 307–323. Springer, 2011.

[34] Jennifer R. Kwapisz, Gary M. Weiss, and Samuel A. Moore. Activity Recognition using Cell Phone Accelerometers. ACM SIGKDD Explorations Newsletter, 12(2):74–82, December 2010.

[35] Niels Landwehr, Mark Hall, and Eibe Frank. Logistic model trees. Machine Learning, 95(1–2):161–205, 2005.

[36] S. le Cessie and J.C. van Houwelingen. Ridge Estimators in Logistic Regression. Applied Statistics, 41(1):191–201, 1992.

[37] Yann LeCun, Bernhard Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne Hubbard, and Lawrence D. Jackel. Backpropagation Applied to Handwritten Zip Code Recognition. Neural Computation, 1(4):541–551, 1989.

[38] Seon-Woo Lee and Kenji Mase. Activity and Location Recognition Using Wearable Sensors. IEEE Pervasive Computing, 1(3):24–32, July 2002.

[39] Friedemann Mattern and Christian Floerkemeier. From the Internet of Computers to the Internet of Things. In From Active Data Management to Event-Based Systems and More, pages 242–259. Springer, 2010.

[40] Javier Béjar (Technical University of Catalonia). Unsupervised Learning (Examples). http://www.cs.upc.edu/~bejar/apren/docum/trans/09-clusterej-eng.pdf. Online - Accessed: 2016-01-02.

[41] Internet of Things 2008: International Conference for Industry and Academia, Zurich, March 26–28, 2008. Homepage. http://www.iot-conference.org/iot2008/. Online - Accessed: 2015-07-21.

[42] University of Waikato. WEKA API Documentation. http://weka.sourceforge.net/doc.dev/. Online - Accessed: 2015-08-18.

[43] Pentaho. WEKA Community Documentation. http://wiki.pentaho.com/display/datamining/classifiers. Online - Accessed: 2015-08-17.

[44] J. Platt. Fast Training of Support Vector Machines using Sequential Minimal Optimization. In B. Schoelkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning. MIT Press, 1998.


[45] Michael E. Porter and James E. Heppelmann. How Smart, Connected Products Are Transforming Competition. Harvard Business Review, 92(11):11–64, 2014.

[46] J. Ross Quinlan. C4.5: Programs for Machine Learning. Elsevier, 2014.

[47] Nishkam Ravi, Nikhil Dandekar, Preetham Mysore, and Michael L. Littman. Activity Recognition from Accelerometer Data. In IAAI'05: Proceedings of the 17th Conference on Innovative Applications of Artificial Intelligence, 3:1541–1546, 2005.

[48] Ross J. Roeser, Michael Valente, and Holly Hosford-Dunn. Audiology: Diagnosis, volume 1. Thieme, 2007.

[49] Salvatore Stolfo, David W. Fan, Wenke Lee, Andreas Prodromidis, and P. Chan. Credit Card Fraud Detection Using Meta-Learning: Issues and Initial Results. In AAAI-97 Workshop on Fraud Detection and Risk Management, 1997.

[50] Marc Sumner, Eibe Frank, and Mark Hall. Speeding up Logistic Model Tree Induction. In 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, pages 675–683. Springer, 2005.

[51] Vladimir Vujović and Mirjana Maksimović. Raspberry Pi as a Wireless Sensor Node: Performances and Constraints. In 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pages 1013–1018. IEEE, 2014.

[52] Vladimir Vujović and Mirjana Maksimović. Raspberry Pi as a Sensor Web Node for Home Automation. Computers & Electrical Engineering, 2015.

[53] Rolf H. Weber and Romana Weber. Internet of Things. Springer, 2010.

[54] Stefan Winkler, Murat Kunt, and Christian J. van den Branden Lambrecht. Vision and Video: Models and Applications. In Vision Models and Applications to Image and Video Processing, pages 201–229. Springer, 2001.

[55] Felix Wortmann and Kristina Flüchter. Internet of Things: Technology and Value Added. Business & Information Systems Engineering, 57(3):221–224, 2015.


A Appendix

A.1 Hardware

In this section, an exemplary hardware setup is described. The Raspberry Pi 2 Model B, a credit card-sized single-board computer, serves as the basis for the smart object. A 900 MHz quad-core processor combined with 1 gigabyte of internal memory provides sufficient performance to capture sensor data and to perform some basic real-time processing. In earlier work, the Raspberry Pi has proven itself to be an adequate platform for IoT technology [3, 51, 52]. The purpose of this exemplary setup is not to list the ideal components, but rather to provide the reader with an overview of prices and available hardware.

Component                      Price
Raspberry Pi 2 Model B         35.00 $
GPS Sensor                     39.95 $
3-Axis Accelerometer           14.95 $
Motion Sensor                   9.95 $
Proximity Sensor                9.95 $
Force-Sensitive Resistor        7.00 $
Hall Effect Sensor + Magnet2    4.50 $
Tilt Sensor                     2.00 $
Analog Temperature Sensor       1.50 $
Light Sensor                    0.95 $
Microphone                      0.95 $
Vibration Sensor                0.95 $

Table 6: Exemplary Hardware Components and Prices [12, 28].
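On a setup like this, raw sensor readings are typically collected in fixed-size windows and collapsed into feature vectors before being handed to a classifier, as done with WEKA in the main text. The following Python sketch illustrates the idea with a simulated 3-axis accelerometer; the sensor-read function, the 10 Hz sampling rate, and the 32-sample window size are illustrative assumptions, not parameters of the thesis prototype.

```python
import math
from statistics import mean, stdev

def read_accelerometer(t):
    # Stand-in for a real GPIO/I2C sensor read on the Pi; here a simple
    # periodic signal simulates a 3-axis (x, y, z) reading at time t.
    return (math.sin(t), math.cos(t), 1.0 + 0.1 * math.sin(3 * t))

def window_features(samples):
    # Collapse a window of (x, y, z) samples into per-axis mean and
    # standard deviation, giving a 6-dimensional feature vector.
    features = []
    for axis in zip(*samples):
        features.append(mean(axis))
        features.append(stdev(axis))
    return features

# Sample at a nominal 10 Hz and emit one feature vector per 32-sample window.
window, vectors = [], []
for i in range(64):
    window.append(read_accelerometer(i / 10.0))
    if len(window) == 32:
        vectors.append(window_features(window))
        window = []

print(len(vectors), len(vectors[0]))  # 2 windows, 6 features each
```

Each resulting vector would correspond to one labeled instance in a WEKA training set.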

A.2 The Shareable Drill Project

During the first two months of my project, I worked on a concept that provided the foundation for the research conducted over the course of this thesis: the Shareable Drill. The Shareable Drill is a concept originally envisioned by Peerby B.V. that contributes to the idea of a sharing economy. Instead of a number of individuals each owning a certain product, people share access to a smart, ownerless product which moves

2 A Hall effect sensor in combination with a magnet can be used to measure magnetic fields.


around its consumers partially autonomously. The product knows whether it is being used, and if the current individual has not used it for a while, it communicates to the community around it that it would like to be passed on to the next person. In this way, products that otherwise lie around unused most of the time can be used more efficiently. Since the delivery of an actual hardware prototype, which was supposed to be developed by a third party, took longer than expected, I started to work with an Android-based smart phone mimicking the drill and communicating with a server, sharing information about itself and its environment. The following series of pictures describes the functionality of the prototype system, consisting of an Android application acting as the client software and a Node.js server managing registration, tracking, and exchange between users of the device.
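The server-side registration step, which produces the payload rendered as a QR code for the client, can be sketched roughly as follows. This Python sketch is illustrative only: the field names, the JSON payload format, and the placeholder endpoint are assumptions, not the actual schema of the Peerby prototype (which used a Node.js server).

```python
import json
import uuid

def register_device(name, category):
    # Create a device record and the JSON payload that would be rendered
    # as a QR code for the client to scan (hypothetical schema).
    device = {
        "id": str(uuid.uuid4()),
        "name": name,
        "category": category,
        "server": "https://example.invalid/api",  # placeholder endpoint
    }
    return device, json.dumps(device)

# The client scans the QR code and decodes the payload to configure itself.
device, qr_payload = register_device("Shareable Drill", "Tools")
decoded = json.loads(qr_payload)
print(decoded["name"], decoded["category"])  # Shareable Drill Tools
```

Encoding the configuration in a QR code spares the user from typing server details by hand, which matches the flow shown in the screenshots.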

Figure 10: Registering a new device on the server is simply done by providing a product name and a category.


Figure 11: The device was registered on server side. A QR code holds the information for the client.

Figure 12: The QR code can now be scanned on the client device to simplify the registration and configuration process.


Figure 13: The information from the QR code is displayed to confirm the correctness of the data. Pressing "SETUP" configures the client appropriately.


Figure 14: The client has been configured successfully. A simple checkable box serves as a mockup in order to allow the user to decide whether the product is being used. The most recent state is shown on screen.


Figure 15: Information about the device, such as the battery level as well as the current and past locations, can be monitored in a browser window. Various parameters can be set to gain more insights about the product.

Determining the location is a battery-heavy operation, but it has been highly optimized, so that the standby time of the phone increased from initially less than one day to about five to seven days. While at first the phone sent an update periodically after a fixed amount of time (around every 30 seconds), it was later optimized to use Android's Significant Motion Sensor as well as Activity Recognition from the Google Play services client library API to decide autonomously when an update should be triggered, i.e., when the product is moving. Battery usage of the remaining sensors has been observed to be negligible.
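The motion-triggered update policy described above can be sketched as a small decision routine: send a location update only when motion has been reported, and never more often than a minimum interval. The class and method names are illustrative, and the 30-second interval merely mirrors the original fixed update period; the prototype's actual logic lived in the Android client.

```python
class UpdateScheduler:
    # Gate location updates on reported motion, rate-limited to at most
    # one update per `min_interval` seconds (illustrative policy).

    def __init__(self, min_interval=30.0):
        self.min_interval = min_interval
        self.last_update = float("-inf")

    def should_update(self, now, motion_detected):
        # Trigger only when the device moved AND enough time has passed.
        if motion_detected and now - self.last_update >= self.min_interval:
            self.last_update = now
            return True
        return False

sched = UpdateScheduler(min_interval=30.0)
print(sched.should_update(0.0, True))    # True  -> first motion triggers an update
print(sched.should_update(10.0, True))   # False -> too soon after the last update
print(sched.should_update(45.0, False))  # False -> no motion, no update
print(sched.should_update(45.0, True))   # True
```

Because the expensive GPS fix is only requested when both conditions hold, the phone can stay idle most of the time, which is what produced the observed battery savings.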


Figure 16: When the device has not been used for a certain amount of time, it announces that it is available for the next user and sends the current user an email with a link where he or she can select when the product can be picked up.
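This idle-timeout behaviour — announcing availability after a period of non-use — amounts to a simple comparison on the server side. The sketch below is a minimal illustration; the three-day threshold and the function name are assumptions, as the thesis does not specify the timeout used by the prototype.

```python
from datetime import datetime, timedelta

def is_available(last_used, now, idle_threshold=timedelta(days=3)):
    # The device becomes available once it has sat idle longer than the
    # threshold (illustrative three-day default).
    return now - last_used > idle_threshold

now = datetime(2016, 1, 25, 12, 0)
print(is_available(datetime(2016, 1, 20, 12, 0), now))  # True  -> idle 5 days
print(is_available(datetime(2016, 1, 24, 12, 0), now))  # False -> idle 1 day
```

When the check turns true, the server would send the notification email described in the caption above.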
