A hierarchical lazy smoking detection algorithm using smartwatch sensors

(1)

A Hierarchical Lazy Smoking Detection Algorithm

Using Smartwatch Sensors

Muhammad Shoaib, Hans Scholten, and Paul J.M. Havinga

Department of Computer Science

University of Twente, The Netherlands

Email: {m.shoaib,hans.scholten,p.j.m.havinga}@utwente.nl

Ozlem Durmaz Incel

Department of Computer Engineering, Galatasaray University, Istanbul, Turkey

Email: odincel@gsu.edu.tr

Abstract—Smoking is known to be one of the main causes for premature deaths. A reliable smoking detection method can enable applications for an insight into a user’s smoking behaviour and for use in smoking cessation programs. However, it is difficult to accurately detect smoking because it can be performed in various postures or in combination with other activities, it is less-repetitive, and it may be confused with other similar activities, such as drinking and eating. In this paper, we propose to use a two-layer hierarchical smoking detection algorithm (HLSDA) that uses a classifier at the first layer, followed by a lazy context-rule-based correction method that utilizes neighbouring segments to improve the detection. We evaluated our algorithm on a dataset of 45 hours collected over a three month period where 11 participants performed 17 hours (230 cigarettes) of smoking while sitting, standing, walking, and in a group conversation. The rest of 28 hours consists of other similar activities, such as eating, and drinking. We show that our algorithm improves recall as well as precision for smoking compared to a single layer classification approach. For smoking activity, we achieve an F-measure of 90-97% in person-dependent evaluations and 83-94% in person-independent evaluations. In most cases, our algorithm corrects up to 50% of the misclassified smoking segments. Our algorithm also improves the detection of eating and drinking in a similar way. We make our dataset and data logger publicly available for the reproducibility of our work.

I. INTRODUCTION

Smoking is one of the main causes for premature deaths [1]–[3]. The authors in [3], [4] argue that detecting various physical triggers (physical activities) can enable applications for effective interventions in achieving behavior change. Smok-ing is one such trigger and its reliable detection can enable applications for an insight into one’s smoking behavior as well as for automated self-reporting in smoking cessation programs [3], [4]. In such programs, self-reporting is an important aspect where smokers keep track of their cigarette consumption which can be used for evaluating the progress of the relevant behavior change. However, it causes an additional burden on the users to keep track of the smoking sessions [5]. Such self-reporting can be automated with the use of wearable sensors for recognizing smoking.

Smoking is a less-repetitive activity which can be per-formed in various postures (sitting and standing) and in combi-nation with other activities (while walking and conversing in a group). Moreover, it has similarities to other activities such as drinking and eating. These reasons make it difficult to detect. Its detection has been studied in various ways such as from computer vision to text-messaging, but the ubiquitous use of

smartphones and popularity of wearables such as smartwatches may offer a better alternative for smoking recognition [3]. In recent years, a few researchers have shown the potential of using wearable sensors for smoking detection [2], [3], [6], [7]. However, these studies have mainly focused on the puff (hand-to-mouth gesture) recognition within the smoking activity. Some of them have additionally used a second layer of decision making such as threshold-based rules [3] or condition random fields [2] to identify smoking sessions or episodes. Instead of a fine grain analysis of puffing or hand-to-mouth gestures, we consider smoking as a less-repetitive activity recognition problem. Our approach makes it easier to label the data because we only need to know the start and end of the smoking session rather than a fine grain gesture labeling which can be cumbersome. Not all these studies have considered the most commonly used smoking variations and similar activities whereas we consider these variations and similar activities in our 45-hour dataset as shown in Tables I and II.

We propose a two-layer hierarchical approach where first a classifier is used to detect smoking segments and then a lazy context-rule-based algorithm is applied to correct misclassified segments. We utilize the smoking context information in the second layer by inspecting the neighbouring segments. Our approach improves the smoking recognition performance (both recall and precision) as well as that of eating, and drinking. In most cases, it corrects up to 50% of the misclassified segments for these three activities, mainly for smoking and drinking. We use a two layer-approach, however, a simple clustering or threshold-based rule can be used on top to count the number of smoking sessions or cigarettes. We summarize our main contributions as follows:

• We propose to use a two-layer hierarchical lazy smoking detection algorithm (HLSDA) that outper-forms single-layer classification approach. Moreover, we evaluate various smoking variations together with similar activities that can be confused with smoking, such as drinking and eating.

• We developed a data logger for smartwatches and smartphones which collects various sensors data con-currently from these devices. Our 45-hour dataset contains 17 hours of smoking while walking, standing, sitting, and while conversing in a group collected over a three months’ period. As far as we know, this is the largest dataset compared with similar studies, as we discuss in Section II. For reproducibility, we make our data logger and dataset publicly available.

(2)

TABLE I. COMPARATIVEANALYSIS OFRELATED WORK ANDOURSTUDY

Studies Year Sensors Sensor positions Classifiers Performance (F-measure) Activities Performed Users Duration [7] 2012 A one wrist GMM (59%, PD) SSD, others 4 NP

[6] 2012 LA, G one wrist/foot SVM (95.6%, PID),(100%,PD) SSD, others 3 NP

[3] 2014 A (2) both wrists RF/THR (79%, PD) SSD, SST, SG, SW, SE, SD, SU P 6 11.5h

[2] 2014 A,G,M one wrist RF/ CRF (85.7%, PD) SSD, SG, SW, E, D, others 15 28h

[5] 2015 A,G, RIP both wrists,Chest SVM/ THR (90%, PD) SSD, SST, SG, SW 6 40h

Our study 2016 A,G one wrist HLSDA (90-97%, PD), (83-94%, PID) SSD, SST, SG, SW, E, D, W, SD, ST 11 45h Legend_{A:accelerometer, G:gyroscope, LA:linear acceleration sensor, M:magnetometer, GMM:Gaussian mixture model, SVM:support vector machine, RF:random forest,}

THR:threshold-based, CRF: conditional random field, PD:person-dependent, PID:person-independent, smoking while standing:SSD, while sitting:SST, while in a group:SG,

while walking:SW, while eating:SE, while drinking:SD, while using phone:SU P, SD:standing, ST:sitting, W:walking, E:eating, D:drinking, NP:not provided

The rest of the paper is organized as follows. We describe the related work in Section II, and data collection and its preprocessing in Section III. We present our algorithm in IV and performance evaluation in Section V. In Section VI, we present the conclusions, limitations, and future work.

II. RELATEDWORK

In recent years, there have been a few works in recog-nizing smoking activity and puffing (hand-to-mouth gestures). For example, the authors in [8] use a smart lighter and E-cigarette for smoking detection. Computer vision is used for smoking detection in [9], however, it requires the smoking areas to be under video surveillance. Some studies have used only respiratory inductive plethysmography (RIP) sensors for detecting smoking through deep inhalation and exhalation in the breathing patterns [10], [11]. However, RIP sensors can be obstructive at the chest position. With advances in wearable sensor technology, there has been a shift towards using smartwatches and wrist-bands for detecting smoking and other similar activities such as drinking and eating. In this section, we describe studies which mainly use wrist-worn sensors as summarized in Table I because these sensors are less obtrusive.

One of the first works on smoking detection is a feasibility study using a wrist-worn accelerometer [7]. The authors re-ported a user-specific recall of 70% and precision of 51% for smoking. Though they reported a low recognition performance, they highlighted interesting insights into recognizing smoking activity using wrist-worn sensors. This study only considered smoking while standing in its evaluations. Similarly, the au-thors in [6] use two SVM classifiers, one for first detecting a high-level activity in a window of sensor data and another for detecting micro-activities within the same window. They reported a very high detection rate for smoking detection. However, this study also only considered smoking while stand-ing and the amount of trainstand-ing and teststand-ing data is not provided. Moreover, besides wrist, this study uses data from sensors at foot position which can be a very obtrusive position. These two studies have not considered the confounding gestures such as smoking while walking, and smoking while talking in a group. Moreover, they do not consider the temporal information within the smoking activity which can be useful. In [3], the authors use two accelerometers at both wrists for smoking detection. Six participants performed smoking and other activities for a total of 11.8 hours (consisting of 34 smoking episodes or 481 puffs). They considered many variations of smoking such as smoking while sitting, standing, eating, walking, using a phone, and while talking in a group. They used a two-layer approach for smoking classification.

The output of random forest classifier is used to calculate puff-frequency for a recent window and then a threshold-based rule is applied to identify a smoking session. They reported an F-measure of 79% in person-dependent evaluations. However, their person-independent evaluation results were pretty low, mainly because the puff-frequency can be very different for different smokers or even for the same smoker in different situations so a simple threshold based method may not be effective.

The authors in [2] use quaternions, calculated from ac-celerometer, gyroscope and magnetometer at wrist, to recog-nize smoking puffs. Fifteen participants performed a total of 17 smoking, 10 eating, and 6 drinking sessions. The smoking sessions included smoking while standing alone, in a group, and smoking while walking. They achieved a precision of 0.91 and recall of 0.81 for recognizing smoking gestures and were able to recognize 15 out 17 sessions. They also applied their model on 4 users who wore 9-axis inertial sensors for 4 hours each on 3 days in the field. On this dataset, they reported a recall of 90% (27/30 sessions detected) and a false positive rate of 2/3 episodes per day (8 false sessions in 12 person days). However, they did not consider smoking while sitting which is an important variant of smoking. We observed that it is relatively difficult to recognize smoking while sitting than while standing because the former can be performed in various ways. The mentioned study also uses magnetometer which increases energy-consumption. Moreover, drinking sessions are very few (6) compared to 17 smoking sessions, which can influence classification results, especially when drinking is mainly confused with smoking gestures. One of the recent studies on smoking recognition and its use in smoking ces-sation study was conducted in 2015 [5]. They achieve a recall of 96.6%, and a precision of 87% for smoking. However, they achieved this performance by combining accelerometer, gyroscope from the wrist position and RIP sensor from the chest position which can be very obtrusive.

Our study advances the existing work and fills the gaps mentioned above. We focus on smoking as a less-repetitive activity recognition problem and propose a two-layer smoking detection algorithm which improves both recall as well as precision of smoking and other similar activities as discussed in Section V. As shown in II, we collected data for the most common forms of smoking such as smoking while sitting, standing, walking, and in a group for around 17 hours by 11 participants which is higher than all previous works. Additionally, we collected data for drinking while sitting and standing for almost the same amount of time. We need data for such similar activities because they are confused with smoking while sitting and standing. We use a smartwatch whereas prior work has mainly used dedicated sensors except one [5].

(3)

TABLE II. DETAILS ABOUTPARTICIPANTS AND OURCOLLECTEDDATASET

Activities PerformedLegend Duration of Each activity (minutes)

total duration for all

activities (minutes) Gender Height (cm)

Age (years)

Cigarette Usage Participant1 SST, SSD, DST, DSD, E, ST, SD, SG, SW, W 43 430 male 180 25 8-10 per day

Participant2 SST, SSD, DST, DSD, E, ST, SD, SG, SW, W 47 470 male 172 30 0-10 per week

Participant3 SST, SSD, DST, DSD, E, ST, SD, SG, SW, W 48 480 male 175 25 2-6 per day

Participant4 SST, SSD, DST, DSD, E, ST, SD, SG 37 296 male 156 28 0-10 per week

Participant5 SST, SSD, DST, DSD, E, ST, SD, SG 18 144 male 174 23 18-20 per day

Participant6 SST, SSD, DST, DSD, E, ST, SD, SG 20 160 Female 164 20 3-7 per week

Participant7 SST, SSD, DST, DSD, E, ST, SD, SG 16.8 134.4 male 181 20 9-11 per day

Participant8 SST, SSD, DST, DSD, E, ST, SD, SG 20 160 Female 172 29 4-6 per day

Participant9 SST, SSD, DST, DSD, E, ST, SD 24 168 male 167 35 0-10 per week

Participant10 SST, SSD, DST, DSD, E, ST, SD 19 133 male 181 27 7-12 per day

Participant11 SST, SSD, DST, DSD, E, ST, SD 18.6 130.2 male 170 45 15-20 per day Legend_{Smoking while standing (S}

SD), while sitting (SST), while in a group conversation (SG), while walking (SW), SD:standing, ST:sitting, W:walking, E:eating, D:drinking.

Though our algorithm outperforms the existing approaches as for as the motion sensors at one wrist is concerned, it is hard to to compare the recognition performance alone due to the different experimental setups, feature sets, set of activities, and participants involved.

III. DATACOLLECTION ANDPREPROCESSING We collected a dataset of 45 hours for smoking and other similar activities such as eating, and drinking coffee or tea. Out of these 45 hours, the smoking activity was performed for 16.86 hours in various forms such as smoking while sitting, standing, walking and in a group conversation. The details about the duration of each activity by all participants is presented in Table II. Each activity was performed multiple times by each participant on various days over a period of three months. Usually, the participants smoked 1-4 cigarettes (1 cigarette per session) in a day. In the meanwhile, they were also performing eating and drinking activities on different days according to their availability. Participant 1-4 performed at least 10 sessions for each activity whereas participants 5-11 performed at least 5 sessions for each of their activities. A brief description about these activities are as follows:

• Smoking: There were four variants of this activity: smoking while sitting, standing, walking, and smoking in a group. Smoking in a group was done while standing in a smoking area where the participants were involved in a conversation with other smokers. • Drinking: They had a cup of coffee or tea while

sitting in our office lounge or standing outdoor. In all sessions, it was a group activity where two or more people were sitting or standing and drinking coffee while involved in a usual group conversation. • Eating: During the eating activity, users had a cup of

soup in their natural style while sitting alone in a lunch place. The participants used a spoon for eating soup which was in a cup on a table.

• Walking: Those participants who performed smoking while walking, also did walking as a separate activity. • Sitting/Standing: Some of the participants performed

sitting and standing for a few minutes.

In total, eleven participants took part in the experiment which consists of both regular and occasional smokers. Out of these, two were female. All were healthy and were within an age

range of 20-45. They also signed a consent form before experiment. Their details are given in Table II. Each participant wore a smart-watch (LG Watch R, LG Watch Urbane, Sony Watch 3) on the right wrist and a smartphone in the right pocket as all participants were right-handed. Only one of the previous studies have used a smartwatch [5]. We collected data from multiple sensors from both smartwatch and smartphone, however, we only use accelerometer and gyroscope in this study. The data was collected at 50 samples per second from these sensors. For data collection, we developed our own Android application which can collect data from multiple sensors, both from the phone and smartwatch in real-time at a user-provided sampling rate. In order to label the start and end of the activities, participants were told to make a waving gesture. The dataset and data logger can be accessed at [12].

In the preprocessing phase, we extracted six time domain features from accelerometer and gyroscope. They are mean, standard deviation, minimum, maximum, kurtosis, and skew-ness of the sample. Skewskew-ness is a measure of the asymmetry of a sample around its mean and is defined in Equation (1). Kurtosis is the measure of whether the data in the sample is heavy-tailed or light-tailed compared to normal distribution. In other words, it is an indication of how outlier-prone a sample or distribution is and is defined in Equation (2). Both of these features measure the shape of a sample.

Skewness = E(x − µ) 3 σ3 (1) Kurtosis = E(x − µ) 4 σ4 (2)

where µ is the mean of x, σ is the standard deviation of x, and E(t) represents the expected value of the quantity t.

Though we extracted six features, we used only maximum, minimum, kurtosis, and skewness after our initial performance analysis because the set of these four performs the best in detecting smoking. They are extracted over multiple window sizes:10, 15, 20, 25, 30, 35, 40, 50, 60 seconds. However, we only choose a window of 30 seconds because it was one of the optimal window size in terms of achieved recognition performance for smoking detection. We have also observed the same in some of our previous works [1], [13].

IV. A HIERARCHICALLAZYSMOKINGDETECTION ALGORITHM(HLSDA)

We observe that smoking is a continuous activity which usually lasts for a few minutes. Based on this observation, we

(4)

assume that if there is a misclassified segment in a smoking activity, it can be corrected by inspecting the neighbouring segments. We define a misclassified segment as the one which has neighbours with the same labels on its left and right but its label is different from these neighbours. In our approach, we correct misclassified segments on top of a classifier by inspecting their neighbours. We propose a two layer approach as shown in the flowchart in Figure 1. On the first layer, a trained classifier classifies each segment and then in the second layer, a lazy correction method is executed. We use three simple sandwich rules in this correction method. In the first rule, we replace a single misclassified segment with its immediate neighbour if it is different than its two neighbours on its left and right and they are same. In the second rule, we replace two consecutive misclassified segments with their immediate neighbour if its different than its neighbours and there are at least three same neighbors on their left and right. In these two rules, we only perform replacing if the immediate neighbor is not sitting or standing. For correcting misclassified segments due to random hand gestures while sitting or standing, we use a third sandwich rule, where a sandwiched segment is replaced by the most common class (sitting or standing) of a moving window if there is only one instance of that segment in that window. We present these rules in Figure 2. Although HLDSA can be simply considered as smoothing, rather than using a standard filter or an algorithm, we propose a rule-based approach which stems from the characteristics of the smoking activity or any repetitive activity.

For example, if at specific time the classifier detects a segment n as drinking whereas the segments before (n−1,n−2 etc.) and after this specific segment (n + 1, n + 2, etc.) are all smoking then we replace this drinking segment with smoking. The same concept applies to other similar activities such as drinking and eating. Let us assume that predicted represents all predicted segments from starting point to n at any time and and predicted[n] represents the most recent predicted segment. If one of the conditions in Figure 2 is true, then we use the simple replace correction method to deal with misclassified segments. For the correction method to work effectively, it is important that the classifier in the first layer recognizes the activities with a reasonable accuracy such that at least better than a guess.

One of the limitations of our correction algorithm is that it may not perform well if smoking and drinking or eating are being performed at the same time with the same hand on which the sensor is used because we use a large window size of 30 seconds. This problem can be resolved in the training phase where such instances are labeled as a compound activity, such as smoking-eating or smoking-drinking. However, based on our observations and literature study [3], smokers usually use a different hand for drinking or eating while smoking. It does not affect the smoking gesture, but it increases the inter-puff time which is not a concern for our correction algorithm. To deal with larger inter-puff intervals, a clustering or simple threshold based algorithm can be employed on top of our correction method to identify smoking sessions. Based on 30 seconds window size, HLSDA can deal with an inter-puff time of less than 90 seconds. However, we need a third layer to identify smoking sessions if inter-puff time is more than 90 seconds. Alternatively, the window size can be further increased.

Fig. 1. Flowchart of our hierarchical classification process 1:predicted(n) ← F irstLayerClassif ier(n)

2:movingwindow ← predicted(n : n − winlimit) . winlimit can be adjusted based on window size

3:mostcommonclass ← mode(movingwindow)

4:if predicted(n) = predicted(n − 2) 6= sitstand and predicted(n) 6= predicted(n − 1) then

5: predicted(n − 1) ← predicted(n)

6:else if (predicted(n) = predicted(n − 1) = predicted(n − 4) or predicted(n − 1) = predicted(n − 4) = predicted(n − 5)) and predicted(n − 1) 6= predicted(n − 2) and predicted(n − 1) 6= sitstand then

7: predicted(n − 2) ← predicted(n − 1)

8: predicted(n − 3) ← predicted(n − 1)

9:else if predicted(n − 3) = smoke and Counter(smoke) = 1 then . Counter(activity): number of activity segments in current moving window

10: predicted(n − 3) ← mostcommonclass

11: else if predicted(n − 3) = drink and Counter(drink) = 1 then

13: else if predicted(n − 3) = eat and Counter(eat) = 1 then

Fig. 2. Rule-based Algorithm running at Level 2 in our classification process. predictedmeans predicted class of a segment by the first-layer classifier.

V. RESULTS ANDDISCUSSION

For performance analysis, we used Scikit-learn (version 0.17) which is a Python based machine learning toolkit [14]. We selected three classifiers which are commonly used for practical activity recognition: decision tree, random forest tree, and support vector machine. They are suitable for running on mobile phones with a reasonable recognition performance [15]. They are implemented on mobile phones in various studies; i.e. decision tree (DT) in [16], [17], SVM in [18] and random forest (RF) classifier in [2]. For the decision tree, Scikit-learn uses an optimized version of CART (Classification and Regression Trees) algorithm [14]. We use a balanced class weight option for all three classifiers which handles the class imbalance if any as shown below in Python code of our used models. m o d e l d e c i s i o n t r e e = D e c i s i o n T r e e C l a s s i f i e r ( r a n d o m s t a t e = 1 , c l a s s w e i g h t = ’ b a l a n c e d ’ ) modelsvm = svm . SVC ( C = 1 0 , c l a s s w e i g h t = ’ b a l a n c e d ’ , k e r n e l = ’ r b f ’ , gamma= 0 . 0 0 1 ) m o d e l R a n d o m F o r e s t = R a n d o m F o r e s t C l a s s i f i e r ( r a n d o m s t a t e = 1 , c l a s s w e i g h t = ’ b a l a n c e d ’ , n e s t i m a t o r s = 9 )

For person-dependent evaluations, we used 5-fold cross-validation where the whole dataset is divided into five equal parts or subsets. In each iteration, four of these parts are used for training purpose and one for testing. This process is repeated five times, thereby using all data for training as well as testing. For person-independent evaluation, we use leave one subject out method, where one subject is used for testing whereas the rest of them for training. In both these evaluations, we do not randomize the data and preserve its order. It is

(5)

TABLE III. SMOKINGEVALUATIONSCENARIOS

Scenario Participant Smoking Variations

scenario 1 all (1-11) SST, SSD, DST, DSD, E, ST, SD

scenario 2 1-8 SST, SSD, DST, DSD, E, ST, SD, SG

scenario 3 1-3 SST, SSD, DST, DSD, E, ST, SD, SG, SW, W

important for these activities as we want to use their temporal information. Though we analysed the recall, precision, and measure as performance metrics, we only present the F-measure, because it incorporates both recall and precision as defined in Equation (3).

F − measure =2 ∗ precision ∗ recall

precision + recall (3)

As shown in Table III, we define three scenarios for evaluation because not all activities were performed by all participants. In scenario 1, we consider all participants, but we do not include their smoking while walking and while in group conversation as not all participants performed these two variations of smoking. For scenario 2, we consider the first eight participants as all of them performed smoking while in a group conversation. However, in this scenario, we do not include the smoking while walking from the first three participants. For scenario 3, we consider the first three participants from Table II and all of their activities. We only picked these three for the third scenario because only they have performed smoking while walking and walking.

A. Person-Dependent Evaluation

For person-dependent case, we evaluated all eleven par-ticipants individually and observed improvements for all of them with HLSDA compared to single layer classification algorithms. We observe improvements for smoking, drinking and eating, especially for drinking and smoking as these two activities are usually confused with each other due to their similar hand-to-mouth gestures. We present the average of all participants’ recognition performance using a single layer clas-sification approach and the improvements due to our approach HLSDA for our three scenarios in Table IV. In most cases, HLSDA corrects up to 50% of the misclassified segments as shown in the increase column in Table IV. HLSDA achieves comparable results for smoking detection with a decision tree at the first layer as that of a random forest with nine estimator trees. For example, we achieve an F-measure of 95% for smoking detection with decision tree as first layer classifier and 96% with a random forest tree as shown in Table IV. This shows that we can use our correction method with a computationally cheaper algorithm at the first layer and still achieve a reasonable recall and precision.

B. Person-Independent Evaluation

For person-independent scenarios, we use leave one subject out method. Though we observe slightly lower recognition performance in such scenarios, our proposed approach still improves the precision and recall of smoking by a significant amount. We show the improvements due to HLSDA in Table V for our three scenarios. Moreover, our approach not only improves the recognition of smoking activity, but also drinking and eating activities as shown in Table V. We observe that drinking and smoking are confused with each other similar

TABLE IV. INCREASE INF-MEASURE DUE TOHLSDACOMPARED TO A SINGLE-LAYER CLASSIFIER INPERSON-DEPENDENT EVALUATIONS

SVM Increase DT Increase RF Increase smoke 0.90 0.05 0.85 0.07 0.89 0.04

eat 0.92 0.03 0.82 0.07 0.88 0.06 drink 0.87 0.06 0.80 0.10 0.85 0.07 Inactive 1.00 0.00 1.00 0.00 1.00 0.00

(a) scenario 1

(b) scenario 2

SVM Increase DT Increase RF Increase smoke 0.90 0.06 0.83 0.07 0.87 0.07 smoke walk 0.86 0.06 0.78 0.08 0.80 0.10 eat 0.95 0.03 0.85 0.05 0.91 0.04 drink 0.87 0.09 0.78 0.11 0.83 0.09 inactive 1.00 0.00 1.00 0.00 1.00 0.00 walk 0.97 0.01 0.94 0.04 0.95 0.02 (c) scenario 3

TABLE V. INCREASE INF-MEASURE DUE TOHLSDACOMPARED TO A SINGLE-LAYER CLASSIFIER INPERSON-INDEPENDENT EVALUATION

(a) scenario 1

(b) scenario 2

SVM Increase DT Increase RF Increase smoke 0.84 0.06 0.80 0.08 0.84 0.05 smoke walk 0.83 0.08 0.64 0.10 0.73 0.04 eat 0.87 0.05 0.71 0.15 0.73 0.06 drink 0.74 0.10 0.67 0.16 0.71 0.10 inactive 0.96 0.02 0.96 0.02 0.96 0.02 walk 0.97 0.02 0.91 0.03 0.96 0.01 (c) scenario 3

to person-dependent evaluation. As an example, we show the confusion matrices of a single layer classifier and HLSDA in scenario 2 in Figure VI where it can be seen that mainly smoking and drinking are confused with each other.

Our method can also be easily applied to commonly used simple physical activities such as walking, jogging, biking, sitting, standing, using stairs. For this purpose, we applied the same approach on one of our previous data-sets [19] and observed important improvements for all seven activities.

In both scenarios, smoking is mainly confused with drink-ing. These two activities are very similar. In some cases, we observed that some participants were holding their coffee or tea mug at the same position as their cigarette while smoking, in which case it becomes very difficult to differentiate them. Moreover, performing these activities while sitting can be more

(6)

TABLE VI. PERSON-INDEPENDENT EVALUATION IN SCENARIO2

Predicted Class smoke eat drink inactive Actual Class

smoke 1338 14 128 14 eat 0 439 59 5 drink 25 2 971 6 inactive 0 0 140 2642

(a) Confusion matrix of HLSDA with SVM in first-layer

Predicted Class smoke eat drink inactive Actual Class

smoke 1305 15 174 0 eat 5 433 65 0 drink 81 17 895 11 inactive 0 0 219 2563

(b) Confusion matrix of single layer classification: SVM

similar to each other compared to doing them while standing. As the recognition is based on the motion data, the classifier will struggle differentiating them if their motion pattern is exactly the same.

VI. CONCLUSION,LIMITATIONS,AND FUTURE WORK We propose to use a two-layer hierarchical smoking de-tection algorithm (HLSDA) that improves the recognition of smoking and similar activities compared to a single layer classification approach, as well as the state of the art. It corrects up to 50% of the misclassified segments. We achieve a very high precision as well as recall for smoking in both person-dependent (90-97% F-measure) and person-inperson-dependent sce-narios (83-94% F-measure) which is higher than the previous works. We also collected a large data-set which consists of eleven participants. For this purpose, we developed an Android application which collects various types of sensor data from both smartphone and smartwatch in real-time.

Our algorithm makes certain assumptions about smoking activity, hence it may or may not work in situations like smoking while cycling or when a cigarette is being shared among a group of people. However, we believe that it will handle the commonly used smoking scenarios in a reliably way. There are still some aspects of smoking and other less repetitive activities that need to be explored. We intend to implement this on Android phones in our next study. Moreover, we are exploring the effects of various postures on smoking recognition and ways to make the classification more generic. We also intend to do a smoking session-wise evaluation with leave-one-session-out approach.

ACKNOWLEDGMENT

This work is supported by Dutch National Program COM-MIT in the context of SWELL project, by the Galatasaray University Research Fund under the grant number 15.401.004 and by Tubitak under the grant number 113E271.

REFERENCES

[1] M. Shoaib, S. Bosch, O. D. Incel, H. Scholten, and P. J. Havinga, “Complex human activity recognition using smartphone and wrist-worn motion sensors,” Sensors, vol. 16, no. 4, p. 426, 2016.

[2] A. Parate, M.-C. Chiu, C. Chadowitz, D. Ganesan, and E. Kalogerakis, “Risq: Recognizing smoking gestures with inertial sensors on a wrist-band,” in Proceedings of the 12th annual international conference on Mobile systems, applications, and services. ACM, 2014, pp. 149–161.

[3] Q. Tang, D. J. Vidrine, E. Crowder, and S. S. Intille, “Automated detection of puffing and smoking with wrist accelerometers,” in Pro-ceedings of the 8th International Conference on Pervasive Computing Technologies for Healthcare. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), 2014, pp. 80–87.

[4] G. Chen, X. Ding, K. Huang, X. Ye, and C. Zhang, “Changing health behaviors through social and physical context awareness,” in Com-puting, Networking and Communications (ICNC), 2015 International Conference on. IEEE, 2015, pp. 663–667.

[5] N. Saleheen, A. A. Ali, S. M. Hossain, H. Sarker, S. Chatterjee, B. Mar-lin, E. Ertin, M. al’Absi, and S. Kumar, “puffmarker: a multi-sensor approach for pinpointing the timing of first lapse in smoking cessation,” in Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing. ACM, 2015, pp. 999–1010. [6] J. P. Varkey, D. Pompili, and T. A. Walls, “Human motion recognition

using a wireless sensor-based wearable system,” Personal and Ubiqui-tous Computing, vol. 16, no. 7, pp. 897–910, 2012.

[7] P. M. Scholl and K. Van Laerhoven, “A feasibility study of wrist-worn accelerometer based detection of smoking habits,” in Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS), 2012 Sixth International Conference on. IEEE, 2012, pp. 886–891. [8] P. M. Scholl, N. K¨uc¨ukyildiz, and K. V. Laerhoven, “When do you

light a fire?: capturing tobacco use with situated, wearable sensors,” in Proceedings of the 2013 ACM conference on Pervasive and ubiquitous computing adjunct publication. ACM, 2013, pp. 1295–1304. [9] P. Wu, J.-W. Hsieh, J.-C. Cheng, S.-C. Cheng, and S.-Y. Tseng, “Human

smoking event detection using visual interaction clues,” in Pattern Recognition (ICPR), 2010 20th International Conference on. IEEE, 2010, pp. 4344–4347.

[10] P. Lopez-Meyer, S. Tiffany, and E. Sazonov, “Identification of cigarette smoke inhalations from wearable sensor data using a support vector machine classifier,” in Engineering in Medicine and Biology Society (EMBC), 2012 Annual International Conference of the IEEE. IEEE, 2012, pp. 4050–4053.

[11] A. A. Ali, S. M. Hossain, K. Hovsepian, M. M. Rahman, K. Plarre, and S. Kumar, “mpuff: automated detection of cigarette smoking puffs from respiration measurements,” in Proceedings of the 11th international conference on Information Processing in Sensor Networks. ACM, 2012, pp. 269–280.

[12] “Dataset available at:,” http://ps.ewi.utwente.nl/Datasets.php. [13] M. Shoaib, S. Bosch, H. Scholten, P. J. Havinga, and O. D. Incel,

“Towards detection of bad habits by fusing smartphone and smartwatch sensors,” in Pervasive Computing and Communication Workshops (Per-Com Workshops), 2015 IEEE International Conference on. IEEE, 2015, pp. 591–596.

[14] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vander-plas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duch-esnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.

[15] M. Shoaib, S. Bosch, O. D. Incel, H. Scholten, and P. J. Havinga, “A survey of online activity recognition using mobile phones,” Sensors, vol. 15, no. 1, pp. 2059–2085, 2015.

[16] H. Lu, J. Yang, Z. Liu, N. D. Lane, T. Choudhury, and A. T. Campbell, “The jigsaw continuous sensing engine for mobile phone applications,” in Proceedings of the 8th ACM Conference on Embedded Networked Sensor Systems. ACM, 2010, pp. 71–84.

[17] E. Miluzzo, N. D. Lane, K. Fodor, R. Peterson, H. Lu, M. Musolesi, S. B. Eisenman, X. Zheng, and A. T. Campbell, “Sensing meets mobile social networks: the design, implementation and evaluation of the cenceme application,” in Proceedings of the 6th ACM conference on Embedded network sensor systems. ACM, 2008, pp. 337–350. [18] K. Ouchi and M. Doi, “Indoor-outdoor activity recognition by a

smart-phone,” in Proceedings of the 2012 ACM Conference on Ubiquitous Computing. ACM, 2012, pp. 600–601.

[19] M. Shoaib, S. Bosch, O. D. Incel, H. Scholten, and P. J. Havinga, “Fusion of smartphone motion sensors for physical activity recognition,” Sensors, vol. 14, no. 6, pp. 10 146–10 176, 2014.