Advanced data analysis for the design of a personalized sports app


Joris Timmer

10636374

Bachelor thesis Credits: 18 EC

Bachelor Opleiding Kunstmatige Intelligentie
University of Amsterdam, Faculty of Science
Science Park 904, 1098 XH Amsterdam

Supervisor: Prof. B. Kröse
Informatics Institute, Faculty of Science, University of Amsterdam
Science Park 904, 1098 XH Amsterdam

June 30th, 2017


1 Abstract

For the design of a mobile sports application it is important to tailor the app to the user. This thesis has two major purposes: (1) to discover distinctive groups within participants of the Dam tot Dam loop 2014 regarding their preferences for mobile sports apps, and (2) to investigate in which characteristics these groups differ significantly. To accomplish these tasks, the K-means clustering algorithm was applied to the answers to 20 questions about various app features, given by participants who indicated that they use apps during their training. Additionally, principal component analysis was conducted in order to gain insight into which of the proposed app features caused the most variance and which features the participants agreed on. The results show that two groups can be defined which have considerably different preferences. The first is mainly interested in tracking speed during training and keeping track of progress afterwards. This group consists predominantly of males and has a significantly higher percentage of participants in the 16km run (79.3% versus 20.7% in the 6.4km run). In contrast, a group of predominantly women with a lower percentage of 16km participants (59.2% versus 40.8% in the 6.4km run) was discovered. This group considered each of the proposed features relatively important, including features with an advisory or motivational role. Other significant differences were found between these groups, including characteristics with respect to motivation and self perceived health.


2 Introduction

2.1 Background

Lack of physical activity in urban environments is an increasing public health concern because it can lead to serious health risks [Organization2009, Church et al.2005]. The Playful Data-driven Active Urban Living project (PAUL) aims to gain more scientific insight into how mobile technology can help to increase the physical activity of the population, and therefore its health [Chakravarty et al.2012, Dallinga et al.2016]. This project is a collaboration of the Digital Life Centre, University of Amsterdam, University of Utrecht and Federal University of São Paulo (UNIFESP). While there have been various studies about mobile running applications [Dallinga et al.2016, Glynn et al.2014, Stephens and Allen2013], it remains unclear how to optimize the personalization of mobile sports apps. This thesis aims to support the PAUL project by defining groups of runners based on their preferences regarding mobile sports apps. To which group a runner belongs may depend on gender, health, experience or other characteristics. This knowledge will be used as a foundation for a personalized supportive mobile running application. Once the personalized application is established, it should help increase the physical activity of the urban residents who use it. The target users of the PAUL project range from inexperienced, recreational runners to experienced athletes who train multiple times a week.

2.2 Problem definition

To accomplish this, the K-means clustering algorithm is applied to a data set containing the answers to questionnaires filled in by Dam tot Dam loop 2014 participants. Groups are discovered based on the answers to questions about preferences for application functionalities. Once these clusters are defined, they are compared on other characteristics. These characteristics include motivational aspects and a list of questions about the self perceived health of the participant. This way groups with specific properties can be identified. This information can then be used to design a mobile app tailored to the individual user. By applying the K-means algorithm this project aims to answer the following research question: "What type of groups can be discovered in data on Dam tot Dam loop 2014 participants' mobile sports app functionality preferences?" This leads to the second research question: "In what way do these groups differ significantly from each other?" In this thesis it is hypothesized that a number of groups with different characteristics can be discovered, each with distinctive preferences regarding app functionalities.

2.3 Previous work

Dallinga et al. [Dallinga et al.2015] describe a study which investigates the effects of the current usage of mobile technology in the domain of sports apps. It is demonstrated that the usage of such technology is associated with a higher training volume and is positively correlated with perceived health. Furthermore, it is suggested that the emerging mobile technology is more popular among the less active runners.

The aim of this thesis is to identify specific features of mobile technology which are desired by the users. These features are personalized based on running experience and various demographics. A considerable amount of research has been done regarding this topic. A study which aims to gain insight into the preferences of users for applications promoting physical activity is described by Rabin and Bock [Rabin and Bock2011]. The results demonstrate that a desirable feature is the automatic tracking of steps taken and calories burned. Other features such as advisory aspects were found to be less important to the subjects. A study conducted by Middelweerd et al. [Middelweerd et al.2015] yielded similar results in a specific domain (Dutch students aged 18 to 25). The conclusion of this study was that tracking and goal setting features were found highly desirable. It is particularly interesting that the students were more interested in motivational features than middle-aged women [Ehlers and Huberty2014]. This suggests that different groups of runners desire different features.

The aim of this thesis is to gain insight into these different groups. In order to achieve this, unsupervised learning algorithms are utilized. Wilson [Wilson2015] investigates clustering in the situation where the number of clusters to be discovered is not known a priori and missing values are present. This situation corresponds to the current domain, as both problems are relevant here.

3 Approach

3.1 Questionnaire data

This thesis is based on a questionnaire data set filled in by participants of the Dam tot Dam loop of September 2014 and a follow-up questionnaire completed in March 2015. Out of the 54,000 participants, a random selection of 15,000 runners was invited to participate in the research. By completing an online survey they answered 51 questions about, among other subjects, demographic aspects, their running habits, what motivates them, their running performance and mobile running apps. Of the 15,000 invited runners, 28% responded.

3.2 K-means Algorithm

The algorithm used in this thesis is known as the K-means clustering algorithm [Wilson2015]. It seeks to assign all instances in the given data to K clusters. As the name of the algorithm suggests, the instances are assigned using the mean of all the observations in each cluster. The cluster centroids are randomly initialized and iteratively optimized in order to lower the sum of squared errors within each cluster. Let $C_k$ be the set of observations $x$ assigned to the cluster with centroid $c_k$; then the within-cluster sum of squared errors is defined as:

$$\mathrm{SSE} = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - c_k \rVert^2 \qquad (1)$$

After random initialization, the algorithm iterates as follows.

1. Assignment step: each observation is assigned to the cluster which results in the lowest sum of squares. This is the closest cluster centroid based on Euclidean distance.

2. Update step: each cluster centroid is updated. The new value is the mean of all observations assigned to that cluster:

$$c_k = \frac{1}{m_k} \sum_{x \in C_k} x \qquad (2)$$

where $m_k$ is the number of observations assigned to $c_k$.

This process of assigning observations and updating centroids is repeated until convergence, i.e. until the cluster centroids do not change after the update step. The standard K-means algorithm is criticized on various aspects. Firstly and most prominently, the algorithm needs the number of clusters to be specified a priori. Nevertheless, there are methods to decide what number of clusters would be the best fit for a given data set; these are discussed in section 3.3. Secondly, the random initialization can greatly affect the final outcome, because the algorithm may converge to a local optimum. To reduce this risk, the algorithm is run 100 times and the result with the minimum total sum of squared errors is selected. A third criticism is that outliers can greatly skew the result. This holds for data in which variables have a large variance. It is not a problem for the data set used here, as each variable covers only a small domain (integers ranging from 1 to 4).
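The assignment step, the update step and the repeated random initialization can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in this thesis: the function names (`kmeans_once`, `kmeans`), the choice of initial centroids (randomly drawn observations) and the toy answers are all hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One K-means run from a random start; returns (sse, labels, centroids)."""
    centroids = rng.sample(points, k)  # assumption: start from k drawn observations
    labels = None
    for _ in range(max_iter):
        # Assignment step: closest centroid by (squared) Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # converged: the centroids would not change
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its observations.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return sse, labels, centroids


def kmeans(points, k, restarts=100, seed=0):
    """Repeat K-means `restarts` times and keep the run with the lowest SSE."""
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(restarts))
    return min(runs, key=lambda run: run[0])


# Toy answers on a 1-4 scale (illustrative only, not thesis data).
answers = [(1, 1), (1, 2), (2, 1), (4, 4), (4, 3), (3, 4)]
sse, labels, centroids = kmeans(answers, k=2)
print(round(sse, 4), labels)
```

Keeping the run with the lowest sum of squared errors is what makes the restarts useful: a single run can only guarantee a local optimum of equation (1).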

3.2.1 Missing values

A general problem with clustering algorithms is that they cannot function properly when missing values are present. The same applies to the K-means algorithm. Two solutions are available: to impute the missing values or to ignore the observations that contain them. Neither solution is perfect, as imputations are based on assumptions which can skew the outcome, while ignoring data may discard valuable information [Wilson2015]. In this thesis it is decided to ignore the observations with missing values, as they make up a relatively small percentage of the participants who completed the questionnaires (≈ 6%).
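Ignoring observations with missing values amounts to listwise deletion. The sketch below shows the idea on made-up responses; the function name `drop_incomplete` and the use of `None` to mark an unanswered question are assumptions for illustration.

```python
def drop_incomplete(responses):
    """Listwise deletion: keep only respondents who answered every question.

    Returns the complete responses and the share of respondents dropped.
    """
    complete = [r for r in responses if all(v is not None for v in r)]
    dropped_share = 1 - len(complete) / len(responses) if responses else 0.0
    return complete, dropped_share


# Made-up answers on a 1-4 scale; None marks a skipped question.
responses = [(1, 2, 3), (None, 2, 4), (4, 4, 1), (2, 1, 2)]
complete, dropped_share = drop_incomplete(responses)
print(len(complete), dropped_share)
```

Reporting the dropped share alongside the result makes it easy to check whether deletion stays as small as intended.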


3.3 Determining the optimal number of clusters

In this thesis, the K-means algorithm is applied. One requirement of this algorithm is that the number of clusters must be known a priori. Determining the optimal number of clusters is an ambiguous task, mainly because increasing the number of clusters lowers the minimal sum of squared errors by construction. A common method to determine which number of clusters generates the most distinctive groups is to plot the number of clusters against the sum of squared errors. This way an 'elbow' can be found, the point after which adding clusters yields little further reduction, which indicates that the instances within each cluster are properly structured [Thorndike1953]. Figure 1 shows the result of this method applied to the K-means algorithm on the app functionality data. For each number of clusters the algorithm is repeated until convergence and is run 100 times with random initializations. As stated before, out of these runs the one with the minimal sum of squared errors (equation 1) is selected. The 'elbow' in this plot is determined to be at 2 or 3 clusters. In this thesis it is decided to investigate the results of 3 clusters, as 2 out of the 3 clusters turned out to be more distinctive versions of the clusters found when using K = 2.

Another criterion to determine whether the found clusters are distinctive is the silhouette method [Rousseeuw1987]. This method evaluates how closely the data within each cluster are related to each other and how distinct they are from data in other clusters. The silhouette value is calculated as follows. Let a(i) be the average distance from instance i to all other instances within the cluster i is assigned to. Let b(i) be the average distance from i to all instances of the nearest cluster which i is not a part of. The silhouette value s(i) is now defined as:

$$s(i) = \frac{b(i) - a(i)}{\max(a(i),\, b(i))} \qquad (3)$$

The average of the silhouette value over each instance is taken and plotted against the number of clusters. The results of this are shown in Figure 2. A higher average silhouette value implies more distinctive clusters. Similarly to the elbow method an optimal number of clusters is suggested to be 2 or 3.
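The silhouette value of equation (3) can be computed directly from its definition. The following sketch assumes Euclidean distance and assigns a value of 0 to instances that are alone in their cluster; the function names and the example points are illustrative and not taken from the thesis data.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_values(points, labels):
    """Silhouette value s(i) for every instance, following equation (3)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    values = []
    for i, (p, label) in enumerate(zip(points, labels)):
        own = [j for j in clusters[label] if j != i]
        if not own or len(clusters) < 2:
            values.append(0.0)  # assumption: singleton clusters score 0
            continue
        # a(i): average distance to the other members of the own cluster.
        a = sum(euclidean(p, points[j]) for j in own) / len(own)
        # b(i): average distance to the members of the nearest other cluster.
        b = min(
            sum(euclidean(p, points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


# Two well separated toy clusters.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
labels = [0, 0, 1, 1]
scores = silhouette_values(points, labels)
print(sum(scores) / len(scores))
```

Averaging the returned values for each candidate number of clusters gives the quantity that is plotted against the number of clusters.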

3.4 Principal Component Analysis

Figure 1: Plot of sum of squared errors against number of clusters, elbow around 2 or 3 clusters

Principal Component Analysis (PCA) is a mathematical tool used to gain insight into the distribution of variance within a data set [Wold et al.1987]. If a data set is high dimensional, it can be interesting to analyze which variables are most distinctive for each observation and which variables add the least information. PCA uses an orthogonal transformation to convert high dimensional data to a set of principal components. These components describe the variance in the original data set in decreasing order, under the condition that each component is orthogonal to every other component. Applying PCA to the app functionality data containing 20 questions gives insight into which of the functionality variables contain a high amount of variance and which do not. In the context of this thesis, a high variance means that the interest in that app functionality is more specific to the user. A low variance means that every participant has approximately the same opinion about the given functionality.
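To make the idea concrete, the sketch below extracts only the first principal component of a small made-up data set. It uses power iteration on the covariance matrix as a stand-in for a full eigendecomposition, which is an assumption made to keep the example dependency free; the thesis does not state which PCA routine was used, and all names and data are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) of a list of tuples."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_principal_component(cov, iterations=500):
    """Leading eigenvalue and eigenvector of `cov` via power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d  # assumption: start vector not orthogonal to PC1
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    return eigenvalue, v


# Made-up answers: questions 1 and 2 vary together, question 3 is constant.
rows = [(1, 1, 2), (2, 2, 2), (3, 3, 2), (4, 4, 2)]
cov = covariance_matrix(rows)
eigenvalue, component = first_principal_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
print(eigenvalue, component, eigenvalue / total_variance)
```

In this toy case the question on which everyone agrees receives a coefficient of zero, while the two questions on which respondents differ share the first component equally.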

4 Analysis and results

4.1 Data structures

The algorithm used in this thesis was applied to the answers of participants who indicated they used an app during their training. Appendix A.1 contains the questions which were used to gain insight into which specific features were of importance to the participants (translated from Dutch). All questions could be answered on a scale from 'Very important' (1) to 'Very unimportant' (4).

Additionally, two lists of multiple choice questions were used to compare the clusters. These lists include questions about the participants' motivation to run and a combination of questions about health, fitness and lifestyle. The 17 variables listed in appendix A.2 were included in the question list regarding motivational aspects, with answering options ranging from 'Very important' (1) to 'Very unimportant' (4). Similarly, Appendix A.3 contains 12 questions about self perceived health and lifestyle. These questions were answered on a scale ranging from 'Completely not applicable' (1) to 'Completely applicable' (5).

Figure 2: Plot of average silhouette value against number of clusters

Furthermore, a collection of separate variables was analyzed in order to gain insight into the characteristics of the clusters. These include binary variables such as gender (1: male, 2: female) and distance (1: 16km, 2: 6.4km). It was also analyzed whether the discovered groups differ in their current application usage. A multiple choice question on this subject was transformed into two binary variables for the two most popular applications: Runkeeper (used by 44.8% of the participants who indicated they use mobile apps) and Nike+ (used by 12.6%). Furthermore, three discrete variables were analyzed. These include the age of the participant and two multiple choice questions regarding training frequency and current app satisfaction. Both these multiple choice questions were submitted in the follow-up questionnaire in 2015. To compare the clusters on their app satisfaction, a range of 1 (completely applicable) to 7 (completely not applicable) on the statement "an app is a valuable addition to my training" is used. To analyze the training frequency characteristic, the answers to the following question were utilized:

How often did you exercise before your participation in the Dam tot Dam loop 2014?

1. Never

2. Once or twice a month

3. Less than once a week

4. Once a week

5. Twice a week

6. Three or four times a week

7. More than four times a week

Lastly, four continuous variables are analyzed. These include the weight and height (length) of the participants, in addition to their average running speed over both distances (16km and 6.4km).


4.2 Principal component analysis results

The principal component analysis was applied to the 20 questions regarding application functionalities (appendix B.1). Figure 3 shows the principal component numbers plotted against their eigenvalues. Evidently, the most variance in the data is explained by the first principal component. The second principal component also has an eigenvalue higher than 1, which means it explains more variance than a single standardized variable contributes [Costello and Osborne2005]. Table 1 (a) shows the coefficients of principal components 1 and 2. From this table it can be derived that questions 3, 5, 10, 11, 12, 13, 14 and 15 explain most of the variance in this data set. In contrast, questions 6, 16, 17 and 18 contribute almost nothing to the first principal component. By plotting the data on the first two principal components and applying K-means with K = 3, it can be seen that the cluster assignment is nearly exclusively influenced by principal component 1 (see Figure 4).
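The eigenvalues and the percentages of variance reported in Table 1 are linked by a simple relation: the percentage is the eigenvalue divided by the total variance. The short calculation below uses only the four figures from Table 1 and serves as a consistency check; the variable names are illustrative.

```python
# Figures copied from Table 1 for principal components 1 and 2.
eigenvalues = (4.2061, 1.1477)
percent_of_variance = (34.8825, 9.5183)

# share = eigenvalue / total variance, so total variance = eigenvalue / share.
implied_totals = [ev / (pct / 100) for ev, pct in zip(eigenvalues, percent_of_variance)]
cumulative_percent = sum(percent_of_variance)

print([round(t, 3) for t in implied_totals], round(cumulative_percent, 4))
```

Both components imply the same total variance of roughly 12.06, and together they account for about 44.4% of it.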

Figure 3: Scree plot resulting from PCA applied to app functionality questions

4.2.1 Construct discovery using PCA

Applying the principal component analysis to the app functionality data suggested that the indicated importance of many functionalities is correlated. Therefore it might be interesting to define the underlying constructs within the 20 questions about app functionalities. PCA can indicate hidden factors that lie underneath the answers to the proposed questions [Heerink et al.2010]. The result of this method groups the functionalities in such a way that when a user is interested in one of the features in a specific group, they might also appreciate the other features in that group. This way a sports application can be personalized by offering different types of features to different types of runners.


Figure 4: App functionality data represented by Pc1 and Pc2, clustered with K = 3. Each colour represents a different cluster.

In order to define the hidden constructs, a varimax rotation is applied to the principal components with an eigenvalue greater than 1 [Costello and Osborne2005]. Table 1 (b) shows the results of this method with the highest value for each question appearing in bold. Based on those results two new constructs have been defined. The functionalities which are suggested to be correlated, and thus form a construct, can be seen in Table 2. The first construct consists mostly of advisory and motivational features. This is in contrast to the second construct, which consists of tracking and social features.
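Because varimax is an orthogonal rotation, applying it to two components comes down to rotating every pair of coefficients by one common angle in the plane. The sketch below does not implement the varimax criterion itself; it only illustrates the rotation step, recovering the angle from question 19 in Table 1 and applying it to other questions. The helper names are hypothetical and the coefficients are copied from Table 1.

```python
import math


def rotate(loading, theta):
    """Rotate a (Pc1, Pc2) coefficient pair counterclockwise by theta radians."""
    x, y = loading
    return (
        x * math.cos(theta) - y * math.sin(theta),
        x * math.sin(theta) + y * math.cos(theta),
    )


def angle(loading):
    """Direction of a coefficient pair in the Pc1-Pc2 plane, in radians."""
    return math.atan2(loading[1], loading[0])


# Coefficients before (Table 1 (a)) and after (Table 1 (b)) the rotation.
unrotated = {1: (0.2122, 0.0351), 13: (0.3256, -0.2307), 19: (0.1034, 0.5368)}
rotated = {1: (0.1908, 0.0992), 13: (0.3811, -0.1182), 19: (-0.0683, 0.5424)}

# Recover the common rotation angle from question 19 alone.
theta = angle(rotated[19]) - angle(unrotated[19])
print(round(math.degrees(theta), 1))

# Applying that angle to other questions should reproduce Table 1 (b).
for question in (1, 13):
    print(question, rotate(unrotated[question], theta), rotated[question])
```

With these figures the angle comes out at roughly 18 degrees, and the rotated pairs for questions 1 and 13 agree with Table 1 (b) up to rounding.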


                  (a)                 (b)
Question nr.      Pc1       Pc2       Pc1       Pc2
1                 0.2122    0.0351    0.1908    0.0992
2                 0.2110    0.0392    0.1884    0.1027
3                 0.2970   -0.0610    0.3012    0.0342
4                 0.1140    0.2585    0.0282    0.2811
5                 0.2998   -0.0968    0.3150    0.0010
6                 0.0540    0.1903   -0.0077    0.1977
7                 0.1479    0.2113    0.0750    0.2468
8                 0.1818    0.1768    0.1179    0.2245
9                 0.1895   -0.0051    0.1817    0.0539
10                0.3035   -0.1330    0.3298   -0.0323
11                0.3192   -0.1468    0.3490   -0.0405
12                0.3138   -0.2287    0.3693   -0.1200
13                0.3256   -0.2307    0.3811   -0.1182
14                0.2848   -0.0461    0.2850    0.0446
15                0.2722   -0.0155    0.2635    0.0697
16                0.0740    0.2566   -0.0092    0.2669
17                0.0890    0.3596   -0.0270    0.3695
18                0.0479    0.3133   -0.0516    0.3127
19                0.1034    0.5368   -0.0683    0.5424
20                0.2136    0.2769    0.1172    0.3295
Eigenvalue        4.2061    1.1477    4.2061    1.1477
% of variance    34.8825    9.5183   34.8825    9.5183

Table 1: (a) Coefficients of the first two principal components. (b) Varimax rotation applied to the first two principal components.

                  Construct 1                          Construct 2
Before training   Motivation to start training         To be part of a community/group
                  Route tips
                  Training tips
                  Training schedule
During training   Music while running                  To keep track of speed
                  To follow a training schedule        To keep track of energy usage (calories burned)
                  Tips regarding running technique     To keep track of heart rate
                  Tips regarding training variance
                  Tips regarding layout of training
                  Motivation to keep up
                  Recommendation to adjust speed
After training                                         To keep track of own progression
                                                       To keep track of personal records
                                                       To review running route
                                                       To share activities with others
                                                       Feedback on performance

Table 2: App functionalities grouped by construct


4.3 Clusters

Applying the K-means algorithm with K = 3 yielded the clusters shown in Figure 5 (a). Each cluster is represented as a colour coded line. To make the graph more intuitive, the y axis shows 5 minus the average score for each question, so that a higher value means the participants in that cluster find that feature more important. Figure 5 (b) shows the values of each cluster relative to the global average answer for each question. Noticeable are the scores for questions 6, 16, 17 and 18, on which the clusters hardly differ. These questions represent To keep track of speed, To keep track of own progression, To keep track of personal records and To review running route respectively. This corresponds with the results of applying principal component analysis (see Table 1 (a)). Cluster 1 and 3 differ substantially on every other variable. Cluster 1 can be defined as a group which considers only functionalities 6, 16, 17 and 18 important (≥ 2.5). Cluster 3 can be defined as a group which considers every proposed app feature important. Lastly there is cluster 2, which closely follows the global average over all participants. These cluster numbers correspond to the characteristic values described in Table 3 and are further expanded on in the next section.

4.4 Characteristics

Table 3 shows the average values of a number of variables for the 3 clusters found by applying K-means. Each variable is described in section 4.1. Following from this table, cluster 1 contains a significantly higher percentage of male participants than cluster 2 and 3. Cluster 3 contains the lowest percentage of male participants. The length and weight variables are in line with this result, as the groups with a higher percentage of male participants are on average taller and heavier. Secondly, cluster 1 has a higher average age and fewer participants who signed up for the shorter distance (6.4km) than cluster 2 and 3. Again, this is in contrast with cluster 3, which is generally younger and has a higher percentage of runners who signed up for the 6.4km than the overall average. In addition, cluster 1 has a higher average speed than cluster 2 and 3 on both distances. The differences in the training frequency variable are not significant (see section 4.7.2). Cluster 3 attaches the most value to its running app, in contrast to cluster 1.

Figure 5: (a) 3 clusters found by applying K-means on the 20 app functionality questions. For the exact values see appendix B.1. (b) 3 clusters found by applying K-means on the 20 app functionality questions, relative to the global average answers of each question.

                    Total       Cluster 1   Cluster 2   Cluster 3
Count               712         169         352         191
% Male              46.0674     59.7633     45.7386     34.5550
Avg. length         176.6450    178.5621    176.8854    174.4868
Avg. weight         74.3698     75.6154     74.3573     73.2674
Avg. age            38.6123     40.0422     38.8075     36.9622
% 6.4km             31.1798     20.7101     30.9659     40.8377
Avg. speed 6.4km    9.3505      9.7628      9.4766      8.9962
Avg. speed 16km     10.3791     10.7267     10.2920     10.1504
% Runkeeper         44.8034     49.7041     44.6023     40.8377
% Nike+             12.6404     8.2840      13.0682     15.7068
% Other apps        42.5562     42.0118     42.3295     43.4555
Training frq.       5.2261      5.1834      5.2443      5.2304
App satisfaction    5.5421      5.1338      5.5466      5.8636

Table 3: Values of various characteristics of each cluster

4.5 Motivational Characteristics

After applying the clustering algorithm to the app functionality questions, the answers of each cluster to the questions about the participants' motivation are compared. The results are shown in Figure 6 (a) and (b). The greatest differences can be seen at questions 9 To learn new skills, 12 A social activity, 13 Support from social contacts, 16 Not expensive and 17 Prestige/status. Cluster 1 scores significantly lower on each of these questions, in contrast to cluster 3, which scores the highest on average. Following from Figure 6 (b), cluster 1 considers 9 Performance, improving yourself and 2 To improve physical health relatively the most important factors in their motivation. Cluster 3 has relatively high scores on questions 9 To learn new skills and 13 Support from social contacts. This is in accordance with the mobile app features those two groups prefer, as section 4.3 shows that cluster 3 scores significantly higher on advisory features than cluster 1.

4.6 Self perceived health Characteristics

In addition to the motivational question list, a number of questions about the participants' self perceived health are considered, shown in Figure 7. Significant differences between cluster 1 and 3 can be found on questions 3 I know sports are not for me, 4 There is a good chance that I will keep exercising, 5 I have a good feeling about myself, 6 I see myself as a sports person, 11 Sometimes I wonder if my body is right for the sports I participate in and 12 I have health issues and I wonder if I can keep exercising. Cluster 1 indicated relatively low values on the questions with a negative perspective towards healthiness (3, 11, 12), with the exception of question 7. However, the clusters do not differ significantly on that variable (see section 4.7.2). Additionally, cluster 1 scores relatively high averages on the positive questions regarding health and fitness (4, 5, 6, 9). Cluster 2 and 3 are not significantly dissimilar, with the exception of question 11.

Figure 6: (a) Average answers to the questions about motivation as described in section 4.1 for each cluster and the overall average. For the exact values see appendix B.2. (b) Answers to the questions about motivation relative to the overall average for each cluster.

4.7 Statistical significance

4.7.1 Fisher test

In order to evaluate whether the found characteristics for each cluster are significantly different from each other, various statistical tests are used. To test the significance of the differences in the binary characteristics, Fisher's exact test is applied [Fisher1922]. The binary variables tested here include gender, distance run and the apps currently used by the participants.

Fisher's exact test evaluates how compatible the observed data are with the null hypothesis. The null hypothesis in this context means that each tested cluster has an equal distribution for the variable being tested. Using the hypergeometric distribution, the probability of observing data at least as extreme as the given data under the null hypothesis is calculated; this is the p-value. If p ≤ 0.05 the null hypothesis can be rejected. In this context this suggests that the clusters are significantly different for that specific variable. The test is applied pairwise to each cluster combination in order to gain insight into the significance of the difference between each pair of clusters. Table 4 shows the computed p-value for each binary variable. Following from this table, the clusters all differ significantly regarding gender. Cluster 3 contains a significantly higher proportion of short distance runners than cluster 1 and 2, whereas there is no significant difference between cluster 1 and 2. The test also indicates that the found clusters do not differ significantly in their usage of the Runkeeper or Nike+ apps.
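For a 2x2 table the test can be written out with binomial coefficients. The sketch below is an illustrative two-sided version that sums the probabilities of all tables no more likely than the observed one; the thesis does not state which variant or software was used, so this convention is an assumption. The gender counts in the usage example are reconstructed from the rounded percentages in Table 3 (101 of 169 and 66 of 191 male) and are likewise an assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is at most as likely as the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)

    def weight(k):
        # Unnormalized hypergeometric probability; kept as an exact integer.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    extreme = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return extreme / comb(row1 + row2, col1)


# Rows: cluster 1 and cluster 3; columns: male and female (reconstructed counts).
p_gender = fisher_exact_two_sided(101, 169 - 101, 66, 191 - 66)
print(p_gender)
```

Comparing integer weights instead of floating point probabilities avoids rounding problems when deciding which tables count as at least as extreme.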

Comparison       Gender    Distance  Runkeeper  Nike+
Cluster 1 - 2    0.0072    0.0674    0.6313     0.1039
Cluster 1 - 3    0.0000    0.0000    0.2001     0.0709
Cluster 2 - 3    0.0123    0.0028    0.3292     0.6127

Table 4: p-values resulting from Fisher's exact test. Values ≤ 0.05 appear bold.

4.7.2 Mann–Whitney U test

Figure 7: (a) Average answers to the questions about self perceived health as described in section 4.1 for each cluster and the overall average. For the exact values see appendix B.3. (b) Answers to the questions about self perceived health relative to the overall average for each cluster.

A different statistical test is needed to compute the significance of the differences between the non-binary variables. To test whether clusters are significantly different regarding the variables consisting of a range of discrete values, the Mann-Whitney U test, also known as the Wilcoxon rank-sum test, is utilized [Mann and Whitney1947]. This test is chosen over the commonly used t-test because it does not require the data to come from a normal distribution. It is applied to the answers to the question lists about the participants' motivation and self perceived health. Additionally, this test is applied to the age, training frequency and app satisfaction variables. The two-sided Wilcoxon rank-sum test tests the null hypothesis that the two given samples come from the same population with equal medians. The output is a p-value. Similarly to the outcome of the Fisher test, when this value is ≤ 0.05 the null hypothesis can be rejected, which means the two clusters are significantly different.
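The rank-based nature of the test is easiest to see in code. The sketch below computes the U statistic with midranks for tied answers and a two-sided p-value from the normal approximation with tie correction. The approximation, the absence of a continuity correction, and the function names are assumptions for illustration; the thesis does not specify how its p-values were computed.

```python
import math


def midranks(values):
    """Ranks with ties replaced by their average rank, plus the tie term."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    return ranks, tie_term


def mann_whitney_u(x, y):
    """U statistic of sample x and a two-sided normal-approximation p-value."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_term = midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all answers identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Made-up answers of two clusters to one question on a 1-4 scale.
u, p = mann_whitney_u([1, 1, 2, 2, 1], [3, 4, 2, 4, 3])
print(u, p)
```

Because answers on a short scale contain many ties, the tie correction in the variance matters for this kind of questionnaire data.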

The p-values resulting from testing the differences in the age, training frequency and app satisfaction variables can be seen in Table 5. Cluster 1 consists of participants with a significantly higher age than cluster 3. Additionally, participants in cluster 3 differ significantly from cluster 1 and 2 regarding the app satisfaction variable. Table 6 shows the results of applying this statistical test to the 12 questions about self perceived health (see section 4.1). It is suggested that cluster 1 and 3 are significantly different on questions 3 I know sports are not for me, 5 I have a good feeling about myself, 6 I see myself as a sports person, 11 Sometimes I wonder if my body is right for the sports I participate in and 12 I have health issues and I wonder if I can keep exercising. Cluster 1 and 2 have a p-value lower than 0.05 for questions 5 and 6. The only significant difference between cluster 2 and cluster 3 is found at question 11.

Similarly, Table 7 shows the results of applying the Mann-Whitney U test to the question list about the participants' motivation. As a consequence of the average answers of each cluster to these questions being further apart, the p-values are generally lower. The p-values suggest that cluster 1 and 3 are significantly different for each of the seventeen questions. The same applies to cluster 2 and 3, with the exception of question 17 Prestige/status. Cluster 1 and 2 have p-values lower than 0.05 for questions 3 To improve mental health, 9 To learn new skills, 11 Relaxation against the stress of daily life, 12 A social activity, 13 Support from social contacts, 15 Little/no travel time necessary, 16 Not expensive and 17 Prestige/status.

Comparison       Age       Training frq.  App satisfaction
Cluster 1 - 2    0.0620    0.7031         0.0505
Cluster 1 - 3    0.0032    0.8709         0.0001
Cluster 2 - 3    0.0934    0.5498         0.0024

Table 5: p-values resulting from the Mann-Whitney U test. Values ≤ 0.05 appear bold.


Question    Cluster 1 - 2    Cluster 1 - 3    Cluster 2 - 3
1           0.3051           0.9886           0.0942
2           0.3825           0.6745           0.5335
3           0.0841           0.0329           0.4456
4           0.1545           0.0561           0.3862
5           0.0175           0.0073           0.3602
6           0.0348           0.0076           0.2289
7           0.9572           0.7825           0.7437
8           0.5768           0.1032           0.0900
9           0.3171           0.1801           0.4713
10          0.4449           0.1209           0.1817
11          0.2505           0.0112           0.0162
12          0.1956           0.0255           0.0729

Table 6: p-values resulting from the Mann-Whitney U test applied to 12 questions about self perceived health. Values ≤ 0.05 appear bold.

Question    Cluster 1 - 2    Cluster 1 - 3    Cluster 2 - 3
1           0.3786           0.0009           0.0037
2           0.1796           0.0014           0.0214
3           0.0062           0.0000           0.0180
4           0.8278           0.0001           0.0000
5           0.2427           0.0001           0.0006
6           0.5177           0.0001           0.0001
7           0.4721           0.0083           0.0149
8           0.3251           0.0002           0.0005
9           0.0000           0.0000           0.0000
10          0.1119           0.0000           0.0002
11          0.0480           0.0006           0.0359
12          0.0082           0.0000           0.0071
13          0.0000           0.0000           0.0000
14          0.8920           0.0020           0.0002
15          0.0319           0.0004           0.0361
16          0.0436           0.0000           0.0008
17          0.0096           0.0004           0.1459

Table 7: p-values resulting from the Mann-Whitney U test applied to 17 questions about motivation. Values ≤ 0.05 appear bold.

4.7.3 Analysis of variance

The analysis of variance (ANOVA) is used to analyze the differences between the clusters with regard to their average weight, length (body height) and average running speed on both the 6.4km and the 16km [Anderson2001]. The one-way ANOVA used here assumes that the observations are independent and normally distributed. It tests the null hypothesis that the given groups have equal means against the alternative hypothesis that at least one group mean differs. ANOVA can be applied to multiple groups at once in order to test whether any of them is statistically different. Here, however, it is applied pairwise in order to gain insight into the differences between each pair of clusters. The output is a p-value; the null hypothesis is rejected when p ≤ 0.05.
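The pairwise procedure described above can be sketched as follows. This is an illustrative example under stated assumptions, not the code used for this thesis: the function `one_way_anova_f` and the running speeds are hypothetical, and only the F statistic and its degrees of freedom are computed. Turning F into a p-value requires the survival function of the F distribution with (df_between, df_within) degrees of freedom, which is left to a statistics library.

```python
from itertools import combinations
from statistics import fmean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)
    group_means = [fmean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical average running speeds (km/h) per cluster.
speeds = {
    1: [11.8, 12.4, 12.1, 11.5, 12.9],
    2: [11.0, 11.6, 10.9, 11.3, 11.9],
    3: [10.2, 10.8, 10.5, 11.1, 10.0],
}

# Pairwise application: one test per pair of clusters.
for a, b in combinations(sorted(speeds), 2):
    f_stat, df_b, df_w = one_way_anova_f(speeds[a], speeds[b])
    print(f"Cluster {a} - {b}: F({df_b}, {df_w}) = {f_stat:.2f}")
```

With only two groups per call, the F statistic equals the square of the equal-variance two-sample t statistic, so each pairwise test is equivalent to a two-sided t-test.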

Table 8 shows the results for the normally distributed variables for each pairwise cluster comparison. The analysis of variance suggests that clusters 1 and 2 have significantly different means with respect to length and average running speed on both distances. Clusters 1 and 3 differ significantly on each of the four variables, as expected given the relatively large difference in gender ratio. Clusters 2 and 3 differ significantly only in their average running speed on the 16km.

Comparison       Weight    Length    Avg. speed 6.4km   Avg. speed 16km
Cluster 1 - 2    0.0786    0.0351*   0.0251*            0.0012*
Cluster 1 - 3    0.0121*   0.0004*   0.0002*            0.0000*
Cluster 2 - 3    0.3062    0.1239    0.0558             0.0220*

Table 8: p-values resulting from applying the analysis of variance (ANOVA) on normally distributed variables. Values ≤ 0.05 are marked with an asterisk (*).

5 Conclusion & Discussion

The first goal of this thesis was to discover groups of runners with similar preferences regarding mobile running applications. No clear elbow point was found in the plot used to determine the optimal number of clusters, which suggests that the data are not clearly separated into groups of closely related data points. Principal component analysis suggested that most of the variance in the data was explained by features with an advisory or motivational role. Consequently, the answers to the app feature questions were separated into three segments, primarily based on how important each cluster considered advisory and motivational features.

The second goal of this thesis was to analyze those groups in order to find their distinctive characteristics. The algorithm applied here suggested three different types of preferences for mobile apps. One group is exclusively interested in tracking features (6, 16, 17, 18), and one group indicated that it considers all proposed features important, with the exception of two social features (4, 19). Lastly, a group was discovered which closely resembles the overall average on each variable. The first group was found to consist of significantly more men and a higher percentage of participants of the 16km, and its members view themselves as more athletic. Other significant differences were found with respect to the motivational question list. The group which indicated not to consider advisory features important in a mobile application also indicated to be relatively less motivated by learning new skills. Similarly, this group is the least motivated by social contacts, which corresponds to its low interest in sharing feedback with others. Each of these characteristics contrasts with those of the group which rated all proposed features as most important. This group consists of a relatively high percentage of females and the highest percentage of participants of the 6.4km, and it had the lowest average speed on both distances. The lower average performance is not necessarily related to the clusters found on the app functionalities, but may rather be a consequence of the difference in gender ratios. The same applies to the significant differences which were found in distance run, average weight and average length. Variables which were found not to differ significantly between the clusters include training frequency, type of app usage and eight of the questions about self-perceived health.

5.1 Future work

The results of this thesis suggest a correlation between the running experience of the participants and their interest in advisory app features. No specific variable was assigned to running experience; however, it can be interpreted as a combination of the performance of the participants, their interest in learning new skills and their answers to the questions about how they view themselves. In a follow-up study it might be interesting to compare the sports app preferences of groups with different levels of running experience. Such a variable could be used in practice to personalize a sports app.

6 References

[Anderson2001] Anderson, M. J. (2001). A new method for non-parametric multivariate analysis of variance. Austral ecology, 26(1):32–46.

[Chakravarty et al.2012] Chakravarty, E. F., Hubert, H. B., Krishnan, E., Bruce, B. B., Lingala, V. B., and Fries, J. F. (2012). Lifestyle risk factors predict disability and death in healthy aging adults. The American journal of medicine, 125(2):190–197.

[Church et al.2005] Church, T. S., LaMonte, M. J., Barlow, C. E., and Blair, S. N. (2005). Cardiorespiratory fitness and body mass index as predictors of cardiovascular disease mortality among men with diabetes. Archives of internal medicine, 165(18):2114–2120.

[Costello and Osborne2005] Costello, A. B. and Osborne, J. W. (2005). Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical assessment, research & evaluation, 10(7):1–9.

[Dallinga et al.2016] Dallinga, J., Janssen, M., van der Bie, J., Nibbeling, N., Kröse, B., Goudsmit, J., Megens, C., Baart de la Faille-Deutekom, M., and Vos, S. (2016). De rol van innovatieve technologie in het stimuleren van sport en bewegen in de steden Amsterdam en Eindhoven. Vrijetijdstudies, 34(2):43–57.

[Dallinga et al.2015] Dallinga, J. M., Mennes, M., Alpay, L., Bijwaard, H., and de la Faille-Deutekom, M. B. (2015). App use, physical activity and healthy lifestyle: a cross sectional study. BMC public health, 15(1):833.

[Ehlers and Huberty2014] Ehlers, D. K. and Huberty, J. L. (2014). Middle-aged women’s preferred theory-based features in mobile physical activity applications. Journal of Physical Activity and Health, 11(7):1379–1385.

[Fisher1922] Fisher, R. A. (1922). On the interpretation of χ² from contingency tables, and the calculation of p. Journal of the Royal Statistical Society, 85(1):87–94.

[Glynn et al.2014] Glynn, L. G., Hayes, P. S., Casey, M., Glynn, F., Alvarez-Iglesias, A., Newell, J., ÓLaighin, G., Heaney, D., O’Donnell, M., and Murphy, A. W. (2014). Effectiveness of a smartphone application to promote physical activity in primary care: the smart move randomised controlled trial. Br J Gen Pract, 64(624):e384–e391.

[Heerink et al.2010] Heerink, M. et al. (2010). Assessing acceptance of assistive social robots by aging adults. PhD thesis, Universiteit van Amsterdam [Host].

[Mann and Whitney1947] Mann, H. B. and Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. The annals of mathematical statistics, pages 50–60.

[Middelweerd et al.2015] Middelweerd, A., van der Laan, D. M., van Stralen, M. M., Mollee, J. S., Stuij, M., te Velde, S. J., and Brug, J. (2015). What features do dutch university students prefer in a smartphone application for promotion of physical activity? a qualitative approach. International Journal of Behavioral Nutrition and Physical Activity, 12(1):31.

[Organization2009] World Health Organization (2009). Global health risks: mortality and burden of disease attributable to selected major risks. World Health Organization.

[Rabin and Bock2011] Rabin, C. and Bock, B. (2011). Desired features of smartphone applications promoting physical activity. Telemedicine and e-Health, 17(10):801–803.

[Rousseeuw1987] Rousseeuw, P. J. (1987). Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of computational and applied mathematics, 20:53–65.

[Stephens and Allen2013] Stephens, J. and Allen, J. (2013). Mobile phone interventions to increase physical activity and reduce weight: a systematic review. The Journal of cardiovascular nursing, 28(4):320.

[Thorndike1953] Thorndike, R. L. (1953). Who belongs in the family? Psychometrika, 18(4):267–276.

[Wilson2015] Wilson, S. E. (2015). Methods for clustering data with missing values. url: https://www.math.leidenuniv.nl/scripties/MasterWilson.pdf (visited on 30/06/2017).

[Wold et al.1987] Wold, S., Esbensen, K., and Geladi, P. (1987). Principal component analysis. Chemometrics and intelligent laboratory systems, 2(1-3):37–52.

7 Appendices

Appendix A.1

How important do you consider the following (possible) functionalities for an app before your training?

1. Motivation to start training
2. Route tips
3. Training tips
4. To be part of a community/group
5. Training schedule

How important do you consider the following (possible) functionalities for an app during your training?

6. To keep track of speed
7. To keep track of energy usage (calories burned)
8. To keep track of heart rate
9. Music while running
10. To follow a training schedule
11. Tips regarding running technique
12. Tips regarding training variance
13. Tips regarding layout of training
14. Motivation to keep up
15. Recommendation to adjust speed

How important do you consider the following (possible) functionalities for an app after your training?

16. To keep track of own progression
17. To keep track of personal records
18. To review running route
19. To share activities with others
20. Feedback on performance

Appendix A.2

Please indicate the extent to which the following aspects are of importance for your motivation to run

1. Physical exercise
2. To improve physical health
3. To improve mental health
4. To improve appearance/shape
5. To lose weight
6. To build up endurance
7. Competition, to compare with others
8. Performance, to improve yourself
9. To learn new skills
10. Fun activity
11. Relaxation against the stress of daily life
12. A social activity
13. Support from social contacts
14. To be outdoors
15. Little/no travel time necessary
16. Not expensive
17. Prestige/status

Appendix A.3

Can you please indicate which of the following statements are applicable to you?

1. I eat healthy
2. I feel energetic
3. I know sports are not for me
4. There is a good chance that I will keep exercising
5. I have a good feeling about myself
6. I see myself as a sports person
7. I have an unhealthy lifestyle
8. I have convinced others in my area to start exercising
9. I have a healthy weight
10. I often feel tired
11. Sometimes I wonder if my body is right for the sports I participate in
12. I have health issues and I wonder if I can keep exercising

Appendix B.1

Question nr.   Cluster 1   Cluster 2   Cluster 3
1              1.6026      2.1719      2.7440
2              1.6410      2.3037      2.7343
3              1.6859      2.5645      3.2174
4              1.6474      1.9685      2.1546
5              1.8526      2.7708      3.4010
6              3.3910      3.5387      3.7536
7              2.2372      2.4756      3.0000
8              2.1667      2.5043      3.0628
9              2.3077      2.8940      3.3913
10             1.9038      2.6590      3.4155
11             1.5769      2.3782      3.2029
12             1.6731      2.5072      3.2319
13             1.6859      2.6332      3.3140
14             1.6987      2.3754      3.2029
15             1.8333      2.6275      3.2802
16             3.2436      3.3926      3.6667
17             3.0192      3.1519      3.5411
18             3.2244      3.2350      3.5217
19             1.9423      2.2178      2.4444
20             1.8782      2.4556      2.9614

Table 9: Exact values for clusters found on answers to app functionality questions.
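As a worked check, the values in Table 9 can be related to the feature groups named in the conclusion (tracking features 6, 16, 17, 18 and social features 4, 19). The sketch below assumes a cut-off of 2.5 for calling a feature important; this threshold and the helper function names are hypothetical choices for illustration and are not taken from the analysis in this thesis.

```python
# Cluster values copied from Table 9: question nr. -> (cluster 1, 2, 3).
TABLE_9 = {
    1: (1.6026, 2.1719, 2.7440),
    2: (1.6410, 2.3037, 2.7343),
    3: (1.6859, 2.5645, 3.2174),
    4: (1.6474, 1.9685, 2.1546),
    5: (1.8526, 2.7708, 3.4010),
    6: (3.3910, 3.5387, 3.7536),
    7: (2.2372, 2.4756, 3.0000),
    8: (2.1667, 2.5043, 3.0628),
    9: (2.3077, 2.8940, 3.3913),
    10: (1.9038, 2.6590, 3.4155),
    11: (1.5769, 2.3782, 3.2029),
    12: (1.6731, 2.5072, 3.2319),
    13: (1.6859, 2.6332, 3.3140),
    14: (1.6987, 2.3754, 3.2029),
    15: (1.8333, 2.6275, 3.2802),
    16: (3.2436, 3.3926, 3.6667),
    17: (3.0192, 3.1519, 3.5411),
    18: (3.2244, 3.2350, 3.5217),
    19: (1.9423, 2.2178, 2.4444),
    20: (1.8782, 2.4556, 2.9614),
}

THRESHOLD = 2.5  # hypothetical cut-off, not taken from the thesis


def important_features(cluster, threshold=THRESHOLD):
    """Question numbers whose value for `cluster` is at least `threshold`."""
    return [q for q, values in TABLE_9.items() if values[cluster - 1] >= threshold]


def unimportant_features(cluster, threshold=THRESHOLD):
    """Question numbers whose value for `cluster` is below `threshold`."""
    return [q for q, values in TABLE_9.items() if values[cluster - 1] < threshold]


print(important_features(1))    # [6, 16, 17, 18]
print(unimportant_features(3))  # [4, 19]
```

Under this assumed cut-off, cluster 1 rates only the tracking features as important, while cluster 3 rates every feature as important except the two social features, which matches the description in the conclusion.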

Appendix B.2

Question nr.   Cluster 1   Cluster 2   Cluster 3
1              3.5094      3.5580      3.6753
2              3.4104      3.4730      3.5773
3              3.1934      3.3351      3.4611
4              2.9198      2.9431      3.1804
5              2.4905      2.5978      2.8446
6              3.3868      3.4205      3.5907
7              2.1981      2.2534      2.4021
8              3.0425      3.1051      3.2680
9              2.1232      2.4394      2.7409
10             3.0991      3.1995      3.3613
11             3.1185      3.2453      3.3557
12             2.3175      2.4973      2.6804
13             2.1327      2.3962      2.6736
14             3.1981      3.2033      3.3866
15             2.7925      2.9380      3.0670
16             2.5425      2.6775      2.9381
17             1.6274      1.7636      1.8454

Table 10: Exact values for clusters found on answers to motivational questions.

Appendix B.3

Question nr.   Cluster 1   Cluster 2   Cluster 3
1              0.9494      1.0393      0.9345
2              1.1164      1.2051      1.1694
3              3.7342      3.6113      3.5804
4              0.3480      0.4775      0.5273
5              0.7468      0.9944      1.0545
6              1.0253      1.2845      1.3841
7              3.1646      3.2225      3.1855
8              1.6329      1.5927      1.4255
9              1.1519      1.2451      1.3236
10             2.4937      2.4101      2.2914
11             3.0886      2.9326      2.7045
12             3.6521      3.5730      3.3964

Table 11: Exact values for clusters found on answers to self-perceived health questions.
