Raise Driver: An adaptive exergame


UNIVERSITY OF AMSTERDAM

Raise Driver: An adaptive exergame

by

Jeffrey Michael Bruijntjes

A thesis submitted in partial fulfillment for the

degree of Master of Science

in the

Faculty of Physics, Math and Informatics (FWNI)

UvA, Amsterdam


Raise Driver: An adaptive exergame

Improving adaptation by taking into account a player's fitness level

Jeffrey Michael Bruijntjes

Science Park 402 Amsterdam, The Netherlands

jeffrey.bruijntjes@gmail.com

ABSTRACT

Games have the potential to increase the motivation of players to perform certain tasks, such as exercising. The intensity of an exercise is highly dependent on the player who performs it. In this paper we investigate a method for dynamically adapting the intensity of an exercise to create a uniform exercising performance and experience for all players. To achieve this, we determine the fitness level of a player based on his body fat percentage. We created a cycling game, which we named 'Raise Driver', in which the player's heart rate had to follow a predefined pattern. In our experiment we measure the player's performance as well as the perceived difference between a version of our game that takes the player's fitness level into account and another version that does not. We found that more players preferred the adaptive game to the static game. No significant difference in performance was found between the two conditions.

Categories and Subject Descriptors

K.8.0 [Personal Computing]: General—Games

Keywords

Exergames, adaptive framework, bike game, exertion user interface, serious games

1. INTRODUCTION

Games are inherently a feedback loop. A player performs an action (or fails to perform one) and the system responds according to the game rules. This feedback system is a big part of why people play games, as they get rewarded in a very direct way for what the game considers good behaviour. A game is also a very 'safe' environment for performing badly, as games allow you to retry tasks you have previously failed (through save games, checkpoints, et cetera). These two aspects have the potential to create a very motivating environment for performing the tasks the game requires of the player.

With this intended motivational influence, games have evolved to a point where entertainment is no longer their only function. Early games like Pong and Spacewar! served a mere entertainment purpose. However, as games evolved so did their applications, going from entertainment to other areas like learning and assessment. These games with purposes other than entertainment are also called serious games. Serious games can be used to motivate users in tasks that are often regarded as work or chores. One example is using games for learning. Baur et al. [3] created a game for collecting the corpus of a child learning to speak a foreign language. They created a web-based game in which German-speaking children learned to speak English. The game uses speech recognition to give users an interactive learning experience. The children were motivated to learn because they were able to obtain scores and badges for their performances. Another example of the application of serious games comes from Brezinka with the game Treasure Hunt [5]. Treasure Hunt is a psycho-therapeutic computer game, developed for eight to twelve year old children who are undergoing some form of cognitive-behavioural treatment. The game enables the children to rehearse at home the concepts they learn during therapy.

1.1 Exergames

From these examples one could make the case that serious games can be used to increase the motivation of users to do certain tasks. Exercise is an excellent example of a task that could benefit from increased motivation. The Centers for Disease Control and Prevention reported that in 2014 34.9 percent of the American population was obese, which translates to roughly 78.6 million people. These numbers are a reminder that a significant number of people have unhealthy lifestyles. The causes can be overeating, eating unhealthy food and/or exercising too little. Exercising, as one of these aspects, is a task many people often lack motivation for. It can, however, be made more appealing if users have more fun while exercising, for instance with the aid of a serious game. Over the last few years game manufacturers have made attempts to use serious games to motivate people to exercise more. Nintendo¹ released their motion-controlled gaming system named the Wii in 2006 [2]. To play games with the Wii, players use a controller that requires movements that are much more physically demanding than those of traditional gaming controllers like a joypad. In 2010 Microsoft² released their motion-capturing camera dubbed Kinect [1]. The Kinect allows players to use their whole body as a controller, which is also more physically demanding than traditional controllers. Serious games that use a so-called exertion interface as the main input for the game are referred to as exercise games or exergames. According to Vossen an exergame is a game where:

"Physical activity must actually influence the game outcome by either omission or commission" – Vossen, 2004 [13]

1.2 Adaptive games

Even though serious games might have a positive effect on the motivation of the player, there is no guarantee that this effect takes place. The type of game, for example, might not match the player's preferred type. It might also be that the game does not challenge the player enough. For a player to be sufficiently challenged by a game (without being over-challenged) it is important that the player is kept in a state of flow. Flow theory states that in order to keep a player engaged he needs to be in a state of flow [6]. This is a state where a player's skill level is equal to the challenge that is presented to him. While in this state of flow, the player is presented with enough challenge to improve his skills, without being over- or under-challenged, which would be demotivating. The progression of challenge can easily be observed in classic role-playing video games. In these games the player starts out with a very limited set of skills and abilities while facing enemies that are easy to overcome. While progressing through the game the player obtains more skills and abilities, which are needed to overcome increasingly challenging enemies.

One problem with the progression presented in the previous paragraph is that the game progression is the same for all players. This means that a player with a higher level of skill has to plow through the start of the game while being under-challenged, which is likely demotivating. If a player is not motivated enough to play a game he will most likely not continue playing and might never return to the game. If we take into account certain aspects of the player, like his skill level, past actions in the game or demographic information, we can create a dynamic set of game rules specific to this player, making it easier to keep the player in a state of flow. Games that take into account aspects specific to their player are called adaptive games. We can achieve adaptation of the game dynamics by adapting the level of difficulty to match the skill level the player has at that time.

1.3 Adaptive exergames

Looking back at the aforementioned exergames, it is clear that exergames can also expose players to situations where the player is not in a state of flow. In such situations the exercise presented could either be too challenging or not challenging enough. To illustrate the effect of a static exergame, let us first consider a hypothetical exergame that uses adaptation, in which the performance and experience of the player remain constant regardless of which player plays the game.

² http://www.microsoft.com/

Figure 1: Performances in a hypothetical generic exergame (legend: Adam, Bob, Carl, Target)

Results of a hypothetical exergame that does not use player-specific adaptation. With the same instructions these three players get different results because their skill level is not taken into account. If we apply adaptation we can reduce the slope of Adam by making the game less challenging and increase the slope of Carl by making the game more challenging.

Our game would base the difficulty level on the player's heart-rate performance, as heart rate is inherently a personal factor. A common rule of thumb is that a person's heart rate lies between his resting heart rate at the minimum and 220 minus his age at the maximum. Within this range we create a pattern that each user must follow exactly. If we did this, the intensity would theoretically be exactly the same for all players. We make this statement under the assumption that if one person at an exertion level of X percent has a heart rate of Y percent, this holds for everybody else. One nuance to the previous statement is that the player has to follow the pattern completely for the intensity to be identical. To achieve this we must increase the intensity of the exercise when the pattern is rising and lower the intensity when the pattern is descending. Since the human heart rate is not constant, it is possible that the heart rate rises or drops beyond the target heart rate. In this case we need to adjust the intensity of the training to compensate for the difference.
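The rule of thumb above can be written down as a short Python sketch. The linear mapping between resting and maximum heart rate, as well as the function names, are our own assumptions for illustration; the text only states the two ends of the range.

```python
def max_heart_rate(age: int) -> int:
    """Rule-of-thumb maximum heart rate in bpm: 220 minus age."""
    return 220 - age


def normalised_heart_rate(bpm: float, resting_bpm: float, age: int) -> float:
    """Position of bpm within the personal range (assumed linear mapping).

    Returns 0.0 at the resting heart rate and 1.0 at the rule-of-thumb maximum.
    """
    span = max_heart_rate(age) - resting_bpm
    if span <= 0:
        raise ValueError("resting heart rate must be below the maximum heart rate")
    return (bpm - resting_bpm) / span


def target_bpm(fraction: float, resting_bpm: float, age: int) -> float:
    """Inverse mapping: convert a normalised target back to beats per minute."""
    return resting_bpm + fraction * (max_heart_rate(age) - resting_bpm)


if __name__ == "__main__":
    # Hypothetical player: 25 years old with a resting heart rate of 75 bpm.
    print(max_heart_rate(25))
    print(normalised_heart_rate(135, 75, 25))
    print(target_bpm(0.5, 75, 25))
```

Expressing the pattern as a fraction of the personal range is what would allow one pattern to be shared by players of different ages and resting heart rates.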

The problem with the hypothetical game from the previous paragraph is that the adaptation of the training intensity is the same for everybody. Imagine that this game uses a five-kilogram dumbbell that has to be lifted at a certain pace in order to raise the heart rate. There are three players: a man named Adam, a man named Bob and a man named Carl. Adam is a very unfit person who rarely exercises, Bob is an average person who works out on occasion and Carl is very fit as he works out on a daily basis. All three of them perform the exercise at a medium pace (as instructed by the game), with a starting heart rate of 75 beats per minute (bpm) and a target of 85 bpm. Adam puts a lot of energy into lifting the dumbbell, which results in a heart rate of 90 bpm. Bob does the same thing and his heart rate goes up to 85 bpm, but as he is more used to exercising the increase is smaller than Adam's. Finally, Carl does the same exercise but his heart rate only goes up to 80 bpm. In this case, Adam's heart rate is 5 bpm over the target, Bob's heart rate is equal to the target and Carl's heart rate is 5 bpm below it. For a graphical representation of this scenario see figure 1. If we had adapted the game to take into account the fitness levels of Adam, Bob and Carl we might have gotten better results. For Adam we would have set a lower pace, making sure his heart rate did not rise as fast as it did, and for Carl we would have set a higher pace such that his heart rate would rise more quickly.
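The Adam, Bob and Carl scenario reduces to simple arithmetic, sketched below in Python. The heart rates are the ones from the scenario; the function names and the three-way pace advice are illustrative assumptions, not part of Raise Driver.

```python
TARGET_BPM = 85

# Heart rates reached after the same medium-pace exercise (from the scenario).
REACHED_BPM = {"Adam": 90, "Bob": 85, "Carl": 80}


def deviation(reached_bpm: int, target: int) -> int:
    """Signed distance to the target: positive means the heart rate overshot."""
    return reached_bpm - target


def pace_advice(dev: int) -> str:
    """Direction in which a player-specific adaptation would move the pace."""
    if dev > 0:
        return "lower pace"
    if dev < 0:
        return "higher pace"
    return "keep pace"


deviations = {name: deviation(bpm, TARGET_BPM) for name, bpm in REACHED_BPM.items()}

if __name__ == "__main__":
    for name, dev in deviations.items():
        print(f"{name}: {dev:+d} bpm, {pace_advice(dev)}")
```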

1.4 Research question

In this paper we focus our research on creating exergames that dynamically adapt the intensity of an exercise based on a player's personal fitness level, in order to obtain an identical training performance and experience for all players. To investigate this we try to answer the following questions:

1. Can we create an adaptive exergame for which the performance of all players is the same?

2. What difference do players notice between a generic and an adaptive exergame?

2. RELATED WORK

Serious games, as discussed in section 1, as well as exergames could have positive effects on the motivation of players to do the exercises. However, the studies that researched these exergames used different types of exercises, different hardware and had different purposes. For instance, in 2014 Knüwer did a systematic review of scientific work in which he aimed to summarize what has been done in the field of using the Nintendo Wii for rehabilitation interventions for stroke patients [9]. Also in 2014, Bolten et al. researched the increase of immersion in virtual reality using novel natural game input devices like a cycling bike, an Oculus Rift and a Microsoft Kinect [4]. In 2013, Witkowski did research on how the GPS-enabled game Zombies, Run! affected the quality of the running experience [14]. In 2012, Garcia et al. explored the use of a Microsoft Kinect-based system to train the elderly in fall prevention [7].

Adaptation techniques that dynamically adapt games to the player are also used for various purposes. In 2007, Togelius et al. used evolutionary algorithms to create personalised tracks for a racing game [12]. By creating models of the player's driving style and combining them with definitions of when a track is fun, they tried to maximise the entertainment value of the game for a particular player. In 2010, Kazmi et al. investigated using the player's actions to dynamically adapt the gameplay of first-person shooters [8]. By monitoring the player's in-game actions, the strategies of the non-player characters were dynamically adapted to fit the player's playing style.

Some studies also use exercise bikes to research adaptive exergames. Silva et al. created an adaptive game-based exercising framework [10]. Their game uses an exercise bike as an exertion interface to achieve a heart rate within a certain target zone. The target zone is a fixed zone between a minimum and maximum heart rate, the extremes of which depend on the age of the player. If the heart rate of a player rises above or falls below the target zone, the game adapts both in-game dynamics, such as the effect the RPM has, and out-of-game dynamics, such as the resistance of the exercise bike.

Another example is the work done by Sinclair et al., who created an exercise-bike-based exergame in which the player had to follow a fixed heart-rate-based training program [11]. In order to estimate the target RPM of the exercise bike they simulated games being played using software. In this game, whenever the player needed to increase or decrease his exertion level, the in-game dynamics were altered to increase or decrease the effectiveness of the player's RPM.

2.1 Summary of contribution

In our research we create an adaptive exergame in which players have to follow a predefined heart-rate-based training program. The adaptation strategy takes into account an estimation of the player's fitness level to make the performance and experience as identical as possible among players (see figure 2 for a graphical representation). Adaptation is done based on the current heart rate of the player in reference to the target heart rate, making it applicable to any exertion-based interface. Adaptation of the game is done only in software. Additionally, we research to what extent we can enable designers to make such an exercise program. Finally, we evaluate which game the players found more challenging and enjoyable.

3. METHOD

In this research we evaluate whether incorporating an indication of the user's physical fitness level could improve accuracy in achieving a predefined target heart rate pattern compared to when we omit this factor. To investigate this hypothesis, we test users with different fitness levels using a game we built, called 'Raise Driver'.

Figure 2: Representation of our goal

An adaptive system in which, by taking into account a player's fitness level, we can achieve identical results for every player.

Raise Driver is an exergame in which an exercise bike is the input to the game. The player's goal is to drive as far as possible in the game within a six-minute time frame. Each meter the player drives earns him points towards the high score. Additional points can be gained by collecting coins the player encounters along the way. The player also encounters roadblocks, which diminish the player's current score. In the top-right corner of the game there is a 'speed indicator' which consists of five bars. These bars indicate to what degree the player should lower or raise his exertion intensity. Figure 3 shows the speed indicator. A program was created for the six minutes the players had to play. This program tries to steer the player's heart rate towards the target heart rate each second of the game. During the game the player's heart rate was measured and provided as input to the game. Participants had to perform the same program twice, once for the control condition and once for the experimental condition. In the control condition we used static, evenly spread values for the limits within which the player's heart rate could drift from the target heart rate within the speed indicator. In the experimental condition we shifted the limits up for more physically fit players and down for less physically fit players. Effectively, less fit players would be better guided towards the line from the lower end, as we assume these players' heart rates increase faster than those of fitter players. On the other side, the fit players would be driven faster to the line and then guided more from the top side, as these players' heart rates go up less fast but down faster than those of less fit players. We used the 'average fitness level' category as a baseline, resulting in identical training for this category in the control condition and the experimental condition.
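One way to picture the difference between the two conditions is the Python sketch below. It assumes the five bars are separated by four drift limits and that the experimental condition adds a fixed offset per fitness category. The numeric limits, the offsets and all names are hypothetical; the actual values used in Raise Driver are not given in the text.

```python
from bisect import bisect_right

# Hypothetical, evenly spread drift limits in normalised heart-rate units.
# They split the drift (current minus target) into five bars.
BASE_LIMITS = (-0.06, -0.02, 0.02, 0.06)

# Hypothetical shift per fitness category: up for fit players, down for less
# fit players, zero for the 'average' baseline category.
SHIFT = {"basic": -0.01, "average": 0.0, "advanced": 0.01}


def limits_for(fitness: str, adaptive: bool) -> tuple:
    """Static limits in the control condition, shifted limits when adaptive."""
    shift = SHIFT[fitness] if adaptive else 0.0
    return tuple(limit + shift for limit in BASE_LIMITS)


def indicator_bar(current: float, target: float, limits: tuple) -> int:
    """Return the bar to light up, from -2 (far too low) to +2 (far too high).

    Bar 0 is the centre bar, meaning the heart rate is on target.
    """
    drift = current - target
    return bisect_right(limits, drift) - 2


if __name__ == "__main__":
    # A fit player slightly above the target: one bar too high with static
    # limits, but still on the centre bar once the limits are shifted up.
    print(indicator_bar(0.525, 0.5, limits_for("advanced", adaptive=False)))
    print(indicator_bar(0.525, 0.5, limits_for("advanced", adaptive=True)))
```

Because the shift for the average category is zero in this sketch, both conditions behave identically for that category, which mirrors the baseline choice described above.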

3.1 System overview

3.1.1 Hardware

Our aim was to use a minimal set of hardware so that the research would better reflect the current commercial possibilities. For this reason we chose to use the following consumer electronics: a Wii Cyberbike (magnetic edition) with a controller-to-USB adapter, a Polar H7 heart rate monitor, a MacBook Pro and a consumer-grade weight scale. In order to standardise our method we set the resistance of the bike to its maximum level for all participants. The MacBook Pro was sufficiently powerful to play the game without observable lag. An overview of the components used when playing the game may be found in figure 4.

3.1.2 Software

Our aim was to use a minimal set of software so that the research would better reflect the current commercial possibilities. For this reason we chose to use the following software. To create the application we used the Unity game engine and IDE (Integrated Development Environment), version 4.3.4f1 for Mac OS X, created by Unity Technologies. For reading the heart rate from the Polar H7 we used the sample project "CoreBluetooth: Heart Rate Monitor"³, created by Apple.

Figure 3: Speed indicator in Raise Driver

Indicates to the player whether his current heart rate is too high, optimal, or too low. The center represents a player exhibiting the optimal heart rate. The bars above the center represent a heart rate that is too high and the ones below the center represent a heart rate that is too low.

Figure 4: Overview of system components

4. INTERMEDIATE FINDINGS

In total, three iterations of the application were made. The first two iterations produced findings that gave us reason to create new iterations and restart testing. Within every iteration a new set of respondents was used to perform the experiment, in order to avoid any bias about the experiment. The findings for the first two iterations of the application, which we will from now on refer to as the exploratory iteration and the verification iteration, are discussed in the following sections.

4.1 Exploratory iteration

In the first iteration of the application we created a random program for the participants. The data for this program was

³ https://developer.apple.com/library/mac/samplecode/He


based on a randomly generated polynomial, created with the online polynomial generator from mathportal.org⁴, and looks as follows:

$$30 + 8x + 18x^2 - x^3 - 8x^4$$

After this the data was normalised. We then modified this data by multiplying all values by 0.75. This was done because we hypothesised that a target heart rate of a theoretical hundred percent was not likely to be attained in practice, or at least not within the six-minute span in which the participants would perform the experiment.

With the program in this state we did some initial tests ourselves. Two things stood out: the slope of the first part of the graph (from the start up until the eighty-first second, where the top of the parabola is) and the slope of the last part of the graph (from roughly the 300th second until the end, in which the graph decreases rapidly) were too steep. In none of our attempts were we able to come close to these trends. For this reason we decided to stretch out the data to double its size. This effectively divided the slope by two for each second, making the required increase and decrease in heart rate more gradual. Additionally, this meant that each trial was expanded from six minutes to twelve minutes. With these changes completed we performed some more tests ourselves and were able to approximate the overall trends in the graph. After this observation we decided to proceed by letting respondents partake in the experiment.
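The preparation steps described above (normalise, scale by 0.75, stretch to double length) can be sketched in Python as follows. Min-max normalisation and linear interpolation are our own assumptions, since the text does not say how the data was normalised or stretched, and the sample values are made up for illustration.

```python
def normalise(values):
    """Min-max normalise to the range 0.0 .. 1.0 (assumed normalisation)."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("cannot normalise a constant series")
    return [(v - lowest) / (highest - lowest) for v in values]


def scale(values, factor=0.75):
    """Lower every target so that a theoretical 100 percent is never required."""
    return [v * factor for v in values]


def stretch_double(values):
    """Double the number of samples by inserting interpolated midpoints.

    The same rise or fall is spread over twice as many seconds, which halves
    the slope per second. The last value is held once more so that the result
    is exactly twice as long as the input.
    """
    stretched = []
    for current, following in zip(values, values[1:]):
        stretched.append(current)
        stretched.append((current + following) / 2)
    stretched.extend([values[-1], values[-1]])
    return stretched


def max_step(values):
    """Largest absolute change between two consecutive samples."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    raw = [30, 50, 70, 50, 30]  # made-up samples, not the thesis data
    program = scale(normalise(raw))
    longer = stretch_double(program)
    print(len(program), max_step(program))
    print(len(longer), max_step(longer))
```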

4.1.1 Exploratory iteration - Participants

For the experiments of iteration one we had a total of three participants. No participants had prior knowledge of the project. They were aged twenty, twenty-three and forty-one. One of the participants had asthma and one smoked half a pack of cigarettes per day, both of which could influence sport performance. However, none of them showed signs of being hindered by this before, during or after the test. All participants were tested on the same day in the same environment. They all self-reported playing games regularly and working out at least once in a while.

4.1.2 Exploratory iteration - Results

While observing the experiment it was clear that the participants were not able to follow the first part of the program (from the first to the eightieth second) as desired. The reported reason for this was that the speed indicator could not clearly communicate how far off their target speeds they were. Looking at their actual performance we see that they continuously increased their speed by large amounts, but not as much as the program desired. This is confirmed by figure 5, where the data from all participants is combined. As can be seen in the figure, on average the participants were not able to follow the target heart rate trend line at the beginning of the program. This part of the program required a more explosive physical activity than the novice players felt it did.

⁴ http://www.mathportal.org/calculators/polynomials-solvers/polynomial-factoring-calculator.php

Figure 5: Performance of participants during the exploratory iteration. Axes: game time (in seconds) against heart rate (in %, normalized). Legend: Control, Experiment, Target.

We can also see a mismatch between the desired performance and the actual performance at the end of the program, where the target heart rate decreases more rapidly than in other parts of the program. At this point in the program all participants simply stopped exercising and sat completely still while watching the program run out. Still, when we look at the data in figure 5, we see that the last part of this decline was not achieved by the participants. We conclude this is the case because the desired heart rate decreased faster than was, on average for our participants, physically possible. Additional research is needed into what the maximum slope for the increase and decrease in target heart rate is.

4.2 Verification iteration

After finding that the slope of the program was too steep, both at the beginning and at the end, we decided to create a new iteration of the program. Our main goal in this iteration was to reduce the slopes so participants could more easily follow the desired heart rate trend line. In order to do this we made the program one-fourth easier by multiplying every target heart rate by 0.75. The difference between the programs, in relation to the first iteration, is shown in figure 6. Before we started testing with the verification iteration of the program we first pre-tested the program ourselves and were able to approximate the desired heart rate line, and thus saw no reason why participants would not be able to do the same.

4.2.1 Verification iteration - Participants

For the experiments of iteration two we had two participants. The participants had no prior introduction to the project. The participants were twenty-one and twenty-three years old. The participants had no illness, habit or other factor that could influence their sport performance. The participants self-reported playing games regularly and working out on a regular basis.


Figure 6: Performance of participants during the exploratory iteration. Axis: heart rate (in %, normalized). Legend: Exploratory iteration, Verification iteration.

4.2.2 Verification iteration - Results

Unfortunately, the data of the experimental condition of one of the two participants got corrupted. Therefore, we only use the data of one participant to present our findings of iteration two. However, both participants showed the same observable behaviour during the experiment and had similar results in the control condition of the experiment. The focus was on testing the verification iteration of the program for the same problems we encountered in the exploratory iteration. We found results similar to those of the exploratory iteration: at the beginning of the program the slope increased faster than the novice players felt it did. Also, at the end of the program, the slope decreased faster than the heart rate could go down, even though the participants completely stopped exercising. The participants did seem to come closer to the target trend line than in the exploratory iteration, as may be seen in our plotted data in figure 7. The graphs of the exploratory and verification iterations showed similar patterns. From the players' heart rates relative to the desired trend line we hypothesised that the problem was not the absolute rise of the target heart rate but rather the rise of the target heart rate relative to the previous rise. To test this theory we compared the results of the exploratory iteration to the verification iteration. The graphical output of the comparison may be seen in figure 8. To ease comparison we also drew the target heart rates of the exploratory iteration of the program. As may be seen in the figure, the participants of the exploratory iteration of the program approximated the first part of the verification iteration of the program, even though this required a faster rise in exertion from these players than the players of the actual verification iteration exhibited.

From this we draw the conclusion that the speed indicator was not able to convey the required information about the target heart rate to the player when the relative slope of the program is above a certain threshold. Additional research is needed into how the steepness of the slope influences players drifting from the target heart rate pattern.

Figure 7: Performance of participants during the verification iteration

4.3 Conclusions of intermediate studies

We have exposed a problem within our application which gives us insight into the limits of our system and the importance of these limits within our current research, and we proposed a direction of research for both of these problems. The problem is that the slope of the target heart rate should not be higher or lower than certain amounts. More research is needed into what the ranges of these amounts are. Knowing the values helps us to better understand the limits of our application and what kind of programs developers can create using this or a similar system.
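If such slope limits were known, a designer could check a program against them before testing it on players. The Python sketch below illustrates the idea. The limit values passed in are placeholders, since the actual ranges remain an open question, and the function names are hypothetical.

```python
def slopes(program):
    """Change in target heart rate between consecutive seconds."""
    return [b - a for a, b in zip(program, program[1:])]


def too_steep(program, max_rise, max_fall):
    """Indices of steps whose rise or fall exceeds the given limits.

    max_rise and max_fall are positive numbers in the same normalised units
    as the program. Their real values are unknown and are supplied by the
    caller.
    """
    return [
        index
        for index, slope in enumerate(slopes(program))
        if slope > max_rise or slope < -max_fall
    ]


if __name__ == "__main__":
    # Made-up program with one steep rise and one steep fall.
    program = [0.0, 0.1, 0.5, 0.4, 0.0]
    print(too_steep(program, max_rise=0.2, max_fall=0.2))
```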

5. EXPERIMENT

In the following sections we discuss the experiments conducted. We will go into the difference in actual performance between the control condition and the experimental condition, as well as the perceived performance.

5.1 Participants

For the experiments of the third and final iteration we had a total of eighteen participants. None of the participants had prior experience with the project. The ages ranged from nineteen to eight with a median of twenty-two and a mean of 22.61. With the exception of one participant, all participants were male. None of the participants reported any illness, habit or other factor that could influence their sport performance, either in general or at the time of the experiment. During the approximately two weeks in which the experiments were conducted, the participants performed the experiment within the same environment. All participants self-reported playing games on a regular basis. Each participant engaged in physical activity at least once a week.


Figure 8: Comparison between exploratory and verification iteration. Axis: heart rate (in %, normalized). Legend: Control (exploratory), Experiment (exploratory), Target (verification).

5.2 Experimental setup

Using the game Raise Driver (as described in section 3) we conducted our experiment in a climate-controlled office in Amsterdam, The Netherlands. The participants were tested between 10 AM and 1 PM with a maximum of five participants per day. Participants were seated on the exercise bike in front of the television screen while their resting heart rate was measured for two minutes, which was used as a baseline for their trial. Next, the participants had to perform both trials, with fifteen minutes of rest in between. The order of the conditions was pseudo-randomly assigned by alternating between control condition first and experimental condition first. After both conditions the players were given five minutes of rest before filling in the questionnaire. See figure 9 for an impression of the game in practice.

5.3 Results

5.3.1 User performance

To evaluate the participant scores we added the results of participants within the same fitness level category together and divided them by the number of participants within that category. We first wanted to investigate the differences between the categories within the control and the experimental condition separately. As may be seen in figure 10, both the control and the experimental condition seem to follow the desired heart rate pattern. However, no visually significant differences between the two conditions can be seen in these graphs.

We have examined the data more closely by looking at the mean and median of the error and the natural-value error of both the control and the experimental conditions. From table 1 we see that the results of all categories in all conditions are very similar. The averages have a maximum difference of 0.01 between the control and the experimental condition. Between the medians of the control and the experimental condition there is a maximum difference of 0.02.

Figure 9: A participant playing Raise Driver

Raise Driver being tested in practice. The setup used in the figure is representative for all participants.

Table 1: Average and median error for all fitness level categories for all participants

Error

fitness_index   ae_1    ae_2    me_1    me_2
Basic          -0.01   -0.02   -0.01   -0.03
Average        -0.01    0.00   -0.01    0.01
Advanced       -0.01   -0.01    0.00    0.01

Natural value error

fitness_index   ae_1    ae_2    me_1    me_2
Basic           0.06    0.07    0.05    0.07
Average         0.05    0.06    0.05    0.05
Advanced        0.06    0.06    0.06    0.05

Overall we see that the participants are able to follow the pattern with an error margin of approximately five to six percent.
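The four statistics per condition can be computed as in the following Python sketch. It assumes that the error is the measured heart rate minus the target, both normalised, and that the natural-value error is the absolute value of that error. Both definitions are our reading of the table and not stated explicitly. The sample data is made up.

```python
from statistics import mean, median


def error_summary(measured, target):
    """Mean and median of the signed and of the absolute per-second error."""
    errors = [m - t for m, t in zip(measured, target)]
    absolute_errors = [abs(e) for e in errors]
    return {
        "average_error": mean(errors),
        "median_error": median(errors),
        "average_absolute_error": mean(absolute_errors),
        "median_absolute_error": median(absolute_errors),
    }


if __name__ == "__main__":
    # Made-up normalised heart rates for four seconds of play.
    target = [0.5, 0.5, 0.5, 0.5]
    measured = [0.4, 0.6, 0.5, 0.7]
    for name, value in error_summary(measured, target).items():
        print(f"{name}: {value:.2f}")
```

The signed error shows whether players sit above or below the pattern on average, while the absolute error shows how far they are from it regardless of direction, which is why the two can differ as much as they do in table 1.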

To make a better comparison between the control and the experimental condition we need to investigate whether there is a significant difference among the categories within each condition. We performed a one-way ANOVA for both the control and the experimental condition, with the error as the response variable and the fitness index category as the factor. The results may be seen in table 2. From the table we see that the p-values of both ANOVA tests are significant, meaning that the true means of the categories are not equal. The performance of the participants therefore differs among the fitness index categories. When we compare the p-value of the control condition with that of the experimental condition, we see that the value for the experimental condition is much higher. This suggests that the category means lie closer together in the experimental condition than in the control condition. The means are, however, still significantly different among the categories of the experimental condition.
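To illustrate what the one-way ANOVA compares, the following sketch computes the F statistic from the between-group and within-group variation. It is not the analysis script used for table 2; the function name and the sample errors are hypothetical. Obtaining a p-value additionally requires the F distribution, for which a statistics package (for example `aov` in R or `scipy.stats.f_oneway` in Python) would normally be used.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical errors for the basic, average and advanced categories.
groups = [
    [-0.03, -0.02, -0.01],
    [-0.02, -0.01, 0.00],
    [0.01, 0.02, 0.03],
]
f_stat, df_between, df_within = one_way_anova_f(groups)
```

A larger F value means that the category means lie further apart relative to the spread within each category, which corresponds to a smaller p-value.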


Figure 10: Comparison between control and experimental condition for all categories

[Figure: two panels, "Control condition" and "Experimental condition". Vertical axis: heart rate (in %, normalized), 0.0 to 0.8. Horizontal axis: 0 to 400. Series: Basic FL, Average FL, Advanced FL, Target.]

Note: we abbreviated ’fitness level’ to ’FL’ in the legends.

Table 2: Results of ANOVA tests among fitness level categories for both conditions

## [1] "P-value one way ANOVA control cond."
## [1] 3.65047e-15
## [1] "P-value one way ANOVA experimental cond."
## [1] 0.009600677

To investigate potential differences between the two conditions we performed a t-test on the error for each pair of data from the control and experimental condition within each category. Since the control and experimental trials were conducted with a within-subject design, the t-test was paired. As can be seen from table 3, the mean error of the control condition differs significantly from the mean error of the experimental condition. This significance is in line with our hypothesis for the categories basic and advanced. However, participants of the category average were given the same test in the control condition and the experimental condition. In an error-free test the result for this category should not be significant, as the tests themselves are the same.

If we look at the mean of the differences in the third column, we see that for the first category the mean error was higher in the control condition than in the experimental condition. Participants in this category scored worse in the control condition than in the experimental condition. For the third category the opposite applies.
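The paired comparison can be sketched as follows. The sketch assumes that each difference is taken as control minus experimental, which is consistent with a positive mean of the differences indicating a higher error in the control condition; the function name and the sample values are hypothetical. Converting the t statistic into a p-value requires the t distribution with n - 1 degrees of freedom, for instance through `t.test(..., paired = TRUE)` in R.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(control, experimental):
    """Return (t, mean of the differences, degrees of freedom)."""
    if len(control) != len(experimental):
        raise ValueError("paired samples must have the same length")
    # Difference per pair, taken as control minus experimental (assumption).
    diffs = [c - e for c, e in zip(control, experimental)]
    n = len(diffs)
    mean_diff = mean(diffs)
    # Standard error of the mean difference, using the sample deviation.
    standard_error = stdev(diffs) / sqrt(n)
    return mean_diff / standard_error, mean_diff, n - 1


# Hypothetical paired errors of four participants in one category.
control = [0.06, 0.08, 0.05, 0.07]
experimental = [0.04, 0.05, 0.04, 0.04]
t_stat, mean_diff, df = paired_t(control, experimental)
```

Working on the differences per pair removes the variation between participants from the comparison, which is why the paired test fits a within-subject design.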

5.3.2 Results of user performance

Table 3: Results of paired t-tests between the control and experimental condition for all fitness level categories

##   fitness_index      p_value         motd
## 1         Basic 1.475320e-15  0.014101366
## 2       Average 1.320118e-05 -0.007776677
## 3      Advanced 5.603117e-01 -0.001055566

motd stands for mean of the differences.

Using statistical measures we have seen that, looking at their error with respect to the training target, the player performances were significantly different among the groups, in both the control and the experimental condition. We also saw that, for each group, the performances were significantly different between the control and experimental condition. Since one of the groups received identical training in the control and experimental condition and still showed a significant difference, we conclude that our measure was not error free. For participants from the ’basic fitness level’ category we saw a decrease in error; for both the ’average fitness level’ and the ’advanced fitness level’ category we saw an increase in error.

5.3.3 User experience

After performing the experiment in both the control and the experimental condition each participant was asked to fill in a questionnaire. The main goal of this questionnaire was to investigate whether participants noticed any difference between the two conditions. All questionnaires were conducted through Google Forms. The questionnaire consisted of these questions:

1. Did you notice a difference in the two rounds?

2. Which of the two rounds did you prefer?

3. Why did you prefer this one?

4. Which of the two rounds did you find more challenging?

5. In what way did you find one more challenging?

After all experiments the answers were coupled to the data of the participants to find the corresponding order of testing (control condition before experimental condition or vice versa). All answers were put into a spreadsheet, grouped by order of testing. From this overview we tallied the number of participants that:

• Did not notice any difference (type 1)

• Noticed a difference and preferred the control condition (type 2)

• Noticed a difference and preferred the experimental condition (type 3)
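The tallying of the three types above can be sketched as follows. The tuple layout of a response and the sample responses are hypothetical; the actual tally was made in a spreadsheet.

```python
from collections import Counter


def classify(noticed_difference, preferred):
    """Map one questionnaire response to type 1, 2 or 3."""
    if not noticed_difference:
        return "type 1"
    if preferred == "control":
        return "type 2"
    return "type 3"


# Hypothetical responses: (noticed a difference, preferred condition).
responses = [
    (False, None),
    (True, "control"),
    (True, "experimental"),
    (True, "experimental"),
]
tally = Counter(classify(noticed, preferred) for noticed, preferred in responses)
```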


Table 4: Distribution of participants who did not notice a difference between the control and experimental condition

Below average   Average   Above average
2               4         1

Table 5: Attitudes towards the experimental condition

        positive   negative
true    5          3
false   4          1

An attitude that is based on an answer that was not relevant to our research question is labeled false; otherwise it is labeled true.

There were a total of five participants that did not notice any difference between the rounds (type 1). For the average category this is to be expected, as the control condition and the experimental condition are the same in this category. The distribution of these participants can be seen in table 4. Answers that fell into the third type were labeled positive and answers that fell into the second type were labeled negative. However, not all answers contributed to our hypothesis in either a positive or a negative way. An example of this may be seen in the following quote:

”I preferred the first time because firstly, I was less fatigued and secondly when you do something the first time it is always more exciting”

We added a second label to all answers. Answers that did not contribute to our hypothesis in any way were labeled false and the remaining answers were labeled true. Additionally, there was one participant from the average category that was labeled positive. However, because the control condition and the experimental condition are the same in this category, we labeled this participant as false as well. The final outcome of the questionnaire can be seen in table 5. As may be seen from the table, the majority of the participants preferred the experimental condition. All of them, excluding one, had this preference because this condition was perceived as less challenging. The heart rate results of these participants may be seen in figure 11. As may be seen from the figure, the performance in the control condition closely matches the performance in the experimental condition. We take a look at the summary of both conditions in table 6. As one can see, the error values are lower for the experimental condition than for the control condition. However, the difference does not appear to be significant. To verify this we performed a repeated measures ANOVA. The resulting p-value was 0.305, indicating that the errors of the two conditions for these participants were not significantly different.

We hypothesize that there are three possible reasons for this positive attitude towards the experimental condition:

Figure 11: Results of participants that had a true positive attitude towards the experimental condition

[Figure: heart rate results plot. Horizontal axis: 0 to 400. Vertical axis: 0.0 to 1.0.]

Table 6: Summary of participants that had a true positive attitude towards the experimental condition

## [1] "Control Condition - Error"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## 0.01458 0.04142 0.05692 0.05899 0.07797 0.11230
## [1] "Experimental Condition - Error"
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
## 0.008401 0.035830 0.047790 0.050470 0.065190 0.096810

1. The first reason could be that the participant simply performed better in this condition. In that case the median error, and possibly the mean error, would be lower in the experimental condition than in the control condition.

2. The second reason could be that the participant had a similar or slightly worse error, but that the error was convenient for the player's training type. For instance, if a person with a basic fitness level has an arbitrary error of -0.3, this error might result in the command ’cycle much harder’ in the control condition and ’cycle slightly harder’ in the experimental condition.

3. The third reason could be that the participant had a similar or slightly worse error, but that the error was not convenient for the player's training type. For instance, if a person with an advanced fitness level has an arbitrary error of -0.3, this error might result in the command ’cycle slightly harder’ in the control condition and ’cycle much harder’ in the experimental condition. In this case the user would be challenged more, which might be a preference for that player.

We take a look at the mean and median error values for the true positive participants. In the following table we abbreviated median error as me and mean error as ae (for average error). The control condition is indicated by 1 and the experimental condition by 2. Fitness levels are labeled one for basic fitness and three for advanced fitness. No participants with an average fitness level were present in the list of true positive participants.

Table 7: Mean and median error for all participants that had a true positive attitude towards the experimental condition

##   name fi  ae_1  ae_2  me_1 me_2
## 1 R.O.  3 -0.03  0.00 -0.03 0.01
## 2 T.K.  3  0.01  0.00  0.01 0.00
## 3 N.G.  3  0.01 -0.01  0.02 0.00
## 4 F.A.  1  0.02  0.01  0.02 0.00
## 5 D.C.  1 -0.03  0.01 -0.02 0.01

In table 7 we can see that in all cases the user had a median error closer to zero in the experimental condition. This corresponds to our first reason for a positive attitude towards the experimental condition. In the table we also see that all mean error scores are closer to zero, except for participant N.G. The mean error for N.G. has the same magnitude in the control condition as in the experimental condition, but with a negative sign in the latter. Since N.G. has a high fitness level, a mean error below zero is less convenient for his training type than a mean error above zero. However, if we look at the reason why N.G. preferred the experimental condition, we see that N.G. liked that it felt more challenging, as illustrated by the following quote from N.G.:

”It seemed more challenging to me. As someone who works out quite a lot I liked this aspect.”

This explanation corresponds to our third reason for a positive attitude towards the experimental condition. We were not able to find an example of the second reason of our hypothesis within our data.
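The observations about table 7 can be checked with a short sketch that compares the absolute errors per participant. The values are copied from table 7; the dictionary layout and the helper name are hypothetical.

```python
# (control, experimental) values per participant, copied from table 7.
table7 = {
    "R.O.": {"ae": (-0.03, 0.00), "me": (-0.03, 0.01)},
    "T.K.": {"ae": (0.01, 0.00), "me": (0.01, 0.00)},
    "N.G.": {"ae": (0.01, -0.01), "me": (0.02, 0.00)},
    "F.A.": {"ae": (0.02, 0.01), "me": (0.02, 0.00)},
    "D.C.": {"ae": (-0.03, 0.01), "me": (-0.02, 0.01)},
}


def closer_to_zero(pair):
    """True if the experimental value is strictly closer to zero."""
    control, experimental = pair
    return abs(experimental) < abs(control)


median_improved = [
    name for name, row in table7.items() if closer_to_zero(row["me"])
]
mean_not_improved = [
    name for name, row in table7.items() if not closer_to_zero(row["ae"])
]
```

With these values the median error is closer to zero for all five participants, while N.G. is the only participant whose mean error is not.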

5.3.4 Results of user experience

Looking at the user experience we saw that two thirds of the participants from the ’average fitness level’ category did not notice any difference, which is consistent with the design of the experiment. We saw that, of the participants who did notice a difference, the majority had a positive attitude toward the experimental condition. After excluding attitudes that were based on answers that were not relevant to our research question, we looked further into the participants that had a positive attitude towards the experimental condition. We found that overall these participants did not score significantly better in the experimental condition.

6. CONCLUSIONS

For our research we created ’Raise Driver’: an exergame that adapts the intensity of an exercise dynamically based on a player's physical fitness level in order to obtain an identical training performance and experience for all players. After testing our system we found that the players' performances were significantly different among the three ’fitness level’ groups in the experimental condition. Since one of the groups received identical training in the control and experimental condition and still showed a significant difference between the two, we conclude that our measure was not error free. We did not succeed in creating an adaptive exergame for which the performance of all players is the same, which was the aim of our first research question.

For our second research question we looked at the user experience to investigate what difference players notice between a generic and an adaptive exergame. We found that the majority of participants who noticed a difference between the control condition and the experimental condition had a positive attitude toward the experimental condition. For these participants we investigated why this could be the case. After excluding participants whose attitudes were based on answers that were not relevant to our research question, we found that overall the players who had a positive attitude toward the experimental condition did not score significantly better in it than in the control condition.

7. DISCUSSION AND FUTURE WORK

We have seen that for the ’average fitness level’ category the error score is significantly different between the control condition and the experimental condition. Both trials were identical for this category, so the difference should not be significant. We might be able to make the test error free by scaling up the number of participants per group. In doing so our experiment would yield more reliable results for investigating whether the adaptation technique works. Furthermore, we used the body fat percentage of a player as a rough indication of his fitness level, and more research is needed to verify whether this is a reliable indicator.

8. ACKNOWLEDGMENTS

I would like to thank Sander Bakkes of the University of Amsterdam (UvA) for supervising this research. Additionally, I would like to thank Frank Nack of the UvA for being the second reader of this master thesis. Finally, I would like to thank all participants of this study, who enabled me to execute my research.

9. REFERENCES

[1] Microsoft: Xbox kinect release date 10 november. http ://www.techradar.com/news/gaming/microsoft-xbox-kinect-release-date-10-november-710465. Accessed: 2015-05-17.

[2] Wii. http://en.wikipedia.org/wiki/Wii. Accessed: 2015-05-17.

[3] C. Baur, E. Rayner, and N. Tsourakis. Using a serious game to collect a child learner speech corpus. 2014. [4] J. Bolton, M. Lambert, D. Lirette, and B. Unsworth.

Paperdude: a virtual reality cycling exergame. In CHI’14 Extended Abstracts on Human Factors in Computing Systems, pages 475–478. ACM, 2014. [5] V. Brezinka. Treasure hunt-a serious game to support

psychotherapeutic treatment of children. Studies in health technology and informatics, 136:71, 2008. [6] J. Chen. Flow in games (and everything else).

(12)

[7] J. A. Garcia, K. Felix Navarro, D. Schoene, S. T. Smith, and Y. Pisan. Exergames for the elderly: Towards an embedded kinect-based clinical test of falls risk. Stud Health Technol Inform, 178:51–7, 2012. [8] S. Kazmi and I. J. Palmer. Action recognition for

support of adaptive gameplay: A case study of a first person shooter. International Journal of Computer Games Technology, 2010:1, 2010.

[9] J. Knüwer. â ˘AIJwii-habilitationâ ˘A˙I: The use of motion-based game consoles in stroke rehabilitation; a systematic review. 2014.

[10] J. M. Silva and A. El Saddik. An adaptive game-based exercising framework. In Virtual Environments Human-Computer Interfaces and Measurement Systems (VECIMS), 2011 IEEE International Conference on, pages 1–6. IEEE, 2011.

[11] J. Sinclair, P. Hingston, and M. Masek. Exergame development using the dual flow model. In Proceedings of the Sixth Australasian Conference on Interactive Entertainment, page 11. ACM, 2009.

[12] J. Togelius, R. De Nardi, and S. M. Lucas. Towards automatic personalised content creation for racing games. In Computational Intelligence and Games, 2007. CIG 2007. IEEE Symposium on, pages 252–259. IEEE, 2007.

[13] D. P. Vossen et al. The nature and classification of games. AVANTE-ONTARIO-, 10(1):53–68, 2004. [14] E. Witkowski. Running from zombies. In Proceedings

of The 9th Australasian Conference on Interactive Entertainment: Matters of Life and Death, page 1. ACM, 2013.