Assessing Dynamic Balance Performance during Exergaming based on Speed and Curvature of Body Movements


Soancatl Aguilar, Venustiano; van de Gronde, Jasper J.; Lamoth, Claudine J. C.; Maurits, Natasha M.; Roerdink, Jos B. T. M.

Published in:

IEEE Transactions on Neural Systems and Rehabilitation Engineering

DOI:

10.1109/TNSRE.2017.2769701

IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below.

Document Version

Final author's version (accepted by publisher, after peer review)

Publication date: 2018

Link to publication in University of Groningen/UMCG research database

Citation for published version (APA):

Soancatl Aguilar, V., van de Gronde, J. J., Lamoth, C. J. C., Maurits, N. M., & Roerdink, J. B. T. M. (2018). Assessing Dynamic Balance Performance during Exergaming based on Speed and Curvature of Body Movements. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 171-180.

https://doi.org/10.1109/TNSRE.2017.2769701



Venustiano Soancatl Aguilar, Jasper J. van de Gronde, Claudine J.C. Lamoth, Natasha M. Maurits, Senior Member, IEEE, and Jos B.T.M. Roerdink, Senior Member, IEEE

Abstract—Improving balance performance among the elderly is of utmost importance because of the increasing number of injuries and fatalities caused by falls. Digital games controlled by body movements (exergames) have been proposed as a way to improve balance among older people. However, the assessment of balance performance in real-time during exergaming remains a challenging task. This assessment could be used to provide instantaneous feedback and automatically adjust the exergame difficulty. Such features could potentially increase the motivation of the player, thus augmenting the effectiveness of exergames. As clear differences in balance performance have been identified between older and younger people, distinguishing between older and younger adults can help identify measures of balance performance. We used generalized linear models to investigate whether the assessment of balance performance based on movement speed can be improved by incorporating curvature of the movement trajectory into the analysis. Indeed, our results indicated that curvature improves the performance of the models. Five-fold cross validation indicated that our method is promising for the assessment of balance performance in real-time by showing more than 90% classification accuracy. Finally, this method could be valuable not only for exergaming but also for real-time assessment of body movements in sports, rehabilitation and medicine.

Index Terms—Assessing dynamic balance performance, exergaming, speed and curvature, generalized linear models, Markov chain Monte Carlo estimation.

I. INTRODUCTION

THE assessment of the quality of body movements in real-time is of utmost importance in exergames, that is, digital games controlled by body movements. Older adults form a special target group for exergaming [1]. Because the number of injuries as well as the number of fatalities caused by falls among older people has increased during the last decade [2], the ultimate goal of exergames is not only to provide fun and entertainment but also to improve postural balance performance. It is known that improving balance can reduce the incidence of falls among the older population [3]. Assessing balance during exergaming could be used to adapt the difficulty of the game as a function of the quality of body movements of the player, as well as to provide appropriate immediate feedback. This in turn could increase motivation of the player and also increase the effectiveness of exergames as tools to improve balance [4].

Manuscript received . . . ; revised . . . . The project was financially supported by the Northern Netherlands Provinces Alliance, Course for the North.

Venustiano Soancatl Aguilar was supported by the Mexican National Council of Science and Technology (CONACYT) under scholarship number 313791 (correspondence e-mail: v.soancatl.aguilar@rug.nl).

Traditional methods to determine the effectiveness of exergames rely on the assessment of balance before and after intervention tests [1]. Real-time methods to assess balance are still scarce and a gold standard for dynamic balance assessment has not yet been established. Methods for real-time balance quantification of whole body movements as recorded by devices such as Microsoft Kinect are yet to be developed [5].

Common home exergame devices such as Microsoft Kinect 1 reliably capture body movements that could be used to assess balance performance during exergaming [5]. It is known that younger adults have better postural performance than older adults [6]. There is also evidence of significant deterioration of postural performance, as evaluated from the center of pressure (CoP), over 60 years of age [7]. Hence, distinguishing between older and younger adults on the basis of movement characteristics can help to identify measures of (dynamic) balance performance. One way to identify older and younger adults is by means of generalized linear models (GLMs) [8]. Assuming that the actual age of a group of people is unknown (or temporarily ignored), we use GLMs to predict their age based on curvature and speed of their body motion. The results of this model prediction are meaningful values that can be interpreted as probabilities of the people belonging to a younger or an older age class. In addition, these GLMs can be used in real time during exergaming and continuously provide insight into the behaviour of the player, as they would indicate whether this behaviour is similar to that of an older or a younger participant, representing weak or strong balance performance, respectively.

In this paper, we investigate curvature and speed of body movement as measures of interest for assessing balance performance in real time. Speed is a reliable measure of balance performance [9] and can be estimated instantaneously during exergaming. Although curvature is not commonly used, it has been identified as a promising measure for postural performance assessment in real-time [10], as it can provide additional information regarding the smoothness of body movements and can also be estimated instantaneously during exergaming. However, speed and curvature cannot be considered independent measures as they are related according to a power law [11]. We therefore also investigate whether linear regression parameters, derived from the speed-curvature relationship on doubly logarithmic scale, can be used as compound measures of real-time balance performance. Fig. 1 shows our methodological approach in a flow chart.

Fig. 1. Schematic showing the general steps followed in this study.

II. METHODS

This study was performed in the context of the project Exergaming for balance training of older adults at home of the SPRINT research center [12] of the University Medical Center Groningen (UMCG). Part of the project was the development of a custom-made ice-skating exergame for unsupervised balance training of older adults. The study was executed in accordance with the ethical standards of the declaration of Helsinki and approved by the Medical Ethical Committee, and the Ethical Committee of the Center of Human Movement Sciences at the UMCG.

A. Participants

The data were collected from forty healthy participants in a previous study [13]. Twenty participants were older than 65 years (average 71.8, SD 4.0 years, 12 males). Twenty participants were younger than 61 years (average 36.9, SD 16.6 years, 9 males). Inclusion criteria were BMI < 30 and the ability to walk without aid (self-reported) for at least 15 minutes. Exclusion criteria were the use of medication affecting postural performance, musculoskeletal, visual, hearing, or neurological disorders that may affect balance performance, or an inability to understand the Dutch language.

B. Procedure and instrumentation

Each participant played the exergame ten times by swaying the center of body mass in lateral directions, resulting in 400 trials in total. On average, the trials lasted 44.3 seconds (SD 17.8 seconds). During game-play, 15 body parts were non-uniformly sampled at a frequency of about 30 Hz using Microsoft Kinect version 1. Fig. 2 provides an example of the 15 point clouds recorded by Kinect during one minute of exergaming. For further details regarding the body positions captured by Kinect see [14, p. 99].

Fig. 2. Point clouds of 15 body parts tracked by Kinect during one minute of exergaming of a younger participant. Labels in the figure: feet, knees, hips, mid-spine, mid-shoulder, head, shoulder, hand, elbow.

C. Data preprocessing

Kinect data were re-sampled at a fixed rate of 30 Hz using cubic spline interpolation [15] to compensate for possible deviations in sampling frequency. For each trial, the first and last 5 seconds were removed to exclude motions that were not part of the swaying exercise. Feet, hands, and forearms were already excluded from the analysis at this stage, as their trajectories are the least reliably recorded [5].

D. Curvature and speed estimation

For all the collected trials we estimated instantaneous speed (s) as the distance between two consecutive points divided by sample time. Instantaneous curvature (κ) was approximated by taking the inverse of the radius of a circle fitted to each three consecutive points [10].
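The two estimates described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, a fixed sampling rate of 30 Hz is taken from the preprocessing step, and returning zero curvature for collinear or repeated points is a choice made here because the text does not say how such cases were handled. The circle through three points has radius R = abc / (4 · area), so the curvature is 4 · area / (abc), where a, b, c are the side lengths of the triangle formed by the points.

```python
import math


def speed(p, q, dt):
    """Instantaneous speed: distance between two consecutive points over the sample time."""
    return math.dist(p, q) / dt


def curvature(p1, p2, p3):
    """Curvature as 1/R of the circle through three consecutive points.

    Returns 0.0 for collinear or repeated points (an assumption of this sketch).
    """
    a = math.dist(p1, p2)
    b = math.dist(p2, p3)
    c = math.dist(p1, p3)
    if a == 0 or b == 0 or c == 0:
        return 0.0
    # Heron's formula gives the triangle area in 2D and 3D alike.
    s = (a + b + c) / 2
    area_sq = max(s * (s - a) * (s - b) * (s - c), 0.0)
    return 4 * math.sqrt(area_sq) / (a * b * c)


def speed_and_curvature(points, fs=30.0):
    """Per-sample speed and curvature for one body-part trajectory."""
    dt = 1.0 / fs
    speeds = [speed(points[i], points[i + 1], dt) for i in range(len(points) - 1)]
    curvatures = [
        curvature(points[i - 1], points[i], points[i + 1])
        for i in range(1, len(points) - 1)
    ]
    return speeds, curvatures
```

For points sampled on a circle of radius 2, the curvature estimate should come out as 0.5 regardless of the spacing between the points.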

E. Intercepts, slopes and means

Parameters from the power law relation between curvature and speed can be estimated [11] by fitting straight lines in doubly logarithmic space using the following linear model:

$$\log(s_{kj}(t_i)) = \gamma_{kj} + \beta_{kj} \cdot \log(\kappa_{kj}(t_i)) \qquad (1)$$

where $k = 1 \ldots 40$ is the index of the participants, $j = 1 \ldots 9$ is a body part identifier, $t$ is time, $i = 1 \ldots n$, $n$ is the number of samples per body part, $\gamma_{kj}$ are the intercepts of the straight lines, and $\beta_{kj}$ are the slopes. These parameters were estimated using [15]. For each participant and body part, average curvature $\bar{\kappa}_{kj}$ and speed $\bar{s}_{kj}$ were also estimated.
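As an illustration of the straight-line fit in doubly logarithmic space, the following sketch estimates the intercept and slope by ordinary least squares. It is an assumption-laden stand-in for the software cited as [15]: the function name is hypothetical, natural logarithms are assumed, and samples with non-positive speed or curvature are skipped because their logarithm is undefined (the text does not discuss such samples).

```python
import math


def fit_power_law(speeds, curvatures):
    """Least-squares fit of log(s) = gamma + beta * log(kappa).

    Returns (gamma, beta). Non-positive samples are skipped (assumption).
    """
    pairs = [
        (math.log(k), math.log(s))
        for s, k in zip(speeds, curvatures)
        if s > 0 and k > 0
    ]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    beta = sxy / sxx               # slope of the straight line
    gamma = mean_y - beta * mean_x  # intercept of the straight line
    return gamma, beta
```

On synthetic data that follow an exact power law, for example s = 2 · κ^(-1/3), the fit should recover an intercept of log 2 and a slope of -1/3.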

Fig. 3. Overlapping violin plots illustrating differences between older and younger participants. The variables (γ-intercept, β-slope, κ-mean curvature and s-mean speed) derived from three body parts (K-knee, M-mid shoulder and S-mid spine) are shown on the horizontal axis arranged by the overlapping area (OVL) between distributions [17], and p is the statistical significance of the difference (Mann-Whitney U-tests, Bonferroni corrected). Horizontal axis: variables per body part, in the order γ(M), s(M), s(S), γ(K), s(K), γ(S), κ(M), κ(K), κ(S), β(M), β(K), β(S); vertical axis: measure. OVL and p values annotated in the figure, in the same order: 3.3% (< 0.001), 14.8% (< 0.001), 22.1% (< 0.001), 24.1% (< 0.001), 32.4% (< 0.001), 33% (< 0.001), 45.1% (< 0.001), 52.6% (0.02), 58.2% (< 0.01), 62.4% (0.07), 74.7% (0.36), 86.3% (1). Legend: age group (younger, older).

F. Body part and variable selection

The aim here is to reduce the complexity of the data as much as possible. For that purpose we estimated intercept correlations for all pairs of body part variables $(\gamma_j, \gamma_{j'})$, where $j' > j$. Highly correlated body parts ($r > 0.95$) were excluded.

As a result only mid shoulder, mid spine and right knee body parts remained for analysis because head, mid shoulder and shoulders; mid spine and hips; and both knees turned out to be highly correlated. Using the remaining body parts we tested whether the variables speed ($\bar{s}_j$), curvature ($\bar{\kappa}_j$), and power law parameters (intercept $\gamma_j$ and slope $\beta_j$) showed significant differences between older and younger participants, as assessed by Mann-Whitney U-tests [16] with Bonferroni correction for multiple comparisons. These differences provide insight into the importance of the variables to differentiate the two groups. Variables showing no differences between age groups were excluded from the analysis. As a result, $\gamma_j$ and $\bar{s}_j$ derived from mid shoulder and mid spine were found to be the most important variables; $\bar{\kappa}_j$ also showed significant differences, but $\beta_j$ did not. Therefore, $\beta_j$ was excluded from the analysis (see Fig. 3).
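The correlation-based exclusion of body parts could look roughly like the sketch below. It is only an illustration: the function names are hypothetical, and the greedy rule (keep a body part unless its intercepts correlate above the threshold with a body part that was already kept) is an assumption, since the text does not state which member of a highly correlated group was retained.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(intercepts, threshold=0.95):
    """Greedy selection of body parts whose intercepts are not highly correlated.

    `intercepts` maps a body part name to its per-participant gamma values.
    The iteration order decides which part of a correlated group is kept
    (an assumption of this sketch).
    """
    kept = []
    for name, values in intercepts.items():
        if all(pearson(values, intercepts[k]) <= threshold for k in kept):
            kept.append(name)
    return kept
```

With a toy input in which one body part is an exact linear function of another, only the first of the two is kept.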

G. GLM creation and selection

A GLM can be specified in three steps [8]. First, an assumption is made about the distribution of the outcome variable. In our case, we assumed that the response variable follows a Bernoulli distribution, that is, $p_k$ is the probability that a participant is 61 years or older and $1 - p_k$ is the probability that a participant is younger than 61 years. Second, a linear model as a function of the explanatory variables is specified. Our explanatory variables are speed, curvature, and parameters derived from the power law relation between speed and curvature for mid shoulder, mid spine and right knee. Third, the relationship between the mean value of the outcome variable and the linear model is specified.

Since in our case the outcome variable is binary (0 -young age class, 1 - old age class), we use logistic regression (or logit regression) which was invented by Cox [18], see also [19]. Logistic regression generates the coefficients of a formula to predict a logit transformation of the probability of a binary response based on one or more independent variables. The logit transformation involves the logit function, which transforms a probability value constrained between zero and one into the logarithm of the odds (or log odds),

$$\operatorname{logit}(p_k) = \log\left(\frac{p_k}{1 - p_k}\right), \qquad (2)$$

which can take any real value.
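A short numerical sketch may help to make the logit transformation concrete. The function names are chosen here for illustration; the inverse transformation (the logistic function) is what converts a fitted linear predictor back into a probability of belonging to the older age class.

```python
import math


def logit(p):
    """Log odds of a probability p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Logistic function: maps any real value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))
```

For example, a probability of 0.75 corresponds to odds of 3 to 1, so its logit is log 3 (about 1.10), while a probability of 0.5 maps to a logit of 0.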

In this study, we defined GLMs to predict age category as follows.

• The age category of the $k$-th participant, $\mathrm{ageCat}_k$, follows a Bernoulli distribution with probability $p_k$;
• The logit transform of $p_k$ is linearly regressed on the independent variables mean speed, mean curvature, and power law parameter (intercept) of the measured body parts, or a subset thereof.

Putting all this into a mathematical formula we get:

$$\begin{aligned}
\mathrm{ageCat}_k &\sim \mathrm{Bernoulli}(p_k), \quad k = 1 \ldots 40 \\
\operatorname{logit}(p_k) &= \alpha + \sum_{l=1}^{|B|} \beta_l \cdot B_l, \\
\alpha &\sim \mathcal{N}(0, 10), \quad \beta_l \sim \mathcal{N}(0, 50)
\end{aligned} \qquad (3)$$

where $\mathrm{ageCat}_k$ represents the age category, younger (class 0) or older (class 1), $p_k$ is the probability that participant $k$ belongs to the older category, and $B$ is a subset of the explanatory variables, i.e.,

$$B \subseteq \{\bar{s}(M_k), \bar{s}(S_k), \bar{s}(K_k), \bar{\kappa}(M_k), \bar{\kappa}(S_k), \bar{\kappa}(K_k), \gamma(M_k), \gamma(S_k), \gamma(K_k)\},$$

where mean speed $\bar{s}$, mean curvature $\bar{\kappa}$, and intercept $\gamma$ are the measures derived for the body parts mid shoulder ($M_k$), mid spine ($S_k$), and right knee ($K_k$). The regression coefficients are the intercept $\alpha$ and the coefficients $\beta_l$ for each variable in $B$, where $|B|$ is the number of variables in $B$.

We selected the normal prior distributions $\mathcal{N}$ for $\alpha$ and $\beta_l$ in equation (3) with zero mean and standard deviations of 10 and 50, respectively, based on the highest posterior density interval (HPDI) band, which is conceptually similar to the confidence interval band. We tested several priors using different standard deviation (SD) values; SD values of 10 for $\alpha$ and 50 for $\beta_l$ yielded the best fits (narrower bands are preferred). See [19, p. 67] for further details regarding the HPDI.

Next, we investigated which combination of the variables would best predict age category. Evaluating all possible subsets ($2^9$ for a set of 9 variables) of the full model defined by Eq. (3) can be time consuming. Therefore, some specific models were considered based on the main aims of the study and the importance of the variables and body parts. Variables showing larger differences between older and younger participants were considered to be more important. In this sense, intercept ($\gamma$) and speed ($\bar{s}$) derived from mid shoulder ($M$) are some of the most important variables (see Fig. 3). We assumed that models without mid shoulder variables would perform worse than models that include these variables. We also wanted to investigate whether adding curvature to the models could improve the performance of models that use only speed as explanatory variable. Thus, in addition to the empty model

$$\operatorname{logit}(p_k) = \alpha, \qquad (m_0)$$

used only as reference [20], we defined 12 models falling within 3 classes: models including only speed, models including only intercepts, and models including speed and curvature, as follows.

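• Model class I. Only speed:

$$\begin{aligned}
\operatorname{logit}(p_k) &= \alpha + \beta_1 \cdot s(M_k), && (m_1) \\
\operatorname{logit}(p_k) &= \alpha + \beta_1 \cdot s(M_k) + \beta_2 \cdot s(S_k), && (m_2) \\
\operatorname{logit}(p_k) &= \alpha + \beta_1 \cdot s(M_k) + \beta_2 \cdot s(K_k), && (m_3) \\
\operatorname{logit}(p_k) &= \alpha + \beta_1 \cdot s(M_k) + \beta_2 \cdot s(S_k) + \beta_3 \cdot s(K_k). && (m_4)
\end{aligned}$$

• Model class II. Only intercepts (here, curvature is used to estimate intercept values):

$$\begin{aligned}
\operatorname{logit}(p_k) &= \alpha + \beta_1 \cdot \gamma(M_k), && (m_5) \\
\operatorname{logit}(p_k) &= \alpha + \beta_1 \cdot \gamma(M_k) + \beta_2 \cdot \gamma(S_k), && (m_6) \\
\operatorname{logit}(p_k) &= \alpha + \beta_1 \cdot \gamma(M_k) + \beta_2 \cdot \gamma(K_k), && (m_7) \\
\operatorname{logit}(p_k) &= \alpha + \beta_1 \cdot \gamma(M_k) + \beta_2 \cdot \gamma(S_k) + \beta_3 \cdot \gamma(K_k). && (m_8)
\end{aligned}$$

• Model class III. Curvature and speed:

$$\begin{aligned}
\operatorname{logit}(p_k) &= \alpha + \beta_1 \cdot s(M_k) + \beta_2 \cdot \kappa(M_k), && (m_9) \\
\operatorname{logit}(p_k) &= \alpha + \beta_1 \cdot s(M_k) + \beta_2 \cdot \kappa(M_k) + \beta_3 \cdot s(S_k) + \beta_4 \cdot \kappa(S_k), && (m_{10}) \\
\operatorname{logit}(p_k) &= \alpha + \beta_1 \cdot s(M_k) + \beta_2 \cdot \kappa(M_k) + \beta_3 \cdot s(K_k) + \beta_4 \cdot \kappa(K_k), && (m_{11}) \\
\operatorname{logit}(p_k) &= \alpha + \beta_1 \cdot s(M_k) + \beta_2 \cdot \kappa(M_k) + \beta_3 \cdot s(S_k) + \beta_4 \cdot \kappa(S_k) + \beta_5 \cdot s(K_k) + \beta_6 \cdot \kappa(K_k). && (m_{12})
\end{aligned}$$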

When fitting the GLMs (m0 . . . m12) we encountered the complete separation problem, which happens mostly when the number of samples is small and a linear combination of the predictors perfectly or almost perfectly predicts the outcome [21], [22]. In such a case, the maximum likelihood fitting method provides implausible estimates. To circumvent this problem we used Markov chain Monte Carlo (MCMC) simulations to fit the GLMs [23]. We also used the probabilistic programming language Stan [24] and the rethinking [19] package as interface to fit the GLMs.
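To illustrate why a Bayesian fit remains well behaved under complete separation, the sketch below samples the posterior of a logistic model with normal priors using a plain random-walk Metropolis sampler. This is a deliberately simplified stand-in and not the sampler used in the study, which relied on Stan through the rethinking package. The function names, step size, number of iterations, and warm-up rule are all assumptions made for this example; the prior standard deviations of 10 and 50 are those stated for equation (3).

```python
import math
import random


def log_posterior(theta, X, y, sd_alpha=10.0, sd_beta=50.0):
    """Unnormalized log posterior of a logistic model with normal priors."""
    alpha, betas = theta[0], theta[1:]
    lp = -0.5 * (alpha / sd_alpha) ** 2
    lp -= sum(0.5 * (b / sd_beta) ** 2 for b in betas)
    for row, label in zip(X, y):
        eta = alpha + sum(b * x for b, x in zip(betas, row))
        # Bernoulli log-likelihood is -log(1 + exp(-z)), with z = eta for
        # class 1 and z = -eta for class 0; guard against overflow.
        z = eta if label == 1 else -eta
        lp -= math.log1p(math.exp(-z)) if z > -30 else -z
    return lp


def metropolis(X, y, n_iter=4000, step=0.5, seed=1):
    """Random-walk Metropolis; returns the second half of the chain."""
    rng = random.Random(seed)
    theta = [0.0] * (len(X[0]) + 1)
    current = log_posterior(theta, X, y)
    samples = []
    for _ in range(n_iter):
        proposal = [t + rng.gauss(0.0, step) for t in theta]
        candidate = log_posterior(proposal, X, y)
        # Accept with probability min(1, posterior ratio).
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        samples.append(theta)
    return samples[n_iter // 2:]  # first half discarded as warm-up
```

With a perfectly separable toy data set, maximum likelihood would push the slope towards infinity, whereas the posterior draws stay finite because the prior penalizes very large coefficients. A usage example would be `draws = metropolis([[-2.0], [-1.0], [1.0], [2.0]], [0, 0, 1, 1])`, after which the posterior mean of each coefficient is the average over `draws`.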

To compare the above models (m0 . . . m12), Watanabe-Akaike information criterion (WAIC) [25] scores were estimated. WAIC is a measure of the predictive accuracy of the models on new data [26]. Hence, this criterion provides a way of model selection. These scores were ordered from the lowest to highest (smaller values are preferred). For each model the Akaike weight was also estimated, which is a partition of the total weight of 1 among the models considered, so that the sum of the weights is always 1 (see Table I). These weights provide a more interpretable measure of the relative differences between the models because they are an estimate of the probability that a model will perform better on new data given the models considered. Further details can be found in [19, pp. 199, 207]. Note that for these comparisons the models were fitted using all the participants (k = 1 . . . 40). This is a preparatory step to select the models that will be assessed dynamically using five-fold cross-validation [27], [28] which we consider next.
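The Akaike weights can be reproduced approximately from the WAIC values with the standard formula, in which each weight is proportional to exp(-0.5 · ΔWAIC) and ΔWAIC is the difference from the smallest WAIC. The sketch below applies it to the rounded WAIC values reported in Table I; the function name is chosen here for illustration, and small deviations from the published weights are to be expected because the inputs are rounded.

```python
import math


def akaike_weights(waic):
    """Akaike weights: exp(-0.5 * delta) normalized to sum to 1."""
    best = min(waic.values())
    raw = {m: math.exp(-0.5 * (v - best)) for m, v in waic.items()}
    total = sum(raw.values())
    return {m: r / total for m, r in raw.items()}


# Rounded WAIC values as reported in Table I.
waic = {
    "m5": 2.4, "m7": 2.4, "m11": 2.6, "m6": 2.7, "m12": 2.8,
    "m8": 3.0, "m10": 3.6, "m9": 4.1, "m2": 5.9, "m4": 7.4,
    "m1": 14.2, "m3": 17.2, "m0": 57.5,
}
weights = akaike_weights(waic)
```

The weights of m5 and m7 come out at roughly 0.15 each, in line with the table, while the speed-only models m1 and m3 and the empty model m0 receive weights close to zero.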

H. Dynamic GLM performance

The main purpose of the defined GLMs m1 . . . m12 is to perform dynamical predictions on new data, that is, estimate balance performance continuously during exergaming. We restrict ourselves to a selection of four models to be tested dynamically using five-fold cross-validation, as this technique is computationally expensive. Based on the WAIC scores we selected the three best-ranked GLMs (those with the lowest WAIC values in Table I), that is, m5, m7, and m11. We also selected m3, the worst-ranked of the twelve non-empty models (highest WAIC value), because it is based only on speed values derived from mid shoulder and right knee body parts. This model is used as a benchmark to show that adding curvature and intercepts improves model performance.

Fig. 4. Visualization of the training and testing disjoint subsets used to assess models m3, m5, m7, and m11 (only the first iteration of the five-fold cross validation procedure is illustrated). The order of the participants was randomized. Dots represent speed (s) and curvature (κ) estimations derived from two body parts, mid shoulder (M) and right knee (K), using Kinect recordings. Diagram labels: training data (80%), means and intercepts {$\bar{s}(M)$, $\bar{s}(K)$, $\bar{\kappa}(M)$, $\bar{\kappa}(K)$, $\gamma(M)$, $\gamma(K)$}; testing data (20%), running means over {0.5 s, 1 s, 1.5 s, 2 s} of the body part variables s(M), s(K), κ(M), κ(K); models m3, m5, m7, m11 are trained on the former and tested on the latter.

TABLE I
WAIC model comparison. Symbols represent: M, mid shoulder; S, mid spine; K, right knee; γ, intercept; κ, mean curvature; s, mean speed. Column 'Measure': the variables included in a model; column 'Body part': the included body parts; column 'Weight': the Akaike weight; column 'SE': the standard error of the WAIC estimate.

| Model | Body part | Measure | WAIC | Weight | SE |
|---|---|---|---|---|---|
| m5 | M | γ | 2.4 | 0.15 | 1.08 |
| m7 | M, K | γ | 2.4 | 0.15 | 0.95 |
| m11 | M, K | κ, s | 2.6 | 0.14 | 0.85 |
| m6 | M, S | γ | 2.7 | 0.13 | 1.20 |
| m12 | M, S, K | κ, s | 2.8 | 0.13 | 0.94 |
| m8 | M, S, K | γ | 3.0 | 0.11 | 1.31 |
| m10 | M, S | κ, s | 3.6 | 0.08 | 1.37 |
| m9 | M | κ, s | 4.1 | 0.06 | 1.61 |
| m2 | M, S | s | 5.9 | 0.03 | 2.61 |
| m4 | M, S, K | s | 7.4 | 0.01 | 3.13 |
| m1 | M | s | 14.2 | 0.00 | 7.74 |
| m3 | M, K | s | 17.2 | 0.00 | 9.48 |
| m0 | | | 57.5 | 0.00 | 0.03 |

Fig. 4 illustrates the first iteration of the five-fold cross-validation procedure. As part of the five-fold cross-validation procedure, the order of the participants was randomized and five training-testing disjoint subsets were created using 80% of the participants for training and 20% for testing. For each iteration of the cross-validation procedure, m3, m5, m7 and m11 were fitted again (on means and intercepts) in the training phase. To test the trained models we estimated running means, from the 20% testing data (local curvature and instantaneous speed, see section II-D), using a moving window over the whole length of the trials in the testing set. To investigate the effect of the running window size, each model was tested using 0.5, 1, 1.5, and 2 second running means.
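The participant-level split and the running means can be sketched as follows. The function names are hypothetical, and two details are assumptions of this example rather than statements from the text: the folds are formed by dealing the shuffled participants out in turn, and the running mean uses a trailing window that only produces a value once the window is full. At 30 Hz, a one-second window corresponds to 30 samples.

```python
import random


def five_fold_splits(participants, n_folds=5, seed=0):
    """Shuffle participants and return (training, testing) lists per fold."""
    ids = list(participants)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::n_folds] for i in range(n_folds)]
    return [([p for p in ids if p not in fold], fold) for fold in folds]


def running_mean(values, window):
    """Mean over a trailing window of `window` samples (full windows only)."""
    out = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]  # drop the sample leaving the window
        if i >= window - 1:
            out.append(total / window)
    return out
```

Splitting by participant rather than by trial keeps all ten trials of a person on the same side of the split, so a model is never tested on a participant it was trained on. With 40 participants, each fold holds 8 participants for testing and 32 for training.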

The models were assessed using traditional metrics such as the F-measure, precision and recall [29]. Values of these metrics were estimated using the threshold at the point with the best sum of sensitivity and specificity, closest to the point (0,1) of the ROC curve. Sensitivity (also called “recall”) is the proportion of correctly classified older participants. Specificity is the proportion of correctly classified younger participants. Precision is the number of correctly classified older participants divided by the total number of participants classified as older. The F-measure is the harmonic mean of precision and recall. The threshold values were estimated using the pROC [30] R-package.
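The metrics and the threshold choice can be illustrated with a small sketch. It is not the pROC implementation used in the study: the function names are hypothetical, the candidate thresholds are simply the observed probabilities, and the threshold is chosen by maximizing the sum of sensitivity and specificity only. Class 1 denotes the older group.

```python
def classification_metrics(probs, labels, threshold):
    """Precision, recall (sensitivity), specificity and F-measure at a threshold."""
    pairs = list(zip(probs, labels))
    tp = sum(1 for p, y in pairs if p >= threshold and y == 1)
    fp = sum(1 for p, y in pairs if p >= threshold and y == 0)
    fn = sum(1 for p, y in pairs if p < threshold and y == 1)
    tn = sum(1 for p, y in pairs if p < threshold and y == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": f_measure,
    }


def best_threshold(probs, labels):
    """Observed probability that maximizes sensitivity + specificity."""
    def score(t):
        m = classification_metrics(probs, labels, t)
        return m["recall"] + m["specificity"]
    return max(sorted(set(probs)), key=score)
```

For the toy input `probs = [0.2, 0.6, 0.4, 0.7, 0.9]` with `labels = [0, 0, 1, 1, 1]`, the selected threshold is 0.7, which gives a precision of 1, a recall of 2/3, and an F-measure of 0.8.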

As a last step, we investigated whether more samples for training could improve model performance. For this purpose we redefined m3, m5, m7 and m11, to be trained on running means and running intercepts, as follows:

$$\begin{aligned}
\operatorname{logit}(p_{ik}) &= \alpha + \beta_1 \cdot s_i(M_k) + \beta_2 \cdot s_i(K_k), && (m'_3) \\
\operatorname{logit}(p_{ik}) &= \alpha + \beta_1 \cdot \gamma_i(M_k), && (m'_5) \\
\operatorname{logit}(p_{ik}) &= \alpha + \beta_1 \cdot \gamma_i(M_k) + \beta_2 \cdot \gamma_i(K_k), && (m'_7) \\
\operatorname{logit}(p_{ik}) &= \alpha + \beta_1 \cdot s_i(M_k) + \beta_2 \cdot \kappa_i(M_k) + \beta_3 \cdot s_i(K_k) + \beta_4 \cdot \kappa_i(K_k), && (m'_{11})
\end{aligned}$$

where $s_i$ and $\kappa_i$ are the one-second running means of speed and curvature respectively at time $i$, and $\gamma_i$ is the one-second running intercept. Note that training these variant models (indicated by a prime) was computationally expensive because we used the whole length of the trials ($i$ is the same index as in equation 1). For each training set, these models were trained twice using 10% and 100% of the running means. The window size was selected after looking at the performance of the models m1 . . . m12. These models performed worse using a 0.5-second window size than using larger window sizes, but there were no clear differences from 1 to 2 second window sizes. Hence we used a window size of one second.

Fig. 5. Fits of the models m3, m5, and m7. The variables are intercept γ, speed s, curvature κ, whereas p represents the probability of belonging to the older age group. Dots and triangles represent mean values per participant in log-log scale. The red circles in Model 3 indicate participants that are misclassified or not clearly classified as older or younger participants. In Model 5, the solid line represents the mean point estimates of the posterior distribution. The light-gray shaded area represents the 89% highest posterior density interval (HPDI) band for the means. The HPDI is the narrowest range containing the specified probability mass, similar to the common confidence interval. Panels: Model 3 (s(M) and s(K)), Model 5 (γ(M) and p(Older)), Model 7 (γ(M) and γ(K)). Legend: p(Older); Younger, Older.

III. RESULTS

Overlapping violin plots of mean speed and curvature, intercepts and slopes (of the speed-curvature relationship) per body part provide first insight into their potential to differentiate older and younger groups (Fig. 3). Smaller overlapping area (OVL) values suggest greater differences between groups. This figure shows that intercept and speed measures derived from mid shoulder movements, γ(M) and s(M), differentiate better between older and younger groups than the other measures. Although curvature measures (κ) from the three body parts show around 50% OVL, the difference between older and younger participants is still significant. Slope measures (β) from mid shoulder, mid spine, and knee show the least differences between older and younger participants. Indeed, slopes show both the highest OVL values (> 60%) and non-significant differences between older and younger groups and were therefore excluded from GLM fits.

Next, we first consider GLM model selection based on global means, where the averages are computed over the whole time interval based on models m1-m12 in Section II-G. Subsequently, we investigate dynamic GLM performance, using the variant GLM models m'3, m'5, m'7, m'11, as defined in Section II-H.

A. GLM selection

Table I shows the model comparison results. It can be observed that models including either curvature and speed together or only intercept are among the “best” models. Also notice that most of these models score similar Akaike weights (probability of making best predictions on new data, see Section II-G), suggesting similar model performance. Models including only speed measures score the lowest weights of the fitted models. This implies that including either curvature or intercepts improves model performance on new data.

Fig. 5 shows how models m3, m5, and m7 fit the data.

Model m3 in this figure is based only on speed values derived from mid shoulder and right-knee body parts. It has the worst WAIC score (the highest value) of the twelve non-empty models in Table I. The visualization of the fit of m3 provides insight into the performance of a model that does not include curvature as a predictor variable. It shows a clear, but not perfect, separation between older and younger participants.

Model m5 in Fig. 5 is the “best” model according to WAIC criteria. It is the simplest model as it only uses “one” variable, the intercept γ from the speed-curvature relation, derived from mid shoulder movements. This visualization shows that including curvature in the model improves the separation between older and younger participants.

Model m7 in Fig. 5 shows perfect separation between older and younger participants. In addition to intercept values γ derived from mid shoulder, γ values derived from knee movements are included. As the figure shows, this additional information helps to differentiate older and younger participants even better.

The fit of model m11 cannot be visualized in two dimensions because it involves four explanatory variables (speed and curvature means derived from mid shoulder and knee). However, we can use principal component analysis [31] to visualize a projection of the points in the four-dimensional space onto the first two principal components (PCs). We can also use a bi-plot to gain additional understanding of the contribution of the variables to the first two PCs. Fig. 6 shows the bi-plot and the projection of the four variables of model m11. This figure also shows clear separation between older and younger participants in terms of probabilities, since all the triangles correspond to high probabilities, and all circles correspond to low probabilities. Note that in the projection, one older participant is located among the younger participants. Longer arrows in the bi-plot indicate a stronger contribution of the variables to the PCs than shorter arrows, orthogonal vectors indicate weak correlation, and opposite directions indicate strong negative correlation between variables [32]. Thus, curvature has a stronger contribution to the PCs; curvature and speed are strongly negatively correlated if they are derived from the same body part, but only weakly correlated if they are derived from different body parts.

Fig. 6. Projection of the four variables of model m11 ($\bar{\kappa}$ and $\bar{s}$ estimated from mid shoulder and knee) onto the first two PCs; p represents the probability of belonging to the older age group. The arrows represent the variable vectors with names at their end-points (M-mid shoulder, K-right knee), see text for explanation. The first PC accounts for 78.63% of the variance and the second PC accounts for 16.81%. Axes: first principal component, second principal component. Variable vectors: s(M), s(K), κ(M), κ(K). Legend: p(Older); Younger, Older.

In summary, Table I, and Figs. 5 and 6 show that including curvature in the models improves their predictive performance.

B. Dynamic GLM performance

Fig. 7 shows the performance of the models m3, m5, m7, and m11 and their variant versions (m') tested on the trials using 0.5, 1, 1.5, and 2 second running means (each window size is marked by a different circle symbol in the figure). In contrast to the WAIC comparison (Table I), where the two γ-based models m5 and m7 rank best, this figure shows that these models score the lowest when tested on 0.5-second running means. Moreover, m5, m7 and their variants (m'5 and m'7) are among the models scoring the lowest accuracies, tested on 0.5-second and 1-second running means. Most of the models based only on speed values (m3 and m'3) scored better than models fitted on γ values. Clearly, the top 5 models are based on both speed and curvature values (m11 and m'11). As for the size of the time window, although

Fig. 7. Model performance as assessed by different metrics. The horizontal axis represents the value of the three metrics, precision, recall, and F-measure. The vertical axis represents the fitted models ordered by F-measure. The symbols represent γ-intercept derived from the power law, κ-curvature and s-speed (one bar indicates means and double bar indicates running means), M-mid shoulder, S-mid spine, and K-right knee; filled circle symbols represent the time windows (for example 0.5 seconds and 1 second); and boxes represent the amount of data used to fit the models: 1-means and intercepts, 10-10% of the running means, and 100-100% of the running means. In the plot, circles represent the average values of the metrics estimated by five-fold cross-validation, and horizontal bars represent standard errors. Horizontal axis range: 0.60 to 1.00 (value of metric). Legend: F-measure, Precision, Recall.

F-measure, precision and recall, in Fig. 7, agree on the top-five models. Recall, that is, the proportion of correctly classified older samples, shows that the top-five models correctly classify above 90% of the older samples. These models score between 0.8 and 0.86 on precision. The F-measure is commonly used as a single measure that balances precision and recall. Among these measures, recall could be the best measure for our purposes because a) it shows the biggest gap between the top-five models and the rest of the models (see the distance between m3 and m11), b) it shows the highest accuracy in classifying older participants, which is the target population of exergames in this study, and c) it shows the smallest standard errors. All in all, these results show that including curvature in the models indeed improves their performance.
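As a minimal sketch of how the three metrics discussed above relate to each other, the following Python snippet computes precision, recall, and F-measure with the “older” class treated as the positive class. The function name, the label strings, and the toy data are illustrative assumptions and are not taken from the study; the F-measure is computed here as the harmonic mean of precision and recall, which is one common definition.

```python
def classification_metrics(true_labels, predicted_labels, positive="older"):
    """Return (precision, recall, f_measure) for the positive class.

    Hypothetical helper for illustration; not the code used in the study.
    """
    pairs = list(zip(true_labels, predicted_labels))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall (assumed definition).
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure


# Toy labels (made up): four older and two younger samples.
truth = ["older", "older", "older", "older", "younger", "younger"]
guess = ["older", "older", "older", "younger", "older", "younger"]
precision, recall, f_measure = classification_metrics(truth, guess)
```

In this toy case, recall is the fraction of truly older samples labelled as older, which is the quantity the text singles out as most relevant for the target population.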

Fig. 8 illustrates m11, the top model, with dynamic predictions over time along the 400 trials. Even though the predictions for some participants are not clear cut (see participants 19, 20, and 39), this figure shows a clear separation of older and younger participants over time: on the left, most of the probabilities of belonging to the older age group are low (“light gray”), and on the right, most of the values are high (“dark gray”). The values in this figure could be used to provide appropriate feedback during game-play, as the player could see his/her instantaneous performance as an “older” or “younger” participant. These values could also be used to adjust the difficulty of exergames to meet the skills of the players.

Fig. 8. Model m11 predictions from 400 trials using one-second running means. The horizontal axis represents 10 exergame trials for each participant. The vertical axis represents time. Each vertical line represents a trial. Participants are ordered by age; the first 20 are younger and the rest are older. For clarity, white lines are added to separate between trials per participant. Also, missing Kinect data are indicated by white lines.

In terms of balance performance our results suggest, as could be expected, that the younger participants performed better during the trials than the older participants. This means that the older participants moved more slowly during the trials than the younger participants, as illustrated in Figs. 3 and 5. Faster movements also indicate smoother movements (smaller curvature values) among the younger than among the older participants, because of the speed-curvature relationship [11].

IV. DISCUSSION

Our main goal in this study was to investigate whether balance performance as assessed by speed can be improved by adding curvature in the assessment, in two ways: a) using compound measures derived from the relation between curvature and speed, and b) using curvature and speed together. Our results suggested that including curvature can indeed improve postural performance assessment in real-time.

First, WAIC scores suggested that GLMs including both curvature and speed, and GLMs including intercepts would perform better on new data than GLMs including only speed (see Table I). Second, common classifier performance metrics such as F-measure, precision and recall suggested that GLMs including curvature and speed will probably perform better on new data than GLMs fitted only on speed or only on intercepts (see Fig. 7).

Although obtaining high prediction accuracy was not the main goal of this study, it is a natural way to gain insight into the future usefulness of the models [33], in our case, the usefulness for assessing balance performance in real-time. In addition, the straightforward interpretation of the probability values estimated from logistic regression is useful to provide immediate and meaningful feedback that can be understood by non-experts. These probability values indicate how likely the quality of body movements is similar to that of an older person.

In a previous study, curvature and speed derived from force plate recordings were identified to be suitable measures of balance performance in real-time, because a) they show differences between older and younger participants, and b) they can be estimated instantaneously [10]. The results of the present study provide additional evidence of this suitability, but now using Kinect recordings. This relies on the ability to instantaneously characterize participants as “older” or “younger” in terms of probabilities. Even though the model predictions might be incorrect, the probability estimations could be useful because some older participants could behave as younger ones and vice-versa. In addition, curvature and speed provide a natural and intuitive interpretation of the results. For example, assuming that exergaming players are healthy it is natural to expect that fast and smooth movements reflect good postural performance. On the contrary, slow and non-smooth movements represent worse postural performance. As these features can be assessed by speed and curvature, higher speed and lower curvature values should reflect better postural performance.
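To make the notion of instantaneously estimated speed and curvature concrete, the sketch below estimates both quantities at each interior sample of a 3-D trajectory using central finite differences and the standard formula κ = |v × a| / |v|³. This is an illustrative assumption about one possible estimator, not necessarily the procedure used in this study; the function name, the sampling interval `dt`, and the test trajectory are hypothetical.

```python
import math


def speed_and_curvature(points, dt):
    """Estimate (speed, curvature) at the interior samples of a trajectory.

    points: list of (x, y, z) positions sampled every dt seconds.
    Uses central differences; an illustrative estimator only.
    """
    results = []
    for i in range(1, len(points) - 1):
        prev, cur, nxt = points[i - 1], points[i], points[i + 1]
        # First derivative (velocity) and second derivative (acceleration).
        v = [(n - p) / (2 * dt) for p, n in zip(prev, nxt)]
        a = [(n - 2 * c + p) / dt ** 2 for p, c, n in zip(prev, cur, nxt)]
        cross = (
            v[1] * a[2] - v[2] * a[1],
            v[2] * a[0] - v[0] * a[2],
            v[0] * a[1] - v[1] * a[0],
        )
        speed = math.sqrt(sum(c * c for c in v))
        cross_norm = math.sqrt(sum(c * c for c in cross))
        # Curvature is undefined when the point is not moving.
        curvature = cross_norm / speed ** 3 if speed > 0 else float("nan")
        results.append((speed, curvature))
    return results


# Test trajectory (made up): a circle of radius 2 traversed at 1 rad/s,
# for which the speed is close to 2 and the curvature close to 1/2.
dt = 0.01
circle = [(2 * math.cos(dt * i), 2 * math.sin(dt * i), 0.0) for i in range(50)]
estimates = speed_and_curvature(circle, dt)
```

Because each estimate needs only three consecutive samples, it can be updated as soon as a new position arrives, which is what makes such features usable in real time. With noisy tracking data, smoothing (for example, running means as used in this paper) would be needed before the values are meaningful.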

In another study [13], balance performance was assessed using the same Kinect recordings. The authors used medial lateral movements (values of x and y coordinates, but not z) of nine body parts of the players to identify patterns using Self Organizing Maps. Then, they trained a kNN classifier using the identified patterns to discriminate older and younger participants. The accuracy of the classifier was 65.8%. Here, using two features derived from only two body parts we achieved more than 90% recall accuracy (see Fig. 7). These results suggest that features derived from body movement trajectories provide more information to differentiate older and younger participants than the coordinates of the trajectories.

In this study, all of the models showed a “clear” separation between older and younger participants. This may be because the number of participants is small. Thus, including more participants could be expected to give more overlap between older and younger participants. Still, the separation we found is consistent with the evident physical decline of people after 60 years of age. For example, older and younger participants were clearly differentiated using mean velocity during static tasks in [9]. Evident physical decline was also reported in walking speed and aerobic endurance for people in their 60s and 70s [34]. As exergaming is a physical activity, it may not be a surprise that we here observed similar differences between older and younger participants.

One of the limitations of this study is that the lengths of the trials are short and the number of participants is small; to mitigate this, we applied 5-fold cross validation as an accepted method to gain insight into the performance of classifiers on new data. Further research is needed to investigate the usefulness of the presented method on long intervention studies. Another limitation is that the selection of body parts (mid shoulder, mid spine and right knee) is partly imposed by the limited accuracy of Kinect. More accurate devices may lead to a different selection of body parts for better classification performance.
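The 5-fold cross validation mentioned above can be sketched as follows. The snippet only shows how sample indices might be partitioned into five train/test splits; the function name, the random seed, and the choice to split at the level of the 400 trials are assumptions made for illustration, since the text here does not specify how the folds were formed.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Return a list of k (train_indices, test_indices) pairs.

    Hypothetical helper: shuffles the indices once and deals them
    into k folds of (nearly) equal size.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        # All other folds together form the training set.
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


# Assumed unit of splitting: 400 trials, giving 320 training and
# 80 test trials in each of the five splits.
splits = k_fold_indices(400, k=5)
```

Each model is then fitted five times, once per training set, and the metrics are averaged over the five test sets. If samples from the same participant end up in both the training and the test set, the estimated performance may be optimistic, so grouping folds by participant is a design choice worth considering.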

The method presented here based on GLMs could be used to assess balance performance in different kinds of exergames using different kinds of tracking technology. That is because our method depends on features derived from the trajectory of body movements and not on the tracking device. In addition to curvature and speed, other instantaneously estimated variables/features, such as torsion and their derivatives [35], can be included in the models. Once a GLM is fitted, the estimated parameters could be used to make predictions as shown in Fig. 8.
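As a minimal sketch of how fitted parameters could be turned into predictions, the snippet below evaluates a logistic model with one speed term and one curvature term and returns the probability of belonging to the older group. The function name, the two-predictor form, and all coefficient values are hypothetical; they are not the parameters estimated in this study, and the fitted models may use additional variables or transformations.

```python
import math


def probability_older(speed, curvature, intercept, beta_speed, beta_curvature):
    """Logistic prediction: probability of belonging to the older group.

    All parameters are placeholders for values estimated by a fitted GLM.
    """
    linear = intercept + beta_speed * speed + beta_curvature * curvature
    # Numerically stable evaluation of 1 / (1 + exp(-linear)).
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    e = math.exp(linear)
    return e / (1.0 + e)


# Made-up coefficients: higher speed lowers, and higher curvature raises,
# the probability of being classified as "older".
params = {"intercept": 0.0, "beta_speed": -2.0, "beta_curvature": 1.0}
p_slow = probability_older(0.2, 1.0, **params)
p_fast = probability_older(1.0, 1.0, **params)
```

With these made-up signs, `p_fast` is smaller than `p_slow`, matching the interpretation that faster and smoother movements reflect better postural performance. Evaluating the function on each new running mean would yield a stream of probabilities that could drive feedback during game-play.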

Studies such as [36] and [37] have suggested some desirable features for the development of adaptive exergames. Some of these features are: embracing age-related physical impairments, adapting to individual differences in player range of motion, preventing overexertion by providing appropriate game pacing, and including automatic adjustment of difficulty. In this sense, the methods shown here form a firm step towards the development of adaptive exergames based on measures of balance performance in real time.

V. CONCLUSION

We have presented a promising method to assess dynamic balance performance during exergaming. Curvature derived from the trajectories of body movements can provide additional information to assess performance in real-time. GLMs provide a way to derive a single measure, as a function of curvature and speed, that represents the behaviour of participants as belonging to a younger or older age group. Given reliably captured body movement trajectories, this method could potentially be used to assess the quality of movements in real-time not only in exergaming but in other fields such as sports, rehabilitation, and medicine, offering instantaneous and appropriate feedback that could foster motivation and movement performance. Finally, in future work we plan to study the validity and sensitivity of our method to detect changes in balance performance, using trials recorded during an unsupervised six-week exergaming training at home.

APPENDIX A COMPUTATIONAL COST

Fig. 9 shows the amount of time used to fit models m′3, m′5, m′7, and m′11 on 10% and 100% of the data, respectively. To fit these models we used a computer with two Intel Xeon E5-2630 processors of 2.3 GHz, each processor with 12 cores, and 64 GB of main memory. For each model we ran 24 simultaneous simulations, as there are 24 cores available, with 1000 warm-up samples and 3000 iterations, resulting in 24 × (3000 − 1000) = 48000 real samples in total. These models were fitted using only 1-second running means as these fits were computationally intensive. The figure clearly shows that the time increases exponentially as the number of variables and the number of samples increase; see for example model m′11, which was fitted on 4 variables with about 300k samples. This figure, together with the results shown in Fig. 7, suggests that models fitted on means and intercepts could be good enough to make reliable predictions on new data. In addition, the most “complex” model m11, tested using 5-fold cross validation based on means and intercepts, required only about 2.5 minutes for fitting five different datasets.

[Fig. 9 plot: computing time in hours (axis from 0 to 200) per model, in two panels (10% and 100% of the data); legend: number of variables (1, 2, 4). Bar labels in hours, in the order m′5, m′7, m′3, m′11: 1.5, 1.6, 2.8, 12.7 for 10%, and 5.7, 8.6, 20.1, 180.8 for 100%.]

Fig. 9. Amount of time used to fit variant (m′) models on 10% and 100% of the data, respectively. Note that as we used 5-fold cross validation, the bars represent the time to fit a particular model 5 times on different datasets.

ACKNOWLEDGMENT

The exergaming project has been performed on behalf of research center SPRINT of the UMCG and was supported by INCAS3 and 8D-Games. Data used in the present paper were collected by Mike van Diest in the realm of this project.

REFERENCES

[1] M. van Diest, C. J. C. Lamoth, J. Stegenga, G. J. Verkerke, and K. Postema, “Exergaming for balance training of elderly: state of the art and future developments,” Journal of Neuroengineering and Rehabilitation, vol. 10, no. 1, p. 101, 2013.

[2] CDC, “WISQARS (Web-based Injury Statistics Query and Reporting System)—Injury Center—CDC.” [Online]. Available: https://www.cdc.gov/injury/wisqars/

[3] L. D. Gillespie, M. C. Robertson, W. J. Gillespie, C. Sherrington, S. Gates, L. M. Clemson, and S. E. Lamb, “Interventions for preventing falls in older people living in the community.” The Cochrane database of systematic reviews, vol. 9, p. CD007146, jan 2012.

[4] C. Burgers, A. Eden, M. D. Van Engelenburg, and S. Buningh, “How feedback boosts motivation and play in a brain-training game,” Computers in Human Behavior, vol. 48, pp. 94–103, 2015.

[5] M. van Diest, J. Stegenga, H. J. Wörtche, K. Postema, G. J. Verkerke, and C. J. Lamoth, “Suitability of Kinect for measuring whole body movement patterns during exergaming,” Journal of Biomechanics, vol. 47, no. 12, pp. 2925–2932, sep 2014.

[6] C. J. Lamoth and M. J. van Heuvelen, “Sports activities are reflected in the local stability and regularity of body sway: Older ice-skaters have better postural control than inactive elderly,” Gait & Posture, vol. 35, no. 3, pp. 489–493, 2012.

[7] D. Abrahamová and F. Hlavacka, “Age-related changes of human balance during quiet stance.” Physiological research / Academia Scientiarum Bohemoslovaca, vol. 57, no. 6, pp. 957–964, 2008.


[8] A. F. Zuur, E. N. Ieno, N. J. Walker, A. A. Saveliev, and G. M. Smith, Mixed Effects Models and Extensions in Ecology with R. New York: Springer Science + Business Media, 2009.

[9] T. E. Prieto, J. B. Myklebust, R. G. Hoffmann, E. G. Lovett, and B. M. Myklebust, “Measures of postural steadiness differences between healthy young and elderly adults,” Transactions of Biomedical Engineering, vol. 43, no. 9, pp. 965–966, 1996.

[10] V. Soancatl Aguilar, J. J. van de Gronde, C. J. C. Lamoth, M. van Diest, N. M. Maurits, and J. B. T. M. Roerdink, “Visual Data Exploration for Balance Quantification in Real-Time During Exergaming,” PLOS ONE, vol. 12, no. 1, p. e0170906, jan 2017.

[11] P. L. Gribble and D. J. Ostry, “Origins of the power law relation between movement velocity and curvature: modeling the effects of muscle mechanics and limb dynamics.” Journal of neurophysiology, vol. 76, no. 5, pp. 2853–2860, 1996.

[12] “SPRINT Technologie die Ouderen Beweegt.” [Online]. Available: http://www.imdi-sprint.nl/

[13] M. van Diest, J. Stegenga, H. J. Wörtche, J. B. T. M. Roerdink, G. J. Verkerke, and C. J. C. Lamoth, “Quantifying Postural Control during Exergaming Using Multivariate Whole-Body Movement Data: A Self-Organizing Maps Approach,” PLOS ONE, vol. 10, no. 7, p. e0134350, 2015.

[14] J. Webb and J. Ashley, Beginning Kinect Programming with the Microsoft Kinect SDK. Apress, 2012.

[15] MATLAB, 8.6.0.267246 (R2015b), The Mathworks, Inc., Natick, Massachusetts, 2015.

[16] H. B. Mann and D. R. Whitney, “On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other,” The Annals of Mathematical Statistics, vol. 18, no. 1, pp. 50–60, 1947.

[17] M. Ridout and M. Linkie, “Estimating overlap of daily activity patterns from camera trap data,” Journal of Agricultural, Biological, and Environmental Statistics, vol. 14, no. 50, pp. 322–337, 2009.

[18] D. R. Cox, “The regression analysis of binary sequences (with discussion),” J Roy Stat Soc B., vol. 20, pp. 215–242, 1958.

[19] R. McElreath, Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Chapman and Hall/CRC, 2015.

[20] T. A. Snijders and R. J. Bosker, Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling, second edition. Sage Publications, 1999.

[21] A. Albert and J. A. Anderson, “On the Existence of Maximum Likelihood Estimates in Logistic Regression Models,” Biometrika, vol. 71, no. 1, pp. 1–10, 1984.

[22] A. Gelman, A. Jakulin, M. G. Pittau, and Y. S. Su, “A weakly informative default prior distribution for logistic and other regression models,” Annals of Applied Statistics, vol. 2, no. 4, pp. 1360–1383, 2008.

[23] C. Rainey, “Dealing with Separation in Logistic Regression,” Political Analysis, vol. 24, no. 3, pp. 339–355, 2016.

[24] A. Gelman, D. Lee, and J. Guo, “Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization,” Journal of Educational and Behavioral Statistics, vol. 40, no. 5, pp. 530–543, 2015.

[25] S. Watanabe, “Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory,” Journal of Machine Learning Research, vol. 11, pp. 3571–3594, 2010.

[26] A. Gelman, J. Hwang, and A. Vehtari, “Understanding predictive information criteria for Bayesian models,” Statistics and Computing, vol. 24, no. 6, pp. 997–1016, 2014.

[27] R. Kohavi, “A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection,” in International Joint Conference on Artificial Intelligence, 1995.

[28] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 2nd ed. Springer, 2009.

[29] T. Fawcett, “An introduction to ROC analysis,” Pattern Recognition Letters, vol. 27, pp. 861–874, 2006.

[30] X. Robin, N. Turck, A. Hainard, N. Tiberti, F. Lisacek, J.-C. Sanchez, and M. Müller, “pROC: an open-source package for R and S+ to analyze and compare ROC curves,” BMC Bioinformatics, vol. 12, no. 1, pp. 1–8, 2011.

[31] I. T. Jolliffe, “Principal Component Analysis, Second Edition,” Springer Series in Statistics, vol. 98, p. 487, 2002.

[32] K. R. Gabriel, “The biplot-graphical display of matrices with applications to principal components analysis,” Biometrika, vol. 58, pp. 453–467, 1971.

[33] J. Piironen and A. Vehtari, “Comparison of Bayesian predictive methods for model selection,” Statistics and Computing, vol. 27, no. 3, pp. 1–25, 2016.

[34] K. S. Hall, H. J. Cohen, C. F. Pieper, G. G. Fillenbaum, W. E. Kraus, K. M. Huffman, M. A. Cornish, A. Shiloh, C. Flynn, R. Sloane, L. K. Newby, and M. C. Morey, “Physical Performance Across the Adult Life Span: Correlates With Age and Physical Activity,” The Journals of Gerontology Series A: Biological Sciences and Medical Sciences, vol. 00, no. 00, pp. 1–7, 2016.

[35] S. Wu and Y. Li, “Flexible signature descriptions for adaptive motion trajectory representation, perception and recognition,” Pattern Recognition, vol. 42, no. 1, pp. 194–214, jan 2009.

[36] K. Gerling, I. Livingston, L. Nacke, and R. Mandryk, “Full-body motion-based game interaction for older adults,” in Proceedings of the 2012 ACM annual conference on Human Factors in Computing Systems - CHI ’12, 2012, pp. 1873–1882.

[37] A. Velazquez, A. I. Martínez-García, J. Favela, and S. F. Ochoa, “Adaptive exergames to support active aging: An action research study,” Pervasive and Mobile Computing, 2016.

Venustiano Soancatl Aguilar received the B.Sc. degree in computer science from the Benemérita Universidad Autónoma de Puebla, México in 2001, and the MSc degree in computer science from the Instituto Nacional de Astrofísica, Óptica y Electrónica, Tonantzintla, Puebla, México in 2003. From 2006 to 2014, he was an Associate Professor in computer science at the Universidad del Istmo, Oaxaca, México. He is pursuing a PhD degree in computer science at the University of Groningen, The Netherlands. His research focuses on quantification of human movement in real-time for gaming technology and movement disorders. His interests include human-computer interaction, computer vision and machine learning.

Jasper J. van de Gronde studied computing science at the University of Groningen, the Netherlands, where he obtained his M.Sc. in 2011 and his Ph.D. in 2015 on a theoretical framework for mathematical morphology on colour and tensor-valued images. After a postdoctoral period at the same university he joined the ASML company in Veldhoven, The Netherlands, as Metrology Software Design Engineer in early 2017. His research interests include mathematical morphology, compressed sensing, and signal/image processing in general.

Claudine J.C. Lamoth is an Associate Professor and principal investigator at the Centre for Human Movement Sciences of the University Medical Centre Groningen (UMCG), University of Groningen. Lamoth’s research focus is in the area of motor control, motor learning, ageing and movement disorders, using concepts and tools from dynamical systems theory. She combines fundamental knowledge of motor control with sensor and game technologies and innovative data analysis in order to develop solutions for supporting independent mobility, preventing falls, and improving health.

Natasha M. Maurits received the MEng degree (1994) in applied mathematics, the MSc degree (1994) in numerical mathematics, and the PhD degree (1998) in chemistry, from the University of Groningen, The Netherlands. She is currently a full professor of Clinical Neuroengineering at the Department of Neurology at the University Medical Center Groningen, the Netherlands. She translates clinical neurological problems to physical-mathematical problems, tries to find a solution using state-of-the-art mathematical techniques and translates this solution back to neurology for application. Her interests include movement disorders, trauma, multimodal recordings, signal analysis and data visualization. She is a senior member of IEEE.

Jos B.T.M. Roerdink received a Ph.D. (1983) in theoretical physics from the University of Utrecht, the Netherlands. After a two-year position (1983-1985) as a Postdoctoral Fellow at the University of California, San Diego, he joined the Centre for Mathematics and Computer Science in Amsterdam, working on image processing and tomographic reconstruction. He was appointed associate professor (1992) and full professor (2003), respectively, at the Johann Bernoulli Institute for Mathematics and Computer Science of the University of Groningen, where he currently holds a chair in Scientific Visualization and Computer Graphics. His research interests include mathematical morphology, biomedical visualization, neuroimaging and bio/neuroinformatics.