
Bachelor Informatica

Towards a production-ready framework for PCIT-VR

Mick Vermeulen

June 22, 2020

Supervisor(s): Robert Belleman, Iza Scherpbier

Informatica, Universiteit van Amsterdam


Abstract

Parent-Child Interaction Therapy (PCIT) is used to teach parents skills that reinforce appropriate behaviour of their children with behavioural problems like Oppositional Defiant Disorder (ODD). In recent work, a 360 degree video virtual reality Minimum Viable Product (MVP) was developed for parents to practice these skills at home using their mobile phones. This application was accordingly called PCIT-VR and played VR videos of therapy sessions with interactive elements at key points in the video to test the knowledge of the parents. To aid therapists in creating these videos, a proof of concept editing web application was created, which included limited functionality to edit videos and insert interaction moments in the video.

This thesis aims to build upon the viewing and editing applications to work towards a production-ready framework. This includes updating the therapy data structure to support non-linear storyline behaviour based on the performance of a parent, adding analytics systems to track the performance of parents, and rebuilding the playing environment from the ground up to support the newly designed data structure.


Contents

1 Introduction
  1.1 PCIT
  1.2 PCIT-VR
  1.3 Research Goal
  1.4 Ethics

2 Previous work
  2.1 Viewing application
  2.2 Editing application
  2.3 Non-linear PCIT timelines

3 Design
  3.1 Editing application
  3.2 Viewing application
    3.2.1 Ease of access
    3.2.2 User interaction
  3.3 Authentication and authorization
    3.3.1 Session Tokens
    3.3.2 Bearer authentication
    3.3.3 OAuth 2.0
    3.3.4 JSON Web Tokens
  3.4 Analytics
  3.5 Conclusion

4 Implementation
  4.1 Interaction moments
    4.1.1 Database implementation
    4.1.2 API implementation
  4.2 Exporting projects
    4.2.1 Format
    4.2.2 API
  4.3 Viewing application
    4.3.1 Technical requirements
    4.3.2 Setup and technologies
    4.3.3 Implementation
  4.4 Analytics

5 Results
  5.1 Editor

6 Conclusion and future work
  6.1 Conclusion
  6.2 Limitations and future work
    6.2.1 General improvements
    6.2.2 Regarding Rust

Appendices

A Export JSON schema

B Export JSON example

C Nuxt.js A-frame plugin


CHAPTER 1

Introduction

1.1 PCIT

Parent-Child Interaction Therapy (PCIT) is a form of therapy that is applied to parents who have children with behavioural disorders such as Oppositional Defiant Disorder (ODD) and Conduct Disorder (CD) to reduce counter-productive parent-child dynamics [1]. Improperly handling these behavioural problems can cause symptoms like aggressive and oppositional behaviour to worsen due to a vicious cycle of increasing stress for the parent and consequently more extreme behaviour of the child.

Therapists use PCIT to train parents to handle this disruptive behaviour by using a therapy split into two parts. In the first part, the child directed phase, parents are taught to reinforce appropriate behaviour and ignore undesired behaviour. Once a parent masters this phase, a second phase is introduced: the parent directed phase. In this phase the parent learns to properly set boundaries for the child and limit unwanted behaviour in such a way that further disruptive behaviour is reduced. The therapist teaches skills that are proper responses to the child, like praising, reflecting or imitating. The therapist uses an earpiece to coach the parents during the therapy sessions.

1.2 PCIT-VR

In-person therapy requires the therapist, the parents and the child to be in the same location. There are many scenarios for the parents to study, requiring many sessions to complete. To reinforce the learning experience between sessions, a virtual reality application was created that allowed parents to practice these skills at home using their mobile phones [28]. The parents were shown 360 degree virtual reality videos of therapy sessions. At key moments, the video was paused and the parent was presented a set of responses to the situation, allowing the parent to test their knowledge. This proof of concept application was used to show the benefits of using virtual reality PCIT practice for therapy. It was not production ready, though: all the key interaction moments were hard coded into the application and were not easily editable by the therapists.

To aid therapists in creating interactive videos, an editing web application was built [11]. This framework allowed therapists to load video clips, edit them and insert key interaction moments. A basic data structure was set up to register the interaction moments and export them as part of the finished practice video. When a practice session was finished, the proper codec and bitrates were automatically selected so that the therapist would not have to worry about these technical details required by the playing environment.


1.3 Research Goal

In the proof of concept editing application, the key interaction moments were designed as a linear storyline; to reinforce the learning process, therapists prefer a tree-like structure where answers to the key interaction moments define a non-linear storyline. This allows the therapists to repeat certain portions of the practice session that a parent might struggle with to speed up the learning progress. Secondly, there was no way to track the performance and progress of parents that practice using the PCIT-VR sessions. Finally, therapists were not able to work on multiple projects at the same time as there were no user scopes and there was no user authentication.

The research project of this thesis extends the editing application to allow therapists to specify dynamic key interaction moments with follow-up videos, making the exported project not a single video but a series of videos where the traversal through these videos is determined by the answers of the parent. In addition, the editing application API is extended to allow the viewing application to pass analytics events to the server, giving therapists the opportunity to track the progress of their clients while they practice at home. Finally, research into an authentication and authorization server is included so that user scope and user authentication can be implemented in the PCIT-VR framework in the future.

The viewing application is developed from a Minimum Viable Product (MVP) into a pro-duction ready application. It integrates with the new export format specified by the editing application API. Altogether, this allows therapists to share their therapy projects with their clients, allowing in-practice PCIT therapy tests with parents.

The work done in this thesis thus works towards a production-ready PCIT-VR framework, both on the editing and viewing side. This framework is used to advance the PhD research of Iza Scherpbier, one of this thesis' supervisors, on applying VR to PCIT in practice.

1.4 Ethics

As therapy data is considered to be personal and sensitive, it must be handled with care when working on this research project. Progress of parents' practice sessions is stored in the database as analytics events, making it beneficial to make this data as anonymous as possible to ensure that, if a leak were to occur, the data cannot be connected to individual clients or sessions. However, as therapists need to review the therapy progress of parents, this can be difficult to achieve. Also, if a leak were to occur in the authentication or authorization portion of PCIT-VR, therapists risk large fines under the General Data Protection Regulation (GDPR) of the European Union [31]. This implies that not only the safety of the parents is at stake but the safety of the therapists using the tool as well.

Positive effects of PCIT-VR will reveal themselves if PCIT-VR turns out to be a successful tool for practicing PCIT at home in a practical trial: this can speed up overall therapy time, allowing more parents and children to go through the therapy in the long run. The PCIT-VR framework is independent of this specific type of therapy; in essence, it is a virtual reality training designer and player which could be applied to other therapy, rehabilitation or hands-on training programs. Early research into using VR for neuropsychology treatments has shown that VR comes with many interesting assets for therapy. For example: "gaming" features of VR could be used to boost the therapy motivation of clients, safe and secure environments can be recreated to reduce real world risks, and performance can be measured more effectively than through traditional therapy [26].

Furthermore, in the healthcare industry, research has shown that VR training can reduce operating errors of doctors-in-training by up to six times [29]. VR has also been used to train stroke victims to navigate public transport [16]. Stroke victims who used VR showed improvements in their abilities to navigate the metro and find facilities in the metro stations. Another adoption of VR training was to teach children between ages 7-11 to cross roads safely. Using a computer simulated VR traffic environment, children were trained to cross roads faster and pick safer moments to cross the road [30].

These applications highlight the potential of VR in different types of situations: dangerous situations can be practised in a safe environment and hard-to-recreate situations which require doctors or therapists can be recreated and replayed. This lowers the costs and time related to training and can improve the skills of the subject that is trained.


CHAPTER 2

Previous work

2.1 Viewing application

The PCIT-VR viewing application used to display the interactive VR videos was built using the A-frame open source WebVR framework [8]. This allowed low-cost playback of the interactive experiences as parents could use their mobile phones as Head Mounted Displays, instead of needing expensive VR devices like the Oculus Rift or HTC Vive [28]. To train parents, key interaction moments were hard coded into the application: at certain points in time the video was paused, questions related to PCIT were displayed and the parents were given time to choose a response. This proof of concept application was well-received by the therapists that were included in this study, highlighting the potential for further PCIT-VR research. Five industry experts gave the viewing application an average System Usability Scale (SUS) score of 88.5, a very high score. These early results support the hypothesis that VR is a suitable solution to train parents for PCIT. However, this application was not production ready as there was no way for therapists to easily alter the interaction moments and thus only three static linear training scenarios were implemented.

2.2 Editing application

An editing application was built to aid the therapists in creating videos for their clients to use with the PCIT-VR viewing application. This editing application automatically identifies and applies the proper codec for the VR videos and allows inserting key interaction moments [11]. The editor was built using the client-server model with a Rust [10] back-end server and a TypeScript [19] front-end client. The client communicated with the server through a RESTful API exposed by the server. This API was built upon Rocket, a Rust web framework [5], in combination with Diesel, a Rust Object-Relational Mapping (ORM) framework [13] used for handling PostgreSQL database mutations. Rust and TypeScript were selected as programming languages due to their type safety and concurrency features. Using the WebSocket API [18], the client and server are synchronised when importing files or exporting projects.

Video processing was handled by FFmpeg [4], an open source video editing tool, invoked through Rust system calls as background processes. A schematic overview of the back-end can be seen in figure 2.1. The final application visually resembles the video editing software iMovie by Apple, a design chosen for the generally good user experience that iMovie offers, as shown in figure 2.2.

Figure 2.1: PCIT-VR server structure [11].

Figure 2.2: PCIT-VR client [11].

Despite allowing therapists to insert interaction moments, these were statically placed along a single edited video, producing a linear timeline. An ideal implementation would respond to the parent's answers by branching to the relevant subsection of the therapy, creating a non-linear timeline.

The application did not expose an API for analytics in the viewing application, with the result that the therapists were unable to conduct the therapy remotely. Finally, no user authorization and user scoping was implemented, which led to a single editing scope: multiple therapists were not able to edit videos at the same time. To conclude, these shortcomings are summarized as follows and are the main subject of this thesis:

1. Building the MVP viewing application into an application that supports non-linear timelines.

2. Building support into the editor front-end and back-end for creating and exporting non-linear timelines.


3. Researching and implementing an analytics API for the server.


2.3 Non-linear PCIT timelines

An example of a non-linear timeline that needs to be supported is shown in figure 2.3. This scenario defines a basic PCIT parenting skill called reflection [1], the learning objective of this session. When the child tells the parent "I am going to put a blue block on top of the tower." in the video, the correct parental reaction is option A: "You are going to put the blue block on the tower.", shown on the left side of the diagram. When the parent chooses this option, the objective is completed and the session ends, denoted by the green square in figure 2.3.

If the parent chooses option B: "I'm going to put a blue block on my tower as well", they branch into the right side of figure 2.3, as this reaction is not a reflection and thus the incorrect answer. The right side of the figure is used to train the parent to use the reflection technique: first the parent is shown a tool tip: "You could have also used a reflection here.". Next, the parent is shown another scenario where reflection should be used, and the question about the correct response is asked again. If the parent yet again chooses option B, the incorrect answer, another tool tip is shown: "A reflection is when you repeat what your child said, without changing the context. For example: 'You are going to put the blue block on the tower'". This tool tip is used to reinforce the learning objective for the parent. At this point the parent is in a loop: choosing option A will branch the parent into the option A branch at the beginning, ending the session. Choosing option B will branch the parent back to the start of the last scenario, showing the tool tip again and asking the same question until the parent chooses the right response: option A.


Figure 2.3: Example of a non-linear PCIT timeline, source: Iza Scherpbier.

Figure 2.3 is a basic example of what the PCIT-VR framework should support. It is the gold standard used to prove that these non-linear scenarios can be implemented.


CHAPTER 3

Design

In this chapter the design considerations for a production-ready framework for PCIT-VR are laid out. It is important to make a distinction between the theoretical and practical parts of this thesis: in this chapter, research is conducted into the design considerations for a production-ready PCIT-VR framework; however, not all the sections in this chapter are implemented in the corresponding research project. Therefore, whenever a component of this chapter is implemented in the research project, a reference to the implementation section is provided. Otherwise, the section can be used as a reference for future implementation.

3.1 Editing application

As previously mentioned, the therapist wants to be able to create a non-linear timeline so that the viewing application branches based on the parent's responses to the questions. The initial editing application featured a timeline that supported a single sequence of videos, as seen in figure 2.2. This timeline has to be extended to visualise the non-linear storyline. As the editing application already has a clear design goal, namely to resemble iMovie visually, we can build upon these design principles.

User friendliness is an important aspect of this application as the technical process of creating therapy videos should be simple. This allows the therapist to focus their attention on the content of the video: the scenarios they are building to support their clients' progress. Designing a user-friendly way of presenting the branch behaviour in the editing application is critical to reaching this goal. Inspiration can be taken from the video editor of WarpVR: WarpVR is a company that produces VR training software, allowing a user to upload videos and link them together using an annotation system [33]. Their annotation system is a direct and user-friendly approach: the user can drag lines from answers to other segments in the timeline, clearly showing the defined path to the user. An overview of this editor is shown in figure 3.1. The change that would have to be made to the WarpVR design is that the PCIT-VR editor is also used as a video editor: clips can be joined and cut, and this behaviour has to be maintained for each segment. Thus, the final PCIT-VR design should be based on the WarpVR design, but each block in the WarpVR editor should be treated as a partial timeline where the therapist can edit their clips. The front-end of this new non-linear storyline editor is not implemented as part of this research project, but the back-end implementation is described in section 4.1.1.


Figure 3.1: WarpVR editing environment, source: https://www.youtube.com/watch?v=2CnKKbeStZA.

3.2 Viewing application

As noted by the results of the MVP viewing application, the viewing application needs to be easily accessible and user-friendly [28]. As we cannot expect all clients to have expensive VR equipment at home, the therapists suggested using the parents' mobile phones as head mounted displays. It is important to compare mobile phones to more sophisticated VR devices to determine if they are a suitable alternative. Sophisticated VR devices might provide a better feeling of presence, the feeling that a user is actually inside the VR experience, due to the extra tracking data and the availability of a dedicated controller. Research has shown that the main contributors to high user presence in VR experiences are tracking level, stereoscopy and field of view [6].

When using a mobile phone, there are only 3 degrees of freedom: a user can rotate their head, but cannot move around in the environment, meaning that the mobile phone has limited capabilities with regards to the tracking level when compared to dedicated VR equipment. However, for PCIT-VR more than 3 degrees of freedom are not preferred as the PCIT-VR framework uses therapy videos and not fully rendered 3D VR environments. Giving the user the opportunity to walk around in a 360 degree video might even worsen the user immersion as the video perspective can be distorted.

In contrast to the tracking level, both stereoscopy and field of view can be achieved with modern mobile phones and a head-mount. Modern mobile phones are generally wide, providing a large field of view when using a head-mount to view the mobile display in VR. Also, when using a head-mount like Google Cardboard, stereoscopy can be enabled. In summary, mobile phones provide 2 of the 3 requirements for an immersive VR environment, making them a suitable and cost-effective alternative to expensive VR equipment for home practice.

The viewing application can then be implemented as a native iOS and Android application or as a web application. Due to the following reasons, a web application is preferred for PCIT-VR:

1. Only one source has to be maintained, instead of an Android and iOS native source.

2. The source code of the PCIT-VR viewing concept can be re-used.

3. App deployment, especially on iOS, is a lengthy process that generally requires an app review by the vendor.

4. A web browser is already available on all mobile phones and prevents the parents from having to download an app. They only have to visit a PCIT-VR URL.


3.2.1 Ease of access

A clear design direction provided by the therapists is that the viewing application needs to be easily accessible for the parents. When the therapist shares a link to view a practice session, the parents should not have to create an account and log in before being able to start. A secure way of implementing this feature request is by generating a long and randomized URL for an exported practice session. This ensures that accessing a practice session once one obtains a URL is as simple as opening the URL in a web browser, while brute force methods to acquire a URL remain useless as long as the number of possible URLs is large and the generation scheme is cryptographically secure. The parents are instructed to keep this link secure, to prevent unintentional leaking. The server implementation of this URL scheme is described in section 4.2.2, the client implementation in section 4.3.3.

The links can be further protected from unintended access by using invite links that grant permission to view the practice session to the visitor of the sharing link. This can be achieved by adding a validation time-out to the invite link, limiting how long a practice session can be shared. When the invite link is invalidated, nobody without access rights can visit a dangling session link without requesting another invite link. Although this link invalidation scheme is more secure than the static randomized scheme, it is currently not implemented in the PCIT-VR framework.

3.2.2 User interaction

When designing the interactive VR environment for smartphones, we have to keep in mind that there is no mouse pointer that can be used to click on buttons shown in the environment. Clicks in VR environments are generally achieved by using a motion controller shipped with the head mounted display, allowing the user to point at objects in the VR environment and use buttons on the controller to trigger a click. However, the PCIT-VR environment must be usable without VR gear because a modern mobile device is the only requirement. A common way of implementing a pointer in a mobile-only VR environment is to place a visual pointer, called a "reticle", at the centre of the screen which moves with the head movements of the parent. This reticle is then pointed at an object by looking at the object, triggering a click.

However, a naive implementation of this reticle suffers from the "Midas touch" problem: computer users are not familiar with an interface whereby simply looking at an object triggers an action by the computer [15]. Most people are used to scanning a screen, choosing what action they want to perform based on what's visible on the screen and then selecting an action to activate. If we were to implement this naive "what you look at is what you get" approach of navigating through the VR environment, users would not get the opportunity to read the text contained in a button, because merely looking at it triggers a click. As such, "eye gaze selection" is preferred: a user triggers a click by looking at an object for a certain amount of time. Usually, feedback is provided about how long one has to gaze at an object by enlarging the cursor while looking at the object or by making the object glow until the click is registered. The eye gaze selection is implemented and described in further detail in section 4.3.3.

3.3 Authentication and authorization

The PCIT-VR project contains many elements that should be protected from unauthorized access; this is not only an ethical obligation but also required by law due to the GDPR [32]. For example, the project editor should only be accessible by therapists and the practice sessions should only be accessible by therapists and parents. The most sensitive data is the usage analytics of the parents, which records the parents' progress in detail. As mentioned before, the therapists of PCIT prefer a solution that does not require the parents to log in so that parents can focus on the practice sessions and not on technical details like maintaining passwords. Thus, an authentication and authorization solution must be selected that can be used both anonymously (by the parents) and through a login (by the therapists). Although authentication and authorization are not implemented in the PCIT-VR framework produced in the research project, we explain the possible authentication and authorization implementations for PCIT-VR in the following sections.

3.3.1 Session Tokens

Session tokens are a stateful solution to authentication: once a user logs into the application server with valid credentials, a session is created and a session token is shared with the client by the server. This session is stored in the database of the server, which tracks all the logged in users. Whenever the client requests a resource from the server, the client sends the session token as a cookie to the server; the server then checks if this is a valid session and makes sure that this session's user has the proper access rights to the requested resource. When both the session and the user are validated, the server fulfills the request.

The benefit of using session tokens is that the client state is always synchronized with that of the server: if a client were to log-out, the server is notified and the session on the server is removed from the database. A downside of using session tokens is the extra complexity that it brings: the server must keep track of all the available access rights and must validate the sessions against these access rights for each request.

Another downside is that session tokens stored in browser cookies are vulnerable to Cross Site Request Forgery (CSRF) attacks. One vector by which CSRF works is by calling a certain URL of a trusted website when a user visits a malicious website, usually by setting a URL of the trusted website as a source attribute of an image on the malicious website. As the HTTP protocol will send all the cookies it has of the trusted website, along with the session token, to the server where the requested resource is located, this attack can be used to make users unknowingly perform actions on trusted websites where they have active sessions.
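For illustration, a malicious page could embed an invisible image whose source points at a state-changing URL of the trusted site (the URL below is made up):

<img src="https://trusted.example/api/projects/3/delete" style="display: none">

When the page loads, the browser requests this URL and automatically attaches the cookies it holds for the trusted site, including the session token, making the forged request look legitimate to the server.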

A common way of protecting against CSRF attacks is by checking the Origin HTTP header or Referer HTTP header, which specifies where the request originates from [3]. Using this method, the server knows if the request came from a malicious website as the Origin header will not match the domain of the server. However, this makes the server dependent on the Origin header implementation of the user's browser.

Another way of preventing CSRF attacks is using CSRF tokens: these tokens are inserted into the body of HTML forms on the trusted website as an invisible input field. Whenever a form is submitted, this token is checked on the server, ensuring that the request is valid as the server token and client token match. However, this method only works with stateful web servers that generate the HTML for each request as the token is then embedded in the HTML, which is not the case for the PCIT-VR server.

To summarise, implementing session tokens introduces extra logic in the back-end as sessions and user permissions have to be stored in the database. Also, session tokens come with security vulnerabilities like CSRF attacks that require more security implementations to protect against, resulting in extra complexity.

3.3.2 Bearer authentication

Bearer tokens are tokens generated by a server and sent to a client once, for example on login. The server internally tracks who this token belongs to and what the access rights of that token are. The token must be kept safe by the user and is generally stored in the client's browser. The client sends the token with each HTTP request to the server, allowing the server to decide whether the action is valid or not. A bearer token is generally sent to the server using the following HTTP header:
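Authorization: Bearer <token>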


A benefit of this token system is that it does not require a login: when an authorization action occurs the server can be programmed to provide the requesting client with a token. This can be when a therapist signs on or when a client clicks on a temporarily valid sharing link, as described before. Additionally, a bearer token does not suffer from the CSRF attack vulnerability of the session tokens described in section 3.3.1. Because the token is sent as a header, it is attached explicitly by the client and not automatically sent with every HTTP request. A malicious actor cannot access the bearer token and cannot forge a request to a trusted website. A downside of this approach is that tokens also have to be stored in the server database, along with the permissions that the token bears. Generally, this does not impose a performance penalty but does require a more complex back-end to support the custom permission system.

3.3.3 OAuth 2.0

OAuth 2.0 is an industry standard authorization protocol that handles secure authorization between two parties and allows an application to access restricted information about a user [23]. The OAuth 2.0 authorization protocol works in 3 steps:

1. Authorization request: logging in to a service like Facebook or Google.

2. Authorization grant: the user confirms that an application may access their account in the aforementioned service and the service provides the requesting application with an authorization grant.

3. Access token: the requesting application uses the temporarily valid authorization grant to request an access token which is later used for all requests to prove the identity of the user.

A more schematic overview is highlighted in figure 3.2.

Figure 3.2: OAuth 2.0 authorization flow, source: https://www.digitalocean.com/community/tutorials/an-introduction-to-oauth-2.

The benefit of using OAuth 2.0 is that it allows a user to log in to the application using an already existing account on services like Google or Facebook. This improves user friendliness as users can quickly sign in via a third party service. However, as mentioned before, the therapists prefer a solution for their parents where no login is required at all. A downside to using OAuth 2.0 is that it is designed as an authorization protocol and not as an authentication protocol. This means that, although users can log in using third party services, the user state must still be stored in a database of the requesting application along with the permissions of the user: the access token on its own provides no information about the user. Also, the OAuth 2.0 protocol is quite complex to implement. Although many libraries are available for implementing OAuth 2.0, for example OAuth for Rocket [27], the application must support the API routes OAuth 2.0 requires. The OAuth community itself also states that it is not recommended to use OAuth 2.0 as an authentication protocol [22].

In summary: OAuth 2.0 is a protocol that can be used to easily integrate other services into applications but should generally not be used as the authentication mechanism of a single application.

Encoded JWT:

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJyb2xlIjoidGhlcmFwaXN0IiwibmFtZSI6Ik0uIFZlcm1.LbqyNPyowSpc492by8wAPd3nC4YJIuKoqX7CUwpRY2Y

Decoded JWT:

{
  "alg": "HS256",
  "typ": "JWT"
}
{
  "role": "therapist",
  "name": "M. Vermeulen",
  "sessions_access": ["*"]
}
HMACSHA256(
  base64UrlEncode(header) + "." +
  base64UrlEncode(payload),
  "my-secret"
)

Figure 3.3: Example JWT for PCIT-VR: the first encoded part represents the header, the second the payload and the third the signature, computed with the server-side secret.

3.3.4 JSON Web Tokens

JSON Web Tokens (JWT) are an open standard for representing claims securely as signed tokens, popularised and documented by auth0, an authentication and authorization service provider [2]. JWTs are tokens that consist of three base64 encoded parts, separated by a dot:

1. Header: this contains information about the used algorithm and token type.

2. Payload: information stored in the token, for example: claims or user information.

3. Signature: a signature that is created by hashing the header and the payload using a server side secret.

As JWTs are signed, the server can always validate that the claims in the token are valid and have not been tampered with: if one were to edit the claims in a token, for example to claim to be an administrator, the signature of the JWT becomes invalid. An example token that is usable for PCIT-VR is shown in figure 3.3.

This token is served using the authorization header, similar to the regular bearer token. The structure of a JWT is interesting for use in PCIT-VR: when a therapist logs into the system they are granted a JWT similar to the one in figure 3.3, giving them access to the editing application and all the exported sessions. When a user accesses a sharing link they receive a token that restricts their access to the exported sessions that they have accessed the sharing links of. An upside is that no user permissions have to be stored in the database: when a user accesses a certain API route, the token claims are checked by validating the token using the secret only the server knows.


A downside to JWTs is that they cannot be easily invalidated: a blacklist of tokens has to be kept to know which tokens are no longer valid. Another solution to this problem is using token expiry: whenever a token passes this signed expiry time, the user has to log in again. For parents this means they have to gain access to a new sharing link for each practice session they want to view. By setting the expiry time to the average practice duration therapists expect, the timeout can be kept user-friendly.
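As a sketch of how this could look in the Rust back-end, the snippet below issues and validates a token carrying the claims from figure 3.3 plus an expiry. It assumes the jsonwebtoken and serde crates, which are not part of the current code base:

use jsonwebtoken::{decode, encode, DecodingKey, EncodingKey, Header, Validation};
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct Claims {
    role: String,                 // "therapist" or "parent"
    sessions_access: Vec<String>, // product UUIDs, or ["*"] for therapists
    exp: usize,                   // expiry as seconds since the Unix epoch
}

fn issue_token(secret: &[u8], claims: &Claims) -> jsonwebtoken::errors::Result<String> {
    // Header::default() selects the HS256 algorithm, matching figure 3.3.
    encode(&Header::default(), claims, &EncodingKey::from_secret(secret))
}

fn validate_token(secret: &[u8], token: &str) -> jsonwebtoken::errors::Result<Claims> {
    // Validation::default() verifies the signature and the `exp` claim;
    // a tampered payload or an expired token makes this call fail.
    let data = decode::<Claims>(token, &DecodingKey::from_secret(secret), &Validation::default())?;
    Ok(data.claims)
}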

Also, the CSRF attack described in section 3.3.1 is not possible with JWTs: as the JWT is sent as a header to the server and not as a cookie, the JWT is not sent when calling the API of a server from a malicious website. The JWT is only available in the browser storage when the user is on the trusted website.

In summary, JWTs provide the ease of implementation that simple bearer tokens provide along with powerful methods of defining user roles and user permissions in the token itself instead of in the server database. As such, this is the authentication method that should be implemented for PCIT-VR. As mentioned at the beginning of this section, authentication is currently not implemented in the PCIT-VR framework. However, as shown above, JWTs are the most suitable authentication method for the PCIT-VR framework due to their ease of implementation and inherent security.

3.4 Analytics

At its core, analytics is data that tells the maintainer of a service that something happened at a certain point in time. Although that might seem simple, an improper analytics design can lead to unstructured data that is hard to comprehend when reporting on it. On the other hand, making analytics too dependent on the system that is currently in place, e.g. through database relations, can lock the developer in on the design and make future analytics structures harder to design. As such we adopt an analytics system similar to that described in [20]. There, an open architecture is sketched that applies to any work field, allowing analytics data to be easily expanded upon and migrated to other systems.

The architecture states that all analytics concern actions: something that happens. All actions are then represented through events: the fact that an action has occurred, a record of the action taking place. These events are further enriched by the introduction of an event type, which contains metadata about when the action occurred, where the action occurred and other domain specific metadata.

For PCIT-VR, basic actions that occur during a practice session are starting a session, finishing a session and clicking on an answer to one of the annotations. The proposed architecture grants us the freedom to expand upon these actions in the future: as all events are text records, with no reference to other database relations, they can be introduced and removed at will. Due to its modularity, this is the analytics architecture that is built into PCIT-VR. An in-depth implementation is described in section 4.4.
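As an illustration, such events could be stored as plain, self-describing records like the ones below; the field names are hypothetical and do not necessarily match the implementation in section 4.4:

{ "event_type": "session_started",  "product_uuid": "9a2ac409-...", "timestamp": "2020-06-22T19:04:12Z" }
{ "event_type": "answer_clicked",   "product_uuid": "9a2ac409-...", "timestamp": "2020-06-22T19:05:40Z", "annotation_id": 3, "option": "B" }
{ "event_type": "session_finished", "product_uuid": "9a2ac409-...", "timestamp": "2020-06-22T19:12:03Z" }

Because each record is self-contained, a new event type can be introduced by simply emitting records with a new "event_type" value, without touching the database schema.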

3.5 Conclusion

In this chapter we defined the structure for a production-ready PCIT-VR framework. To summarize, the following requirements were researched:

• Authentication and authorization (using JWTs).

• Designing non-linear storylines in the editor and server.

• Interactive VR viewing application for mobile phones.

• Analytics API.


In the next chapter, the implementation of a part of the requirements for the PCIT-VR framework is extensively explained.


CHAPTER 4

Implementation

In this chapter, the implementation of a part of the features of the envisioned production-ready PCIT-VR framework described in chapter 3 is discussed. The framework designed in the previous chapter is summarized in figure 4.1. This chapter focuses on the implementation of the viewing application, the analytics API and the server side of editing and exporting non-linear storylines. Implementation of JWT authentication and the editing front-end are not part of the scope of this research project.


Figure 4.1: The new envisioned PCIT-VR application flow.

4.1 Interaction moments

An interaction moment is a part of a therapy video where the video is paused and the parent is shown an annotation (e.g. a "question"). They can then pick certain options (e.g. "answers") in response to that annotation from a list. These interaction moments have to be altered so that they are dynamic, meaning that the answers to an annotation provide a pre-defined next video for the viewer, creating a non-linear storyline. Now, instead of having a single video with annotations, there are multiple videos that each contain their own annotations. Consequently, therapists can direct the path of a parent through a practice session based on their performance, for example by repeating aspects of a session that a parent has trouble with or by skipping parts of the session that a parent has mastered. An overview of the old and new structure is shown in figures 4.2 and 4.3 respectively.



Figure 4.2: Static implementation: a practice session is based on a static video, annotations are placed along the video.


Figure 4.3: Dynamic implementation: a practice session is based on multiple videos and the answers to the annotations define the path in the practice session.

4.1.1 Database implementation

For the new branch-like behaviour, the main difference is that there is now more than a single series of clips in a project: each series of clips, more concisely called a segment, is a video on its own and contains annotations which lead to other segments. Here, the end of a practice session is reached when a segment finishes playing and no new branches are introduced. Note that a segment is a term that is only practically defined in the exported version of the project: when a practice session is edited, a segment is defined as a list of clips in the database; when a project is exported, a segment is the final list of clips, joined together by FFmpeg to create a video.

Annotations with no options can be added to mark tool tips in the video. Therapists can use them to give hints or instructions for a parent when viewing the exported project. For example: "You might want to consider ... " or "Great job, that was the correct answer!". A summary of these changes is shown in figure 4.4 and figure 4.5.


Figure 4.4: Old project database structure: clips point to a project and are ordered by an order index, annotations are stored on the project as JSON. Created using https://dbdiagram.io.

Figure 4.5: New project database structure: projects point to an initial clip. Each clip might have an annotation associated with it. Annotations are stored in a database table and provide options which point to the next segment in the project tree, giving therapists the opportunity to define paths in their projects. Created using https://dbdiagram.io.

A practical example of this new database implementation is shown in figure 4.6. Here the clips point to a next clip, like a linked list. This linked list of clips is called a segment in the server, but has no database entry, as shown in figure 4.5. We can see that, based on the option that is picked, a different next clip is chosen.

Figure 4.6: Practical example of the project database structure described in figure 4.5. Here, a list of consecutive clips is considered a segment.
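To make the linked-list structure concrete, the sketch below mirrors the rows of figure 4.5 as plain Rust types and walks one segment starting from its initial clip. Field names follow the diagram; the traversal is illustrative and not the actual Diesel-backed implementation.

// Simplified in-memory mirror of the rows in figure 4.5.
struct Clip {
    id: i32,
    asset_id: i32,
    // `None` marks the end of a segment; `Some(id)` links to the next clip.
    next_clip_id: Option<i32>,
    start_time: i32,
    end_time: i32,
}

struct Annotation {
    id: i32,
    clip_id: i32,
    timestamp: i32,
    annotation: String,
}

struct AnnotationOption {
    id: i32,
    annotation_id: i32,
    option: String,
    // Choosing this option branches to the segment that starts at this clip.
    next_clip_id: Option<i32>,
}

// Collect the clips of one segment by walking the linked list from an
// initial clip, the way the exporter joins them into a single video.
fn segment_clips(clips: &[Clip], initial_clip_id: i32) -> Vec<&Clip> {
    let mut segment = Vec::new();
    let mut next = Some(initial_clip_id);
    while let Some(id) = next {
        match clips.iter().find(|c| c.id == id) {
            Some(clip) => {
                next = clip.next_clip_id;
                segment.push(clip);
            }
            None => break,
        }
    }
    segment
}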


4.1.2 API implementation

The old API implementation used a PATCH HTTP request where the client provided the entire updated project structure to save progress on the project. The new branch-like behaviour imposes a new set of rules that have to be followed by the client to provide a valid end product. For example:

• An option may not point to a clip that is not the initial clip of a segment: as the end product no longer contains separate clips but only segments, it is impossible to find the segment to point to.

• A clip cannot just be inserted before, after or in between clips: a proper linked list has to be maintained. If, during a clip insertion or edit, a dangling clip is left over, the server will not know what to do with that clip when exporting. Therefore, the clips before and after the clip that is inserted or edited always have to be checked for validity with regards to the linked list.

The old PATCH API behaviour is thus replaced by a per-clip POST and PATCH API: for each clip insert or clip update the client sends a request to the server. This ensures that the server can check each operation as-is and provide useful error messages to the client at the moment the client requests a change, instead of processing and verifying the PATCH request for the entire project. This new API also has a beneficial side effect: autosave. As each request to update the project is handled on a per-edit basis, the database is immediately synchronised with the state of the client. As a result, the manual save button can be omitted, improving user experience by preventing therapists from accidentally forgetting to save their project.
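To illustrate the per-clip pattern, an update could look like the request below; the route and body fields are hypothetical and only sketch the idea of sending one small, verifiable change at a time:

PATCH /api/projects/3/clips/12
{ "start_time": 0, "end_time": 5400, "next_clip_id": 14 }

The server can validate this single change against the linked-list rules above and respond with a targeted error message if, for example, the new next_clip_id would leave a dangling clip.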

4.2 Exporting projects

An exported project, denoted as a "product" in the back-end, is composed of a database entry along with a JSON export file that denotes the annotations in the exported video. For the new API the format of this export file is changed to allow the viewing application to properly display the non-linear storyline created by the therapist.

4.2.1 Format

In the new export format, a product consists of an array of segments with their accompanying annotations. These annotations can point to different segments, directing the viewing application to jump to another part of the session by using the next segment ID of an option as the index in the segments array. Where the old export JSON contained only information about the annotations in the product, the new JSON includes information about the product, the segments in the product and the annotations in those segments. A JSON schema specification of this format is provided in appendix A. An example that validates against this schema is provided in appendix B; this example is the implementation of the non-linear timeline shown in figure 2.3.

4.2.2 API

Exporting a project

In the old implementation, exporting a project was achieved by an API call that created a product database entry and spawned an FFmpeg job that joined the clips in the background. This joined clip was then exported into multiple profiles for Adaptive Bitrate Streaming (ABRS). The ID of this job was shared with the client through the response of the export request, which the client used to notify the therapist of the status of the export, potentially displaying error messages if the export failed.

In the new editing application a project consists of multiple segments of clips and thus a single FFmpeg job does not fulfill the needs for exporting the entire project. Due to the concurrent nature of Rust, the export API is extended to spawn a job for each segment in the project. As a segment is not dependent on other segments in the project, each segment runs as a separate job, making all FFmpeg jobs run concurrently. Exporting a project now works in the following steps:

1. Generate the JSON for the entire project along with UUIDs for all the segments.

2. Spawn an FFmpeg job for each segment in the project.

3. Return a response to the client with an overview of all the jobs that were spawned.

4. Run the jobs in the background.

(a) If all jobs succeeded: update the product to a "Succeeded" state, allowing the viewing application to view the product.

(b) If any of the jobs failed: update the product to a "Failed" state, the product is discarded and cannot be viewed by the viewing application.

Using a simple shared state array, all the jobs know the state of the other jobs, which is either "Pending", "Succeeded" or "Failed". In Rust, this is achieved by sharing the following array between all jobs:

Arc::new(Mutex::new(vec![Pending; export.segments.len()]));

Using an atomically reference-counted pointer (Arc) that contains a Mutex around the list of states, we ensure that this data is freed once no job holds a reference to it anymore. The list is also mutated by only one job at a time, as each job has to acquire the mutex lock of the array before being able to mutate it.
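A self-contained sketch of this pattern, using plain threads in place of the actual Rocket background jobs and a placeholder for the FFmpeg work, could look like this:

use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Clone, Copy, PartialEq)]
enum JobState {
    Pending,
    Succeeded,
    Failed,
}

// Spawn one job per segment; every job can read and update the shared
// state vector through the Arc<Mutex<...>>.
fn export_segments(segment_count: usize) -> Vec<JobState> {
    let states = Arc::new(Mutex::new(vec![JobState::Pending; segment_count]));

    let handles: Vec<_> = (0..segment_count)
        .map(|i| {
            let states = Arc::clone(&states);
            thread::spawn(move || {
                // ... run the FFmpeg job for segment `i` here ...
                let succeeded = true; // placeholder for the real job result
                let mut states = states.lock().unwrap();
                states[i] = if succeeded { JobState::Succeeded } else { JobState::Failed };
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let final_states = states.lock().unwrap().clone();
    final_states
}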

Sharing a product

For the products API, the therapists want clients to easily access the products, without any authentication steps. However, sharing these products should not be implemented trivially. A trivial API route like:

GET /api/products/<product_id>

allows callers of the API to access other products by simply incrementing the product id that is present in the URL. Thus, sensitive practice sessions can easily be looped over and indexed. To prevent this, products are accessed using version 4 UUIDs: randomized 128-bit numbers used to globally identify assets [17], for example: 9a2ac409-5a2b-4cd5-a60d-95c3f736a805. As version 4 UUIDs have 6 bits to indicate the version and variant of the UUID, this leaves 122 bits to be used as index, meaning there are 2^122 possible UUIDs to consider when trying to find other products through the API; the chance that a brute force collision occurs with a UUID is so small that it is generally ignored. Each product is now assigned a UUID, thus the API route to gain access to an export file is now:

GET /api/products/<product_uuid>/export.json

for example:

GET /api/products/9a2ac409-5a2b-4cd5-a60d-95c3f736a805/export.json
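Generating such an identifier on the server side is straightforward with, for example, the uuid crate (whether the implementation uses this exact crate is an assumption):

use uuid::Uuid;

// Assign a fresh, hard-to-guess identifier to a product. With the `v4`
// feature enabled, Uuid::new_v4() fills the 122 random bits from the
// operating system's secure random number generator.
fn new_product_uuid() -> String {
    Uuid::new_v4().to_string() // e.g. "9a2ac409-5a2b-4cd5-a60d-95c3f736a805"
}

fn main() {
    println!("{}", new_product_uuid());
}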


4.3 Viewing application

As noted before, the viewing application that was implemented was a suitable MVP for providing evidence that VR can be applied to PCIT [28]; however, this application was not production ready as there was no dynamic file format that could be loaded and presented. Three static training scenarios were hard coded in the application. Here we describe the new viewing application, which is based on the old prototype and supports the new export format described in section 4.2.1.

4.3.1 Technical requirements

The new viewing application has these concrete requirements:

1. Load product export files.

2. Play segments from the export file in a video sphere around the user.

3. Pause videos when an annotation is presented and jump to other segments when an option is chosen.

4.3.2 Setup and technologies

As the MVP viewing application used A-frame to render VR environments in a web application, this technology was re-used as a base for this new PCIT-VR viewing application. One of the biggest advantages of A-frame is the ability to render entire VR environments using simple HTML tags [8].

As this application requires many DOM mutations to dynamically render segments, questions, responses and menus as A-frame HTML tags, a JavaScript web framework is used. This eliminates the need to write the HTML and JavaScript from scratch. The chosen web framework is Vue.js [34] in combination with Nuxt.js [21]. Vue.js is a lightweight web framework that has a syntax very similar to that of regular HTML tags. It allows users to dynamically render re-usable "components" by specifying how a component appears based on properties that are passed by the client. Nuxt.js is a Vue.js add-on that is used to handle modular page routing, automatic loading of data on start up and integration of external plugins, in this case, A-frame.

As A-frame is a client side script that requires DOM API elements like window and document, it produces errors when building the Vue.js application on the PCIT-VR server. Fortunately, Nuxt.js allows us to specify plugins to integrate with other web technologies. A basic client-side plugin for A-frame is implemented which ensures that the entire A-frame API can be used in the Vue components; an overview of this plugin is shown in appendix C.

The combination of these two technologies turned out to be very powerful. As Vue.js is a reactive web framework, if we alter any of the variables used to render a page, that part of the page will be re-rendered. These re-renders in Vue.js will also automatically be picked up by A-frame. This meant not having to write a single line of JavaScript to manipulate the DOM and allowed us to focus on the critical parts of the application: the logic for loading and playing the exported product files from the server.

4.3.3 Implementation

Reticle and eye gaze selection

To implement the reticle, a donut-like shape is placed at the centre of the screen which moves with the head movements registered by A-frame. To create the eye gaze selection for the parent, we first have to give this A-frame cursor a fuse timeout; this tells A-frame after how long a click is registered when the user looks at a certain object. By adding the following property to our camera, this behaviour is handled by A-frame:


cursor="fuse: true; fuseTimeout: 4000"

Here, "fuseTimeout" is the time-out in milliseconds. Now, the user has to look at an object for 4 seconds before triggering a click. To provide visual feedback about how long the user has to look at the object, we can make the reticle grow with the fusing time. This is achieved by adding the following animation properties to the camera:

1 animation__fusing="property: scale; startEvents: fusing; easing: easeInCubic; dur: 4000; from: 1 1 1; to: 2 2 2"
2 animation__mouseleave="property: scale; startEvents: mouseleave; easing: easeInCubic; dur: 500; to: 1 1 1"
3 animation__click="property: scale; startEvents: click; easing: easeInCubic; dur: 150; from: 2 2 2; to: 1 1 1"

Here, line 1 is used to scale the reticle up to twice its size for the duration of the "fusing", i.e. gazing, with the object. Line 2 is used to reset the scale if the user looks away from the object before the time-out has completed and line 3 is used to reset the scale when a click has been triggered. To conclude, the gaze selection functionality took 4 lines of A-frame configuration; the resulting reticle is shown in figures 4.7a and 4.7b.

(a) Neutral state of the reticle. (b) Gazing state of the reticle.

Figure 4.7: The reticle implemented in the PCIT-VR viewer, growing when a user looks at an object.

Core and main menu

The new viewing application works by specifying a UUID in the URL after the root. This UUID identifies the export file that the viewing application will fetch from the server. Once a valid UUID is provided, the viewer will start to render the page based on the export file. When the user launches the application they see a menu screen and a start button which can be clicked using the "gaze" functionality: looking at an object for a certain amount of time to trigger a click, as described in section 4.3.3. Once this button is clicked, the viewer will start playing the first segment in a video sphere. The Vue implementation of this main menu, along with the logic to switch to a segment based on events, is shown in listing 4.1.

<!-- Section of the A-frame component -->
<template> <!-- In Vue.js, the HTML layout of a component is in a template tag -->
  <a-scene>
    ...
    <!-- Shown as long as the user did not press start -->
    <MainMenu v-if="!pressedStart" @click="pressStart" />

    <!-- Shown when the user presses start -->
    <Segment
      v-if="pressedStart && activeSegment !== null"
      ref="segment"
      @switch-segment="switchSegment"
      @finished="finished"
    />
    ...
  </a-scene>
</template>
...
<script> // In Vue.js, logic belonging to a component is in a script tag
  ...
  // Properties passed by the parent
  props: {
    product: { // The JSON export
      type: Object,
      required: true
    },
    ...
  },
  // Data managed by this component
  data () {
    return {
      activeSegment: 'segments' in this.product && this.product.segments.length > 0 ? this.product.segments[0] : undefined,
      pressedStart: false
    }
  },
  // Methods exposed by the component, also used as event handlers
  methods: {
    switchSegment (id) {
      // Switches the active segment to the provided id
    },
    pressStart () {
      // Sets pressedStart to true
    },
    handleTimeUpdate () {
      // Notify the active segment by calling its timeUpdate method
    },
    finished () {
      // Sets pressedStart to false, re-rendering the welcome menu
    }
  }
</script>

Here, the main menu is rendered as long as the variable "pressedStart", found in the data section of the component, is false. The resulting view is shown in figure 4.8. The MainMenu component is a simple component that contains a welcome text and a start button; when the user looks at this button, the main menu emits a "click" event. The event handler for this click, denoted by "@click", sets the pressedStart variable to true using the "pressStart" method in the A-frame component. As Vue.js is reactive, the DOM automatically re-renders when "pressedStart" is updated, hiding the main menu and loading the segment component. This logic is determined by the "v-if" tags defined on the Segment and MainMenu components: the "v-if" tag contains rendering conditions and determines whether an element is rendered or not. When "v-if" is true, the Segment component containing a video sphere will automatically start playing. As A-frame detects these re-renders, the VR environment will also be automatically updated, starting the session and playing the VR video.


Figure 4.8: The main menu of the viewing application
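The method bodies in listing 4.1 are only summarised as comments. The following sketch illustrates how they could be filled in; the segment identifier field ("uuid") used in the look-up is an assumption made for this illustration, not necessarily the exact export format.

// Possible implementation of the root component's methods (a sketch).
methods: {
  pressStart () {
    this.pressedStart = true
  },
  switchSegment (id) {
    // Look up the requested segment in the export file; Vue's reactivity
    // re-renders the Segment component with the new active segment.
    this.activeSegment = this.product.segments.find(s => s.uuid === id)
  },
  handleTimeUpdate (event) {
    // Forward the video's current playback time to the active segment.
    if (this.$refs.segment !== undefined) {
      this.$refs.segment.handleTimeUpdate(event.target.currentTime)
    }
  },
  finished () {
    // Return to the main menu.
    this.pressedStart = false
  }
}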

Playing the export file

Playing the export file is handled by events exchanged between the main A-frame component and the Segment component. As per the A-frame standard, video files have to be loaded into the scene in an a-assets tag, which is located in the main A-frame component as shown in listing 4.2.

...
<!-- Another section of the A-frame component -->
<template>
  <a-scene>
    ...
    <a-assets>
      <img id="background" src="/background.jpg" alt="Clouds Background">
      <video
        v-for="video in uniqueVideos"
        :id="`video-${video.filename}`"
        :key="video.filename"
        :ref="'videos'"
        crossorigin="anonymous"
        playsinline
        webkit-playsinline
        :src="`/server/static/products/${product.uuid}/${video.filename}`"
        @timeupdate="handleTimeUpdate"
        @ended="handleEnded"
      />
    </a-assets>
    ...
  </a-scene>
</template>

Listing 4.2: Loading the videos of the export project into the main A-frame component.

As a result, the video files needed in the Segment component are stored in the main A-frame component. To allow communication between the Segment and the A-frame component, the Segment exposes a method, "handleTimeUpdate", as shown in listing 4.3. Whenever the video advances a tick, generally 0.3 seconds, the A-frame component notifies the Segment of that change by calling its "handleTimeUpdate" method; the Segment can then check whether an annotation has to be displayed for that timestamp. If such a timestamp is reached, the Segment pauses the video and fills a variable, "activeAnnotation", with the annotation that has to be displayed, automatically rendering the annotation.


<!-- Segment -->
<template>
  <a-entity>
    <template v-if="!ended">
      <Videosphere ref="sphere" :src="`#video-${uuid}`" :autoplay="true" />
      <Annotation
        v-if="activeAnnotation !== undefined"
        :annotation="activeAnnotation"
        @switch-segment="switchSegment"
        @continue="continueSegment"
      />
    </template>
    <FinishedMenu v-else @click="$emit('finished')" />
  </a-entity>
</template>

<script>
  data () {
    return {
      ended: false,
      activeAnnotation: undefined
    }
  },
  ...
  methods: {
    ...
    handleTimeUpdate () {
      // Check if an annotation needs to be shown
      // If so: pause the video and update activeAnnotation accordingly
    },
    switchSegment (id) {
      // The user picked an option, emit a "switchSegment" event to the A-frame component
    },
    handleEnded () {
      // Set "ended" to true
    }
    ...
  }
  ...
</script>

Listing 4.3: The segment component, implemented in Vue.
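The method bodies in listing 4.3 are again only summarised as comments. A minimal sketch of how the annotation check could work is given below; the "segment" prop, the annotation's "timestamp" field and the "shownAnnotations" helper (which would be added to the component's data) are assumptions made for this illustration, and pausing and resuming are sketched here as events emitted to the component that owns the video element.

// Sketch of the Segment's annotation handling; field names are assumptions.
methods: {
  handleTimeUpdate (currentTime) {
    // Find an annotation whose timestamp has been reached and that has not
    // been shown yet during this play-through.
    const annotation = this.segment.annotations.find(
      a => a.timestamp <= currentTime && !this.shownAnnotations.includes(a)
    )
    if (annotation !== undefined) {
      this.shownAnnotations.push(annotation)
      this.activeAnnotation = annotation // reactively renders the Annotation
      this.$emit('pause')                // ask the video owner to pause playback
    }
  },
  continueSegment () {
    // A tool tip was acknowledged: hide it and resume playback.
    this.activeAnnotation = undefined
    this.$emit('play')
  },
  switchSegment (id) {
    // The user picked an option: bubble the choice up to the root component.
    this.activeAnnotation = undefined
    this.$emit('switch-segment', id)
  }
}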

The choice of the user is then emitted as an event, "switchSegment", which is bubbled up to the A-frame component. This switches the segment to the ID provided in the event, automatically re-rendering the entire scene. The "continue" event is emitted when a tool tip annotation is presented; this continues the currently playing video. When the end of a video is reached, the main A-frame component receives an "ended" event from the video. This calls the "handleEnded" method of the segment, setting the "ended" variable to true. As seen in the listing, this renders a FinishedMenu, which congratulates the user and asks if they want to return to the main menu. This final step is achieved using the "finished" event, which is emitted when the user looks at a confirmation button in the FinishedMenu. The finished event is handled by the A-frame component using the "finished" method, setting the "pressedStart" variable back to false and re-rendering the main menu. A schematic overview of the application state is shown in figure 4.9.

These examples demonstrate how powerful the combination of Vue.js and A-frame is. The programmer can focus on designing the flow of the application by describing how components interact with each other through events and methods instead of thinking about manipulating the HTML of the web page or setting up event listeners and callbacks: all these tasks are either fully handled by Vue.js’ reactive nature or require less code using event wildcards and component methods.

Figure 4.9: Schematic overview of the application flow of the viewing application

4.4 Analytics

The analytics implementation is based on the system introduced in section 3.4. Analytics are structured through actions that occur during the practice session, including starting a session, finishing a session and clicking on an answer to one of the annotations. Actions are registered as events, which adds information about when and where the action occurred. For structuring the domain-specific data, inspiration was taken from Google Analytics [12]: this introduces five so-called "dimensions" to capture extra meta-information about the event, as well as more intuitive domain-specific fields like category, label and value. This setup allows us to break down each event into pseudo-normalized terms (category, action, label and value) while providing metadata in non-normalized terms (dimensions 1 through 5). This makes the analytics events easier to grasp for the therapists, as the domain-specific data is presented using a familiar metaphor. In the database, each analytics event is structured as follows:

• category: the category this event occurs in, for example general sessions or answers to a question. It provides context for where on the page the event occurred.

• label: a label attached to the event, containing metadata. For the annotation analytics this field contains the question that was asked.

• action: the action that took place, a "click", "finish" or "start".

• value: a second type of label attached to the event, containing metadata. For the annotation analytics this field contains the answer that the parent chose.

• dimensions: 5 dimension fields to store extra domain-specific information.

For PCIT-VR, the dimensions are structured as follows (an example event payload is sketched after this list):

1. UUID of the practice session.

2. The name of the video that is currently active.

3. Timestamp in that video where the action took place.

4. The name of the parent that created this event.

5. Empty.
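As an example, an event recorded when a parent answers a question could look roughly like the sketch below. The field values are fictional, and the endpoint path as well as the exact dimension field names are assumptions for this illustration.

// Fictional example of an analytics event sent by the viewer (a sketch).
const event = {
  category: 'annotation',                              // where on the page the event occurred
  action: 'click',                                     // what happened
  label: 'What is a good reaction to this behaviour?', // the question that was asked
  value: 'Praise the child for playing quietly',       // the answer the parent chose
  dimension1: '123e4567-e89b-12d3-a456-426614174000',  // UUID of the practice session
  dimension2: 'intro.mp4',                             // name of the active video
  dimension3: '42.3',                                  // timestamp in that video
  dimension4: 'Jane',                                  // name of the parent
  dimension5: ''                                       // unused
}

fetch('/server/analytics', {                           // endpoint path is an assumption
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(event)
})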

All of these fields are implemented as simple text fields. This design departs from traditional relational database structures: the analytics events table is completely separate and has no relations to any other table in the database. However, this decision supports a highly portable and resilient system: the viewing application defines the content of the analytics events based on the feedback of the therapists. This means that the analytics events are not dependent on the underlying structure of the PCIT-VR server and could easily be migrated to another system or analysed by the therapists directly, as no foreign-key look-ups are necessary.


CHAPTER 5

Results

5.1 Editor

The branch functionality has been fully implemented in the back-end of the PCIT-VR editor. This means that projects can be created, clips can be joined together as segments and annotations can be defined to jump from one segment to another. Whenever an edit is performed, the server is immediately notified using an HTTP request; this has the positive side effect of auto-save, as the server and client are always synchronised. When a project is finished, it can be exported using the new export format, which uses WebSocket jobs to keep the client updated on the export progress. In addition, an analytics API was created to send analytics events to the server, allowing the therapists to monitor the progress of the parents. Due to time constraints, the front-end of the editing application is not finished: the new API is available in the back-end, but the front-end does not yet support it. Consequently, the WarpVR-inspired front-end for defining paths in projects is not completed. Finally, the JWT authentication is not finished. Although the required tokens have been defined, the JWT authentication had a lower priority than the newly produced playing application, which is needed for practical trials. Consequently, a larger portion of this research project was spent on designing and producing the playing application, and no practical implementation of the JWT authentication was made.

Using the completed export API, therapists can define exported projects by adding them manually to the static file folders of the server under valid random UUIDs (an example folder layout is sketched below). This ensures that the practical trial of PCIT-VR can continue: by placing an export file along with the desired video files in an export folder on the server, the videos can be accessed through the static file routes of the server, making the practice session available to the viewing application. To aid the therapists in creating exported projects manually, a manual was written; it is written in Dutch and provided in appendix D.
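For illustration, a manually created export could be laid out roughly as follows. The video file names are examples and the "export.json" name is an assumption; the authoritative instructions are given in the manual in appendix D.

server/static/products/
└── 123e4567-e89b-12d3-a456-426614174000/   (a valid random UUID chosen by the therapist)
    ├── export.json                          (the export file, see appendix B)
    ├── intro.mp4                            (video files referenced by the export)
    └── segment-2.mp4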

5.2 Player

The PCIT-VR player is rebuilt from the ground up using Vue.js and A-frame. The player runs on exported projects provided by the server, allowing the therapists to define the practice session scenarios which they wish to test in practice. A summary of what viewing a session looks like is shown in figure 5.1. When a parent opens the viewing application, they are asked to share their name, which is used in the dimensions of the analytics, allowing therapists to separate events among parents. Gazing functionality is also added, which allows the parents to register a click by looking at an object for a certain amount of time. As a result, no VR equipment is needed to use the application. Analytics events are sent whenever a parent starts a session, ends a session and clicks on one of the options for a question.


(a) A parent enters their name on start-up for analytics purposes.

(b) The home screen of the PCIT-VR player; the session is started by looking at the "Start" button.

(c) An annotation defined by the therapist that shows two reactions to the situation that just occurred.

(d) A tool tip defined by the therapist to help the parent during the session. In this case: showing the correct answer.

(e) When the end of a segment is reached without further annotations, the session ends.

Figure 5.1: Overview of the PCIT-VR player when playing the example practice session, defined in figure 2.3 and implemented as export in appendix B.


As mentioned before, the JWTs are yet to be implemented in either the server or the front-end of the PCIT-VR project. This means that the viewer also does not use JWTs to check whether the visitor of the playing URL is actually authorized to view that practice session. Sessions are not shared using temporarily valid sharing links but through their static link on the server. This link is hidden behind a UUID, which makes the links difficult to brute-force, providing some security. However, because users are not individually authorized, it is possible for parents to share the practice sessions with otherwise unauthorized users.


CHAPTER 6

Conclusion and future work

6.1 Conclusion

The goal of this thesis was to explain the design decisions needed to turn the basic PCIT-VR editing and playing applications into production-ready applications. The result of the research project presented in this thesis demonstrates the many production-ready features that were implemented, for example branch-like behaviour based on performance and analytics to track parent performance. Although not all the requirements for an easy-to-use editor were met, it is now possible for PCIT-VR to go into practical trial using a functional playing application for the parents. Therapists can modify the JSON export file directly or use API calls to define the practice sessions, and measure their clients' performance using the provided analytics tools. To show that the viewing application can represent the desired non-linear timelines, the timeline in figure 2.3 has been implemented in the new export format, as shown in appendix B.

6.2 Limitations and future work

6.2.1 General improvements

Currently, there are a few concrete limitations that have to be cleared up before the PCIT-VR project can go into production. First, authentication and authorization using JWTs must be finished to ensure that only authorized users can access the sensitive data that the PCIT-VR project contains. Second, the editor's front-end must be adapted to exploit the functionality of the new API, using a structure similar to that of WarpVR. The limitations of the project's current state and the goals for the envisioned application flow are highlighted in figure 6.1.
