Forecasting demand in pupils’ lecture selection in personalized learning: A comparative analysis to investigate the effectiveness of influence factors in forecasting pupils’ lecture selections

Academic year: 2021

Rijksuniversiteit Groningen

MSc Technology and Operations Management
Supervisor: Prof. dr. I.F.A. Vis
Second supervisor: Dr. J. Riezebos

Contents

1. Introduction
2. Conceptualization of factors
3. Theoretical background
4. Description and Analysis of Data
5. Application of ARIMA model
6. Discussions
7. Conclusion
8. Limitation and further research
Acknowledgment
Appendix A. Graphs of registration
Appendix B. Weekday analysis
Appendix C. Before and After exam and Grades influence factors analysis
Appendix D. The preference of AM and PM and of teachers influence factor analysis
Appendix E. Model developing and forecasting

Abstract

The aim of this thesis was to investigate the effectiveness of influence factors in forecasting the demand arising from pupils’ lecture selections in a school participating in a personalized learning program, in order to help schools allocate resources better. General forecasting models were discussed, and a forecasting model suited to the context of pupils’ lecture selection was chosen. Using the defined influence factors, specific models were built for different teachers, and the detailed forecasting process is presented. The developed models were then applied in forecasting. The effectiveness of the influence factors was investigated in two ways: by comparing the significance of the factors during the forecasting process, and by comparing the forecasting performance of the different models.

1. Introduction

Personalized learning can be defined as the offering of a variety of learning situations to satisfy individual differences among pupils as much as possible. Schools take individual pupils’ characteristics and needs into account, relying on flexible instructional practices in organizing the learning environment (Jenkins & Keefe, 2002). Since schools are considered an important place to deliver knowledge to pupils and to prepare them for future development and competition, the fulfillment of pupils' demands and the improvement of pupils’ learning effectiveness are important objectives (Du & Wagner, 2007). In addition, a school is responsible for organizing a learning environment for pupils. This environment can be recognized as being efficient if it is found to provide and satisfy the pupils with a given bundle of resources (Ray, 1991). According to Jones, Thomas, Evans, Welch, Haug & Snow (2008), an accurate forecast when making decisions and planning for the future is the foundation for more extensive and better allocation of resources, which indicates that forecasting the demand of pupils can assist schools in better allocating resources. In personalized learning, pupils have the authority to determine in consultation with their teachers which specific lecture they plan to attend. In a classical situation, teachers must focus on the mean level of their pupils. In personalized learning with flexible group composition, teachers can take into account what pupils already know and what they would like to know. In addition, pupils can express their preferences for lectures in this setting. Therefore, in personalized learning, the demand of pupils to attend a certain lecture can potentially fluctuate. In order to better allocate schools’ resources, it would be beneficial for schools to conduct a forecasting process to determine pupils’ demands for specific lectures.

Since personalized learning allows pupils to make decisions to some extent, the factors that influence pupils’ lecture selections may have an impact on the forecasting results. Perry, Hechter, Menec & Weinberg (1993) indicate that academic performance in education involves a complex interplay of pupils and environments. The same holds true for the selection of lectures: demand varies as a result of environments. Factors that influence the lecture selection of pupils could be, for example, interest (Bong, 2001), past experiences including grades (Hallinan & Williams, 1990), teachers (Kelly, 1987), friends (Hallinan & Williams, 1990), time period of a semester, time of lecture, and so on. Since these influence factors may have an impact on the accuracy of forecasting pupils’ lecture selections, it is helpful to investigate the effectiveness of integrating influence factors in forecasts of pupils’ lecture selections, in order to develop a better forecast model for schools to achieve an efficient allocation of resources.

demand by pupils: they both perform predictions of people’s demands. Forecast models in a service context can be categorized into three types: qualitative methods, time-series methods and causal methods (Chase Jr, 1997). This thesis applies a time-series model as a starting point; the reasons for this choice are explained in Chapter 3. Influence factors are then added in order to conduct the comparative analysis and investigate the effectiveness of integrating them.

Bearing this issue in mind, this thesis aims to develop a time series model with specific influence factors for pupils’ lecture selections and to compare its outcomes with those of a time series model that does not consider influence factors, in order to investigate the effectiveness of influence factors in forecasting pupils’ lecture selections.

1.1 Problem definition

In accordance with the issue presented above, the goal of this research is to determine the effectiveness of influence factors in the forecasting of pupils’ lecture selections by comparing the outcomes of a time-series model with those of a time-series model that includes specific influence factors of pupils’ lecture selections, by means of literature review, interviews, modeling and data analysis.

1.2 Research questions

To address this problem, five sub-questions are used. These are stated below:

1. What are the influence factors for pupils’ lecture selections?

2. What is an appropriate time series model to represent pupils’ lecture selections?

3. How should an appropriate time series model of pupils’ lecture selections with the integration of factors of pupils’ lecture selections be designed?

4. What are the performance differences between a time series model including influence factors and a time series model without influence factors?

5. How effective are influence factors in forecasting pupils’ lecture selections?

1.3 Methodology

1.3.1 Research design

To answer the five sub-questions, this research first divides the sub-questions into four phases: (1) the conceptualization phase; (2) the integration and design phase; (3) the application phase; and (4) the comparison phase.

Starting with the conceptualization phase, Sub-questions 1 and 2 will be discussed in a structured way. Answering Sub-question 1 will provide an overview of factors that influence the pupils’ lecture selections. The complete answer to Sub-question 1 will be derived from a literature review and interviews. Subsequently, the answer to Sub-question 2 will provide an appropriate general time series model, based on a literature review.

Consequently, the designed time series model with influence factors and selected general time series model will be tested, and the comparison of results will assist in investigating the effectiveness of influence factors in forecasting.

[Figure 1 (flowchart). Box labels: interviews and literature review of influence factors of pupils’ lecture selection; data analysis to define the influence factors; literature review on forecasting models; selection of a suitable time-series model; a general model for forecasting; designed forecasting model with influence factors (best-fit model with influence factors); forecasting model without influence factors; pilot school test; forecasting performances; comparison of results and effectiveness analysis of influence factors.]

Fig. 1. The structural design of the proposed comparative analysis

1.3.2 Research method

In order to sufficiently answer the research questions, this thesis uses various approaches, including literature review, interview, modeling and data analysis, which will answer the sub-questions and, finally, solve the main problem as defined in Section 1.1.

Literature review

this thesis to support the continuing content. The ScienceDirect, SmartCat and Google Scholar databases, provided by the University library and Google Group, mainly support the literature review. The keywords for the forecast model are “time series model”, “literature review of forecasting model”, “time series model for customer demand”, “ARIMA model” and similar phrases. For influence factors, search terms include “pupils’ behaviors”, “influence factors of pupils’ attendance or enrollment”, “socializer in pupils’ decision-making”, and “pupils’ course selections”.

The primary outcome of the literature review is the selected general time series model(s), obtained by comparing features of models with characteristics of forecasting in pupils’ lecture selections and supplementing for influence factors of pupils’ lecture selections.

Interview

Interviews are appropriate when the researchers are seeking to obtain access to an interviewee’s understanding of the world and his or her experiences (Rubin & Rubin, 2011). Thus, interviews are conducted to obtain insights into the influence factors.

Interviewee selection is based on the availability of preferred experts who empirically observed pupils in a secondary school who were able to select a lecture to attend. According to Rubin & Rubin (2011), two interviewees may have complementary information. In this thesis, interviews with two experts are presented.

In order to conduct the interview, a semi-structured interview protocol is designed, which is displayed in Chapter 2. A semi-structured interview allows researchers to focus on topics but simultaneously discover hidden information (Rubin & Rubin, 2011). After the data collection, a table is presented to visualize the results of the interviews.

Modeling

In order to examine the effectiveness of integrating influence factors, a model that takes influence factors into account is designed. In this thesis, through the study of literature, one appropriate time series model is selected as the basis for the context of pupils’ lecture selections. The defined factors that influence pupils’ lecture selections, which are found in the interviews and the literature review, are then incorporated into the selected model. Finally, the designed forecast model with influence factors is implemented in forecasting by means of suitable software.

Data analysis

The data is provided by “Zo.Leer.Ik!”, a collaboration of 12 educational institutions for the purpose of developing pupils’ talents and providing education to obtain the best from them (Zo.Leer.Ik!, 2014). It is used to give insights into influence factors and, in the pilot test phase, as an input for the model and as the basis for measuring forecasting performance.

second data set is combined with the first data set to form the entire data set, which covers 28/08/2015-07/04/2016.

1.4 Outline

In this section, the main structure of this thesis is explained. The conceptualization phase is described in Chapter 2; this phase primarily focuses on the influence factors of pupils’ behaviors that are significant for pupils’ lecture selections, as derived from interview analysis combined with literature support. Chapter 3 presents the literature study on forecasting models. Section 3.1 gives the motivation for selecting a time series forecasting model, and Section 3.2 onwards gives the motivation for selecting a suitable specific time series model and its applications. Chapter 4 focuses on the identification of influence factors. Modeling that integrates the defined influence factors is conducted in Chapter 5, which also covers the forecasting process. The last part of Chapter 5 is the comparison of forecasting performances, on which the assessment of the effectiveness of influence factors is based. The discussion follows in Chapter 6. Finally, Chapters 7 and 8 present the conclusions and limitations.

2. Conceptualization of factors

In general, the pupils’ lecture selections reflect the pupils’ behaviors. This chapter primarily builds on interview results with the support of literature reviews.

As mentioned in the methodology section, interviews are conducted with two experts who made observations in a secondary school that has implemented the personalized learning project. The interview protocol is shown in Table 1. The expert interviews provide insights into the influence factors and complement the patterns found in the literature. The collected interview data is qualitatively analyzed. In addition, Spronk (2016) is used as a source for defining influence factors; this work investigates the factors that determine pupils’ lecture selections through interviews with secondary school teachers. From the interview transcriptions, 11 overarching influence factors that drive the pupils’ demands are identified.

2.1 Friends

school is a social activity and cooperation is a type of motivation for lectures. Therefore, ‘friends’ is a possible influence factor for pupils’ lecture choices.

2.2 Differences in school years

The school year level of pupils influences their lecture selection behavior (expert interviews; Spronk, 2016). The higher the grade level of a pupil, the greater the probability that a pupil will make his or her own selection. For example, they may choose a lecture for interest or career preparation. In this case, the influence of the factor ‘friends’ is weakened for pupils at higher grade levels (expert interviews).

2.3 Interest in a certain course

Interest is a significant factor that concerns the pupils themselves. There is an expectation that if a pupil can choose among everything, he or she will probably select what he or she likes best (Spronk, 2016). For example, in a certain week, for a certain course, within the limitation of the number of lectures that they can attend, pupils are more likely to choose lectures that they are more interested in (expert interviews). In addition, according to Bong (2001), pupils who are intrinsically interested in topics covered in the present course are more willing to take similar courses in the future. The more interested pupils are in a subject, the more involved they become in their assignments, putting effort into their studies and engaging in deeper levels of thinking (Hayden, Ouyang, Scinski, Olszawski and Bielefeldt, 2011). In lecture selection, pupils who are interested in a specific course are more likely to select that lecture in a certain step because they can learn about it more deeply. Therefore, ‘interest’ is defined as one of the influence factors.

2.4 Learning style

According to the expert interviews, learning style preferences can be an influence factor for pupils’ lecture selections. Learning style preferences are manners in which learners can most efficiently and effectively perceive, process and store what they attempt to learn (James & Gardner, 1995). For instance, pupils who prefer self-study are more likely to select a self-study lecture. Some pupils require instruction while others prefer learning by themselves (expert interviews; Spronk, 2016). Pupils may also find a practical lesson more enjoyable than a lecture (Spronk, 2016). Hence, learning style can influence the selections of pupils.

2.5 Difficulty of a course for an individual pupil

Pupils state that they will select a course that they find difficult more often than a course that they find less difficult (expert interviews). However, teachers expect pupils to select easier courses (Spronk, 2016). Therefore, from both the pupils’ and the teachers’ perspective, the difficulty of a course influences pupils’ lecture selections, although the two perspectives disagree on the direction of the effect.

2.6 Grades

2.7 Teachers

According to the expert interviews, the teacher influences pupils’ lecture choices in various ways. The extent to which pupils like the teacher, whether they are used to the teacher, and the teacher’s experience all influence pupils’ choices of lecture (expert interviews). One factor specifically mentioned is that pupils are less likely to choose a teacher who is an intern. Furthermore, pupils can have preferences for specific teachers. In one of the schools investigated by Spronk (2016), there was only one teacher available per course. In such a case, pupils’ preferences for teachers have no influence, as pupils cannot select a certain teacher. The teaching method could also influence pupils’ lecture selections and corresponds to the learning style of pupils. If pupils become accustomed to a certain teaching method, they might prefer to select a certain teacher. Therefore, teachers have an impact on pupils’ lecture selections.

2.8 Advice from coach

A coach has a different role than that of a teacher. A coach is responsible for guiding pupils in their learning process. Advice from the coach has an impact on the behavior of pupils, as pupils often listen to the coach’s advice. For example, if the coach suggests that pupils select a certain lecture, pupils would then select this lecture instead of others (Spronk, 2016). In a personalized learning program, advice from the coach can therefore be regarded as an influence factor of pupils’ lecture selections.

2.9 Parents

Parents play a role similar to that of a coach to some extent (expert interviews). Across a range of studies, a conclusion has emerged that parental involvement in pupils’ education correlates with pupils’ learning success (Hoover-Dempsey & Sander, 1995). Moreover, according to Hoover-Dempsey & Sander (1995), parents can influence pupils’ educational outcomes by modeling behaviors and attitudes, reinforcing learning and providing direct instruction. Through these three methods, parents can also give pupils suggestions on lecture selection according to their academic progress. Parental involvement in pupils’ academic lives is indeed a powerful influence (Keith, Keith, Troutman & Bickley, 1993). Based on these findings, ‘parents’ can be regarded as one of the influence factors of pupils’ lecture selections.

2.10 Capacity

The factor of capacity of teachers and classrooms was proposed, according to expert interviews. If there is still capacity available for a certain lecture shown in the system, then there is the possibility for pupils to select this lecture; otherwise, they must make other selections. So capacity can influence the decisions of pupils.

When exams are fast approaching, pupils are more likely to select courses for which they need more explanation, so the number of pupils selecting a certain course may fluctuate in that period (expert interviews; Spronk, 2016). In addition, pupils are more likely to select a specific course in the afternoon instead of in the morning (Spronk, 2016). The time period can therefore influence the choices of pupils.

2.12 Overviews

Table 2 below summarizes the influence factors mentioned above. In Table 2, learning style is combined with teaching method. All the influence factors are represented categorically.

In order to properly elaborate on the factors, a table was developed, listing source, data type and potential gathering source. The sources of these factors are mainly from expert interviews and literature review. According to Robinson (2014), data can be divided into categories A, B and C. Category A data means data are known or have been previously collected; Category B data need to be collected and must be feasible to be collected; Category C data are not available and cannot be collected (Robinson, 2014).

3. Theoretical background

This chapter reviews the current literature on forecast models in a service context and provides an overview of the potentially useful model framework for personalized learning in an educational system. The results of this chapter will serve as a basis for Chapter 4.

3.1 Forecast techniques in a service context

methods are primarily subjective: forecasting is conducted by means of subjective judgmental assessments. Such methods are more suitable for situations in which sufficient data is absent (Arunraj & Ahrens, 2015). A time series model makes predictions of future performance based on existing historical data or mathematical models (Arunraj & Ahrens, 2015; Jones et al., 2008). Causal models assume that the forecasted demand is highly correlated with certain factors in the environment (e.g., the state of the economy or the interest rate).

As stated above, there are similarities between forecasts of customer demand and forecasts of the lecture demand of pupils: they both forecast the demands of people. Therefore, it is reasonable to apply the models mentioned above in a lecture-selection context. The most suitable model is selected based on an analysis of the advantages and disadvantages of each model.

As described in Table 4, there are advantages and disadvantages to each method. Qualitative methods have multiple advantages, such as a low cost to develop, fast execution and the ability to be used without available data. However, several disadvantages limit the application of a qualitative method. First, since the qualitative method is a subjective tool based on the experience of the predictors, bias introduced by the predictors is likely. Moreover, the subjective assessments are made under a certain condition, which makes the prediction inconsistent under another condition, for example, at another point in time. In addition, it is not feasible to conduct predictions for large numbers of products (Chase Jr, 1997). In personalized learning, there are multiple courses and the demand of pupils changes over time, which a qualitative model cannot accommodate. Hence, a qualitative method is not able to forecast the demand of pupils’ lecture selections.

A causal method can reveal the causal relationship between independent variables and dependent variables, but such a method has no ability to describe and encompass trends or series behaviors in data. In personalized learning, it is still unclear whether there are trends or series behaviors contained in the data. Hence, a causal model is not a perfect fit for forecasting pupils’ lecture selections.

(Jeong, Jung & Park, 2002). In personalized learning, one of the most significant reasons for choosing a time series model is that historical data on the demand of pupils’ lecture selection is available, which by definition can be used as past-measured performance to predict a value at a certain time. Furthermore, time series models can capture trends or series behaviors existing in the data, which may appear in the historical data of pupils’ lecture selections. Another reason for selecting a time series model is that adding explanatory variables and interventions to form a multivariate structural time series model is feasible (Harvey, 1990). In personalized learning, there are influence factors that dynamically interact with each other. A time series model can incorporate these influence factors, which makes it capable of supporting a comparative analysis to investigate the effectiveness of influence factors.

Based on an analysis of the advantages and disadvantages of the different types of models and of the characteristics of personalized learning, this research has determined that it is better to choose a time series model as a basis and then to add influence factors to it in order to conduct the comparative analysis for the investigation of the effectiveness of influence factors.

3.2 Forecasting and time series model

A time series is a set of observations measured sequentially over a continuous or discrete set of time points (Chatfield, 2013). Time series models are statistical models that explain a variable in relation to past-measured performance data and a random disturbance term (Song & Li, 2008). Since they only require historical observations of a variable, one of their benefits is that they are less costly in data gathering and model estimation (Song & Li, 2008). The models are then used to extrapolate the time series into the future (Zhang, 2003). According to Chatfield (2013) and Song & Li (2008), many forecasting procedures are based on time series models, and they have been widely used in various fields over the past few decades. Several prior studies have forecast customer demand primarily with time series models, including the autoregressive model (AR), moving average (MA), exponential smoothing, autoregressive integrated moving average (ARIMA), and seasonal autoregressive integrated moving average (SARIMA).

3.3 Selection of time series model

the data. Whether historical data is non-stationary or stationary is another property that is unknown in the context of pupils’ demand selections. The mean amount of pupils’ demand of a lecture changes with time. In addition, explanatory variables and intervention are other properties of models. In personalized learning, there are influence factors that would have an impact on forecast results.

Based on the above situation, this thesis applies an ARIMA model as a starting point for investigating the effectiveness of influence factors. There are several reasons for this choice. First, since the behavior of the demand data for pupils’ lecture selections is largely uncertain, a flexible technique is needed: ARIMA can represent several different types of time series, i.e., pure autoregressive (AR), pure moving average (MA) and combined AR and MA (ARMA) series (Zhang, 2003), which can reveal potential relationships in the data. The MA time series can be suitable for the condition of a long-term trend of a certain lecture. In addition, the AR component of ARIMA captures autocorrelation with past performance. ARIMA is therefore a good starting point for testing the properties of the data. It is also capable of dealing with both stationary and non-stationary historical data. The crucial limitation of ARIMA is that it assumes a linear correlation structure between the predicted value and historical data; hence, no nonlinear patterns can be captured by ARIMA models (Zhang, 2003). However, because they are relatively simple to understand and implement, linear models have been the primary focus of research and the tool most often applied (Zhang, 2003). In many examples of forecasting customer demand in a service context, which is similar to pupils’ lecture selection, ARIMA is used as a basic model; examples are listed in Table 6. These examples show that ARIMA is a widely and successfully used basic time series method for forecasting customer demand and provides acceptable results in different sectors, under the assumption of a linear correlation structure among the time series values. Furthermore, in some of these examples, ARIMA models with explanatory variables or intervention variables have been successfully implemented.

the patient attendances in the emergency department. In the field of electricity demand forecasting, Juberias, Yunta, Moreno & Mendivil (1999) constructed an ARIMA model using meteorology to forecast consumers’ electricity demands. More variables are added in ARIMA modeling by Hor, Watson & Majithia (2006), including special days and weather. Bianchi, Jarrett & Hanumara (1998) use ARIMA to forecast the number of calls to telemarketing centers. Based on these examples, it can be seen that ARIMA can be used to deal with practical problems in which the data is not linearly related. There are primarily two types of applications of ARIMA in various fields: one includes influence factors and the other does not. Given the properties of pupils’ lecture selections, non-linearly related data can be assumed to be linear in order to simplify the problem while still providing acceptable results. In addition, explanatory variables can be incorporated in ARIMA, which makes it suitable for forecasting the demand of pupils’ lecture selections.

Therefore, this thesis applies ARIMA as the basic time series model to obtain the baseline forecasting performance, and adds specific influence factors derived from personalized learning to ARIMA in order to investigate their effectiveness.

3.4 Description of ARIMA

In ARIMA, the predicted value of a variable is assumed to be a linear function of several past observations and random errors (Zhang, 2003). Its equation is shown below:

\[ y_t = \theta_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q} \qquad \text{(Equation 1)} \]

where y_t is the forecast value at time t; ε_t is the random error at time t; and

One central objective of ARIMA model building is to determine the parameters. Three steps are required to arrive at the final model. First, data transformation is required to make the time series stationary, which is an essential condition for developing the ARIMA model. If the data are non-stationary, differencing is applied to remove the trend and stabilize the mean of the series, so the order of differencing (d) must be determined. Subsequently, the autoregressive order (p) and the moving average order (q), which is the number of lagged forecast errors in Equation 1, need to be determined. The designed model is then diagnosed to check its adequacy, and it is revised until the residuals resemble white noise. Finally, the diagnosed model is used for forecasting future values.
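The model-building steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software or procedure used in this thesis: it fixes d = 1, p = 1 and q = 0, estimates the AR coefficients by ordinary least squares rather than maximum likelihood, and uses an invented series whose first differences follow an exact AR(1) pattern. All function names are hypothetical.

```python
# Minimal ARIMA(1, 1, 0) sketch using only the standard library.
# All data are invented for illustration; they are not the thesis data.

def difference(series, d=1):
    """Difference the series d times to remove a trend."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def autocorrelation(series, lag):
    """Sample autocorrelation at a given lag (used when judging p and q)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return num / denom


def fit_ar1(w):
    """Least-squares fit of w_t = theta0 + phi1 * w_{t-1} + e_t."""
    x, y = w[:-1], w[1:]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi1 = sxy / sxx
    theta0 = mean_y - phi1 * mean_x
    return theta0, phi1


def forecast_next(series):
    """One-step-ahead forecast of the original series for d = 1."""
    w = difference(series, d=1)
    theta0, phi1 = fit_ar1(w)
    next_change = theta0 + phi1 * w[-1]   # forecast of the next difference
    return series[-1] + next_change       # undo the differencing


# Invented registrations whose first differences follow
# w_t = 1 + 0.5 * w_{t-1} exactly, so the fit recovers those coefficients.
registrations = [10.0, 14.0, 17.0, 19.5, 21.75, 23.875]
theta0, phi1 = fit_ar1(difference(registrations))
print(theta0, phi1, forecast_next(registrations))
```

With real registration data the residuals would not be zero, and their autocorrelations (for example via `autocorrelation`) would be inspected to judge whether they resemble white noise before the model is used for forecasting.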

3.4.1 ARIMA with influencing factors

In order to investigate the effectiveness of influence factors for pupils’ lecture selections, influence factors should be included in an ARIMA model. Since ARIMA already assumes linearity in the time series values, it is feasible to add influence factors to ARIMA by using linear regression. In linear regression analysis, the errors are assumed to be random (Arunraj & Ahrens, 2015). The general ARIMA model with influence factors can be described as follows (Arunraj & Ahrens, 2015; Zhang, 2003):

\[ y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \beta_3 x_{3,t} + \cdots + \beta_k x_{k,t} + \theta_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q} \qquad \text{(Equation 2)} \]

where x_{1,t}, x_{2,t}, …, x_{k,t} are observations of the influence factors corresponding to the dependent variable at time t, and β_0, β_1, …, β_k are the regression coefficients of the influence factors.
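As a rough illustration of Equation 2, the sketch below fits its simplest special case (k = 1, p = 1, q = 0) by ordinary least squares. It is only a sketch under assumptions: the single influence factor is a hypothetical exam-week dummy, the data are invented and generated from known coefficients, and the two constants β_0 and θ_0 are estimated as one intercept c because they cannot be separated from data. The helper names are hypothetical and do not come from any library.

```python
# Sketch of Equation 2 with k = 1, p = 1, q = 0:
#   y_t = c + beta1 * x_t + phi1 * y_{t-1} + e_t,  with c = beta0 + theta0.
# Invented data; standard library only.

def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_arx1(y, x):
    """Least-squares estimates [c, beta1, phi1] via the normal equations."""
    rows = [[1.0, float(x[t]), y[t - 1]] for t in range(1, len(y))]
    targets = [y[t] for t in range(1, len(y))]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical exam-week dummy (1 = exam week) and registrations generated
# from c = 2, beta1 = 3, phi1 = 0.5 without noise.
exam_week = [0, 0, 1, 0, 1, 1, 0, 0]
registrations = [4.0, 4.0, 7.0, 5.5, 7.75, 8.875, 6.4375, 5.21875]
c, beta1, phi1 = fit_arx1(registrations, exam_week)
print(c, beta1, phi1)
```

Because the invented series contains no noise, the estimates match the generating coefficients up to rounding. Comparing the fit with and without the `exam_week` column is the kind of comparison the thesis uses to judge whether an influence factor is effective.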

The forecasting performances of the ARIMA model alone and the ARIMA model with influence factors will be compared to investigate the effectiveness of influence factors; this is conducted in Chapter 5.

3.4.2 Cross-validation

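Theil Inequality Coefficient:

\[ U = \frac{\sqrt{\frac{1}{n}\sum_{t=1}^{n}(Y_t - f_t)^2}}{\sqrt{\frac{1}{n}\sum_{t=1}^{n}Y_t^2} + \sqrt{\frac{1}{n}\sum_{t=1}^{n}f_t^2}} \qquad \text{(Equation 3)} \]

where Y_t is the actual value, f_t is the forecast value and n is the number of forecasts. The value of U always lies between 0 and 1. If U = 0, then Y_t = f_t for all t and there is a perfect fit; if U = 1, the predictive performance is as bad as it possibly could be.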
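The two accuracy measures used here can be computed as in the following sketch. It assumes the bounded form of Theil's inequality coefficient, whose numerator is the root mean squared error and whose denominator is the sum of the root mean squares of the actual and forecast values; this form is consistent with the statement that U lies between 0 and 1. The function names are hypothetical and the numbers in the usage lines are invented.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error between actual and forecast values."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


def theil_u(actual, forecast):
    """Theil inequality coefficient (bounded form, 0 <= U <= 1)."""
    n = len(actual)
    scale = (math.sqrt(sum(a * a for a in actual) / n)
             + math.sqrt(sum(f * f for f in forecast) / n))
    return rmse(actual, forecast) / scale


# Invented registrations and forecasts for two hypothetical models.
actual = [12, 15, 11, 14]
without_factors = [10, 13, 13, 12]
with_factors = [12, 14, 11, 15]
print(theil_u(actual, without_factors), theil_u(actual, with_factors))
```

A lower U (or RMSE) for the model with influence factors would indicate that the factors improve the forecast on that sample.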

RMSE: $$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}(f_t - Y_t)^2}$$ (Equation 4)

where, in Equation 4, $f_t$ is the forecast value, $Y_t$ is the actual value and n is the number of observations.

3.5 Summary of model selection

In Chapter 3, the forecasting models used in a service context, namely qualitative methods, time-series methods and causal methods, are compared. Based on their advantages and disadvantages, the time-series approach is selected for the context of pupils’ lecture selection. Multiple models belong to the time-series family. In line with the characteristics of the pupils’ lecture selection context and with previous research, ARIMA is selected as the starting point for forecasting pupils’ lecture selections. Furthermore, the general ARIMA model with and without influence factors is illustrated and explained, which helps to answer Q3.

4. Description and Analysis of Data

The aim of this chapter is to discover the influencing factors that occur in the pilot school, to assist in answering RQ3. First, the data is described and then the influence factors are investigated separately.

4.1 Description of data


11 pupils in 6th. The data set includes information about the registration and attendance of each pupil for various lectures. Within the registration record of each pupil, the time spot of the lecture and the place where it will be offered are identified. The mid-exam periods are also identified: 28/10/2015-03/11/2015 and 13/01/2016-19/01/2016. Furthermore, the various lectures are represented in the data set by the name of the teacher. When performing the analysis, the names of teachers are replaced by the numbers 1-20 to ensure anonymity (separate attachment). The analysis of influencing factors is based on the data provided over the period 28/08/2015-16/12/2015.

4.2 Data analysis

First, a general analysis of the data is conducted to exclude the lectures for which there is a lack of information. Then, the difference among weekdays found in the description of the data is analyzed by comparing the total registrations of pupils on weekdays; it is then tested using ANOVA to examine the significance of the differences among weekdays. Corresponding to Table 3 in Chapter 2, the influence of time period on pupils’ lecture selections, including the difference between AM and PM and the impact of the exam, is investigated. The popularity (preference) of a teacher is also checked by comparing the number of pupil registrations for each available teacher. Finally, the grades are examined to investigate the correlation between grades and pupil registrations for a lecture.

In order to discover the influencing factors for pupils’ lecture selections, this thesis analyzes the lectures provided by different teachers separately. Figure 2 illustrates the sum of pupils going to all lectures, with the time spot of the lecture on the x axis. This time series is highly stochastic and it is difficult to observe any patterns. Therefore, the attendance trend for lectures provided by different teachers is investigated separately. As shown in Figure 3, it is difficult to find a general pattern of attendance, since the total number of registrations and the availability of each lecture differ greatly.


Figure 3. Registration of individual teacher

In order to gain more insight into the registration patterns of single lectures, which are represented in this thesis by teachers, some lectures that lack information are excluded. From Figure 4 and Figures 2 through 21 in Appendix A, it can be seen that some teachers did not provide lectures over a relatively long period, and some provided only a few lectures over a three-month period. This thesis therefore excludes the data for the teachers numbered 10 through 20. That these teachers provide only a few lectures may be due to the assignment of teachers to different classes and the schedule planning of the school. Teacher 11 mainly provides lectures for pupils in lower grades, according to the availability of teachers. Teacher 10 provides lectures for all grades but gives only a few lectures to pupils from 3rd to 6th grade, according to the assignment and availability of the teacher. Furthermore, Teachers 12-20 provide significantly fewer lectures in a three-month period, based on the registration and attendance history. Therefore, only Teachers 1 through 9, who have rich data, are used in further analyses.

Figure 4. Excluded teachers

Based on the general analysis, this thesis primarily focuses on influence factors and the forecasting of registrations for lectures provided by Teachers 1 through 9. As stated earlier in Chapter 2, this thesis attempts to assimilate the effects of pupils’ lecture selection influence factors. Nonetheless, the effects of most influencing factors, such as the difficulty of subjects and the interests of pupils, are not taken into account due to the lack of information. Only the factors of weekday, mid-exam, AM and PM, preference of teacher, and grades are investigated using the data available. The considered effects are described and investigated in the following sections.

[Chart: Registration of separate teacher, by lecture time spot from 2015082612 to 2015121811]

4.2.1 Weekday analysis

Since the school is closed on Saturdays and Sundays, only the weekdays Monday through Friday are used in the calculations. The number of pupils attending on each weekday is summed. The highest number of registrations and attendance over all lectures in the three months occurs on Wednesdays (Figure 5). To investigate whether this peak was caused by the availability of teachers, the total availability of teachers provided by the pilot school was counted per weekday: 9, 12, 10, 7 and 8 units from Monday through Friday respectively. This does not follow the same trend as registration, so the peak day is not caused by the availability of teachers. Hence, it is essential to investigate the weekday influence.

Figure 5. Total registration of all lectures and availability of teachers on weekdays

In order to check the weekly pattern, the selected lectures, which were provided by their teachers continuously over the three months, are investigated. It can be seen in Figure 6 that Teachers 2, 5, 6, and 7 have the highest total number of pupil registrations for lectures on Wednesday, and Teachers 1, 4, 8, and 9 received the highest total number of pupil registrations on Tuesday. Lectures of Teacher 3, which are available only on Friday and Wednesday, have the highest number of pupil registrations on Friday. Exact numbers of the registrations per teacher on weekdays can be found in Table 1 in Appendix B.


Figure 6. Total registrations of selected lectures on weekdays

In order to obtain more in-depth insights and to determine whether there is a statistically significant difference among weekdays, a one-way analysis of variance (ANOVA) at the 95% confidence level is conducted on pupil registrations per weekday. ANOVA is used to determine whether there are any significant differences between the means of two or more independent groups, which makes it suitable for the weekday analysis.
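The quantity behind the one-way ANOVA can be sketched as follows: the F statistic compares the variation between weekday means with the variation within each weekday. The registration counts are hypothetical and the function name is illustrative. Turning F into the significance value reported below requires the F distribution, for which a statistics package (for example `scipy.stats.f_oneway`) would normally be used.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic and its two degrees of freedom."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical registrations per lecture on three weekdays.
monday = [1, 2, 3]
tuesday = [2, 3, 4]
wednesday = [6, 7, 8]
print(one_way_anova_f([monday, tuesday, wednesday]))
```

A large F means that the weekday means differ by more than the spread within a weekday would explain, which is the situation described as a peak day.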

Table 7. Result of ANOVA test for Teacher 1

Teacher  1      2      3      4      5      6      7      8      9
Sig.     0.067  0.001  0.006  0.054  0.015  0.338  0.00   0.011  0.325

Table 8. Result of ANOVA test for Teachers 1 through 9

Based on the results of the ANOVA test shown above and in Appendix B, for Teacher 1, for example, the mean of Tuesday is higher than that of Monday. However, the p-value (the value in the last column) is larger than 0.05, so the difference is not significant and there is no peak day for Teacher 1. In conclusion, considering the fixed schedules of teachers and all information obtained from the data, Teachers 2, 5 and 7 experience their peak day on Wednesday of every week, Teacher 3 on Friday and Teacher 8 on Tuesday. For the others, Teachers 1, 4, 6 and 9, weekdays have no influence on pupils’ registration because the ANOVA test did not show any significant differences. Therefore, it is reasonable to add the weekday influence factor to the forecast model for Teachers 2, 3, 5, 7 and 8. Arunraj & Ahrens (2015) give an example of adding influencing factors to the ARIMA model: the average sales of bananas during the summer months are higher than in other months, so the researchers add a dummy variable coded as 1 for summer months and 0 for other months. Similarly, in this thesis there is a difference in registrations within each week. Therefore, it is feasible to add a dummy variable coded as 1 for peak days and 0 for the other days.
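The peak-day coding can be sketched as below. The sketch assumes that each lecture time spot is written as a ten-digit code of date and hour (YYYYMMDDHH), the coding used for time spots in this thesis; the function name and the constant are illustrative.

```python
from datetime import datetime

WEDNESDAY = 2  # datetime.weekday(): Monday = 0, ..., Sunday = 6


def peak_day_dummy(time_spot, peak_weekday):
    """Return 1 if the lecture time spot falls on the peak weekday, else 0."""
    day = datetime.strptime(time_spot, "%Y%m%d%H")
    return 1 if day.weekday() == peak_weekday else 0


# 26 August 2015 and 2 September 2015 are Wednesdays, 31 August 2015 is a Monday.
time_spots = ["2015082612", "2015083112", "2015090215"]
print([peak_day_dummy(s, WEDNESDAY) for s in time_spots])  # [1, 0, 1]
```

For a teacher whose peak day is Friday or Tuesday, only the `peak_weekday` argument changes; the rest of the coding stays the same.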

4.2.2 Mid-term pattern.


From the aspect of teachers, the analysis shows that the attendance rates of lectures provided by Teachers 4, 5, 6, 7 and 9 are lower after an exam than before it, whereas the attendance rates of Teachers 1 and 3 increased. However, since the number of absent pupils is extremely low and the registration record is the most accurate record, it is better to use the number of registrations as the measurement. For Teacher 5, the average registration after the exam was 9 lower than before the exam. For Teacher 9, the average registration decreased by 10. For the other teachers, the average registration is very stable before and after the exam (see the exam analysis from the teacher aspect in Appendix C). In general, it is difficult to find a significant influence of the exam on attendance rates, since the differences are quite small. For registrations, only Teachers 5 and 9 show relatively large differences in average registrations before and after the exam, of approximately 10.

From the aspect of subject (Appendix C, Table 1), Teachers 1 and 7 teach physics and science, Teachers 4, 8 and 9 teach math, and Teachers 2, 5 and 6 teach biology. For physics and science and for biology, the average registrations remain relatively stable, with a difference smaller than 6. For math, however, the average registration fell by 14 after the exam. Inspection of the data shows that this is mainly caused by Teacher 9, whose average registration fell by around 10 after the exam.

Thus, for Teachers 5 and 9, there are fluctuations in the average registration that can be added to the forecasting model. According to Arunraj & Ahrens (2015), a holiday has an effect on the sales of bananas and can be incorporated in the model by coding holidays as 1. The exam week in this thesis has similar properties to a holiday: it is not a regular time in the time series, and the registrations, like the sales, change as a result of the special time spot. Therefore, the exam factor for Teachers 5 and 9 can be incorporated as a dummy variable in the same way.

4.2.3 AM and PM


Table 8. The investigation of the preference for AM and PM through comparison of the rate.

In conclusion, for Teachers 1, 3, 5, 6 and 7, pupils prefer to attend in the PM rather than the AM. For Teachers 2, 8 and 9, pupils are more likely to choose lectures in the AM. It is therefore feasible to add this influencing factor to the forecasting model. For example, when forecasting the registrations of Teachers 1, 3, 5, 6 and 7, the dummy variable is coded as 1 when the registrations are in the PM, similar to the coding of the weekday effect. Note that for Teacher 4, the information from the teachers’ schedule is not consistent with the actual situation. Therefore, it is impossible to investigate the influence of AM and PM for Teacher 4.

4.2.4 Preference for Teachers

For the subjects that were taught by more than one teacher, differences among the teachers providing the lectures are observed.

Table 9. The investigation of teacher preference by comparing the utilization of single availability.

It can be seen in Table 9 that, for Math, there is no preference among teachers, since the number of pupils registered per unit of teacher availability is nearly the same. For Physics/Science, Teacher 7 is the most popular teacher: despite limited availability, this teacher has the highest number of registrations per unit of availability (and most of the availability of Teacher 7 is in the AM). For Biology, Table 9 shows that Teacher 5 is the most popular teacher. Therefore, it is reasonable to add the preference for teachers to the forecasting model for Teachers 5 and 7, with the dummy variable coded as 1.

4.2.5 Grades


after clustering the grades of all pupils and the total registrations of lectures, Table 2 in Appendix C is obtained. In order to determine the relationship, a correlation test between the sum of registrations and grades is conducted. The correlation is only -0.06, which is quite weak (correlation ranges from -1 to 1). So there is no meaningful relationship between grades and the registrations for a subject. Therefore, the grade factor is not added to the model.
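The correlation test can be illustrated with a short sketch of the Pearson correlation coefficient, assuming this is the correlation measure meant here. The grades and registration totals below are hypothetical and only show how a value near 0 differs from values near -1 or 1.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical average grades and registration totals per pupil group.
grades = [5.5, 6.0, 6.5, 7.0, 7.5]
registrations = [14, 9, 15, 10, 13]
print(pearson_r(grades, registrations))
```

A coefficient close to 0, as reported above, means that knowing a pupil's grade says almost nothing about how often that pupil registers.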

4.3 Summaries of data analysis

Based on the analyses described above, the registrations of different teachers have different influencing factors, and it is infeasible to use one general model to forecast the registrations of different lectures (teachers). According to Chapter 3, there is a general model that can contain different factors, so it is feasible to build specific models for individual teachers to answer Q3. Table 10 below summarizes the factors for each teacher in order to help build the models used for forecasting.

Teacher  Weekday effect  Exam effect   AM and PM  Preference of teachers              Grades
1        no              no            1 when PM  0 when forecasting Physics/science  no
2        1 on Wed        no            1 when PM  0 when forecasting Biology          no
3        1 on Fri        no            1 when PM  no                                  no
4        no              no            no         no                                  no
5        1 on Wed        1 after exam  1 when PM  1 when forecasting Biology          no
6        no              no            1 when PM  0 when forecasting Biology          no
7        1 on Wed        no            1 when PM  1 when forecasting Physics/science  no
8        1 on Tuesday    no            1 when AM  no                                  no
9        no              1 after exam  1 when AM  no                                  no

Table 10. Influencing factors in the model (entries give the dummy coding; "no" means the factor is not included).


be investigated to check the effectiveness of influencing factors during the forecasting process. Two re-divided data sets would be used to measure performance, one containing 70% of the observations and the other the remaining 30%.

5. Application of ARIMA model

In order to completely answer Q3 and obtain the results for Q4, Chapter 5 primarily focuses on developing an ARIMA model and applying the designed model to the given data sets. First, the general model for forecasting pupils’ lecture registration with influence factors is presented, based on Chapters 3 and 4. Subsequently, specific models for individual teachers are designed to obtain forecasting performances and to perform validation. Finally, in Section 5.3, the performance of models with different influence factors and for different teachers is compared to test the effectiveness of the influencing factors, in order to answer Q5. To explain the forecasting process, Teacher 5 is selected as an example.

5.1 ARIMA model in pupils lecture registration context

In this section, the general forecasting model in pupils lecture registration context would be illustrated. Then specific models are built based on table 10. As mentioned in chapter 3.4, the general ARIMA model with influencing factors could be described as below:

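$$y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \beta_3 x_{3,t} + \cdots + \beta_k x_{k,t} + \theta_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q}$$ (Equation 2)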

Applying the general model in the context of pupils’ lecture selection, the model contains four influence factors, represented as $x_{w,t,i}$, $x_{e,t,i}$, $x_{ampm,t,i}$ and $x_{pre,t,i}$, where i represents Teachers 1 through 9 and w, e, ampm and pre represent weekday, exam, AM/PM and preference of teacher respectively. As mentioned in the data description, the combined entire data set from 28/08/2015-07/04/2016 is re-divided into two sets: a sample set containing 70% of the observations and a validation set containing the remaining 30%. The sets of the model therefore contain the sample registration, the validation registration and the teacher i from 1 to 9. To make the time spots easy to reference, each is written as a ten-digit code of date and hour: for example, 12:00 on 26 August 2015 is represented as 2015082612, and likewise for other time spots. Applying the general ARIMA model with influencing factors to the context of pupils’ lecture selection gives the model below:

Objective:

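$$y_t = \beta_0 + \beta_1 x_{w,t,i} + \beta_2 x_{e,t,i} + \beta_3 x_{ampm,t,i} + \beta_4 x_{pre,t,i} + \theta_0 + \phi_1 y_{t-1,i} + \phi_2 y_{t-2,i} + \cdots + \phi_p y_{t-p,i} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q}$$ (Equation 5)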

Sets:

i = teacher {1,2,3,4,5,6,7,8,9}

Sample Registration$_{t,i}$ = first 70% of the registration observations in {2015082612, …, 2016040715}

Validation Registration$_{t,i}$ = remaining 30% of the registration observations in {2015082612, …, 2016040715}

Variables:

$x_{w,t,i}$: weekday effect of teacher i (peak day, e.g. Wed = 1; other days = 0)
$x_{e,t,i}$: exam effect of teacher i (after exam = 1, before exam = 0)
$x_{ampm,t,i}$: AM and PM effect of teacher i (PM = 1, AM = 0)
$x_{pre,t,i}$: preference of teacher i (preference = 1, no preference = 0)

Parameters:

$\beta_1$: regression coefficient of $x_{w,t,i}$
$\beta_2$: regression coefficient of $x_{e,t,i}$
$\beta_3$: regression coefficient of $x_{ampm,t,i}$
$\beta_4$: regression coefficient of $x_{pre,t,i}$
$\phi_p$: parameters of the autoregressive order of the model
$\theta_q$: parameters of the moving average order of the model
$\varepsilon_t$: random error at time t

Based on Table 10, different teachers have different influence factors. Since the forecasting objective is the registration for a single teacher, the preference of teacher is not taken into account. The specific models for the different teachers are presented separately in Table 11.

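Teacher 5: $y_t = \beta_0 + \beta_1 x_{w,t,5} + \beta_2 x_{e,t,5} + \beta_3 x_{ampm,t,5} + \theta_0 + \phi_1 y_{t-1,5} + \phi_2 y_{t-2,5} + \cdots + \phi_p y_{t-p,5} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q}$

Teacher 6: $y_t = \beta_0 + \beta_3 x_{ampm,t,6} + \theta_0 + \phi_1 y_{t-1,6} + \phi_2 y_{t-2,6} + \cdots + \phi_p y_{t-p,6} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q}$

Teacher 7: $y_t = \beta_0 + \beta_1 x_{w,t,7} + \beta_3 x_{ampm,t,7} + \theta_0 + \phi_1 y_{t-1,7} + \phi_2 y_{t-2,7} + \cdots + \phi_p y_{t-p,7} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q}$

Teacher 8: $y_t = \beta_0 + \beta_1 x_{w,t,8} + \beta_3 x_{ampm,t,8} + \beta_4 x_{k,t,8} + \theta_0 + \phi_1 y_{t-1,8} + \phi_2 y_{t-2,8} + \cdots + \phi_p y_{t-p,8} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q}$

Teacher 9: $y_t = \beta_0 + \beta_2 x_{e,t,9} + \beta_3 x_{ampm,t,9} + \theta_0 + \phi_1 y_{t-1,9} + \phi_2 y_{t-2,9} + \cdots + \phi_p y_{t-p,9} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q}$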

Table 11. different models for teacher 1 through 9

5.1 Developing an ARIMA model

In this section, an ARIMA model is developed following the instructions of Chapter 3 to help answer Q4. Teacher 5, who has the most influencing factors, is selected as an example to explain the forecasting process.

As mentioned above, the initial step is to develop an ARIMA model based on the sample data. The sample data for Teacher 5 are the first 70% of the entire data set, from 12:00 on 26 August 2015 to 12:00 on 29 February 2016 (2015082612-2016022912). They are used to define the parameters and then to forecast the period from 13:00 on 29 February 2016 to 15:00 on 7 April 2016 (2016022913-2016040715), which is the validation data set used to measure the TIC and RMSE. As mentioned in Section 3.4, the smaller the TIC and the RMSE are, the better the forecasting performance is.
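The chronological 70/30 division and the time spot coding can be sketched as follows. The helper names are hypothetical, and the sketch assumes that the observations are already sorted by time spot, so that the validation set always lies after the sample set.

```python
from datetime import datetime


def parse_time_spot(code):
    """Convert a ten-digit YYYYMMDDHH time spot code into a datetime."""
    return datetime.strptime(code, "%Y%m%d%H")


def chronological_split(observations, sample_percent=70):
    """Split time-ordered observations into a sample set and a validation set."""
    cut = len(observations) * sample_percent // 100
    return observations[:cut], observations[cut:]


# A series of 353 observations gives 247 sample and 106 validation
# observations, the counts reported for Teacher 5.
sample, validation = chronological_split(list(range(353)))
print(len(sample), len(validation))  # 247 106
print(parse_time_spot("2016022912"))  # 2016-02-29 12:00:00
```

The split is not shuffled: in time series forecasting the model must only see observations that precede the period it is asked to predict.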

According to Section 3.4, there are three steps to develop a complete ARIMA model by defining the parameters (d, p, q):

• d is the order of differencing applied to non-stationary data;

• p is the autoregressive order;

• q is the moving average order.


model built following the above three steps would be used for forecasting. Finally, the TIC and RMSE values would be investigated to compare the performance of the forecasting models.

For teacher 5, the sample set contains 247 samples and validation set contains 106 samples. First, registration amount in sample data set of teacher 5 in figure 7 shows that it is not stable.

Figure 7. Registration amount in sample data set of teacher 5

To check more rigorously whether the series is stationary, the Augmented Dickey–Fuller (ADF) test was conducted, which indicates whether differencing is needed. In the ADF test, the null hypothesis is that the series has a unit root, i.e. that it is non-stationary; it is tested at a significance level of 0.05. From the test result (Table 12), it can be seen that the null hypothesis cannot be rejected for the original series (0.2121 > 0.05), so the time series is non-stationary. After one level of differencing the sample data become stationary, which means that d = 1.

Table 12. Augmented Dickey Fuller test with 0 difference and 1 difference

The autocorrelation and partial autocorrelation of the sample set were investigated to help identify the orders of the ARIMA model (Table 2 in Appendix E). To compare the quality of models with different p and q values more directly, the Akaike information criterion (AIC) was used to select the best p and q values. The AIC is a measure of the relative quality of models for a given set of data and can therefore serve as a criterion for model selection. It yields an unbiased estimate of the quantity that measures the average fit of the estimated model (Findley & Wei, 2002). Based on the AIC results (Table 13), ARIMA (0,1,1) has the smallest AIC value and is identified as the best model for the sample data set of Teacher 5.
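The selection rule can be sketched as below, assuming the common definition AIC = 2k - 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood; statistical packages may scale this value differently. The log-likelihood values are invented for illustration and are not the results of this thesis.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion in its common form: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def select_order(fits):
    """Return the model order with the lowest AIC.

    fits maps an order tuple to (log_likelihood, n_params).
    """
    return min(fits, key=lambda order: aic(*fits[order]))


# Hypothetical fit results for three candidate orders.
candidate_fits = {
    (0, 1, 1): (-830.2, 2),
    (1, 1, 0): (-835.9, 2),
    (1, 1, 1): (-829.8, 3),
}
print(select_order(candidate_fits))
```

In this made-up example the third candidate has the highest likelihood but is not selected, because the penalty for its extra parameter outweighs the small gain in fit.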


Table 13. Akaike value of models

Then, following the three steps of model development, the white noise diagnosis was conducted using the correlogram Q-statistics. For a model to be acceptable, no discernible information should be left in the residuals. A convenient test for white noise is the correlogram Q-statistics test, which is applied to the residuals: a model is acceptable when the null hypothesis that the residuals are white noise cannot be rejected (Jaggia, 2014). As can be seen from Figure 8, the (0,1,1) model did not pass the diagnosis: all probabilities of the Q-statistics equal 0.00, which means that some information is still hidden in the residuals. After multiple experiments, the model with parameters (1,1,1) passed the white noise diagnosis with a relatively small AIC value. Hence, ARIMA (1,1,1) is chosen as the best model. Its adjusted R-square, which represents the fit between the model and the actual values, is 0.5058.
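The idea behind the residual diagnosis can be sketched with the Ljung-Box Q-statistic, assuming this is the form of Q-statistic reported in the correlogram. The residuals below are hypothetical and deliberately alternate in sign, so that a clear pattern is left in them. Converting Q into the probability shown in a correlogram requires the chi-square distribution, which is omitted here.

```python
def autocorrelation(residuals, lag):
    """Sample autocorrelation of the residuals at the given lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    denominator = sum((e - mean) ** 2 for e in residuals)
    numerator = sum(
        (residuals[t] - mean) * (residuals[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator


def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic: Q = n (n + 2) * sum over k of r_k^2 / (n - k)."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# Hypothetical residuals that alternate in sign (strong negative autocorrelation).
patterned = [1, -1, 1, -1, 1, -1, 1, -1]
print(ljung_box_q(patterned, max_lag=1))  # 8.75
```

The larger Q is, the stronger the evidence that the residuals still contain structure; for truly white noise residuals the autocorrelations, and therefore Q, stay small.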

Figure 8. white noise diagnose for model (0,1,1) and (1,1,1) respectively

After the three steps above, the ARIMA model for Teacher 5 without influencing factors is ARIMA(1,1,1). With the orders (1,1,1) fixed, the coefficients can be estimated; they are shown in the “coefficient” column of Table 14.

Table 14. Parameters of model

After developing the model, the period 2016022913-2016040715 was predicted and compared with the actual validation data set. The forecasting performance is shown in Figure 9 and Table 15.

Figure 9. forecasting performance

Table 15. TIC and RMSE and Adjusted R-square for teacher 5 without influencing factors.

From Figure 9 and Table 15, it can be seen that the TIC of the model is 0.4 and the RMSE is 7.25.

5.2 ARIMA model with influencing factors

As mentioned before, the models in Table 11 are implemented to investigate the effectiveness of the influencing factors. In this section, Teacher 5 is again selected as an example to explain the forecasting process with influencing factors. The forecasting performance for the other teachers is then illustrated in Section 5.3.

As in Section 5.1, Teacher 5 is tested first as an example. In the forecasting process, in order to check the significance of the influence factors, seven rounds of experiments with different combinations of factors are conducted, as listed in Table 16. As can be seen in Table 16, no influence factor shows a significant influence during forecasting, which differs from the result of the data analysis; among the factors, AM and PM is the strongest for Teacher 5. This might be caused by the difference between the data sets. The data set used in the data analysis is only part of the sample data set, so once more data are available, an influence factor identified earlier might disappear or weaken. However, the adjusted R-square of the models, which represents the fit between the model and the actual values, shows that the best fit is obtained when only the AMPM factor is added.

Teacher 5

Model  ARIMA with     Adjusted R-square  Prob. of factors
1      AMPM           0.5103             0.1011
2      E              0.5045             0.8088
3      W              0.5049             0.4548
4      W and AMPM     0.5092             0.1079; 0.5024
5      AMPM and E     0.5091             0.0985; 0.7706
6      E and W        0.5036             0.8168; 0.4680
7      AMPM, W and E  0.5080             0.1055; 0.5195; 0.7748
8      No factors     0.5058

Table 16. Adjusted R-square values for different combinations of influence factors in prediction.

Figure 10. forecasting performance of teacher 5 with AMPM

It can be seen in Figure 10 that the TIC and RMSE are 0.4038 and 7.104743 respectively. The forecasting performance of the best-fit model with the AMPM influence factor is compared with that of the model without influence factors in Table 17. The performance is slightly improved: the TIC stays about the same and the RMSE decreases slightly.

[Figure 10 plot: forecast YF with ± 2 S.E. bands against actual Y; forecast sample 247 to 353, 107 observations included]

Table 17. TIC and RMSE of ARIMA model and ARIMA with influence factors for teacher 5.

5.3 Comparison with other teachers

In this section, the other teachers are tested to see whether the results are consistent. First, the performance for the different teachers without influence factors is listed in Table 18.

         Teacher 1  Teacher 2  Teacher 3  Teacher 6  Teacher 7  Teacher 8  Teacher 9
(d,p,q)  (2,0,4)    (4,1,5)    (4,0,3)    (2,1,3)    (2,1,3)    (1,0,1)    (2,1,2)
TIC      0.484      0.50       0.6        0.53       0.41       0.38       0.38
RMSE     3.85       6.03       7.19       5.04       5.31       4.55       4.28

Table 18. Forecasting performance of teachers without influencing factors

                Teacher 1  Teacher 2   Teacher 3  Teacher 6  Teacher 7   Teacher 8  Teacher 9
Best fit model  AMPM       AMPM and W  AMPM       AMPM       AMPM and W  W          AMPM
TIC             0.338      0.48        0.51       0.54       0.44        0.37       0.37
RMSE            3.14       5.93        6.37       5.17       5.76        4.50       4.147

Table 19. Best fit model forecasting performance of teachers with influencing factors


significant impact on forecasting performance and either do not improve the fitness of model and actual value.

Figure 11. significance of factor Weekday of teacher 2 and 7

Figure 12. significance of factor AMPM in selected model of teacher 1 and 3

According to the forecasting performances, the performance changes when the influence factors are added; however, the changes are slight and go both upward and downward. As for the effectiveness of individual influence factors during the forecasting process, only the AMPM and Weekday factors show significance for some of the teachers; the exam period factor did not show a significant influence on the forecasting process.

6. Discussions

6.1 Effectiveness of influence factors

In the ARIMA model with influence factors, the influence factors show different levels of impact on the forecasting process and on forecasting performance.

First, within the forecasting process, the AMPM and Weekday influence factors have significant influence in some forecasting models, but the exam period does not affect the forecasting process. This can also be seen from the data analysis: for most teachers, the average registrations before and after the exam differ little, which could be the reason that the exam period is not significant during the forecasting process. For Teachers 5 and 9 the difference is relatively large, yet the exam period still does not significantly influence the forecasting process. The reason might be the effect of other influence factors that are not investigated in this thesis due to lack of information, for instance advice from the coach: after the mid-term exam, the coach’s advice, based on the pupils’ mid-term grades, might play an important role in pupils’ lecture selection. The AMPM factor shows significant influence on the forecasting process of Teachers 1 and 3 (Prob. = 0.0013 and 0.0000 respectively), and the Wed factor on that of Teachers 2 and 7 (Prob. = 0.00 and 0.0003). According to the data analysis, the registrations of all teachers except Teacher 4 are influenced by the AMPM factor, and the registrations of more than half of the teachers are affected by the Weekday factor. However, these factors did not affect the forecasting process as much as the data analysis suggested. One reason might be that the sample data set is larger than the first data set used in the data analysis, so that the influence of a factor is weakened in the sample data set. Another reason could be the influence of uninvestigated factors.
So, for individual influence factors in the forecasting process, AMPM and Weekday are the more effective factors and do play an active role, whereas the exam period factor is inactive.


the biases from the data analysis stage also decrease the effectiveness of the defined factors. Furthermore, during the data gathering process, some mistakes and missing data were found. If the model could be tested in school every day and validated with updated data, the performance might be much more accurate and the effectiveness of influence factors might be better revealed. As for the TIC, the changes in TIC value before and after adding the influencing factors are smaller than 0.07. Hence, the influence factors do improve the performance of the forecasting model somewhat, but not as significantly as expected. The models with influencing factors do not clearly outperform the pure time-series ARIMA model.

Furthermore, when investigating the models, interactions between the influence factors were found. When modelling Teacher 7, the significance of AMPM increased when the Weekday influence factor was added. As discussed before, the exam period factor did not play an active role during the forecasting process, but it interacted with the AMPM and Wed factors and influenced forecasting performance. For Teachers 5 and 9, after adding the exam period factor, the significance of the other factors changed and so did the forecasting performance. For example, when the exam factor was added to the model of Teacher 9 together with AMPM, the significance of AMPM was lower than when only AMPM was added, while the forecasting performance improved. This can be explained by the coexistence of an AM/PM preference and a fluctuation of registrations around the exam period. In this case, AMPM and exam period might overlap; for instance, more pupils may attend lectures after the exam and, at the same time, in the afternoon. These two factors together therefore improved the performance. In summary, the significance of an influence factor changes when it is combined with different factors; the factors influence each other and, together, the forecasting performance.

7. Conclusion

The aim of this thesis was to investigate the effectiveness of influencing factors in forecasting pupils’ lecture registration. To answer the main question, this thesis divided it into 5 sub-questions.
