
Recommender System Techniques for Computerized Adaptive Testing

Casper E.C. Broertjes
10421300

Bachelor thesis
Credits: 12 EC

Bachelor Opleiding Informatiekunde
University of Amsterdam
Faculty of Science
Science Park 904
1098 XH Amsterdam

Supervisor
dr. Maarten J. Marx
Informatics Institute (ILPS)
Faculty of Science
University of Amsterdam
Science Park 904
1098 XH Amsterdam


1 Abstract

Computerized Adaptive Testing (CAT) systems provide a form of computer-based testing that adapts the difficulty of a question to the ability of its user. A CAT system consists of several components, one of which is the prediction of student performance. In this research we attempt to use recommender system techniques to aid this component. We do this by recreating previous work by Thai-Nghe et al. (2010), who made use of state-of-the-art recommender system techniques such as collaborative filtering to predict student performance, which we also implement. In addition, we propose the use of content-based recommender system techniques, making use of a dataset provided by the KDD Cup 2010 Data Mining Challenge (Stamper et al., 2010), as well as a hybrid approach that combines content-based and collaborative filtering techniques. We evaluate our predictions by calculating the root mean squared error (RMSE). Our results show that both content-based and hybrid recommender techniques can aid the student performance prediction component of a CAT system.


Contents

1 Abstract
2 Introduction
3 Related Work
  3.1 Recommender Systems
    3.1.1 Collaborative Filtering
    3.1.2 Content-based systems
    3.1.3 Hybrid systems
  3.2 Educational Recommender Systems
  3.3 Computerized Adaptive Testing
4 Method
  4.1 Collaborative Filtering Research Replication
    4.1.1 Dataset Description
    4.1.2 Collaborative Filtering Recommender Techniques
  4.2 A Simple Baseline
  4.3 Content-based Recommender Techniques
  4.4 Hybrid Recommender System
5 Results
  5.1 Collaborative Filtering Results
  5.2 Content-based results
  5.3 Hybrid Results
  5.4 Compared Results
6 Conclusions and discussion
References


2 Introduction

Recommender systems have proven to be useful in many different areas, such as movies, music, news and books. With the emergence of Massive Open Online Courses (MOOCs) a new way of online learning has started, in which a student's activity is tracked in high detail. As a result, computerized adaptive testing (CAT) can prove to be more and more effective, as more data becomes available about examinees. CAT systems try to adjust the difficulty of a question to the ability of an examinee. This research attempts to aid CAT systems in predicting student performance on a given question by using recommender system techniques.

Traditional recommender systems and CAT systems can be seen as similar, since both try to predict whether or not a given item can be considered "good" for a user. For example, movie recommender systems try to predict whether or not a user will like a certain movie. In comparison, CAT systems try to predict whether or not a user will answer a given question correctly. Both scenarios use a user's past activity to predict future behaviour. It therefore seems logical to use recommender system techniques in this domain. Not much research has been done in this area, besides the work by Thai-Nghe et al. (2010). In that work, several recommender system techniques such as collaborative filtering and matrix factorization were proposed to aid CAT systems in their student performance prediction task, and almost all techniques proved accurate in their predictions. However, several popular techniques have not yet been researched, such as content-based recommender techniques. This paper tries to answer one main research question:

RQ: Can content-based recommender system techniques aid the student performance prediction component of computerized adaptive testing systems?

This question will be answered through several sub-questions:

SQ1: Can we replicate the collaborative filtering recommender system created by Thai-Nghe et al. (2010)?

SQ2: Can content-based recommender systems compete with state-of-the-art recommender techniques to predict student performance on a given question?

SQ3: Can hybrid recommender systems compete with state-of-the-art recommender techniques to predict student performance on a given question?

In order to answer SQ1, we will attempt to recreate several techniques used in the work by Thai-Nghe et al. (2010). From this point on, we refer to their work as the replication study. Specifically, their collaborative filtering techniques will be recreated, trying to obtain results similar to theirs. We will also discuss to what extent it is possible to recreate their work, given the available information about how they conducted their research. Once this replication is done, some techniques they did not use will be added. For SQ2 this means a content-based recommender system will be created, and for SQ3 a hybrid system will be created.

For this research we use a large dataset provided by the KDD Cup 2010 Data Mining Challenge (Stamper et al., 2010). This dataset contains 8 million records of questions answered by approximately 3500 students. Each record states who answered the question, which question it was, whether or not it was answered correctly, and the skills the question demands. The dataset contains much more information, which was not used.

We use recommender systems to solve the task of predicting whether or not a student will answer a given question correctly. We evaluate our recommender systems by creating a training set and a test set, and by using a common measure, the root mean squared error (RMSE), to quantify the accuracy of our predictions. If recommender systems prove to be accurate, this could indicate that they can aid CAT systems, which would then require further research.

3 Related Work

In this section, previous work in several fields is discussed. The first subsection covers recommender systems in general and explains the different types of recommender system methods. The next subsection discusses recommender systems in the educational context, giving an overview of current applications of educational recommender systems. The last subsection discusses CAT systems in more detail.

3.1 Recommender Systems

Typically, recommender systems gather information on the preferences of their users. This is done either explicitly (through ratings) or implicitly (by monitoring behaviour, for example in an online environment). Most research papers focus on movie recommender systems (Carrer-Neto et al., 2012), but much has also been written about other topics such as music (Lee et al., 2010), television (Yu et al., 2006), books (Núñez-Valdéz et al., 2012), e-learning (Zaíane, 2002) and many more. Recommender systems can base their recommendations on three different filtering mechanisms: collaborative filtering, content-based filtering and hybrid filtering, where hybrid recommender systems combine different filtering techniques. In short, these filtering mechanisms are (Adomavicius & Tuzhilin, 2005):

• Collaborative recommendations: The user is recommended items that users with similar tastes have rated highly.

• Content-based recommendations: The user is recommended items similar to those the user has rated highly in the past.

• Hybrid recommendations: A combination of the above mechanisms is used to establish a recommendation.


In the following subsections we describe these three filtering mechanisms in more detail.

3.1.1 Collaborative Filtering

A promising recommender system technique (relevant for SQ1) is collaborative filtering (CF) (Su & Khoshgoftaar, 2009). Many CF systems have been developed, for both academic and industrial purposes. The first was the Grundy system (Rich, 1979), which used stereotypes to recommend relevant books to its users. The first systems to automatically predict ratings were GroupLens (Konstan et al., 1997), Video Recommender (Hill et al., 1995) and Ringo (Shardanand & Maes, 1995). Another famous example is Amazon.com's book recommendations.

Collaborative filtering techniques use a database of item preferences per user. Using these preferences, a user is matched with neighbors: users with a similar taste. Items liked by a neighbor are then recommended to the user. An important assumption of CF is that if users X and Y rate n items similarly, they will rate other items similarly as well (Goldberg et al., 2001). When we use CF in this research, this assumption is extended: if user X and user Y answer many questions similarly, we assume they will answer other questions similarly as well.

In a regular CF scenario, there is a set of m users (u_1, u_2, ..., u_m) and a set of n items (i_1, i_2, ..., i_n). Each user u_i has a list of rated items I_{u_i}. These lists of rated items can be placed inside a user-item rating matrix. In practice, most recommender systems use very large sets of items. This means that the user-item matrix of our CF can become sparse, because users typically rate a limited number of items. Data sparsity plays a big role when a user wants to use the system but has not rated any items yet (Yu et al., 2004); this is called the cold start problem. There is a cold start problem not only for users but also for items: it is difficult for a recommender system to recommend an item which has not yet received any ratings. To measure how sparse a dataset is, usually the coverage of the dataset is calculated: the percentage of items for which the recommender system is able to provide recommendations.
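As an illustration, sparsity and coverage can be computed on a toy user-item matrix. This is a minimal sketch with hypothetical data; coverage is approximated here as the share of items that have received at least one rating:

```python
# Toy user-item score matrix stored sparsely (hypothetical data).
ratings = {
    "u1": {"i1": 1},
    "u2": {"i1": 0, "i2": 1},
    "u3": {},                  # cold-start user: no ratings yet
}
items = {"i1", "i2", "i3"}     # i3 is a cold-start item

# Sparsity: the fraction of user-item cells that is empty.
filled = sum(len(user_ratings) for user_ratings in ratings.values())
sparsity = 1 - filled / (len(ratings) * len(items))

# Coverage: the share of items the system could recommend at all.
rated_items = {i for user_ratings in ratings.values() for i in user_ratings}
coverage = len(rated_items) / len(items)

print(sparsity, coverage)
```

Here 3 of the 9 cells are filled and 2 of the 3 items have at least one rating, so both sparsity and coverage come out to 2/3.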

The most widely known algorithm for CF is the k-Nearest Neighbors (k-NN) algorithm. This method consists of three main components. First, the system finds the k neighbors who are most similar to the user. Second, the ratings of these neighbors are aggregated by an aggregation function; this could simply be the average of their ratings or a more complex function. Finally, the predictions from the second component are used to suggest the top N recommendations. In the method section we describe the CF techniques used for this research in more detail.


3.1.2 Content-based systems

To answer SQ2, this section explains the state-of-the-art content-based recommender system techniques. Content-based systems are applied, for example, by The Music Genome Project (Castelluccio, 2006). Content-based recommender systems usually analyze a set of documents or items with descriptions which a user has previously rated. They then try to build a user model or user profile based on the user's ratings. Recommendations are usually based on matching the attributes of an item against the attributes of a user profile. The result of the prediction is a relevance judgement or relevance probability that measures the level of interest in that item.

A survey by Lops et al. (2011) gives an overview of the three main recommendation steps handled by a content-based recommender system:

• Content Analyzer: The first step of the process is carried out by the content analyzer, which usually uses information retrieval techniques to gather information. This analyzer extracts features (keywords, concepts, n-grams, ...) from, for example, text, in order to represent an item. These representations are stored in the represented items repository.

• Profile Learner: The profile learner module gathers a user's preferences and tries to generalize this information, so that it can build a user profile. These profiles are stored in the profile repository. Normally, this profile is created using machine learning techniques.

• Filtering Component: The last component tries to find relevant items by matching the profile against items. It then attempts to create a top-N list of recommendations and shows it to the user.

The profile learner for this research is based on research by Uluyagmur et al. (2012). They suggested a model where the profile creation process assigns each user a weight for every feature in the feature set. They suggested that predictions could be made by using features such as actor, genre or director.

3.1.3 Hybrid systems

Hybrid recommender systems combine collaborative filtering and content-based filtering (CBF), so that the limitations of both techniques can be overcome. Much research has shown that hybrid approaches can improve the performance of recommender systems, such as the work by Melville et al. (2002), who showed that content-boosted collaborative filtering improved accuracy. Soboroff (1999) also showed that a hybrid approach combining collaboration and content improves filtering performance when filtering text. Adomavicius & Tuzhilin (2005) list different ways to combine both systems:

1. Combining predictions: This can be done by linear combinations or a voting system. Early research by Vogt et al. (1996) already showed that any information retrieval system improves when using linear combinations of scores from different information retrieval systems. Work by Claypool et al. (1999) built upon this research by combining CBF and CF. Their system adjusts weights over time for every user, for more accurate predictions. These weights appeared to stabilize over time.

2. Adding CBF characteristics to CF models: The CF system is extended with content-based profiles. In this approach, similarity is calculated not between users' commonly rated items, but between their content-based user profiles.

3. Adding CF characteristics to CBF models: The most common approach here is dimensionality reduction on a group of user profiles. One could use latent semantic indexing (LSI) to generalize among user profiles.

4. A single unifying recommendation model: This technique is becoming more popular among researchers. Here, content-based and collaborative characteristics are combined in one single rule-based classification method or a unified probabilistic method based on latent semantic analysis.

To answer SQ3 we will use the first combination method: the hybrid system linearly searches for an optimal weight distribution over both recommender techniques for each user, while minimizing the root mean squared error.

3.2 Educational Recommender Systems

Recommender systems are increasingly making their way into adaptive educational online systems. Most systems try to recommend online learning materials to their users. This section gives a short overview of recent work in this area.

Research by Okoye et al. (2011) presented a prototype of a system that provides content-based recommendations of online library resources to a user. They base their recommendations on the knowledge level of the user, assessed by evaluating their essay writing. Casali et al. (2011) suggested a similar recommender system, which intends to help learners find educational resources that fit their needs and preferences. Another kind of educational recommender system was created by Hennis et al. (2011), centered on knowledge sharing in a peer-based online learning community; this system used collaborative filtering techniques to recommend resources to other users. Furthermore, recommender systems for course enrollment have been created: in a study by O'Mahony & Smyth (2007), a recommender system was built to aid students in choosing obligatory extra courses outside their core study, using both a collaborative and a content-based recommender system. Finally, an educational recommender system for predicting student performance on tests was created by Thai-Nghe et al. (2012). More about this research follows in the method section, since it is part of our own research.


3.3 Computerized Adaptive Testing

Computerized Adaptive Testing (CAT) is an approach that chooses and displays questions that fit the ability (performance) of its user. The system processes the answer and then updates the user’s ability. Modern CAT systems make use of item response theory (IRT). There are three item response functions (IRF) which are commonly used, each using different parameters as their basis (Segall, 2005). The most commonly used IRF is the three parameter logistics (3PL) function. This function takes into account three parameters for every question:

1. Item difficulty. An attribute of each question, indicating how difficult the question is.

2. Slope. The maximum slope of the item response function, indicating how strongly the probability of a correct answer depends on the user's ability.

3. Guessing chance. If a question is multiple choice, for example, the chance of guessing correctly is taken into account.

The 2PL IRF assumes that there is no chance of guessing a correct answer at all, so it drops the guessing parameter. The 1PL IRF additionally assumes an identical slope for every question.
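As a minimal sketch, the three IRFs described above can be written down directly using the standard logistic formulation; the parameter values in the example are hypothetical:

```python
import math

def p_correct_3pl(theta, a, b, c):
    # 3PL item response function: probability that a user with
    # ability theta answers an item correctly, given the item's
    # slope a, difficulty b and guessing chance c.
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def p_correct_2pl(theta, a, b):
    # 2PL: no guessing chance (c = 0).
    return p_correct_3pl(theta, a, b, 0.0)

def p_correct_1pl(theta, b):
    # 1PL: no guessing chance and an identical slope for every item.
    return p_correct_3pl(theta, 1.0, b, 0.0)

# When ability equals difficulty, a four-option multiple-choice item
# (c = 0.25) is answered correctly with probability 0.625.
print(p_correct_3pl(theta=0.0, a=1.0, b=0.0, c=0.25))
```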

According to Weiss & Kingsbury (1984), a traditional CAT system consists of five separate components. First, an item pool must be created from which the CAT system can pick its questions. Second, the CAT system must find a starting point or entry level for the user; in recommender systems this is known as the cold start problem, and CAT systems often start from an average entry level. Component three is an item selection algorithm (such as IRT). After an item has been answered by the user, the ability estimate is updated by the fourth component. The CAT system iterates over components three and four until a termination criterion (component five) is met. Termination could occur when the item pool is empty or when the CAT system knows enough about the user's ability.

4 Method

The method section is divided into three main topics. The first section explains how SQ1 will be answered: a brief explanation of the replication study is given, the dataset (which is also used in our research) is examined with visualisations and statistics along with the challenges it poses, and the CF techniques used in the replication are described. The second section describes how we propose to use content-based recommender system techniques to answer SQ2. The last section explains how we use hybrid recommender system techniques to answer SQ3.


4.1 Collaborative Filtering Research Replication

The research by Thai-Nghe et al. (2010) had two main objectives. First, they attempted to use recommender system techniques such as matrix factorization and CF to predict student performance. For these techniques they made use of the MyMedia library (Gantner et al., 2011), which provides several recommender functions. Second, they attempted to map educational data triples to the user-item-rating triples of a regular recommender system. Normally, recommender system triples have the following form:

user → item → rating

They suggested a mapping where a student corresponds to a user, a question corresponds to an item, and a binary state variable (1 if the student answered correctly, 0 if not) corresponds to a rating. We will call this binary state value the "score" from this point on. The form is as follows:

student → question → score
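This mapping can be sketched in a few lines of Python. Note that the column names below are placeholders for illustration, not the actual field names of the KDD Cup files:

```python
import csv
import io

# Hypothetical tab-separated extract; the real KDD Cup files use
# different column names, so treat these as placeholders.
raw = io.StringIO(
    "student\tquestion\tcorrect\n"
    "s1\tq1\t1\n"
    "s1\tq2\t0\n"
    "s2\tq1\t1\n"
)

# Map each educational record to a (user, item, rating) triple,
# i.e. (student, question, score) as in the replication study.
triples = [
    (row["student"], row["question"], int(row["correct"]))
    for row in csv.DictReader(raw, delimiter="\t")
]
print(triples)
```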

Their results showed that matrix factorization in combination with collaborative filtering was the most accurate in its predictions, and thus their suggested mapping appeared to be useful. Matrix factorization will not be replicated in this research, as it did not fit within the scope of the project.

To calculate the performance of their recommender systems, they made use of the root mean squared error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{u,i} (r_{ui} - \hat{r}_{ui})^2}{n}} \qquad (1)$$

where $r_{ui}$ denotes the actual score, $\hat{r}_{ui}$ the predicted score, and $n$ the number of test cases. Predictions and scores are values between 0 and 1, so the errors also range between 0 and 1. The lower the RMSE, the better the predictions of the recommender system. For this research we use the exact same evaluation measure.
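Equation (1) is straightforward to compute; a small sketch with made-up scores and predictions:

```python
import math

def rmse(actual, predicted):
    # Root mean squared error over paired actual and predicted scores.
    n = len(actual)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(actual, predicted)) / n)

scores = [1, 0, 1, 1]           # actual binary scores (hypothetical)
preds = [0.9, 0.2, 0.6, 1.0]    # predicted scores in [0, 1]
print(rmse(scores, preds))
```

A perfect predictor yields an RMSE of 0, and a predictor that is always maximally wrong yields 1.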

4.1.1 Dataset Description

This section discusses several characteristics of the dataset used. For this research we made use of the "algebra" dataset provided by the KDD Cup 2010 (Stamper et al., 2010). The KDD Cup is an educational data mining challenge whose goal is the same as that of our research, namely predicting student performance; this research is not an entry in the challenge. Examining the dataset closely can inform how we implement the recommender system techniques, so it seems logical to do so.


Figure 1: Algebra Dataset Knowledge Component Distribution (x-axis: knowledge component; y-axis: occurrences, ×100,000).

The dataset consists of approximately 8 million rows, each containing detailed information about the activity of a student while answering a mathematical question. Much of this information is not used in this research, such as the timestamps indicating when a student started and finished a question. What we do use are the student ID, question name, score and the knowledge components of each question. A row can carry 1 to 8 knowledge components, indicating which skills the user needs to answer the question correctly. The training set contains 541 unique knowledge components.

Figure 1 shows the distribution of all knowledge components over the dataset. The number of times a knowledge component is used appears to decrease exponentially. If we convert the distribution to a logarithmic scale (Figure 2), we see a roughly straight line in the middle, also consistent with an exponential decrease.

Figure 2: Logarithmic Knowledge Component Distribution (x-axis: knowledge component; y-axis: log occurrences).

About 40 percent of the knowledge components occur in 1 percent or less of the rows. There are also many knowledge components which occur fewer than 10 times. This means little is known about those knowledge components, making it more difficult for a recommender system to make accurate predictions.

To test our predictions, a training set and a test set have to be created. The KDD Cup itself provides a test and a training set; however, the test set does not contain the score for each question, so we cannot measure the accuracy of our system with it. KDD Cup participants had to hand in their predictions on the provided test set, so that the KDD Cup could measure the accuracy. We tried to replicate the test set by randomly removing rows from our training set and adding them to our test set. We decided to keep the size of their test set, which means that about 10 percent of the training set will be used as a test set. Given the size of the training set, we think that shrinking it by this amount will not drastically change our results.

4.1.2 Collaborative Filtering Recommender Techniques

In this section an attempt is made to recreate the CF system of the replication study, using state-of-the-art similarity measures. Thai-Nghe and his colleagues did not describe in much detail how they obtained their CF results: they only explained the algorithm they used for matrix factorization, and their CF results were stated in their table of results without further explanation. This makes their results more challenging to replicate. We therefore decided to use state-of-the-art methods for this research.

As stated in the related work, the k nearest neighbors have to be found, which can then be used in aggregation functions to weigh the scores found for each nearest neighbor. There are three commonly used aggregation functions (Adomavicius & Tuzhilin, 2005) to calculate the score $r_{c,s}$ for user c and item s:

$$r_{c,s} = \frac{1}{N} \sum_{c' \in \hat{C}} r_{c',s} \qquad (2)$$

$$r_{c,s} = \kappa \sum_{c' \in \hat{C}} sim(c, c') \times r_{c',s} \qquad (3)$$

$$r_{c,s} = \bar{r}_c + \kappa \sum_{c' \in \hat{C}} sim(c, c') \times (r_{c',s} - \bar{r}_{c'}) \qquad (4)$$


where $\hat{C}$ denotes the set of the N users most similar to c who have scored item s, the normalization factor $\kappa$ is defined as

$$\kappa = 1 \Big/ \sum_{c' \in \hat{C}} |sim(c, c')|$$

and the average score $\bar{r}_c$ of user c is defined as

$$\bar{r}_c = \frac{1}{|S_c|} \sum_{s \in S_c} r_{c,s}, \quad \text{where } S_c = \{ s \in S \mid r_{c,s} \neq \emptyset \}$$

The first aggregation function is simply the average of the scores of a user's most similar neighbors. The second function is a weighted sum over all similar neighbors. However, these functions do not take into account that there might be a bias in a user's scores. For this reason the third aggregation function might be more suitable: it uses an adjusted weighted sum, where the deviation from a user's average rating is used instead of the rating itself. This approach seems useful when predicting ratings for a movie or a book; however, when predicting student performance, it might matter less.

In order to use one of the aggregation functions, different approaches for calculating the similarity between users have been suggested. Mostly, correlation-based and cosine-based approaches are used. Let $S_{xy}$ represent the set of items rated by both user x and user y. For the correlation approach, Pearson correlation will be used:

$$sim(x, y) = \frac{\sum_{s \in S_{xy}} (r_{x,s} - \bar{r}_x)(r_{y,s} - \bar{r}_y)}{\sqrt{\sum_{s \in S_{xy}} (r_{x,s} - \bar{r}_x)^2 \sum_{s \in S_{xy}} (r_{y,s} - \bar{r}_y)^2}} \qquad (5)$$

In the cosine-based approach, similarity is measured by representing both users as vectors in an M-dimensional space, where M equals $|S_{xy}|$, and $\vec{x} \cdot \vec{y}$ denotes the dot product of the two vectors. The similarity is the cosine of the angle between the two vectors:

$$sim(x, y) = \cos(\vec{x}, \vec{y}) = \frac{\vec{x} \cdot \vec{y}}{\|\vec{x}\|_2 \times \|\vec{y}\|_2} \qquad (6)$$

For both the cosine and the Pearson approach, all three aggregation functions will be tested.
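A minimal sketch of these similarity measures and the three aggregation functions, on hypothetical binary score data. Here `aggregation` 1, 2 and 3 correspond to equations (2), (3) and (4), the Pearson means are taken over each user's own scores, and at least one nonzero similarity is assumed for the weighted variants:

```python
import math

def cosine(x, y, common):
    # Equation (6), restricted to the items both users scored.
    dot = sum(x[s] * y[s] for s in common)
    nx = math.sqrt(sum(x[s] ** 2 for s in common))
    ny = math.sqrt(sum(y[s] ** 2 for s in common))
    return dot / (nx * ny) if nx and ny else 0.0

def pearson(x, y, common):
    # Equation (5); here each mean is over all of that user's scores.
    mx = sum(x.values()) / len(x)
    my = sum(y.values()) / len(y)
    num = sum((x[s] - mx) * (y[s] - my) for s in common)
    den = math.sqrt(sum((x[s] - mx) ** 2 for s in common)
                    * sum((y[s] - my) ** 2 for s in common))
    return num / den if den else 0.0

def predict(ratings, c, s, sim=cosine, aggregation=1, k=5):
    me = ratings[c]
    # Find the k most similar neighbours who scored item s.
    neighbours = []
    for c2, other in ratings.items():
        common = set(me) & set(other)
        if c2 != c and s in other and common:
            neighbours.append((sim(me, other, common), c2))
    neighbours = sorted(neighbours, reverse=True)[:k]
    if not neighbours:
        return None
    mean = lambda u: sum(ratings[u].values()) / len(ratings[u])
    if aggregation == 1:     # equation (2): plain average
        return sum(ratings[c2][s] for _, c2 in neighbours) / len(neighbours)
    kappa = 1.0 / sum(abs(w) for w, _ in neighbours)  # normalization factor
    if aggregation == 2:     # equation (3): similarity-weighted sum
        return kappa * sum(w * ratings[c2][s] for w, c2 in neighbours)
    # equation (4): adjusted weighted sum around each user's mean score
    return mean(c) + kappa * sum(w * (ratings[c2][s] - mean(c2))
                                 for w, c2 in neighbours)

# Toy student -> {question: score} data (hypothetical).
ratings = {
    "s1": {"q1": 1, "q2": 0},
    "s2": {"q1": 1, "q2": 0, "q3": 1},
    "s3": {"q1": 0, "q2": 1, "q3": 0},
}
print(predict(ratings, "s1", "q3", aggregation=2, k=2))
```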

4.2 A Simple Baseline

To compare our results we created a simple baseline, which gives a better sense of how well our recommender techniques work. We use two easy-to-implement baselines: always predict a wrong answer (0), or always predict a correct answer (1).


4.3 Content-based Recommender Techniques

As stated in the related work, we will use the content-based recommender techniques suggested by Uluyagmur et al. (2012). In our dataset every question is labeled with several knowledge components, which we will call features. We calculate feature weights as the probability that a user answers a question correctly given a particular feature:

$$w(c, j) = \frac{\sum_{i \in I^{train}_{c,j}} r_{c,i}}{|I^{train}_{c,j}|} \qquad (7)$$

where $w(c, j)$ denotes the weight of feature j for user c, $I^{train}_{c,j}$ denotes all training questions answered by user c which contain feature j, and $r_{c,i}$ equals the score.

The prediction function used for this research is not exactly the one Uluyagmur and his colleagues used, because that function seemed illogical in our domain. Their function is as follows:

$$r(c, i) = \frac{1}{|D_i|} \sum_{j \in D_i} w(c, j) \qquad (8)$$

where $D_i$ equals the set of features of item i. In other words, this function takes the average weight over the features of an item. In our domain this average seems less logical: when a question requires a skill on which a certain user has always answered incorrectly, the odds are that he will answer the question wrongly despite the presence of an easy skill. Therefore, we decided to use a different function, namely:

$$r(c, i) = \min(w_{c,j_1}, \ldots, w_{c,j_n}) \qquad (9)$$

where n is the number of features of item i. We expect that taking the minimum will be more accurate for predicting student performance.
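Equations (7) and (9) can be sketched as follows, on hypothetical (student, knowledge components, score) records:

```python
def feature_weights(train, user):
    # Equation (7): w(c, j) is user c's mean score over training
    # questions that contain knowledge component (feature) j.
    totals, counts = {}, {}
    for student, features, score in train:
        if student != user:
            continue
        for j in features:
            totals[j] = totals.get(j, 0) + score
            counts[j] = counts.get(j, 0) + 1
    return {j: totals[j] / counts[j] for j in totals}

def predict_min(weights, item_features):
    # Equation (9): the prediction is the weight of the weakest
    # required skill, since that skill dominates the outcome.
    known = [weights[j] for j in item_features if j in weights]
    return min(known) if known else None

# Hypothetical training records: (student, features, score).
train = [
    ("s1", {"fractions"}, 1),
    ("s1", {"fractions", "equations"}, 0),
    ("s1", {"equations"}, 0),
]
w = feature_weights(train, "s1")
print(predict_min(w, {"fractions", "equations"}))
```

With these records the "fractions" weight is 0.5 and the "equations" weight is 0.0, so the minimum-based prediction for a question requiring both skills is 0.0, whereas the average of equation (8) would give 0.25.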

4.4 Hybrid Recommender System

Our hybrid recommender system uses a combination of our best collaborative filtering system and our content-based recommender system. For each user, it attempts to find the ideal combination of both systems by linearly trying every possible combination, with an interval of 0.01.

We start by giving our collaborative filtering system a weight of 1.0 and our content-based system a weight of 0.0. We then increase the content-based weight by the interval of 0.01 and decrease the collaborative filtering weight by 0.01. For each setting we calculate the root mean squared error (RMSE), and for each user we save the setting with the lowest RMSE. Once all weights have been calculated, we predict all questions in the test set using those weights.
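The per-user weight search described above can be sketched as a simple grid search. The validation triples below are hypothetical (actual score, CF prediction, content-based prediction) values for a single user:

```python
import math

def rmse(pairs):
    # RMSE over (actual, predicted) pairs.
    return math.sqrt(sum((r - p) ** 2 for r, p in pairs) / len(pairs))

def best_weight(triples, step=0.01):
    # Try every CF weight w in {0.00, 0.01, ..., 1.00}; the combined
    # prediction is w * cf + (1 - w) * cb. Keep the weight with the
    # lowest RMSE for this user.
    best_w, best_err = None, None
    for i in range(int(1 / step) + 1):
        w = i * step
        err = rmse([(r, w * cf + (1 - w) * cb) for r, cf, cb in triples])
        if best_err is None or err < best_err:
            best_w, best_err = w, err
    return best_w, best_err

# Hypothetical (actual score, CF prediction, CB prediction) triples
# for a single user; here the CF predictions happen to be better.
validation = [(1, 0.9, 0.6), (0, 0.1, 0.4), (1, 0.8, 0.7)]
w, err = best_weight(validation)
print(w, err)
```

For this toy user the search settles on a CF weight of 1.0, since the collaborative filter is closer to the truth on every question; with more mixed data an intermediate weight would win.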


Figure 3: Cosine Similarity based Collaborative Filtering: RMSE per number of nearest neighbors (x-axis: number of neighbors; y-axis: RMSE).

5 Results

The results section first shows the results of our replication of the collaborative filtering techniques used by Thai-Nghe et al. (2010). After that we show our content-based and hybrid results. We finalize this section by comparing our results to those of Thai-Nghe and his colleagues.

5.1 Collaborative Filtering Results

In Table 1 we can see the RMSE scores for our different collaborative filtering algorithms, compared to our baseline.

As we can see in Table 1, almost every technique outperformed our baseline, except for the techniques using the first aggregation function (2). We saw in our results that increasing the number of neighbors for the first aggregation function kept increasing the RMSE. This may be caused by the fact that a simple non-weighted average is taken of the neighbors' scores.

Our best performing similarity measure is cosine (6) in combination with the second aggregation function (3).

Overall we can see that Pearson performs slightly worse than cosine. The RMSE seemed to stabilize for all well-performing algorithms after about 40 neighbors. Figure 3 shows how the RMSE stabilizes for our cosine-based approach.


Table 1: Collaborative Filtering RMSE

Technique               RMSE      Number of neighbors
Baseline                0.40024   n.a.
Cosine, aggregation 1   0.46299   2
Cosine, aggregation 2   0.37132   55
Cosine, aggregation 3   0.37612   41
Pearson, aggregation 1  0.41657   3
Pearson, aggregation 2  0.37828   32
Pearson, aggregation 3  0.39421   45

5.2 Content-based results

Our content-based recommender system outperformed both the cosine and Pearson collaborative filters, with an RMSE of 0.33916.

5.3 Hybrid Results

Our hybrid recommender system made use of our best performing collaborative filter (cosine with the second aggregation function) and our content-based system. The weights differed a lot between users: some relied mostly on the collaborative filter, some mostly on the content-based system, and many used a combination of both. The achieved RMSE of our hybrid system was 0.32266.

5.4 Compared Results

Table 2: Our best RMSE per technique versus the replication study

Our technique   RMSE      Their technique             RMSE
Content-based   0.33916   Matrix Factorization        0.33752
CF              0.36532   CF                          0.32240
Hybrid          0.32266   Matrix Factorization + CF   0.30228
Baseline        0.40024

Overall, the results of Thai-Nghe et al. (2010) were better. Similar to their results, a hybrid approach outperformed all other techniques.

6 Conclusions and discussion

In this research we investigated whether content-based recommender systems can aid the student performance prediction component of computerized adaptive testing systems. We created several recommender systems: one by recreating the study by Thai-Nghe et al. (2010), and several others using state-of-the-art recommender system techniques.


Our first subquestion (SQ1) concerned the replicability of the collaborative filter created by Thai-Nghe and his colleagues. Even though our collaborative filter was quite accurate in its predictions, we were not able to exactly replicate their results. During the recreation process we found that some steps in their research were not explained: for example, there was no indication of the similarity measure or aggregation function they used. Their research stated that they made use of the MyMediaLite library (Gantner et al., 2011), which provides a collaborative filtering function, but we were not able to find the specifics of the methods that library uses, which is why we decided to use the state-of-the-art.

For SQ2 we created a content-based recommender system. Comparing the results of this method to state-of-the-art methods like the matrix factorization from the replication study, we can conclude that content-based recommender systems are also able to aid student performance prediction.

To answer SQ3 we created a hybrid recommender system that combined our collaborative filter and our content-based filter. Our results show that this system performs even better than our content-based system, which means that it can also aid student performance prediction.

We can answer our research question affirmatively: content-based recommender system techniques can aid the student performance prediction component of a CAT system. This research did not concern the implementation of the proposed prediction methods; more research is required to find a way in which these methods can be integrated into a CAT system.

References

1. Adomavicius, G., & Tuzhilin, A. (2005). Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. Knowledge and Data Engineering, IEEE Transactions on, 17(6), 734-749.

2. Billsus, D., & Pazzani, M. J. (1998, July). Learning Collaborative Information Filters. In ICML (Vol. 98, pp. 46-54).

3. Carrer-Neto, W., Hernández-Alcaraz, M. L., Valencia-García, R., & García-Sánchez, F. (2012). Social knowledge-based recommender system. Application to the movies domain. Expert Systems with Applications, 39(12), 10990-11000.

4. Casali, A., Gerling, V., Deco, C., & Bender, C. (2011). A recommender system for learning objects personalized retrieval. Educational Recommender Systems and Technologies: Practices and Challenges, 182.

5. Castelluccio, M. (2006). The music genome project. Strategic Finance, 88(6), 57.


6. Gantner, Z., Rendle, S., Freudenthaler, C., & Schmidt-Thieme, L. (2011, October). MyMediaLite: A free recommender system library. In Proceedings of the fifth ACM conference on Recommender systems (pp. 305-308). ACM.

7. Goldberg, K., Roeder, T., Gupta, D., & Perkins, C. (2001). Eigentaste: A constant time collaborative filtering algorithm. Information Retrieval, 4(2), 133-151.

8. Hennis, T., Lukosch, S., & Veen, W. (2011). Reputation in peer-based learning environments. Educational recommender systems and technologies, 95-128.

9. Hill, W., Stead, L., Rosenstein, M., & Furnas, G. (1995, May). Recommending and evaluating choices in a virtual community of use. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 194-201). ACM Press/Addison-Wesley Publishing Co.

10. Konstan, J. A., Miller, B. N., Maltz, D., Herlocker, J. L., Gordon, L. R., & Riedl, J. (1997). GroupLens: applying collaborative filtering to Usenet news. Communications of the ACM, 40(3), 77-87.

11. Lee, S. K., Cho, Y. H., & Kim, S. H. (2010). Collaborative filtering with ordinal scale-based implicit ratings for mobile music recommendations. Information Sciences, 180(11), 2142-2155.

12. Lops, P., De Gemmis, M., & Semeraro, G. (2011). Content-based recommender systems: State of the art and trends. In Recommender systems handbook (pp. 73-105). Springer US.

13. Lu, J. (2004, January). Personalized e-learning material recommender system. In International conference on information technology for application (pp. 374-379).

14. Okoye, I., et al. (2011). Educational recommendation in an informal intentional learning system. Educational Recommender Systems and Technologies: Practices and Challenges, 1.

15. O'Mahony, M. P., & Smyth, B. (2007, October). A recommender system for on-line course enrolment: an initial study. In Proceedings of the 2007 ACM conference on Recommender systems (pp. 133-136). ACM.

16. Núñez-Valdéz, E. R., Lovelle, J. M. C., Martínez, O. S., García-Díaz, V., de Pablos, P. O., & Marín, C. E. M. (2012). Implicit feedback techniques on recommender systems applied to electronic books. Computers in Human Behavior, 28(4), 1186-1193.

17. Pazzani, M. J. (1999). A framework for collaborative, content-based and demographic filtering. Artificial Intelligence Review, 13(5-6), 393-408.


18. Rich, E. (1979). User modeling via stereotypes. Cognitive Science, 3(4), 329-354.

19. Romero, C., Ventura, S., Delgado, J. A., & De Bra, P. (2007). Personalized links recommendation based on data mining in adaptive educational hypermedia systems. In Creating New Learning Experiences on a Global Scale (pp. 292-306). Springer Berlin Heidelberg.

20. Segall, D. O. (2005). Computerized adaptive testing. Encyclopedia of Social Measurement. Amsterdam: Elsevier.

21. Shardanand, U., & Maes, P. (1995, May). Social information filtering: algorithms for automating “word of mouth”. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 210-217). ACM Press/Addison-Wesley Publishing Co., 1995.

22. Su, X., & Khoshgoftaar, T. M. (2009). A survey of collaborative filtering techniques. Advances in artificial intelligence, 2009, 4.

23. Thai-Nghe, N., Drumond, L., Krohn-Grimberghe, A., & Schmidt-Thieme, L. (2010). Recommender system for predicting student performance. Procedia Computer Science, 1(2), 2811-2819.

24. Trewin, S. (2000). Knowledge-based recommender systems. Encyclopedia of Library and Information Science: Volume 69-Supplement 32, 180.

25. Uluyagmur, M., & Tayfur, E. (2012). Content-based movie recommendation using different feature sets. In Proceedings of the World Congress on Engineering and Computer Science (Vol. 1, pp. 17-24).

26. Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 361-375.

27. Yu, K., Schwaighofer, A., Tresp, V., Xu, X., & Kriegel, H. P. (2004). Probabilistic memory-based collaborative filtering. Knowledge and Data Engineering, IEEE Transactions on, 16(1), 56-69.

28. Yu, Z., Zhou, X., Hao, Y., & Gu, J. (2006). TV program recommendation for multiple viewers based on user profile merging. User modeling and user-adapted interaction, 16(1), 63-82.

29. Zaíane, O. R. (2002, December). Building a recommender agent for e-learning systems. In Computers in Education, 2002. Proceedings. International Conference on (pp. 55-59). IEEE.


Data Citations

1. Stamper, J., Niculescu-Mizil, A., Ritter, S., Gordon, G.J., & Koedinger, K.R. (2010). Algebra I 2008-2009. Challenge data set from KDD Cup 2010 Educational Data Mining Challenge. Find it at http://pslcdatashop.web.cmu.edu/KDDCup/downloads.jsp
