
How to Become Good in Playing the Guessing Game

Hicham Boulahfa

15 July 2015

Abstract

In this paper we look at the guessing game and try to find variables that can predict whether a player plays the game well. The game was played by 239 first-year Economics and Business students. Besides the guessing game, the students also filled in a questionnaire containing questions related to IQ and EQ. The results showed that math grade is a good predictor of whether a player will play the game well or not. We also applied three popular big-data classifiers using the RapidMiner software. These yielded significant non-linear relations for predicting player performance.

BSc Economics and Business at the University of Amsterdam, specialisation Economics.


Statement of Originality

This document is written by Student Hicham Boulahfa who declares to take full responsibility for the contents of this document.

I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it.

The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


1 Introduction

Nagel (1995) investigated the behavior of people in the guessing game, in which people choose a number between 0 and 100. The winner of the game is the person closest to the mean of all chosen numbers multiplied by a parameter p, which can be varied between versions of the game. The winner receives a prize; if the game has more than one winner, the prize is split. This game is a way of modeling, and thereby examining, the decision process of people in situations where they have to decide what is best for themselves while taking into account the decisions of others. Such a situation applies, for instance, to the stock market. To make a profit trading shares, you have to know, or at least predict, what everyone else is going to do. Buying a share that people no longer want lowers demand and pushes the price down, while buying a share that others also want to buy raises the value of your share. Judging correctly what other people will choose can therefore yield a profit. Nagel (1995) used the values p = 1/2, 2/3 and 3/2 in her experiment. In the guessing game with p between 0 and 1, in this case p = 1/2 or 2/3, the Nash equilibrium is every player choosing 0. This can be seen from the formula T = p · (x_1 + x_2 + … + x_n)/n, where T is the target and x_i, with i = 1, 2, …, n, is the number player i chooses. Because everyone wants to win the game, they will all set the target as the number they choose, giving the outcome x_i = T = 0. For p = 3/2 a Nash equilibrium exists not only at 0 but also at 100.

The k-level of a person says something about the depth of reasoning. Say p equals 2/3. A player with a k-level of zero might think that numbers are chosen uniformly at random between zero and one hundred, with an expected value of 50, and try to win with that. A player with a k-level of one thinks one step ahead: expecting the rest to pick 50, he picks 2/3 of 50. A player who thinks that all other players pick 2/3 of 50 will set his number at (2/3)² · 50; this is a k-level of 2. In general the choice of a person can be computed from the formula x_i = 50 · p^(k_i), where k_i is the k-level of person i, for i = 1, 2, …, n. Contrary to the rational choice of the Nash equilibrium, people tend to choose something else. According to Nagel (1995) this follows the principle of bounded rationality: there is a limit on the depth of a player's reasoning, making the player only partially rational. The player then guesses a number that best fits his beliefs about the other players. Nagel (1995) found that playing the game more than once, giving players feedback on the winning number after each round, causes the winning number to move towards the Nash equilibrium. Players have more information about the other players after each round, and because the winning number is lower than the average number when 0 ≤ p < 1, the numbers people choose will on average be lower than in the


previous round. This elimination of weakly dominated strategies is also a way of proving that the Nash equilibrium equals 0. In the game where p > 1, in our case 3/2, the same reasoning applies; the only difference is that the winning number is higher than the average number, pushing the players' choices towards 100. Summing up the main findings of Nagel (1995): the starting point of people's reasoning is 50, people are bounded in their rationality, and when the game is played repeatedly the winning number eventually converges towards the Nash equilibrium. Knowing how the game works and how people play it, it is interesting to find out what defines a good player. Because the game is won by the player who best predicts the winning number, it is logical to look at the intelligence quotient (IQ) and emotional intelligence (EQ) of a player. IQ is a score that measures human intelligence; by definition the average IQ score is 100 with a standard deviation of 15. Because the guessing game measures performance in a game where you try to outthink the other players, one might assume that IQ is a variable that matters in the analysis. EQ is a measure of the ability of humans to recognize one's own and other people's emotions; its average and standard deviation are the same as for IQ. People with high EQ can read other people's minds, or anticipate how people would behave in situations (Mayer et al., 2004). Knowing how other people think is an important part of the guessing game, because it determines the outcome. One problem with administering an IQ or EQ test is the time it costs, but there are variables that are easy to elicit from the participants and are known to correlate strongly with IQ and EQ. Two of these are math performance and school performance, with a correlation of 0.5 with IQ (Neisser et al., 1996).
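As an illustration, the level-k formula x_i = 50 · p^k described above can be evaluated in a few lines. This is a minimal sketch in Python (the thesis itself contains no code); the function name is chosen here for clarity only.

```python
# Hypothetical sketch of level-k reasoning in the guessing game,
# using the formula x_i = 50 * p**k from the introduction.

def k_level_choice(p: float, k: int) -> float:
    """Number chosen by a player reasoning k steps ahead of the
    level-0 anchor of 50 (the expectation of a uniform guess)."""
    return 50 * p ** k

# With p = 2/3: level 0 picks 50, level 1 picks 2/3 * 50 = 33.33,
# level 2 picks (2/3)**2 * 50 = 22.22, and so on towards the
# Nash equilibrium of 0.
for k in range(4):
    print(k, round(k_level_choice(2 / 3, k), 2))
```

The choices shrink geometrically in k, which is exactly why repeated play pushes the winning number towards the Nash equilibrium when p < 1.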
Another variable that correlates highly with IQ is the cognitive reflection test (CRT), with a correlation of 0.44 (Frederick, 2005). This test contains three questions whose intuitive first answers are wrong. The questions are as follows: "A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?" "If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?" "In a lake, there is a patch of lily pads. Every day, the patch doubles in size. If it takes 48 days for the patch to cover the entire lake, how long would it take for the patch to cover half of the lake?" The answers are, respectively, $0.05, 5 minutes and 47 days, but under time constraints people have a hard time figuring them out. There are also variables that are correlated with EQ. Alcohol abuse and drug abuse are negatively correlated with EQ, while hours of sleep and the number of friends a person has are positively correlated with EQ (Brackett and Mayer, 2003; Killgore et al., 2007). Another option is simply to ask what participants think their IQ and EQ are. Petrides and Furnham (2000) reviewed the experiments that had been done with self-estimates of IQ and report a weak correlation between self-estimated IQ and real IQ.
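The arithmetic behind the three CRT answers can be verified directly; the following is purely an illustrative check of the reasoning, not part of the experiment.

```python
# Quick arithmetic check of the three CRT answers.

# Bat and ball: ball + (ball + 1.00) = 1.10, so 2 * ball = 0.10.
ball = (1.10 - 1.00) / 2
assert abs(ball - 0.05) < 1e-9          # ball costs $0.05

# Machines: each machine makes one widget in 5 minutes, so 100
# machines make 100 widgets in the same 5 minutes.
minutes_needed = 5

# Lily pads: the patch doubles daily, so it covered half the lake
# exactly one day before covering all of it.
half_lake_day = 48 - 1
assert half_lake_day == 47
```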


This meta-analysis was done 15 years ago, while nowadays many more people know their real IQ: more high schools administer IQ tests, more websites offer them, and there is even a national IQ test on TV every year. My reasoning therefore leads me to believe that this correlation can only have increased over the years. Petrides and Furnham (2000) also found a correlation between self-estimated EQ and EQ scores.

1.1 Experimental design

239 first-year Economics and Business students at the University of Amsterdam participated in the experiment. I conducted the experiment just before the Macroeconomics lecture started. Because a big lecture room was available, the experiment could be run with everyone at once in a short time period. All participants were asked to go to a website, via either a laptop or a mobile phone, where they could answer the prepared questions. Some questions could be filled in immediately and had no time constraint; other questions needed explanation via the big screen in the lecture hall and had a time constraint. Talking to other participants was not allowed. The whole procedure took 25 minutes.

The first part of the experiment was a questionnaire that could be filled in without any time constraint. Because of the limited time it was not possible to administer an IQ test, so I used variables that correlate with IQ. Two of these are math grade and school performance. School performance was measured by asking how many study credits the students had managed to earn in the first year; at the time of the experiment, the maximum for a student who is nominal in the programme was 42 ECTS. One of the courses they attended was mathematics. Because math skills are necessary both to understand the task and to score a high IQ, math grade was used as a variable in itself, by asking which grade they scored in the mathematics course this year. EQ is the other variable I wanted to use as a predictor of playing the guessing game well. It is very difficult to measure EQ in the short time at hand, so other variables were used that are expected to correlate highly with EQ: alcohol abuse, drug abuse and having few friends are all negatively correlated with EQ, while hours of sleep is positively correlated with EQ. The participants were therefore asked how many close friends they have, how many drinks they have each week, how many times they use drugs in a year and how many times they go clubbing in a month. The participants were also asked what they thought their IQ and EQ were. Because the concept of EQ is not widely known, the students were given the following definition: "EQ is a measure of the ability of humans to recognize one's own and other people's emotions. Average EQ is 100." The second part of the experiment consisted of the three CRT questions mentioned earlier and three questions about the guessing game. Students


had 90 seconds to answer each question. For the guessing game questions they had to pick a number between 0 and 100, with the three different values of p, namely 1/2, 2/3 and 3/2. The winning player, who chose the number nearest to p times the average, would win a prize of 20 euro per round. The three variations of the guessing game were only played once, so there were no multiple rounds as in Nagel (1995). The students were given an extra 90 seconds to digest the explanation of the guessing game.
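The rule for determining a round's winner described above can be sketched as follows. This is a hypothetical illustration (the function name and sample numbers are invented here, not taken from the experiment): the target is p times the mean of all chosen numbers, and the player(s) closest to the target split the 20-euro prize.

```python
# Sketch of winner determination in one round of the guessing game.

def winners(choices, p):
    """Indices of the players closest to p times the mean choice."""
    target = p * sum(choices) / len(choices)
    best = min(abs(x - target) for x in choices)
    return [i for i, x in enumerate(choices) if abs(x - target) == best]

choices = [50, 33, 22, 10]           # four hypothetical players
winning = winners(choices, 2 / 3)    # target = 2/3 * 28.75 ~ 19.17
prize_each = 20 / len(winning)       # the prize is split on a tie
print(winning, prize_each)
```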

1.2 Hypothesis

As mentioned before, the expectation is that IQ and EQ are positively correlated with playing the guessing game well. We expect this because the guessing game measures performance in a game where you try to outthink the other players, so being more intelligent should help. Additionally, knowing how other people think, and thus predicting how they react in a given situation, is an important part of this game. Taking into account the relations mentioned in the previous part, we expect that math grade, study credits, self-estimated IQ, self-estimated EQ, CRT score, number of friends and hours of sleep all correlate positively with playing the game well, while alcohol and drug use correlate negatively with it. The number of times a person goes clubbing may be ambiguous: someone with no friends will probably party less than someone with many friends, but drug and alcohol use are probably also correlated with how often someone goes clubbing.

2 Results

2.1 Linear regression

With the help of linear regression we tried to estimate which variables are significant predictors of playing the guessing game well. Our definition of playing the game well is having a low total absolute difference between the chosen numbers and the winning numbers over the three variations of the game combined. The results are shown in the figure. One might think that using the k-level of thinking as the dependent variable, as Nagel (1995) did, would easily show which players play the game well. But the k-level would not represent the ability to play the game correctly, since a very high k-level could simply mean that the player chose a number near the Nash equilibrium, which might be far from the winning number. Before the regression I made some transformations; the distributions of the variables are all shown in the appendix. The data on drug use was split quite evenly between users and non-users: half of the participants did not use


drugs, while the other half used drugs between 1 and 250 times a year. Because of this asymmetric distribution I turned drug use into a dummy variable, dividing the data into users and non-users. The variables alcohol use, number of good friends and times of clubbing were also quite asymmetrically distributed, so I divided their data into low, medium and high. To test the hypothesis mentioned earlier, I used high as the base category for these three variables. An F-test showed that the full model is not significant (p = 0.758); it explains only 4.2% of the variation in the dependent variable, and the adjusted R-squared, which penalizes the many variables used, is even lower. This model is shown in the appendix. When reducing the model, the best model possible is the simple regression with only math grade (p = 0.042) as a predictor of total absolute difference, explaining 1.7% of the variation. The table below shows the beta values for the different options for a simple regression.

Table 1:

Variable                    Beta     p-value
IQ                          -0.117   0.538
EQ                          -0.053   0.670
Math grade                  -2.677   0.042
Hours of Sleep              -0.391   0.767
Study Credits               -0.149   0.425
Cognitive Reflection Test   -1.221   0.501
Alcohol_Low                  1.459   0.743
Alcohol_Medium              -6.145   0.167
Drugs_No                    -0.304   0.934
Clubbing_Low                 1.756   0.700
Clubbing_Medium             -3.242   0.454
Friends_Low                  0.084   0.985
Friends_Medium              -5.398   0.226

Most signs are exactly as the hypothesis predicts; Alcohol_Low and Friends_Medium are the exceptions. Alcohol_Low should have a negative sign, as stated in the hypothesis, because a player's total absolute difference should go down if he drinks less. Friends_Medium should be positive, because having fewer friends should give a higher total absolute difference. The exact effect of clubbing was difficult to predict, as it mixes the effect of having more friends with that of drinking more and using more drugs. With the data found, it is still not clear what effect clubbing has, since its dummy variables have both a positive and a negative effect on the total absolute


difference. So the model that best predicts whether a player is good at playing the guessing game is the following simple regression:

TOTAL_DIFFERENCE = 65.548 − 2.677 · MATH_GRADE    (1)

This means that if your math grade goes up by one point, the predicted total difference goes down by 2.677 points. So the higher your math grade, the better you play the guessing game.
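A simple regression like equation (1) can be estimated and used as sketched below. The grades and differences in the example are made up for illustration; only the coefficients 65.548 and −2.677 come from the thesis.

```python
# Sketch: ordinary least squares for a simple regression y = a + b*x,
# plus the fitted prediction rule of equation (1).

def ols(xs, ys):
    """Intercept a and slope b minimizing squared residuals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def predicted_total_difference(math_grade):
    """Equation (1) from the thesis."""
    return 65.548 - 2.677 * math_grade

# A one-point increase in math grade lowers the predicted total
# absolute difference by 2.677 points.
delta = predicted_total_difference(7) - predicted_total_difference(6)
print(round(delta, 3))   # -2.677
```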

2.2 Big Data Analysis

In the next analysis I look at other algorithms, to find patterns that differ from linear regression. With the help of big data analysis, using the program RapidMiner, I was able to try some other analysis tools on the data I collected. Big data analysis is an upcoming field, already used in econometric and computer-science analyses. What it does is recognize patterns in a big set of data, and it does not rely only on linear regression, because the world does not consist only of linear relations; as a matter of fact, there are far more non-linear relations. Practical applications include, for instance, companies analysing why some people click a given picture or button on a website and others don't. Or think of the complicated datasets of patients and symptoms that doctors have: wouldn't it be great to predict what disease a patient has by using an algorithm that finds patterns in the behavior and symptoms of these people, using doctors' prescriptions in the online patient registrations? Another possible application, among the infinite possible ones, and one which might be quite new, is the use of big data analysis on experimental data; it might even find patterns we never looked at before. Because I have used a number of variables, it is interesting to see whether there are other patterns than the ones we found with linear regression.

RapidMiner is a program that uses machine learning (pattern recognition) to compute a model with predictive power. To use RapidMiner I first applied an extra manipulation to the dataset: I divided the dependent variable, total absolute difference, into three equally large categories: low, average and high. This is easier for RapidMiner to work with, because we want generalisability in order to get the most meaningful predictions possible; predicting an exact score would not generalise well.
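The tercile split described above can be sketched as follows; this is a hand-rolled illustration (RapidMiner does this internally), with invented scores.

```python
# Sketch of splitting a continuous dependent variable into three
# (nearly) equal-sized categories: low / average / high.

def tercile_labels(values):
    """Label each value by the third of the sorted data it falls in."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [None] * len(values)
    n = len(values)
    for rank, i in enumerate(order):
        labels[i] = ("low", "average", "high")[min(3 * rank // n, 2)]
    return labels

scores = [5, 40, 12, 80, 33, 61]   # hypothetical total differences
print(tercile_labels(scores))
```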

One of the algorithms I used was the decision tree classifier. It starts at a single node and grows downward, splitting the data into subsets; at the end of each branch there is a small subset chosen by the algorithm. The tree can then predict outcomes by looking at the variables of a new case and deciding in which subset it belongs. The naïve Bayes algorithm looks at the data and assumes that all variables are independent. For instance, a fruit is a banana if it is yellow,


Figure 2

curved and around 15 centimeters long. These features all depend on each other, but in the naïve Bayes algorithm they each contribute independently. The last algorithm I tried was the k-nearest-neighbour algorithm. This classifier looks at the points in the dataset that are nearest to the case you are predicting and takes a weighted average of the k nearest neighbours (the greater the distance, the less the weight). The optimal number of neighbours can be found by trial and error. For instance, suppose there is a variable "math grade" and you want to predict whether a grade belongs to a boy or a girl. The nearest-neighbour algorithm with k = 3 takes the three nearest neighbours and takes the weighted average of these three points; in the figure the prediction would then be that the math grade belongs to a girl. If the distribution of girls and boys is equal in this example, a random prediction would be right half of the time, so if the algorithm predicts better than half of the time it is useful.

This is only an example of a two-dimensional nearest-neighbour algorithm; the dataset I use is multidimensional because I have more than one independent variable. Table 2 describes the results of the three algorithms. To obtain the accuracy and the standard deviation we use 5-fold cross-validation, which works as follows. We divide the complete dataset into two different sets: a training set and a validation set. The training set is used to train the algorithm — for the 4-nearest-neighbours classifier we are, so to speak, "getting to know our neighbours" — and then we apply the trained algorithm to the validation set. We repeat this split of the dataset five times and denote the resulting accuracies as


Table 2

Algorithm Accuracy Standard deviation

Decision Tree 34.73% ±1.18%

Naïve Bayes 30.55% ±2.88%

4-Nearest Neighbor 45.14% ±6.55%

X_1, …, X_5, and calculate the average accuracy as M = (1/5) Σ_{i=1}^{5} X_i and the standard deviation as σ = √( Σ_{i=1}^{5} (X_i − M)² / (5 − 1) ). Notice that the model must predict correctly at least a third of the time, otherwise it is worse than random guessing, because the total absolute difference has been divided into three equally sized categories.
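Computed directly, the two summary statistics above look like this. The fold accuracies in the example are made up; only the formulas match the text.

```python
# Mean accuracy M over five folds and the sample standard deviation
# (dividing by 5 - 1), as defined in the cross-validation summary.
import math

def cv_summary(accuracies):
    n = len(accuracies)
    m = sum(accuracies) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in accuracies) / (n - 1))
    return m, sd

folds = [0.44, 0.47, 0.52, 0.41, 0.46]   # hypothetical fold accuracies
m, sd = cv_summary(folds)
print(round(m, 3), round(sd, 3))
```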

The nearest-neighbour algorithm with 4 neighbours is accurate 45.14% of the time. This means that, on average, the algorithm correctly predicts in 45.14% of cases whether a player will end up playing the game well, average or badly, given his replies to the questions in the experiment. This is better than the decision tree classifier and the naïve Bayes classifier. To see whether this result is significant, we check whether the accuracy minus one standard deviation is still bigger than the 33.33% achieved by random prediction. In this case it is, so the 4-nearest-neighbour algorithm is a good predictor. We have thus obtained a result that enables us to predict the abilities of guessing game players from their set of attributes, using a 4-nearest-neighbour classifier that classifies them by their nearest neighbours in multidimensional Euclidean space.
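A distance-weighted k-NN classifier of the kind described can be sketched in a few lines of pure Python. The training points, feature names and labels below are invented; RapidMiner's actual implementation will differ in detail.

```python
# Minimal sketch of a distance-weighted k-nearest-neighbour classifier.
import math
from collections import defaultdict

def knn_predict(train, query, k=4):
    """train: list of (feature_vector, label) pairs; query: a vector.
    Votes of the k nearest points are weighted by 1/distance."""
    nearest = sorted(
        (math.dist(x, query), label) for x, label in train
    )[:k]
    votes = defaultdict(float)
    for d, label in nearest:
        votes[label] += 1 / (d + 1e-9)   # closer neighbours weigh more
    return max(votes, key=votes.get)

# Toy data: (math grade, hours of sleep) -> total-difference category.
train = [((8, 8), "low"), ((7, 7), "low"),
         ((5, 6), "high"), ((4, 5), "high")]
print(knn_predict(train, (7.5, 7.5), k=3))
```

With more than two features the same code works unchanged, since the Euclidean distance is taken over the whole feature vector.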

3 Discussion

In this paper we found that math grade is a good predictor of playing the guessing game successfully. We also found a non-linear relationship by using big data analysis. The results are not easy to generalise, for a few reasons. One problem with the linear regression is the number of variables used: because we used a significance level of 5%, a relation might be found even when there is none, purely because of the 5% error rate. In fact, this could explain the results found. Another problem with the data is the assumption that everyone answered the questionnaire honestly. This is difficult to verify, but we have no reason to think otherwise.

Here we discuss some pros and cons of the algorithms used. Let us start by saying that none of this generalises to all big data problems: one must always use multiple classifiers and test with cross-validation to see which algorithm


performs best for a specific problem. For our problem the k-nearest-neighbour (k-NN) algorithm emerged as the best. Its greatest advantage is that it is simple, intuitive and clearly effective for rather simple problems. When, however, the classifier performs poorly for different values of k, it is not easy to see why: this classifier is known to be very sensitive to the local structure of the data. The naïve Bayes algorithm is arguably the most widespread big data algorithm, since it essentially provides a multi-variable probability model under the assumption that the variables are all independent of each other. That assumption is by definition the weak element of the algorithm, and we see that it is the reason for its failure on our dataset: many of our variables are highly intercorrelated. The decision tree classifier can be a very useful tool because it yields an explicit decision tree. Although many scientists use decision trees a lot these days, they have their issues: most importantly, the use of too many variables can undermine the generalisability of the classifier. This problem is partly eliminated by pre-pruning and pruning the nodes of the tree, but in some cases the problem of overfitting remains, and this also appears to happen in our case.


Figure 3 Figure 4

Figure 5 Figure 6


Figure 7 Figure 8

Figure 9 Figure 10


A Appendix

Table 3

Model coefficients*               B       Std. Error   Beta     t        Sig.
(Constant)                        97.689  31.906                 3.062    .002
Slapen (hours of sleep)            -.496   1.444       -.024    -.344    .731
Wiskunde (math grade)            -2.620    1.618       -.128   -1.619    .107
Studiepunten (study credits)       .111     .236        .038     .472    .637
IQ                                 -.179     .215       -.060    -.835    .405
a_weinig (alcohol low)            2.046    6.711        .034     .305    .761
a_medium (alcohol medium)        -4.683    5.129       -.078    -.913    .362
drugs                              -.767    4.250       -.013    -.181    .857
s_weinig                         -2.393    6.561       -.039    -.365    .716
s_medium                         -3.894    5.105       -.065    -.763    .446
v_weinig                          2.060    5.195        .034     .396    .692
v_medium                         -2.967    4.919       -.049    -.603    .547
EQ                                 -.039     .128       -.021    -.302    .763
total                              -.774    1.973       -.028    -.393    .695

*Dependent variable: tot_diff (total absolute difference). B and Std. Error are unstandardized; Beta is standardized.

Bibliography

Brackett, M. A., and Mayer, J. D. (2003). Convergent, discriminant, and incremental validity of competing measures of emotional intelligence. Personality and Social Psychology Bulletin, 29(9): 1147–1158.

Frederick, S. (2005). Cognitive reflection and decision making. Journal of Economic Perspectives, pages 25–42.

Killgore, W. D., Killgore, D. B., Day, L. M., Li, C., Kamimori, G. H., and Balkin, T. J. (2007). The effects of 53 hours of sleep deprivation on moral judgment. Sleep, 30(3): 345.

Mayer, J. D., Salovey, P., and Caruso, D. R. (2004). Emotional intelligence: Theory, findings, and implications. Psychological inquiry, pages 197–215.


Nagel, R. (1995). Unraveling in guessing games: An experimental study. The American Economic Review, pages 1313–1326.

Neisser, U., Boodoo, G., Bouchard Jr, T. J., Boykin, A. W., Brody, N., Ceci, S. J., Halpern, D. F., Loehlin, J. C., Perloff, R., Sternberg, R. J., et al. (1996). Intelligence: knowns and unknowns. American psychologist, 51(2): 77.

Petrides, K., and Furnham, A. (2000). On the dimensional structure of emotional intelligence. Personality and individual differences, 29(2): 313–320.
