
Development and Validation of a General Mental Ability Test

Master’s thesis

Pieter van der Giessen

Student number 5969549

Master’s in Business Studies

Supervisor: S.T. Mol

Second reader: W. van Eerde

August 15, 2014


Abstract

This study describes the development and validation of the Center of Job Knowledge Research (CJKR) test of general mental ability (GMA). Based on a wide range of literature and existing GMA tests, a new test was developed. This development is described in depth, after which the hypotheses used to assess its validity are presented. The test was administered to employees of a Dutch restaurant (N = 75) and the findings were related to their scores on job performance, as assessed by supervisor evaluations. The results show that most of the hypotheses are not supported, and careful analysis of both the subtests and the individual items shows that especially the difficulty of the items needs further attention. As this difficulty was very hard to assess beforehand, these results should not come as a surprise, but should be used as valuable feedback to further improve the CJKR test. In addition, some attention is given to the hospitality field and the drivers of job performance in that context.


Table of contents

ABSTRACT
TABLE OF CONTENTS
TABLES
FIGURES
CHAPTER 1: INTRODUCTION
CHAPTER 2: THEORETICAL FRAMEWORK
CHAPTER 3: TEST DEVELOPMENT
CHAPTER 4: HYPOTHESES DEVELOPMENT
CHAPTER 5: METHODS
5.1 Company
5.2 Sample
5.3 General mental ability test
5.3.1 Procedure
5.3.2 Fluid reasoning
5.3.3 Comprehension knowledge
5.3.4 Visual processing
5.3.5 Long-term storage and retrieval
5.3.6 Pretest
5.4 Control variable
5.5 Independent variables
5.6 Dependent variables
CHAPTER 6: RESULTS
6.1 Descriptives
6.2 Reliability
6.3 Correlations
6.4 Tests for normality
6.5 Regression analyses
6.6 Hypothesis testing
6.6.1 Hypothesis 1: relationship GMA and job performance
6.6.2 Hypothesis 2: relationship education and GMA
6.6.3 Hypotheses 3a, 3b, 3c and 3d: relationship subtests and job performance
6.6.4 Hypothesis 4: subtest correlations
6.6.5 Hypotheses 5a, 5b, 5c, 5d and 5e: gender differences in GMA
CHAPTER 7: DISCUSSION
7.1 Evaluation of the content of the CJKR test
7.2 Evaluation of the test environment and associated effects
7.3 Strengths and weaknesses of the current study
7.4 Suggestions for future research
CHAPTER 8: CONCLUSION


Tables

Table 1. Broad Cognitive Abilities as Used in Cattell-Horn-Carroll Theory
Table 2. Different Authors Reporting on Factor-Analytic Data Loading on g Using the Woodcock-Johnson III Tests of Cognitive Abilities
Table 3. Sample-Weighted Average of the Factor-Analytic Data Loading on g
Table 4. Tests Identified during Orientation Process and the Abilities They Measure
Table 5. Items Measuring In-Role Performance
Table 6. Items Measuring Organizational Citizenship Behavior Directed Towards the Organization
Table 7. Values of KR-20 for the Different Subtests
Table 8. Factor Analysis: Rotated Component Matrix (Varimax)
Table 9. Correlation Matrix
Table 10. Tests for Normality
Table 11. Hierarchical Regression Analysis Containing Age, General Mental Ability and the Different Subtests as Predictors of Job Performance
Table 12. Correlations of the Different Sub-Abilities
Table 13. Results of the Independent Samples t-test for Equality of Means

Figures

Figure 1. An example fluid reasoning item and its possible answers.
Figure 2. An example visual processing item and its possible answers.


Chapter 1: Introduction

General mental ability tests are a common method in employee selection all over the world (Ryan, McFarland, Baron, & Page, 1999). General mental ability, or intelligence, is essentially the ability to learn (Hunter & Schmidt, 1996; Hunter, 1986). Meta-analytic research has shown that general mental ability is the strongest single predictor of overall job performance across jobs (Schmidt, Hunter, & Pearlman, 1981), countries (Salgado, Anderson, Moscoso, Bertua, & De Fruyt, 2003), and organizations (Hunter & Hunter, 1984). The validity of general mental ability measures also generalizes across different tests (or operationalizations) (Salgado et al., 2003), and the relationship between general mental ability and job performance is stable over time (Murphy, 1989; Schmidt, Hunter, Outerbridge, & Goff, 1988). The effect of general mental ability on job performance also holds across different levels of job complexity, although there is some evidence that the effect is somewhat stronger for more complex jobs (Hunter & Hunter, 1984; Salgado et al., 2003).

In explaining the relationship between general mental ability and job performance, research suggests that job knowledge acts as a mediator (Hunter, 1986; McCloy, Campbell, & Cudeck, 1994; Schmidt, Hunter, & Outerbridge, 1986; Viswesvaran & Ones, 2000). The major effect of general mental ability is on the acquisition of job knowledge. People higher in general mental ability are expected to acquire more knowledge and to do so faster (Schmidt & Hunter, 1998). Ultimately, higher levels of job knowledge lead to higher levels of job performance, even for less complex jobs (Hunter & Schmidt, 1996; Hunter, 1986).

Given the role that job knowledge plays, the lack of academic attention towards this subject is surprising, especially because job-related competencies have become more knowledge-based, in particular since the 1990s (Hansen, Nohria, & Tierney, 2000). Industrialized economies are no longer based solely on natural resources; the economy is now more focused on intellectual assets (Kismihók, Vas, & Mol, 2012). Job knowledge can also be considered a measure of practical intelligence (Schmidt & Hunter, 1993). However, this practical intelligence is not an ability construct such as general intelligence, but rather a knowledge construct. Practical intelligence “involves applying the components of intelligence to experience so as to (a) adapt to, (b) shape, and (c) select environments” (Sternberg, 2011, p. 305). In part it consists of tacit knowledge, defined as ‘what one needs to know in order to work effectively in an environment that one is not explicitly taught and that often is not even verbalized’ (Sternberg, 2011, p. 511). The workplace is exactly such an environment, one in which tacit knowledge is used to perform. The quotes above illustrate the mediating role job knowledge has between general mental ability and job performance.

Despite the role job knowledge and/or general mental ability could play, many organizations continue to rely on other instruments during their selection procedures, such as (un)structured interviews, years of working experience and educational background (Ryan et al., 1999). In situations where cognitive ability tests are used to differentiate between candidates, perceived fairness is a potential problem among the people who are not hired (Hausknecht, Day, & Thomas, 2004). The message rejected applicants receive is: “You are not smart enough for this job.” Since cognitive ability is a stable trait (Deary, Whalley, Lemmon, Crawford, & Starr, 2000), this implies that these disappointed applicants will never reach the required level. Perceived fairness is also affected by test bias. Racioethnic minority groups, e.g. non-Whites in the United States of America, tend to systematically score lower, but it is not always clear whether this is the result of an inappropriately designed test, ‘an outcome bias resulting from discrimination against members of this group by society at large’, or actual differences between the ethnicities (Helms, 2006; Suzuki, Short, & Lee, 2011, p. 278). These subgroup differences are large enough to reduce employment opportunities for racioethnic minority groups and women. For employers who simultaneously want to identify high-quality candidates and establish a diverse workforce, this can create a diversity-validity dilemma. Of course, many organizations also want to avoid charges of discrimination (Pyburn, Ployhart, & Kravitz, 2008). The system outlined below intends to reduce this dilemma by minimizing subgroup differences through combining general mental ability with job knowledge. Combining general mental ability with other relevant constructs (such as knowledge) is considered a strategy to achieve diversity without minority preference (Sackett, Schmitt, Ellingson, & Kabin, 2001).

No matter how important job knowledge is, the fact that cognitive ability is the best predictor of job performance cannot be ignored. Therefore, researchers at the University of Amsterdam are developing a test environment that scores participants on both job knowledge and general mental ability, as part of the Center of Job Knowledge Research (CJKR). This test environment allows employers to obtain a more complete assessment of their future employees than choosing between either an ability test or a knowledge test. Applicants scoring low(er) on general mental ability may ‘compensate’ for this by showing excellent job knowledge. Vice versa, employers might also choose to select applicants higher on general mental ability and lower on job knowledge, because of the prospect that these intelligent applicants will learn faster once hired than people who score lower on general mental ability; a similar choice can be made when more knowledgeable individuals are not available.

This system also allows more positive feedback to be given to rejected applicants. The message ‘you are not smart enough for this job’ is replaced by concrete knowledge areas in which the applicant may improve his or her fit with the job requirements. For instance, specific literature or courses could be proposed to this end.

For a system that assesses both general mental ability and job knowledge to be truly valuable, it needs to be embraced by actual employers. The use of this system by employers would justify the development of job knowledge tests, which for a single job takes months of full-time desk research and interviews with job incumbents.

Until now, the above-mentioned project used the Ability Profiler, an instrument provided by the O*NET Resource Center (O*NET Resource Center, 2013), to assess general mental ability. This instrument, however, has not been validated and is therefore not applicable to ‘real-life’ selection procedures. Since selection is the core setting of the future activities described above, using this Ability Profiler is no longer an option. Given the high costs of publicly available general mental ability tests, the current investigation aims to develop and validate a general mental ability test of its own. This test will be referred to as the CJKR test.

This paper continues as follows. In the next chapter the theoretical framework on which this research is based is presented. After this, the formation of the test is described and the hypotheses that assess the validity of the CJKR test are presented. This is followed by the method section, which gives detailed information about the construction of the different question types and the scoring, and describes the complete validation process. The results are then presented, followed by a discussion and suggestions for future research.


Chapter 2: Theoretical framework

The development of cognitive ability tests and the theories describing intelligence have gone through multiple phases. In this paper the terms intelligence, cognitive ability and general mental ability (GMA) are used interchangeably. A lot has changed since Plato called intelligence ‘the love of learning’ in The Republic (around 380 BC) and St. Augustine stated that ‘superior intelligence might lead people away from God’ in De doctrina christiana (397 AD) (Kahn, 1987; Sternberg, 1990). The foundation of theories of cognitive ability lies in the assumption that people actually differ in cognitive ability. According to this assumption, differences are naturally inherited and only partially formed through education. Francis Galton, a half-cousin of Charles Darwin, was among the first to raise this issue and, by doing so, initiated the well-known nature-nurture debate (Urbina, Sternberg, & Kaufman, 2011).

The first person able to deliver a measure of cognitive ability was Alfred Binet, around the turn of the twentieth century. His main contribution was the insight that children become more competent as they grow older. Hence, a good measure of intelligence should be one that is found easier by older children and harder by younger children. This ultimately led to the development of the intelligence quotient (IQ) (Mackintosh, 2011).

While other researchers focused on developing tests of different abilities (such as attention, memory and abstraction), the concept of the positive manifold was developed by Charles Spearman (Willis, Dumont, & Kaufman, 2011). During his research, Spearman concluded that every person has a certain level of general intelligence, “which the person can demonstrate in most areas of endeavor, although it will be expressed differently under different circumstances” (Willis et al., 2011, p. 40). This factor was labeled g, for general intelligence. Spearman’s results showed that the tests of the specific abilities mentioned above measured not only that specific ability, but also a general factor that was present in all the tests in the battery. This paper will not go into the details of his computations, but despite the few mathematical errors Spearman made, his conclusion holds up. Contemporary statistical analyses show that the correlations between the general factors of three totally different intelligence batteries, after correcting for unreliability in each measure, are as close to 1.00 as possible (.99, .99, 1.00) (Johnson, Bouchard, Krueger, McGue, & Gottesman, 2004).

As can be imagined, the claim that one general factor explains all differences in intelligence was not accepted by everyone. The rise of factor-analytic methods opened up a new route of theory development. Thurstone proposed seven independent ‘primary mental abilities’: verbal comprehension, verbal fluency, number, spatial visualization, inductive reasoning, memory and perceptual speed (Mackintosh, 2011; Thurstone, 1938). Another scholar who proposed multiple unique abilities was Joy Paul Guilford, who distinguished as many as 180 (Guilford, 1967; Willis et al., 2011).

A division of types of intelligence still often referred to today was proposed by Cattell in 1941. He distinguished between ‘fluid intelligence’ (Gf) and ‘crystallized intelligence’ (Gc). The former refers to inductive, deductive and quantitative reasoning; the latter refers to ‘the application of acquired knowledge and learned skills’. This second type of intelligence is, for example, tested by vocabulary exercises and questions referring to ‘common sense’. According to Cattell, Gf was the biological basis of intelligence and Gc the expression of that ability under cultural influences (Willis et al., 2011).

Modern techniques allowed Carroll (1993) to (re)analyze as many as 461 datasets of administered cognitive ability tests. The result of his factor analyses was a set of ‘primary mental abilities’, including Language, Reasoning, Memory and Learning, and Visual Perception. On the basis of these analyses he later presented ‘A Theory of Cognitive Abilities: The Three-Stratum Theory’ (Carroll, 1993, 1997). The first stratum contains all the narrow sub-abilities, around seventy in total. The second stratum includes the broader abilities mentioned above. The third stratum contains the general intelligence factor, g.

The last two theories were combined into the now widely known Cattell-Horn-Carroll (CHC) theory, notwithstanding the differences in opinion between Cattell and Carroll, including the question of whether g actually exists (Willis et al., 2011). CHC theory also describes a three-stratum model, with Cattell’s Gf and Gc factors as second-stratum abilities. The broad abilities are shown in Table 1. For exact and extensive definitions of all these constructs, including over seventy narrow (first-stratum) abilities and their definitions, see Newton and McGrew (2010).

Table 1
Broad Cognitive Abilities as Used in Cattell-Horn-Carroll Theory

Fluid reasoning (Gf)
Comprehension-knowledge (Gc)
General (domain-specific) knowledge (Gkn)
Visual processing (Gv)
Auditory processing (Ga)
Short-term memory (Gsm)
Long-term storage and retrieval (Glr)
Processing speed (Gs)
Reaction and decision speed (Gt)
Psychomotor speed (Gps)
Quantitative reasoning (Gq)
Reading and writing (Grw)
Psychomotor abilities (Gp)
Olfactory abilities (Go)
Tactile abilities (Gh)
Kinesthetic abilities (Gk)

The CHC model is the basis for many contemporary cognitive tests (Willis et al., 2011). Therefore it shall also be used as the foundation for the development of the CJKR test. These different abilities all load on g, which is of particular interest for this research, since g is the driver of job performance and not (one of) the different broad or narrow abilities (Hunter, 1986). This is confirmed by a study on general mental ability by Schmidt: “[...] any combination of two or three or more specific aptitudes is actually a measure of [general mental ability].” (2002, p. 189).

Job performance is usually measured by supervisory ratings (Schmidt, 2002). However, under some conditions supervisors do not have the ability to fully observe and evaluate job performance; e.g. in situations where the supervisor and employee do not work at the same place. Therefore, studies have also used training performance and work sample measures as the criterion. In this study supervisor evaluations will be used. Ratings of job performance will be accurate if the supervisors have direct contact with their employees throughout the day (Schmidt, 2002).


Chapter 3: Test development

The development of the test started with a broad review of the literature on tests measuring general mental ability. In this orientation process, leading tests in both the academic and the commercial field were identified. Insofar as available, guiding white papers, validity data and reliability data were collected. The search for tests consisted of both online and offline research. Online, Google and Google Scholar were used to find both commercial and academic tests, and articles reporting on the usage of those tests. The search was limited to the English and Dutch literature, as these are the languages mastered by the author. However, meta-analytic research has shown that general mental ability has predictive validity across countries all over the world (Bertua, Anderson, & Salgado, 2005; Salgado, Anderson, Moscoso, Bertua, De Fruyt, et al., 2003), so the findings in the Dutch and English literature are generalizable to other languages. Keywords used were ‘general mental ability’, ‘gma’, ‘cognitive ability’, ‘iq’, ‘(general) intelligence’, ‘g-factor’ and ‘general ability’. At first, the results were limited to those published in the last ten years, but further analysis revealed that most of the articles referred to tests published before that, so this restriction was dropped. Most of the tests were identified using widely cited articles, including meta-analyses, reporting on general mental ability (Johnson & Bouchard Jr., 2011; Salgado, Anderson, Moscoso, Bertua, De Fruyt, et al., 2003; Schmidt & Hunter, 2004). The offline tests were identified on the basis of the Cambridge Handbook of Intelligence (Sternberg & Kaufman, 2011) and the Handbook of Employee Selection (Farr & Tippins, 2010).

The gathered tests were examined to form an overview of the different subtests they consist of, their validity, and the subtests’ loadings on the general intelligence factor, g. During this examination, guiding white papers, validity data and reliability data were studied. The tests were organized in a matrix in which they were classified based on Cattell-Horn-Carroll (CHC) theory (Newton & McGrew, 2010). The tests and their CHC abilities are shown in Table 4. Several of the tests measured only one broad ability; for example, Raven’s progressive matrices only assess fluid reasoning (Raven, 2000). All identified tests were taken into account to obtain a comprehensive picture of contemporary general mental ability tests. Many tests were encountered multiple times during the literature review, raising the confidence that the tests shown in Table 4 represent a solid sample of the tests currently in use.

The fact that certain abilities are present in contemporary tests, does not guarantee their impact on g. Therefore, factor-analytic data were gathered and examined. Factor analytic data were available on the Woodcock-Johnson III Tests of Cognitive Abilities. Table 2 shows the main findings of four different studies in which these factor analyses were conducted. A sample-weighted average of these four publications reveals that fluid reasoning (Gf, .89), long-term storage and retrieval (Glr, .85) and comprehension knowledge (Gc, .81) were the factors loading highest on g in these samples. The rankings of the other broad abilities are shown in Table 3.
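To make the aggregation concrete, here is a minimal Python sketch that reproduces the reported Gf value from the sample sizes and loadings in Table 2 (the variable names are illustrative):

```python
# Sample sizes and Gf loadings on g, taken from Table 2; the same
# computation applies to the other columns.
studies = [
    ("Taub and McGrew, 2004", 7485, 0.922),
    ("Edwards and Oakland, 2006", 2379, 0.781),
    ("Keith et al., 2008", 6970, 0.941),
    ("Floyd et al., 2009", 3577, 0.791),
]

total_n = sum(n for _, n, _ in studies)
weighted_gf = sum(n * loading for _, n, loading in studies) / total_n
print(round(weighted_gf, 2))  # 0.89, matching the value reported above
```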

Based on these rankings it was decided that tests measuring fluid reasoning, long-term storage and retrieval, and comprehension knowledge would be included in the CJKR test. Visual processing (Gv, .77) was added to this list of three because of its widespread presence in the identified tests (14 out of 21 tests, 66.7 percent). The lower-ranked abilities not only loaded less on g; they would also be more difficult or impossible to measure using an online survey format. Think, for example, of auditory assignments that require loudspeakers, the correct software or an examiner.

After deciding which abilities would be measured, decisions had to be made about what kind of exercises would be used to measure the different abilities.

For fluid reasoning, number series were selected, because this exercise was used most frequently in the observed tests. Next to this, Raven’s progressive matrices were selected, since they are the ‘best researched of all culture-reduced tests of GMA’ (Rushton, Bons, Vernon, & Čvorović, 2007, p. 1774). For long-term storage and retrieval, the exercise of remembering previously unrelated pairs was selected, as this was the way this construct was measured in all but one of the identified tests that measured long-term storage and retrieval. For visual processing, a combination of both sub-abilities was chosen, namely visualization (“The ability to apprehend a spatial form, object, or scene and match it with another spatial object, form, or scene with the requirement to rotate it (one or more times) in two or three dimensions.”) and spatial relations (“Ability to rapidly perceive and manipulate (mental rotation, transformations, reflection, etc.) visual patterns or to maintain orientation with respect to objects in space.”) (Newton & McGrew, 2010, p. 624). Both abilities were equally present in the identified tests and will be measured by ‘cube folding’ and ‘paper folding’ exercises respectively. Comprehension knowledge includes exercises that measure the ‘extent of vocabulary (nouns, verbs, or adjectives) that can be understood in terms of correct word (semantic) meanings’ (Newton & McGrew, 2010, p. 623).

More extensive descriptions of the constructs and exercises can be found in the methods section.

The aim of the rest of this paper is to describe the validation of this CJKR test. It continues by describing the development of the hypotheses that will be tested.

Table 2
Different Authors Reporting on Factor-Analytic Data Loading on g Using the Woodcock-Johnson III Tests of Cognitive Abilities

| Authors                   | Sample | Gf   | Gc   | Gv   | Glr  | Ga   | Gs   | Gsm  |
| Taub and McGrew, 2004     | 7485   | .922 | .845 | .913 | –    | .826 | .647 | .854 |
| Edwards and Oakland, 2006 | 2379   | .781 | .762 | .477 | .713 | .595 | .556 | .694 |
| Keith et al., 2008        | 6970   | .941 | .853 | .815 | .806 | .834 | .637 | .862 |
| Floyd et al., 2009        | 3577   | .791 | .693 | .606 | .742 | .674 | .517 | .615 |

Table 3
Sample-Weighted Average of the Factor-Analytic Data Loading on g

| Gf  | Gc  | Gv  | Glr | Ga | Gs | Gsm |
| .89 | .81 | .77 | .85 | –  | –  | –   |

Table 4
Tests Identified During Orientation Process and the Abilities They Measure

Name | Gf Gc Gkn Gv Ga Gsm Glr Gs Gt Gq Grw
GL Assessment Cognitive Abilities Test (Calvin, Fernandes, Smith, Visscher, & Deary, 2009) | x x x
Johnson O'Connor Research Foundation (Haier et al., 2009) | x x x x x
Kaufman Adolescent and Adult Intelligence Test (Lassiter, Matthews, Bell, & Maher, 2002) | x x x
Leiter International Performance Scale-Revised (Hooper & Bell, 2006) | x x x x
Miller Analogies Test (Meagher & Education, 2008) | x
Pearson Bennett Mechanical Comprehension Test (Klenk, Forbus, Tomai, & Kim, 2011) | x x x x
Pearson Beta III (Unsworth & Engle, 2006) | x x x x
Pearson Comprehensive Test of Nonverbal Intelligence (Bradley-Johnson, 1997) | x
Pearson General Ability Measure for Adults (Lassiter et al., 2002) | x x
Pearson Naglieri Nonverbal Ability Test (Naglieri & Ford, 2003) | x
Pearson Revised Minnesota Paper Form Board Test (Pietschnig, Voracek, & Formann, 2010) | x x
Raven’s Progressive Matrices (Raven, 2000) | x
The Comprehensive Ability Battery (Johnson et al., 2004) | x x x x x
The Hawaii Battery (Johnson et al., 2004) | x x x x x
The Wechsler Adult Intelligence Scale (Johnson et al., 2004) | x x x x x x x x
Universal Nonverbal Intelligence Test (Hooper & Bell, 2006) | x
Wonderlic Cognitive Ability Pretest (Wright & Meade, 2011) | x x x
Wonderlic Classic Cognitive Ability Test (Welter, 2013) | x x x
Wonderlic Contemporary Cognitive Ability Test (Randall, 2013) | x x x
Woodcock-Johnson III Tests of Achievement (Taub & McGrew, 2004) | x x x x x x x x x
Woodcock-Johnson III Tests of Cognitive Abilities (Taub & McGrew, 2004) | x x x x x x x x


Chapter 4: Hypotheses development

As described above, it was decided on the basis of the literature which (broad) abilities the CJKR test would measure and which types of exercises would be used to do so. To assess the extent to which the CJKR test is a valid measurement of general mental ability, multiple hypotheses were developed. These hypotheses are described below.

4.1 Concurrent validity

Concurrent validity is a form of criterion-related validity that assesses the extent to which the test correlates with related constructs. In this case, the main goal of this research was to develop a general mental ability test that enables the prediction of future job performance. This was strongly based on meta-analytic research showing that general mental ability is the strongest single predictor of overall job performance across jobs (Schmidt et al., 1981), countries (Salgado, Anderson, Moscoso, Bertua, & De Fruyt, 2003), and organizations (Hunter & Hunter, 1984). As the main argument above is that general mental ability predicts job performance, this relationship is expected in this research as well:

Hypothesis 1: General mental ability as measured by the CJKR test has a positive relationship with job performance.

Another related variable is educational achievement. As intelligence is often described as the ability to learn, it should come as no surprise that general intelligence is the best-known predictor of academic achievement across different domains (Rohde & Thompson, 2007). Measures of cognitive ability and educational achievement usually correlate around .50 (Gustafsson & Undheim, 1996). The same relationship is expected to emerge in this research. Educational achievement is measured by the highest level of education received:

Hypothesis 2: Highest level of education has a positive relationship with general mental ability as measured by the CJKR test.


4.2 Construct validity

Construct validity of a test is ‘the extent to which its patterns of subtest and item inter-correlations or its distribution of scores conforms to psychometric theory’ (Winship, 2003, p. 7). In this case, multiple patterns are assessed.

Firstly, previous research suggests that g is the driver of job performance, and not (one of) the different broad or narrow abilities (Hunter, 1986; Schmidt & Hunter, 2004). This implies that the different subtests do not explain variance in job performance above and beyond the variance already explained by general mental ability. Therefore, the following hypotheses are suggested. Note: as it is statistically impossible to prove the absence of a relationship between variables, these hypotheses will mainly be used to indicate whether the findings point in the right direction.

Hypothesis 3a: Fluid reasoning (Gf) as measured by the CJKR test does not explain additional variance above and beyond the effect of general mental ability on job performance.

Hypothesis 3b: Comprehension knowledge (Gc) as measured by the CJKR test does not explain additional variance above and beyond the effect of general mental ability on job performance.

Hypothesis 3c: Long-term storage and retrieval (Glr) as measured by the CJKR test does not explain additional variance above and beyond the effect of general mental ability on job performance.

Hypothesis 3d: Visual processing (Gv) as measured by the CJKR test does not explain additional variance above and beyond the effect of general mental ability on job performance.

Secondly, it is expected that the different broad abilities that the CJKR test measures correlate with each other in the same way as described in the literature. This implies that the concept of the positive manifold (Willis et al., 2011), which describes that different subtests of abilities all load on one factor of general intelligence, g, can also be observed here. As a result of the positive manifold, it is expected that all the different subtests correlate moderately (.20 to .60) with each other, indicating that ‘the broad cognitive abilities are related to, but distinct from, one another’ (Schrank, McGrew, & Woodcock, 2001, p. 17).


Hypothesis 4: The four subtests fluid reasoning (Gf), comprehension knowledge (Gc), long-term storage and retrieval (Glr) and visual processing (Gv) correlate moderately (.20 to .60) with each other.

Thirdly, much scientific attention has been devoted to gender differences. The question ‘which is the smarter sex?’ appeals to the imagination of many researchers. To answer this question, many intelligence tests are not usable, since they are designed to produce no differences between males and females (Halpern, Beninger, & Straight, 2011). This implies that questions that differentiate between men and women are removed from those tests. Tests that are not normed to eliminate sex differences suggest that no significant differences exist in the outcome of g (Jensen, 1998). Jensen concluded: “No evidence was found for sex differences in the mean level of g or in the variability of g. ... Males, on average, excel on some factors; females on others” (pp. 531–532). So overall, when looking at g there is no smarter sex, but differences can be found when using subtests.

Findings that have been replicated many times indicate that women tend to score higher on tests of verbal abilities and memory tasks (Jensen, 1998). Males are expected to perform better on reasoning tasks and spatial ability tasks (Hyde, 2005). These outcomes are also expected in this research.

Hypothesis 5a: There is no relationship between gender and general mental ability as measured by the CJKR test.

Hypothesis 5b: There is a relationship between gender and fluid reasoning as measured by the CJKR test, so that males score higher than females.

Hypothesis 5c: There is a relationship between gender and comprehension knowledge as measured by the CJKR test, so that females score higher than males.

Hypothesis 5d: There is a relationship between gender and long-term storage and retrieval as measured by the CJKR test, so that females score higher than males.

Hypothesis 5e: There is a relationship between gender and visual processing as measured by the CJKR test, so that males score higher than females.

The next chapter will describe the methods to be used to test the above-mentioned hypotheses.


Chapter 5: Methods

In the previous chapter the theoretical foundations of the test were described. The chapter that follows describes the methods used to develop and administer this test for validation purposes. Descriptions are given of the company in which the test was administered and of the sample that the test was administered to. After this, the exact content of the CJKR test is outlined and the (in)dependent variables are described.

5.1 Company

This CJKR test was administered in a company in the hospitality industry in The Netherlands. At the time of administration the company had 155 employees who were employed through a payroll organization and 10 members of the management staff, who were in permanent employment at the company itself.

During the selection procedure this company selects only on social skills, motivation and first impression, as the latter is considered very important in the hospitality branch. Expected future job performance is therefore not explicitly considered the most important criterion, as motivated but less skilled employees are given the chance to develop themselves. Moreover, the company has never fired an underperformer; instead, managers sit down with underperformers and try to improve their job performance. As a result, scores on both the CJKR test and supervisor ratings of job performance are expected to spread well across the possible range. The link between general mental ability and job performance can therefore be investigated with minimal restriction of range. This is a core strength of this research, because in this way selection bias is minimized (Fernandez & Weinberg, 1997).

5.2 Sample

Of the 155 employees, the 145 regular (non-management) employees were invited personally, either by phone or in person, to fill in the test. Participation was completely voluntary, and after the conversation the employees received an email with instructions on how to find the test. Reminders were sent two weeks and one week before the end of testing, by email and SMS respectively. If applicable, during accidental face-to-face meetings employees were also reminded to fill in the test if they had not done so already. Before the reminders, 149 test entries were recorded.


After the reminders another 48 entries were recorded. The total of 197 test entries included 38 entries that were identified as irrelevant (pretest) entries, because they were filled in using a non-issued company ID (e.g. ID number 1).

Of the above-mentioned 145 employees, 116 promised to fill in the test and 15 declined right away, indicating that they had no time to fill it in. After 4 weeks of testing it turned out that 98 unique employees had actually filled in the test. Due to technical failures (see the discussion) 23 of the entries were not usable, leaving 75 complete entries (N = 75) and a response rate of 51.7 percent.

5.3 General mental ability test

The test was developed as an online survey in an adjusted Wordpress environment [1]. The online setting was used to maximize efficiency: data did not have to be recorded manually and the respondents could take the test at a time convenient to them.

The four subtests were developed as follows:

5.3.1 Procedure

Participants were instructed to visit a dedicated website to start the test [2]. This website showed instructions on the procedure of the test. The instructions addressed the goal of the research and privacy concerns, including the notion that all the data would be used only for academic research and would not be shared with anyone within the company. In this way respondents knew that their results would not influence their salary or promotions, or have any other consequences. It was also indicated that test takers would have a chance to win one of two 50-euro vouchers.

On the next page, general information on timing and scoring was given. People were informed that their answers would be recorded the moment the timer reached zero. Next to this, the participants were informed that some parts of the test were corrected for guessing, advising them not to guess on those subtests unless they knew one or more of the answers to be wrong.

[1] The test was developed by a team of five members: dr. S.T. Mol, G. Kismihók, B. Kovács, G. Wischy and P. van der Giessen.

[2] http://dev.jobknowledge.eu/theehuis/


After this, participants were asked to fill in their company-issued ID number. This ID number allowed us to record who took the test. As this identification was necessary to match the survey data with the supervisor performance evaluations, participants could not continue without filling in this ID.

Every subtest was preceded by a specific introduction, indicating the type and number of questions to be asked for that specific subtest. Next to this, information on scoring and, if applicable, the correction for guessing was given. The time limit, if any, was also indicated. Once a participant started the subtest, there was a button in place to review the instructions again; clicking this button would not pause the timer. Upon clicking on the selected answer, the page moved downwards just enough to place the next question on top, to minimize scrolling for the respondent.

The first subtest was long-term storage and retrieval. Participants were informed that when clicking on the ‘next’ button, a timer of five minutes would start in which they would have to remember as many word pairs as possible. On the next page, the twenty word pairs were shown, along with a button allowing the participants to continue if they finished within five minutes.

The next subtest contained the number series questions. Ten number series questions were shown below each other. After this, the progressive matrices were introduced. In this case, no example item was shown, as the objective of these questions is for participants to discover the underlying logic by themselves (Raven, 2000). The next page showed ten questions, with a button to a page containing another ten questions. Participants could go back and forth between these pages as they desired. After finishing the progressive matrices, the comprehension knowledge items were introduced. Two pages of questions with ten items each were shown.

The next part included the paper folding items. A comprehensive explanation was provided to make sure that the respondent knew exactly what was expected of him or her, as the time per ten questions was only three minutes. Two sets of ten questions were shown. Going back to the first set was not possible, as the timer was set per ten questions.

After the paper folding exercises, the cube folding part of the test was introduced. An example item was shown along with an explanation of why only one answer could be correct. This was done to dispel the impression among respondents that multiple answers could be correct, something that emerged during pretesting. On the next page ten items were shown.


In the final part, a page was shown that introduced the retrieval from memory of the word pairs introduced earlier. On the pages that followed a single item was shown with four possible answers. Once answered, the respondent had to click ‘continue’ to go to the next question. Going back was not possible to minimize the information that the respondent could gather from the items themselves.

After this part, a page was shown that asked for some final information: highest level of education and, if desired, an email address to send feedback to, once the results were known. On the next page, scores on the different subtests were shown, indicating the number of correct, incorrect, and unanswered items.

Respondents were encouraged again to contact one of the researchers if they had any questions and there was space to provide feedback right away in a default textbox. The test was concluded by a page thanking the respondent and a message that the website could be closed.

5.3.2 Fluid reasoning (Gf, 30 items)

This ability refers to ‘the use of deliberate and controlled mental operations, often in a flexible manner, to solve novel problems that cannot be performed automatically’ (Newton & McGrew, 2010, p. 623). This construct was measured by two different subtests: number series and progressive matrices.

The 10 ‘number series’ questions each consisted of a series of numbers, and the respondent was instructed to identify the most logical next number (Lohman & Lakin, 2011). An example would be: “What is the most logical next number: 8, 10, 12, 14, ...” with 16, 18, 20 and 14 as possible answers. The ten questions could be answered without a time limit, but for each wrong answer one third of a point was deducted from the total score. An unanswered question resulted in neither points awarded nor deducted. The probability of guessing the correct answer and being awarded a full point was 1/4, and the probability of guessing an incorrect answer and being awarded -1/3 was 3/4. The expected number of points gained by guessing was therefore (1/4)(1) + (3/4)(-1/3) = 0, making guessing an ineffective strategy. In the CJKR test, correction for guessing was applied only to those subtests for which such a correction was also found in other general mental ability tests (Rohde & Thompson, 2007).
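As an illustration of this scoring rule, a minimal Python sketch (function and variable names are illustrative, not the actual CJKR scoring code):

```python
from fractions import Fraction

def corrected_score(n_correct: int, n_wrong: int, n_options: int) -> Fraction:
    """Number-right score with correction for guessing: each wrong answer
    costs 1/(k - 1) of a point for a k-option item; unanswered items
    neither add nor subtract points."""
    return Fraction(n_correct) - Fraction(n_wrong, n_options - 1)

# Expected value of blind guessing on one 4-option item:
# P(correct) * 1 + P(wrong) * (-1/3) = 1/4 - (3/4)(1/3) = 0.
ev = Fraction(1, 4) * 1 + Fraction(3, 4) * Fraction(-1, 3)
assert ev == 0
```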

To allow for the efficient generation of items, algorithms were developed that described the logic in the question. These tests use logic comparable to that of other tests (Dodrill, 1983). This made it possible to generate multiple questions using the same logic without having to create them manually, which will be of great use in the future when item banks have to be generated to allow for testing on a greater scale.
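By way of illustration, a sketch of one such generation algorithm for simple arithmetic series, mirroring the worked example above; the actual CJKR algorithms are not described here, so the rule and names are assumptions:

```python
import random

def number_series_item(rng: random.Random, length: int = 4):
    """Generate one arithmetic number-series item in the style of the
    example above ("8, 10, 12, 14, ..." -> 16)."""
    start, step = rng.randint(1, 20), rng.randint(2, 9)
    stem = [start + i * step for i in range(length)]
    answer = start + length * step
    # Distractors mirror the worked example: two overshoots and the
    # last stem number itself.
    options = [answer, answer + step, answer + 2 * step, stem[-1]]
    rng.shuffle(options)
    return {"stem": stem, "options": options, "answer": answer}

print(number_series_item(random.Random(7)))
```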

The 20 ‘progressive matrices’ questions were based on the well-known Raven’s progressive matrices (Raven, 2000). The items were self-developed for the purposes of the current investigation [3]. Every item showed an image out of which a piece was removed. The respondent’s task consisted of identifying the most logical piece to fit into the gap. The items were developed to range from very easy to very hard. A fairly easy example item is shown in Figure 1. Twenty minutes were provided to answer twenty items with six possible answers each. For each wrong answer a fifth of a point was deducted from the final score. An unanswered question resulted in neither points awarded nor deducted.

Figure 1. An example fluid reasoning item and its possible answers.

[3] Twelve items were developed by P. van der Giessen, eight by G. Wischy.

5.3.3 Comprehension knowledge (Gc, 20 items)

This ability refers to “a person’s breadth and depth of acquired knowledge of the language, information, and concepts of a specific culture and/or the application of this knowledge” (Newton & McGrew, 2010, p. 623). Historically it is often referred to as crystallized intelligence.

This part was designed to test a participant’s ability to make connections between different constructs and words. With this in mind, a database of common words and (if applicable) their mutual relations was established. The relations used were: A ‘is similar to’ B, A ‘is the opposite of’ B, A ‘is a kind of’ B, A ‘is a part of’ B and A ‘causes’ B. The words and their relations were initially gathered on the basis of the combined knowledge of the team members. The results were verified by comparing them with an open-source online graphical dictionary (“Visuwords,” 2013). This dictionary was also used to add additional words and relations.

Questions were generated by taking three related words and one unrelated word, posing the question ‘which of these four does not belong in the group?’. As a result, participants were confronted with a question with four possible options, such as ‘car’, ‘wheel’, ‘brakes’ and ‘steering wheel’. The correct answer in this case would be ‘car’, since all the other answers are parts of a car. Another type of question was posed as follows: ‘wheel : car | wing : …’, asking the respondent to identify the fact that a wheel is a part of a car (the relationship) and deduce that a wing is a part of an airplane. The three alternative answer possibilities included other parts of an airplane. Five minutes were provided to answer twenty questions. For each wrong answer a third of a point was deducted from the final score. An unanswered question resulted in neither points awarded nor deducted.
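A minimal sketch of how such odd-one-out items could be generated from a relation database; the toy word list and function names are illustrative, not the actual CJKR implementation:

```python
import random

# Toy fragment of the 'is a part of' relations described above; the
# real CJKR database is larger and was checked against Visuwords.
PART_OF = {
    "car": ["wheel", "brakes", "steering wheel"],
    "airplane": ["wing", "cockpit", "rudder"],
}

def odd_one_out_item(rng: random.Random):
    """Pick three words that are parts of the same whole; the whole
    itself is the word that does not belong in the group."""
    whole = rng.choice(list(PART_OF))
    parts = rng.sample(PART_OF[whole], 3)
    options = parts + [whole]
    rng.shuffle(options)
    return {"options": options, "answer": whole}

print(odd_one_out_item(random.Random(3)))
```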

5.3.4 Visual processing (Gv, 30 items)

This construct refers to the ability ‘to generate, store, retrieve, and transform visual images and sensations’ (Newton & McGrew, 2010, p. 624). This subtest consisted of two parts: paper folding and cube folding [4].

The 20 paper folding exercises described folded squares of paper through which a hole was punched (Moffat & Hampson, 1996). Participants had to unfold this imaginary piece of paper and indicate where the holes would show up in the unfolded square. Five answer options were provided and the time limit was three minutes per ten questions. In total two sessions of ten questions were administered. For each wrong answer a fourth of a point was deducted from the final score (Hegarty & Waller, 2004; Moffat & Hampson, 1996). An unanswered question resulted in neither points awarded nor deducted.

[4] All the visual processing items were developed by P. van der Giessen.

The 10 cube folding exercises showed the layout of an unfolded cube. The participant had to create a mental image of the folded cube and identify which of the answer options was a correct representation of the folded cube (Johnson et al., 2004). An example question and its possible answers are shown in Figure 2. It can be seen that the two colored surfaces can never be adjacent to each other, so the only correct answer here is the last one. Five minutes were provided to answer ten questions. For each wrong answer a fourth of a point was deducted from the final score. An unanswered question resulted in neither points awarded nor deducted.

Figure 2. An example visual processing item and its possible answers.

5.3.5 Long-term storage and retrieval (Glr, 20 items)

This construct refers to ‘the ability to store and consolidate new information in long-term memory and later fluently retrieve the stored information (e.g., concepts, ideas, items, names) through association’ (Newton & McGrew, 2010, p. 626).

From the same database as used in the comprehension knowledge part, forty random words were selected, which were used to form twenty previously unrelated pairs. These pairs were presented to the respondent at the beginning of the test. After all the other subtests had been presented, the respondent got twenty multiple-choice questions, each showing one of the two words of a different pair. The four answer options included the correct answer and three random words chosen from the remaining 38 words.
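A minimal sketch of this pairing-and-distractor procedure, assuming a word list of at least forty entries; the names are illustrative:

```python
import random

def build_word_pair_subtest(words, rng, n_pairs=20):
    """Draw 2*n_pairs distinct words, pair them up, and build one
    multiple-choice retrieval question per pair, with three distractors
    drawn from the other drawn words (the 'remaining 38' in the text)."""
    drawn = rng.sample(words, 2 * n_pairs)
    pairs = [(drawn[2 * i], drawn[2 * i + 1]) for i in range(n_pairs)]
    questions = []
    for cue, answer in pairs:
        remaining = [w for w in drawn if w not in (cue, answer)]
        options = rng.sample(remaining, 3) + [answer]
        rng.shuffle(options)
        questions.append({"cue": cue, "options": options, "answer": answer})
    return pairs, questions

# Stand-in vocabulary for demonstration only.
vocabulary = [f"word{i}" for i in range(40)]
pairs, questions = build_word_pair_subtest(vocabulary, random.Random(1))
```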

5.3.6 Pretest

The developed test was administered to three current employees of the company as a pretest. Based on their recommendations some of the introductory texts were clarified. Texts that were open to various interpretations were edited. No changes to the test content were made, as the test was experienced as ‘fun’, ‘sometimes difficult’ and ‘clear’.

5.4 Control variable

Age was used as a control variable. These data were obtained from company records. The company-issued personal ID was used as a link between the test scores, supervisor evaluations and company data (age, tenure).

5.5 Independent variables

The scores on the subtests of the CJKR test were recorded to serve as measures of the different abilities. These scores were subsequently used to calculate an item-weighted average that served as an overall score for the test. Furthermore, tenure and gender were obtained from company data. Highest education was requested from the participant at the end of the test.

5.6 Dependent variables

Job performance measures exist in different forms. For this research two scales were selected that most closely matched the company’s conceptualization of job performance. Two supervisors were asked to score the respondents on the following items by indicating to what extent they agreed with these statements. All the texts were translated into Dutch, the native language of the supervisors. Answers were provided on a 7-point Likert scale with the following options: strongly disagree (1), disagree (2), disagree somewhat (3), neither agree nor disagree (4), agree somewhat (5), agree (6), strongly agree (7).

The first scale is task (or in-role) performance as suggested by Williams and Anderson (1991). The items are shown in Table 5. The second scale measured organizational citizenship behavior (OCB). This scale originally consists of two subscales: OCBs directed at individuals (OCBI) and those directed at the organization (OCBO) (Lee & Allen, 2002). Given the nature of the jobs (and the limited amount of freedom individuals have to demonstrate OCBs directed at individuals), only the OCBO measure was included. The items are shown in Table 6. No changes were made to the items.

Table 5
Items Measuring In-Role Performance (Williams & Anderson, 1991)

Adequately completes assigned duties
Fulfills responsibilities specified in job description
Performs tasks that are expected of him/her
Meets formal performance requirements of the job
Engages in activities that will directly affect his/her performance evaluation
Neglects aspects of the job he/she is obligated to perform (contraindicative)
Fails to perform essential duties (contraindicative)

Table 6
Items Measuring Organizational Citizenship Behavior Directed Towards the Organization (Lee & Allen, 2002)

Attends functions that are not required but that help the organizational image.
Keeps up with developments in the organization.
Defends the organization when other employees criticize it.
Shows pride when representing the organization in public.
Offers ideas to improve the functioning of the organization.
Expresses loyalty toward the organization.
Takes action to protect the organization from potential problems.
Demonstrates concern about the image of the organization.


Chapter 6: Results

In this chapter the results of the analyses are presented. Descriptive statistics are outlined, reliability and normality statistics are reported, and the correlations between the variables are shown. Finally, the regression analyses are described, on the basis of which the hypotheses are supported or rejected.

6.1 Descriptives

Of the 75 participants, 37 were male. Their average age was 20.4 years (SD = 2.65). On average, a participant had worked for the company for 40.0 months (SD = 31.4). All of the respondents had Dutch nationality. Highest education was distributed as follows: 30.6 percent secondary school, 5.3 percent vocational education (“MBO”), 29.3 percent higher professional education (“HBO”) and 34.7 percent university or higher.

Due to the correction for guessing, the minimum scores were below zero for the subtests progressive matrices (Mean = 9.20; SD = 6.25) and cube folding (Mean = 2.74; SD = 2.80). The subtest with the highest minimum score was comprehension knowledge, where all the respondents managed to answer at least sixty percent correctly (Mean = 17.18; SD = 1.75). The minimum score on the number series was 2.02 (Mean = 7.25; SD = 1.77), on paper folding 0.20 (Mean = 8.48; SD = 4.82) and on long-term storage and retrieval 1.34 (Mean = 17.16; SD = 4.64). For each subtest there was at least one respondent who answered all questions correctly.

Supervisor evaluations of task performance ranged from 3.64 to 6.86 on a scale from 1 to 7 (Mean = 5.55; SD = 0.65). Supervisor evaluations of OCBO ranged from 1.75 to 6.56 (Mean = 4.17; SD = 1.37).

6.2 Reliability

The Kuder-Richardson Formula 20 (KR-20) statistics were calculated for the different subtests. The results are shown in Table 7. The KR-20 statistic was used as it is analogous to Cronbach’s alpha but also applicable to dichotomous (correct-wrong) variables (Cortina, 1993). All the questions were dichotomous, as the given answers were either right or wrong.
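For reference, for a subtest of k dichotomous items KR-20 can be written as

$$\mathrm{KR\text{-}20} = \frac{k}{k-1}\left(1 - \frac{\sum_{j=1}^{k} p_j\,(1-p_j)}{\sigma_X^{2}}\right),$$

where $p_j$ is the proportion of respondents answering item j correctly and $\sigma_X^{2}$ is the variance of the total subtest scores.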


The table shows that the different subtests have very divergent values of KR-20, indicating different levels of internal consistency. None of the scales would significantly benefit from the removal of specific items to increase the value of KR-20. The low reliability scores for number series and comprehension knowledge were not the result of scoring issues, which was double-checked. A possible explanation could lie in the divergent difficulty of these items, which will be elaborated upon in the discussion.

Table 7

Values of KR-20 for the Different Subtests

| Subtest                         | Number of items | KR-20 |
| Number series                   | 10              | 0.489 |
| Progressive matrices            | 20              | 0.745 |
| Comprehension knowledge         | 20              | 0.341 |
| Paper folding                   | 20              | 0.879 |
| Cube folding                    | 10              | 0.693 |
| Long-term storage and retrieval | 20              | 0.906 |

Two-way mixed intraclass correlations were calculated to assess the consistency between raters (interrater reliability). The raters scored the respondents on task performance and organizational citizenship behavior directed towards the organization (OCBO). The intraclass correlation for task performance was found to be 0.619 (p < .001), indicating substantial agreement (Landis & Koch, 1977). The intraclass correlation for OCBO was found to be 0.831 (p < .001), indicating almost perfect agreement.
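A sketch of how such an ICC could be computed, here using the third-party pingouin package; the data frame and scores below are illustrative, not the study’s data:

```python
import pandas as pd
import pingouin as pg  # any two-way mixed ICC routine would do

# Long format: one row per (employee, rater) combination.
ratings = pd.DataFrame({
    "employee": [1, 1, 2, 2, 3, 3],
    "rater":    ["A", "B", "A", "B", "A", "B"],
    "score":    [5.6, 5.9, 4.3, 4.1, 6.2, 6.0],
})

icc = pg.intraclass_corr(data=ratings, targets="employee",
                         raters="rater", ratings="score")
# ICC3: two-way mixed effects, consistency, single rater.
print(icc.set_index("Type").loc["ICC3", ["ICC", "pval"]])
```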

A factor analysis was conducted to explore whether the different subtests loaded on one single factor. Conducting a factor analysis at the item level was problematic, as the number of cases (75) was lower than the number of variables (100). To this end, a principal components analysis was conducted using varimax rotation, comparable to a factor analysis of the Wechsler memory scale by Ernst et al. (1986). The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy was found to be 0.659, which is classified as ‘mediocre’ by Kaiser (as cited in Dziuban & Shirkey, 1974). Bartlett’s test of sphericity was found to be significant (p < .001), indicating that correlations appropriate for factor analysis are present in the data set (Dziuban & Shirkey, 1974). The factor analysis yielded two factors with an eigenvalue greater than 1. The rotated component matrix is shown in Table 8.

The two factors together account for 53.553 percent of the explained variance. As Table 8 shows, the visual processing subtests cube and paper folding, and comprehension knowledge, load highly on the first factor (explaining 33.869 percent of the variance). Long-term storage and retrieval, and number series, load highly on the second factor (explaining 19.684 percent of the variance). The ‘progressive matrices’ subtest loads on both factors, but higher on the second.

Table 8
Factor Analysis: Rotated Component Matrix (Varimax)

| Subtest                         | Component 1 | Component 2 |
| Number series                   | 0.221       | 0.671       |
| Progressive matrices            | 0.416       | 0.627       |
| Comprehension knowledge         | 0.640       | -0.028      |
| Paper folding                   | 0.728       | 0.077       |
| Cube folding                    | 0.694       | 0.220       |
| Long-term storage and retrieval | -0.235      | 0.785       |
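A sketch of this analysis pipeline (Bartlett’s test, KMO, PCA with varimax rotation), using the third-party factor_analyzer and scikit-learn packages; the file name and column layout are assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from factor_analyzer.rotator import Rotator
from factor_analyzer.factor_analyzer import (
    calculate_bartlett_sphericity, calculate_kmo)

# Hypothetical 75 x 6 table: one column per subtest score.
subtests = pd.read_csv("cjkr_subtest_scores.csv")

chi2, p = calculate_bartlett_sphericity(subtests)   # Bartlett's test
kmo_per_item, kmo_total = calculate_kmo(subtests)   # sampling adequacy

# PCA on standardized scores; two components had eigenvalue > 1.
z = (subtests - subtests.mean()) / subtests.std()
pca = PCA(n_components=2).fit(z)
# Component loadings = eigenvectors scaled by sqrt(eigenvalues).
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
rotated = Rotator(method="varimax").fit_transform(loadings)
print(rotated)  # cf. the rotated component matrix in Table 8
```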

6.3 Correlations

Table 9 shows the correlations between the different variables. A striking result, for example, is that there is no evidence that task performance correlates significantly with any other variable.

6.4 Tests for Normality

Table 10 shows the tests for normality for all the different subtests. These statistics will be used in the discussion to evaluate the item difficulty in more depth.
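A sketch of how the statistics in Table 10 could be obtained per subtest, using scipy; the score vector below is an illustrative stand-in:

```python
import numpy as np
from scipy import stats

# Stand-in for one subtest's score vector (N = 75).
scores = np.random.default_rng(0).normal(size=75)

w, p = stats.shapiro(scores)      # Shapiro-Wilk statistic and p-value
skew = stats.skew(scores)
kurt = stats.kurtosis(scores)     # excess kurtosis
print(w, p, skew, kurt)
```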

(30)

Table 9

Correlation Matrix

Variable Mean SD 1 2 3 4 5 6 7

1 Gender 0,51 0,50

2 Age 20,39 2,65 -0,017

3 Education 4,96 0,58 -0,039 0,668**

4 Tenure 39,97 31,39 -0,133 0,812** 0,496**

5 Test score 11,40 2,40 -0,172 0,271* 0,240* 0,187

6 Number series 7,25 1,77 -0,247* 0,09 0,043 0,139 0,456** (0,489)

7 Progressive matrices 9,20 6,25 -0,222 0,174 0,248* 0,112 0,804** 0,353** (0,745)

8 Comprehension knowledge 17,18 1,75 -0,104 0,073 0,046 0,03 0,364** 0,084 0,235*

9 Paper folding 8,48 4,82 -0,125 0,292* 0,217 0,202 0,598** 0,217 0,251*

10 Cube folding 2,74 2,80 -0,014 0,270* 0,204 0,211 0,507** 0,177 0,282*

11 Long-term storage and retrieval 17,16 4,64 0,072 0,037 -0,025 0,021 0,511** 0,206 0,235*

12 Task performance 5,55 0,65 0,088 0,018 0,06 0,072 0,021 -0,127 0,135

13 OCBO 4,17 1,37 -0,177 0,475** 0,318** 0,616** 0,240* 0,13 0,233*

14 Overall performance 4,86 0,90 -0,103 0,367** 0,263* 0,493** 0,19 0,053 0,226

Notes. Gender coded as: Male = “0”, female = “1”. Education coded according to Standaard Onderwijsindeling 2006 (Centraal Bureau voor de Statistiek, 2014). Tenure measured in months. Reliability statistics on the diagonal. N = 75.

Table 9 (continued)

Correlation Matrix

Variable 8 9 10 11 12 13 14

1 Gender

2 Age

3 Education

4 Tenure

5 Test score

6 Number series

7 Progressive matrices

8 Comprehension knowledge (,341)

9 Paper folding 0,177 (,879)

10 Cube folding 0,264* 0,379** (,693)

11 Long-term storage and retrieval -0,03 -0,05 0,107 (,906)

12 Task performance -0,153 -0,078 -0,032 0,046 (,853)

13 OCBO 0,001 0,061 0,072 0,196 0,542** (,940)

14 Overall performance -0,054 0,019 0,043 0,165 0,770** 0,954** (,936)

Notes. Gender coded as: Male = “0”, female = “1”. Education coded according to Standaard Onderwijsindeling 2006 (Centraal Bureau voor de Statistiek, 2014). Tenure measured in months. N = 75.


Table 10

Tests for Normality

Variable Shapiro-Wilk W Sig. Skewness Kurtosis

General mental ability 0,988 ,676 0,175 0,155

Number series 0,947 ,003 -0,560 0,348

Progressive matrices 0,969 ,061 -0,349 -0,338

Comprehension knowledge 0,926 ,000 -0,540 -0,207

Paper folding 0,926 ,000 0,751 -0,207

Cube folding 0,946 ,003 0,605 -0,448

Long-term storage and retrieval 0,662 ,000 -2,064 3,574

Notes. Degrees of freedom = 75.

6.5 Regression analyses

Based on the literature outlined above, a hierarchical regression analysis was conducted that tested a model in which job performance was predicted by general mental ability as measured by the CJKR test, while controlling for age only. In this first analysis job performance was measured by task performance and OCBO combined (Cronbach’s alpha = ,936). The model itself was significant (F(2, 72) = 6,040, p = ,004, ΔR2 = ,135), explaining 14,4 percent of the variance in job performance, but this was largely due to the influence of the control variable age (B = 0,116, t(72) = 3,010, p = ,004), as the general mental ability score did not come near a significant influence on job performance (B = 0,037, t(72) = 0,859, p = ,393, ΔR2 = ,009).
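A hierarchical regression of this kind can be sketched as follows, with ΔR2 obtained by comparing the R2 values of consecutive steps. The variable names are hypothetical placeholders.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("cjkr_data.csv")  # hypothetical file and column names

# Step 1: control variable only; step 2: add the predictor of interest.
step1 = smf.ols("job_performance ~ age", data=df).fit()
step2 = smf.ols("job_performance ~ age + gma_score", data=df).fit()

delta_r2 = step2.rsquared - step1.rsquared  # incremental variance explained by GMA
print(step2.summary())                      # B, t and p for each predictor
print(f"Delta R2 = {delta_r2:.3f}")
```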

As these results do not fall in line with the theory, an alternative model was created based on the findings so far. As the correlation matrix in Table 9 indicates, using OCBO as the performance measure is more likely to yield significant results. In addition, tenure was added as a control variable, as it also correlated significantly with OCBO at the 0,01 level.

The above-mentioned model is found to be significant and explains as much as 40 percent of the variance in job performance (F(3, 71) = 15,776, p < ,001, ΔR2 = ,019). However, the influence of general mental ability on job performance (OCBO in this case) is still not significant (B = 0,082, t(71) = 1,507, p = ,136), although the direction of the effect is positive and it is closer to significance than before. The significant effect of the control variable age on performance has disappeared and even turned negative (B = -0,063, t(71) = -0,760, p = ,450). The other control variable, tenure, has a positive and significant effect on OCBO (B = 0,030, t(71) = 4,363, p < ,001).

To test the relationship between highest level of education and general mental ability (hypothesis 2), a third hierarchical regression analysis was run with age as a control variable. This model, with education as the independent variable, fell just short of significance (p = ,051), while explaining 8 percent of the variance in general mental ability (F(2, 72) = 3,110). The effect of highest level of education is not significant (B = 0,447, t(72) = 0,703, p = ,484), nor is the controlling effect of age (B = 0,181, t(72) = 1,310, p = ,194).

The fourth and last hierarchical regression analysis consisted of three models. In the first model the control variable age (B = 0,125, t(73) = 3,374, p = ,001) was entered as the only predictor of job performance, leading to an overall significant model (F(1, 73) = 11,384). In the second model, general mental ability was added as a predictor (B = 0,037, t(72) = 0,859, p = ,393), leaving the model as a whole significant (F(2, 72) = 6,040, p = ,004). In the third model, the different subtests were added. Not only were the effects of the different subtests non-significant, the significance of the model as a whole also dropped to p = ,027 (F(7, 67)). The effects of the different subtests are shown in Table 11.

Table 11

Hierarchical Regression Analysis Containing Age, General Mental Ability and The Different Subtests as Predictors of Job Performance

Variables/step (1) (2) (3)

Age 0,367** 0,341** 0,381**

General mental ability 0,097 0,404

Number series -0,068

Comprehension knowledge -0,15

Paper folding -0,252

Cube folding -0,112

Long-term storage and retrieval -0,046

R ,367 ,379 ,451

R2 ,135 ,144 ,204

ΔR2 ,009 ,06

Notes. All beta values in the table are standardized regression coefficients; N = 75. * p < ,05; ** p < ,01;


6.6 Hypothesis testing

Below, the hypotheses are tested based on the results above and additional analyses.

6.6.1 Hypothesis 1: Relationship between GMA and job performance

Hypothesis 1 states that general mental ability as measured by the CJKR test has a positive relationship with job performance. In the regression analyses above, two models describing this relationship were tested. Both models are significant: the model in which GMA predicted overall performance (task performance and OCBO combined) while controlling for age has a p-value of 0,004, and the model in which GMA predicted OCBO while controlling for age and tenure has a p-value smaller than 0,001. However, in both models the effect of general mental ability on job performance is not found to be significant (p-values of 0,393 and 0,136 respectively). Therefore hypothesis 1 is not supported.

6.6.2 Hypothesis 2: Relationship between education and GMA

Hypothesis 2 states that highest level of education has a positive relationship with general mental ability as measured by the CJKR test. Although the correlation matrix suggests that this effect is present at the p < ,05 level (r = ,240), the effect disappears when controlling for age, as is shown in the third regression analysis. Both the model itself (p = ,051) and the effect of education on GMA (p = ,484) are not significant. Therefore hypothesis 2 is not supported.

6.6.3 Hypothesis 3a, 3b, 3c and 3d: Relationship between subtests and job performance

These hypotheses state that none of the different subtests explains additional variance in job performance above and beyond the variance already explained by general mental ability. As none of the subtests has a significant effect on job performance in the fourth hierarchical regression analysis (Table 11), these hypotheses are supported.

6.6.4 Hypothesis 4: subtests correlations

Hypothesis 4 states that the four subtests correlate moderately with each other (,20 to ,60). The correlations are shown in Table 12. As can be seen, three of the six correlations are significant and in the ,20-,60 range. A fourth correlation (between comprehension knowledge and fluid reasoning) is also in that range (r = ,215), but not at a significant level. The correlation between visual processing and long-term storage and retrieval is very close to zero (r = ,041), as is the correlation between comprehension knowledge and long-term storage and retrieval (r = -,03), the direction of which is even negative. Because of these outcomes hypothesis 4 is partially supported.

6.6.5 Hypothesis 5a, 5b, 5c, 5d and 5e: gender differences in GMA

To assess the differences in performance on the CJKR test between men and women, an independent samples t-test was conducted. The results are shown in Table 13.
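The procedure can be sketched as follows: Levene's test determines whether the pooled-variance or the Welch variant of the t-test is appropriate, and the one-tailed p-value is obtained by halving the two-tailed value when the difference lies in the predicted direction. Variable names and coding are hypothetical.

```python
import pandas as pd
from scipy import stats

df = pd.read_csv("cjkr_data.csv")      # hypothetical file; gender coded 0 = male, 1 = female
men = df.loc[df["gender"] == 0, "number_series"]
women = df.loc[df["gender"] == 1, "number_series"]

# Levene's test decides between the pooled-variance and the Welch variant.
_, p_levene = stats.levene(men, women)
t, p_two_tailed = stats.ttest_ind(men, women, equal_var=(p_levene > 0.05))

# Halving is only valid when the observed difference is in the predicted direction.
p_one_tailed = p_two_tailed / 2
print(f"t = {t:.3f}, two-tailed p = {p_two_tailed:.3f}, one-tailed p = {p_one_tailed:.3f}")
```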

Hypothesis 5a states that there is no relationship between gender and general mental ability as measured by the CJKR test. This hypothesis is supported (p = ,139).

Hypothesis 5b states that there is a relationship between gender and fluid reasoning (Gf) as measured by the CJKR test, so that males score higher than females. This hypothesis is supported, since the one-tailed p-values of number series and progressive matrices are both below 0,05 (0,017 and 0,029 respectively).

Hypothesis 5c states that there is a relationship between gender and comprehension knowledge (Gc) as measured by the CJKR test, so that females score higher than males. This hypothesis is not supported (p = ,189).

Hypothesis 5d states that there is a relationship between gender and long-term storage and retrieval (Glr) as measured by the CJKR test, so that females score higher than males. This hypothesis is not supported (p = ,27).

Hypothesis 5e states that there is a relationship between gender and visual processing (Gv) as measured by the CJKR test, so that males score higher than females. This hypothesis is not supported since the one-tailed p-values of paper and cube folding are both above the 0,05 level (p = ,145 and ,452 respectively).

The implications of the above-presented results will be outlined in the discussion that follows in the next chapter.


Table 12

Correlations of the Different Sub-Abilities

1 2 3

1 Fluid reasoning (Gf)

2 Comprehension knowledge (Gc) ,215

3 Long-term storage and retrieval (Glr) ,268* -,03

4 Visual processing (Gv) ,347** ,269* ,041

Notes. * Correlation is significant at the 0,05 level (2-tailed). ** Correlation is significant at the 0,01 level (2-tailed). N = 75.

Table 13

Results of the Independent Samples t-test for Equality of Means

Difference t df Significance

General mental ability ᵃ 0,82377 1,496 73 ,139

Number series ᵃ 0,86602 2,177 73 ,033

Progressive matrices ᵇ 2,75249 1,936 67,513 ,057

Comprehension knowledge ᵃ 0,35915 0,889 73 ,377

Paper folding ᵇ 1,19701 1,071 60,539 ,289

Cube folding ᵃ 0,07984 0,123 73 ,903

Long-term storage and retrieval ᵃ -0,66275 -0,616 73 ,540

Notes. A positive difference indicates a higher mean for men. Significance is two-tailed. ᵃ Equal variances assumed using Levene's Test for Equality of Variances. ᵇ Equal variances not assumed using Levene's Test for Equality of Variances. N = 75.


Chapter 7: Discussion

In this discussion the results will be interpreted, along with an evaluation of the research process. Furthermore, suggestions for future research and implications will be given, both for theory and for practice.

The goal of this research was the development and validation of a test of general mental ability for use within the CJKR project. The hypotheses developed to assess concurrent validity were largely not supported: neither the hypothesized relationship between GMA and job performance (hypothesis 1) nor that between education and GMA (hypothesis 2) was supported. The hypothesis that the different subtests do not explain additional variance in job performance above and beyond the variance already explained by general mental ability was supported (hypothesis 3). Hypotheses 4 and 5, on subtest correlations and on gender differences, were each only partially supported.

Given the solid theoretical foundation that this CJKR test was built on, these results might come as a surprise. Two explanations are explored: on the one hand the content and set-up of the CJKR test will be reviewed, and on the other hand the environment in which the test was validated will be taken into account. The former is very important in the ongoing development process of the CJKR test and hence has major practical implications. The latter may not only explain why this validation process did not yield the expected results, but also point future research in a new direction regarding job performance in combination with young people, part-time work and specific social processes.

7.1 Evaluation of the content of the CJKR test

The review of the content of the CJKR test is structured as a walk through the different subtests: each subtest is reviewed in depth, and its strengths and weaknesses are pointed out.

The first subtest that the participants were asked to complete consisted of the number series items. The results indicate that the questions were partially very easy (the easiest two questions were answered correctly by 100 and 97,3 percent of the respondents respectively) and partially very hard (the hardest two questions were answered correctly by 29,3 and 33,3 percent of the respondents respectively). As there were only ten number series
