Three meta-analyses and publication bias analyses: Does statistical discrimination explain labor market discrimination against immigrants, women, and homosexuals?

Master thesis Work & Organizational Psychology

Author: Bouke Nederhof

Student number: 10127283

Supervisor: Dr. Jan te Nijenhuis

Section: Work & Organizational Psychology

Year: 2015

Location: Amsterdam, The Netherlands

Institution: University of Amsterdam

Abstract

The theory of statistical discrimination (Arrow, 1973; Phelps, 1972) states that between-group differences in cognitive abilities, personality, and other job-relevant characteristics are reflected in between-group differences in job performance and, therefore, account for labor market discrimination against immigrants, women, and homosexuals. We performed meta-analyses on audit studies on labor market discrimination (immigrants: K = 159, N = 39,796; women: K = 86, N = 50,969; homosexuals: K = 22, N = 7,690) and, in addition, we tested whether publication bias had influenced the results from our meta-analyses. Our results provided little support for our hypotheses. We did find that immigrants were discriminated against less by non-profit organizations, which could serve as support for the role of statistical discrimination. We also found strong indications of publication bias. A number of alternative explanations and directions for future research were discussed. Misinterpretations of moderators in our data sets and different perspectives on how statistical discrimination could be detected could have accounted for the fact that most findings were not in line with our predictions. While more research is strongly recommended, we propose a model in which both taste discrimination and statistical discrimination account for the total labor market discrimination against immigrants, women, and homosexuals.

Acknowledgement

I would like to thank Jan te Nijenhuis for his supervision and encouragement during the whole process of writing this thesis, and Edwin van Hooft for his role as the second assessor of this project. I also want to thank David Verhagen for providing me with the data from his initial search of field studies on labor market discrimination. Finally, I would like to thank Mike McDaniel, Hanna Rothstein, and Michael Borenstein for their help in running the analyses on publication bias.

Table of contents

1 Introduction
1.1 Audit studies on labor market discrimination
1.1.1 Three types of audit testing.
1.1.2 Signaling testers’ race, sex and sexual orientation.
1.1.3 Limitations and advantages of audit studies.
1.1.4 Purpose of audit studies on labor market discrimination.
1.2 Statistical discrimination theory
1.2.1 Statistical discrimination and stereotyping.
1.2.2 Statistical discrimination in the labor market.
1.2.2.1 Necessity of selection procedures.
1.2.2.2 Discrimination as rational behavior.
1.3 Job-relevant group differences: intelligence
1.3.1 Ethnic and racial differences in IQ scores.
1.3.1.1 Diminishing differences?
1.3.1.2 Average national IQ scores.
1.3.2 Ethnic differences in measures of job performance.
1.3.3 Sex differences in measures of cognitive ability.
1.3.3.1 Differences in variation
1.3.3.2 Differences on scales.
1.3.4 Sex differences in non-cognitive abilities.
1.3.4.1 Differences in personality.
1.3.6 Differences in cognitive abilities for homosexuals.
1.3.7 Summary.
1.4 Testing the theory of statistical discrimination
1.4.1 Publication bias.
1.4.1.1 Abundance of positive results in psychology.
1.4.1.2 Examples of publication bias.
1.4.1.3 Sociopolitical diversity in social science.
2 Current study
2.1 Alternative explanations
2.2 Decision makers from minority groups
2.3 Non-profit organizations and affirmative action programs
2.4 Discrimination in the different stages of the selection procedure
2.5 Job complexity
2.6 Differences between ethnic groups
2.7 Male-dominated, female-dominated, or mixed jobs
2.8 Lesbian and gay applicants
2.9 Publication bias
3 Methods
3.1 4/5ths rule
3.2 Collecting studies for the meta-analysis
3.3 Selection of studies
3.3.1 Vacant job openings or open applications.
3.3.2 Comparability of paired testers.
3.3.3 Amount of information on résumés.
3.3.4 Type of audit technique used.
3.3.5 Specific criteria per hypothesis.
3.4 Effect size measure
3.5 Categorizing groups according to their average IQ scores
3.6 Coding for job complexity
3.7 Categorizing profit and non-profit organizations
3.8 Testing for publication bias
3.8.1 Statistical analyses.
3.8.1.1 Funnel plots.
3.8.1.2 Begg’s and Egger’s test.
3.8.1.3 Orwin’s Failsafe N.
3.8.1.4 Duval and Tweedie’s trim and fill.
3.8.1.5 Cumulative meta-analysis.
4 Results
4.1 Overall results
4.2 Profit and non-profit
4.3 Type of audit testing
4.4 Minority decision makers
4.5 Job complexity
4.6 Average IQ score
4.7 Sex-specific jobs
4.8 Homosexuals
4.9 Differences per continent
4.9.1 Results for ethnic discrimination per continent.
4.9.1.1 Profit or non-profit.
4.9.1.3 Job complexity.
4.9.1.4 Average IQ score.
4.9.2 Results for homosexuals per continent.
4.10 Publication bias analyses
4.10.1 Ethnic discrimination.
4.10.1.1 Studies on Ethnic discrimination conducted in North-America.
4.10.1.1.1 Published and unpublished studies.
4.10.1.1.2 Type of audit testing.
4.10.1.1.3 Source.
4.10.1.2 Studies on ethnic discrimination conducted in Europe.
4.10.1.2.1 Published and unpublished studies.
4.10.1.2.2 Type of audit testing.
4.10.1.2.2 Source.
4.10.2 Sex discrimination.
4.10.2.1 Published and unpublished studies.
4.10.2.2 Type of audit testing.
4.10.2.3 Country.
4.10.2.4 Source.
4.10.3 Data set on sexual-orientation discrimination.
4.10.3.1 Published and unpublished studies.
4.10.3.2 Country.
4.10.3.3 Source.
5 Discussion
5.1 Alternative explanations
5.1.2 Non-profit and AAP organizations.
5.1.3 Types of audit testing.
5.1.4 Job complexity.
5.1.5 Intelligence of decision makers.
5.1.6 Discrimination regardless of IQ.
5.1.7 Difference between Europe and North-America.
5.1.8 Discrimination against women.
5.1.9 Discrimination against homosexuals.
5.1.10 The abundance of positive results from psychological studies.
5.1.11 Difference between Europe and North-America.
5.2 Limitations
5.2.1 Methods for calculating net discrimination.
5.2.2 Missing information about decision makers.
6 Conclusion

1 Introduction

Mustafa and Peter both apply for an entry-level accountant position at a company in Amsterdam, the Netherlands. Although their résumés are equal in educational background and work experience, a common scenario is that Peter is invited for a job interview while Mustafa, because of his Moroccan descent, is not (Andriessen, Nievers, Dagevos, & Faulk, 2012; Bovenkerk, Gras, Ramsoedh, Dankoor, & Havelaar, 1994).

Discrimination against minority groups, women, and homosexuals remains an important subject for politicians and scientists and a recurring theme in public debate. Direct discrimination occurs when people are treated unequally on the basis of unchangeable characteristics, such as race, sex, religion, age, physical disabilities, mental disabilities, or sexual orientation. Discrimination is morally unacceptable behavior and is prohibited by law in many countries (Fredman, 2011), but it still occurs. The report “Discrimination in the EU in 2012” (European Committee, 2012) estimated that seventeen percent of all EU citizens have experienced being discriminated against and that one third of the respondents had witnessed or heard of discrimination in the twelve months prior to the survey. Ethnic origin (56%) was generally seen as the most common ground for discrimination, but having disabilities (47%) and being homosexual (46%) were also seen as common grounds. Twenty-seven percent of the Europeans who belong to an ethnic minority group reported that they had felt discriminated against on grounds of ethnic origin (European Committee, 2012).

An important task for scientists has been to gain a better understanding of discriminatory behavior in order to help governments, organizations, and individuals to reduce the occurrence of discrimination. For instance, experimental research in the field of social psychology has focused on identifying and understanding the mechanisms and factors that lead to discriminatory behavior (Brown, 2010). Discriminatory behavior has been found to be highly complex and influenced by many individual and situational factors (Brown, 2010). Also, a large number of field studies have been conducted to measure levels of direct discrimination that members of minority groups face in different aspects of daily life (Riach & Rich, 2002).

For our study, we collected available data from a specific category of field studies on labor market discrimination, namely audit studies. In audit studies, two testers apply for the same job vacancy after being carefully matched on as many job-relevant characteristics as practically possible, so that they are expected to differ only in race, sex, or sexual orientation. The difference in treatment between the two testers in the selection procedure is considered a direct measure of discrimination (Riach & Rich, 1991, 2002). However, a weak point of these audit studies is that they cannot control for between-group differences in, for instance, job performance, when such differences exist in the population. By combining all data in meta-analyses we were able to test the theory of statistical discrimination (Arrow, 1973; Phelps, 1972), which states that discrimination occurs because decision makers base their decisions on statistical information about groups. In the context of this study, this occurs when hiring personnel. When groups differ on job-relevant characteristics and when information about an individual is only partially available or difficult to obtain, direct discrimination can be explained as rational behavior (Arrow, 1973; Phelps, 1972). This theory from Arrow (1973) and Phelps (1972) offers an alternative to the ‘distaste theory’ of Becker (1957), which assumes that discriminatory behavior is the result of a ‘general distaste’ of members of the majority group towards members of minority groups.

So, in this study we performed meta-analyses of audit studies on labor market discrimination based on race, sex, and sexual orientation. Our goal was to add to the knowledge on the underlying mechanisms of direct discrimination by testing the theory of statistical discrimination. We used data from audit studies that measured direct discrimination in real-life situations and we meta-analytically combined the results from thirty-eight studies on race discrimination with a total of 39,796 observations, twelve studies on sex discrimination with a total of 50,969 observations, and six studies on sexual orientation discrimination with a total of 7,690 observations. Our results allowed us to give a reliable answer to our research question: “Does the theory of statistical discrimination explain labor market discrimination?”

First, we will discuss the different forms of audit testing, their limitations, and their purpose. Next, we will discuss the theory of statistical discrimination and how it could explain labor market discrimination. We will then discuss how group differences in the population could lead to group differences in job performance, which, we hypothesize, could lead to statistical discrimination. Finally, we will explain why we expected that publication bias might threaten the accuracy of the meta-analytical estimates that we used to test our hypotheses.

1.1 Audit studies on labor market discrimination

1.1.1 Three types of audit testing. In audit studies, participants are paired, trained, and then sent to apply for a job. Audit testing is used to measure direct discrimination in different stages of the application process, and three types of audit testing can be distinguished: résumé testing (also known as correspondence testing), telephone testing, and in-person testing. In résumé testing, written applications are carefully matched on essential characteristics to make sure that they differ only in group membership, such as race, sex, or sexual orientation. Unequal treatment between the two groups is observed when one of the fictitious applicants is invited for a job interview while the other applicant of the same pair receives a negative response. This technique enables researchers to control the content of the application and is therefore considered the most objective procedure for measuring discrimination in the field. Net discrimination rates are calculated by subtracting the percentage of unfavorable treatment of the majority applicants from the percentage of unfavorable treatment of the minority applicants (Weichselbaumer, 2004). In telephone testing and in-person testing, one applicant from one group and one applicant from the other group are carefully matched. Qualifications and presentation styles are matched as closely as possible so that the testers are expected to differ only in the characteristic that is believed to cause discrimination, such as race, sex, or sexual orientation. The matched pairs are trained to align their overall demeanor and their answers to questions about their backgrounds, personal characteristics, schooling, qualifications, and job experience (Riach & Rich, 2002).
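The net discrimination rate described above can be illustrated with a minimal sketch. The class name, the function name, and all counts are hypothetical and serve only to make the subtraction concrete; the sketch follows the percentage-difference definition given in the text and does not cover other ways of calculating net discrimination.

```python
from dataclasses import dataclass


@dataclass
class AuditCounts:
    """Outcome counts for one hypothetical audit study (illustrative only)."""
    pairs: int              # number of matched pairs that applied
    minority_rejected: int  # minority testers who received an unfavorable response
    majority_rejected: int  # majority testers who received an unfavorable response


def net_discrimination_rate(counts: AuditCounts) -> float:
    """Unfavorable-treatment percentage of minority minus that of majority testers."""
    if counts.pairs <= 0:
        raise ValueError("pairs must be positive")
    minority_pct = 100.0 * counts.minority_rejected / counts.pairs
    majority_pct = 100.0 * counts.majority_rejected / counts.pairs
    return minority_pct - majority_pct


# Hypothetical study: 200 pairs, 130 minority and 100 majority testers rejected.
example = AuditCounts(pairs=200, minority_rejected=130, majority_rejected=100)
print(f"Net discrimination: {net_discrimination_rate(example):.1f} percentage points")
```

A positive value indicates that minority testers were treated unfavorably more often than their matched majority counterparts; a value of zero indicates equal treatment in this simple measure.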

1.1.2 Signaling testers’ race, sex and sexual orientation. The independent variables race, sex, and sexual orientation are manipulated in different ways in each type of audit testing. In résumé testing, the race of testers is varied by choosing names that are generally recognized as typically Afro-American, Moroccan, or, for example, Asian. Information about one’s race can also be given by providing information about language proficiency or by including a photo on the résumé (Riach & Rich, 2002). Since mentioning one’s sex on résumés is very common, signaling sex, in combination with choosing a typically male or female name, is an easy part of résumé studies (Petit, 2007; Weichselbaumer, 2004). Indicating one’s sexual orientation, however, requires more delicacy since it is not common to mention it directly on résumés. In most studies, sexual orientation was indicated by mentioning membership of an organization for gays and/or lesbians. These were often student network organizations or organizations that defended equal rights for gays and lesbians (Drydakis, 2009; Weichselbaumer, 2003). Telephone testing has only been used for audit studies on direct discrimination based on race. Race is signaled to decision makers through testers’ accents, in which they are trained if necessary. Because a very strong accent could be a reason for employers not to hire a person, testers are usually trained to speak with a slight accent, so that their race is easily recognized over the telephone but the accent would not interfere with good performance on the job (Allasino, Venturini, & Zincone, 2004; Goldberg, Mourinho, & Kulke, 1996). In-person testing has only been used for measuring direct discrimination based on race and sex, and in both cases the independent variable was easily manipulated by sending testers of different race or sex, respectively (Bovenkerk et al., 1995).

1.1.3 Limitations and advantages of audit studies. Audit testing has some methodological limitations. First, because thorough preparation and training are necessary, participants are always aware of the purpose of the study and are therefore subject to experimenter bias. There is always a risk that participants will alter their behavior because of this, and thus audit studies are not double blind (Bertrand & Mullainathan, 2004). Second, participants are role playing instead of actually applying for the job. This could influence their behavior and the behavior of the decision makers they face. The third and biggest threat to the internal validity of quasi-experiments is selection: the experimental group and the control group differ on a variable that is causally related to the outcome variable. It is difficult to assess how effective procedures for selecting, training, and matching pairs of testers are in ensuring that pairs are identical in job-relevant characteristics except their race, sex, or sexual orientation (Cook & Campbell, 1979). These limitations all apply to telephone and in-person testing but not to résumé testing, which is the most frequently used technique (Bertrand & Mullainathan, 2004; Pager, 2007).

Even with these limitations taken into account, audit testing seems a reliable method for measuring direct discrimination (D’Souza, 1995). Because audit studies make a concentrated effort to eliminate differences in job-relevant characteristics between the two testers, they provide more direct evidence about the impact of discrimination than studies based on wage regressions (Pager, 2007). Because audit studies measure direct discrimination in a real-life environment with real job openings and real employers, their results are generally considered to be more reliable predictors of discrimination in real-life situations than results from experimental studies that were conducted in laboratory settings (Baumeister, Vohs, & Funder, 2007; Tilcsik, 2011).

1.1.4 Purpose of audit studies on labor market discrimination. In general, audit studies on labor market discrimination are conducted in order to estimate the average levels of direct discrimination that members of specific groups encounter when applying for jobs. Access to the labor market is important to enable specific groups to fully integrate in a society, and therefore discrimination is an important subject for politicians and scientists (Bovenkerk et al., 1995; Kaas & Manger, 2011). The results of field studies on direct discrimination have often been used as input for governmental reports on new or existing anti-discrimination legislation (Allasino et al., 2004; Bovenkerk et al., 1995). Some studies have explored whether levels of discrimination were influenced by specific moderators such as age, company size, job complexity, generation of immigrant group, and group membership of the decision maker (e.g., Andriessen, Nievers, Dagevos, & Faulk, 2012; Bovenkerk et al., 1995; Carlsson, 2010). By measuring different moderators, these studies have provided more information on factors that could influence the level of labor market discrimination (Bovenkerk et al., 1995).

In this study we used data from all audit studies on labor market discrimination that were conducted over approximately forty-five years in many different countries. Individual studies are limited in the number of tests that were conducted, the specific area in which the tests took place, and the number of minority groups that were included. Combining the results from all audit studies provides us with far more accurate estimates of levels of labor market discrimination for specific groups and moderating factors. And, while the majority of these audit studies were not conducted with the goal of testing the theory of statistical discrimination, we used the combined data to test this theory.

1.2 Statistical discrimination theory

The theory of statistical discrimination states that labor market discrimination is the result of profit-maximizing behavior and uncertainty about the actual productivity of individual applicants (Arrow, 1973; Phelps, 1972). Thus, it states that discrimination is not based on a distaste towards people from certain minority groups, but is rational behavior based on decision makers’ perceptions of average group differences in job performance that reflect actual group differences on job-relevant characteristics. When information about an individual is unavailable, only partially available, or costly to obtain, decision makers are tempted to base their decision on their perception of group differences in job performance (Arrow, 1973; D’Souza, 1995; Phelps, 1972).

Statistical discrimination is acceptable and legal in some contexts. For example, when women in the United States with a family history of breast cancer need life insurance, it is legal for insurance companies to offer them insurance under more stringent terms. However, statistical discrimination is illegal in the case of racial profiling and the consideration of race when developing a profile of suspected criminals by the police. Denying individuals a job based on their group membership is also illegal, but, according to the theory of statistical discrimination, the same principles apply.

1.2.1 Statistical discrimination and stereotyping. Although the theory of statistical discrimination originates from economics, it does not conflict with theories from social psychology about stereotyping and its influence on human behavior. Stereotypes can influence the judgments that decision makers have to make about individuals, but their influence depends on the relative salience of information about the individual and about the group. Stereotyping can influence judgment especially when the category and the associated characteristic are relevant for the decision that has to be made (Brown, 2010). This process can occur automatically or consciously and is more likely to occur when decision makers are cognitively or emotionally pre-occupied with other concerns. It seems that when available cognitive resources are low, stereotyping can serve as a ‘short-cut’. The latter is also in line with the theory of statistical discrimination, which says that when the costs of obtaining the information about the individual are high and when the potential reward to be made with that information is equally high or even lower, decision makers tend to use information about the group (Arrow, 1973; Brown, 2010; Phelps, 1972). In both cases, the judgment of decision makers is based on information about a group and not on information about an individual. Moreover, in both cases, the probability that decision makers judge an individual based on his group membership increases when there is limited opportunity to collect information about the individual (Arrow, 1973; Brown, 2010; Phelps, 1972).

Established group differences, for instance in job performance, may very well be reflected in stereotypes, but the accuracy of stereotypes has not been of much interest to social psychologists. Social psychologists have traditionally been more interested in the valence, operation, and changeability of stereotypes than in whether they reflect actual group differences that exist in the population because: “…the socially and psychologically more important issue concerns the attributed reason or cause of that group difference (…). For reasons like this (as well as for other reasons), many social psychologists continue to be critical of the value of focusing unduly on stereotype accuracy” (Brown, 2010, p. 71). In contrast, the theory of statistical discrimination suggests that the possible accuracy of stereotypes should not be ignored when trying to explain discriminatory behavior. We suggest that testing the theory of statistical discrimination adds to the knowledge on discriminatory behavior because, contrary to most studies in social psychology, it acknowledges the possibility that group differences exist and, when they are relevant to the specific context, might explain labor market discrimination (Arrow, 1973; Phelps, 1972).

1.2.2 Statistical discrimination in the labor market. We will now explain how the principles of the theory of statistical discrimination can be translated to the specific context of labor market discrimination. The assumption that employers are motivated to make profits, for instance by reducing costs, is central to the theory of statistical discrimination. In the context of labor market discrimination, profits are made by hiring a candidate who will be productive, and costs are initially incurred by screening applicants before making the decision. Costs of the selection procedure consist of the time invested by the decision makers and the use of selection instruments. Cascio and Silbey (1979) estimated the direct cost per job interview at $290 ($947 in 2014 dollars) and Cascio and Ramos (1986) estimated the costs of an assessment at $403 per applicant ($872 in 2014 dollars). So, the screening process can include costly elements, and decision makers are therefore motivated to organize it efficiently. The selection procedure begins by inviting applicants for a job interview based on their résumés and cover letters. Screening of résumés is the least costly part of the selection procedure because it is not very time consuming and résumés and cover letters are received from applicants at no cost to the decision makers. In order to reduce costs, decision makers are motivated to invite for a job interview those candidates who appear most likely to be productive future employees. Although résumés can be obtained for free, the information in résumés is not a sufficient basis for final selection decisions, and therefore an extra and more costly step in the selection procedure is necessary.

1.2.2.1 Necessity of selection procedures. Testing applicants before hiring them is necessary to verify the competencies and experience that candidates mention on their résumés. Kinard (1996, as described in Engleman & Kleiner, 1998) published a study in which one hundred résumés were checked by Equifax Services from Atlanta, Georgia. A total of 68 résumés appeared to contain 129 items that were false. Kuo, Schroeder, Shah, Shah, Jacobs, and Pietrobon (2007) tried to verify the 596 publications listed by a total of 493 applicants for the general surgery residency of Duke University. Their results show that 30% of the publications could not be verified.

DeKay and DeKay (2008)¹ described several studies on the frequency of résumé fraud. ADP Screening and Selection Services (2001, cited in DeKay & DeKay, 2008) performed 2.6 million background checks and found that 44% of the applicants were dishonest about their work experience, 41% were dishonest about their educational history, and 23% lied about credentials or licenses. DeKay and DeKay also mention a publication from 2003 on the results of eleven surveys among human resource professionals conducted between 1998 and 2002. The survey estimates of the percentage of résumés that contain at least one falsified detail varied strongly, from 16% to 67% (Aamodt, 2002, cited in DeKay & DeKay, 2008). It seems clear that information on résumés is not always trustworthy. In order to select the best candidates, it is therefore important for employers not to rely solely on the information from résumés, but to have a thorough selection procedure and at least to verify the reported knowledge and experience during a job interview.

¹ The study from DeKay and DeKay (2008) was written for the 2008 Association for Business Communication Annual Convention and we consider it to be of modest quality because some of the cited surveys were internal company publications that were impossible to trace. We decided to mention this study because there is limited literature on the subject of résumé fraud.

1.2.2.2 Discrimination as rational behavior. A selection of applicants is invited to the next stage in the selection procedure. This selection has to be based on the imperfect information from the résumés. When selectors know that the job interview is costly and that members of a specific group perform, on average, less well on one or more job-relevant characteristics, it makes economic sense to invite fewer members of that group for a job interview. The same principle applies to the eventual hiring decision. Although, at the end of the screening process, the decision maker has invested more time and incurred more costs to obtain information about the productivity of individual applicants, the motivation to reduce costs can still cause discrimination based on group membership. The information about individual candidates is never one hundred percent accurate because applicants might have been dishonest in the interview. Also, the selection instruments that were used in the screening process are not perfect in predicting an individual’s productivity. That means that there is a chance of hiring a false positive: a candidate who turns out to be insufficiently productive. Hiring false positives is costly when the screening process has to be organized again and when the falsely hired candidate causes damage to the organization by performing below standards. The costs incurred by training new hires on and off the job in order to provide them with the knowledge and skills they need to perform well on the job far exceed the costs that were incurred during the selection procedure. Thus, the risk of incurring costs by hiring a false positive can be a reason to discriminate against individuals based on group membership.

If decision makers know that members of a specific group are, on average, less productive, it makes economic sense to invite fewer members of that group for a job interview and to offer them a job less often after the screening process.
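The cost argument that the theory makes can be written out as a toy expected-value calculation. This is a sketch of the reasoning only: the function name, the payoff figures, the two probabilities, and the labels "group A" and "group B" are hypothetical and are not estimates for any real group. The only figure taken from the text is the $290 interview cost cited from Cascio and Silbey (1979).

```python
def expected_net_value(p_productive: float,
                       gain_if_productive: float,
                       loss_if_unproductive: float,
                       screening_cost: float) -> float:
    """Expected payoff of interviewing and then hiring one applicant.

    p_productive is the decision maker's belief that the applicant will be
    productive; when individual information is missing, the theory assumes
    this belief is based on the perceived group average.
    """
    if not 0.0 <= p_productive <= 1.0:
        raise ValueError("p_productive must be a probability")
    expected_gain = p_productive * gain_if_productive
    expected_loss = (1.0 - p_productive) * loss_if_unproductive
    return expected_gain - expected_loss - screening_cost


INTERVIEW_COST = 290.0  # per-interview cost cited in the text (Cascio & Silbey, 1979)
GAIN = 5000.0           # hypothetical value of a productive hire
LOSS = 8000.0           # hypothetical cost of a false positive (retraining, rehiring)

# Hypothetical beliefs about two otherwise identical applicants.
for label, belief in (("group A", 0.70), ("group B", 0.60)):
    value = expected_net_value(belief, GAIN, LOSS, INTERVIEW_COST)
    decision = "invite" if value > 0 else "do not invite"
    print(f"{label}: expected net value {value:.0f} -> {decision}")
```

Under these assumed numbers, a modest difference in the believed probability of productivity is enough to flip the invitation decision, which is the mechanism the theory proposes. The sketch says nothing about whether such beliefs are accurate.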

1.3 Job-relevant group differences: intelligence

In the field of personnel selection, the cognitive ability test has been found to be most effective in predicting an individual’s job performance, with a predictive validity of r = .51 (Salgado, Anderson, Moscoso, Bertua, & de Fruyt, 2003; Schmidt & Hunter, 1998). Extensive research has also been conducted to establish whether there are group differences in

1.3.1 Ethnic and racial differences in IQ scores. A vast literature on group differences in performance on tests of intelligence shows that Blacks score, on average, approximately one standard deviation (SD) lower than the White majority group (Roth, Bevier, Bobko, Switzer, & Tyler, 2001; Rushton & Jensen, 2005; te Nijenhuis & van der Flier, 2001). This difference of one SD on an IQ test equals 15 IQ points. Moreover, Roth et al. (2001) reported an average IQ score of 91 for Hispanics. Rushton and Jensen (2005) reviewed IQ scores around the world and showed averages of 106 for Northeast Asians, 100 for Whites, 85 for U.S. Blacks, and 70 for sub-Saharan Africans.

1.3.1.1 Diminishing differences? Te Nijenhuis, de Jong, Evers, and van der Flier (2004) showed that the differences in IQ scores between the Dutch and immigrants in the Netherlands are similar to those between Whites and Blacks in the United States. They found a difference of one SD for Surinamese and Antilleans, and one and a half SD for Turks and Moroccans. However, they did find a rising IQ for second-generation immigrants. The improvement was roughly estimated at seven IQ points for second-generation Turks and Moroccans. In a more recent study, Rindermann and Thompson (2013) analyzed the data from the longitudinal National Assessment of Educational Progress (NAEP) from 1971 until 2008. Over a 37-year time span, they found an increase of 11.27 IQ points for Blacks in comparison with an increase of 4.77 for Whites, which suggests that the Black-White difference in IQ in the United States might also be diminishing. However, after conducting a systematic review of all available research on this topic, Rushton and Jensen (2005) concluded that the results of the majority of studies indicated that the Black-White IQ difference was stable over time. So, it seems that more research on this matter is needed before drawing any definitive conclusion about the diminishing of Black-White differences in IQ scores.

1.3.1.2 Average national IQ scores. We conclude that ethnic and racial differences in IQ scores have been found in numerous studies on this topic (D’Souza, 1995; Nisbett, 2009; Rushton & Jensen, 2005). We decided to draw on the work of Lynn and Vanhanen (2002, 2006, 2012), who have documented their estimates of average IQ scores for all countries in the world. We are aware that their work has been criticized for excluding data from their analyses, which has probably led to underestimates of the average IQ scores of some countries, for example sub-Saharan African countries (Nisbett, 2009; Wicherts, Dolan, & van der Maas, 2010). On the other hand, their work has also been positively evaluated by Rushton (2006, p. 984), who states that the national IQ score estimates by Lynn and Vanhanen “…have very high validity as measures of national differences in cognitive ability”.

1.3.2 Ethnic differences in measures of job performance. As discussed earlier, group differences in intelligence can be expected to be reflected in group differences in job performance, and indeed there is also evidence of ethnic and racial differences in job performance. Roth, Huffcutt, and Bobko (2003) performed a meta-analysis on a data set of 61 studies and found a corrected difference of d = .40 in job performance between Blacks and Whites. The corrected mean difference for Hispanics of d = .05 was considerably smaller, which is to be expected because Hispanics also score higher on IQ tests than Blacks. Later, McKay and McDaniel (2006) performed a meta-analysis on Black-White differences in job performance on an even larger data set of 572 studies with a total of 109,974 observations and found a corrected difference of d = .38 in favor of Whites. They also found that this difference was moderated by the cognitive loading of criteria: larger differences up to d = .60 were found for criteria with a high cognitive load (McKay & McDaniel, 2006).

Group differences in average IQ scores can serve as a basis for statistical discrimination in the labor market because they lead to group differences in job performance. Although it is not likely that all decision makers have detailed knowledge of group differences in IQ averages, these differences are highly likely to be reflected in their experiences with members of different minority groups. Kirschenman and Neckerman (1991) interviewed employers from the United States about their motives for hiring and found a reluctance to hire minorities based on a perceived lower average quality of work. Moreover, Hooghiemstra, Kuipers, and Muus (1990) studied the selection procedures of fifty-two companies and nine temporary employment agencies in the Netherlands and found that, with regard to work proficiency, 34% of respondents had only negative experiences with Turks.

So, in the case of ethnic minority groups, group differences in IQ scores are well documented and result in group differences in job performance (McKay & McDaniel, 2006; Roth et al., 2003). According to the theory of statistical discrimination, these differences will cause decision makers to discriminate against groups with low average IQ scores as a result of their motivation to conduct an efficient selection procedure in order to reduce costs and eventually select candidates that make profits for the organization (Arrow, 1973; Phelps, 1972).

1.3.3 Sex differences in measures of cognitive ability. A possible difference in average general intelligence between men and women has also been the subject of many studies, but there is still no strong consensus. In their review, Halpern and LaMay (2000) concluded that there are no sex differences in general intelligence but that there are reliable differences on some aspects of cognitive ability. However, Irwing and Lynn (2005) performed a meta-analysis of twenty-two studies with samples of university students and concluded that average IQ scores were 4.6 points higher for men in comparison with women. Although their conclusions were criticized for excluding large studies from their data set and for their statistical methods (Blinkhorn, 2005), their findings were consistent with earlier studies that also used large sample sizes (Jackson & Rushton, 2006; Lynn & Irwing, 2004). Moreover, Irwing (2012) analyzed nationally representative WAIS-III data with subjects ranging in age from 16 to 89 and found a difference of approximately .20 SD in general intelligence. Recently, however, Saggino et al. (2014) concluded, based on their sample of 1168 scores on the WAIS-R intelligence test from Italian adults ranging from 65 to 84 years old, that there was no sex difference in general intelligence. So, it remains uncertain whether there is a difference in total IQ scores between men and women (Halpern, 2012).

1.3.3.1 Differences in variation. Whereas differences between men and women in total IQ scores are not consistently found by researchers, differences in the variation in IQ scores are regularly found (Halpern, 2012). Men, with an SD of 15 IQ points, are more variable than women, with an SD of 13-14 IQ points (Halpern, 2012). So, at the top and bottom percentiles of the distribution of IQ scores, men are overrepresented in comparison to women, whereas in the average range between-sex differences are small (Halpern, 2012; Hyde, 2005). As a result, when one is, for example, looking for candidates for a high-complexity position which requires an IQ score of approximately 130 or higher, which is the top 2.3% (SD = 15, z = 2.0) of the male population and the top 1.3% (SD = 13.5, z = 2.22) of the female population, the total pool of candidates with sufficient IQ scores consists of 63.8% men and only 36.2% women, under the assumption that the total number of men and women in the population is equal.
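The arithmetic behind this example can be retraced with a short sketch. It assumes, as the example does, normally distributed scores with a mean of 100 for both sexes, an SD of 15 for men and 13.5 for women, a cut-off of 130, and equal numbers of men and women; the function and constant names are illustrative only. The resulting share of men in the pool should come out at roughly 63 to 64%, with small deviations from the percentages quoted above arising from rounding of the tail proportions.

```python
from statistics import NormalDist

# Assumed values taken from the example in the text: mean IQ of 100 for
# both sexes, SD = 15 for men, SD = 13.5 for women, cut-off at IQ 130.
CUTOFF = 130.0
MEN = NormalDist(mu=100.0, sigma=15.0)
WOMEN = NormalDist(mu=100.0, sigma=13.5)


def share_above(dist: NormalDist, cutoff: float) -> float:
    """Proportion of a normal distribution that lies above the cut-off."""
    return 1.0 - dist.cdf(cutoff)


def pool_share_men(cutoff: float = CUTOFF) -> float:
    """Share of men among all people above the cut-off,
    assuming equal numbers of men and women in the population."""
    men = share_above(MEN, cutoff)
    women = share_above(WOMEN, cutoff)
    return men / (men + women)


if __name__ == "__main__":
    print(f"men above cut-off:   {share_above(MEN, CUTOFF):.1%}")
    print(f"women above cut-off: {share_above(WOMEN, CUTOFF):.1%}")
    print(f"men in pool:         {pool_share_men():.1%}")
```

Raising the cut-off in `pool_share_men` increases the share of men further, which illustrates why the difference in variation matters most for positions with very high cognitive demands.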

1.3.3.2 Differences on scales. While it remains uncertain whether between-sex differences exist in average scores on intelligence tests, they do exist on several scales of the cognitive ability tests. For instance, Jensen (1998) found a difference of .75 SD favoring men on spatial skills. Halpern and LaMay (2000) found that women performed better on tests of verbal ability and perceptual speed, whereas men performed better on tests of spatial ability and mathematical reasoning. These differences were found cross-culturally and were not

48 meta-analyses on sex differences from six categories of variables: Cognitive, Communication, Social and personality, Psychological well-being, and Motor behavior. In the category of Cognitive variables, relevant differences that favor women were found on verbal ability with differences in ‘spelling’ of d = -.45 (Feingold, 1988), in ‘language’ of d = -.40 (Feingold, 1988), and in ‘speech production’ of d = -.33 (Hyde & Linn, 1988). Differences that favor women were also found in ‘perceptual speed’ of d = -.28 (Hedges & Nowell, 1995) and d = -.34 (Feingold, 1988).

Hyde (2005) found differences that favor men on variables relating to spatial ability with differences in ‘space relations’ of d = .15 (Feingold, 1988), in ‘spatial perception’ of d = .44 (Linn & Petersen, 1985; Voyer, Voyer, & Bryden, 1995), and in ‘mental rotation’ of d = .73 and d = .56, respectively (Linn & Petersen, 1985; Voyer et al., 1995). Most differences relating to mathematics were also found to favor men with differences in ‘mathematics’ of d = .16 (Hedges & Nowell, 1995), in ‘science’ of d = .32 (Hedges & Nowell, 1995), and in ‘mechanical reasoning’ of d = .76 (Feingold, 1988). So, the summary by Hyde (2005) of the results from nine meta-analytical studies on sex differences on cognitive variables confirmed the findings of Halpern and LaMay (2000) that differences favoring women exist in verbal ability and perceptual speed, and differences favoring men exist in spatial ability and mathematics.

1.3.4 Sex differences in non-cognitive abilities. In addition to the differences on the verbal scales of cognitive ability tests favoring women, Hyde (2005) also reported differences that favor women from the majority of meta-analyses in the category Communication with differences in affiliative speech of d = -.26 (Leaper & Smith, 2004), in self-disclosure of d = -.18 (Dindia & Allen, 1992), and in smiling of d = -.40 (LaFrance, Hecht, & Paluck, 2003). This supports the view that women are better at verbal and non-verbal communication, which could be an important skill for jobs where interpersonal contact plays an important role. Both meta-analyses on sex differences in the category Motor behaviors showed differences that favor men, of which the largest were in ‘throw velocity’ of d = 2.18 (Thomas & French, 1985), in ‘throw distance’ of d = 1.98 (Thomas & French, 1985), in ‘sprinting’ of d = .63 (Thomas & French, 1985), and in ‘activity levels’ of d = .49 (Eaton & Enns, 1986). Women scored better than men only on ‘flexibility’ with d = -.29 (Thomas & French, 1985). Given the observable sex differences in physique, it might not be surprising that men are stronger and more active than women, which can give them an advantage in jobs that require physical exertion.

1.3.4.1 Differences in personality. Schmidt and Hunter (1998) reported that tests of personality were limited in predicting job performance and reported that only Conscientiousness had a predictive value of r = .31. These findings were criticized by Hurtz and Donovan (2000), who suggested that the data of Schmidt and Hunter (1998) contained a threat to construct validity because they contained data that were not derived from studies that measured personality using the Big 5, or Five Factor Model (FFM). Hurtz and Donovan (2000) conducted a meta-analysis using only data from FFM studies and found an operational validity of r = .20 for Conscientiousness, r = .13 for Emotional stability, and r = .11 for Agreeableness. Later, Salgado (2003) also conducted a meta-analysis in which he compared the predictive value of personality for FFM studies and non-FFM studies. Based on data from FFM studies he reported operational validities of r = .28 for Conscientiousness, r = .16 for Emotional stability, and r = .13 for Agreeableness (Salgado, 2003). Predictive values were considerably smaller for non-FFM studies, with operational validities of r = .18 for Conscientiousness and r = .05 for Emotional stability. The operational validity for Agreeableness did not differ for non-FFM studies, with r = .13 (Salgado, 2003). So, we conclude that some of the Big 5 personality scales are modest predictors of job performance, of which Conscientiousness is the most important predictor.

Results from a meta-analysis by Feingold (1994) show that sex differences in personality exist, with women scoring higher on Anxiety with d = -.32 (falls under Neuroticism in FFM), Trust with d = -.35 (falls under Agreeableness in FFM), Tendermindedness with d = -.91 (falls under Agreeableness in FFM), and Conscientiousness with d = -.18. He found that men scored higher on Assertiveness with d = .51 (falls under Extraversion in FFM), and Openness with d = .19. Since a substantial part of the studies that were included in the meta-analysis by Feingold (1994) were non-FFM studies, estimates about sex differences in job performance as a result of these findings should be interpreted with caution. In a more recent study, Schmitt, Realo, Voracek, and Allik (2008) analyzed samples from fifty-five countries with a total N of 17,637 on sex differences in FFM measures of personality. Their cross-cultural findings showed that women scored higher on Neuroticism with d = .40, Extraversion with d = .10, Agreeableness with d = .15, and on Conscientiousness with d = .12 (Schmitt et al., 2008); note that for these four values, unlike the values reported above, a positive d indicates higher scores for women.

The predictive value of personality measures is weaker in comparison to measures of general intelligence (Schmidt & Hunter, 1998), but Conscientiousness, Emotional stability, and Agreeableness are modest predictors of job performance. The fact that women score higher than men on Conscientiousness and Agreeableness (Feingold, 1994; Schmitt et al., 2008) and that no sex differences were found on Emotional stability suggests that sex differences in personality should result in sex differences in job performance that slightly favor women.

1.3.5 Sex differences in ratings of job performance. Sex differences in job performance ratings were analyzed by Roth, Purvis, and Bobko (2012), who performed a meta-analysis on a data set with N = 45,733 from a total of 61 studies that measured job performance in both experimental and real-life settings. They found a corrected difference of d = -.11 and concluded that overall, women scored a little higher on job performance than men, but that the difference was small. They also analyzed measures of promotability: the extent to which supervisors thought an individual was suitable for promotion. In contrast to their findings regarding job performance, men appeared to outscore women with a difference of d = .11 on promotability ratings. Roth et al. (2012) suggested that these contrasting findings could be explained by the fact that promotability ratings were based on expectations for future job performance, instead of actual ratings of observable on-the-job performance. Therefore, they suggested, promotability ratings were based less on factual information than job performance ratings were, and thus were more influenced by stereotypes (Roth et al., 2012). They also noted that their data set with promotability ratings (K = 8, N = 4,550) was considerably smaller than their data set with job performance ratings. So, in line with studies that suggested that there are little or no sex differences in general intelligence (Halpern, 2012; Halpern & LaMay, 2000; Saggino et al., 2014), the sex difference in overall ratings of job performance is small. The difference of d = -.11, which shows a small difference in favor of women in job performance ratings, might reflect findings of sex differences in ratings of personality that, in their turn, also predict a small difference in job performance in favor of women (Feingold, 1994; Hurtz & Donovan, 2000; Salgado, 2003; Schmidt & Hunter, 1998; Schmitt et al., 2008). Still, the findings of Roth et al. (2012) are based on various kinds of jobs and are not broken down by gender-typical jobs. Sex differences clearly exist on some scales of tests of cognitive ability, namely differences favoring women in verbal ability and perceptual speed, and differences favoring men in spatial ability and mathematics (Halpern & LaMay, 2000; Hyde, 2005). Combined with sex differences in variations of IQ scores (Halpern, 2012), in communication and physical abilities (Hyde, 2005), and in personality (Feingold, 1994; Schmitt et al., 2008), we argue that there are jobs where larger sex differences in job performance are to be expected. When jobs require very high cognitive abilities, high spatial abilities, excellent mathematical reasoning, a strong physique, or a combination of these, one would expect men to outscore women, on average, on ratings of job performance. The opposite is true for jobs that require high verbal abilities, excellent perceptual speed, strong interpersonal communication skills, or a combination of those three. Thus, we argue that there are gender-typical jobs with associated job requirements which are differentially suited to men and women.

1.3.6 Differences in cognitive abilities for homosexuals. A general finding concerning homosexuals is that they are perceived to share characteristics with the opposite sex. For instance, gay men (which will from now on be referred to as ‘gays’) are perceived to be more feminine than heterosexual men (Connell, 2005, as described in Tilcsik, 2011) and lesbian women (which will from now on be referred to as ‘lesbians’) are perceived to be more masculine than heterosexual women (Weichselbaumer, 2003). The relation between sexual orientation and cognitive ability has been studied by Tuttle and Pillard (1991), who compared Wechsler test scores of homosexuals and heterosexuals who also rated themselves on degree of masculinity and femininity. In addition to findings that homosexuals are perceived to share opposite-sex characteristics, the findings by Tuttle and Pillard (1991) showed that gays rated themselves as being more feminine than heterosexual men, and lesbians rated themselves as being more masculine than heterosexual women. They did not, however, find any differences in the average total scores on the Wechsler tests between the two groups, nor on the scales that were known to be related to sex differences as described earlier.

Rahman, Abrahams, and Wilson (2003) did find some within-sex differences related to sexual orientation in verbal fluency, where the results of gays and lesbians showed patterns that were more consistent with the opposite sex. Later, results from a large internet study by Maylor et al. (2007) showed that homosexuals performed less well than heterosexuals on tests of gender-typical scales of cognitive ability like mental rotation, judgments of line orientation, category fluency, and memory for location. These findings suggest that gays show a more feminine pattern and lesbians a more masculine pattern of cognitive abilities (Halpern, 2012). So, although a definite statement cannot be made, taken together the data suggest that homosexuals are more similar to the opposite sex in cognitive abilities and are therefore somewhat better suited for gender-typical jobs for the opposite sex than for gender-typical jobs for the same sex. It is clear that more studies need to be carried out.

1.3.7 Summary. Summarizing, we suggest that group differences in general intelligence, in specific domains of cognitive ability, in personality, and in other job-relevant characteristics are reflected in group differences in job performance and that these differences lead to statistical discrimination by decision makers when hiring personnel. These group differences in general intelligence are well documented for ethnic and racial groups. Differences in specific domains of cognitive ability, personality, communication, and physicality are found between the sexes. The data suggest that homosexuals might have cognitive abilities that are similar to those of the opposite sex.

1.4 Testing the theory of statistical discrimination

We have chosen to perform meta-analyses on three sets of audit studies on labor market discrimination that measured discrimination against ethnic and racial groups, women, and homosexuals. Meta-analysis is a technique that aggregates the results of individual studies in order to make accurate estimates of effect sizes based on a much larger sample size than the estimates of individual studies, which generally are based on relatively small sample sizes.
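The basic idea of aggregation can be illustrated with a minimal sketch. It uses a sample-size-weighted mean, which is one common way of pooling effect sizes and is shown here only as an illustration, not necessarily the procedure applied in this study. The three studies and all names in the code are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Study:
    """One hypothetical study: an observed effect size and its sample size."""
    effect: float
    n: int


def unweighted_mean_effect(studies: Sequence[Study]) -> float:
    """Simple average: every study counts equally, regardless of its size."""
    return sum(s.effect for s in studies) / len(studies)


def weighted_mean_effect(studies: Sequence[Study]) -> float:
    """Sample-size-weighted average: larger studies contribute more."""
    total_n = sum(s.n for s in studies)
    return sum(s.effect * s.n for s in studies) / total_n


# Hypothetical data, invented purely for illustration.
STUDIES = [
    Study(effect=0.50, n=40),   # small study with a large observed effect
    Study(effect=0.10, n=400),  # large study with a small observed effect
    Study(effect=0.20, n=160),
]

if __name__ == "__main__":
    print(f"unweighted mean: {unweighted_mean_effect(STUDIES):.3f}")
    print(f"weighted mean:   {weighted_mean_effect(STUDIES):.3f}")
```

In this invented example the weighted mean is pulled toward the estimate of the largest study, which is the sense in which the pooled estimate rests on a much larger sample than any single study.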

We have primarily chosen to perform meta-analysis in this study because it allowed us to look at moderators that were explicitly measured in some of the studies in our data set and at moderators that were not explicitly measured by the researchers carrying out the original studies, but which could be assigned for the studies we collected. An example of such a moderator is the type of audit test: résumé, telephone or in-person. Most studies only used one of these three options but after combining them all in one data set we could look at differences in levels of discrimination for different stages of the selection procedure. We will explain later how our moderators will be used to test the theory of statistical discrimination. A second reason to perform meta-analysis was that a large number of field studies was available on this topic. Combining results from all available studies will enable us to give far more accurate estimates of levels of discrimination.

1.4.1 Publication bias. There is, however, an important threat to the accuracy of these estimates, called publication bias. Publication bias occurs whenever the available literature in peer-reviewed journals is not an accurate representation of the whole of completed studies (Rothstein, Sutton, & Borenstein, 2005). When publication bias is present in field experiments on direct discrimination, the results of our meta-analysis would differ from the results of all research on discrimination. Publication bias arises when studies with a particular outcome are not published and consequently, meta-analytically-derived effect sizes would be overestimates or underestimates of the real effect sizes (Bakker, van Dijk, & Wicherts, 2012). The necessity of performing publication bias analyses is addressed by Kepes, Banks, McDaniel, and Whetzel (2012), who argue that publication bias is the biggest threat to the accuracy and validity of meta-analytically derived estimates and conclusions. Therefore, they recommend conducting publication bias analyses as part of meta-analytical reviews. In addition, we suggest that there are specific reasons to expect publication bias for social psychological research topics, and more specifically for the topic of our study: labor market discrimination.
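How selective publication distorts a pooled estimate can be shown with a small simulation. The sketch below rests on assumptions made purely for illustration: a true standardized effect of 0.2, twenty participants per group in every study, a large-sample approximation of the standard error, and a deliberately extreme selection rule under which only significant results in the predicted direction are published. None of these values come from the studies analyzed here.

```python
import random
from statistics import mean
from typing import List, Tuple

TRUE_EFFECT = 0.2     # assumed true standardized effect (hypothetical)
N_PER_GROUP = 20      # assumed sample size per group in every study
N_STUDIES = 2000      # number of simulated studies
# Large-sample approximation of the standard error of a standardized
# mean difference with equal group sizes (an assumption of this sketch).
STANDARD_ERROR = (2 / N_PER_GROUP) ** 0.5


def simulate(seed: int = 0) -> Tuple[List[float], List[float]]:
    """Return (all observed effects, effects that get 'published')."""
    rng = random.Random(seed)
    all_effects = [rng.gauss(TRUE_EFFECT, STANDARD_ERROR) for _ in range(N_STUDIES)]
    # Extreme selection rule: publish only significant positive results.
    published = [d for d in all_effects if d / STANDARD_ERROR > 1.96]
    return all_effects, published


if __name__ == "__main__":
    all_effects, published = simulate()
    print(f"studies conducted: {len(all_effects)}")
    print(f"studies published: {len(published)}")
    print(f"mean of all studies:       {mean(all_effects):.2f}")
    print(f"mean of published studies: {mean(published):.2f}")
```

Under this selection rule the mean of the published studies necessarily exceeds the mean of all studies, so a meta-analysis restricted to the published subset would overestimate the true effect. Real selection processes are less strict, but the direction of the distortion is the same.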

1.4.1.1 Abundance of positive results in psychology. Fanelli (2010) analyzed 2,434 published studies from twenty different disciplines and found that 84% reported a complete or partial support for the tested main hypothesis. For studies from psychology this was over 90%, which leaves only 10% of studies that reported negative or non-significant results. A negative result in this context means that the results show the opposite effect from what was predicted and a non-significant result means that no significant effect was found in either direction. The findings by Fanelli (2010) strongly suggest that the vast majority of studies with negative or non-significant results are not published.

Negative or null results might be explained as the result of a sample size that was too small, errors in the design or execution of the experiment, or other methodological weaknesses, but it seems unlikely that these explanations fully justify the abundance of positive results in published psychology studies. Fanelli (2010) suggested four possible explanations for his observation. First, psychology is a relatively new science and psychological researchers therefore enjoy considerable freedom in the methods they choose to collect, analyze, and interpret data. This freedom might increase the chance of finding positive results. Second, behavioral sciences might be more vulnerable to the biasing effects of researchers’ expectations because data are more open to interpretation in comparison with non-behavioral sciences and because participants might, subconsciously, be aware of those expectations (Fanelli, 2010). Third, the prevalence and strength of manipulating data and results in psychology might be higher in comparison with other sciences (Fanelli, 2010). Fourth, studies with negative or non-significant results are more often not published or the hypotheses are more often adjusted after the results are known to the researchers (Fanelli, 2010). This last suggestion by Fanelli (2010) implies that both researchers and reviewers from journals might think that negative or non-significant results are less interesting to publish in scientific journals. Consequently, when results are not in line with researchers’ beliefs or expectations, the deviating outcomes might be erroneously attributed to errors in the methods or designs used in the study, and therefore will not be published. Also, when authors do not attribute their deviating outcomes to such errors, they might still decide not to submit studies with non-significant results for publication in journals because they expect that the chances of publication are small.

Although the third explanation suggested by Fanelli (2010) concerning the manipulation of data and results is not likely to lead to publication bias, the discovery of some recent fraud cases involving social psychologists does suggest that this explanation by Fanelli (2010) also has to be considered, as we will do in the Discussion. Based on the findings and the fourth explanation suggested by Fanelli (2010), we suggest that there is reason to expect publication bias in all topics in psychological research. This idea is supported by previous studies on publication bias in psychological research topics.

1.4.1.2 Examples of publication bias. Wicherts (2007) found evidence of publication bias in his meta-analytic study on stereotype threat. After collecting and adding data from unpublished studies, the cumulative effect size was considerably smaller than the effect size that was derived only from published studies (Wicherts, 2007). More recently, Bakker et al. (2012) analyzed data from thirteen meta-analyses from psychology journals and found indications of publication bias in five of the thirteen cases. They specifically mention a meta-analytic study by Greenwald, Poehlman, Uhlmann, and Banaji (2009, cited in Bakker et al., 2012, p. 552) on the predictive validity of the Implicit Association Test (IAT). They note: “The subset of studies that concerned racial discrimination is another example of an excess of significant results…” (Bakker et al., 2012, p. 552). This quote from Bakker et al. (2012) is important because it suggests that there might be reason to expect publication bias specifically in politically sensitive topics like discrimination.

1.4.1.3 Sociopolitical diversity in social sciences. Politically sensitive topics might be more vulnerable to publication bias because of a lack of sociopolitical diversity among social scientists (Jussim, 2012; Tetlock, 2012). Redding (2001) performed a content analysis of articles published from 1990 to 1999 in The American Psychologist and Journal of Social Issues, both often-cited journals. He found that over 95% of all articles on social policy issues advanced liberal themes or liberal policies. Redding (2001) stated that the severe lack of sociopolitical diversity among psychologists would lead to research bias in policy research, discrimination against conservative scholars and students, and to a loss of credibility for the profession. He referred to examples of researchers who advanced conservative themes and were directly linked to racism, such as the controversy surrounding ‘The Bell Curve’ (Herrnstein & Murray, 1996), which was judged as socially irresponsible and pro-fascist, and was held to higher standards of evaluation because of its conservative claims of biological inequality (Redding, 2001).

His study was criticized by colleagues on the grounds that ‘liberal’ and ‘conservative’ are both multi-dimensional terms that cannot be used to label individuals: people can hold different opinions on different topics, so such a simple distinction between people is not possible (Sampson, 2002). Also, his sampling of journals on social policy issues was criticized because it was not considered a representative sample of all psychology research (Campbell et al., 2002). Redding’s (2002) response was that his claims were made about studies on topics that were related to social policies, and indeed not about all psychology research. Although critical towards the claims by Redding (2001), the majority of his colleagues agreed that it would be desirable for psychology to include more sociopolitical diversity in the profession (Campbell, 2002; Rooney, 2002; Sampson, 2002).

Eleven years later, a special issue of the journal Perspectives on Psychological Science (2012, 7) was dedicated to addressing the lack of sociopolitical diversity among social psychological scientists. Inbar and Lammers (2012) conducted an anonymous survey among eight hundred social and personality psychologists and found that only 6% described themselves as having an overall conservative political view. Most striking was their finding that a third of the respondents said they would discriminate against openly conservative colleagues when reviewing their papers or when considering them for hiring or promotion decisions. Their methods were criticized by Skitka (2012) because of biased sampling, the way questions were asked in the survey, a missing control condition, and a hypothesis confirmation bias. She concluded that there might very well be liberal and ideological bias that had to be considered by the field, but that the evidence presented by Inbar and Lammers (2012) was not reliable enough for drawing any definitive conclusions (Skitka, 2012).

So, we are not certain whether the lack of sociopolitical diversity in social psychology has influenced the type of research questions that have been studied and whether it has influenced which studies were published and which ones were not. Still, it does seem reasonable to conclude that, notwithstanding limitations of the study by Inbar and Lammers (2012), the majority of psychology researchers consider themselves to be liberal. Based on the findings and examples reported by Redding (2001), we argue that it is quite plausible that studies that advanced conservative themes and studies that reported findings that contrasted with liberal beliefs on politically sensitive social topics were, at least, held to higher standards for evaluation and publication (Redding, 2001, 2002; Tetlock, 2012). Consequently, these studies would be less likely to be published and thus it is likely that publication bias threatens the accuracy of meta-analyses on politically sensitive and social policy topics.

If audit studies on labor market discrimination (a typical social policy issue and politically sensitive topic) that reported results contrary to liberal beliefs were indeed published less often than audit studies reporting results in line with liberal beliefs, our data set would suffer from publication bias. Without further speculating on which results were in line with liberal beliefs and which were not, we argue that the likely lack of sociopolitical diversity among social psychologists, in combination with the recommendation by Kepes et al. (2012), the lack of negative and null results in published psychology studies shown by Fanelli (2010), and the examples of publication bias in earlier meta-analyses reported by Bakker et al. (2012), makes a strong case for the possibility of publication bias in our data set. So, we expected that our meta-analytical estimates were either under- or overestimates of the real discrimination rates in the populations, due to publication bias.

2 Current study

We collected all available data from audit studies on labor market discrimination that measured ethnic discrimination, sex discrimination, and sexual-orientation discrimination. We performed meta-analyses on these data to test several hypotheses based on the theory of statistical discrimination. We also performed publication bias analyses to check whether our meta-analytically derived estimates were over- or underestimates of the actual levels of direct discrimination as the result of publication bias. Our study is relevant because it will provide answers to the question of the possible causes of direct discrimination (from now on referred to as “discrimination”) in the labor market. By performing meta-analyses we can draw conclusions based on very large data sets that contain data from field studies. These studies measured discrimination in real-life situations, unlike laboratory studies, which predict discrimination in daily life less well (Baumeister et al., 2007). Combining data from existing studies allowed us to assign values to various moderators, which are addressed below.

2.1 Alternative explanations

In audit studies, it is important that the decision makers are not aware of the fact that they are evaluating bogus applications. The use of data collected in real-life situations also brings with it a disadvantage, namely that the nature of these studies did not allow for the inclusion of relevant measurements of moderators. These moderators would have been helpful for testing the theory of statistical discrimination. Consequently, the methods used in this study leave more room for alternative explanations for our predictions than controlled laboratory studies do. We are aware of this limitation of our study, but we argue that the precision of our estimates, due to the large number of observations in our data set, and the value of using data that were collected in real-life situations sufficiently counter this limitation. The results of our study might not comprehensively answer the question of whether labor market discrimination is explained by the theory of statistical discrimination, but they will show whether our meta-analytical findings are in line with the theory, and they will give guidance for future research on this matter.

2.2 Decision makers from minority groups

If discrimination is rational behavior based on statistical group differences (Arrow, 1973; Phelps, 1972), and not behavior based on a distaste that decision makers have towards members of certain minority groups (Becker, 1957), it should make no difference whether the decision maker is a member of the majority group or of a minority group. According to the distaste theory (Becker, 1957), one would expect levels of discrimination to be zero or at least substantially lower when minority decision makers have to evaluate job applicants from their own minority group because it seems unlikely that one has a distaste towards members of one’s own group (Becker, 1957; Brown, 2010). According to the theory of statistical discrimination, however, it may be that minority decision makers have more experience with or knowledge of the lower average quality of work than their majority group colleagues have, and that they will discriminate as much against job applicants from their own minority group as their colleagues from the majority group do.

So, we argue that confirmation of our first hypothesis would be an indicator of statistical discrimination. Because some alternative explanations are likely to affect the levels of discrimination by minority decision makers against applicants of the same group, the effect might be less strong than we predicted. In the Discussion we will further discuss alternative explanations and their effect on our predictions. We will test this hypothesis for all three data sets, that is, for discrimination against ethnic minorities, against women, and against homosexuals.

Hypothesis 1: Decision makers from discriminated-against groups show the same levels of discrimination against job applicants from their own group as decision makers from the non-discriminated-against group.

2.3 Non-profit organizations and affirmative action programs

The motivation to make profits is central to the theory of statistical discrimination, and therefore one would expect that discrimination against specific groups with low job performance will occur more in companies with a strong focus on making profits. Non-profit organizations are funded by governments, so their existence is less dependent, or not dependent at all, on having an advantage in commercial competition than that of profit organizations. If discrimination is indeed the result of the urge to make profits, levels of discrimination should be lower for non-profit organizations than for profit organizations. In the study of Goldberg, Mourinho, and Kulke (1996), applicants applied for jobs in both profit and non-profit organizations. They found a net discrimination of 25% for profit and 4% for non-profit organizations. Bovenkerk et al. (1995) found a net discrimination of 31% for profit and 14% for non-profit organizations. Verhagen (2008) performed a meta-analysis on data from 13 studies, and his results showed that non-profit organizations indeed show less direct discrimination. In Europe, an average net discrimination of 12% was found for non-profit and 21% for profit organizations.
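
The percentages reported above are net discrimination rates. As a minimal sketch, assuming the definition commonly used in audit studies (the number of tests in which only the majority applicant received a positive response, minus the number in which only the minority applicant did, divided by the number of usable tests), such a rate could be computed as follows. Definitions differ on which tests count as usable, so the denominator is supplied by the caller. The function name and the counts in the usage lines are hypothetical and are not taken from any of the cited studies.

```python
def net_discrimination_rate(majority_only: int, minority_only: int, usable_tests: int) -> float:
    """Return net discrimination as a proportion of usable tests.

    majority_only: tests in which only the majority applicant got a positive response
    minority_only: tests in which only the minority applicant got a positive response
    usable_tests:  denominator; which tests count as usable is an assumption
                   left to the caller, because definitions vary between studies
    """
    if usable_tests <= 0:
        raise ValueError("usable_tests must be positive")
    if majority_only < 0 or minority_only < 0:
        raise ValueError("counts must be non-negative")
    if majority_only + minority_only > usable_tests:
        raise ValueError("preferential outcomes cannot exceed usable tests")
    return (majority_only - minority_only) / usable_tests


# Hypothetical counts, for illustration only
rate = net_discrimination_rate(majority_only=30, minority_only=10, usable_tests=100)
print(f"{rate:.0%}")  # 20%
```

A negative value would indicate that the minority applicant was preferred more often than the majority applicant.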

In the study of Bovenkerk et al. (1995), participants were also sent to organizations with affirmative action programs (from now on referred to as ‘AAPs’). Organizations apply AAPs with the aim of increasing the number of employees from minority groups that are underrepresented in the organization. AAPs take race, color, sex, or national origin into consideration and give preference to members of minority groups in case of equal capability (Fullinwider, 2009). Bovenkerk et al. (1995) found a net discrimination rate of 0% for organizations with AAPs.

So, for both non-profit organizations and organizations with AAPs, we expect to find lower levels of direct discrimination against discriminated-against groups than for profit organizations without AAPs. We will test this hypothesis for all three data sets, namely, discrimination against ethnic minority groups, against women, and against gays and lesbians.

Hypothesis 2: For non-profit organizations and organizations with AAPs, we expect to find lower levels of direct discrimination against discriminated-against groups than for profit organizations without AAPs.
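
One way a moderator hypothesis of this kind could be examined is by comparing the sample-size-weighted mean net discrimination rate across organization types. The sketch below assumes a bare-bones weighting by sample size. It is not necessarily the procedure used in this thesis, and the study records are hypothetical placeholders, not data from the included studies.

```python
from collections import defaultdict


def weighted_mean_by_group(studies):
    """Return the sample-size-weighted mean rate for each moderator level.

    studies: iterable of (group, n, rate) tuples, where group is the
             moderator level, n the sample size, and rate the net
             discrimination rate reported by that study
    """
    weighted_sum = defaultdict(float)  # sum of n * rate per group
    total_n = defaultdict(int)         # sum of n per group
    for group, n, rate in studies:
        if n <= 0:
            raise ValueError("sample size must be positive")
        weighted_sum[group] += n * rate
        total_n[group] += n
    return {group: weighted_sum[group] / total_n[group] for group in total_n}


# Hypothetical study records (group, n, rate), for illustration only
studies = [
    ("profit", 200, 0.30),
    ("profit", 100, 0.15),
    ("non-profit", 150, 0.10),
    ("non-profit", 50, 0.20),
]
means = weighted_mean_by_group(studies)
for group, mean_rate in means.items():
    print(group, round(mean_rate, 3))
```

Under Hypothesis 2, the weighted mean for non-profit organizations would be expected to be lower than the weighted mean for profit organizations. A full analysis would also need to take sampling error into account before interpreting the difference.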

2.4 Discrimination in the different stages of the selection procedure

In the labor market, audit studies use three different forms of audit testing: résumé testing, telephone testing, and in-person testing. We expect to find differences in the levels of discrimination for these three types of audit testing. This is because the costs of collecting the information used for making a decision are hypothesized to influence the amount of discrimination, and these costs are substantially lower at the beginning of the selection procedure. Judging résumés is the least time-consuming and therefore the least costly step, and telephone interviews are less expensive than inviting candidates for job interviews with one or two interviewers. If discrimination is caused by decision makers’ motivation to reduce costs, it would be most efficient to reject minority members in the early and less costly stages of selection procedures. When candidates pass the early phases and are invited for job interviews, the costs for the initial stages and the interviews have already been incurred. Moreover, when applicants have finished the job interview, the high costs for the interview have already been incurred, and thus reducing the costs for the selection procedure has become much more difficult at this point. Riach and Rich (2002) stated that discrimination at the initial stages of hiring accounted for nearly 50% of the total level of discrimination recorded. We expect to find that levels of discrimination are higher for studies that used résumé testing than for studies that used telephone testing. Furthermore, we expect that levels of discrimination for studies that used telephone testing are higher in comparison with studies that used