The web-based aIAT: Replicating a Lab Experiment, Confirming the Detrimental Effect of Using Negative Sentences and Adding It to the Concealed Information Test for Better Accuracy

Academic year: 2021

Tineke Slotegraaf

Student number: 10458581

Supervisor: Bruno Verschuere, Ph.D.

Number of words: 9680

Abstract

The autobiographical Implicit Association Test (aIAT) is a new lie detection test. By assessing the strength of association with true and false, the aIAT determines which of two contrasting statements is true. The use of small samples may have resulted in widely varying accuracy estimates (> 90% versus 68%). Experiment 1 shows that the web-based aIAT is valid, using a large sample (n = 455). Experiment 2 confirms this finding in a novel sample (n = 184) and shows that negative sentences should be avoided, as they erase the aIAT's validity. Experiment 3 shows that the aIAT can help improve the validity of memory detection tests. Memory detection makes it possible to assess the recognition of critical (e.g., crime) information, but not the source of that memory. We show that memory detection recognizes crime memory in both guilty and innocent suspects, and that the aIAT makes it possible to determine the source of that crime memory.

Introduction

Most lie detection tests today are based on physiological measures (e.g., heart rate or skin conductance). Obtaining these measures requires sophisticated and expensive equipment, and the people interpreting the data need to be highly trained professionals. This drives up the cost of a good lie detection test. Since we live in a society which relies on honesty, more and more people want a tool to quickly assess whether someone is truthful or not. Therefore, researchers try to find better lie detection tests, which are easier to conduct and analyze. With physiological measures there are different ways of asking questions, of which simple yes/no questions are used the most. These can be constructed either so that both crime-relevant questions and simple comparison questions (which provide a baseline) are asked, or so that only crime-relevant questions are asked, with the baseline obtained by replacing the probe (i.e., crime) item with irrelevant items in 5 subsequent questions. This way the examiner can see how suspects respond to crime items or questions compared to control items or questions. New research focuses on reaction times either in an interview (Walczyk, Roper, Seemann & Humphrey, 2003) or in a computerized task (Agosta, Mega, & Sartori, 2011; Agosta, Pezzoli, & Sartori, 2013; Agosta & Sartori, 2014; Kleinberg & Verschuere, 2015; Marini, Agosta, Mazzoni, Dalla Barba, & Sartori, 2012; Sartori, Agosta, Zogmaister, Ferrara, & Castiello, 2008; Vargo & Petróczi, 2013; Verschuere & De Houwer, 2011; Verschuere, Prati, & De Houwer, 2009; Verschuere, Suchotzki, & Debey, 2014).

Measuring reaction times is a way to implicitly access the knowledge and beliefs of a person (De Houwer, 2006). It is a well-known technique in psychology. An example is the Implicit Association Test (IAT), which can be used to find tendencies (e.g., discriminatory tendencies toward black people compared to white people) (Frost et al., 2010; Greenwald, Nosek, & Banaji, 2003; Greenwald, Poehlman, Uhlmann, & Banaji, 2009; Nosek, Banaji, & Greenwald, 2002; Nosek, Greenwald, & Banaji, 2005). With an implicit measure you only find the strength of an association (De Houwer, 2006). In the above example participants have to categorize words as either “good” or “bad” and faces or first names as either “white” or “black”. For most white people the association between white and good will be stronger than the association between black and good, which indicates discriminatory tendencies (Frost et al., 2010; Greenwald et al., 2003; Greenwald et al., 2009; Nosek et al., 2002; Nosek et al., 2005).

The IAT was recently adjusted to serve as a lie detection test (Sartori et al., 2008). Instead of the categories “good” and “bad”, Sartori et al. (2008) used the categories “true” and “false” while using autobiographical sentences (e.g., “I am sitting in a chair”). The categories that we want to know about (in our example above, “white” and “black”) were replaced by two other autobiographical categories that can be considered “guilty” and “innocent”. Therefore, it was called the autobiographical Implicit Association Test (aIAT). When the association between two categories is stronger than the opposite combination, participants will be faster in categorizing sentences when these categories are combined, that is, when their sentences need to be categorized by pressing the same key (Greenwald et al., 2003; Greenwald et al., 2009; Nosek et al., 2002; Nosek et al., 2005; Sartori et al., 2008). In the aIAT guilty suspects will be faster when “guilty” stimuli (e.g., I stole the laptop from the vehicle) are combined with “true” stimuli than when “innocent” stimuli (e.g., I was at home) are combined with “true” stimuli. For innocent suspects it will be the other way around.

Experiments concerning the aIAT have been performed in a lab, which participants need to travel to and where they receive either course credits or money for their participation (Agosta et al., 2011; Agosta et al., 2013, Granhag, & Giolla, 2015; Marini et al., 2012; Sartori et al., 2008; Vargo & Petróczi, 2013; Verschuere et al., 2009; Verschuere et al., 2014). Testing online would provide a greater diversity in age and a more equal number of participants of both genders (most participants in the lab are female psychology students between the ages of 18 and 25), as well as larger samples of participants and a higher power for analyses (Greenwald et al., 2003; Greenwald et al., 2009; Kim, 2003; Kraut et al., 2004; Nosek et al., 2002; Nosek et al., 2005). The larger samples, which lead to greater power, are possible in online research because participants are paid about one fifth of what they would receive for going to the lab to participate in research.

Because of these advantages of testing online, we investigated whether the aIAT could be used as an online test and whether it was capable of improving the validity of memory detection tests. For this, we performed a replication of Experiment 1 by Sartori et al. (2008) in an online setting. After we had verified the validity of the online aIAT, we checked two ways of constructing the “innocent” category (as a negation of the “guilty” category, and as an alibi) and confirmed the detrimental effect of using negative sentences found by Agosta et al. (2011). In the last experiment we used the online aIAT to improve the validity of another reaction time-based lie detection test: the Concealed Information Test (CIT). Specific hypotheses and subquestions will be discussed per experiment.

Experiment 1:

Introduction

In the first validation paper of the autobiographical Implicit Association Test (aIAT), Sartori et al. (2008) reported several experiments. In their Experiment 1 (n = 37), participants picked one of two cards (e.g., 4 of diamonds versus 7 of clubs) and the aIAT predicted which card was picked with an accuracy of 95%. Here we sought to constructively replicate their Experiment 1 using our novel online aIAT.

The findings we want to replicate are an overall group effect of the aIAT, where the D-score indicates which of the two cards was seen by the participants, as well as a high ROCa value and accuracy rate. It is important to replicate these effects to show the aIAT can be used as an online test as well as in the lab. Brandt et al. (2014) created a recipe for replication which tells us to follow the methods as closely as possible and make sure we have high statistical power. We used the paper by Sartori et al. (2008) to follow their method closely and recruited about 12 times as many participants as the original experiment to get high statistical power. Details about our replication can be found in the methods section.

We expect to find the IAT effect in our online sample, for the IAT has been used in online settings before (Nosek et al., 2002). The tests were originally published online to inform people about implicit tests and how research works, but eventually, the data collected was used to improve the scoring-algorithm and validity of the IAT (Greenwald et al., 2003; Greenwald et al., 2009; Nosek et al., 2005). For the close replication we do expect to find high values for accuracy and ROCa.

Method

Participants:

We started with 507 participants. Of these, 21 did not finish the test, 1 did the test twice, and 30 had their browsers automatically translate the stimuli (which leads to longer reaction times, since the translation starts after the sentence has been displayed in English). We removed these participants from further analysis. This left 455 participants (28% female, Mage = 30.80, SDage = 9.20) whose data were analyzed.

Participants came from all over the world: 51.6% from Europe, 32.7% from Asia, 5.7% from North America, 6.8% from South America, and 3.1% from Africa. We also asked for mother tongue and included the 100 most spoken languages in a list. Only 14% reported their mother tongue to be English and 10.5% indicated their mother tongue was not on the list.

These participants were randomly assigned so that 222 participants turned over the four of diamonds (29.3% females, Mage = 30.6, SDage = 9.2) and 233 participants turned over the seven of spades (27.5% females, Mage = 31.0, SDage = 9.2). Between the groups there was no difference in gender, χ²(1) = 0.1053, p = .746, or age, t(453) = 0.3728, p = .710.

Procedure:

The studies were approved by the ethical committee of the Department of Psychology of the University of Amsterdam. The program was written in a similar fashion to the first online CIT by Kleinberg and Verschuere (2015). The experiment was written as a web application in JavaScript. We made sure participants could not continue when opening the test on a phone or tablet, in order to ensure that a keyboard was used for categorizing the stimuli. The experimental tasks can be found at http://www.lieresearch.com/?page_id=695.

The study was administered via CrowdFlower (http://www.crowdflower.com), a website where people can register to get a small monetary compensation ($0.50) for participating in online experiments. We advertised the study as a 15-minute lie test. All participants gave informed consent. Next they provided some personal information (gender, age, mother tongue, and country), using drop-down menus to avoid typing errors.

We asked them to pick one of two cards which were lying face down. We randomly assigned them to one of two cards (either the four of diamonds or the seven of spades), which was turned over when clicked upon. We made sure this card was remembered by asking them, three times, to pick the card they saw from a set of 8 random cards. They could only continue when they had clicked upon the card they had turned over.

We created 10 sentences for each category and let the participants pick which 5 sentences were true and false for them at that moment. For the cards we used 5 sentences to describe the card they took. All sentences used in the experiments are shown in table 1. Next, participants completed the aIAT as described below.

After the categorization we asked some questions about the cards (four of diamonds and seven of spades): how important, positive, and negative the cards were for the participants. These questions were answered on a 9-point Likert scale (1 = not strong at all, 9 = extremely strong) using a drop-down menu. Then we showed them a screen similar to the one at the beginning of the experiment, asking them to pick their card from 8 random cards, to verify they remembered which card they picked. A short debriefing text on the last page of this experiment explained the test and their outcome (which card they were predicted to have seen).

aIAT

The autobiographical IAT was performed in 7 blocks. The goal was always to categorize the sentences shown in the middle of the screen into the categories shown at both sides of the screen with the "E" (for the category on the left) and "I" (for the category on the right) key on the keyboard. Participants were told to do this as FAST as possible, while making as FEW errors as possible. When an error was made a red X would be visible under the stimuli and the error needed to be corrected to continue the test.

The test used several blocks. First, both pairs of categories (“True” versus “False” and “I turned over the four of diamonds” versus “I turned over the seven of spades”) were used separately so participants could get used to the stimuli and the workings of the test. In block 3 and block 4 these categories were combined (e.g., “True and I turned over the four of diamonds” versus “False and I turned over the seven of spades”); the combination was reversed for half of the participants. Block 5 was used to switch the response keys for the card categories and let participants get used to the change. In block 6 and block 7 the categories were combined again with the switched keys for the card categories (e.g., “True and I turned over the seven of spades” versus “False and I turned over the four of diamonds”). The reaction times of these combined blocks were used to calculate the statistics. The structure of all blocks can be found in table 2.

To make sure we could use the data, a warning was shown when participants made too many errors (> 30% of the trials), or responded too fast (< 300 ms) or too slow (> 10,000 ms) in more than 20% of the trials. These participants were redirected to the start of that block to start over.
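The block-restart rule described above can be sketched as follows. This is a minimal Python illustration, not the original JavaScript of the experiment; the function name, the trial representation, and the choice to count too-fast and too-slow responses together against the 20% limit are assumptions.

```python
def block_needs_restart(trials, max_error_rate=0.30, min_rt_ms=300,
                        max_rt_ms=10_000, max_bad_rt_rate=0.20):
    """Return True when a block should be repeated.

    `trials` is a list of (rt_ms, correct) tuples (hypothetical format).
    Thresholds follow the text: > 30% errors, or > 20% of responses
    faster than 300 ms or slower than 10,000 ms.
    """
    n = len(trials)
    if n == 0:
        return False
    error_rate = sum(1 for _, correct in trials if not correct) / n
    # Assumption: too-fast and too-slow responses are pooled.
    bad_rt_rate = sum(1 for rt, _ in trials
                      if rt < min_rt_ms or rt > max_rt_ms) / n
    return error_rate > max_error_rate or bad_rt_rate > max_bad_rt_rate
```

For example, a block of 10 trials with 4 errors would be repeated, while a block with only 2 responses under 300 ms and no errors would not.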

The stimuli within the blocks were shown randomly with an inter-stimulus interval between two trials of either 250, 500, or 750 ms to avoid getting a rhythm in answering to the stimuli. In the combined blocks the odd trials have true or false stimuli and the even trials have stimuli for the categories “I turned over the four of diamonds” and “I turned over the seven of spades”.

Statistics

For the statistical tests we calculated a D-score the way Greenwald et al. (2003) found to be optimal for the IAT (as can be found at: http://faculty.washington.edu/agg/IATmaterials/Summary%20of%20Improved%20Scoring%20Algorithm.pdf). First, we deleted the trials with reaction times faster than 150 ms or slower than 10,000 ms (because participants either could not have read the sentence before responding, or were probably not focused in that trial). Next, we calculated the mean for each combined block (block numbers 3, 4, 6, and 7), using the response time after a correct answer, and the standard deviation of the practice blocks together (blocks 3 and 6) and of the test blocks together (blocks 4 and 7).

These values were used to calculate a D-score by subtracting the mean of the block with the combined category “True and I turned over the seven of spades” from the mean of the block with the combined category “True and I turned over the four of diamonds” and dividing this by the standard deviation of the blocks together. This was calculated for practice blocks (block 3 and block 6) and test blocks (block 4 and block 7) separately, and the mean of these two scores was used in further analyses.
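The D-score steps described above can be sketched in Python as follows. The original analysis was run in R, so this is only an illustration: the function names and the input format (one list of reaction times per combined block) are hypothetical, and the use of the sample standard deviation (n - 1) is an assumption, since the text does not say which variant was used.

```python
from statistics import mean, stdev

def trim(rts, low_ms=150, high_ms=10_000):
    """Drop trials faster than 150 ms or slower than 10,000 ms."""
    return [rt for rt in rts if low_ms <= rt <= high_ms]

def d_part(rt_true_four, rt_true_seven):
    """D for one pair of blocks (practice pair or test pair)."""
    four, seven = trim(rt_true_four), trim(rt_true_seven)
    # Standard deviation over the trials of both blocks together
    # (assumption: sample SD).
    sd_together = stdev(four + seven)
    return (mean(four) - mean(seven)) / sd_together

def d_score(four_practice, seven_practice, four_test, seven_test):
    """Mean of the practice-block D and the test-block D."""
    return mean([d_part(four_practice, seven_practice),
                 d_part(four_test, seven_test)])
```

With this definition, a participant who is faster when “True” shares a key with “I turned over the four of diamonds” gets a negative D-score, which matches the sign convention used in the results of Experiment 1.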

Effect sizes are reported as Cohen’s d for both within and between t-tests. These values are reported as dbetween and dwithin, in all experiments. For the ANOVA the effect size Cohen’s f is used according to the equations of Lakens (2013).

All analyses were conducted using R Studio version 0.98.1091 and the alpha level used was .05.

Manipulations

Participants were randomly assigned one of two cards to prevent one group from being much larger because of a bias toward choosing one card position over the other. Another random variable decided the block order for the participants. There were two block orders, since it is assumed people are faster and make fewer errors in the first combined block compared to the second combined block.

Results and discussion

Results:

The mean D-scores of our two groups of participants (those who turned over the four of diamonds and those who turned over the seven of spades) are shown in table 3. A negative D-score means the test predicted the participant turned over the four of diamonds and a positive D-score means the test predicted the participant turned over the seven of spades. The D-scores of the two groups are significantly different, t(450), p < 0.001, dbetween = 1.60 (see footnote 1).

This group effect does not imply a perfect accuracy. Of the 222 participants who turned over the four of diamonds, 77% were classified correctly. Of the 233 participants who turned over the seven of spades, 84% were classified correctly. We defined a correct classification as either a negative D-score when the participant had turned over the four of diamonds or a positive D-score when the participant had turned over the seven of spades (similar to Sartori et al., 2008).

Footnote 1: We first checked whether block order influences the D-scores, using a t-test. Block order does not lead to a significant difference in D-scores, t(448) = -1.31, p = 0.192.
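The sign-based classification rule for Experiment 1 can be illustrated with a short Python sketch. The function names are hypothetical, and treating a D-score of exactly 0 as unclassified (and therefore incorrect) is an assumption, since the text does not cover that case. The last lines combine the two reported group accuracies into one overall rate, weighted by group size.

```python
def predicted_card(d):
    """Negative D predicts the four of diamonds, positive D the seven of spades."""
    if d < 0:
        return "four of diamonds"
    if d > 0:
        return "seven of spades"
    return None  # assumption: D == 0 is left unclassified

def accuracy(d_scores, true_cards):
    """Share of participants whose predicted card matches the card they turned over."""
    hits = sum(predicted_card(d) == card
               for d, card in zip(d_scores, true_cards))
    return hits / len(d_scores)

# Overall accuracy from the reported group sizes and group accuracies.
overall = (0.77 * 222 + 0.84 * 233) / (222 + 233)
print(round(overall, 2))  # 0.81
```

The weighted value of about 81% agrees with the overall accuracy mentioned in the discussion of Experiment 1.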

To check how well our test actually did, we used Receiver Operating Characteristic (ROC) analysis on the D-scores of our two groups. In ROC analysis the specificity and sensitivity are plotted against each other. The area under the curve (ROCa) shows the overall performance and can range from 0 to 1, where 0.5 indicates random classification and 1 is perfect classification. The calculations were performed using the pROC package for R. The ROCa for Experiment 1 was 0.88, which means there is an 88% chance that a randomly chosen participant who turned over the seven of spades has a higher D-score than a randomly chosen participant who turned over the four of diamonds.
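The meaning of the area under the ROC curve can be made concrete with a small Python sketch. It is not the pROC call used for the reported values; it computes the same quantity through the pairwise-comparison definition (the share of cross-group pairs that are ordered correctly, with ties counted as one half). The function name and the example scores are made up for illustration.

```python
def roc_area(scores_four, scores_seven):
    """Probability that a random seven-of-spades participant has a higher
    D-score than a random four-of-diamonds participant (ties count 0.5)."""
    wins = 0.0
    for seven in scores_seven:
        for four in scores_four:
            if seven > four:
                wins += 1.0
            elif seven == four:
                wins += 0.5
    return wins / (len(scores_seven) * len(scores_four))

# Made-up D-scores: three of the four cross-group pairs are ordered correctly.
print(roc_area([-0.5, 0.2], [0.1, 0.6]))  # 0.75
```

A value of 0.5 then corresponds to scores that carry no information about the card, and 1 to groups whose D-scores do not overlap at all.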

Discussion:

In this experiment we replicated Experiment 1 from Sartori et al. (2008) in an online setting. We did find the IAT effect (the D-scores are significantly different) and a high accuracy of 81%, although not as high as in the original experiment (95%). Asendorpf et al. (2013) argue that the direction of the effect alone is not enough to conclude a replication was successful; the confidence intervals (CIs) should overlap. Unfortunately, Sartori et al. (2008) did not report these values. In our own findings, the 95% CI of the ROCa (0.85-0.91) does not include the high value of the original experiment (0.985). We did find a very large effect size, which indicates the online aIAT is valid and can be used in further experiments. This confirms our expectations.

A reason for our lower accuracy rates and ROCa values could be that we did the test online. Participants probably have more distractions at home than in a cubicle. There can be kids playing, pets who want attention, a cake in the oven, etc. These distractions can lead to longer breaks, which might lead to a lower aIAT effect. Another possible reason for our lower results is that only 14% of participants were native English speakers; all other participants understood English, since they did the test, but had a different mother tongue.

Online research makes it impossible to verify whether participants understand the instructions or to make sure they actually read them. We tried to make sure they understood by letting them pick the card they had seen from eight random cards and by having them repeat a block when they made too many errors. It is unlikely this would influence the outcome, because Capaldi (2015) shows that, on MTurk, participants read the instructions more closely than students who participated in research in the lab for credits.

Overall, we showed that, even with the limitations of testing online and a lower accuracy compared to the original findings, the aIAT is valid as an online test. The web-based aIAT can be used in future research to improve the validity of existing memory detection tests.

Experiment 2:

Introduction

In the first experiment we showed that the autobiographical Implicit Association Test (aIAT) can be used as an online test to differentiate between two possible true scenarios. One known issue with the aIAT is the type of sentences used to describe the two scenarios (Agosta et al., 2011). In police investigations it would be easy if the categories describing the innocent situation can be the negation of the sentences about the crime (I did… versus I did not …); the A/NonA method. The innocent situation can also be described by the alibi of a suspect; the A/B method. This latter method leads to more distinguishable sentences, and thus better reaction times for the innocent category (Agosta et al., 2011).

Agosta et al. (2011) compared these two methods in a series of experiments. They found the A/B method classifies a higher percentage of participants correctly compared to the A/nonA method, but never compared the methods directly. Instead, the results from previous experiments were compared to results of the data Agosta et al. (2011) gathered themselves. When the previous experiment used the A/B method, it was replicated using the A/nonA method; when the previous experiment used the A/nonA method, it was replicated using the A/B method. The reason for this comparison was the paper by Verschuere et al. (2009), who replicated Experiment 2 by Sartori et al. (2008), a mock crime scenario, but found much lower accuracy rates (between 61% and 86%). Agosta et al. (2011) reasoned this could be due to using the A/nonA method, although this method was also used in the original experiment by Sartori et al. (2008), who found an accuracy rate of 93%.

Another study investigating the detrimental effect of using negative sentences in the aIAT was based on cocaine use as the guilty category (Vargo & Petróczi, 2013). However, there are some major shortcomings in this research. Firstly, they used a brief version of the aIAT with fewer trials, which skipped block 1 (with the categories “True” and “False”) and block 5 (reversing the target categories, “Guilty” and “Innocent”) completely. This version had not been used before and should have been thoroughly tested before being used in this setting. Next, the test was administered using touch screen technology, which had never been done before and should also have been tested by replicating existing literature. Lastly, all the sentences included the word “cocaine”. The nonA category included sentences like “I don’t snort cocaine”, while the B category included sentences like “I respect the law on cocaine”. This leads to less distinguishable sentences in the A/B method (Agosta et al., 2011), which can lead to the lower accuracy rates found by Vargo and Petróczi (2013).

In this experiment we will compare the two methods directly to see if there really is a detrimental effect of using negative sentences in the aIAT. We expect to find a small effect of using negative sentences, with an accuracy rate close to the findings of Verschuere et al. (2009).

Methods

Participants:

We started with 424 participants. Of those, 41 participants did not finish the test, and 21 participants used multiple IP addresses in one test or did the test twice. These participants were removed from the analysis, which leaves 360 participants (28% female, Mage = 30.8, SDage = 9.6) whose data were analyzed.

Participants came from all over the world: 45.2% from Europe, 30.9% from Asia, 11.2% from North America, 10.1% from South America, and 2.5% from Africa. We also asked for mother tongue and included the 100 most spoken languages in a list. Only 16.4% reported their mother tongue to be English, and 14.7% said their mother tongue was not on the list.

These participants were randomly assigned so that 174 saw the four of diamonds (28.7% females, Mage = 30.2, SDage = 8.9) and 186 saw the seven of spades (27.4% females, Mage = 31.3, SDage = 10.3). Between the groups there was no difference in gender, χ²(1) = 0.0257, p = .873, or age, t(358) = -1.0468, p = .296.

There were 177 participants in the A/nonA method (28.2% female, Mage = 31.1, SDage = 9.4) and 184 in the A/B method (27.9% female, Mage = 30.4, SDage = 9.9). Between these two methods there was no difference in gender, χ²(1) = 0.0257, p = .873, or age, t(358) = 0.6819, p = .496.

Procedure:

The procedure in experiment 2 is exactly the same as in Experiment 1. In the aIAT, one group replicated Experiment 1 exactly - the A/B method - the other group saw negative sentences for one of the categories – the A/nonA method. In the A/nonA method we changed the category “I turned over the seven of spades” into the category “I didn’t turn over the four of diamonds”. All sentences used for this category included the phrase “four of diamonds” and a negation. These categories are similar to the ones used by Agosta et al. (2011) and give an extreme example of using negative sentences in the aIAT. We used both methods, so we could directly compare them. The D-score was calculated exactly the same way as in Experiment 1. The experimental tasks can be found at

Results and discussion

Results:

We had four groups of participants in this experiment, resulting from our 2 (chosen card: four of diamonds or seven of spades) x 2 (aIAT method: A/nonA or A/B) design (see footnote 2). The D-scores of these groups are shown in table 3. A positive score means that the group overall was indicated to have seen the four of diamonds; a negative score means the test indicated the group overall as having seen the seven of spades. The A/B method replicated our findings from Experiment 1 and shows the IAT effect. The A/nonA method performed much worse, so that the group that turned over the seven of spades was indicated as having seen the four of diamonds. This shows the A/B method performs as expected, while the A/nonA method cannot distinguish between the groups.

Method A/B has an accuracy of 81% for both conditions, while method A/NonA has an accuracy of 67% overall. In the latter method the accuracy of the participants who saw the four of diamonds (96%) was much better than the accuracy for the participants who saw the seven of spades (12%).

First, we performed a ROC analysis on the complete data set. The ROCa was 0.73, which means that for a randomly chosen pair of participants, one from each card group, there is a 73% chance that their D-scores are ordered in the predicted direction. Since the data showed a difference between the A/B and A/nonA conditions, the ROC analysis was also performed for both conditions separately. The ROCa of the A/nonA condition was 0.60, so for such a random pair the chance of a correctly ordered pair of D-scores was only 60%. The ROCa of the A/B condition was 0.90, so here the chance was 90%. This shows that using negative sentences in the aIAT leads to a lower IAT effect.

2 Since there was no significant difference between the block orders, t(358) = -0.93, p = 0.36, we excluded that variable from the main analysis.

To find which variables made a significant difference in the D-scores we used a 2 (chosen card: four of diamonds or seven of spades) x 2 (aIAT method: A/nonA or A/B) mixed ANOVA on D-scores. There was a significant main effect of the chosen card F(1,353)=122.02, p<0.001, f = 0.59, a significant main effect of the method F(1,353)=161.58, p<0.001, f = 0.68, and a significant interaction between picked card and method F(1,353) = 63.04, p<0.001, f = 0.42.
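The reported Cohen's f values can be checked against the F statistics with a short Python sketch. It assumes the standard conversion in which partial eta squared is obtained from F and its degrees of freedom, and f is the square root of eta squared divided by one minus eta squared; the function name is hypothetical.

```python
from math import sqrt

def cohens_f(f_stat, df_effect, df_error):
    """Cohen's f from an F statistic via partial eta squared."""
    eta_p2 = (f_stat * df_effect) / (f_stat * df_effect + df_error)
    return sqrt(eta_p2 / (1 - eta_p2))

# F values reported for Experiment 2, each with df = (1, 353).
for label, f_stat in [("chosen card", 122.02),
                      ("method", 161.58),
                      ("card x method", 63.04)]:
    print(label, round(cohens_f(f_stat, 1, 353), 2))
# chosen card 0.59
# method 0.68
# card x method 0.42
```

The three values reproduce the effect sizes reported in the text (0.59, 0.68, and 0.42).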

Discussion:

In our direct comparison of two methods to construct categories in the aIAT (A/B and A/nonA), we did find a detrimental effect of using negative sentences (accuracy dropped from 81% to 67%). This matches the findings by Agosta et al. (2011) who compared the methods indirectly. The effect sizes are large, which tells us the aIAT can be used to indicate which of two events is true when avoiding the use of negative sentences. This confirms our expectations.

These results imply the aIAT can be a great lie detection test when the use of negative sentences is avoided. Unfortunately, these findings have so far only been replicated at the group level (Agosta et al., 2011; Agosta et al., 2013; Agosta & Sartori, 2013; Sartori et al., 2008; Vargo & Petróczi, 2013; Verschuere et al., 2014; Verschuere et al., 2009). Thus more research is needed before the test can be used on individuals in the field (e.g., in a police investigation).

We have not yet asked participants to hide information from us. They would not have had any reason to try to hide which card they turned over. Some studies found effects of faking, but mainly when participants were given the optimal faking strategy (Kim, 2003; Verschuere et al., 2009). This is an issue to investigate in future research and needs to be resolved before the aIAT is ready for field trials.

With our results we confirmed the findings of Experiment 1 in a novel sample. The online aIAT is a valid test when using only affirmative sentences.

Experiment 3

Introduction

Another test adjusted to be a reaction time-based lie detection test is the Concealed Information Test (CIT) (Verschuere & De Houwer, 2011). This test determines whether crucial information from a situation (e.g., a crime) is known by the examinee (Bradley, Barefoot, & Arsenault, 2011; Gamer, 2011; Kleinberg & Verschuere, 2015; Osugi, 2011; Verschuere & De Houwer 2011; Verschuere et al., 2014). Questions in the CIT are formed with multiple possible critical items. An example question could be: “Was a laptop stolen from the car?”, where “laptop” (the stolen item, or probe) would be changed to wallet, smart phone, camera, radio, or necklace (irrelevant items) in sequential questions. When collecting physiological measures, the example above would be the correct way to ask the questions. In Japan, where the CIT is used in real police investigations, examinees don’t even need to respond to these questions, although they will probably answer “no” to every question (Osugi, 2011). When constructing questions for the CIT it is important that all critical items are equally plausible, so that the correct answer cannot be identified just by listening to the options. In the RT-based CIT participants learn one item per category/question, the target item, to which they respond with a yes-button, while responding with a no-button to all other items, to ensure all items are read or seen. When the examinee has knowledge of the crime, responses to probe items will be slower than responses to irrelevant items.

One of the biggest limitations of the CIT is that it only measures knowledge of crime information, not how this knowledge was acquired by the suspect (Kleinberg & Verschuere, 2015; Osugi, 2011; Verschuere & De Houwer, 2011). This makes it a difficult test for the police to work with, since only crime items which were not leaked to the public, and were remembered by the perpetrator, can be used in the CIT. Even if information is not leaked by the police, there can be other ways in which an innocent suspect obtained the information (e.g., by word of mouth or by witnessing the crime). The leakage problem might be solved by adding a new test (our aIAT from Experiment 1 and Experiment 2) to the CIT. It can be used to find where knowledge of the crime came from and whether the guilty sentences (implying someone committed the crime) or the innocent sentences (the alibi) are more likely to be true.

In this next experiment we test whether adding the aIAT to the CIT improves the validity of the latter, using an online mock crime setting. We used only one order of the two tests (first the CIT, then the aIAT), since the aIAT is too specific and would leak information to the innocent suspects. We expect the naïve innocent, who have no knowledge of the crime whatsoever, to be predicted to be innocent by both tests; the informed innocent, who have read about the crime, to show knowledge of the crime in the CIT and to be predicted to be innocent by the aIAT; and the guilty, who have committed a mock crime, to show knowledge of the crime in the CIT and to be predicted to be guilty by the aIAT.

Methods

Participants

We gathered data from 515 participants. In the pre-analysis for the CIT we excluded 95 participants for making more than 50% errors on targets, probes, or irrelevant items, and 1 participant because fewer than half of the trials remained after removing those with reaction times that were too fast (< 150 ms) or too slow (> 800 ms). After combining the end-files for the CIT and the aIAT, 54 participants were removed because of missing data. This left us with 364 participants in the remainder of the analysis (33% female, Mage = 32.76, SDage = 10.48).

Participants came from all over the world: 46% from Europe, 31% from Asia, 10% from North America, 11% from South America, and 2% from Africa. We also asked for mother tongue, offering a list of the 100 most spoken languages. Only 18% reported English as their mother tongue, and 7% indicated that their mother tongue was not on the list. In this experiment we also asked for educational level: 45% had finished university, 22% had a college degree, 8% had finished professional training, 24% had finished high school, and 1% had finished only elementary school.


Across the three conditions (naïve innocent, informed innocent, and guilty) there was no difference in gender, χ²(2) = 5.50, p = 0.06, or age, χ²(92) = 103.54, p = 0.19.

Procedure

We used the same base code for the CIT as Kleinberg and Verschuere (2015) and added the aIAT to it as in Experiments 1 and 2. The additional pages for the different conditions were written in the same programming language and in a similar style. Because this task took 30 minutes, the monetary compensation was $1.00. The experimental tasks can be found at http://www.lieresearch.com/?page_id=729.

In the first pages, participants gave informed consent and provided information about their gender, age, educational level, mother tongue, and country, using drop-down menus to answer. They were randomly assigned to one of three knowledge groups: naïve innocent, informed innocent, or guilty. We also randomly assigned them one bag and one item that would be used as the stolen bag and item in the remainder of the task. After these pages the participants first performed a CIT and then an aIAT. After both tests we asked the participants which condition they were in and which bag and item were stolen, to check what they remembered. A short debriefing text explained the workings of both tests and the outcome, which was either being innocent, having information about the crime, or being guilty.

Differences between conditions

In the first pages there is a difference between the conditions. Naïve innocent participants read a story from an imaginary paper, the Amsterdam Chronicle, about a fire in a church in Amsterdam and are asked to remember the story. On the next page, they are asked to summarize the story and tell what they think was most significant.

Informed innocent participants were also asked to read a story from the Amsterdam Chronicle, but it was about a theft in a mall. After reading the story they are not asked to summarize it, but see the bag that was stolen for at least 15 seconds and are asked to remember the details of the bag well.


After clicking next, the bag moved to the side of the page and participants needed to answer direct questions about the color, name, function, size, and texture of the bag. On the next screen we asked them to describe in their own words which bag was stolen according to the Amsterdam Chronicle. The same procedure was used to memorize the stolen item. Last, they saw a page with all possible bags and items, on which they had to click both the bag and the item that were stolen.

Guilty participants were not asked to read a story, nor was the Amsterdam Chronicle mentioned. They first saw the bag and item they were going to steal. On the next page they were required to steal a bag from a silhouette of a woman, by dragging it to a red square at the bottom of the screen. Then, the bag was shown as a larger image and they had at least 15 seconds to remember the details of the bag. After clicking next, the bag moved to the side of the page and participants needed to answer direct questions about the color, name, function, size, and texture of the bag. Then, they were asked to describe the bag they stole in their own words. On a new page they saw a rectangle representing the open bag and were asked to steal “the most valuable item” in that bag. The other items were a pen and a couple of apples. They stole the item by dragging it to the bottom of the screen. If the wrong item was picked, they saw a popup telling them they had picked the wrong item and asking them to try again. Remembering and describing the stolen item followed the same procedure as for the stolen bag. Last, they saw a screen with all possible bags and items and could not continue until they clicked the correct bag and item they had just stolen. Screenshots for all three conditions can be found in Appendix A.

CIT

The CIT started with instructions telling the participants they were accused of a crime, but deny knowing anything about it. They are told they will see target items on which they will have to respond yes, and that they will have to respond no to all other items, even if they recognize them. The instructions also include a warning that participants will not receive money when they cheat by skipping phases, or holding down buttons.


Participants saw a bag and item, which were randomly selected from the list of bags and items different from the bag and item that were selected to be the probes. They can continue after learning the target items for at least 15 seconds. Here participants also have to select the correct target items from a page with all possible bags and items, before they can continue with the actual test.

The CIT starts with three practice phases to make sure participants know how to do the test. They are asked to press the keys “E” for yes and “I” for no. In the first practice block participants only see WRONG in red below the stimulus if they press the wrong button; the second practice block is faster; and in the third practice block the task is faster still and a red TOO SLOW is added above the stimulus if participants do not respond within 800 ms. A mistake cannot be corrected. To move on to the next practice block, or to the complete test, participants have to respond correctly to at least 50% of the targets, have a mean reaction time below 800 ms, and have a reaction time below 150 ms on less than 20% of the trials. The order of the items is semi-blocked to make sure each item appears equally often. All items and bags are placed in an array and shown twice. Another variable decides whether participants see the item as a word or as a picture. If the item was a word the first time, it will be a picture the second time and vice versa.
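The pass criteria for a practice block can be sketched in a few lines of Python. This is an illustration only: the code of the actual web task is not reproduced in this paper, and the function name, the trial dictionary keys (`is_target`, `correct`, `rt`) and the parameter names are assumptions made for this sketch.

```python
from statistics import mean


def passes_practice_block(trials, min_target_accuracy=0.5,
                          max_mean_rt=800, fast_rt=150, max_fast_share=0.2):
    """Return True if a practice block meets the three criteria in the text.

    `trials` is a list of dicts with hypothetical keys:
    'is_target' (bool), 'correct' (bool) and 'rt' (reaction time in ms).
    """
    targets = [t for t in trials if t["is_target"]]
    if not trials or not targets:
        return False  # nothing to evaluate

    # Criterion 1: at least 50% of the target trials answered correctly.
    target_accuracy = sum(t["correct"] for t in targets) / len(targets)
    # Criterion 2: mean reaction time below 800 ms.
    mean_rt = mean(t["rt"] for t in trials)
    # Criterion 3: fewer than 20% of trials faster than 150 ms.
    fast_share = sum(t["rt"] < fast_rt for t in trials) / len(trials)

    return (target_accuracy >= min_target_accuracy
            and mean_rt < max_mean_rt
            and fast_share < max_fast_share)
```

Whether the mean reaction time was computed over all trials or only over correct ones is not stated in the text; the sketch uses all trials.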

Before the actual test we repeated one last time that participants were required to respond with no to all stimuli apart from the target stimuli, even if they recognized the items. In the real CIT participants had to respond to 240 trials: each bag and each item was shown 10 times as a word and 10 times as a picture. The inter-stimulus interval was either 250, 500, or 750 ms, to prevent participants from settling into a rhythm when responding to the stimuli.
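The trial count follows from the design: 6 bags plus 6 items give 12 stimuli, each shown 10 times as a word and 10 times as a picture (12 x 20 = 240). A minimal Python sketch of such a trial list is given below. The function name, the dictionary keys and the ordering rule (shuffling within each repetition round) are assumptions for illustration; the text does not specify how the main test was ordered.

```python
import random


def build_cit_trials(bags, items, repetitions=10,
                     isi_options=(250, 500, 750), rng=None):
    """Build a CIT trial list with a random inter-stimulus interval per trial."""
    rng = rng or random.Random()
    stimuli = list(bags) + list(items)
    trials = []
    for _ in range(repetitions):
        # One round: every stimulus once as a word and once as a picture.
        round_trials = [(s, fmt) for s in stimuli for fmt in ("word", "picture")]
        rng.shuffle(round_trials)  # assumed ordering rule
        for stimulus, fmt in round_trials:
            trials.append({
                "stimulus": stimulus,
                "format": fmt,
                "isi_ms": rng.choice(isi_options),  # 250, 500 or 750 ms
            })
    return trials
```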

A transition page was used to introduce the beginning of a second test.

aIAT

The aIAT was similar to the one in Experiments 1 and 2. We used positive sentences and categories, since we found a detrimental effect of using negative sentences in Experiment 2. The sentences were adjusted to the specific bag and item we selected to be the probes in the CIT (which were stolen in the first pages of the experiment). Participants had to press “A” for the category on the left and “L” for the category on the right (see table 2). This way the keys could not take on the meaning given to the “E” and “I” keys in the CIT. The sentences we used can be found in table 1.

Statistics

For the CIT we first removed trials with reaction times below 150 ms or above 800 ms, and trials with an incorrect answer. The d-score was then computed from the reaction times on the remaining probe and irrelevant trials as $d = (M_{\text{probe}} - M_{\text{irrelevant}}) / SD_{\text{probe+irrelevant}}$, similar to Kleinberg and Verschuere (2015). This score was used for further analysis.
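The CIT scoring steps can be illustrated with a short Python sketch. The function name and the tuple layout of a trial are hypothetical, and the denominator is read here as the sample standard deviation of all retained probe and irrelevant reaction times taken together, which is one possible interpretation of SDprobe+irrelevant.

```python
from statistics import mean, stdev


def cit_d_score(trials, rt_min=150, rt_max=800):
    """Compute a CIT d-score for one participant.

    `trials` is an iterable of (stimulus_type, rt_ms, correct) tuples, where
    stimulus_type is 'probe', 'irrelevant' or 'target' (hypothetical layout).
    """
    # Keep only correct trials with reaction times between 150 and 800 ms.
    kept = [(kind, rt) for kind, rt, correct in trials
            if correct and rt_min <= rt <= rt_max]
    probe = [rt for kind, rt in kept if kind == "probe"]
    irrelevant = [rt for kind, rt in kept if kind == "irrelevant"]

    # Assumption: SD over probe and irrelevant reaction times combined.
    pooled_sd = stdev(probe + irrelevant)
    return (mean(probe) - mean(irrelevant)) / pooled_sd
```

A positive value means slower responses to probes than to irrelevant items, which the CIT takes as a sign of crime knowledge.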

For the aIAT the same equations as in Experiments 1 and 2 were used to calculate the D-scores.

Manipulations

We randomly picked the bag and item that participants saw as probes (the stolen items) and as targets (used in the CIT). For each participant we selected one of six possible bags and one of six possible items, twice, so our findings cannot be due to a particular combination of stolen bag and item. The probes were also used to make the sentences in the aIAT more specific to the mock crime. A further manipulation in the aIAT was the assignment of participants to one of two block orders, as in Experiment 1 and Experiment 2.
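The randomization described above can be sketched as follows. The function and key names are hypothetical, and drawing probe and target without replacement is an assumption based on the statement that targets differed from the probes.

```python
import random

CONDITIONS = ("naive innocent", "informed innocent", "guilty")


def assign_participant(bags, items, rng=None):
    """Randomly assign condition, probes, targets and aIAT block order."""
    rng = rng or random.Random()
    # Draw two different bags and two different items: probe first, then target.
    probe_bag, target_bag = rng.sample(list(bags), 2)
    probe_item, target_item = rng.sample(list(items), 2)
    return {
        "condition": rng.choice(CONDITIONS),
        "probe": (probe_bag, probe_item),      # the "stolen" bag and item
        "target": (target_bag, target_item),   # learned before the CIT
        "aiat_block_order": rng.choice((1, 2)),
    }
```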

Pilots

We ran several pilots with sample sizes varying between 32 and 63 to (1) ensure that guilty participants remembered the crime items, (2) ensure that participants could no longer use the translation function of their browser, to avoid losing more participants, and (3) add warnings to ensure that participants did not skip blocks in the aIAT.


Results and discussion

CIT

We ran a 3 (knowledge group: naïve innocent, informed innocent, guilty) x 2 (stimulus: probes versus irrelevant) ANOVA. We found a significant main effect of condition, F(2,361) = 3.21, p < 0.05, f = 0.09, and of stimulus, F(1,361) = 71.10, p < 0.001, f = 0.44, as well as an interaction between condition and stimulus, F(2,361) = 22.36, p < 0.001, f = 0.24. This last effect is the most important, since it means that the difference in response times between probes and irrelevant items depends on the knowledge group.

We also performed t-tests within each of the three conditions, comparing reaction times for probe items with reaction times for irrelevant items, to find in which conditions there is a difference. For naïve innocents we found no difference between these two reaction times, t(128) = -0.36, p = 0.72, dwithin = -0.03. This was expected because they did not know the probes and thus should not respond differently to them. For informed innocents we found a significant difference between the reaction times, t(117) = 6.09, p < 0.001, dwithin = 0.56. This was also expected, since the CIT tests knowledge and these participants knew which bag and item were stolen. For the guilty participants we also found a significant difference in reaction times, t(116) = 7.63, p < 0.001, dwithin = 0.71, as expected.

These findings agree with the d-scores, which reflect whether there is a difference between reaction times on probe items and irrelevant items. The d-scores can be found in table 4. A positive score means participants had longer reaction times on probes and thus most likely had knowledge about the crime.

We performed ROC analyses on the d-scores for the three possible combinations of groups. For naïve innocents compared to informed innocents we found ROCa = 0.67 (95% CI: 0.61 – 0.74). For naïve innocents compared to guilty participants we found ROCa = 0.73 (95% CI: 0.67 – 0.80). And for informed innocents compared to guilty participants we found ROCa = 0.57 (95% CI: 0.50 – 0.65).
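The area under the ROC curve can be read as the probability that a randomly chosen participant from one group has a higher score than a randomly chosen participant from the other group, with ties counted as one half. A value of .50 therefore corresponds to chance. The Python sketch below computes this area directly from two lists of scores. The function name is hypothetical, the sketch is not the analysis software used for this experiment, and it does not compute the confidence intervals.

```python
def roc_area(higher_group_scores, lower_group_scores):
    """Area under the ROC curve via pairwise comparison of two groups.

    Counts the share of (a, b) pairs in which the score from the first group
    exceeds the score from the second group; ties count as half a win.
    """
    wins = 0.0
    for a in higher_group_scores:
        for b in lower_group_scores:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(higher_group_scores) * len(lower_group_scores))
```

For example, `roc_area([0.3, 0.5], [0.1, 0.4])` compares four pairs, three of which favor the first group, giving an area of 0.75.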


aIAT

In a 3 (knowledge group: naïve innocent vs. informed innocent vs. guilty) x 2 (block order) ANOVA we found a significant effect of block order, F(1,360) = 27.28, p < 0.001, f = 0.27, and of condition, F(1,360) = 66.06, p < 0.001, f = 0.43.³ The interaction between condition and block order was not significant (p = 0.24). The latter is the most important result, for it tells us that block order did not influence the differences between the groups.

We calculated the D-scores the same way as before. These can be found in table 4. A negative score means the test predicts the subject is innocent. The scores clearly show this for both naïve innocents and informed innocents. A positive D-score means the test predicts the subject is guilty; the table shows this for the guilty group.

We performed three ROC analyses to compare all the conditions with each other. For naïve innocent compared to informed innocent participants we found ROCa = 0.65 (95% CI: 0.58 – 0.72). For naïve innocent compared to guilty participants we found ROCa = 0.80 (95% CI: 0.74 – 0.85). And for informed innocent compared to guilty participants we found ROCa = 0.69 (95% CI: 0.62 – 0.75).

Discussion

This study is the first experiment to combine the aIAT with the CIT to improve the validity of memory detection. When crime-relevant information is leaked to innocent participants, the CIT will show they have knowledge about the crime. The additional aIAT can be used to show where this information came from, that is, whether they committed the crime or the information was leaked to them. Our results support this idea and show that the aIAT can be used in addition to the CIT to help solve the leakage problem.

³ For the analysis of the aIAT we first did a t-test on block order, since we could exclude this variable from analysis in Experiments 1 and 2. The test for block order in the aIAT was significant, t(42) = 4.94, p < 0.001. Thus we had to include this in an ANOVA.


In the aIAT we did not expect to find a large effect of block order. We followed the recommendation of Nosek et al. (2005) to double the number of trials in the block where the categories are reversed (in our case block 5). This should be enough to let people get used to the change in classifying the stimuli. Since Experiment 1 and Experiment 2 showed no effect of block order, we expected to find no effect in Experiment 3 either. The effect may be due to fatigue, since this is the only experiment in which participants completed two tests.

The mock crime was found to be very hard to use in an online setting. When performed in a lab you would have a participant memorize the steps they need to take and then perform them without the page with instructions. An example would be to walk towards the lockers, open locker 21 with combination 3579, and take 5 euros from a green wallet. The questions can then be about the number of the locker, the combination to open the locker, the amount of money taken, and the color of the wallet. It is also possible to ask them first to tell what they did, to make sure they remember these facts. In our online setting, we asked for 2 facts, a specific bag and a specific item from that bag. We don’t want the task to take too long, since people need to finish the test in order for us to use their data, and we want the task to take an equal amount of time for all three groups.

While online research has advantages (e.g., larger samples, more power, and a more diverse subject pool), it also has limitations. One limitation in our experiments may be that people know the study is about lie research and thus might not pay enough attention to the items they need to steal. To minimize this effect, which we found in our pilot studies, we tried to keep participants looking at the stolen items for as long as possible by placing questions next to them that could be answered while examining the bag and item.

Further research with the combination of the aIAT and CIT is necessary, but the results of this first (web-based) combination of the tests are promising.


General discussion

In this paper we described three experiments with the online autobiographical Implicit Association Test (aIAT). We showed that the aIAT can be used as an online test, but with lower accuracy than the original lab experiment. Then we confirmed, in a first direct comparison, that using negative sentences in the aIAT has detrimental effects and leads to classification at chance level, while using only affirmative sentences yields an accuracy of 81%. In the last experiment we showed that adding the aIAT to the CIT can help address the leakage problem.

The overall aIAT D-score of every group shows a high standard deviation. This means that even though the mean score shows a good effect, there are also people with a smaller effect and wrong classifications. Even though we found a high accuracy rate of 81% in both Experiment 1 and the A/B method in Experiment 2, this still means that 19% of participants were classified incorrectly. The aIAT can be used to validate a group as being guilty or innocent, but is not yet sufficient at the individual level. This means the aIAT can only be used to give an indication of guilt, not a certainty. In the CIT we see a similar pattern in the standard deviations, leading to the same conclusion.

In terms of percentage correct, the CIT leads to a good score in the guilty condition (77%), a score a little above chance in the naïve innocent condition (58%), and a quite low score in the informed innocent condition (31%). This shows the detrimental leakage effect the CIT has to cope with. Since only information that was not leaked can be used in the CIT, many important questions might not be examined. The aIAT improved the accuracy for the informed innocent to 64%, showing a possible solution for the leakage problem.

The accuracy of the aIAT was best for the naïve innocent, who do not need to hold back information (86%, even higher than our accuracy rates in Experiment 1 and the A/B condition of Experiment 2). For participants who had information and were trying to hold it back, accuracy dropped to 64% (for both informed innocent and guilty participants). This is in line with previous findings on lying in the aIAT (Verschuere et al., 2009), where classification at the group level is possible but accuracy rates drop. Only when given an optimal faking strategy for the IAT do participants succeed in faking (Verschuere et al., 2009; Fiedler & Bluemke, 2005).

If the combination of these tests were used in a web-based format in police investigations, the main problem would be verifying that the person taking the test is actually the suspect. Therefore, we recommend seeing this research as a validation of the combined reaction time-based CIT and aIAT, not as a reason to administer them online. When the tests are performed on a computer on site, and not from a distance, someone can verify that the correct person is taking the tests, while reaction times are still used to predict whether someone is guilty or innocent.

Future Research

More research is needed to find out whether the CIT and aIAT can be combined for better validity. It would be good to start with easier settings than the online mock crime, settings that have proven to work online for at least one of the two tests, since this might lead to higher accuracy rates. One suggestion would be to use the card paradigm described in Experiments 1 and 2 of this paper, or the fake identity paradigm, which has proven to work in the CIT (Kleinberg & Verschuere, 2015). These experiments should also be performed in an offline setting, in the lab, to confirm their validity.

Before the tests are used together in the field, more research needs to be done on faking in the tests. Instructions to lie alone have been found to have only a small effect, leaving the group effect intact. This is enough for a valid test when using groups, but not when testing individuals.

Conclusion

The aIAT can be used in online research to study group effects for autobiographical events, provided negative sentences are avoided. Adding the aIAT to the CIT can help solve the leakage problem, but more research is needed to validate our findings.


References

Agosta, S., Mega, A., & Sartori, G. (2011). Detrimental effects of using negative sentences in the autobiographical IAT. Acta Psychologica, 136(3), 269-275. doi:10.1016/j.actpsy.2010.05.011

Agosta, S., Pezzoli, P., & Sartori, G. (2013). How to detect deception in everyday life and the reasons underlying it. Applied Cognitive Psychology, 27, 256-262. doi: 10.1002/acp.2902

Agosta, S., & Sartori, G. (2013). The autobiographical IAT: a review. Frontiers in Psychology, 4. doi: 10.3389/fpsyg.2013.00519

Asendorpf, J. B., Conner, M., De Fruyt, F., De Houwer, J., Denissen, J. J., Fiedler, K., ... Wicherts, J. M. (2013). Recommendations for increasing replicability in psychology. European Journal of Personality, 27(2), 108-119. doi: 10.1002/per.1919

Bradley, M. T., Barefoot, C. A., & Arsenault, A. M. (2011). Leakage of information to innocent suspects. In B. Verschuere, G. Ben-Shakhar, & E. Meijers (Eds.), Memory detection: Theory and application of the Concealed Information Test (pp.187-199). Cambridge: Cambridge University Press.

Brandt, M. J., IJzerman, H., Dijksterhuis, A., Farach, F. J., Geller, J., Giner-Sorolla, R., ... Van 't Veer, A. (2014). The replication recipe: What makes for a convincing replication? Journal of Experimental Social Psychology, 50, 217-224. doi: 10.1016/j.jesp.2013.10.005

Capaldi, C. A. (2015, May). Graduating from undergrads: Are MTurk workers less attentive than undergraduate participants? Poster presented at the 4th Annual Psychology Outside the Box Conference, Ottawa, ON

De Houwer, J. (2006). What are implicit measures and why are we using them. In R. W. Wiers & A. W. Stacy (Eds.), The handbook of implicit cognition and addiction (pp. 11–28). Thousand Oaks, CA: Sage.


Fiedler, K., & Bluemke, M. (2005). Faking the IAT: Aided and unaided response control on the Implicit Association Test. Basic and Applied Social Psychology, 27(4), 307-316. doi: 10.1207/s15324834basp2704_3

Frost, P., Adie, M., Denomme, R., Lahaie, A., Sibley, A., & Smith, E. (2010). Application of the implicit association test to a study on deception. American Journal of Psychology, 123(2), 221-230. doi: 10.5406/amerjpsyc.123.2.0221

Gamer, M. (2011). Detecting concealed information using autonomic measures. In B. Verschuere, G. Ben-Shakhar, & E. Meijers (Eds.), Memory detection: Theory and application of the Concealed Information Test, (pp.27-45). Cambridge: Cambridge University Press.

Granhag, P. A., & Mac Giolla, E. (2014). Preventing Future Crimes: Identifying markers of true and false intent. European Psychologist, 19(3), 156-206. doi: 10.1027/1016-9040/a000202

Greenwald, A. G., Nosek, B. A., & Banaji, M. R. (2003). Understanding and using the Implicit Association Test: I. An improved scoring algorithm. Journal of Personality and Social Psychology, 85(2), 197-216. doi: 10.1037/0022-3514.85.2.197

Greenwald, A. G., Poehlman, T. A., Uhlmann, E. L., & Banaji, M. R. (2009). Understanding and using the Implicit Association Test: III. Meta-analysis of predictive validity. Journal of Personality and Social Psychology, 97(1), 17-41. doi: 10.1037/a0015575

Kim, D. (2003). Voluntary controllability of the implicit association test (IAT). Social Psychology Quarterly, 66, 83-96. http://www.jstor.org/stable/3090143

Kleinberg, B., & Verschuere, B. (2015). Memory detection 2.0: The first web-based memory detection test. PLoS ONE, 10(4): e0118715. doi:10.1371/journal.pone.0118715


Kraut, R., Olson, J., Banaji, M., Bruckman, A., Cohen, J., & Couper, M. (2004). Psychological research online: report of Board of Scientific Affairs' Advisory Group on the Conduct of Research on the Internet. American Psychologist, 59(2), 105. doi: 10.1037/0003-066X.59.2.105

Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4:863. doi: 10.3389/fpsyg.2013.00863

Marini, M., Agosta, S., Mazzoni, G., Dalla Barba, G., & Sartori, G. (2012). True and false DRM memories: differences detected with an implicit task. Frontiers in Psychology, 3. doi: 10.3389/fpsyg.2012.00310

Nosek, B. A., Banaji, M., & Greenwald, A. G. (2002). Harvesting implicit group attitudes and beliefs from a demonstration web site. Group Dynamics: Theory, Research, and Practice, 6(1), 101. doi: 10.1037/1089-2699.6.1.101

Nosek, B. A., Greenwald, A. G., & Banaji, M. R. (2005). Understanding and using the Implicit Association Test: II. Method variables and construct validity. Personality and Social Psychology Bulletin, 31(2), 166-180. doi: 10.1177/0146167204271418

Osugi, A. (2011). Daily application of the Concealed Information Test: Japan. In B. Verschuere, G. Ben-Shakhar, & E. Meijers (Eds.), Memory detection: Theory and application of the Concealed Information Test, (pp. 253-275). Cambridge: Cambridge University Press.

Sartori, G., Agosta, S., Zogmaister, C., Ferrara, S. D., & Castiello, U. (2008). How to accurately detect autobiographical events. Psychological Science, 19(8), 772-780. doi:


Vargo, E. J., & Petróczi, A. (2013). Detecting cocaine use? The autobiographical implicit association test (aIAT) produces false positives in a real-world setting. Substance abuse treatment, prevention, and policy, 8(1), 22. doi: 10.1016/j.drugalcdep.2014.10.008

Verschuere, B., & De Houwer, J. (2011). Detecting concealed information in less than a second: Response latency-based measures. In B. Verschuere, G. Ben-Shakhar, & E. Meijers (Eds.), Memory detection: Theory and application of the Concealed Information Test, (pp.46-62). Cambridge: Cambridge University Press.

Verschuere, B., Prati, V., & De Houwer, J. (2009). Cheating the lie detector faking in the autobiographical implicit association test. Psychological Science, 20(4), 410-413. doi: 10.1111/j.1467-9280.2009.02308.x

Verschuere, B., Suchotzki, K., & Debey, E. (2014). Detecting deception through reaction times. In P.A. Granhag, A. Vrij, B. Verschuere (Eds.), Detecting Deception: Current Challenges and Cognitive Approaches, (pp.269-291). doi: 10.1002/9781118510001.ch12

Walczyk, J. J., Roper, K. S., Seemann, E., & Humphrey, A. M. (2003). Cognitive mechanisms underlying lying to questions: Response time as a cue to deception. Applied Cognitive


Figures and Tables

Table 1: Stimuli. The stimuli used in the three experiments. For “True” participants picked five sentences that were true for them at that moment; for “False” they picked five sentences that were false for them at that moment. These stimuli were used in all three experiments. The “guilty” and “innocent” categories differ for the experiments. In experiment 1 the categories, and accompanying stimuli, “I turned over the four of diamonds” and “I turned over the seven of spades” are used. In experiment 2 these categories, and accompanying stimuli, are used for one group, but the other group got the categories, and accompanying stimuli, “I turned over the four of diamonds” and “I didn’t turn over the four of diamonds”. In experiment 3 the categories, and accompanying stimuli, “I took the BAG and ITEM” and “I read The Amsterdam Chronicle” were used. In this first category the sentences and category were changed according to the bag and item that were stolen (which was randomized for each participant), these words replaced the words BAG and ITEM in the sentences (e.g., I took the green bag).

Category                                    Stimuli

True                                        I am sitting on a chair
                                            I am sitting on a couch
                                            I am looking at a computer screen
                                            We live in the year 2014
                                            I am taking a test
                                            There is a keyboard in front of me
                                            I am dressed
                                            I am inside
                                            I am online
                                            I am participating in a research project

False                                       I am playing football
                                            I am climbing a mountain
                                            We live in the year 2008
                                            I am taking a bath
                                            I am watching television
                                            I am sitting on a beach
                                            I am walking in the forest
                                            I am wearing pajamas
                                            I am outside
                                            I am cooking dinner

I turned over the four of diamonds          I clicked on the four of diamonds
                                            I got the four of diamonds
                                            I saw the four of diamonds
                                            I turned over the four of diamonds
                                            I took the four of diamonds

I turned over the seven of spades           I clicked on the seven of spades
                                            I got the seven of spades
                                            I saw the seven of spades
                                            I turned over the seven of spades
                                            I took the seven of spades

I didn’t turn over the four of diamonds     I didn’t click on the four of diamonds
                                            I didn’t get the four of diamonds
                                            I didn’t see the four of diamonds
                                            I didn’t turn over the four of diamonds
                                            I didn’t take the four of diamonds

I stole the BAG and ITEM                    I took the BAG
                                            I stole the BAG
                                            I have taken the BAG
                                            I have stolen the ITEM
                                            I took the ITEM

I read The Amsterdam Chronicle              I read The Amsterdam Chronicle
                                            I read an article
                                            I was reading The Amsterdam Chronicle
                                            I was reading the article
                                            I saw the article

Table 2: Block order of the aIAT. The order in which the blocks occurred and their categories are printed below. For the categories on the left the stimuli had to be categorized with the “E” button for Experiment 1 and Experiment 2 and with the “A” button for Experiment 3. For the categories on the right the stimuli had to be categorized with the “I” button for Experiment 1 and Experiment 2 and the “L” button for Experiment 3. These keys are universally on the right or left side of the keyboard and thus associated with where the categories are displayed on the screen. In place of “guilty” and “innocent” the categories of table 1 are printed. The categories were counterbalanced so that for half of the participants the categories of blocks 2-4 and 5-7 were interchanged; the number of trials per block remained the same (thus, more trials in block 5).

Block number   Left              Right              Number of trials

1              True              False              20
2              Guilty            Innocent           20
3              True + Guilty     False + Innocent   20
4              True + Guilty     False + Innocent   40
5              Innocent          Guilty             40
6              True + Innocent   False + Guilty     20
7              True + Innocent   False + Guilty     40

Table 3: D-scores and ROCa values of Experiment 1 and Experiment 2. A negative D-score indicates the test predicts that the participants have turned over the four of diamonds; a positive D-score indicates the test predicts that the participants have turned over the seven of spades. The ROCa value represents how different one group’s D-scores are from the other group’s and how much these findings differ from the chance level of .50. The categories on the top represent the card that was turned over. The rows represent the different experiments and experimental settings. The scores are represented here as the mean and standard deviation per group. The ROCa with its 95% confidence interval (CI) comes from the ROC analysis comparing the two groups and is thus presented per experimental setting.

                       The four of Diamonds   The seven of Spades   ROCa + 95% CI

Experiment 1           -0.42 (0.49)           0.35 (0.48)           0.88 (0.85 – 0.91)
Experiment 2: A/B      -0.44 (0.44)           0.41 (0.49)           0.89 (0.84 – 0.94)
Experiment 2: A/nonA   -0.65 (0.36)           -0.51 (0.43)          0.60 (0.52 – 0.68)

Table 4: Scores for both CIT and aIAT of Experiment 3. The scores are reported as the mean per group with their standard deviations. In the CIT d-scores a larger score means a more different reaction time on the probes compared to the irrelevant items. In the aIAT D-scores a negative score means the participants are classified as being innocent, while a positive score means the participants are classified as being guilty.

              Naïve innocent   Informed innocent   Guilty

d-score CIT   -0.01 (0.24)     0.15 (0.27)         0.23 (0.32)


Appendix A: Screenshots for the knowledge groups of Experiment 3

Naïve innocent:
