Development of a survey-based methodology for measuring the understandability of forensic reports in Finland - a pilot study

Academic year: 2021

Share "Development of a survey-based methodology for measuring the understandability of forensic reports in Finland - a pilot study"

Copied!
37
0
0

Bezig met laden.... (Bekijk nu de volledige tekst)

Hele tekst

(1)

Title page

Title: Development of a survey-based methodology for measuring the understandability of forensic reports in Finland - a pilot study

Name: Verna Kulomaa ID: 12402230

EC: 36

Duration of research project: 27.1.2020-31.7.2020 Degree: MSc in Forensic Science

University: University of Amsterdam Supervisor: MSc Tuomas Salonen Examiner: Prof. dr. Marjan Sjerps

Name of research institute: National Bureau of Investigation Forensic Laboratory Finland Date of Submission: 14.8.2020


Abstract

A pilot study of a survey-based methodology for measuring the understandability of forensic reports from the National Bureau of Investigation Forensic Laboratory of Finland was completed with promising results. As the goal of the study was to measure the understandability of the reports themselves rather than a single aspect of the reports (such as weak versus strong conclusions), variability was intentionally increased by varying the type of findings (DNA, handwriting), the conclusion style (LR based, currently existing styles), the strength of the findings (weak, moderate, strong), the order of appearance, and the participants' occupations (prosecutors (n=17), lead and tactical investigators (n=8), technical investigators (n=7)). The main section of the survey included two types of questions: Likert-scale questions and true-false questions. Likert-scale questions were used to measure the belief-change resulting from reading a report by subtracting the participant's prior belief from their posterior belief. True-false questions were used to measure correct understanding of the report. Together these were used to calculate the understandability indices, which were used to assess the differences between conditions and to calculate success rates. The reliability of the survey was assessed with Cronbach's alpha. The overall alpha for the true-false questions was α = 0.65; however, lower alpha values were observed for the true-false questions concerning the handwriting report. Misunderstandings seem to occur for a variety of reasons, for example, because of the strength of the findings, the type of findings, and the participant's expectations. For future development of the survey, it is recommended to add another measurement system besides the Likert scale to measure the belief-change, and to supplement the true-false questions with inverse questions.
The knowledge from the pilot study gives direct pointers for the development of a survey-based tool for measuring report understandability in Finland and should provide a solid foundation for the next survey.

Key words: Survey-based, Understandability, Likelihood Ratio, Evaluative Reporting, Strength of Evidence, Evidence interpretation


Table of contents

1. Introduction ... 1

2. Materials & Methods ... 4

2.1. Materials ... 4

2.1.1. Design of the fictional reports ... 4

2.1.2. Design of the likelihood ratio scale ... 5

2.2. Methods ... 6

2.2.1. Participants ... 6

2.2.2. Technical implementation of the survey ... 7

2.2.3. Design of survey ... 8

2.2.4. Data processing ... 10

3. Results ... 12

3.1. Reliability assessment of the survey ... 12

3.2. The understandability indices ... 12

3.3. A closer look at the answers ... 14

3.3.1. A closer look at the answers: belief-change ... 14

3.3.2. A closer look at the answers: true-false questions before the conclusion ... 14

3.3.3. A closer look at the answers: true-false questions after the conclusion ... 15

3.3.4. A closer look at the answers: warm-up questions ... 15

3.3.5. A closer look at the answers: comments from the participants ... 16

4. Discussion ... 17

5. Conclusion ... 21

Acknowledgements ... 22

Bibliography ... 23

List of figures ... 25

List of tables ... 25

Appendices ... 26

Appendix 1. Current style DNA forensic report moderate strength conclusion ... 26

Appendix 2. Current style handwriting forensic report moderate strength conclusion ... 28

Appendix 3. LR style DNA forensic report moderate strength conclusion ... 30

Appendix 4. LR style handwriting forensic report moderate strength conclusion ... 32

Appendix 5. The effect of the strength of the findings on the success rate in LR and current style in DNA and handwriting ... 34


1. Introduction

Forensic science is a science whose sole purpose is to aid the legal system. As such, it is critical that evidence provided by forensic science is reported accurately and in an understandable way. This poses a challenge especially because, as Bali et al. state, forensic scientists must communicate complex scientific principles to legal decision-makers who might not have had scientific training (1). One way to communicate scientific results to legal decision-makers is through evaluative forensic reports. An evaluative report provides an assessment of the strength of the findings in the context of alleged circumstances (2). One way to communicate this strength is through the concept of the likelihood ratio (LR), which is the ratio of the probabilities of the same observation conditioned upon two mutually exclusive events (3). The European Network of Forensic Science Institutes (ENFSI) recommends using the LR for evaluative reports, as it is considered a logically sound approach (2). Despite these recommendations, there has long been a debate on whether the LR should be implemented in reporting (4), (5), (6), (7) and, if so, how it should be utilized (8), (9), (10).
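In symbols, if E denotes the observations and H1 and H2 the two mutually exclusive propositions, the likelihood ratio just described can be written as:

```latex
\mathrm{LR} = \frac{P(E \mid H_1)}{P(E \mid H_2)}
```

An LR above 1 indicates that the observations are more probable under H1 than under H2, and an LR below 1 indicates the reverse.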

The study by Bali et al. reveals that, despite all the discussions and recommendations to use logically sound reporting methods, categorical conclusions (e.g. match, no match, inconclusive) still tend to be the most used in forensic institutes (1). Hence, there is a discrepancy between the literature and what is used in practice. Nevertheless, some countries, such as Sweden (11) and the Netherlands (12), have already adopted the use of the LR. Currently in the National Bureau of Investigation Forensic Laboratory (NBI-FL) in Finland, while the LR is used when reporting on multiple donor (≥2) deoxyribonucleic acid (DNA) profiles, it is not used in other instances. For example, single source DNA results are reported as a random match probability (RMP), e.g. “A DNA-profile matching Ms. X’s DNA-profile was found in the sample. The probability of such a match to another, unrelated, person in the Finnish population is estimated to be less than 1 / 1 000 000”, and handwriting results are reported as a source probability, e.g. “It is extremely probable that the suspect wrote the signature”.

Due to the controversy surrounding LRs, studies have been conducted on how forensic information is understood when it is presented through the LR, either in different conclusion styles (numerical LR, verbal LR, or visual representation of the LR), against other presentation methods such as the RMP, or between different professional groups. Surveys are an often-used tool to gather knowledge on how reports are understood (12), (13), (14), (15), (16), (17), (18).

The recently published paper by Van Straalen et al. discusses how the presentation style of the findings affects the interpretation (13). They found that criminal justice professionals (judges, forensic scientists, and lawyers) (n=269) best understood weak findings when they were presented as categorical conclusions and strong findings when they were presented as a numerical LR. In their research they compared alleged comprehension to actual comprehension, with participants reporting high comprehension scoring lowest in actual comprehension and vice versa.

Another study in the Netherlands, by De Keijser & Elffers, also used surveys to study the understanding of forensic reports by criminal justice professionals (n=285) (12). In their study, the focus was on the differences in understanding between forensic experts, judges, and defence lawyers. De Keijser and Elffers found that understanding of the forensic reports which used the LR in their conclusions was poor. They identify the prosecutor’s fallacy, in which the conditional probability is transposed, as the main reason for the low understanding scores. They also found that forensic experts often mistook the prosecutor’s fallacy for a correct interpretation.

A paper by Martire et al. investigates the effect on interpretation of LR when it is presented in different formats for lay people (n=404) in the United States (14). Martire et al. found that weak conclusions are


understood poorly, and that the weak evidence effect took place mostly when the LR was presented in a verbal format. The weak evidence effect occurs when weak support for one proposition is wrongly interpreted as supporting the alternative proposition. Numerical presentation of the LR produced belief-changes which aligned the most with the forensic expert’s belief-change; the belief-change is the shift from the prior belief to the posterior belief after observing the LR. In their conclusion they recommend using numerical LRs to report conclusions and question whether the verbal LR should be used at all.

Another study that included American lay people (n=541) was conducted by Thompson & Newman (15). They used a survey to study how different presentation methods of forensic findings in DNA and shoeprint reports, such as the RMP, the LR, and verbal equivalents of the LR, would affect how the evidence was understood by lay people. They found that verdicts of guilt were sensitive to the strength of the evidence when DNA evidence was in question. However, for shoeprint evidence, the verdict was only sensitive to the strength of the evidence when the RMP was used. They concluded that fallacious interpretation of the conclusions was common and associated with the weight that the participant had given to the evidence. For example, participants who had estimated a high probability of guilt were more susceptible to the source probability error, which occurs when a low RMP is interpreted as a high probability that items have a common source. The results of their study indicate that perceptions of forensic science evidence are influenced by expectations associated with different types of evidence, by prior beliefs, and by expert testimony. They also suggest that the best way to report on forensic findings might vary across disciplines.

Already from these studies it can be observed that there are different conclusions on how to report forensic findings most efficiently. However, it needs to be kept in mind that the surveys are different and that they have used participants from different countries. Often a survey created in one country will not be directly applicable in another one.

The NBI-FL is interested in following the ENFSI’s recommendation to use the LR in evaluative reporting; however, it is not known which way of presenting the LR will be optimal for Finland. Additionally, there is limited knowledge of the degree to which the current style reports are understood by lawyers, judges, and investigators, and of how those would compare against LR based reports. The above-mentioned issues need to be investigated to assess the applicability of LR based reporting in Finland.

This study seeks to help gain insight into this issue as part of the LYSTI¹ project initiated in the NBI-FL. The aim of this specific project is to create a survey-based methodology to measure the understandability of forensic reports in Finland. To the author’s knowledge, this type of study on the understandability of forensic reports has not been done previously in Finland.

Understandability is defined here as two separate concepts: the understanding of the strength that is to be attached to the findings, and the understanding of the intent of the report. In a forensic context, it is crucial that the strength of the findings is understood correctly, as subsequent legal decisions can depend on the reports. The intent of the report can be misunderstood in many ways. Often the intent of a forensic evaluative report is to help answer a very specific question regarding some event. Misunderstandings of the intent include extending the results to another scenario, ignoring alternative explanations, ignoring uncertainty, the weak evidence effect, transposing the conditional, and the defence fallacy.

To investigate the quantification of understandability as defined, a pilot survey was constructed. Two forensic fields were chosen for this survey, namely DNA and handwriting. These fields were chosen because the DNA and handwriting experts in the NBI-FL are already familiar with LR style conclusions and could aid in the construction of fictional forensic evaluative reports. Another reason why these fields were chosen is

¹ Lausuntojen Ymmärrettävyyden Selvittäminen Tilastollisesti (Investigating the Understandability of Statements Statistically)


because the current style report conclusions are quite different for DNA and handwriting, and there is interest in seeing whether the understanding of the reports would be affected by said differences. Fictional reports were created for DNA and handwriting with LR style conclusions and with current style conclusions. Reports were simplified and consisted of one page of report and one page of appendix, which gives additional information on how to interpret the conclusion. The appendices were the same for the DNA and handwriting LR style reports. However, with the current style reports the appendices were different, as each field has its own appendices and the existing ones were used for the fictional reports. Conclusions were given in three different strengths to associate with the findings, which were the same for DNA and handwriting in both LR and current style reports. Hence, in LR terms the strengths were 10, 100 and 100 000, and the equivalents of these were reported in the current style reports. This structure was chosen for this pilot study in order to explore the possibilities for quantification of the understandability of forensic reports. Cronbach’s alpha (19) was used to assess the suitability of the questionnaire to measure the intended latent variables, these being the understanding of the strength to attach to the findings and the understanding of the intent of the report. To assess the understandability of the reports, an index of understanding was created for this study. The correct answers were predetermined for each question, and the index illustrates how close to the correct answers the participants were.
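As a point of reference for the reliability analysis mentioned above, Cronbach's alpha can be computed from a participants-by-items score matrix as in the minimal sketch below. This is an illustrative implementation of the standard formula, not the actual analysis code used in the study; all names are made up for the example.

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a set of test items.

    scores: list of rows, one per participant, each a list of item scores.
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])  # number of items
    n = len(scores)     # number of participants

    def var(xs):
        # sample variance with n - 1 in the denominator
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [var([row[i] for row in scores]) for i in range(k)]
    total_var = var([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

When all items move together perfectly across participants the alpha is 1; when the items are unrelated it falls toward (or below) 0, which is why a low alpha for the handwriting true-false items signals weaker internal consistency.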

Following this introduction, the materials and methods are described. The fictional reports that were created for this study are described in the materials section, and the survey in which the fictional reports were used is described in the methods section. This is followed by the results, where the survey results are assessed with Cronbach’s alpha, the index of understanding, and individual questions. In the discussion, the survey results are compared to other similar research and recommendations are provided for the future. Finally, in the conclusion the observations from the study are summarized.


2. Materials & Methods

In this chapter, the materials and methods that were created for the study, and how they were used, are described. As there is no globally standardized guideline for LR reporting and the NBI-FL does not have a guideline for LR reporting, an LR reporting structure was formulated for this study together with the handwriting and DNA experts. Moreover, as all the reports used in this study were required to be fictional while also reflecting the reports currently used in the NBI-FL, they too were created for this study together with the experts. Important aspects, namely the survey design, sample selection, and number of participants, are discussed in the methods section.

2.1. Materials

2.1.1. Design of the fictional reports

For the purposes of the study, it was necessary to produce fictional reports which would be precise and informative. The fictional reports were formulated to resemble real ones as closely as possible while omitting purely technical information. The forensic reports were constructed together with the DNA and handwriting experts from NBI-FL through in-person and online workshops. These experts were already familiar with LR style forensic reporting, and thus were able to assist with choosing vocabulary for expressing the LR style report in Finnish.

Currently in NBI-FL DNA conclusions are reported with RMP and handwriting conclusions with source probability. A typical DNA conclusion could be formulated as “A DNA-profile matching Ms. X’s DNA-profile was found in the sample. The probability of such a match to another, unrelated, person in the Finnish population is estimated to be less than 1 / 1 000 000” whereas a typical handwriting conclusion could be formulated as “It is extremely probable that the suspect wrote the signature”.

Both of the current style fictional reports consisted of a reference number header, the request from the police, a sample list and sample description, analysis results, conclusions, a disclaimer, and an appendix. However, the presentations of their conclusions were different, as well as the content of their appendices. The conclusions differed in the way explained above. For handwriting, the appendix consisted of a conclusion scale explaining the strength which is to be associated with the conclusion. For DNA, the appendix consisted of an explanation of the terms used in the report. A fictional DNA report written in the current style can be found in Appendix 1 and a fictional handwriting report written in the current style can be found in Appendix 2. The LR based fictional report structure was based on the current versions. However, the LR report also included sections called “propositions” and “likelihood ratio”. Additionally, the disclaimers and the conclusion were modified accordingly, and a likelihood ratio scale was added to the appendix.

The likelihood ratio section of the report presented the probabilities of observing the findings under each proposition and then the resulting ratio. This was done to show clearly where the resulting LR comes from. The LR section from the fictional DNA report can be translated as follows: “Based on the established DNA STR frequencies in the Finnish population, the probability of observing these findings if H1 is true is 1. Likewise, the probability of observing these findings if H2 is true is 1/100 000. Therefore, the likelihood ratio of 100 000 is obtained as a ratio of the above probabilities. The likelihood ratio scale can be found in the appendix of this report.”
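The arithmetic in the quoted LR section is simply the ratio of the two conditional probabilities. A small sketch using the numbers from the fictional DNA report (variable names are illustrative):

```python
# Probabilities of observing the findings under each proposition,
# as stated in the fictional DNA report.
p_findings_given_h1 = 1.0          # P(findings | H1): the suspect is the source
p_findings_given_h2 = 1 / 100_000  # P(findings | H2): another unrelated person is the source

# The likelihood ratio is the ratio of the two probabilities.
lr = p_findings_given_h1 / p_findings_given_h2

print(round(lr))  # 100000
```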

Following the LR section came the conclusion, which was presented as the verbal equivalent of the LR. It can be translated as follows:


The probabilities determined for the propositions can be influenced by other factors that have risen during the investigation. However, without the information of these factors their effect cannot be estimated by the forensic laboratory. Likelihood ratio scale can be found in the appendix of this report.”

The conclusion of the report was phrased corresponding to the LR scale. A disclaimer was also added stating that if there is information related to the scenario that the Forensic Laboratory does not know about, it cannot be considered in the analysis.

For all the LR reports, an “additional information” section was added after the conclusion. It contained an explanation of where the data used to assign the probabilities of the observations under both events comes from. The section also contained an explanation of what the LR means and how it can be used. A fictional DNA LR style report can be found in Appendix 3 and a fictional handwriting LR style report can be found in Appendix 4.

2.1.2. Design of the likelihood ratio scale

All the LR reports, regardless of sample type, had the same appendix, which contained the likelihood ratio scale. The LR scale was inspired by the scale that is described extensively by Nordgaard et al. (11), while still taking inspiration from the ENFSI guidelines (2). As there are many ways in which the LR could be presented, inspiration was taken from research and psychological studies on how probabilistic information is processed. Research has shown that when receiving probabilistic information people prefer numbers, whereas when expressing probabilistic information verbal expressions are preferred (20); thus numerical information was presented first, followed by the verbal equivalent of the likelihood ratio. They were presented in this order because receiving numerical information before verbal information leads to more accurate interpretations (20). Fractions were used in the scale as they are comprehended better than decimals (21). The verbal equivalent was always presented in the same direction, i.e. the relation of H1 to H2. This differs from the ENFSI guideline; however, it was done to be consistent and not to require the reader to recognise the change of direction which is commonly made when the findings provide support for H2. The wording differs as well, because the expressions are translated from Finnish. In Finnish, the scale was formulated in a manner which highlights the relationship between H1 and H2. The LR scale contained a visual aid to indicate which LR values support which proposition. The translated (from Finnish to English) LR scale can be seen in Fig. 1.


Figure 1 Likelihood ratio scale, which was used as an appendix for LR reports. This scale was originally written in Finnish but was freely translated to English for illustration.

2.2. Methods

2.2.1. Participants

The participants of the survey were selected from among prosecutors, lead investigators, technical investigators, and tactical investigators. All of these professionals utilize forensic reports in their work. Tactical investigators and lead investigators were grouped together as they use the reports similarly in their work. The participants’ respective organizations were contacted prior to the questionnaire to request the email addresses of those willing to participate. Once the questionnaire was ready to be sent, an email with an individual participation link was sent to 17 technical investigators, 21 prosecutors and 11 tactical investigators/lead investigators. Of these groups, 7 technical investigators, 17 prosecutors and 8 tactical investigators/lead investigators participated in the survey. Of the 32 participants that replied, 59% reported identifying as female and 41% as male. The average age of the participants was 43 (SD=10). Table 1 presents the demographic information of the participants in each group.


Table 1 Demographic information of the participants.

Group     Occupation: Prosecutor (n=17) / Lead & tactical investigator (n=8) / Technical investigator (n=7)*     Gender: Woman / Man     Age (SD)
Group 1   67% / 11% / 22%     67% / 33%     46 (9)
Group 2   57% / 29% / 14%     57% / 43%     42 (10)
Group 3   43% / 43% / 14%     43% / 57%     44 (12)
Group 4   44% / 44% / 11%     43% / 57%     40 (9)
Total     53% / 25% / 22%     59% / 41%     43 (10)

* One participant reported their occupation as ‘other’, however their line of work aligns with the technical investigator and thus they were considered a part of the technical investigator group during data processing.

2.2.2. Technical implementation of the survey

The fictional reports were uploaded to the survey administration website Webropol. An email elaborating on the context and purpose of the survey was written on the website. As randomization of the sections within a questionnaire was not possible with Webropol, the questionnaire was divided into four questionnaires, where a participant would receive DNA reports in either LR or current style and handwriting reports in either LR or current style. The style of report was always different between the two fields; for example, if a participant received DNA reports in LR style, they would receive handwriting reports in the current style. This manual randomization was done to minimize the effect of the order in which a participant receives the information. The participants were randomly assigned to each order; however, they were chosen so that, based on occupation, the participants would be divided equally across the conditions. The orders for each group can be seen in Fig. 2.

Figure 2 Report type and order variation for each participant group.

The email contained an explanation of how the data would be handled and a link to the survey. Participants were requested to fill in the questionnaire within two weeks. When the survey was opened through the provided link, a welcome page explained the structure of the survey and how the questions would be formatted. When answering the questionnaire, the participant could not proceed to the next page until all questions were answered; however, the participant could move back in the survey if they wanted. Midway through the questionnaire, the participants were given an opportunity to save their work and continue later if they wished to do so.

[Figure 2 content: Group 1 received handwriting reports in the current style followed by DNA reports in LR style; Group 2, DNA current followed by handwriting LR; Group 3, DNA LR followed by handwriting current; Group 4, handwriting LR followed by DNA current.]


Participants who had not answered the questionnaire by the requested date were sent a reminder email with a request to complete the questionnaire within five days. In the end, the response rate was 65%. Data was downloaded from Webropol as four Excel files, in which the data had already been anonymized by Webropol.

2.2.3. Design of survey

First, the questionnaire included a welcome page where the structure of the survey and how to answer the questions were explained. Before the main body of the questionnaire, the participants were asked to answer three warm-up questions using the Likert scale, so that they would get an understanding of how to use it in the main part of the questionnaire. The warm-up questions were about basic probabilities, such as “What is the probability of throwing tails when a fair coin is tossed one time?”.

Next, the participant received a short case description which was meant to be neither incriminating nor vindicating. For example, a participant receiving a DNA case description would read the following: “There has been a pharmacy burglary during the night and at least 200 tablets of drugs that can be considered narcotics have been taken. The police have secured a DNA swab from the surface of a drawer and sent it to the NBI-FL, where DNA was found in the sample. The police have apprehended Essi Esimerkki in connection with another case and have asked the NBI-FL to compare the DNA profiles. The NBI-FL has made a forensic report about those DNA samples.” The case description was different for handwriting because the same participant answered both the DNA and the handwriting questions; the results from the previous description could affect the prior belief of the participant if they had already read a forensic conclusion regarding the same case and suspect. After this, the participant could read the beginning of the report, which, if in the current style, contained a header, the request from the police, and a sample list. If the report was in the LR style, it contained a header, the request from the police, a sample list, and the propositions. After reading the case description and the beginning of the report, the participant was asked to select on a seven-step Likert scale the degree to which they believed the suspect to be the source of the sample collected from the crime scene. The participant was then presented with three statements about the intent of the report. These statements can be seen in Table 2. Participants were asked to choose what they thought to be the correct answer from “true”, “false” or “I don’t know” for each statement.

Figure 3 A flowchart demonstrating the distribution of questions prior versus after seeing a conclusion.

Subsequently, the participant was asked to read a full report. Again, they were asked to select on the Likert scale the degree to which they believed the suspect to be the source of the sample. After this they received five statements regarding the reasoning that can be made from the report and were asked to select what they thought to be the correct answer from “true”, “false” or “I don’t know”. This same pattern was repeated two more times for the same type of findings, but the strength of the conclusion was different each time. The distribution of the questions prior to seeing a conclusion and after seeing a conclusion can be seen in Fig. 3.

[Figure 3 content: for each of the 2 cases and 2 types of findings, prior to seeing a conclusion the participant saw the beginning of a report with 1 Likert-scale question and 3 true-false statements; after seeing a conclusion, each of the three full reports was followed by 1 Likert-scale question and 5 true-false statements.]


Next, the participant was given the option to pause the questionnaire and resume it later if they wished to do so. After this they would continue to the next section, where another case description and the beginning of another report were shown. The same questions were asked as previously; however, they were worded differently to fit the scene. The rest of the questionnaire was structured in the same way as for the other type of findings. After the main body of the questionnaire, the participant was asked to answer questions about age, gender, and occupation. They were also given an option to comment on the questionnaire if they wished to do so. A visualization of the structure of the questionnaire can be seen below in Fig. 4.

Likert-scale and multiple-choice questions were used in this survey. The Likert scale had an odd number of steps, as a “neutral” point was required (22). By neutral it is meant that there needs to be a point on the Likert scale at which it is equally probable that the suspect left the trace as that the suspect did not leave the trace. There are differences of opinion about which number of steps is optimal (22), but a 7-step scale was chosen so that it would represent the posterior odds logarithmically and would correspond to the LR scale if neutral prior odds were chosen. The mid-point was intended to be neither incriminating nor vindicating, and each step increases in a logarithmic manner: step 5 corresponds to posterior odds of 10–100, step 6 to 100–10 000, and step 7 to 10 000–1 000 000. The steps from 3 down to 1 were the inverses of steps 5 to 7. Likert-scale questions were presented prior to the conclusions to obtain the prior probability of a participant, and directly after each conclusion to assess the belief-change.
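The logarithmic step-to-odds mapping and the belief-change calculation described above can be sketched as follows; the dictionary simply restates the bands from the text, and all names are illustrative rather than taken from the study's analysis code:

```python
# Odds bands for the 7-step Likert scale; step 4 is the neutral point
# (odds of 1), and steps 1-3 are the inverses of steps 7-5.
ODDS_BANDS = {
    7: (10_000, 1_000_000),
    6: (100, 10_000),
    5: (10, 100),
    4: (1, 1),
    3: (1 / 100, 1 / 10),
    2: (1 / 10_000, 1 / 100),
    1: (1 / 1_000_000, 1 / 10_000),
}

def belief_change(prior_step: int, posterior_step: int) -> int:
    """Belief-change: the participant's prior Likert step subtracted
    from their posterior step, as described in the survey design."""
    return posterior_step - prior_step

print(belief_change(4, 6))  # 2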

All the multiple-choice questions had the options “true”, “false” and “I don’t know”. The “I don’t know” option was added so that participants would not be required to randomly choose an answer if they did not know the correct one. However, when the data was processed, “I don’t know” was treated as an incorrect answer. For these multiple-choice questions, the participant was presented with a statement and asked to select which option they thought to be the correct answer. The statements shown in the DNA sections of the report can be seen in Table 2. The statements were similar in the handwriting sections, but were worded to fit the scene and the type of findings.
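The scoring rule described above (an answer counts as correct only when it matches the predetermined key, so “I don't know” always scores as incorrect) can be sketched as below; function and variable names are illustrative:

```python
def score_true_false(answers, key):
    """Score true-false answers against the predetermined answer key.

    "I don't know" never matches the key, so it scores 0,
    as in the data processing described in the text.
    """
    return [1 if answer == correct else 0 for answer, correct in zip(answers, key)]

# Example: the second answer is "I don't know" and scores as incorrect.
scores = score_true_false(["true", "I don't know", "false"],
                          ["true", "false", "false"])
print(scores)  # [1, 0, 1]
```

Per-condition success rates then follow from averaging such scores over participants and statements.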

As the topic of forensic conclusions is commonly regarded as challenging, especially when statistical information is included, all the questions were designed to be as straightforward as possible. Traditionally,

Figure 4 Visualization of the questionnaire structure. The example shown is the questionnaire answered by group 1. HW stands for handwriting.

[Figure 4 content: welcome page → warm-up questions → HW case description and beginning of current style report → 1 Likert-scale question and 3 true-false statements → 3 × HW full report in current style, with 1 Likert-scale question and 5 true-false statements after each report → option to pause the questionnaire and resume later → DNA case description and beginning of LR style report → 1 Likert-scale question and 3 true-false statements → 3 × DNA full report in LR style, with 1 Likert-scale question and 5 true-false statements after each report → demographic questions and a comment box.]

(13)

10

when the prosecutor’s fallacy or the defence fallacy is tested for, the mathematics behind the fallacious reasoning is shown. In this study, however, statements with fallacious reasoning were formulated verbally rather than numerically, because the goal was to create simple statements. The statements used in the questionnaire are given in Table 2. The statements were essentially the same for handwriting, but they were worded to fit the scenario and the handwriting report. The statements after the conclusion were presented in the same order throughout all questionnaires and sections.

Table 2 True-false statements used in the questionnaire. These statements have been freely translated from Finnish to English.

| Statement | True | False | Prior/after the conclusion |
|---|---|---|---|
| The report indicates whether the suspect has robbed the pharmacy. | | X | Prior |
| The report indicates whether the DNA could be originating from the suspect. | X | | Prior |
| The report indicates whether the suspect has opened the drawer in the pharmacy. | | X | Prior |
| It is possible to get these results even though the suspect would not be the origin of the sample. | X | | After |
| The conclusion shows that it is extremely improbable that the DNA could originate from anybody else except the suspect. | | X | After |
| The results from the report prove that the suspect was at the crime scene. | | X | After |
| The observations support the proposition that the DNA originates from the suspect rather than the alternative proposition that the DNA originates from another unrelated person. | X | | After |
| According to the conclusion, there is some possibility that the DNA could originate from another person than the suspect. Thus, the report does not aid to answer the question whether the DNA originates from the suspect. | | X | After |

The survey design was a mixture of between-subject and within-subject design. This was done because the goal was to assess the understandability of the report itself rather than other differences, such as the effect of different occupations. The aim was to increase variability through effects that are known to change the perception of information, such as the order effect, anchoring, and prior beliefs.

The aspects that were varied included the type of findings (DNA and handwriting), the strength of the results (weak, moderate, strong), occupation (prosecutors, lead and tactical investigators, technical investigators), and reporting style (current and LR). The order and type of report varied between participants, as can be seen in Fig. 2. The strengths of the conclusions were always presented in the same order for each type of finding: for DNA, moderate, weak, strong; for handwriting, moderate, strong, weak. Different orders were chosen for DNA and handwriting to avoid anchoring the answers to the order.

2.2.4. Data processing

The answers to the DNA and handwriting questions were used to calculate an index representing how well the participant understood the report. Both the Likert-scale and the true-false answers were used to calculate the index. Student’s t-test was used to assess whether the differences in the resulting indices were statistically significant, with an alpha of 0.05.

The understandability index consists of how close the participant’s belief-change was to the intended change and how many of the true-false statements they answered correctly. The belief-change was calculated by subtracting the prior belief from the posterior belief, and for the index it was calculated how far each belief-change was from the intended change. Over all three conclusion strengths, the furthest possible total distance from the correct answers is 24: the most incorrect set of answers would be a prior belief of seven followed by posterior beliefs of one for all three conclusions. Subtracting the correct belief-change from the observed one shows how far from the correct answer the belief-change was; the furthest case is given by ((1-7)-1)+((1-7)-2)+((1-7)-3)=-24. To calculate the success rate of the belief-change, the closeness to the correct answers was divided by 24.

True-false answers were recoded so that incorrect and correct answers were replaced with 0 and 1, respectively. In total, each participant answered 36 true-false questions in the survey, but the DNA and handwriting data were processed separately, so for each section the maximum number of correct answers was 18. The success rate for the true-false questions was calculated by dividing the number of correct answers by 18.
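The scoring described above can be sketched as follows. The participant data and function names are hypothetical; only the scoring rules (belief-change component with a maximum distance of 24, plus the count of correct true-false answers) come from the text:

```python
# Minimal sketch of the understandability index for one evidence type.
def belief_change_score(prior, posteriors, intended_changes):
    """Distance-based score: 24 minus the total distance of the observed
    belief-changes from the intended changes (the furthest possible
    total distance, per the text, is 24)."""
    distance = sum(abs((post - prior) - intended)
                   for post, intended in zip(posteriors, intended_changes))
    return 24 - distance

def true_false_score(answers, key):
    """'I don't know' is treated as incorrect, so only exact matches count."""
    return sum(a == k for a, k in zip(answers, key))

# Hypothetical participant: neutral prior of 4, posteriors for the three
# conclusions, and intended belief-changes of +2, +1, +3 (illustrative values).
belief = belief_change_score(prior=4, posteriors=[6, 5, 7],
                             intended_changes=[2, 1, 3])
tf = true_false_score(answers=["T", "F", "T"], key=["T", "F", "F"])
print(belief, tf)      # belief component 24, true-false component 2
print(belief + tf)     # combined index for this (toy) participant
```

In the actual survey the true-false component would run over 18 statements per evidence type, giving the stated per-type maximum of 42 (24 + 18).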

Cronbach’s alpha (19) was used to assess the internal consistency of the test items. Internal consistency indicates whether the questions reliably measure the intended latent variables. In this case, the Likert-scale questions were intended to measure the understanding of the strength of the findings, whereas the true-false questions were intended to measure the understanding of the intent of the report. Following Cortina (19), 0.70 is considered acceptable here, and the other thresholds are derived from that value: the alpha is considered minimally acceptable between 0.65 and 0.70, undesirable between 0.60 and 0.65, and unacceptable below 0.60.
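Cronbach’s alpha can be computed from an answer matrix with the standard formula α = k/(k−1) · (1 − Σσᵢ²/σₜ²), where σᵢ² are the item variances and σₜ² is the variance of the total scores. A minimal sketch with made-up data (population variance is used here; some texts use the sample variance instead):

```python
import statistics

def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix: rows = participants,
    columns = items (here 1 = correct, 0 = incorrect)."""
    k = len(scores[0])                                  # number of items
    item_vars = [statistics.pvariance([row[i] for row in scores])
                 for i in range(k)]
    total_var = statistics.pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical answer matrix: 4 participants x 3 items.
data = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [0, 1, 0],
]
print(round(cronbach_alpha(data), 2))  # 0.75 for this toy matrix
```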


3. Results

3.1. Reliability assessment of the survey

For this calculation, the Likert-scale results for DNA and handwriting were grouped together for each participant, as they are intended to measure the same latent variable, i.e. the understanding of the strength that is to be attached to the findings. The calculated Cronbach’s alpha was α=0.77, which is considered acceptable.

Cronbach’s alpha was also calculated for the true-false questions under the different conditions — DNA reports, handwriting reports, LR style reports, current style reports, DNA LR reports, DNA current reports, handwriting LR reports, and handwriting current reports — as well as for all true-false questions combined. There were 18 true-false questions per evidence type for each participant. The results can be found in Table 3 below.

Table 3 All Cronbach’s alpha results; the table includes the number of items and the number of participants in each condition.

| Questions considered | α | No. of items | No. of participants |
|---|---|---|---|
| All Likert-scale questions | 0.77 | 6 | 32 |
| Handwriting, current true-false questions | 0.49 | 18 | 16 |
| Handwriting, LR true-false questions | 0.44 | 18 | 16 |
| DNA, current true-false questions | 0.60 | 18 | 16 |
| DNA, LR true-false questions | 0.72 | 18 | 16 |
| Handwriting, all true-false questions | 0.37 | 18 | 32 |
| DNA, all true-false questions | 0.67 | 18 | 32 |
| LR, all true-false questions | 0.57 | 18 | 32 |
| Current, all true-false questions | 0.54 | 18 | 32 |
| All true-false questions | 0.65 | 36 | 32 |

For the Likert-scale questions and the DNA LR true-false questions, the alpha exceeds 0.70, which is acceptable. This shows that the answers are internally consistent for the Likert-scale questions and for DNA conclusions presented with an LR. When all DNA answers are combined the alpha is 0.67, and when all true-false answers are combined it is 0.65; both are minimally acceptable. The alpha for DNA current is 0.60, which can be considered undesirable, and the remaining results fall below that, which can be considered unacceptable.

3.2. The understandability indices

The understandability indices were calculated as described above. The maximum score for answering all questions in the survey correctly would be 84; for each type of finding (DNA, handwriting) the maximum score would be 42. The data are presented below for the different variables. Comparing the indices in Table 4 for all LR style reports (M=33.34, SD=4.69) and all current style reports (M=32.38, SD=4.46) shows no statistically significant difference, t(31)=0.86, p=0.39. Some differences could be observed with other variables, as described below.
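The reported t(31) for 32 participants is consistent with a paired t-test on each participant’s two indices (every participant saw both styles). A minimal sketch with invented data, assuming such a paired design:

```python
import math
import statistics

def paired_t(x, y):
    """Paired (dependent-samples) t statistic and degrees of freedom."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical per-participant indices (6 toy participants, not study data).
lr_index      = [34, 30, 38, 33, 29, 36]
current_index = [33, 31, 35, 32, 30, 34]
t_stat, df = paired_t(lr_index, current_index)
print(round(t_stat, 2), df)  # t statistic and df = n - 1
```

With the study’s 32 participants this yields df = 31, matching the reported t(31); the p-value would then be read from the t distribution (e.g. via `scipy.stats`).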


Table 4 Index compared between LR style reports and current style reports. Mean, standard deviation, median, and standard error of the mean are given.

| | LR | Current |
|---|---|---|
| Mean | 33.34 | 32.38 |
| SD | 4.69 | 4.46 |
| Median | 34.00 | 33.50 |
| SE | 0.83 | 0.79 |

Overall, the handwriting reports, regardless of presentation style, have higher indices than the DNA reports; these indices can be seen in Table 5. However, the difference between the DNA LR (M=33.13, SD=6.08) and handwriting LR (M=33.56, SD=2.92) indices is not statistically significant, t(22)=0.26, p=0.80, whereas the difference between the DNA current (M=30.50, SD=4.87) and handwriting current (M=34.25, SD=3.15) indices is statistically significant, t(30)=2.59, p=0.01. The mean index is highest for handwriting presented in the current style.

Table 5 Index displayed by type of findings and conclusion style. Mean, standard deviation, median, and standard error of the mean are given.

| | DNA LR | DNA Current | Handwriting LR | Handwriting Current |
|---|---|---|---|---|
| Mean | 33.13 | 30.50 | 33.56 | 34.25 |
| SD | 6.08 | 4.87 | 2.92 | 3.15 |
| Median | 35.00 | 31.00 | 34.00 | 34.50 |
| SE | 1.52 | 1.22 | 0.73 | 0.79 |

The index values for the DNA reports were found to be sensitive to the order in which the DNA report was presented. The indices with the order taken into account can be seen in Table 6. The difference between DNA LR presented first (M=29.57, SD=6.27) and DNA LR presented second (M=35.89, SD=4.48) is statistically significant, t(14)=2.36, p=0.03, whereas the difference between DNA current presented first (M=28.14, SD=4.22) and DNA current presented second (M=32.33, SD=4.74) is not statistically significant, t(14)=1.84, p=0.08. The handwriting mean index stays relatively similar within a presentation style regardless of whether it was presented first or second: the difference between handwriting LR presented first (M=34.22, SD=1.20) and second (M=32.71, SD=4.23) is not statistically significant, t(7)=0.91, p=0.39, and the difference between handwriting current presented first (M=34.33, SD=3.43) and second (M=34.14, SD=3.02) is also not statistically significant, t(14)=0.12, p=0.91.

Table 6 Index displayed by type of findings, conclusion style, and order presented in the questionnaire. Mean, standard deviation, median, and standard error of the mean are given.

| | DNA LR 1st | DNA LR 2nd | DNA Current 1st | DNA Current 2nd | HW LR 1st | HW LR 2nd | HW Current 1st | HW Current 2nd |
|---|---|---|---|---|---|---|---|---|
| Mean | 29.57 | 35.89 | 28.14 | 32.33 | 34.22 | 32.71 | 34.33 | 34.14 |
| SD | 6.27 | 4.48 | 4.22 | 4.74 | 1.20 | 4.23 | 3.43 | 3.02 |
| Median | 31.00 | 37.00 | 29.00 | 33.00 | 34.00 | 32.00 | 35.00 | 34.00 |
| SE | 2.37 | 1.49 | 1.60 | 1.58 | 0.40 | 1.60 | 1.14 | 1.14 |


One reason for the lower index when DNA reports were presented first is that participants set their prior beliefs at the extremes of the Likert-scale. For example, when DNA LR was presented first, three participants reported their prior beliefs at the extremes, but in the same questionnaire only one of them reported an extreme prior for the handwriting scenario.

Upon closer inspection, the highest index values were found where DNA LR conclusions were presented in the second half of the questionnaire. These participants answered the true-false questions consistently well, which raised their average; they achieved the highest proportion of correct answers overall (84%).

Apart from DNA LR presented second, the handwriting reports produced better scores than their DNA equivalents. It should be noted that, although the index is higher overall for the handwriting reports, Cronbach’s alpha is much lower for the handwriting cases than for the DNA cases. This would mean that questions concerning DNA reports are consistently misunderstood, whereas with the handwriting questions the participants contradict themselves more but at the same time answer more questions correctly on average.

3.3. A closer look at the answers

3.3.1. A closer look at the answers: belief-change

Looking at all the belief-change answers for each group, although there are differences between individuals, the averages always show the same general pattern: within each section of the questionnaires, the weak conclusion results in the lowest average belief-change, the moderate conclusion in the middle average, and the strongest conclusion in the highest average.

Most participants set their prior belief to four, which represents an answer that is neither incriminating nor vindicating. However, for the DNA reports, more individuals selected either seven or one as their answer: there were in total nine cases where extreme prior beliefs were chosen for the DNA reports, whereas there was only one such case for the handwriting prior beliefs.

Some curious behavior was noted in the answers of a few participants. For example, one participant answering questions about the DNA report with current style conclusions selected neutral as their prior and seven as their posterior for the moderate strength conclusion, but then selected five for the strong conclusion. Another participant, answering questions about the DNA report with LR style conclusions, selected seven for every Likert-scale question: before receiving any conclusion they were already certain that the sample originated from the suspect, and this did not change even when the strength of the findings changed.

Most of the other participants who chose seven as their prior did change their Likert-scale answers and related them to the strength of the findings, for example moderate=4, weak=3, strong=6. This pattern occurred both when DNA was presented first and when it was presented second.

3.3.2. A closer look at the answers: true-false questions before the conclusion

The false statements were understood well by the majority of the participants. This is desirable, since the ultimate-issue question asked whether the forensic report indicated that the suspect had committed the crime in question, and most participants understood that the forensic report alone cannot answer that question. All the participants who provided an incorrect answer had answered “I do not know”. However, when asked whether the report indicates whether the suspect is the source of the sample found at the crime scene, 45% of the participants answered incorrectly. It is preferable that participants understand what the report is not for, but it is somewhat concerning that almost half of them do not understand what the report is for.


3.3.3. A closer look at the answers: true-false questions after the conclusion

In this section, the same example statements are used as in Table 2. The handwriting statements were essentially the same but were worded differently to fit the scene and the type of findings.

For the first statement, “it is possible to get these results even though the suspect would not be the origin of the sample”, 60% of all answers were correct. When the answers are examined separately for each strength, there does not seem to be any general pattern that would explain the large number of participants who misunderstood the uncertainty of the analysis. When individual answers were inspected, there were occasions where participants answered correctly when the strength of the findings was weak but incorrectly when the strength was moderate or strong. Of all the second statements, 48% were answered correctly. The DNA version of this statement was “the conclusion shows that it is extremely improbable that the DNA could originate from anybody else except the suspect”. The raw data make it obvious that this misunderstanding was connected to the strength of the findings: when the conclusion was strong, participants answered 94% of the statements incorrectly, and when it was weak, 14%.

The third statement was “the results from the report prove that the suspect was at the crime scene”. Of all the third-statement answers, 85% were correct. The mistakes occurred mainly with the DNA findings. This could be explained by the handwriting findings being on a piece of paper: it is easy to understand that paper can be transferred, whereas there might not be as much knowledge about the transfer of DNA, so participants could have thought that the DNA findings prove that the suspect was at the scene. For the fourth statement, 88% of the answers were correct. This statement in the DNA section of the survey was “the observations support the proposition that the DNA originates from the suspect rather than the alternative proposition that the DNA originates from another unrelated person”. This shows that the majority of the participants understood the directionality of the conclusions, i.e. support for H1 over H2. Most of the incorrect answers, 70%, were found after a weak conclusion. This manner of misunderstanding the strength is called the weak evidence effect.

The fifth statement in the DNA section of the survey was “according to the conclusion, there is some possibility that the DNA could originate from another person than the suspect. Thus, the report does not aid to answer the question whether the DNA originates from the suspect”. Of all the fifth statements, 55% were answered correctly. Here, too, a relation to the strength of the conclusion was observed: 61% of the answers after a weak conclusion were incorrect, compared with 28% after a strong conclusion.

Overall, there seems to be incorrect reasoning related to the strength of the conclusions as well as to the type of findings. Of these five statements, two attracted incorrect answers when the conclusions were strong, two others when the conclusions were weak, and one because of the type of findings presented.

3.3.4. A closer look at the answers: warm-up questions

Three questions were shown to the participants to familiarise them with how the Likert-scale was intended to be used. The first question was about throwing a fair coin once; the correct answer is that it is equally likely to land on either side, and all the participants answered this correctly. The second question was about picking a specific card from a deck of 52 cards, where the probability is 1 in 52 and the correct answer was 3 on the Likert-scale. The average answer was 2.06 (SD=0.80). Even though most of the answers did not land on the intended step, this is not a major concern, as the participants had not been given any numerical context to attach to the Likert-scale and the directionality of the answers was correct.


The third warm-up question was about the chance of winning the lottery with an already bought, unchecked ticket. The correct answer is 1 on the Likert-scale, which the majority chose; the average answer was 1.31 (SD=0.78). There were two individuals who considered winning as likely as not winning.

Overall, the participants answered the warm-up questions well, which gives an indication that they understand the basics of probabilistic reasoning.

3.3.5. A closer look at the answers: comments from the participants

Out of the 32 participants, nine left comments at the end of the questionnaire. Six participants stated that some aspects of the questionnaire were difficult to understand. Two participants stated that the questionnaire was clear and precise, and one of them requested a more comprehensive survey for the future. One person mentioned that they always consider forensic evidence in the context of a case, and thus it was difficult to estimate whether the appendix helped in the interpretation of conclusions.

The following things were mentioned as difficult: the age question, the probabilities, the true-false statements, and the questions before the conclusion; the structure of the questionnaire was also found confusing. Lastly, one comment on the LR style conclusions stated that they leave too much open to interpretation and are confusing, that this will devalue the work of technical investigation, and that if this style of report is shown in court, any kind of conclusion could be drawn from it.


4. Discussion

The main goal of this project was to create a measuring tool for NBI-FL to assess the understandability of its reports. Another goal was to create a version of LR style reports and compare them to the current style reports. This section discusses the different assessments of the data and how they relate to previous studies, and includes recommendations for improving the survey and for increasing the understandability of the reports. The reliability of the survey was assessed with Cronbach’s alpha, a measure commonly used in psychological studies; the understandability index helps to examine the differences between the conditions in the survey. The reasons for misunderstandings in some of the individual questions are also assessed.

As mentioned, Cronbach’s alpha was used to assess the reliability of the survey and was calculated for a variety of conditions. There is, however, no universal rule for how Cronbach’s alpha values should be interpreted; this depends heavily on the number of items, the participants, and the type of questions. Regardless, the fact that half of the results lie below 0.60 when the different conditioning variables are taken into account (especially for the handwriting results) indicates that the questions might not be measuring the understandability of the report as intended, and reformulation of the true-false statements should therefore be considered. The Likert-scale questions fared well, but the number of items was much smaller, which can lead to higher alpha values. The Cronbach’s alpha values for the DNA sections were higher than for the handwriting sections, which shows that the true-false questions worked reasonably well in the DNA sections, but for some reason the same questions did not work as well in the handwriting sections. Calculated over all the true-false questions, the alpha is 0.65 (k=36, n=32). For comparison, De Keijser & Elffers’ questionnaire also used true-false statements and resulted in an overall α=0.67 (k=14, n=285), which they determined to be acceptable (12). So, judging only by the overall alpha values, this survey could be concluded to be working as intended. However, a tool for measuring the understandability of forensic reports should be able to measure it across different fields so that comparisons can be made, and with the low alpha values observed under some conditions it cannot be concluded with confidence that the survey is measuring the intended effect.

As Thompson & Newman recommend, different ways of measuring the effect that forensic information has on an individual should be tested, and they recommend using at least two ways of measuring the same effect simultaneously to have more confidence in the findings (15). This could be achieved by adding another way to measure the belief-change and by adding questions that repeat the true-false statements from a different angle. For example, the statement ‘the findings support H1 over H2’ would be stated again later as ‘the findings support H2 over H1’.

Another way to assess the reliability of the survey would be to check the stability of the questionnaire. This could be done by presenting the same questionnaire to the same participants with enough time in between that they do not remember their previous answers. The validity of the questionnaire could be assessed by confirming the conclusions through other means, for example by studying which types of conclusions are drawn from forensic reports in casework and in court, and comparing whether this reflects what was found in this study.

An issue with the true-false questions may be that they are based on concepts from the LR framework. The prosecutor’s fallacy and the defence fallacy are discussed whenever LR-based forensic reporting is discussed, but as these concepts are unfamiliar to most Finnish criminal justice professionals, it could be that participants did not understand the language used in the true-false statements. The questionnaire may thus have measured not the understandability of the reports but rather how skilled the participant was in probabilistic reasoning. Factor analysis could help to look for unintended variables that the survey is measuring; however, as the sample size under each condition in this study is small, it would not lead to a reliable assessment and was therefore not applicable here.


Finding the right questions to measure understandability as intended becomes challenging if the phrasing of common misunderstandings is completely unfamiliar to the participants. This is an important issue to solve, because it is critical to measure whether these common misunderstandings of forensic findings occur, as they affect society through the criminal justice system. With a reliable measuring tool, trends could be recorded, and if the reports are changed, the impact of those changes could be observed. As Dror (23) and Curley et al. (24) state in their papers, forensic science would benefit from working together with experts in cognitive neuroscience, who understand decision making and whose input could be valuable in creating these measurement tools.

While there are potential issues with the questions, it is nevertheless interesting to see how participants answered them, as this provides at least preliminary indications of the understandability of forensic reports. When answering the belief-change questions, participants chose seven as their DNA prior probability more often than for handwriting. This also occurred when DNA was presented second, after they had already answered four for the handwriting prior, so it cannot simply be explained by misunderstanding the Likert-scale. This could indicate that participants have different attitudes towards handwriting and DNA. There is also anecdotal evidence that criminal justice professionals sometimes rely on the name provided in a DNA report without assessing the context or the strength to be associated with the findings. This could explain some of the differences between the handwriting and DNA prior beliefs.

Thompson & Newman also inferred from their results that participants had different attitudes towards DNA and shoeprint findings (15). They speculate that the media influences how participants view different types of forensic findings, and the media could have a similar influence in Finland, where the same CSI TV shows run regularly. This research included DNA and handwriting cases, which could be considered comparable to DNA and shoeprint findings, as both handwriting and shoeprint analysis are based on subjective expert analysis. The attitude towards DNA could therefore be different, causing the DNA prior beliefs to differ from the handwriting prior beliefs.

The true-false statements prior to the conclusion showed that the participants understood what the report did not indicate, i.e. that it did not prove the guilt or whereabouts of the suspect. However, almost half of them did not recognise what the report did indicate, i.e. whether the sample could have originated from the suspect. Perhaps the participants felt that the uncertainty expressed in this statement made it a false statement; as forensic conclusions at NBI-FL are sometimes phrased as “it is extremely probable that the suspect is the source”, the uncertainty could feel unfamiliar. De Keijser & Elffers also found in their study, likewise with criminal justice professionals, that the majority of participants understood that a forensic report does not prove guilt (12), so there is evidence that professionals recognise this.

The true-false statements after the conclusion were aimed at testing whether the intention of the reports was understood correctly. The first of these statements (“it is possible to get these results even though the suspect would not be the origin of the sample”) was answered incorrectly by 40% of the participants, and upon closer inspection of the answers there does not seem to be any general pattern that would explain the incorrect answers. On occasion, participants answered correctly when the strength of the findings was weak but incorrectly when the strength was moderate or strong. This would indicate that these participants recognise the uncertainty of the analysis when the strength of the findings is weak but disregard it when the evidence strength is higher.

The prosecutor’s fallacy (“the conclusion shows that it is extremely improbable that the DNA could originate from anybody else except the suspect”) was clearly connected to the strength of the findings: the stronger the findings, the more often the prosecutor’s fallacy took place. At the same time, the defence fallacy (“according to the conclusion, there is some possibility that the DNA could originate from another person than the suspect. Thus, the report does not aid to answer the question whether the DNA originates from the suspect”) occurred mostly when weak findings were presented. With this kind of reasoning we would end up in a scenario where weak findings are disregarded and strong conclusions are absolutely relied upon. Similarly, Thompson & Newman found in their research that misunderstandings were connected to the strength of the findings (15). This is not ideal, as weak findings can still be beneficial within the context of a case, and strong findings still have uncertainty inherently associated with them, which needs to be recognised.

The true-false statement about whether the findings prove that the suspect was at the crime scene (“the results from the report prove that the suspect was at the crime scene”) was answered incorrectly mostly when DNA findings were in question. This could be because the transfer of material is easily recognised when a piece of paper is in question, but it might be harder to recognise the possibility of DNA transfer. The answer to the statement is false if only the DNA report is considered, as the DNA report by itself cannot prove that the suspect was at the crime scene, although together with other information it could prove beyond reasonable doubt that the suspect was there. It could be that the participants considered the statement from the point of view that other information would also be available, rather than considering the DNA report alone. The reason for this speculation is that one participant commented that it is difficult to answer some questions because they usually consider forensic reports in the context of a case, and other participants may have felt the same way. In the future, the instructions for the questionnaire should be elaborated to make sure that participants consider only the information presented to them.

The true-false statement about directionality (“the observations support the proposition that the DNA originates from the suspect rather than the alternative proposition that the DNA originates from another unrelated person”) was answered incorrectly mostly with weak findings, again showing the weak evidence effect. Martire et al. found in their study that the weak evidence effect took place in all presentation styles, although it was most prevalent when results were communicated verbally (14). This is something that could be improved upon with training, so that criminal justice professionals recognise that weak support for one scenario does not translate into support for the alternative scenario.

The survey comments give the impression that some participants feel uncomfortable with uncertainty and probabilities. This suggests that some criminal justice professionals are unfamiliar with statistical data. This is to be expected, since law and police students do not typically receive extensive statistical training in higher education and likely train themselves during their professional careers. To address these misunderstandings and this unease about statistics, training should be offered to those who use forensic reports in their work. It is important that the NBI-FL provides reports based on sound scientific data and reasoning. At the same time, however, it needs to be ensured that the information conveyed through the reports is understood correctly, so that criminal justice professionals’ work is efficient and injustices do not take place.

This pilot study has given insight into possible measuring systems; however, it is recommended that the survey be further improved. The impact of changes to the survey could first be tested with another, smaller participant group, and once the measuring system is deemed acceptable, it can be used to measure the impact of possible changes to the forensic reports with a larger group of relevant individuals.

The Likert scale could have more steps to increase nuance and possibly reduce the problem of calculating belief-change when participants choose seven on the scale. Another option would be to ask participants to give a number indicating the degree to which they support or do not support the statement given to them. This could remove the ceiling problem of the Likert scale; however, a common issue with this type of measurement is that individuals may understand the numbers differently, and it again becomes difficult to assess whether the intended latent variable is being measured. A third option would be to use both, as Thompson & Newman recommend (15).
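The ceiling problem described above can be illustrated with a minimal sketch. The function names and example ratings below are purely illustrative, not taken from the actual survey; the only assumption carried over from the text is that belief-change is computed as the posterior Likert rating minus the prior rating on a 7-point scale.

```python
# Illustrative sketch of the belief-change measure on a 7-point Likert scale.
# All names and example values are hypothetical.

LIKERT_MIN, LIKERT_MAX = 1, 7

def belief_change(prior: int, posterior: int) -> int:
    """Belief change: posterior rating minus prior rating."""
    for rating in (prior, posterior):
        if not LIKERT_MIN <= rating <= LIKERT_MAX:
            raise ValueError(f"rating {rating} outside {LIKERT_MIN}-{LIKERT_MAX}")
    return posterior - prior

def upward_headroom(prior: int) -> int:
    """Maximum increase in belief the scale can still express."""
    return LIKERT_MAX - prior

# A participant who already answers 7 before reading the report
# cannot register any further increase in belief:
print(belief_change(7, 7))   # 0 - no measurable change
print(upward_headroom(7))    # 0 - the scale is saturated
print(belief_change(4, 6))   # 2 - an ordinary measurable shift
```

This makes the design trade-off concrete: adding steps to the scale enlarges the headroom but does not eliminate the boundary, whereas an open-ended numeric response removes the boundary at the cost of comparability between participants.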

Questions could be added to the part of the survey that measures reasoning based on the reports by including statement pairs that ask about the same topic from both directions, i.e. pairs in which the answer to one statement is true exactly when the answer to the other is false. This would help inspect whether the participant understands the reasoning that can be drawn from the conclusions.
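Such inversely phrased pairs admit a simple consistency check: a participant has applied the same reasoning in both directions exactly when the two answers in a pair disagree. The sketch below is a hypothetical illustration of this scoring idea, not part of the thesis survey; the pair labels are invented.

```python
# Hypothetical scoring of inversely phrased true-false statement pairs:
# a pair is answered consistently when the two answers differ.

def is_consistent(answer_a: bool, answer_b: bool) -> bool:
    """Consistent iff the answers to an inverse pair disagree."""
    return answer_a != answer_b

# Invented example pairs for one participant:
participant_answers = {
    "supports_suspect / supports_alternative": (True, False),  # consistent
    "dna_from_suspect / dna_from_other":       (True, True),   # inconsistent
}

consistency_rate = sum(
    is_consistent(a, b) for a, b in participant_answers.values()
) / len(participant_answers)
print(consistency_rate)  # 0.5
```

A low consistency rate would flag participants who endorse both a statement and its inverse, which is exactly the kind of reasoning error the added question pairs are meant to expose.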

Naturally, a larger number of participants would in the future give a more comprehensive picture of how well the measuring system works, as well as more insight into the prevalence of the effects seen in this research.

Despite improvements to forensic report presentation and ways of measuring their effectiveness, there are limits to how understandable a report can be made while remaining factual and scientifically sound. Therefore, training is recommended for criminal justice professionals (13), along with subsequent survey testing of whether the training has an effect on the results (14). Van Straalen et al. also recommend hiring enough forensic advisors for court (13). These forensic advisors can aid in interpreting forensic findings and help review whether an analysis is scientifically sound. This is a good suggestion, as forensic science is a constantly developing field, and it may not be realistic to expect other criminal justice professionals to be experts in interpreting all types of forensic findings.


5. Conclusion

The method designed in this study for measuring the understandability of forensic reports in Finland shows great potential. While it will require further improvement, the main issues are known: the limitations of the Likert scale, the need for more thorough instructions, and the need to include question pairs whose answers directly check each other. This knowledge gives direct pointers for the next steps and should provide a solid foundation for the next survey.

There were no large differences in the understanding index between the LR style and the current style. However, a difference in the index could be seen between the DNA and handwriting reports. This could indicate differences in attitude toward different types of findings, since both report types were assigned the same strengths of findings. The understandability and understanding of forensic reports should be increased in the future, both by optimising forensic reporting and by training criminal justice professionals.


Acknowledgements

I would like to thank my supervisor Tuomas Salonen for all his guidance and support; I am grateful for all he has taught me. His endless positivity and creativity helped bring the project to completion under the special circumstances of the COVID-19 pandemic. I would like to thank my manager Tapani Reinikainen for supporting this project; his enthusiasm for improving the level of forensic science is inspirational. I would like to thank the forensic experts Elina Rönkä, Emilia Lindberg and Johanna Lehto for their guidance in creating the fictional reports; their insight was invaluable. I would also like to thank Sani Marttila and Rebecca Bucht for kindly giving guidance on creating the survey. Lastly, I would like to thank my examiner Marjan Sjerps, who gave me great feedback and provided me with resources for the project.

References
