Probabilistic interpretation of tests for cell type determination and association with the DNA donor

Academic year: 2021


Article manuscript; journal: Forensic Science International

* Corresponding author:

Email address: wtoosterman@gmail.com Student number: 6122841

Probabilistic interpretation of tests for cell type determination and association with the DNA donor

Wessel Thomas Oosterman (a,*)

(a) University of Amsterdam; (b) Netherlands Forensic Institute

ARTICLE INFO

This article is the report of a research project, MSc in Forensic Science, University of Amsterdam (36 ECTS).

Project supervisors: Jacob de Zoete (a), Bas Kokshoorn (b), Petra Maaskant-van Wijk (b)

Project examiner: Marjan Sjerps (a,b)

Project history: Start: 19-3-2015; End: 18-9-2015; Report submitted: 11-9-2015

Project location: Netherlands Forensic Institute

Keywords: Bayesian network; Source level; Probabilistic interpretation; Combining evidence; Cell type determination

ABSTRACT

In forensic casework on biological samples a forensic investigator has to make an assessment of the identity of the DNA donor(s) and of the cell type(s) that a sample is composed of. The current method of reporting a putative cell type is based on a non-probabilistic assessment of the test results. Currently no weight of evidence is given in the expert report and the possibility of a false positive or false negative result is not quantified. Additionally, the association between DNA donor and cell type in mixed DNA profiles can be exceedingly complex. The quality of the report and the transparency of its reasoning can be improved with a probabilistic approach. We introduce a Bayesian network that may assist a forensic practitioner to form a substantiated opinion about the cellular material(s) that a sample is composed of and about the attribution of donor to cell type. Further acquisition of data to support the probabilities in the network is required. Therefore the network should not be used as a ‘black box’ model. Instead the network may serve as a starting point for the relatively unexplored area of probabilistic interpretation of biological traces at source level.

Table of Contents

1. Introduction
2. The Bayesian method and Bayesian networks
2.1 An example
3. Probabilistic interpretation of cell typing results
3.1 Structure of the network
3.2 Conditional probabilities of test nodes
3.3 The prostate specific antigen test
3.3.1 Effect of sample type, time and temperature
3.3.2 Cross reactivity of the PSA test
3.3.3 False positives and false negatives
3.4 Further literature research
3.5 Discussion
3.6 Case example 1
4. Bayesian DNA profile interpretation
4.1 Implementing DNA profile interpretation software in a Bayesian network
4.2 A network for DNA interpretation
4.3 Case example 2
4.4 Discussion
5. Association between donor and cell type
5.1 A Bayesian network for two contributors
5.2 Case example 3
5.3 Discussion
6. Association between donor and cell type by mixture ratio
6.1 An example
6.2 Making the association
6.3 Deducing the DNA template quantity from the DNA profile
6.4 Peak height association in a BN
6.5 Case example 4
7. Results
7.1 A detailed case example
7.1.1 A case specific Bayesian network
7.1.2 Hypothesis testing
7.2 Sensitivity analysis
7.2.1 Analysis of the conditional probabilities in the [PSA test] node
7.2.2 Analysis of the [DNA interpretation software] node
7.2.3 Discussion
8. Discussion and conclusion
8.1 Other methods for associating donor with cell type
8.1.1 Quantification of cell type
8.1.2 Context of the trace
8.2 Conditional independence
8.3 Prior probabilities
8.3.1 Using soft evidence to alter prior probabilities
8.4 Conclusion
8.4.1 Recommendations for future research
References
Appendix I: Literature study into the PSA test
Appendix II: Literature study into the RSID semen
Appendix III: RSID semen discussion section
    Effect of sample type, time and temperature
    Cross reactivity of the RSID semen
    False positives and false negatives
Appendix IV: Analysis of the literature on cytological analysis
Appendix V: Sensitivity analysis of the RSID semen test

1. Introduction

In forensic investigations it has become accepted practice that a forensic practitioner evaluates the evidence under two competing propositions, associated with the prosecution’s and the defence’s point of view. A set of propositions has a distinct rank in the hierarchy of propositions, with each subsequent level being more informative to the criminal justice system [1] (Table 1). While the first three levels lie within the field of expertise of the forensic practitioner, offence level is traditionally the purview of the trier of fact. The scientific community has already invested in the interpretation and evaluation of biological traces at sub-source [2] and activity level [3]. Interpretation of biological traces at source level has remained relatively unexplored.

In the investigation of biological evidence at source level there are usually at least two questions of main importance: “Who donated the sample?” and “What cellular material(s) is the sample composed of?”. To answer the first question the forensic practitioner performs DNA analysis. The putative number of donors is reported, as well as any matches in the national DNA database or with a person of interest (POI). The strength of the evidence is evaluated with a statistical approach. To answer the second question, forensic practitioners can perform a large number of tests that are chemical, serological and microscopic in nature. These ‘classical’ tests can be used to indicate the presence of semen, saliva and blood (the ‘classical traces’) and often give an immediate result¹. Many tests can occasionally demonstrate (false) positive reactions with non-organic substances, as well as cross reactivity with other cell types, complicating the interpretation of a positive test result. Currently a positive serological test is usually reported to give ‘an indication for the presence’ of a particular body fluid². Since currently no weight of evidence (the likelihood ratio or LR) is given in the expert report, the possibility of a false positive or false negative result (the context of the trace) is not quantified (but may be subjectively indicated) [6]. Additional information could be utilized to make the expert report more specific, and the current manner of reporting could result in incorrect interpretation of the expert’s testimony by the trier of fact.

Table 1

The four levels in the hierarchy of propositions, each with an example of a possible question under investigation and the corresponding pair of propositions.

| Level in hierarchy | Example question | Hp and Hd |
|---|---|---|
| Sub-source | ‘Who is the donor of the DNA?’ | Hp: The defendant donated the DNA. Hd: An unknown individual who is unrelated to the defendant donated the DNA. |
| Source | ‘What cellular material did the defendant donate?’ | Hp: The defendant donated the semen. Hd: The defendant did not donate semen but another cellular material instead. |
| Activity | ‘How did the defendant’s semen get on the victim’s clothing?’ | Hp: The semen got on the victim’s clothing through ejaculation of the defendant. Hd: The semen got on the victim’s clothing through secondary transfer of the defendant’s semen. |
| Offence | ‘Did the defendant rape the victim?’ | Hp: The defendant raped the victim. Hd: The defendant did not rape the victim. |

¹ Recent research has demonstrated potential new methods for cell type identification. These methods include several molecular approaches [4] (mRNAs, miRNAs, DNA methylation and microbes are potential markers) as well as various spectrometric techniques [5]. With these markers it might be possible to identify various types of cellular tissues in addition to the classical traces.

² Both ‘body fluid’ and ‘cell type’ are used throughout this paper. While not all cell types are body fluids, every ‘classical’ test is for the indication of a body fluid. Both are used whenever appropriate.

Another situation which might lead to misapprehension of the forensic report by the court concerns cases where more than one person contributed to a crime stain. In these mixed profiles, since DNA profiling is performed independently from cell type testing, the association between donor and cell type is far from straightforward. For instance, in a mixed profile composed of two individuals in which blood is assumed to be present, the cell type cannot be attributed to one contributor. After all, the blood could have been deposited by either individual, with the second profile originating from any kind of cellular material, including blood. Even if a stain consists of a large amount of DNA-rich material such as blood, the donor that contributed the highest amount of DNA (the major contributor) cannot be automatically associated with this cell type. This was demonstrated in a study where blood stains were deposited on a glass substrate that had previously been touched [7]. For one large blood sample (15 μl) the substrate handler contributed the major component of the profile while the blood donor contributed the minor component. The authors of [7] suggest a conservative approach for reporting DNA-cell type associations.

An unfounded association between a specific cell type and a donor has been recognized as the ‘association fallacy’ [8]. The current practice of reporting has no explicit empirical basis and can lead to an association fallacy. This can have serious consequences, possibly leading to a miscarriage of justice. It is hard to estimate how deleterious this practice is because errors might never be discovered. Nevertheless, instances are known where an association fallacy has occurred with serious consequences. An example is the wrongful arrest of Adam Scott [8]. In this case Mr. Scott was arrested and incarcerated for rape based on a DNA match from a vaginal swab. This swab resulted in a mixture of profiles matching the victim, the victim’s boyfriend and the defendant. The DNA from Adam Scott was concluded to originate from sperm cells; however, his DNA later turned out to originate from contamination with Scott’s saliva. The real perpetrator was not present in the DNA profile. The positive test which led to the indication for sperm was associated with the DNA of the defendant, while the sperm could logically have originated from either of the two males in the mixture¹. Moreover, it was not unlikely a priori that the sperm originated from consensual sex with the victim’s boyfriend. This case illustrates two problems with the current evaluation of biological evidence. On the one hand there is the association fallacy that led to the false association between Scott’s DNA and the sperm; on the other hand there is the disregard of important prior information, namely a plausible innocent explanation for the presence of sperm.

A formal method for the combined interpretation of DNA and cell typing evidence can help overcome these kinds of errors. However, this area has received little attention from the forensic community. One paper has presented a Bayesian approach for evaluating saliva test results in cases of alleged sexual assault with oral intercourse [9]. A probabilistic approach to saliva identification with the classical tests was also developed at the Netherlands Forensic Institute by de Wolff (in press) [10]. Little attention has been given to the interpretation of blood and semen test results. There are several sources of information that are indicative of the origin of a cellular material. Currently such factors, for example the quantity of DNA, are disregarded in the interpretation. In this paper a Bayesian network is presented which can combine these factors to make a probabilistic statement regarding which cellular material(s) were donated by which person of interest. In Section 3 the probabilistic interpretation of cell type is explored. A method to model the interpretation of DNA is presented in Section 4 and the association between DNA donor and cell type is addressed in Section 5. The performance of the network is evaluated with a case example and a sensitivity analysis in Section 7. Discussion and conclusions can be found in Section 8.

¹ Additionally, since positive test results cannot provide certainty about the presence of semen, a ‘100%’ association between DNA donor and cell type is unwarranted.

2. The Bayesian method and Bayesian networks

A Bayesian network (BN) can be used to graphically display how variables are causally related (for instance variables involved in DNA profile interpretation). The Bayesian network method of evaluating the strength of the evidence has been adopted by a number of forensic fields, including DNA analysis. BNs have become generally accepted as a representation scheme for uncertain knowledge [11]. In addition, the intuitive graphical representation of a BN allows for easier dissemination of knowledge to non-experts [12]. Since the DNA-cell type association problem involves a large number of variables, the required probability calculations quickly become too numerous and complex to do manually. In dedicated software packages such as HUGIN [13] and AgenaRisk [14] these calculations are performed automatically. Another advantage of computer modelling is that, once the general structure of the network has been decided, networks can be procedurally generated in a modular fashion to fit a specific query of the operator. This way it is not necessary to build a new network from scratch each time only a slight adaptation of the original is required. Most importantly, the otherwise subconscious procedure to arrive at a conclusion regarding the evidential value of some observations becomes transparent and reproducible.

An example of a Bayesian network is given in Figure 1. The random variables are graphically represented by discrete units, which are called nodes. A dependency between two random variables is represented by an arrow, called an edge, which points from the cause to the effect. In this paper nodes will be denoted between [square brackets]. In Bayesian networks a certain nomenclature is used to indicate nodes in relation to each other. When an edge points from [E] to [H], [E] is called the parent of [H] and [H] is called the child of [E]. If [Q] is then the child of [H], [Q] is called a descendant of [E] and [E] is called an ancestor of [Q]. A node that has no parents is called a root node.
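The nomenclature can be illustrated with a small sketch. The three-node chain [E] → [H] → [Q] follows the description in the text; the dictionary layout and the function names are illustrative assumptions and do not correspond to the interface of HUGIN or AgenaRisk.

```python
# Directed edges of a small Bayesian network, stored as parent -> list of children.
# The chain E -> H -> Q mirrors the nomenclature example in the text.
EDGES = {"E": ["H"], "H": ["Q"], "Q": []}


def children(node):
    """Nodes that an edge from `node` points to."""
    return list(EDGES[node])


def parents(node):
    """Nodes with an edge pointing to `node`."""
    return [p for p, kids in EDGES.items() if node in kids]


def descendants(node):
    """Children, children of children, and so on."""
    found = []
    for child in EDGES[node]:
        found.append(child)
        found.extend(descendants(child))
    return found


def ancestors(node):
    """Parents, parents of parents, and so on."""
    found = []
    for parent in parents(node):
        found.append(parent)
        found.extend(ancestors(parent))
    return found


def root_nodes():
    """Nodes without parents; these carry a prior probability distribution."""
    return [n for n in EDGES if not parents(n)]
```

In this sketch [E] is the only root node, so it would hold a prior distribution, while [H] and [Q] would each hold probabilities conditional on their parent.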

Figure 1

An example of a Bayesian network.

A node usually consists of a finite set of mutually exclusive states. Each state has a certain probability of occurring, given a (combination of) states of its parental nodes. These conditional probabilities are summarized in a node probability table (NPT). States will be denoted between <angle brackets>. In the aforementioned example [H] contains probabilities conditional on [E], which is usually notated as Pr(H|E). Root nodes do not have conditional probabilities but instead a prior probability distribution to indicate how likely each state is a priori if no other information is known: Pr(E). Once a network has been constructed it can be used to process new information or evidence. The degree of belief in a proposition H can be calculated with the new evidence E.

$$\Pr(H \mid E) = \frac{\Pr(E \mid H)\,\Pr(H)}{\Pr(E)} = \frac{\Pr(E \mid H)\,\Pr(H)}{\Pr(E \mid H)\,\Pr(H) + \Pr(E \mid \neg H)\,\Pr(\neg H)} \tag{1}$$

This rule for updating degrees of belief is known as Bayes’ theorem.

2.1 An example

A forensic service provider is testing a new blood test. It gives a positive reaction with 99% of human blood samples but also with 1% of samples that do not contain any blood. A forensic practitioner uses the new blood test on a balaclava and gets a positive test result. A background study indicated that 10% of samples taken from casework balaclavas contain blood. The probability that the balaclava contains blood given the positive blood test can then be calculated from the prior probabilities in Table 2 and the conditional probabilities in Table 3; the corresponding network is shown in Figure 2.

Table 2

Probability table for the [Trace] node

| [Trace] | Prior probability |
|---|---|
| <Blood> | 0.1 |
| <No blood> | 0.9 |

Figure 2

Bayesian network representing the causal relation between trace and blood test

Table 3

Probability table for the [Blood test] node

| [Blood test] | [Trace] = <Blood> | [Trace] = <No blood> |
|---|---|---|
| <Positive> | 0.99 | 0.01 |
| <Negative> | 0.01 | 0.99 |

$$\Pr(\text{blood} \mid \text{positive test}) = \frac{0.99 \times 0.1}{0.99 \times 0.1 + 0.01 \times 0.9} \approx 0.917$$
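The calculation of Section 2.1 can be reproduced with a few lines of code. The sketch below applies equation (1) directly and, as a cross-check, also computes the same posterior through the odds form (posterior odds = likelihood ratio × prior odds). The function and argument names are assumptions made for illustration only.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Bayes' theorem for a proposition H with two states (H and not-H)."""
    numerator = p_e_given_h * prior
    evidence = numerator + p_e_given_not_h * (1.0 - prior)
    return numerator / evidence


def posterior_from_odds(prior, p_e_given_h, p_e_given_not_h):
    """Same posterior, obtained via likelihood ratio and prior odds."""
    likelihood_ratio = p_e_given_h / p_e_given_not_h
    prior_odds = prior / (1.0 - prior)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Values from Tables 2 and 3: Pr(blood) = 0.1,
# Pr(positive | blood) = 0.99, Pr(positive | no blood) = 0.01.
p_blood = posterior(prior=0.1, p_e_given_h=0.99, p_e_given_not_h=0.01)
print(round(p_blood, 3))  # 0.917
```

With these values the likelihood ratio is 99 and the prior odds are 1 to 9, giving posterior odds of 11 to 1, which corresponds to the same probability of approximately 0.917.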

With a BN one can gain insight into the types of data that are required to make a valid statement about the association between DNA donor and cell type. With such a model it is possible to account for the possibility of obtaining a false test result and to demonstrate the dependencies between the prior expected outcome of a test, the test results, and the degree of belief that the observations support one proposition over another. In the following section a network for the probabilistic interpretation of cell typing test results will be introduced.

3. Probabilistic interpretation of cell typing results

3.1 Structure of the network

A BN for probabilistic interpretation of ‘classical’ cell type determination tests is presented in Figure 3. This network is able to interpret any test which has two or more discrete outcome options, for instance a test which can be ‘positive’ or ‘negative’. Examinations which utilize an array of markers, such as RNA or microbial analysis, require a more statistically sophisticated method of interpretation and are not included in this paper. Instead the tests under consideration are the so-called ‘classical’ tests for cell type determination, which are chemical, immunological, enzymatic or microscopic in nature (Table 4). These tests are utilized for the determination of blood, saliva and semen. Despite extensive recent advances in the area of cell type determination, traditional methods are still the most commonly used in forensic laboratories. Some of these tests demonstrate reactions with multiple body fluids or with non-human organics and non-biological material (for examples, see Table 7). When this undesirable reaction is due to the presence of the test’s target molecule (reactant) in multiple body fluids it is called cross reactivity. When the undesirable reaction is due to a non-specific reaction with a molecule different from the proper target molecule, it is called a false positive (e.g. with condoms, bleach or strong acids).

Because the classical tests generally lack specificity, the cell types which are under consideration are not limited to blood, saliva and semen. Instead any substance that might react with one of the tests should be included in the network. Body fluids which are known to demonstrate cross reactivity with one of these tests include: sweat, vaginal fluid, breast milk, urine¹ and feces. It is possible that there is some kind of untested substance which reacts with a test. Since it is not feasible to test every substance that might be encountered at a crime scene, all untested substances are included in an ‘other’ category. This ‘other’ category refers to the probability that a test reacts with a cellular material that does not have a category of its own. Non-cellular substances that lead to a positive test result (false positives) are included in a ‘none’ (no cells) category.

¹ It was decided to differentiate between male and female urine, considering their difference in reactivity where the PSA test is concerned (Section 3.3).

Table 4

Common tests for the determination of the classical traces.

| Trace | Tests |
|---|---|
| Blood | Tetrabase test |
| Saliva | Amylase (pressure) test, RSID saliva |
| Semen | Acid phosphatase (pressure) test, cytological analysis, differential lysis, RSID semen, PSA test |

The aforementioned cell type categories are implemented in the BN as states of [Cell type(s) present in trace] (Figure 3). This node contains the exhaustive set of all possible combinations of cell types given the considered types. The forensic practitioner can decide to use only those cell types which are relevant for a specific piece of evidence. For instance, it might be decided that breast milk is extremely improbable. The states of the individual cell typing tests are <positive> and <negative>. Exactly what constitutes a positive test is determined in a protocol which might differ between laboratories. Most importantly, one should adhere to the protocol that was used to determine the test outcome of samples in the training database. In Section 3.3, the probability of obtaining a positive PSA test given that a certain cell type is present will be estimated from published literature.
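How the states of [Cell type(s) present in trace] could be enumerated is sketched below. This is a minimal illustration under two assumptions that are not stated in the text: a mixture is labelled by joining its cell types with " + ", and the empty combination is labelled "none" to match the ‘none’ (no cells) category. The function name is hypothetical.

```python
from itertools import combinations


def trace_states(cell_types):
    """All combinations of the considered cell types, plus a 'none' state."""
    states = ["none"]  # assumption: no cellular material present
    for size in range(1, len(cell_types) + 1):
        for combo in combinations(cell_types, size):
            states.append(" + ".join(combo))
    return states


# Restricting the node to three cell types gives 2**3 = 8 states.
states = trace_states(["blood", "saliva", "semen"])
for state in states:
    print(state)
```

The number of states doubles with every added cell type, which is one practical reason for restricting the node to the cell types that are relevant for a specific piece of evidence.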

It would be possible to do a more extensive evaluation of the relation between cell type and test outcome. For instance, tests to demonstrate saliva do not react to saliva specifically. Instead they react to the marker α-amylase, which is an enzyme involved in the digestion of starch. However, an individual’s α-amylase activity might be low due to a number of reasons, including ethnicity, smoking, diabetes, medication and diet. Overall, around 6% of donors have low amylase concentrations [15]. By including an additional node [Amylase present?], which is the child of [Cell type(s) in trace], the situation in which amylase is absent although saliva is present can be modeled. However, the probability of a positive test result given that amylase is present is difficult to determine due to the absence of any literature with these kinds of experiments; only the test outcome given a certain cell type has been investigated. Moreover, it is unclear by what criterion amylase should be considered to be ‘present’ in a sample. For these reasons it has been decided to leave this intermediate node out of this network. For a more detailed exploration of this mechanism, see [10].

Figure 3

A generic network for the interpretation of cell type determination tests.

The outcome of a test might depend on whether another test has been carried out beforehand. This principle is most applicable to pressure tests (amylase, combined or acid phosphatase pressure test). Pressure tests are frequently utilized to search a piece of evidence (usually fabric) for relevant traces. A pressure test involves a sheet of paper which is incubated with a substrate that changes colour when it comes in contact with a specific body fluid. Pressure tests are always carried out before any other test. If a pressure test has been carried out, the probability of a positive subsequent serological test might be altered if the pressure test consumes or dilutes part of the sample, lowering the probability that the subsequent test detects its target substance. However, the effect of a pressure test on a subsequent test is likely to be very small [10]. A biological explanation could be that the pressure test only consumes part of the substance on the upper part of the fabric, and none in the medial part of the stain. Serological tests use extraction of the cellular material from a cut-out, which also reaches the part of the substance that has penetrated the fabric. Consequently a substantial proportion of the cellular material in a stain should be left after the pressure test. In addition, pressure tests are much less sensitive than serological tests. For these reasons it has been decided to leave the influence of the pressure test out of the network. If evidence arises to indicate that the effect of pressure tests on subsequent tests is significant, this relation could be included in the network as illustrated in Figure 4.

Figure 4

An illustrative BN for the influence of a pressure test on a subsequent test. The states of [Amylase pressure test] are <Positive>, <Negative> and <No pressure test performed>. If the pressure test is performed (either positive or negative), [Pressure test performed] is in <Yes>. The [RSID saliva] node then has different conditional probabilities for a certain cell type if [Pressure test performed] has the state <Yes> than if it has the state <No>.

3.2 Conditional probabilities of test nodes

The probability of a positive test given that a (combination of) cell type(s) is present can be estimated in several different ways. Firstly, it is possible to ask forensic practitioners for their personal estimate of this probability. However, in casework researchers cannot be sure whether a particular cell type is present or not. In spite of the scientists’ practical experience, we decided not to perform any expert elicitation experiments and to adhere strictly to an objective, data-driven approach. Secondly, it is possible to estimate the relevant probabilities from a database of real cases. Unfortunately this method suffers from the same problem: the uncertain nature of any biological material present on crime samples. Thirdly, the limit of detection (LOD) of a test can be used to estimate the conditional probabilities for the network. A test is unlikely to demonstrate a false positive reaction with a certain body fluid if the maximum reported reactant concentration in that body fluid is lower than the LOD. However, the limit of detection is expressed as the maximum number of dilutions that still gives a positive test. Since forensic samples mainly concern dried cellular material instead of diluted neat samples, the limit of detection is not a very suitable parameter for estimation of the probabilities in our network. To get the best representation of samples which are investigated at crime laboratories, ‘mock case samples’ are preferred. Therefore in this study it was decided to use the available literature with simulated case samples to estimate the conditional probabilities for the network. These include published (validation) experiments and internal validations carried out at the Netherlands Forensic Institute. The results of this literature study are presented in Section 3.3.

Table 5

Factors of influence on test outcome.

| Level | Factor type | Factors |
|---|---|---|
| Reactant concentration | Donor specific | Natural individual variation, disease, tumors |
| Reactant concentration | Degradation | Temperature, UV light, pH, wetness |
| Absolute amount of reactant in test strip | Determined in protocol | Extraction efficiency, amount of supernatant used for test |
| Absolute amount of reactant in test strip | Sample specific | Total amount of cellular material, presence of other body fluids |

Different factors that are of influence on the outcome of a serological test are displayed in Table 5. There are two factors which determine the concentration of the reactant in a particular body fluid. Firstly there are individual differences due to natural variation and disease. Secondly, after the sample is deposited, degradation of the protein might occur by environmental factors such as (UV) light, high temperatures, wetness and exposure to an acidic environment. The absolute amount of reactant that is then available for the test depends on the aforementioned concentration and the total volume of body fluid. The total amount of reactant that is then pipetted into the test strip depends on the extraction efficiency and the amount of supernatant that is subsequently used for the test. Finally the reactivity of the test might be inhibited by high concentrations of another body fluid.

Some of the factors which were introduced in the previous paragraph can be negated by careful control. For instance, the protocol of the test determines the extraction efficiency and amount of material used for the test; by carefully adhering to a protocol these factors can be kept approximately constant between different instances in which the test is used. If the conditional probabilities in the BN are then specified for a certain protocol, these factors do not have to be included in the network, provided that the protocol is adhered to. The interpersonal variation in reactant concentration is not known to the scientist, so this cannot be taken into account in the node probability table. However, the scientist usually has some idea regarding the total amount of cellular material present in the sample (based on the size of the stain and on whether the sample appears diluted). One might argue that the probability of a positive test outcome is higher if a large amount of cellular material is clearly observed. The scientist would then apply a lower false negative rate. However, this is a dangerous avenue because it would require the scientist to essentially predict the outcome of a test, which might lead to confirmation bias. Moreover, to our knowledge no research has been done on the correlation between an investigator’s expectations and the test outcome. For these reasons an estimation regarding the amount of cellular material is not used in the network.

If the reactivity of a test is inhibited by the presence of high quantities of another body fluid this can be included in the network. The node probability table of [Cell type(s) in trace] contains separate states for mixtures of cell types, so when the reactivity of a body fluid is inhibited by the presence of another substance this probability can be altered for that particular mixture in the table. A method to include the influence of degradation in the network is depicted in Figure 5. Finally, there are numerous causes of altered reactant concentrations in individuals. These can be medical or pathological in nature (e.g. tumors, hormonal changes), but they can also be normal physiological situations, for instance age-related changes. These can cause altered test reactivity with a body fluid; several instances will be treated in the next section. Care has to be taken to use an altered conditional probability when the presumed donor of the sample is known to have such a condition. A method to implement this in a BN is depicted in Figure 6.

Figure 5

A forensic practitioner can occasionally assess whether degradation has occurred. This observation can be indicated in a BN which includes a separate node for degradation (e.g. [Exposure to UV light]) that is a parent of the test node.

To conclude, when experimental literature is used to estimate the probability of test outcomes, the experimental setup should match the protocol that is used at the crime laboratory which uses the network, because these probabilities change with the protocol. The extraction protocol in particular has a large influence on the proportion of the material that is wasted [16]. Buffers extract material with different efficiencies, often depending on sample type (type of fabric, brand of swab etc.). Furthermore, to get a good representation of the level of difficulty of casework samples, mock cases should be used to form a set of experiments that reflects the specific situation of the crime laboratory. This, however, is the ideal situation. Published literature that fits these demands is scarce, since the protocols which are used vary significantly between studies and the mock case samples are too low in number to be representative for a specific crime laboratory. For these reasons the literature study presented in the next section should be supplemented with new experimentation if the network is to be used for casework.

Figure 6

If a person of interest is known to have a condition that might lead to altered reactivity with a test this can be implemented in the BN for cell type determination. For instance, individuals suffering from a lung tumor can have semenogelin (the substance that reacts with the RSID semen test) in their blood serum [17], possibly causing cross reactivity of the RSID semen with blood. If the RSID semen is performed on a sample which also has a DNA match with this individual (the victim in this example), an altered conditional probability should be used. To avoid using an altered conditional probability if it is a fortuitous match, a [Victim present?] node is included with the states <Yes> and <No> (for DNA profile interpretation see Section 4). The probability of a <Positive> [RSID semen], conditional on <Blood> and <Yes>, can then be different from the probability of a <Positive> [RSID semen], conditional on <Blood> and <No>. The outcome of [RSID semen] given any other cell type is unaltered no matter the state of [Victim present?]. The method illustrated here can also be used in similar situations.
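The structure described in this caption can be sketched as a node probability table with two parents. All numerical values below are hypothetical placeholders chosen purely to show the layout; they are not taken from the literature or from this study, and the names are illustrative assumptions.

```python
# Hypothetical placeholder probabilities of a <Positive> [RSID semen] result
# per cell type. These numbers are NOT empirical estimates.
BASELINE = {"Semen": 0.95, "Blood": 0.01, "Saliva": 0.01}

# Hypothetical placeholder for <Blood> when the victim, who is known to have
# a condition that alters reactivity, is a donor of the sample.
BLOOD_IF_VICTIM_PRESENT = 0.20


def p_positive(cell_type, victim_present):
    """Pr(<Positive> | cell type, [Victim present?]); only <Blood> is altered."""
    if cell_type == "Blood" and victim_present:
        return BLOOD_IF_VICTIM_PRESENT
    return BASELINE[cell_type]


def p_positive_given_blood(p_victim_present):
    """Marginalise over [Victim present?] when the DNA match is uncertain."""
    return (p_victim_present * p_positive("Blood", True)
            + (1.0 - p_victim_present) * p_positive("Blood", False))
```

The second function shows why the [Victim present?] node is useful: when the DNA match might be fortuitous, the probability of a positive result for blood becomes a weighted average of the altered and unaltered values.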

A meta-analysis into the reactivity of the PSA test with different materials and cell types is presented in the following section. This serves as an illustration of how the NPT of a cell type determination test can be distilled from the literature. Further studies into the RSID semen and into cytological semen analysis have also been performed and can be found in Appendix III and IV respectively. Any lab which performs these ‘classical’ tests should perform experiments with a variety of substances to investigate false positive and false negative rates within the framework of the lab-specific protocols. Only then will it be possible to have an accurate representation of the circumstances (protocol, kinds of casework, level of expertise of the operators) in that specific lab. This argument is just as relevant where the implementation of a BN is concerned since the probability values have to be accurately determined. The results of selected studies


were combined to obtain conditional probabilities for the BN for cell type interpretation. A challenge is that studies often use different protocols for sample preparation, as well as different sample types and different brands of test strips. To address this, samples can be divided into classes with similar circumstances; for instance, stains on different kinds of fabric can be grouped together to reflect their difference in reactivity compared to other sample types. A conditional probability for the network is then obtained by dividing the number of positive samples by the total number of samples. There are also instances where a category has either all or no positive samples. The resulting probabilities of one or zero respectively are undesirable, since it is unlikely that a test would never produce a different result if an infinitely large number of samples were tested. Therefore the upper bound of a 95% confidence interval is adopted as the conditional probability for cell types which have demonstrated no reactivity with a test, while the lower bound of a 95% confidence interval is adopted for cell types which have demonstrated 100% reactivity with a test. The Clopper-Pearson confidence interval¹ (CI) was used to calculate the 95% interval. The confidence interval has two further functions. First, it identifies categories which have a wide CI, marking them as having too small a sample size and as targets for future research. Second, the CI serves as a range of values that can be used for the sensitivity analysis in Section 8.
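The rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually used in the study: the function names are assumptions, and the Clopper-Pearson bounds are found by bisection on the binomial distribution using only the standard library, which is adequate for the small sample sizes considered here.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_decreasing(f, lo: float = 0.0, hi: float = 1.0, iters: int = 100) -> float:
    """Root of a function that decreases from positive to negative on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Two-sided Clopper-Pearson interval for x positives out of n samples."""
    # Lower bound: the p for which P(X >= x | p) = alpha / 2 (0 when x = 0).
    lower = 0.0 if x == 0 else _bisect_decreasing(
        lambda p: alpha / 2 - (1 - binom_cdf(x - 1, n, p)))
    # Upper bound: the p for which P(X <= x | p) = alpha / 2 (1 when x = n).
    upper = 1.0 if x == n else _bisect_decreasing(
        lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper


def conditional_probability(x: int, n: int, alpha: float = 0.05) -> float:
    """Proportion of positives, replaced by a CI bound when it would be 0 or 1."""
    lower, upper = clopper_pearson(x, n, alpha)
    if x == 0:
        return upper   # no reactivity observed: adopt the upper bound
    if x == n:
        return lower   # 100% reactivity observed: adopt the lower bound
    return x / n
```

For example, `conditional_probability(0, 48)` evaluates the upper bound for a category with 48 samples and no positive results, while `conditional_probability(29, 30)` simply returns the observed proportion.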

The conditional probabilities that are derived by dividing the number of positive samples by the total number of samples should not be regarded as definite values. First of all, sample preparation and cellular quantity often vary strongly between studies; consequently, pooled data from many studies do not represent a specific laboratory situation but instead indicate the general range in which the correct conditional probability can be found. Moreover, the sample size of many ‘cell type categories’ is rather small. The probabilities that are derived by taking the upper or lower bound of the 95% confidence interval can also differ significantly from the value that would be expected based on our knowledge of the underlying biological processes.

1 The Clopper-Pearson CI does not suffer from inaccuracy when np<5 or n(1-p)<5, which is a requirement here because of the small sample sizes. Additionally, calculations with p=0 and p=1 are possible. Despite reports of conservatism of the Clopper-Pearson CI, no large differences were found with other binomial CIs (Wilson’s and Jeffreys’).

3.3 The prostate specific antigen test

Prostate specific antigen (PSA) or p30 is a protein which is secreted by the prostate gland and is therefore present in high concentrations in seminal fluid. It was initially believed to be a specific marker for human semen; however, in the past decades PSA has been found in various other male and female bodily fluids in differing concentrations. Low amounts of PSA have been found in blood, saliva, vaginal secretions, breast milk, breast tissue, female urine, the periurethral glands, the endometrium and amniotic fluid [18], while high concentrations have been found in semen, male urine and the blood of prostate carcinoma patients (Table 6).

For forensic purposes immunoassays are frequently utilized to demonstrate the presence of PSA in a sample. A common test, the Seratec® Semiquant, can detect PSA down to around 2 ng/mL, but at 0.5 ng/mL a weak positive test result may still be obtained [16]. An absolute limit of detection of approximately 0.4 nL semen has also been reported [19]. However, the PSA concentration in semen can differ drastically between persons, ranging from 70 to 2160 ng/mL [20], and values up to 5500 ng/mL have also been reported [21]. This will inevitably lead to interpersonal variation in test sensitivity. Nonetheless, no individuals with PSA concentrations equal to or lower than the test’s LOD have been reported, so samples containing at least a few nanolitres of undiluted semen are expected to give a positive result. In the following sections the reactivity of different cell types and different sample types with the PSA test will be estimated. An overview of these values with accompanying confidence intervals is given in Table 7. A complete list of all the studies used in the meta-analysis is given in Appendix I.


Table 6

Reported PSA concentrations in various cell types.

Body fluid          Average PSA concentration (ng/mL)            Maximum reported PSA concentration (ng/mL)
Semen               820 [20]                                     2160 [20]; 5500 [21]
Blood               0.046 [26]                                   0.16 [26]; 0.61 [27]; 4 [16]; 200¹ [16]
Saliva              0.048 [26]; 0.024 [28]                       0.03 [20]; 0.34 [26]; 0.06 [28]
Sweat               No PSA found [21]
Vaginal secretions  0.11 [34]; 0.43 [35]                         1.25 [34]; 5.0 [35]
Breast milk         1.0 [20]; 0.47 [36]                          110 [27]; 2100² [20]
Urine (male)        915¹ [30]                                    800 [29]; 2853¹ [30]
Urine (female)      0.29 [31]; 1.73 [32]; 3.72 [33]; 0.52 [26]   1.06 [31]; 1.24 [26]
Feces               No PSA found [21]

3.3.1 Effect of sample type, time and temperature

A distinction between different sample types should be made concerning reactivity with a PSA test. Most of the seminal samples that are encountered in casework can be divided into one of three categories: stains, liquid samples and vaginal samples. Below we will discuss how these might have a different outcome on the PSA test. It is possible to use different conditional probabilities for each sample type in a BN (Figure 7).

Figure 7

[PSA test] can have different conditional probabilities for each state in [Sample type]: <stain>, <liquid sample> and <vaginal sample>.

In one study where liquid semen samples were used, only one out of 176 samples returned a negative result [22]. However, when these samples were kept at 37°C for 48 hours, 169 out of 176 samples returned positive. Liquid seminal samples that are kept at high temperatures might degrade over time, so the conditional probabilities for older samples might be different. However, the results of this study have not been replicated. Additional research is needed for a more generalized statement about PSA degradation in liquid seminal samples. A conditional probability for fresh liquid semen samples has been included in Table 7.

Test sensitivity with seminal stains is generally reduced compared to liquid samples because the extraction efficiency of stains is imperfect (<100%). In a study by Denison et al. (2004) [23], nine out of ten semen stains gave a positive result, and in another study five out of five stains tested positive [24]. On a long timescale, fabric seems to be a very stable matrix for storing semen at room temperature: positive results have been reported for ten stains ranging from 2 to 30 years old (n=10) [21]. Even five stains with ages ranging from 33 to 56 years all tested positive with the PSA test [25].

Notes to Table 6: 1 Prostate carcinoma patients. 2 Value obtained directly after birth.

A large proportion of the forensic casework with semen concerns vaginal swabs. Because semen persists only for a limited amount of time in the vagina, the reliability of the PSA test with post-coital swabs is strongly dependent on the time that has elapsed between ejaculation and sampling. In a study where females were exposed to varying amounts of their partner’s semen, the elevated levels of PSA were found to decline rapidly over time [35]. Concentrations were still elevated at 24 hours, but after 48 hours base-line levels were reached. Exposure to higher amounts of semen was detectable for a longer time. Since in this study only 1 mL of semen was used and the volume of male ejaculate can range up to about 11 mL (mean = 3.2 mL) [37], in actuality the elevated PSA levels could persist in the vagina for longer. This view is supported by a study where PSA was on average still detectable at 27 hours post-coitus [38].

When post-coital swabs were tested for PSA, the number of positive reactions decreased over time. No negative reactions occurred in the 0-6 hour segment (n=18) and at the 60 hour mark no more positive reactions were observed [39]. In one study a weak positive reaction from a post-coital swab was still observed 70 hours after intercourse [40]. The probability of observing a positive reaction with a post-coital swab could be modeled as a function of time, approaching a base-line probability at around t=70 hours. This base-line should correspond with the false positive rate of vaginal fluid, since after this time the PSA originating from semen has disappeared. The relation between time and test reactivity might look like Figure 8, which is based on the study of Benschop et al. (2009) [39]. The probability of a positive test seems to follow a roughly linear decline. However, due to the outlier (at 48-60 hours) and the small sample size of individual time categories it is not possible to make a regression model. Additionally, the probability of a positive test is likely strongly dependent on case circumstances (did the victim wash, sampling technique, was the victim alive etc.). The declining trend in Figure 8 is confirmed by a study of Laffan et al. (2011) [24], which demonstrates a similar decline. However, at this time a strictly statistical approach seems too ambitious. The probability of a positive test with post-coital swabs will have to be decided on a case by case basis, based on case circumstances and elapsed time.

Table 7

Overview of the literature study into the PSA test and the resulting conditional probabilities for the network. For categories without any positive samples, the upper bound of the 95% CI is used as the conditional probability.

Cell type          Positive/total samples  Pr(PSA test+|cell type)  Lower bound 95% CI  Upper bound 95% CI
Semen (liquid)     344/352                 0.977                    0.956               0.990
Semen (stain)      29/30                   0.967                    0.828               0.999
Blood              0/27                    0.127                    0                   0.127
Saliva             0/48                    0.074                    0                   0.074
Sweat              0/33                    0.106                    0                   0.106
Vaginal secretion  2/175                   0.011                    0.001               0.041
Breast milk        1/18                    0.055                    0.001               0.273
Urine (male)       55/96                   0.573                    0.468               0.673
Urine (female)     1/55                    0.018                    0                   0.097
Feces              0/30                    0.116                    0                   0.116


Figure 8

Vaginal swabs (n=86) which were obtained varying amounts of time after intercourse were tested with the Seratec® PSA kit. Each data point corresponds to the probability of a positive test result and has a 95% confidence interval.

In a similar study with samples simulating oral sexual assault, vomit samples were created by adding semen to gastric juice [41]. PSA was detected in neat samples and samples that were deposited on a ceramic plate after incubation up to 4 hours (n=54). However PSA was detected in samples which were spotted on cotton cloth up to only 15 minutes of incubation time. Neat samples can likely be regarded as diluted liquid semen, but with stains a false negative result should be expected.

3.3.2 Cross reactivity of the PSA test

An overview of the cross reactivity of the PSA test is displayed in Table 7. Below we will discuss in more detail when cross reactivity might occur.

Male urine

High concentrations of PSA are present in the urine of adult males. With a maximum reported value of around 800 ng/mL [29] there is a large overlap with the range of PSA concentrations found in semen, making the interpretation of samples that might contain urine highly complex. The manual of Seratec® [16] advises diluting a positive sample 200 times to differentiate between urine and semen: a semen sample would still give a positive result, urine a negative result. However, since the amount of material is difficult to estimate in forensic samples, such a control would not give a definite answer. The PSA concentration in the urine of young children is low or absent. This is still the case with 10 and 11 year old boys (n=19) [29]. However, this changes when boys enter puberty: after the thirteenth year PSA is at adult levels and neat urine samples regularly give positive test results even when diluted 100 times [42]. If a young boy is the possible donor of a sample, an altered conditional probability for urine should be used. This can be implemented similarly as in Figure 6, but with an altered conditional probability for urine if the person of interest is present.

Urine that is deposited on fabrics usually gives positive results [23,29] unless it is diluted [29] or only a very small area (25 mm²) of fabric is used [43]. The reactivity of male urine thus seems to be strongly dependent on the amount of material that is used with the test. Moreover, different sample preparation protocols yield different results. For adults the probability that male urine gives a positive PSA test seems to be strongly dependent on whether the sample is diluted. Neat samples give positive results almost invariably, while stains and diluted samples have varying results. Test results from samples that could contain male urine can be very difficult to interpret because of this unpredictable test behavior. Here it was decided to have a single category for male urine. This is a crude approach; one could also inform probabilities for separate trace types (liquid, stain etc.), but the available data are too limited for this.

Vaginal fluid and female urine

PSA concentrations in vaginal secretions are reported to be around 0.43-0.88 ng/mL on average [35]; maximum concentrations are reported up to 5.0 ng/mL [35]. Incidental detection of PSA in semen-free vaginal samples could be explained by prostatic tissue that is present in the upper and lower female genital tract [44]. The PSA concentration in female urine is of similar magnitude: average reported values range from 0.29 [31] to 3.72 [33] ng/mL. Consequently, the PSA concentrations that can be encountered in vaginal fluid and female urine may be detected by the PSA test, given its limit of detection. This is confirmed by a positive reaction with these female body fluids from one PSA-high individual during or three days prior to menstruation [23]. This may be due to the fluctuation of PSA levels with the menstrual cycle. PSA levels in serum were shown to vary in a cyclical manner, with the highest concentrations occurring during active menstruation [45]. Whether this is also the case with urine and vaginal fluid has not been confirmed. Furthermore, androgen levels peak during active menstruation, correlating with the aforementioned cyclic PSA levels. Serotonin-reuptake inhibitors, which are commonly prescribed to post-menopausal women as medication for stress disorders, have been shown to increase specific androgen levels in women [46]. It has been hypothesized, but not yet demonstrated, that serotonin inhibitors might cause PSA levels to rise above the limit of detection of the test due to the correlation between increased androgen and PSA levels [23]. Active menstruation and serotonin inhibitor medication of a person of interest should be considered as warning signs of a possible false positive. The higher probability of a positive test result can be indicated in the BN by using the method that was explained in Figure 6. Unfortunately the exact effect of these two situations is not clear from the present literature. Presently we will not differentiate by these criteria.

Blood and saliva

Reported concentrations of PSA in blood and saliva are generally very low (<1 ng/mL) [27,47]. Male blood can have higher concentrations up to 4 ng/mL [16], but because samples are diluted during preparation no instances of false positives have been reported. A confounding factor to watch out for is blood from prostate cancer patients, which can have high PSA levels (up to 200 ng/mL) [16]. The altered reactivity of these patients can be modeled with the method that was discussed previously in Section 3.2 (Figure 6).

Breast milk

Directly after birth very high PSA concentrations in breast milk are possible (up to 2100 ng/mL), but these rapidly decline to lower levels after a few days [20]. Most breast milk samples contain around 1 ng/mL PSA, but large individual differences were observed. One false positive result out of eighteen samples in total was reported in the examined literature.

Feces and sweat

PSA has never been detected in feces and sweat. In total, 30 feces samples and 33 sweat samples all tested negative with the PSA test.

Other untested cell types

It cannot be excluded that another (untested) cellular material also is reactive with the PSA test. For the purpose of covering all possible situations, an <other> state is included in [Cell type(s) in trace]. It is however impossible to determine the probability that the PSA test (or for that matter, any test) is reactive with an ‘other’ cell type. This probability should be very close to zero if all known false positive cell types are included as separate states in the network. However if some are left out, the ‘other cell type’ category becomes significant. If the forensic practitioner decides that the probability of an ‘other’ cell type does not significantly differ from zero the category can be left out of the network. However it is then strictly speaking not possible to make an exhaustive set of propositions concerning the presence of cell types. Therefore a subjective estimation of this probability is recommended. If the posterior probability for an ‘other’ cell type is significant this should be a clear indication that a cell type that may give a false positive result does not have a separate state in the network. If a probable cell type within the ‘other’ category has been identified it should then be entered as a separate state in the network.

3.3.3 False positives and false negatives

A study of Khaldi et al. (2004) [22] using Vedalab® PSA tests showed quite a large number of false positives in control samples containing water: 6 out of 102 samples were positive. This would indicate a relatively high conditional probability in the <None> (no cells) category. These results suggest a low reliability of this test, but they are at odds with other studies, in which very large sets of body fluid samples (e.g. 130 vaginal fluid samples) all tested negative. Unfortunately other studies do not test a large number of water samples and the effect has not been replicated. Water samples are preferable as a negative control since they do not contain any PSA and have a neutral pH. Consequently a positive reaction with a water sample has to be explained by a defective test, demonstrating the base-line reliability of a test. The water control group is used to infer the probability of a positive PSA test when no cells are present, i.e. the state <None> in [Cell type(s) in trace]. Laboratories using the Vedalab® brand of PSA test ought to do additional research to verify the test’s reactivity with control samples.

Internal validations performed at the Seratec® laboratory showed that the test result can be influenced by the pH of the sample material. Only low pH values (< 5) of organic acids such as citric acid can cause false positives, which can usually be recognized as false results because the test line is spotted or not formed uniformly [16]. Extreme pH values in general can disrupt the antigen-antibody interactions by weakening hydrogen bonds. The manufacturer recommends adjusting the pH to a neutral value.

Of various brands of condoms with lubricants and contraceptives only a condom with the nonoxynol-9 spermicide gave a positive test result in one study [48], though this was contradicted by the results of other studies [40,43]. In one study liquid bleach and Delfen® contraceptive foam gave weak positive results; however the positive results could not be repeated [23]. Overall some caution with the interpretation of sampled condoms seems warranted because in casework the brand of condom is usually unknown.

No false positives with the following animals were reported: dog, pig, horse, cat, chicken, cow, sheep and various microorganisms [21,23,43,49]. While the PSA Seratec Semiquant seems to be human specific, no experiments with ape semen have been performed, which is a notable oversight. Finally, mixtures of different body fluids could demonstrate new reaction behavior. Semen was tested together with other body fluids in various mixtures of two and three different substances. The detection of semen does not seem to suffer from interference in these mixtures [23,40,42,43]. However, in one instance when semen was mixed with caustic soda a false negative result was obtained [24]. This could be explained by the extremely high pH of caustic soda (NaOH). It is not entirely clear how a mixture of body fluids affects the reactivity of semen, since neat semen can also produce a false negative result occasionally. The results of these studies support the hypothesis that the ‘most reactive’ body fluid in a mixture determines the reactivity of the whole sample. This is at odds with the hypothesis that through dilution the mixture obtains a new PSA concentration which then determines the probability of a positive test. A third hypothesis would be that the test can react with either fluid in a mixture, causing a higher reactivity in mixtures. At this moment it is not clear which hypothesis is true. The latter two hypotheses have strong implications for the network, since every body fluid combination would have a distinct conditional probability in the network. Moreover, the test reactivity would be dependent on the relative amount of each body fluid in the mixture. Since this would require a more sophisticated model and since there is some evidence for the first hypothesis, we have chosen to let the most reactive body fluid determine the conditional probability of the mixture. For instance, a mixture of blood and saliva has a 0.127 probability of producing a positive test in the network, since 0.127 is the higher of the two individual probabilities.
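The rule chosen for mixtures amounts to taking the maximum of the individual conditional probabilities. A minimal sketch follows, using the values listed in Table 7; the dictionary and function names are assumptions introduced for illustration.

```python
# Pr(PSA test+ | cell type), as listed in Table 7.
P_PSA_POSITIVE = {
    "semen (liquid)": 0.977,
    "semen (stain)": 0.967,
    "blood": 0.127,
    "saliva": 0.074,
    "sweat": 0.106,
    "vaginal secretion": 0.011,
    "breast milk": 0.055,
    "urine (male)": 0.573,
    "urine (female)": 0.018,
    "feces": 0.116,
}


def p_positive_mixture(cell_types):
    """Conditional probability for a mixture: the most reactive component decides."""
    return max(P_PSA_POSITIVE[c] for c in cell_types)


print(p_positive_mixture(["blood", "saliva"]))  # 0.127
```

Under the two alternative hypotheses this function would have to depend on the relative amounts of each body fluid as well, which is why they would require a more elaborate model.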

3.4 Further literature research

A similar analysis was performed on the literature on RSID semen (Appendix III) and cytological semen analysis (Appendix IV).

3.5 Discussion

In the previous sections a data driven approach was used to define the conditional probabilities for the [PSA test] node, summarized in Table 7. The PSA test is by no means specific for semen. A biologist would observe that some conditional probabilities are higher than one would expect, e.g. Pr(PSA test+|blood) is estimated at 0.127 while blood usually contains extremely low PSA concentrations. This is even more so with the RSID semen test (Appendix III). However, one has to be very careful when using one’s expectations to define the probability, since there might be some (unknown) mechanism that occasionally causes a false positive reaction with blood. A total of 27 negative blood samples in the entire body of literature is rather scarce, especially since blood is frequently encountered in casework. However, the (high) probability value of 0.127 should not be taken as the true value either. With more experimentation we expect that this value will be lower, which is why it is important to continue to test known cell types (and combinations) on commonly used tests. Laboratories might also possess a large volume of unpublished experiments. If these were to be pooled, many additional experiments might not be needed. Alternatively it might be decided to subjectively alter the probabilities in the network. However, extreme care should be taken when doing so because this defeats the purpose of a data driven approach and might lead to probabilities which are biased towards the expectations of the operator. For present purposes we adhere to a rigid interpretation of the data.

We will examine if our uncertainty about the conditional probabilities has a large impact on the posterior odds with a sensitivity analysis (Section 8). The cell types associated with impactful conditional probabilities can then be targeted for future research. This will be essential for any later validation of the network.

3.6 Case example 1

A stained rag is secured at a crime scene and sent in for investigation. The police request a full investigation regarding body fluids present on the rag, in particular semen. A tetrabase test, an RSID saliva test and a PSA test are performed in the lab. The PSA test returns positive while the other tests are negative.

Figure 9

BN for cell type interpretation in the case example.

The BN which is constructed for this case (Figure 9) contains states for the cell types blood, saliva, semen and ‘other’. The posterior probability distribution for [Cell type(s) in trace] is given in Figure 10. It is clear from the distribution that there is a strong indication for the presence of semen while saliva and blood are likely absent. The possibility of some other kind of cellular material cannot be excluded. The presence of semen is further evaluated with two competing propositions:

Hp: The sample contains semen

Hd: The sample does not contain semen

An additional hypotheses node is entered into the network as a child of [Cell type(s) in trace]. The conditional probabilities of <Hp. The sample contains semen> are equal to 1.0 if the cell type (combination) contains semen and 0 otherwise. <Hd. The sample does not contain semen> contains conditional probabilities which are the complement (one minus the value) of the other state. The prior probability distribution in the network is left unaltered (see Section 8.3 for a discussion about the prior odds). The posterior probability of Hp, i.e. the probability that the sample contains semen given the test results, is 0.953.
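How the hypotheses node aggregates the posterior over cell types can be sketched as follows. This is a deliberately reduced illustration and does not reproduce the value of 0.953: it uses only the PSA test result, a hypothetical uniform prior, a reduced set of states without ‘other’, the stain values from Table 7 and the maximum rule for mixtures from Section 3.3.3. All names are assumptions.

```python
# Pr(PSA test+ | cell type), taken from Table 7 (semen value for stains).
P_PSA = {"semen": 0.967, "blood": 0.127, "saliva": 0.074}

# Reduced, illustrative set of states of [Cell type(s) in trace].
STATES = [
    ("semen",), ("blood",), ("saliva",),
    ("semen", "blood"), ("semen", "saliva"), ("blood", "saliva"),
]


def likelihood_psa_positive(state):
    """Most reactive component decides the reactivity of a mixture."""
    return max(P_PSA[c] for c in state)


def posterior(prior):
    """Posterior over states after observing a positive PSA test."""
    joint = {s: prior[s] * likelihood_psa_positive(s) for s in STATES}
    total = sum(joint.values())
    return {s: v / total for s, v in joint.items()}


def p_hp(post):
    """Pr(Hp): the node is 1.0 for states containing semen, 0.0 otherwise."""
    return sum(p for s, p in post.items() if "semen" in s)


uniform_prior = {s: 1 / len(STATES) for s in STATES}  # assumption, not the network prior
post = posterior(uniform_prior)
print(round(p_hp(post), 3))
```

Because the hypotheses node is a deterministic child, its posterior is simply the summed posterior probability of all cell type states that contain semen.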


Figure 10

Posterior probability distribution of [Cell type(s) in trace] if the evidence of a positive PSA test and a negative RSID saliva and tetrabase test is entered into the network.


Table 8

Example of a probability distribution for a DNA profile that matches a male suspect and a female victim. There is no indication of any more contributors in the profile.

No donor
Suspect (m)
Victim (f)
Unknown (m)
Unknown (f)
Suspect (m) + Victim (f)
Suspect (m) + Unknown (m)
Suspect (m) + Unknown (f)
Victim (f) + Unknown (m)
2 Unknowns (2m)
2 Unknowns (2f)
2 Unknowns (m+f)
>2 contributors

4. Bayesian DNA profile interpretation

4.1 Implementing DNA profile interpretation software in a Bayesian network

The evidential strength of a mixed DNA profile is traditionally reported as the likelihood that one or multiple person(s) of interest are contributor(s) to the sample. While the possibility of a fortuitous match is evaluated in this manner, not all possible situations are covered. This is usually unnecessary since observing a full DNA profile is extremely improbable given some situations (for instance, ‘the mixed profile is due to two unknown individuals’). However, due to the structure of our network all possible situations have to be considered. Moreover, in partial profiles these situations can become significant. Therefore a DNA profile is interpreted as a probability distribution which describes the exhaustive set of propositions pertaining to the DNA profile; that is, all possible donor combinations. In the distribution each possible contributor has two attributes associated with it: the first indicates whether they are a person of interest (Suspect or Victim), an Unknown person, or not present; the second indicates the sex. An example of an exhaustive set of donor combinations is given in Table 8. The probability of observing a full NGM profile given most of these states will be very close to zero. To avoid an infinite list of states, these can be collapsed into a single category (e.g. “>2 contributors”). If it is probable that three or more donors are present, “>2 contributors” can be split into sub-hypotheses. The probability distribution is implemented in a BN as states of [Individual(s) present in trace] (Figure 11).

It is possible to utilize DNA interpretation software to determine the probability distribution for [Individual(s) present in trace]. Much research has been invested in the probabilistic interpretation of DNA, and sophisticated models for DNA interpretation such as LRmix [50] or TrueAllele [51] have become commonplace. These models have eliminated the need for a stochastic threshold and are able to account for stochastic events such as drop-out, drop-in, peak height imbalance and stutter. With these kinds of models the probabilities associated with the states of [Individual(s) present in trace] can be calculated. For example, the software is used to calculate the probability of observing a DNA profile, given that the suspect and victim are the donors of the trace. Once the required distribution is obtained it can be entered into the network in a child node of [Individual(s) present in trace]: [DNA interpretation software] (Figure 11). The distribution is entered as conditional probabilities in one state (e.g. <Probability distribution from software>) which is then set as observed in the network, while the other state (e.g. <Complementary>) contains probabilities which are defined as one minus the probability of the other state. The probabilities in <Complementary> are not relevant for calculations in the network but are needed for technical reasons (so that columns sum to one). The conditional probabilities in <Probability distribution from software> will be used for calculations. An example of the NPT of [DNA interpretation software] is given in Table 9.


Table 9

Example of a NPT containing the probability distribution of a single DNA profile matching a suspect, with a random match probability of 0.05. Rows are the states of [DNA interpretation software]; columns are the states of [Individual(s) present in trace].

[DNA interpretation software]  <No donor>  <Suspect>  <Victim>  <Unknown>  <>1 contributor>
<LRmix>                        0           0.95       0         0.05       0
<Complementary>                1           0.05       1         0.95       1
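The two-state construction of Table 9 can be sketched as follows. The example builds the <Complementary> row, checks that every column sums to one, and shows how observing the software state updates a prior over [Individual(s) present in trace]. The uniform prior and all function names are assumptions introduced for illustration; they are not part of the network described here.

```python
STATES = ["No donor", "Suspect", "Victim", "Unknown", ">1 contributor"]
SOFTWARE_ROW = [0.0, 0.95, 0.0, 0.05, 0.0]  # <LRmix> row of Table 9


def build_npt(software_row):
    """Two-row NPT: the software row plus its complement, column by column."""
    complementary = [1.0 - p for p in software_row]
    return {"software": list(software_row), "complementary": complementary}


def posterior_given_software_state(prior, npt):
    """Posterior over donor states once the software state is set as observed."""
    joint = [pr * lk for pr, lk in zip(prior, npt["software"])]
    total = sum(joint)
    return [j / total for j in joint]


npt = build_npt(SOFTWARE_ROW)
uniform_prior = [1 / len(STATES)] * len(STATES)  # assumption for illustration
post = posterior_given_software_state(uniform_prior, npt)
for state, p in zip(STATES, post):
    print(f"{state}: {p:.3f}")
```

Only the software row enters the calculation; the complementary row never contributes because its state is never observed, which is why it serves a purely technical purpose.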

Table 10

Node probability table of the RMP nodes. Population frequencies were adopted from Westen et al. [52]. Rows are the states of [Unknown individual?]; columns are the states of [Allele frequency of the unknown individual].

[Unknown individual?]  <10>     <12>     <13>     …  <21>     <None>
<Yes>                  0.00024  0.00048  0.00192  …  0.00072  0
<No>                   0.99976  0.99952  0.99808  …  0.99928  1

Figure 11

Implementation of DNA interpretation software in a BN.

4.2 A network for DNA interpretation

It is also possible to interpret the DNA profile in a BN. A simple model for DNA interpretation is presented in this section (Figure 12). In this network it should become transparent how random match probabilities can be used to obtain a probability distribution of [Individual(s) present in trace]. We will use this model in case examples to demonstrate how DNA profile interpretation can function in a Bayesian setting. This network is able to analyze the DNA profiling results of one locus (vWA) and of the sex marker (Amelogenin). Because of the modular structure of the network it can be expanded to include multiple STR loci in a particular PCR amplification kit such as Identifiler®.

The model in Figure 12 assumes that no more than two contributors are present. The node probability tables of this network are case specific; in this particular example a profile matching a female victim (vWA genotype 16/17) and a male suspect (vWA genotype 15/16) is used. The root node [Individual(s) present in trace] contains the states listed in Table 8.

[Individual(s) present in trace] is then split into four divergent nodes, describing contributors #1 and #2 and the sex of contributors #1 and #2. The evidence from the profile peaks is entered in the bottom-most numbered vWA nodes for the vWA peaks and in the rectangular [AMEL X] and [AMEL Y] nodes for the sex marker. These nodes have a probability of 1 in state yes if the relevant allele of either of the two contributors is present (or if both are present). The contributor-specific vWA nodes, such as [VWA 15, #1], indicate whether an allele is donated by a single contributor. An allele can be donated by a known individual if it matches that individual's genotype, or it can be donated by an unknown person with the probability of a random match. The vWA genotype of the suspect is 15/16, hence the probability of [VWA 15, #1] given the suspect is equal to one, assuming a drop-out probability of zero. Similarly, the probability of [VWA 19, #1] given the suspect is equal to zero, assuming a drop-in
