
Finding Distant Relatives Using Commercial Genetic Tests and International Genealogical Research DNA Databanks


Academic year: 2021



Lydia Jochems

University of Amsterdam, 19 December 2017

Cohort: 2016-2018

Supervisor: Prof. Dr. Ate Kloosterman

Co-assessor: Prof. Dr. Marjan Sjerps

Institute: Nederlands Forensisch Instituut (NFI)

Over the last decade, the number of genetic tests available to the public has been growing. These direct-to-consumer (DTC) genetic tests have been of great help to donor-conceived individuals and adoptees. By using these companies’ databases, they have been able to find distant relatives and even their biological parent. In this paper, the methods of three major companies are reviewed, and the privacy and regulations concerning DTC genetic testing and its use in the forensic field are assessed. The methods used by the different companies are quite alike. The size of the companies’ databases, and the fact that they keep growing, makes them attractive for the forensic field. Y-chromosomal and mitochondrial DNA analyses are the most useful applications, as they can provide deep-rooted information on distant relationships. However, because the field has emerged so quickly, there seems to be a lack of proper regulation, resulting in privacy and ethical issues. The use in forensics could therefore be promising, but there is still a long way to go.


Table of Contents

INTRODUCTION
METHODS
FORENSIC RELEVANCE
THE SEARCH FOR BIOLOGICAL RELATIVES
THREE MAJOR COMPANIES
ANCESTRYDNA
FAMILY TREE DNA
23ANDME
OVERVIEW
PRIVACY AND REGULATIONS
POSSIBLE USE IN FORENSICS
DISCUSSION AND CONCLUSION
REFERENCES
LITERATURE
WEBSITES
APPENDIX


Introduction

Over the last decade, the number of publicly available genetic tests has been growing, and so have the companies’ databases (Harper et al., 2016; Phillips, 2016; Roberts & Ostergren, 2013). These tests are also called direct-to-consumer (DTC) genetic tests. They are advertised as tests to find out about your heritage, family history, ethnicity and risk of certain diseases, and many other types are emerging. One can easily purchase a test online, receive a test kit at home and send the samples back to the company for testing. Some dubious tests are becoming available as well, such as ‘infidelity’, ‘talent’ and ‘nutrigenetic’ tests (Phillips, 2016). The public has been very enthusiastic about the developments in commercial genetic testing. Some people even see access to their own genome as an individual’s right (Popovsky, 2010). The tests are getting cheaper as well: compared to the early days of commercial availability, the price has already decreased tenfold (Roberts & Ostergren, 2013). This easy accessibility has made the tests interesting for the general public. According to Gollust et al. (2012), the primary reasons to purchase a genetic test are curiosity and concerns about health risks.

Donor-conceived individuals have been showing an increased interest in commercial genetics (Crawshaw et al., 2016). One reason is that they want to obtain missing medical information. Other reasons are to find their donor and/or donor-siblings (individuals conceived by the same donor, who are in fact only half-siblings). As early as 2005, an attempt by a 15-year-old boy showed promising results (Motluk, 2005). By using a Y-chromosome test (by Family Tree DNA), he found others who were likely to be from the same paternal line as him. These ‘matches’ provided him with a surname. Together with the limited information he got from the donation clinic (the donor’s birthday) and another online search tool, this eventually led him to his biological father. There are more stories about searches like this, but the exact number of success stories is unknown (Harper et al., 2016).

Tracking down a sperm donor was already possible in 2005. With the developing technologies, one can only imagine how easy a search like this will be now, more than a decade later. The availability of these tests has many benefits, like those just mentioned, but it also comes with some risks. The risks are that lay people interpret their own results on matters that involve scientific research, and that one’s privacy can be violated (Majdik, 2009). Therefore, Harper et al. (2016) call it “The end of donor anonymity”.

Recently a Dutch woman also managed to find her biological father using DTC genetic tests (Stoffelen, 2017). She sent in samples to three companies: AncestryDNA, Family Tree DNA and 23andMe. She found a match with an Australian woman, who appeared to be a second cousin of her biological father. He was traced with the help of a family investigator, who generated a family tree from the Australian match. Examples like this, and that of the 15-year-old boy mentioned previously, show that distant relatives can provide enough information to find closer relatives as well.

In this review, this topic is investigated with respect to commercially available tests, the privacy issues that come with them and whether they can be used in forensics. First, a short introduction explains why people search for their relatives, mainly addressing the case of donor-conceived individuals. This angle was chosen because recent news items about searches by donor-conceived individuals were the trigger for writing this literature review and assessing the applicability in forensics. Then, some of the major companies on the market are described and their methods critically reviewed. Next, the general privacy issues of DTC genetic testing companies are discussed. As a final step, this information is used to assess the options for the forensic field. All these topics are addressed with respect to finding distant relatives. The research sub-questions addressed in this review are therefore:

• How do DTC genetic tests work?
• How are privacy and regulations handled in this field?
• Can the databases be used in forensic investigations?

The sub-questions will not each be answered in a single sentence. They are investigated, and the resulting information is used to ultimately answer the main research question of this review, which is:

• How can DTC genetic tests be used to find (distant) relatives and can this be used in the forensic field?

In this investigation my hypotheses are as follows: different methods can be applied to investigate this subject, but the basis would be the same; the methods differ from those used in forensics, because the initial aim of the investigation is different; the companies provide proper privacy regulations for their customers to trust their services; and the companies could be of use in the forensic field, because success stories have shown they can be used to find distant relatives.

Methods

In this review, the three sub-questions previously mentioned will be addressed with respect to the main subject of finding distant relatives using commercial genetic tests and international genealogical research DNA databanks. This will be done by first reviewing recent literature on the subject to get an overview of the information available. The organization of the research will also be determined based on this overview.

The literature will be searched for using the online University of Amsterdam (UvA) library database (http://lib.uva.nl/primo_library/libweb/action/search.do?vid=UVA, accessed during the months of November and December 2017). When investigating the methods of the different companies, the companies’ websites will be used as sources as well. The websites also provide the companies’ published (white) papers, which are relevant in this study. No start or end date of publication is used in the searches, because all the progress in this field has a very recent history.

When referencing, a paper is cited in APA style at its first mention. The sentences that follow may draw on the same paper, until another reference is given.

The exact search queries used, with some explanation, can be found in Appendix A. The literature found, the white papers and the information on the websites will be used to provide an overview of the current status of finding distant relatives using commercial genetic tests. The methods described by the companies will be critically assessed using relevant literature and knowledge obtained during the Master Forensic Science (UvA, Amsterdam).


Forensic Relevance

Commercial companies providing DTC genetic tests are getting more popular (Klotz, 2016). More use of these services also means growing databases. The databases of these companies are getting quite large, as can be seen in Table 5 (Harper et al., 2016). These databases can be interesting for the forensic field, as they contain a lot of information on a large population. The success stories of finding sperm donors through these databases show their capability of finding relatives. Even very distant relatives can be identified via Y-chromosomal and mitochondrial DNA testing. These techniques have been applied in the forensic field as well, where this is called a familial search.

If the forensic field were able to access the commercial databases, it could try to find matches (or near matches, to perform a familial search) with its samples. This would be of great value in unsolved cases, where no matches are found in the forensic databases. Forensics could also make use of the family/relative finding tools of the companies, which could provide tactical information to help find the source of the crime scene sample.

In short, the forensic field would be able to compare a much larger population to their samples and make use of the aim to search for relatives instead of full matches. However, the methods used by such companies differ somewhat from the forensic methods (Murphy, 2010).

An overview of the methods used by the companies provides insight into the commercial field and could help determine how to use these databases in forensics if deemed possible.

The search for biological relatives

Infertility is a problem that occurs all over the world (Agarwal et al., 2015). The World Health Organization (WHO) defines infertility as follows: “a disease of the reproductive system defined by the failure to achieve a clinical pregnancy after 12 months or more of regular unprotected sexual intercourse”. It appears to affect 8-12% of couples worldwide, but shows regional variability (Kumar & Singh, 2015). When infertility is determined by a doctor, a couple can choose between several treatment procedures. The use of donor gametes is one of the treatment options, and it is used in a large share of infertility cases (Harper et al., 2016). This is also called ‘third-party reproduction’ and can involve a sperm donor, in case of male infertility, or an oocyte donor, in case of female infertility.

When gametes are donated, the donor can be either anonymous or temporarily anonymous, depending on the country’s legislation. In the United Kingdom, for example, temporary anonymity is the national legislation (Klotz, 2016). According to this legislation the donor stays anonymous until the conceived child is 18 (or 16, depending on the country) years old. When this age is reached, the child can request information about its biological parent from the donor clinic. In the Netherlands and Germany donor anonymity legislation has changed to this temporary regime as well. In some European countries, like Belgium, France and Denmark, the anonymity of the donor is still protected by law (Borry et al., 2014). An argument for anonymity is that it protects the donor from being obliged to financial, moral or emotional involvement.

In most countries, legislation also includes a maximum number of children that may be conceived by one donor (Harper et al., 2016). In the United States of America there is no national legislation; the donor’s anonymity depends on the clinic’s policy (Klotz, 2016). Most of these clinics provide anonymous donations (Ravitsky, 2012). Due to the lack of national legislation in the USA, there are cases of donors fathering many children (Harper et al., 2016). The largest known group of donor-siblings consists of 150 people (Blyth, 2012).

Although many countries changed their legislation to the temporary anonymity regime, there are still many children who were conceived by anonymous donors before this change. In the Netherlands alone, approximately 40,000 donor children were born before this change (https://fiom.nl/, last accessed on 9 November 2017). Many of these children try to find information on their donor and/or donor-siblings. On the internet, there are websites which help donor-siblings find each other via the donor registration number, which is given by the clinics instead of a name to protect the donor’s privacy (Hertz & Mattes, 2011). Nowadays, with the growing number of genetic tests available to consumers, donor-conceived children increasingly make use of DTC genetic tests for genealogy.

The companies providing these tests were not originally set up for seeking donors and donor-siblings. The general aim of these tests is to provide information about people’s ancestry or health (Harper et al., 2016). To determine genealogy, the Y-chromosomal and mitochondrial haplotypes are used (Borry et al., 2014). Autosomal DNA is increasingly used as well to obtain ancestry information. This is then based on the genetic variation in different populations.

The rise of such consumer genetic tests started in a clinical context. These tests compare an individual’s genotype to known genetic markers for diseases (Popovsky, 2010) and indicate a person’s risk of specific diseases. Of course, this could be very useful, but it may also lead to exaggerated conclusions by users, as they are not scientific experts. Nowadays medical genetic tests are not the only ones available DTC: at the start of 2016 there were already 246 companies providing online DNA tests (Phillips, 2016). The purposes of these tests vary greatly, and it is astonishing to see what kinds of tests are already out there. There are nutrigenomics tests, which try to provide you with a tailored diet based on your genes. Other companies claim to check your DNA for genes that tell you which athletic training is best for your body specifically; these are called ‘athletic ability’ tests. Four companies already offer child ‘talent’ testing, in which 46 traits are tested to estimate a child’s character. Quite a few companies also offer so-called ‘infidelity’ tests, in which the customer sends in samples taken from bedsheets and/or clothing to check their spouse for infidelity. These tests raise a lot of questions and are most of the time not properly validated (Phillips, 2016). However, it is clear there is a demand for genetic testing by consumers (Popovsky, 2010). The difference with the available genealogy tests is that the latter can be based on scientific research. Researchers are already investigating human genealogy, so the basis of valid and accurate techniques is already there.

Researchers have also investigated what motivates the donor-conceived to find information about their biological relatives. Hertz and Mattes (2011) performed a survey among families of donor-conceived children (N=356) to gain insight into their motivation to connect with each other.
The main reasons given by these families are that they are interested in:

• The characteristics that the donor-siblings might share
• The possibility of an extended family
• Knowing them in case of medical needs
• A relationship with the donor’s other children

The general drive behind this search is the need to know one’s identity, which is a psychological process many donor-conceived individuals go through (van den Akker et al., 2014). Most donor-conceived individuals also feel that it is their right to know this information. Studies even indicate that it is important for a person’s well-being to know about their biological parents or origin (Hoglund-Shen, 2017).

Three Major Companies

In this section, the methods of three major companies are described and critically reviewed. Because of time constraints, not all companies providing DTC genetic tests are covered. The companies discussed were chosen based on their size and usage. The aim is to highlight the largest and most used companies, based on mentions in the relevant literature (Crawshaw, 2017; Harper et al., 2016; Hoglund-Shen, 2017; Phillips, 2016; Popovsky, 2010).

AncestryDNA

At AncestryDNA, the autosomal DNA in saliva samples is tested using microarrays (https://www.ancestry.com, last accessed on 18 December 2017). Their service is available in 35 countries all over the world. They own a database consisting of over 6 million customers, in which family names can be searched as well. This can already provide genealogy information without DNA testing, by just searching for a surname. However, offspring of anonymous donors usually do not know their biological parent’s surname, which is part of the reason they search using DNA tests.

The genome is analyzed on more than 700,000 markers (https://www.ancestry.com, last accessed on 18 December 2017). These markers are Single Nucleotide Polymorphisms (SNPs) and indels. AncestryDNA uses the data from the more than 6 million people in its database and population data of more than 150 global regions to provide the customer with an ethnicity estimate. They use autosomal DNA, because this does not limit the result to the paternal or maternal line and makes it as effective for men as for women. This sounds more like a commercial choice than a decision based on effectiveness, because the Y-chromosome (for the paternal line) and the mitochondrial DNA (for the maternal line) could provide much deeper-rooted ancestral information than the autosomal DNA (Kayser & De Knijff, 2011; Wei et al., 2013). When you order a test at AncestryDNA, a sample kit is sent to you with instructions on how to take the sample, package it and send it back (https://www.ancestry.com, last accessed on 18 December 2017).

When ordering a test, you have to choose between different types of memberships with different prices. If you only want to make use of one type of test, this is equal to buying a one-month membership. The memberships available are: “U.S. discovery”, which only gives access to the US data for comparison; “World Explorer”, which gives access to their worldwide data for comparison; and “All Access”, which provides some extras (https://www.ancestry.com, last accessed on 18 December 2017). The extras include access to fold3.com, containing more than 500 million military records, and access to newspapers.com, a database of more than 200 million historical newspapers. Both websites are owned by ancestry.com. They can provide additional interesting information about your family’s past, when available. So, this company mostly aims at people who are investigating their ancestors’ past.

The specific methods for ancestry determination will not be discussed, because we are more interested in the matching tool. This can actually help find distant relatives.


When searching for relatives, the matching tool can be used (https://support.ancestry.co.uk, last accessed on 21 November 2017). Your DNA is compared to that of the other users, and this comparison is updated regularly because more people are added to the database. The relationship with the match is determined based on the amount of DNA shared. This is expressed in centimorgans (cM), a unit of genetic distance that reflects the chance of recombination occurring (Deza & Deza, 2006). The larger the stretch of DNA, the bigger the chance of recombination occurring. When two people nevertheless share such a large stretch, this must be due to a certain degree of relatedness. AncestryDNA determines the degree of relatedness based on the number of shared centimorgans. Table 1 explains how centimorgans are translated into a possible relationship. As can be seen, the ranges do not cover all distances: there are gaps between them, and these gaps grow as the relationships get closer. For example, when sharing 2200 cM of DNA, it cannot be determined whether the relationship is full sibling or grandparent/aunt/uncle/half-sibling. These ranges were established by analyzing the genetic data retrieved from pairs of individuals with a known relationship (Ball et al., 2016). Additionally, the number of meioses occurring between two individuals is taken into account. The more meioses between two individuals, the more variation is present in the DNA they inherited from their common ancestor. Therefore, the number of meioses is compared to the amount of DNA shared due to common ancestry. Such shared DNA is called ‘Identical by Descent’ (IBD).

Using the genotypes of 24362 customers sharing less than 20 cM IBD, reproduction events were simulated (https://support.ancestry.co.uk, last accessed on 21 November 2017). The simulated relationships range from child to 10th cousin. The results show that the more meiosis events separate two relatives, the less IBD is observed and the more the amount of IBD varies. This means the relation estimate comes with more uncertainty when the relation gets more distant. Therefore, the customer is provided with a range for the relationship, for example ‘3rd to 4th cousin’.

Table 1. How centimorgans are translated into a family relation by AncestryDNA (edited from: https://support.ancestry.co.uk, last accessed on 21 November 2017).

Shared amount of DNA in centimorgans (cM)    Possible relationship
3475                                         Parent, child or identical twin
2400 – 2800                                  Full sibling
1450 – 2050                                  Grandparent, aunt, uncle, half-sibling
680 – 1150                                   1st cousin, great-grandparent
200 – 620                                    2nd cousin
90 – 180                                     3rd cousin
20 – 85                                      4th cousin
6 – 20                                       Distant cousin
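
As an illustration of how Table 1 can be read, the following minimal Python sketch looks up the possible relationships for a given amount of shared DNA. The ranges are copied from Table 1; the function name `possible_relationships` and the data layout are assumptions made for this example and are not taken from AncestryDNA.

```python
from typing import List, Tuple

# (lower bound in cM, upper bound in cM, relationship) rows copied from Table 1.
TABLE_1: List[Tuple[float, float, str]] = [
    (3475, 3475, "Parent, child or identical twin"),
    (2400, 2800, "Full sibling"),
    (1450, 2050, "Grandparent, aunt, uncle, half-sibling"),
    (680, 1150, "1st cousin, great-grandparent"),
    (200, 620, "2nd cousin"),
    (90, 180, "3rd cousin"),
    (20, 85, "4th cousin"),
    (6, 20, "Distant cousin"),
]


def possible_relationships(shared_cm: float) -> List[str]:
    """Return every Table 1 relationship whose range contains shared_cm.

    An empty list means the value falls in a gap between two ranges,
    so the table alone does not decide between the neighbouring rows.
    """
    return [label for low, high, label in TABLE_1 if low <= shared_cm <= high]


if __name__ == "__main__":
    for cm in (2500, 2200, 20):
        print(cm, possible_relationships(cm))
```

A value of 2200 cM returns an empty list, which corresponds to the gap between the full-sibling range and the grandparent/aunt/uncle/half-sibling range discussed in the text.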

After the relationship has been predicted based on IBD, this prediction is combined with IBD2 data to provide a more accurate estimate of close relationships. IBD2 is an estimate of the proportion of IBD in the genome (https://support.ancestry.co.uk, last accessed on 21 November 2017). This is used because close relations (parent-child, twins, siblings and half-siblings) show a large overlap in the amount of IBD that is shared, but together with the IBD2 data almost 100% accuracy can be achieved. This explains the gaps in the ranges of Table 1 mentioned earlier. In their IBD2 versus IBD data, there is still an area of overlap between full siblings and half-siblings. This shows that this distinction is the hardest to make and that it can never be made with 100% accuracy.

Because the AncestryDNA database keeps growing, comparison needs to be performed continuously (Ball et al., 2016). This means that after using their services once, new matches can occur even years later. AncestryDNA developed its own software for the comparison. This software is called J-GERMLINE and is adapted from the GERMLINE software by Gusev et al. (2009). The difference between the two programs is that GERMLINE detects IBD in all samples at the same time, whereas J-GERMLINE is designed to compare the samples in the database to the newly entered samples in a step-by-step manner. Ball et al. (2016) have shown that for larger databases the processing time increases more rapidly when using GERMLINE than when using J-GERMLINE.

Before the comparison can start, genotype phasing needs to be performed (Ball et al., 2016). This assigns the alleles of the analyzed SNPs to one of the two copies of a chromosome. Phasing is performed by the Underdog algorithm, adapted from the BEAGLE algorithm. By comparing runs with test sets, AncestryDNA shows that their adapted version works more accurately.

Then, each chromosome is divided into segments of 97 SNPs, which are called “windows”. These windows are compared between individuals. When all SNPs in a window match between two individuals, this is called a “seed match”. The match is then extended as far as possible in both directions on the chromosome. The match ends when a homozygous mismatch occurs or when the chromosome ends. Then the length of the matching segment is measured in genetic distance (centimorgans). If the extended “seed match” is longer than 6 cM, it is reported as a match. This cut-off was chosen because a lower cut-off would result in too much data to store, as decreasing the length results in an exponentially increasing number of matches, and because the accuracy decreases with decreasing length. Ball et al. (2016) show with an experiment that the results are less accurate with a lower IBD cut-off point.
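
The seed-and-extend procedure described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not AncestryDNA's implementation: genotypes are represented as unphased allele counts (0, 1 or 2) instead of phased haplotypes, a seed is taken to be a window in which both genotype vectors are identical, and the names `find_segments` and `is_homozygous_mismatch` are hypothetical. The window size of 97 SNPs and the 6 cM cut-off are taken from the text.

```python
from typing import List, Sequence, Tuple

WINDOW_SIZE = 97      # SNPs per window, as stated in the text
MIN_SEGMENT_CM = 6.0  # reporting cut-off in centimorgans, as stated in the text


def is_homozygous_mismatch(a: int, b: int) -> bool:
    """Opposite homozygotes (0 versus 2) cannot share an allele."""
    return {a, b} == {0, 2}


def find_segments(
    geno_a: Sequence[int],
    geno_b: Sequence[int],
    cm_pos: Sequence[float],
    window: int = WINDOW_SIZE,
    min_cm: float = MIN_SEGMENT_CM,
) -> List[Tuple[int, int, float]]:
    """Return (first SNP index, last SNP index, length in cM) per reported segment."""
    n = len(geno_a)
    segments = []
    covered_until = -1  # last SNP index already inside an extended segment
    for start in range(0, n - window + 1, window):
        end = start + window  # exclusive end of this window
        if start <= covered_until:
            continue  # this window was already absorbed by an earlier extension
        if list(geno_a[start:end]) != list(geno_b[start:end]):
            continue  # not a seed match
        # Extend to the left until a homozygous mismatch or the chromosome start.
        left = start
        while left > 0 and not is_homozygous_mismatch(geno_a[left - 1], geno_b[left - 1]):
            left -= 1
        # Extend to the right until a homozygous mismatch or the chromosome end.
        right = end - 1
        while right < n - 1 and not is_homozygous_mismatch(geno_a[right + 1], geno_b[right + 1]):
            right += 1
        covered_until = right
        length_cm = cm_pos[right] - cm_pos[left]
        if length_cm > min_cm:
            segments.append((left, right, length_cm))
    return segments


if __name__ == "__main__":
    # Toy chromosome with 9 SNPs, windows of 3 SNPs and 1 cM between SNPs.
    a = [0, 1, 2, 1, 1, 0, 2, 2, 0]
    b = [2, 1, 2, 1, 1, 0, 2, 0, 0]
    cm = [0, 1, 2, 3, 4, 5, 6, 7, 8]
    print(find_segments(a, b, cm, window=3, min_cm=4.0))
```

In the toy example the middle window is a seed match, which is extended in both directions until the opposite homozygotes at the first and eighth SNP stop it.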

The detection of matches is based on the degree of IBD (Ball et al., 2016). However, this similarity may be due to other factors, like selection pressure (Albrechtsen et al., 2010). To be able to estimate relatedness from IBD segments correctly, the informativeness of each segment is determined by an algorithm called “Timber” (Ball et al., 2016). In each “window”, the algorithm looks at the IBD segments longer than 6 cM and compares them to a reference panel of 325,932 genotypes. Most samples are expected to show low IBD with the reference panel, as most individuals are not closely related. If there is more IBD, or even spikes of IBD, at some locations between the sample and the reference panel, this could be explained by other factors such as demographics or migration patterns. If these spikes can be explained by such factors, they are less useful in the relationship estimates. Therefore, the Timber algorithm looks at the unusually high matching rates and reduces the genetic distance of the segments concerned. The altered distances are the “Timber scores”. Timber is only used for matches with an IBD < 90 cM. This cut-off is applied because the algorithm only looks at the samples per window, so it does not consider IBD segments that extend over multiple windows. An IBD segment that is spread over multiple windows covers a large part of the chromosome. In that case one can be quite confident that it came from a common ancestor, and it would not make sense to run it through Timber. Smaller IBD segments come with less confidence of common descent, which is why the cut-off point of < 90 cM is used. The algorithm therefore improves the estimates of very distant relations (5th cousin or more distant).
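
The idea of down-weighting windows with unusually high matching rates can be illustrated with the following Python sketch. It is not the Timber algorithm itself: the weighting rule (the median match count divided by the window's match count, capped at 1) and the names `window_weights` and `adjusted_length` are assumptions chosen only to make the principle concrete. The 90 cM limit is taken from the text.

```python
from statistics import median
from typing import List

TIMBER_MAX_CM = 90.0  # the adjustment is only applied below this length, per the text


def window_weights(match_counts: List[int]) -> List[float]:
    """Hypothetical weighting: windows with more matches than the median count less.

    match_counts[w] is the number of reference-panel matches observed in window w.
    """
    typical = max(median(match_counts), 1)
    return [min(1.0, typical / count) if count > 0 else 1.0 for count in match_counts]


def adjusted_length(
    window_cm: List[float],
    windows_in_segment: List[int],
    weights: List[float],
) -> float:
    """Sum the genetic length of a segment, scaling each window by its weight."""
    raw = sum(window_cm[w] for w in windows_in_segment)
    if raw >= TIMBER_MAX_CM:
        return raw  # long segments are left unchanged
    return sum(window_cm[w] * weights[w] for w in windows_in_segment)


if __name__ == "__main__":
    counts = [10, 10, 100, 10]       # window 2 shows a spike of matches
    lengths = [2.0, 3.0, 4.0, 1.0]   # genetic length of each window in cM
    weights = window_weights(counts)
    print(weights)
    print(adjusted_length(lengths, [1, 2], weights))
```

In this toy example the segment spanning windows 1 and 2 has a raw length of 7 cM, but because window 2 matches ten times more often than is typical, its contribution is scaled down and the adjusted length is considerably shorter.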

By contacting your close matches, you can establish a family tree together. If the search gets too difficult, for example because your matches are too distant, AncestryDNA offers the services of a professional genealogist (https://www.progenealogists.com, last accessed on 21 November 2017). They also mention that this is suitable for adoptees searching for their biological family. The professional genealogists do not only make use of DNA. They gather old pictures, newspapers, letters, etcetera to get any clues on the family history. A donor-conceived individual or an adoptee obviously does not have this kind of information available. How this service actually works in these types of cases is not explained, and it does not seem feasible.


Family Tree DNA

Family Tree DNA (FTDNA) provides insight into your own genealogy, history and ancestry. The company distinguishes itself by offering various types of tests. Their database consists of 918162 records (https://www.familytreedna.com, last accessed on 18 December 2017). When making use of their services, you can decide yourself whether you want your DNA to be compared to the whole database or just to your own project of sent-in samples.

When the DNA is compared to the database, the matches come with a surname and an email address, so that matches are able to contact each other (https://www.familytreedna.com, last accessed on 18 December 2017). Another advantage of FTDNA is that it works together with National Geographic’s Genographic project: results from either of the two can be uploaded into the other’s database for comparison. Both parties benefit from this, as each database is enlarged.

FTDNA offers three different types of tests, which will be explained thoroughly (https://www.familytreedna.com, last accessed on 18 December 2017). With all tests, you get access to four tools. The first is “myOrigin”, which provides details on where your ancestors came from. This is shown in the form of ethnicity percentages per region. “ancientOrigins” shows your ancestors’ migration paths and how much DNA you still share with them. “Family Matching” is the tool which compares your DNA to that of others in the database and provides match results. The last tool is the “chromosome browser”, which gives a more detailed look at the matching segments in your results from the “Family Matching” tool. The last two are of course the most interesting when searching for (distant) relatives.

The “Family Finder” test analyzes autosomal DNA, which is compared to the database to provide match and ancestry information (https://www.familytreedna.com, last accessed on 18 December 2017). The test compares 696800 SNPs of the autosomal DNA on a microarray. It is then calculated how two people are related, taking the size and number of matching DNA fragments into account. With the results of a comparison of two individuals, information is provided on the most likely relation between these individuals. This is more accurate for closer relations than for more distant relations. A match is considered to be IBD when a DNA segment shows at least 500 matching adjacent SNPs and is at least 1 cM long. The genetic distance, in cM, is determined using data of the ‘International HapMap Project’. The results are divided into four categories of matches: immediate, close, distant and speculative. The categories are explained in Table 2.

Table 2. The categories of matches provided by Family Tree DNA (edited from: https://www.familytreedna.com, last accessed on 13 November 2017).

Category               Includes
Immediate matches      parents, children, full siblings, grandparents, grandchildren, half siblings, aunts, uncles, nieces and nephews
Close matches          1st cousins, 2nd cousins, 3rd cousins, half siblings, grandparents, aunts, uncles, nieces and nephews
Distant matches        2nd cousins, 3rd cousins, 4th cousins and 5th cousins
Speculative matches    4th cousins, 5th cousins and remote cousins

The ranges of these categories show some overlap. The reason for this is that they show both the most likely relations and the possible relations (https://www.familytreedna.com, last accessed on 18 December 2017). A range of relationships is given because the more distant the relation, the more changes have occurred. This makes it harder to pinpoint the specific relation, because the more distant the relation, the higher the chance of similarity occurring by chance. So, when you have a close match which is most likely to be a 1st cousin, this also means that 2nd cousins, 3rd cousins, half siblings, grandparents, aunts, uncles, nieces and nephews cannot be excluded as the relation with the match.

The Y-DNA test can be ordered in three different variants: “Y-37”, “Y-67” and “Y-111” (https://www.familytreedna.com, last accessed on 18 December 2017). The numbers reflect the number of Short Tandem Repeats (STRs) that are investigated during the analysis. This test provides information on genealogy, history and ancestry. The database provides surnames of matches, which makes it possible to trace your paternal family origin. The more STRs are investigated, the more refined the result will be. The results show the haplogroup of your Y-DNA, together with its migration pattern. The Y-DNA test includes “the SNP assurance program”, which guarantees a result. If FTDNA fails to deliver a good estimate (100% confidence) of your haplogroup using Y-STRs, it offers to perform an additional SNP test. This sounds like a logical back-up plan, since SNPs are more specific on a geographical scale than STRs, so this could provide a better estimate of your haplogroup (Hammer et al., 2006). Y-STR mutations might be too recent to determine the haplotype, because they can be reversed over a timeframe of 200 thousand years (Rozhanskii & Klyosov, 2011). Y-SNPs have a lower mutation rate, which can help determine deep-rooting haplotypes (Wei et al., 2013).

The Y-STRs can also be used to find matches with others in the database (https://www.familytreedna.com, last accessed on 18 December 2017). The level of the relationship depends on the number of matching markers and their genetic distance. FTDNA provides a table explaining which relationship level is probable at which genetic distance for the different Y-STR tests. This information is shown in Table 3. To determine your relatedness with a Y-STR DNA match, you need to look at the genetic distance. In this context, genetic distance is the number of non-matching STRs (it is not expressed in cM). For example, when using the Y-37 test and 33 out of 37 STRs match, the genetic distance would be 4. This would fall under the category ‘probably related’. The company does not explain how these categories and their corresponding genetic distances have been established. To obtain such information, the genetic distances in a test population of known relationships must have been investigated.

Table 3. Expected relatedness with Y-STR matches based on the genetic distance of the match (edited from: https://www.familytreedna.com, last accessed on 18 December 2017)

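| Relationship level | Y-37 | Y-67 | Y-111 |
|---|---|---|---|
| Very tightly related | 0 | 0 | 0 |
| Tightly related | 1 | 1 - 2 | 1 - 2 |
| Related | 2 - 3 | 3 - 4 | 3 - 5 |
| Probably related | 4 | 5 - 6 | 6 - 7 |
| Only possibly related | 5 | 7 | 8 - 10 |
| Not related | 6 | > 7 | > 10 |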
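The way Table 3 is read can be sketched in a few lines of Python. This is only an illustration of the lookup, not FTDNA's implementation: the function names are made up here, genetic distance is taken as the plain count of non-matching markers described above, and reading the Y-37 ‘Not related’ entry (6) as ‘6 or more’ is an assumption.

```python
# Upper bound of the genetic distance for each category, transcribed from Table 3.
# Assumption: anything above the last bound counts as "Not related".
TABLE_3 = {
    37: [(0, "Very tightly related"), (1, "Tightly related"), (3, "Related"),
         (4, "Probably related"), (5, "Only possibly related")],
    67: [(0, "Very tightly related"), (2, "Tightly related"), (4, "Related"),
         (6, "Probably related"), (7, "Only possibly related")],
    111: [(0, "Very tightly related"), (2, "Tightly related"), (5, "Related"),
          (7, "Probably related"), (10, "Only possibly related")],
}


def genetic_distance(profile_a, profile_b):
    """Count the markers present in both profiles whose repeat numbers differ."""
    shared_markers = profile_a.keys() & profile_b.keys()
    return sum(1 for marker in shared_markers if profile_a[marker] != profile_b[marker])


def relatedness(distance, panel):
    """Return the Table 3 category for a genetic distance on a given panel (37, 67 or 111)."""
    for upper_bound, label in TABLE_3[panel]:
        if distance <= upper_bound:
            return label
    return "Not related"


# Example: a genetic distance of 4 on the Y-37 test.
print(relatedness(4, 37))
```

The same distance is interpreted differently per panel: a distance of 4 is ‘Probably related’ on Y-37 but ‘Related’ on Y-67 and Y-111.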

The mitochondrial DNA test can be ordered in two variants, “mtDNA Plus” and “mtFull Sequence” (https://www.familytreedna.com, last accessed on 18 December 2017). The “mtDNA Plus” test investigates two hypervariable regions (HVR1 & HVR2) using SNPs. The HVRs are also commonly used in a forensic context (Butler, 2009). They are suitable for such analyses because they lie in the ‘control’ region of the mtDNA, which contains the origin of replication but no gene-encoding DNA. An additional HVR, HVR3, can be used as well, but this one is not mentioned by FTDNA. The reason for this could be that HVR1 and HVR2, which are larger than HVR3, already provide enough information for FTDNA to determine the desired outcome (Butler, 2009).

According to FTDNA, when a match (an individual with the same mtDNA haplogroup) is found on HVR1, this means that you share a common ancestor with that match within the last 52 generations at a 50% confidence interval (https://www.familytreedna.com, last accessed on 18 December 2017). When a match is found in both HVRs, this means that you share a common ancestor with that match within the last 28 generations at a 50% confidence interval. However, they do not provide any references for these data or an explanation of what they are based on. It seems logical that the combination provides information on more recent generations, because a larger matching region indicates less divergence.

The “mtFull Sequence” test also investigates the two hypervariable regions, but adds the coding regions. A match provided by this test means that you share a common ancestor with that match within the last 5 generations at a 50% confidence interval. The 50% (of both tests) seems quite low, because it means that a common ancestor within the stated number of generations is as likely as not. They also provide a 95% confidence interval for this analysis, which states that you share a common ancestor with your match within the last 22 generations.

The coding region can be used in addition to the two HVRs. Because of its lower mutation rate, it accumulates less variation (Kayser & De Knijff, 2011). In this way, the addition of the coding region could provide information about more deep-rooted ancestry.

No theory or white paper explaining the methods and/or confidence intervals was found.

23andMe

23andMe offers tests to determine your ancestry, find matches in their database and discover traits (https://www.23andme.com/, last accessed on 18 December 2017). It is one of the biggest and most well-known companies (Harris et al., 2013). It gained its popularity by providing medical reports that showed which risk variants you carry (Hoglund-Shen, 2017). In 2013, this service was shut down by the Food and Drug Administration (FDA). Recently (April 2017), the company regained some of its rights to provide medical insight to its customers (https://blog.23andme.com, last accessed on 29 November 2017). This is not the important aspect when searching for relatives, although it could support an estimated relationship when both individuals carry a rare disease allele.

Their database stores the genetic information of approximately 1.2 million individuals and keeps on growing as more tests are purchased (https://www.23andme.com/, last accessed on 18 December 2017). They analyze saliva samples sent in by the customer, which need to contain a minimal quantity of approximately 2 milliliters (Shah, 2014).

The ‘DNA relatives’ tool, which was launched in 2009, can be used to find (distant) relatives (https://www.23andme.com/, last accessed on 18 December 2017). This tool is optional and you need to actively opt-in to use the tool. You can opt-out any time you like.

To provide an estimate of ancestry and of the genealogical relation between two individuals, the autosomal DNA, the X and Y chromosome(s) and the mitochondrial DNA are analyzed using microarray genotyping. The autosomal DNA is analyzed on 110.000 to 880.000 SNPs. No explanation is provided on when which number of SNPs is actually analyzed. This is a bit dubious, because one would expect the same analysis to be performed on every sample that is sent in. If this is not the case, there should be an explanation, which 23andMe does not provide. What they could mean is that the analysis is performed on 880.000 SNPs, but that not all SNPs can always be determined in a sample, so that the number of determined SNPs ranges between 110.000 and 880.000.

The SNP locations are based on the NCBI’s human genome assembly (https://customercare.23andme.com, last accessed on 8 December 2017). The X and Y chromosomes and the mitochondrial DNA are analyzed to determine haplogroups. For these analyses, SNPs are investigated as well. No specific numbers of SNPs are mentioned for the different analyses. For the mitochondrial DNA, it is mentioned that the SNPs are located in the coding regions and in the hypervariable regions.

In the ‘DNA relatives’ tool, your profile is by default set to anonymous. When you want to get in contact with your matches, you need to request them to share information (Bettinger, 2016). Until then you can only see their sex, the estimated relationship and haplogroups.

23andMe defines relatives as individuals who share a common ancestor within 8 generations (https://www.23andme.com/, last accessed on 18 December 2017). Related individuals are detected if a region of at least 7 cM shows at least 700 matching adjacent SNPs. The percentage of shared DNA and the number of segments is used to estimate the relationship.

The company provides a probability of finding different degrees of relationships, as shown in Table 4. It is logical that the more distant the relationship, the lower the probability of detecting it, because less DNA is shared. However, in absolute numbers an individual has more distant relatives than close ones, so distant matches would still outnumber close ones.

Table 4. Probabilities of detecting certain degrees of relationships when using 23andMe (edited from: customercare.23andme.com, last accessed on 30 November).

| Relationship | Probability of Detecting |
|---|---|
| 1st Cousin or closer | ~ 100% |
| 2nd Cousin | > 99% |
| 3rd Cousin | ~ 90% |
| 4th Cousin | ~ 45% |
| 5th Cousin | ~ 15% |
| 6th Cousin | < 5% |

The ancestry estimate is also taken into account when individuals are matched. This estimate shows which proportions of your DNA come from which of the 31 worldwide populations (https://www.23andme.com/, last accessed on 18 December 2017). It is obtained by comparing your DNA to that of 10.418 reference individuals with a known ancestry. Most of these people are customers who can show that all four of their grandparents were born in the same country and who agreed to participate in research. The other references are obtained from public data. Before the analysis starts, the chromosomes are ‘phased’, which means it is determined which SNP alleles lie together on the same copy of a chromosome and were therefore inherited from the same parent. For this, 23andMe uses their own version of the BEAGLE software, adapted from Brian Browning. This software can estimate which alleles originate from the same parent, based on haplotype frequencies (Browning & Browning, 2011). After phasing, the chromosomes are separated into windows of 100 SNPs. The described method is similar to that of AncestryDNA, but instead of 97 SNPs per window 23andMe uses 100. Every window is assigned a population with a certain confidence. The detected matches can be sorted and filtered, so that you can find the matches with the relationship you are interested in.

Assigning the populations is performed using a support vector machine (SVM). This system is trained on the examples of the reference group and then applies this trained knowledge to assign a population to every window of the analyzed DNA. ‘Smoothing’ is also performed on the created windows. This means the assigned populations are analyzed, and if a sudden unexpected population is assigned to a window (an error within a chromosome), this is corrected. Another error that the ‘smoothing’ process corrects is a switch error. This occurs when the system switches the DNA originating from the mother with that from the father (an error between chromosomes). 23andMe states that the SVM is the best performing system of all that they have tested (but does not mention which systems were tested). The system also works fast, which is of great importance in a fast-growing database. It seems to be somewhat similar to the ‘Timber’ algorithm used by AncestryDNA, which also corrects for errors.
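To make the windowing and ‘smoothing’ steps more concrete, a deliberately simplified sketch is given below. It is not 23andMe's algorithm: the population labels are assumed to have been assigned already (for instance by a classifier), the function names are invented for this illustration, and the smoothing rule (relabel a single window that disagrees with two agreeing neighbours) is only a stand-in for whatever correction the company actually applies.

```python
def split_into_windows(snps, size=100):
    """Cut a phased chromosome (a sequence of SNP calls) into consecutive windows."""
    return [snps[i:i + size] for i in range(0, len(snps), size)]


def smooth(labels):
    """Relabel an isolated window whose population differs from two agreeing neighbours.

    Simplified stand-in for the 'smoothing' step: only single-window outliers
    are corrected, and the first and last window are left untouched.
    """
    smoothed = list(labels)
    for i in range(1, len(labels) - 1):
        if labels[i - 1] == labels[i + 1] != labels[i]:
            smoothed[i] = labels[i - 1]
    return smoothed


# Hypothetical per-window assignments along one chromosome.
assigned = ["Dutch", "Dutch", "Japanese", "Dutch", "Dutch", "Italian", "Italian"]
print(smooth(assigned))
```

In this toy example the single ‘Japanese’ window is treated as an error and relabelled, whereas the run of two ‘Italian’ windows is kept as a genuine change of ancestry along the chromosome.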

When testing the reference population, a few populations were overrepresented in the results, which indicates some systematic bias. Therefore, a re-calibration step is incorporated, so that the results assign each of the populations in proportion to their real occurrence.

The last step of the ancestry estimate is applying a threshold. 23andMe allows you to apply your own threshold, which can range from a confidence level of 50% (speculative) to 90% (conservative). The result is shown in a “Chromosome Painting”, which is a graph showing the assigned ancestry for each segment of the chromosome.

The estimates, of course, become more refined when known close relatives are added to the database and connect with you. This is because the ‘smoothing’ then removes errors more successfully and the phases of the chromosomes can be determined more accurately.

Two additional tools, ‘Relatives in Common’ and ‘Shared DNA’, can help you learn more about your shared family history (https://customercare.23andme.com, last accessed on 8 December 2017). ‘Relatives in Common’ shows the relatives you have in common with a match and their estimated relationship. It also indicates whether these are from the same lineage or not. ‘Shared DNA’ shows which regions of the DNA are shared with the relatives in common. The shared regions are shown as IBD. Half IBD expresses the shared amount relative to the total DNA, and the reported percentage of shared DNA is based on it. Combining all this information can help in constructing a family tree and identifying a recent common ancestor.

Before reporting the results to the customer, the quality of the results is reviewed (https://customercare.23andme.com, last accessed on 8 December 2017). When reviewing the data, the call rate and accuracy are assessed. If the call rate of the whole analysis is below the threshold, the sample is reanalyzed. The value of this threshold is not explained. The accuracy assessment involves simple checks like comparing the reported sex to the genotyped sex.
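The two checks described above can be illustrated with a short sketch. Because 23andMe does not publish its threshold, the value of 0.98 used below is purely hypothetical, as are the function names and the convention that a failed call is stored as None.

```python
def call_rate(genotypes):
    """Fraction of SNPs for which a genotype could be called (None marks a no-call)."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def passes_review(genotypes, reported_sex, genotyped_sex, min_call_rate=0.98):
    """Apply both checks: call rate above a (hypothetical) threshold and matching sex."""
    return call_rate(genotypes) >= min_call_rate and reported_sex == genotyped_sex


# Toy sample: 4 SNPs, one of which failed to produce a call.
sample = ["AA", None, "AG", "GG"]
print(call_rate(sample))
```

A sample that fails the call-rate check would be reanalyzed, while a mismatch between reported and genotyped sex points to a possible sample mix-up.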

On their website, white papers can be found, but these all concern their research and how they determine the way certain traits relate to diseases and phenotype (https://www.23andme.com/, last accessed on 18 December 2017).

Overview

In addition to their own services, the companies also allow people to upload their raw data to other companies’ databases, to find out even more information (Crawshaw, 2017). Through this data sharing, one could find additional matches that are not present in the original company’s database. However, the companies do not use the exact same methods.

In Table 5, an overview is shown of the characteristics of the three different companies.

Table 5. An overview of the information on the different DTC genetic tests provided by different companies, showing their size (estimated number), result time, which samples are taken, the type of test, how they match individuals, the price of the test (the normal price without discounts), some options that are available and general privacy. This is based on the information available November-December 2017 (https://www.23andme.com/, last accessed on 18 December 2017; https://www.ancestry.com, last accessed on 18 December 2017; https://www.familytreedna.com, last accessed on 18 December 2017; Crawshaw, 2017; Harper et al., 2016; Hoglund‐Shen, 2017; Phillips, 2016; Popovsky, 2010).

| Company | Database size | Results in | Sample | Type of test | Matching based on | Matching criteria | Costs | Privacy |
|---|---|---|---|---|---|---|---|---|
| AncestryDNA | ~6.000.000 | 6 weeks | saliva | autosomal SNPs and indels | IBD compared to simulations combined with proportional IBD | 97 matching adjacent SNPs > 6 cM | $99 | Opt-in and opt-out options, default is open profile |
| Family Tree DNA | ~900.000 | 4-6 weeks | buccal cells | autosomal SNPs | IBD determined by number of matching segments | 500 matching adjacent SNPs > 1 cM | $89 | Data stored with personal identification, default is open profile |
| | | 8-10 weeks | | Y-37 STRs | Haplogroup estimate | same haplogroup | $169 | |
| | | 8-10 weeks | | Y-67 STRs | Haplogroup estimate | same haplogroup | $268 | |
| | | 8-10 weeks | | Y-111 STRs | Haplogroup estimate | same haplogroup | $359 | |
| | | 4-6 weeks | | mitochondrial HVR1 & HVR2 | Matching HVRs | same haplogroup | $89 | |
| | | 4-6 weeks | | mitochondrial HVR1 & HVR2 & coding regions | Matching HVRs and coding regions | same haplogroup | $199 | |
| 23andMe | ~1.200.000 | 6 weeks | saliva | SNPs of autosomal, X and Y chromosome(s) and mitochondrial DNA | relative IBD and comparison of assigned ancestry | 700 matching adjacent SNPs > 7 cM | $99 | Opt-in and opt-out options, default is private |

When looking at Table 5, it can be observed that the general methods are the same for each company. For the autosomal DNA, SNPs are analyzed. This makes sense, because SNPs have a mutation rate that is 100.000 times lower than that of STRs and can therefore tell more about distant relations (Kayser & De Knijff, 2011).

The methods on which the matching of individuals is based are also quite similar for every company. However, 23andMe is the only company (of the three reviewed here) that also uses the ancestry estimate to match individuals, which provides additional support for a match. AncestryDNA and Family Tree DNA could make use of this way of matching as well: they already estimate ancestry, but treat it as an isolated result.

The matching criteria differ somewhat between the companies. AncestryDNA considers a segment a match when at least 97 adjacent SNPs match over a genetic distance of at least 6 cM (https://www.ancestry.com, last accessed on 18 December 2017). Family Tree DNA considers a segment a match when at least 500 adjacent SNPs match over a genetic distance of at least 1 cM (https://www.familytreedna.com, last accessed on 18 December 2017). 23andMe considers a segment a match when at least 700 adjacent SNPs match over a genetic distance of 7 cM or more (https://www.23andme.com/, last accessed on 18 December 2017). The latter is the strictest criterion, with the highest number of matching adjacent SNPs and the highest genetic distance. This could make their matches more accurate, but some true matches could also be missed due to this criterion.
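The effect of the three criteria can be compared with a small sketch. The thresholds are those quoted above; everything else (the names, the idea of testing one shared segment against all three companies, and treating both thresholds as ‘at least’) is an assumption made for illustration and does not describe how any of the companies implement their matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    min_snps: int   # minimum number of matching adjacent SNPs
    min_cm: float   # minimum genetic distance of the segment in cM


# Thresholds as quoted in the text (status: December 2017).
CRITERIA = {
    "AncestryDNA": Criterion(97, 6.0),
    "Family Tree DNA": Criterion(500, 1.0),
    "23andMe": Criterion(700, 7.0),
}


def is_match(company, matching_snps, length_cm):
    """Check one shared segment against the criterion of one company."""
    criterion = CRITERIA[company]
    return matching_snps >= criterion.min_snps and length_cm >= criterion.min_cm


def companies_reporting(matching_snps, length_cm):
    """List the companies whose criterion this segment would satisfy."""
    return [name for name in CRITERIA if is_match(name, matching_snps, length_cm)]


# A hypothetical segment of 600 matching adjacent SNPs spanning 6.5 cM.
print(companies_reporting(600, 6.5))
```

The hypothetical segment satisfies the criteria of AncestryDNA and Family Tree DNA but not that of 23andMe, which illustrates how a stricter criterion can miss matches that the other companies would report.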

The effectiveness of the criteria could be assessed with a training set of known relationships, by estimating each relationship according to the criterion and comparing it with the known relationship. In this way, the technique could be validated. One would expect these companies to have performed such a validation before applying the criterion; however, this information could not be found.

The number of matching SNPs and their genetic distance are both important to be used in a matching criterion. The genetic distance used at FTDNA seems a bit low, but the criterion for matching adjacent SNPs could be compensating for this. Finding the right criterion is all about the right balance between the number of matching SNPs and the genetic distance. The right balance should be determined by testing different criteria using a population of known relationships. This could, again, be performed by the companies but no information on such tests has been found. By combining the IBD segments with the criterion, the companies try to limit the chance of finding a random match.

The technique of using IBD segments has been shown to be a good estimator of relationships up to 25 generations (Browning & Browning, 2012). However, the genetic distance (in cM) minimally shared between individuals differs for every relationship. Therefore, it seems hard to determine the criterion that should be used to match individuals, as one does not know the relationship yet. This is probably why the companies use somewhat low genetic distances in their criteria: different generations of relatives are considered when matching individuals.

The exact SNPs that the companies use could not be found. However, the ISOGG (International Society of Genetic Genealogy) website provides insight into the overlap of the SNPs used by different companies. These numbers are shown in Table 6. The numbers on the diagonal should then reflect the number of SNPs used in each company's test. These numbers do not match the numbers that the companies advertise. The website also provides information on the overlap of X-chromosomal SNPs that are used in analyses. Here, results are also shown for AncestryDNA and for Family Tree DNA. However, the information that the companies provide themselves does not indicate that these companies use the X chromosome as well. This shows that conflicting numbers are presented by different institutions and that the exact numbers cannot be given in this review.

Table 6. Numbers of overlapping SNPs used by the three companies. For this the latest version of their tests was used (edited from: https://isogg.org/wiki/Autosomal_SNP_comparison_chart, last accessed on 14 December)

| Overlapping SNPs | AncestryDNA | Family Tree DNA | 23andMe |
|---|---|---|---|
| AncestryDNA | 641.908 | | |
| Family Tree DNA | 410.572 | 698.179 | |

Privacy and Regulations

DTC genetic tests come with a lot of benefits, but they also raise a lot of questions regarding privacy. As explained previously, most companies are able to provide match estimates for very distant relations (up to 5th cousins and remote cousins). This also means that when you find only a very distant relative and you have the means to establish a family tree, you could find closer relatives. This is the important part when a donor-conceived individual searches for their biological parent, because the relative they are looking for does not have to be registered in the database that is searched. These familial searches are based on the fact that relatives share DNA and that the closer the relation, the more DNA is shared (Suter, 2009). But in the case of an anonymous donor, this individual obviously did not want to be found and even signed a contract stating his/her anonymity.

The services of DTC genetic testing companies raise questions like: what happens to your data? You provide the company with your genetic data, but what will it be used for? This involves the companies’ policies and how they inform their consumers about them. The companies’ informed consent is a major subject of debate (Annas & Elias, 2014). Most of the information retrieved from DTC genetic tests is used for research as well, and consumers agree to this use. The amount of information stored in the companies’ databases is very interesting for medical research, because it would be a lot of work for researchers to gather this amount of information themselves. Therefore, most of the DTC genetic testing companies use their customers’ information for research as well (Phillips, 2017; Tamir, 2010).

The regulations now used by most companies do not belong to any of the existing legal categories (Phillips, 2016). When ordering a test online, you already agree to certain terms. This mostly occurs in the form of a ‘browserwrap’ or ‘clickwrap’ agreement (Kim, 2013). Browserwrap agreements show the terms in a different window, which most of the time does not even have to be opened before clicking on the agree button. Clickwrap agreements require the consumer to scroll down the document and agree with it by clicking a button, which does not actually require the consumer to read the document. These types of agreements are very common in e-commerce and are actual contracts (Phillips, 2017). These contracts almost always include clauses which allow the company to alter their terms and consent.

AncestryDNA lets customers agree that their DNA, together with their personal information, can be used for research purposes (Shah, 2014). The DNA samples themselves are analyzed by an independent laboratory (a third party), which only gets access to the genetic information. Genebase, for example, provides an open DNA database in which every customer can search for matches and can access the genetic information along with the personal information. So, you agree that your information is open to every user. The Genographic Project is aimed at genealogical research, which is why there is not much privacy involved when sending in your sample. Their policy states that your samples and results are used for research, can be published in any form of media, are shared with other organizations and can be used in advertisements.

These are some examples of terms customers agree to, of which they are most of the time unaware. Most of these contracts lack transparency; customers do not always read them and, if they do, they do not always interpret them correctly (Christofides & O’Doherty, 2016). A survey by Christofides & O’Doherty (2016) showed that, even after reading the companies’ regulations, customers expected that the companies would only share the result with them and destroy the samples after the test was performed.

Another issue that arises is the fact that customers can send in any sample they want. This means you can also send in a sample of someone else’s DNA, and you do not need any permission to do so. This would obviously violate that person’s privacy. Some companies do have rules to try to prevent this from happening (Shah, 2014). 23andMe, for example, requires the sample to contain a certain amount of saliva, which takes some time to collect. In this way, they try to prevent people from sending in samples secretly collected from, for example, someone else’s drinking cup. Most companies do state that you should only send in your own samples or samples from people you are the guardian of, but no specific measures are taken to enforce this. There are no governmental policies that prohibit this surreptitious DNA testing. In some states in the US there is legislation that prohibits surreptitious DNA testing for medical purposes, but this does not apply to genealogy tests or searching for relatives (Phillips, 2016).

In the US, there is only federal legislation regarding medical companies. The Clinical Laboratory Improvement Amendments (CLIA) apply to all clinical laboratories in the US (Hogarth et al., 2008). States may choose to implement the CLIA or to use a system that is similar or stricter. The CLIA prohibit companies from accepting material derived from the human body without a certificate from the Centers for Medicare and Medicaid Services (CMS). They also set regulations regarding quality assurance and proficiency testing for medical companies. The statute was established in 1988, when DTC genetic tests were not yet available, and is therefore limited to regulations for medical laboratories. Its regulations are updated regularly and now include DTC genetics, but still only address medical applications (https://wwwn.cdc.gov, last accessed on 22 November 2017). For medical DTC genetic tests, the statute requires analytical validity and publicly available proficiency tests, to provide consumers with accurate test results (Hogarth et al., 2008). However, there is no requirement for the interpretation of the results of the genetic tests (Robertson, 2009). This is important as well, because most customers will not have the knowledge needed to interpret the results correctly.

Another regulatory authority is the Food and Drug Administration (FDA) (Robertson, 2009). The FDA regulates the specific tests that are used in the laboratories, such as DNA kits. They decide whether a specific test can be used for medical diagnosis. However, laboratory developed tests (LDTs) do not have to be assessed by the FDA because these are not officially developed as a ‘kit’.

An example of an intervention by the FDA is the warning letter sent to 23andMe in 2013 (Annas & Elias, 2014). The letter ordered the company to stop marketing their Personal Genome Service (PGS), because this service made use of a kit that was not authorized by the FDA. Also, their results contained incomplete risk percentages and they failed to explain how to interpret the results (Hoglund-Shen, 2017). Although the FDA does not have specific regulations for DTC genetic testing, on these grounds it could still shut down the PGS of 23andMe (Annas & Elias, 2014). The FDA argued that the consumer’s health was at risk, because these tests could provide information leading consumers to seek treatment for diseases they might not even have (Hoglund-Shen, 2017). However, the FDA recently (April 2017) decided to authorize 23andMe to provide health reports to their customers again (https://blog.23andme.com, last accessed on 29 November 2017). This authorization is only granted for 10 medical conditions, far fewer than the more than 200 conditions covered by their previous reports (Hoglund-Shen, 2017). This decision is based on validation studies provided by 23andMe that meet the standards set by the FDA.

In Europe, there are some countries (France, Germany, Portugal and Switzerland) in which DTC genetic testing is covered by legislation (Borry et al., 2012). In these countries, a genetic test may only be performed by a doctor. The test may then only be executed when sufficient information is provided on the nature and the consequences of the test and the consent of the person to be tested is established. This means that the described DTC genetic tests are not available in these countries.

The Netherlands does not have legislation directly addressing DTC genetic tests, but does have legislation under which laboratories can be refused a license when their performance is proven to be scientifically unsound. However, most companies are not based in Europe, but in the US. This means that a laboratory may perform below the Dutch standards, while Dutch people can still purchase its test.

The lack of properly established regulations in this emerging field of DTC genetic tests has led many scientists to call for regulation (Christofides & O’Doherty, 2016; Hauskeller, 2011; Phillips, 2017). In the future this needs to be properly assessed, and specific regulations regarding DTC genetic (genealogy) tests should be developed.

Another issue that comes to light due to the DTC genetic testing services is the privacy of people who do not use the services of these companies themselves, but do get involved when results lead to them. This concerns the anonymous sperm donors or the biological parents of an adoptee who are found with the help of these companies. Before July 2017, no lawsuits were reported regarding sperm donors who were traced using DTC genetic tests (Hoglund-Shen, 2017). The question now is when the first case will occur, rather than whether such a case will occur. The whole process of using a donor to conceive a child involves multiple persons, which makes it a complicated issue in terms of the rights of every person involved. The donor signs a contract at the clinic, but the rights of the person conceived by the donation may contradict the rights stated in this contract (Burr & Reynolds, 2008). The Right to Privacy is embodied in different instruments on human rights (the Charter of Fundamental Rights of the European Union, the Universal Declaration of Human Rights and the UK Human Rights Act) (Boden & Williams, 2004). However, this right does not address the right to know about one’s genetic descent. The right that does address these kinds of issues is the Right to Identity, which is embodied in the UN Convention on the Rights of the Child.

The Right to Privacy, however, takes primacy over the Right to Identity in all these declarations, acts and conventions. In practice this is not always upheld, because courts tend to prioritize the child’s interests over parental rights. Based on this, donor anonymity would be entirely abolished, as many countries have already done (Klotz, 2016). One could argue that it is simply not feasible to offer donor anonymity, because it cannot be guaranteed given the research methods available to the donor-conceived.

Possible Use in Forensics

Most of the companies state in their policies that, if requested, they would provide the information in their database to law enforcement authorities (Shah, 2014). 23andMe even provides a ‘transparency report’ on their website, showing how many times law enforcement authorities requested access to their database. This report shows in which countries this occurred and how many customer profiles were made available to the requesting authorities (https://www.23andme.com/, last accessed on 18 December 2017). At the time of writing, this had occurred 5 times in the US and 6 customer profiles had been made available. No further information is provided on which cases were concerned or whether the results provided useful information for the requesting authorities.

The other companies do not provide such transparency reports, but all three do provide a guide for law enforcement in case they want to request information (https://www.23andme.com/, last accessed on 18 December 2017; https://www.familytreedna.com, last accessed on 18 December 2017; https://www.ancestry.com, last accessed on 18 December 2017). This involves statements saying they will only provide information in case a subpoena, warrant or other judicial request orders them to do so.

The data in their databases mostly come from individuals who are not present in the forensic databases. This could be an important aspect in an investigation where a DNA profile leads to no direct or familial matches in the forensic databases. However, to be able to use the information present in the commercial databases, it should be comparable to the information obtained in forensic casework. Another option would be for forensic laboratories to perform additional testing of their samples, to meet the standards used by the companies and make use of their databases.

By investigating the techniques used by these companies (see ‘Three Major Companies’), it is observed that SNPs are the basis of finding (distant) relatives. SNPs are the smallest genetic markers, because they comprise only one nucleotide. Therefore, they can be analyzed even in small traces of DNA (Divne & Allen, 2005). However, forensic laboratories use STRs in their DNA analysis, mainly because of the high number of variants per STR, which makes them highly discriminating (van Oorschot et al., 2010). A SNP has at most 4 variants (A, C, G, T), which is why many more SNP sites need to be investigated to obtain the same discriminating power as an STR analysis. STRs also mutate at a much higher rate than SNPs, 1 in 10^3 compared to 1 in 10^9, respectively (Butler, 2009). This makes them more distinguishable between individuals. These are some of the reasons, along with the major reform of the forensic system that a switch would require, for forensics to stick to STR analysis. The rapid advances in microarray DNA analysis might make SNP analysis more interesting for forensics in the future (Divne & Allen, 2005).

Regular autosomal STR profiles cannot be run through the companies’ databases, simply because the companies’ analyses do not use the same marker system. However, for the Y chromosome and the mitochondrial DNA, the commercial and forensic methods are alike.

In Y-chromosomal analyses, STRs are used in both the commercial and the forensic field. The analysis of Y-STRs yields a haplotype. In forensics, this analysis has been used as an extension of the autosomal STR profile and can be used in a familial search (Suter, 2009). Generally, 9-17 Y-STRs are used to find members of the same male lineage (Ballantyne et al., 2012). Recently, researchers have developed a new Y-STR kit for forensics that includes 23 STRs (Núñez et al., 2017 & Thompson et al., 2013).

As the methods of Family Tree DNA show, different tests are available for analysis of the Y chromosome, each analyzing a different number of STRs (37, 67 or 111). This is more than the forensic field uses, and the question is whether these tests include the same STRs that forensics uses. If they do not, there would be no point in comparing forensic profiles against the commercial databases.

The kits commonly used in forensic casework amplify the following 17 loci: DYS19, DYS385a, DYS385b, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635 and Y GATA H4 (AmpFISTR Yfiler, 2006 & Thompson et al., 2013). The kits analyzing 23 Y-STRs add DYS481, DYS533, DYS549, DYS570, DYS576 and DYS643 to this set (Thompson et al., 2013).

Family Tree DNA does not provide such a list of the STRs used in its analysis. The company does have an online forum in which customers can discuss their results and the interpretation of them (http://forums.familytreedna.com, last accessed on 8 December 2017). This forum was searched for DYS numbers, so that the DYS numbers discussed there could indicate which loci FTDNA reports. This search indicates that at least all the STRs used in the common forensic kits (the 23 mentioned before) are also used in the analysis of Family Tree DNA. 23andMe also tests the Y chromosome, but uses SNP analysis instead of STR analysis, which makes it unsuitable for comparison with forensic profiles.
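The locus comparison described above can be sketched as a simple set check. The forensic loci are those listed in the text; the commercial panel and the allele values passed to the functions are hypothetical inputs, since Family Tree DNA does not publish its list, and the function names are illustrative only.

```python
# Loci of the 17-marker forensic kits, as listed in the text.
YFILER_17 = [
    "DYS19", "DYS385a", "DYS385b", "DYS389I", "DYS389II", "DYS390",
    "DYS391", "DYS392", "DYS393", "DYS437", "DYS438", "DYS439",
    "DYS448", "DYS456", "DYS458", "DYS635", "Y GATA H4",
]
# Six loci added by the 23-marker kits, as listed in the text.
EXTRA_6 = ["DYS481", "DYS533", "DYS549", "DYS570", "DYS576", "DYS643"]
FORENSIC_23 = YFILER_17 + EXTRA_6


def normalize(locus):
    """Uppercase and drop spaces/hyphens so spelling variants compare equal."""
    return locus.upper().replace(" ", "").replace("-", "")


def missing_loci(forensic_panel, commercial_panel):
    """Return the forensic loci that the commercial panel does not report."""
    reported = {normalize(name) for name in commercial_panel}
    return [name for name in forensic_panel if normalize(name) not in reported]


def compare_haplotypes(profile_a, profile_b):
    """Compare two Y-STR haplotypes (dicts of locus -> allele) on shared loci.

    Returns the number of shared loci and the list of mismatching loci.
    """
    a = {normalize(k): v for k, v in profile_a.items()}
    b = {normalize(k): v for k, v in profile_b.items()}
    shared = sorted(a.keys() & b.keys())
    mismatches = [locus for locus in shared if a[locus] != b[locus]]
    return len(shared), mismatches
```

An empty result from `missing_loci(FORENSIC_23, panel)` would mean that every forensic locus is covered, which is the condition for a forensic Y-STR profile to be comparable against the commercial data.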

The mitochondrial DNA is analyzed using the sequences of HVR1 and HVR2 for forensic purposes (Butler, 2009). As shown previously, Family Tree DNA and 23andMe also use the HVRs of the mitochondrial DNA. In both fields the matching is based on the determined haplogroups. This would make it possible for a forensic mitochondrial DNA profile to be run through a commercial database.

At 23andMe, the X chromosome is analyzed as well. However, this chromosome is not used in specific forensic analyses.

The data on which the companies’ estimates are based show that the exact relation between individuals can never be determined with 100% accuracy. The estimates also become less confident as the relationship becomes more distant. The next question is therefore whether the results could provide enough information during a case and whether they would ever be allowed as evidence. The lack of proper legislation regarding the validation of the methods used by these companies makes this unlikely at the moment. A lot needs to happen before this could become reality, but the Y-STR and mitochondrial DNA analyses show some promising possibilities for the forensic field.
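The loss of confidence with distance can be illustrated with the textbook expectation for shared autosomal DNA, which halves with every additional generation separating two relatives. The sketch below uses that general expectation for full cousins; it is not the matching criterion of any of the companies, and real amounts of shared DNA vary considerably around these averages.

```python
def expected_shared_fraction(cousin_degree: int) -> float:
    """Expected autosomal fraction shared by full n-th cousins.

    Uses the textbook expectation (1/2) ** (2 * n + 1); the amount
    actually shared by a given pair varies widely around this value.
    """
    if cousin_degree < 1:
        raise ValueError("cousin_degree must be 1 or greater")
    return 0.5 ** (2 * cousin_degree + 1)


if __name__ == "__main__":
    for degree in range(1, 6):
        fraction = expected_shared_fraction(degree)
        print(f"cousin degree {degree}: {fraction:.4%}")
```

The expected fraction drops by a factor of four with each cousin degree, so the ranges for neighbouring relationships increasingly overlap and the estimated relationship becomes less certain.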

If a forensic sample contains an abundance of DNA, it could be interesting to send it to a company and make use of its DNA analysis services. However, the companies use several different criteria to determine a match. If forensics wanted to use their services, the criteria and the number of SNPs used in the analysis would first need proper validation to determine suitable values.

Setting aside the issues of validation and legislation, the results provided by the companies could be interesting for forensics. DNA is investigated in many different forensic contexts, which yield different amounts of DNA. There are cases in which DNA is abundant and multiple analyses can be performed. This could be, for instance, a DVI (Disaster Victim Identification) case or any other case concerning the identification of a body. Even a little information on any relative can be the lead towards identification of an individual. These types of cases are therefore the most suited to the services provided by the DTC genetic testing companies.

By explaining the methods used by the three major companies, it has been shown that they are able to provide information on distant relatives. Their methods can match individuals up to 4th/5th cousins. The companies provide personal information on the matches, or this can be requested. Through this communication system, surnames can provide sufficient information for further investigation. Tools to generate family trees are available as well, which make it possible to visualize the family relations between different matches.

If these techniques were used in a forensic case involving the identification of a body, they could provide information leading towards identification. The examples of donor-conceived individuals tracing their biological parents confirm that this information can lead to the identification of a person being sought. In a case where no identification can be achieved through the general forensic process, these databases could hold the crucial information. The databases already contain a lot of data and will keep growing as more customers make use of the services. If made available for forensic purposes, they would therefore be a growing source of information that increases the chance of identifying an unidentified body.
