
Unravelling the genetic and pathophysiological complexity of the mitochondrial myopathies


Academic year: 2021


Final Graduation Report

Unravelling the genetic and pathophysiological complexity of the mitochondrial myopathies

University Maastricht

General data
Graduation subject: Mitochondrial Myopathies
Graduation term: 23-01-2012 to 25-06-2012
Version: 1
Author: Richard Gerardus Johannes Dohmen (rgj.dohmen@student.avans.nl)
Student number: 2016701
Deadline: May/June 2011

Education contact information
School of Life Sciences and Environment Technology
Lovensdijkstraat 61-63, 4818 AJ Breda
Phone: 076 525 05 00
Supervisor ATGM: Julian Ramakers (jd.ramakers@avans.nl)

Internship contact information
University Maastricht, Clinical Genomics Department
Universiteitssingel 50, 6229 ER Maastricht
Phone: 043 388 19 95
Internship mentor: Prof. Dr. Bert Smeets (bert.smeets@maastrichtuniversity.nl)
Supervisor UM: Ing. Rick Kamps


Preface

This final report documents the graduation term, which ran from 23 January 2012 to 25 June 2012. During the graduation I improved my knowledge of the mechanisms of genomics, my use of different databases and my practical skills. The knowledge and results obtained during the graduation are included in this report.

I would like to thank Bert Smeets for the opportunity to become an intern at Clinical Genomics. Secondly, I want to thank Mike Gerards, Iris Boesten, Auke Otten and Bianca van den Bosch for the assistance, knowledge and practical tricks they shared with me. Furthermore, I thank the rest of the Clinical Genomics department for a wonderful and educational nine and a half months. And last but definitely not least, I thank my supervisor, Rick Kamps, for all the knowledge, support and guidance he gave me; his structured and calm way of handling things really inspired me.


Table of contents

Preface
Summary
Samenvatting

1. Introduction

2. Theoretical background
2.1 Whole Exome – Enrichment
2.2 Illumina Sequencing
2.3 Bioinformatics
2.3.1 Variant analysis
2.4 Functional tests
2.4.1 Targeting of the protein
2.4.2 Analyses other patients
2.4.3 Gene expression levels
2.5 Affected patients
2.5.1 Family DNA 07-2283
2.5.1.1 LAMA3
2.5.1.2 SYNPO2
2.5.1.3 DCHS2
2.5.1.4 SLC6A8
2.5.1.5 Zyxin
2.5.1.6 KIAA1109
2.5.1.7 Transformation fibroblast - Myogenesis
2.5.2 Family DNA 08-5759
2.5.2.1 KRTAP10-6
2.5.2.2 PSPH

3. Methodology
3.1 Whole Exome Enrichment & Illumina Genome Analyzer HiSeq 2000
3.1.1 Sample preparation
3.1.2 Hybridization
3.1.3 Addition of Index Tags by Post-Hybridization Amplification
3.1.4 Cluster generation
3.1.5 Sequencing by synthesis
3.2 Data analyses
3.2.1 Validation & segregation
3.2.2 Relate function gene to phenotype
3.3 Functional tests
3.3.1 Targeting
3.3.2 Analyses other patients
3.3.3 Gene expression levels
3.3.4 Transformation fibroblasts

4. Results & Discussion/ Conclusion
4.1 Family DNA 07-2283
4.1.1 Variants
4.1.2 Sanger sequencing validation & segregation
4.1.3 SYNPO2 protein targeting
4.1.4 Myogenesis – qPCR gene expression levels
4.2 Family DNA 08-5759
4.2.1 Variants (2)
4.2.2 Sanger sequencing validation & segregation (2)

5. Discussion/ Conclusion & Recommendations

Appendices
Appendix 1: Flow diagram WE-enrichment
Appendix 2: pEGFP-n1
Appendix 3: Covaris S2 fragmentation
Appendix 4: AMPure XP beads Agencourt
Appendix 5: Agilent 2100 Bioanalyzer
Appendix 6: Primer design
Appendix 7: PCR & gel electrophoresis
Appendix 8: Sanger sequencing
Appendix 9: Designed primers for Sanger sequencing genes
Appendix 10: Regulation of SYNPO2 localization in myocytes


Summary

Mitochondria are the power plants of the cell: they use five complexes, located in the inner membrane, to generate energy. These five complexes, I to IV and ATP synthase, are encoded by mitochondrial and nuclear genes. Mutations in these genes can cause the complexes or associated proteins to become non-functional. This can lead to a lack of energy for cells, tissues and organs; organs with a high energy demand are affected in particular. These mutations can cause a mitochondrial myopathy, a disorder in which the energy supply of the muscles is affected. Finding an approach to detect the genetic cause has so far been unresolvable. A new approach is Whole Exome (WE) sequencing, which is based on Next-Generation Sequencing (NGS) technology. All the thousands of genetic variants in the coding regions of the genome are detected, from which the potentially pathogenic, causative mutations need to be filtered out. These variants are validated and their segregation in the families is tested using Sanger sequencing. The variants are also tested functionally; these tests include protein targeting and cellular and gene expression studies.

Two patients from the WE families, DNA 08-5759 and DNA 07-2283, were investigated and sequenced, and the sequence data obtained from the two families were filtered in several steps. A reduced subset of potentially pathogenic variants remained: approximately 60 and 195 variants for family 08-5759 and 07-2283, respectively. Using several prediction programs, which predict the damaging effect of a variant, and by linking the phenotype to the gene function, the list was reduced further to 2-5 variants. The variants of family 08-5759 were either false positives or, in the case of KRTAP10-6, could not be related to the phenotype. The variants in LAMA3, SYNPO2 and ZYX in family 07-2283, which were validated and tested for segregation, are thought to be more interesting, because they can be related to the phenotype. Eventually one of these variants, SYNPO2, was tested functionally. The location of the SYNPO2 protein in the cell was determined: the protein was diffusely expressed throughout the cytoplasm and nucleus, not in the mitochondria, which makes a role in a mitochondrial disorder less likely. The SYNPO2 variant was tested in the family; the two affected children were homozygous for the same variant, which matches a potential pathogenic role. However, the same was true for the DCHS2, KIAA1109 and LAMA3 variants. Patient fibroblasts were transfected with a myoD (myogenic regulatory factor) vector to transdifferentiate the fibroblasts into myotubes (muscle cells). SYNPO2 expression in these transfected fibroblasts was measured by relative quantification and was low in comparison with the control, skeletal muscle.

We concluded that the mutation does not alter the expression of SYNPO2 in the patient fibroblasts, because the expression was also low in the transfected control. However, before further analyses can be done, the transfection of the patient fibroblasts with myoD vectors has to be optimized. The low expression of the SYNPO2 gene is probably caused by insufficient differentiation of the transfected fibroblasts. Once the transfection is optimized and the transfected fibroblasts have differentiated into myotubes, additional functional tests are possible, such as immunohistochemical or fluorescent antibody staining, electron microscopy and Western blotting, to analyze the formed Z-disc and the proteins which might be affected.


Samenvatting (Dutch summary, in translation)

Mitochondria are the power plants of the cell; energy is generated by means of five complexes located in the inner membrane. These five complexes, I to IV and ATP synthase, are encoded by mitochondrial and nuclear genes. When a mutation occurs in one of these genes, one of the complexes or associated proteins can become non-functional. This leads to a reduced capacity to generate energy for the cell, tissue and organ; organs with a high energy consumption in particular are severely affected by a lack of energy. These mutations can be the cause of a mitochondrial myopathy, a group of diseases in which the energy supply of the muscles is affected. Finding an approach to detect the genetic causes has so far been unresolvable. A new approach is Whole Exome (WE) sequencing, which is based on Next-Generation Sequencing (NGS) technology. All the thousands of genetic variants in the coding part of the genome are detected, and from these thousands of variants the potentially pathogenic, causative mutations are filtered. These variants are validated and segregation in the family is tested using Sanger sequencing. In addition, the variant is analysed functionally with tests such as protein targeting and cellular and gene expression studies.

Two patients from the WE families, DNA 08-5759 and DNA 07-2283, were investigated and sequenced, and the sequence data obtained from the two families were filtered in several steps. A reduced subset of potentially pathogenic variants remained, approximately 60 and 195 variants for family 08-5759 and 07-2283, respectively. Using several prediction programs, which predict the damaging effect of a variant, and by linking the phenotype to the gene function, the list was reduced further until 2-5 variants remained. The variants of family 08-5759 were false positives or, in the case of KRTAP10-6, could not be linked to the phenotype. The variants in LAMA3, SYNPO2 and ZYX in family 07-2283, which had been validated and tested for segregation, were more interesting, because they could be related to the phenotype. One of these variants, SYNPO2, was eventually tested functionally. The location of the SYNPO2 protein in the cell was determined. The protein gives a diffuse staining of the cytoplasm and nucleus, not of the mitochondria, which makes a role in a mitochondrial disease less likely. The SYNPO2 variant was tested in the family: both affected patients were homozygous for the same mutation, but this also holds for the KIAA1109, DCHS2 and LAMA3 variants. Patient fibroblasts were transfected with a specific myoD (myogenic regulatory factor) vector to transdifferentiate the fibroblasts into muscle cells. SYNPO2 expression in these transfected fibroblasts was measured by relative quantification and was low in comparison with the control, skeletal muscle.

It is concluded that the low expression is not caused by the SYNPO2 variant, because the transfected control also shows low expression. However, before further analyses can be carried out, the transfection with the myoD vector has to be optimized; the low expression of SYNPO2 may be due to incomplete differentiation of the transfected fibroblasts. Once the transfection has been optimized and the myotubes can be kept in culture long enough to differentiate, additional functional tests, such as staining with immunohistochemical or fluorescent antibodies, electron microscopy and Western blotting, are possible to analyse the proteins formed.


1. Introduction

Living cells of most eukaryotic and many prokaryotic organisms generate energy, in the form of adenosine triphosphate (ATP), using a process called cellular respiration. The generated ATP is used to perform their many tasks. In most eukaryotic organisms the mitochondria are the sites of cellular respiration, and the most important process of cellular respiration is oxidative phosphorylation (OXPHOS). The OXPHOS system is located in the inner membrane of the mitochondrion; it is a collection of proteins organised in multiprotein complexes, numbered I through IV, and ATP synthase. The subunits of these complexes are encoded by both the mitochondrial DNA (mtDNA) and the nuclear DNA (nDNA). Mutations in the mtDNA and/or in nuclear genes involved in the maintenance and replication of the mitochondria can lead to malfunctioning mitochondria. Malfunctioning mitochondria cause a lack of energy in cells, tissues and organs, which can lead to disorders. These disorders are called mitochondrial myopathies. They are among the most common metabolic or neurologic hereditary disorders, and over 1500 genes are involved. Organs and tissues with a high energy demand are affected most often. In a quarter of all patients the mitochondrial myopathy has a hereditary cause; finding an approach to detect these genetic causes has so far been unresolvable.

Using Next-Generation Sequencing (NGS) applications, such as Whole Exome (WE) sequencing or sequencing of Long Range PCR (LRPCR) fragments, defects in the mtDNA and/or nDNA are detected and investigated. The research aimed to identify the potentially pathogenic defect among the thousands of variants detected in patients with a mitochondrial myopathy, using standardised protocols for NGS and several filtering steps. It is assumed that the pathogenic variant found by WE sequencing (WES) alters an amino acid, thereby affecting the protein and its function, and that the frequency of the variant in the general population is low. Based on these assumptions, the list of variants, which includes the potentially pathogenic defect, is filtered and shortened. The number of variants was reduced further using additional strategies, such as relating the phenotype of the patient to the gene in which the potentially pathogenic variant was found. Sanger sequencing was used both to validate the variants, which excludes false positives, and to verify the segregation of each variant. Further analysis consisted of functional tests: targeting of the encoded protein, gene and cellular expression levels, and Sanger sequencing of patients with the same phenotype. It was expected that the genetic and pathophysiological complexity of the mitochondrial myopathies could be resolved using NGS, and that the potentially pathogenic variants could be linked to the phenotype using the filtering steps and functional tests.

The report is composed of several chapters. Chapter two comprises the theoretical background and the principles of the research and the techniques used during the graduation. Chapter three describes the procedures performed and the techniques used. Chapter four contains the results obtained, followed by chapter five, in which the results are discussed and a conclusion is drawn. The references and appendices are also included in the graduation report.


2. Theoretical background

2.1 Whole Exome – Enrichment

NGS is the second generation of sequencing. The NGS platforms, Roche 454, Illumina and SOLiD, enable more applications to be sequenced than the first generation, Sanger sequencing. However, Whole Genome Sequencing (WGS), which is an NGS application, is expensive; an alternative is Whole Exome Sequencing (WES). WES is used to sequence the entire exome, the protein-coding regions, which constitute approximately 1% of the genome [1][2][3][4]. Using WES, the protein-coding regions of rare genes, which cause complex diseases and health-related traits, are examined and explored. These protein-coding regions contain 85% of the mutations which have a large effect on disease-related traits [5]. The exome contains important information which is necessary for an organism to function, so mutations in the exons can have severe consequences for the functioning of the organism. The type of mutation determines its severity: there are synonymous and non-synonymous mutations. Synonymous mutations modify the codon but cause no alteration of the amino acid, because multiple codons encode the same amino acid. Non-synonymous mutations alter both the codon and the amino acid, and a change of amino acid can have severe consequences: it can change the polarity, permeability, acidity and charge of the protein. Using WES, these mutations, which possibly change the amino acid, are detected; however, mutations in introns and other non-coding regions are largely lost. These non-coding regions harbour elements which are important for the production of a protein, such as splice sites, untranslated regions (UTRs) and binding sites for transcription factors. Transcription factors influence the amount of protein which is synthesized from a gene. Mutations in the splice sites affect the splicing of the pre-mRNA, which can result in the loss of an exon or the retention of an intron. During WES a part of these regions is included, but only a small fraction of the total intronic sequence. WGS provides the complete overview of all mutations, in introns and exons, and does not lose any important mutations in the introns [6][7][8][9].
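The distinction between synonymous and non-synonymous mutations can be illustrated with a minimal sketch. It assumes the standard nuclear genetic code (mtDNA uses a slightly different code) and only handles single codon substitutions; the function name `classify_substitution` is hypothetical and not part of any tool used in this report.

```python
from itertools import product

# Standard (nuclear) genetic code in the conventional TCAG ordering.
# Note: the vertebrate mitochondrial code differs for a few codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"   # codon changes, amino acid does not
    if alt_aa == "*":
        return "nonsense"     # amino acid codon becomes a stop codon
    if ref_aa == "*":
        return "stop-loss"    # stop codon becomes an amino acid codon
    return "missense"         # one amino acid is replaced by another


print(classify_substitution("CTT", "CTC"))  # Leu -> Leu
print(classify_substitution("GAA", "GTA"))  # Glu -> Val
print(classify_substitution("TGG", "TGA"))  # Trp -> stop
```

Missense, nonsense and stop-loss changes are all non-synonymous; the split into three labels is a choice made for this example.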

WES starts with the isolation of the exome using an enrichment kit, of which there are several. The SureSelect Target Enrichment System from Agilent Technologies is compatible with the Illumina sequencers, Genome Analyzer and HiSeq. WES is performed on genomic DNA (gDNA), which is sheared into fragments of 150-200 bp. After shearing, the sticky ends are converted to blunt ends and the DNA fragments are phosphorylated. A dATP is added to the DNA fragments, after which adapters with a dTTP overhang are ligated to both the 3’ and the 5’ end of each fragment. The adapters contain an index tag, a sequence which is used for identification. Because there are multiple index sequences, different samples can be pooled, each sample carrying a different index. Using the adapters, which function as primer sites, the DNA fragments are amplified. After amplification the DNA fragments are denatured and RNA baits are added; these RNA baits are 120 bp in size and synthesized from cDNA. Because the RNA baits are complementary to the cDNA, DNA fragments consisting of introns cannot hybridize to the RNA baits, whereas exon fragments can. The end of each RNA bait is biotinylated, so the DNA fragments hybridized to the complementary RNA baits can be isolated using streptavidin-coated magnetic beads. The biotinylated end of the RNA bait binds to the streptavidin-coated magnetic bead, and a magnetic field separates the RNA baits, with the exon fragments attached, from the unbound fraction of intron fragments and some exon fragments. The biotin-streptavidin bond is then broken, the DNA fragment-RNA bait complex is denatured and the RNA is digested. The single-stranded DNA fragments are PCR amplified and sequenced using the Illumina HiSeq 2000 [10][11]. The whole WE-enrichment flow diagram is included in appendix 1, figure 21.

2.2 Illumina Sequencing

The Illumina/Solexa sequencing technique uses reversible terminators and a solid surface to sequence a diverse set of applications, such as gDNA, cDNA and LRPCR fragments. The input DNA is sheared into fragments smaller than 800 bp. Shearing creates sticky ends, which are converted to blunt ends using enzymatic reactions. An A-nucleotide is added to the 3’ blunt end of each DNA fragment. The adapters (A and B), which have a T-nucleotide overhang, are ligated to both ends of the DNA fragment through A-T base pairing. The adapters contain an index tag, a sequence which is used for identification. Because there are multiple index sequences, different samples can be pooled, each sample carrying a different index. The Illumina sequencing software separates the sequence data of all samples according to their index tags.
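The principle of separating pooled samples by index tag can be sketched as follows. This is only an illustration of the idea, not the Illumina software: it assumes exact matching of the index tag (real demultiplexing typically tolerates sequencing errors), and the index sequences, sample names and the function `demultiplex` are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, index_to_sample):
    """Group read sequences per sample, based on the index tag of each read.

    reads: iterable of (index_tag, sequence) pairs.
    index_to_sample: mapping from index tag to sample name.
    Reads with an unknown index tag are collected under "undetermined".
    """
    per_sample = defaultdict(list)
    for index_tag, sequence in reads:
        sample = index_to_sample.get(index_tag, "undetermined")
        per_sample[sample].append(sequence)
    return dict(per_sample)


# Hypothetical index tags and reads, for illustration only.
index_to_sample = {"ATCACG": "sample_A", "CGATGT": "sample_B"}
reads = [
    ("ATCACG", "GATTACA"),
    ("CGATGT", "CCGGTTA"),
    ("ATCACG", "TTGACCA"),
    ("NNNNNN", "ACGT"),
]
result = demultiplex(reads, index_to_sample)
for sample, sequences in result.items():
    print(sample, sequences)
```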

A flow cell containing 8 lanes, see figure 2, is loaded into the cBot, an automated cluster generator, see figure 3. The denatured DNA fragments are loaded onto the flow cell, the surface of which is covered with oligonucleotides: adapters (B) and complementary adapters (A), see figure 1. A single-stranded DNA fragment binds to the oligonucleotide complementary to adapter A, which is attached to the surface of the flow cell. In an extension reaction the fragment is copied, starting from the complementary adapter A. After the extension, the template DNA is removed and the newly synthesized single strand remains attached to the flow cell. The 3’ end adapter sequence (B) of this strand binds to the oligonucleotide complementary to adapter B, which is also attached to the flow cell. In binding to the adapter, the DNA fragment forms a bridge, see figure 1. The adapter attached to the flow cell functions as a primer for the subsequent PCR amplification. Using PCR reagents and multiple PCR cycles, the DNA fragments are amplified starting from the adapter; this is called "bridge amplification". Bridge amplification creates clusters of the same DNA fragment on the surface of the flow cell; each cluster consists of thousands of DNA fragments [12][13][2].

Figure 1: Schematic workflow of the Illumina sequencers, a combination of multiple figures. Sticky ends of sheared DNA are converted to blunt ends and phosphorylated, and a dATP is added (1). Adapters are ligated to both ends of the DNA fragment using a dTTP overhang (2), resulting in an adapter-DNA fragment (3). The DNA fragment is denatured and binds to the oligonucleotide complementary to adapter A, which is attached to the flow cell (4). Using a PCR reaction the complementary strand is attached to the flow cell (5); the strand bends and the 3’ adapter end binds to an oligonucleotide complementary to it (6). Clusters are

After the cluster generation the flow cell is loaded into the HiSeq 2000 sequencer, which automatically adds all the sequencing reagents, such as the sequencing primer and the four reversible terminator nucleotides (A, C, T and G). The sequencing primer, which is complementary to the 3’ end adapter, anneals to the single-stranded DNA fragment. The reversible terminator nucleotides are 3’-modified nucleotides (3’-O-azidomethyl 2’-deoxynucleoside triphosphates), each carrying a different removable fluorophore, and are added simultaneously. Once a nucleotide has been incorporated, the terminator group prevents DNA polymerase from adding another nucleotide. After incorporation of a reversible terminator nucleotide, four lasers illuminate the fluorophore, which emits light of its specific wavelength. The detectors register the emitted wavelength and thereby identify the nucleotide and its position on the flow cell. The fluorophore and terminator groups are cleaved off by adding tris(2-carboxyethyl)phosphine (TCEP); during the cleavage TCEP regenerates the 3’-OH end of the incorporated nucleotide. The restored hydroxyl group enables the polymerase to add the next reversible terminator nucleotide during the next sequencing cycle. These synthesis cycles are repeated, and during each cycle the fluorescent signal of every cluster is measured, as depicted in figure 2. The DNA fragments are sequenced in two reads of 100 cycles; in every cycle one nucleotide is incorporated and identified, so the read length of the HiSeq 2000 is approximately 100 bp. The first read is a multiplexed read; during the second 100 cycles the fragments are paired-end sequenced. Paired-end sequencing enables the sequencing of the opposite end of the fragment: the original templates are cleaved and removed, and the complementary strand is regenerated in clusters and sequenced. These clusters are created in the sequencer, in the same way as in the cBot, and are sequenced after the regeneration. The Illumina software converts the fluorescent signal into raw data: the nucleotides are determined, and every cluster has its own sequence. The software applies a base-calling algorithm to define the quality (Q) value of each base call. The raw data are mapped against a reference genome using mapping software, and variants, insertions and deletions are detected. The average data output of the HiSeq 2000, see figure 4, is 2 x 300 giga base pairs (Gb); this applies to two flow cells, each containing 8 lanes [14][15].
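The text does not state how the Q value is scaled. Sequencing quality values are conventionally Phred-scaled, Q = -10 log10(P), where P is the estimated probability that the base call is wrong; the short sketch below assumes that convention. The function names are hypothetical.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred-scaled quality value Q into an error probability P."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert an error probability P into a Phred-scaled quality value Q."""
    return -10 * math.log10(p)


# Under this convention Q20 corresponds to 1 error in 100 base calls
# and Q30 to 1 error in 1000 base calls.
for q in (10, 20, 30, 40):
    print(f"Q{q}: P(error) = {phred_to_error_probability(q):.4f}")
```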

Figure 4: Schematic overview of the Illumina sequence run and a picture of the data. The sequence primers and reversible terminator nucleotides are added (1); the incorporated nucleotide, which carries a fluorophore, is illuminated and the fluorophore emits a fluorescent signal (2). The fluorescent signal is detected by the CCD camera and a picture is taken in each cycle (picture on the right); the fluorescent signal is converted to a nucleotide [14]. On the right is the Illumina HiSeq 2000 sequencer.

Figure 2: cBot, clusters are


2.3 Bioinformatics

Using the Agilent SureSelect Exon Enrichment kit, approximately 38 mega base pairs (Mb) of exon sequence is captured; the remaining 12 Mb consists of UTRs, microRNAs and adjacent splice sites [10][14].

While mapping the raw data against a reference genome, several thousands of single nucleotide variants (SNVs) are discovered. The majority of these variants are known polymorphisms; however, one or possibly more of them is potentially pathogenic. The list of detected variants is filtered in several steps in order to identify the potentially pathogenic variant. The filter steps are based on the assumptions that a pathogenic mutation alters an amino acid, thereby affecting the protein and its function, and that the frequency of the variant in the general population is low. The variants are first filtered according to the effect of the mutation: mutations in protein-coding regions have, in most cases, a more severe effect than mutations in non-coding regions. The damaging effect of missense mutations (alteration of the amino acid) and nonsense mutations (an amino acid codon changes into a stop codon) is predicted using the Grantham table and the prediction programs PolyPhen-2 and SIFT. The Grantham score reflects the chemical dissimilarity between the original and the new amino acid; PolyPhen-2 predicts the functional effect of an amino acid change based on its evolutionary conservation and on the position of the altered amino acid in the protein. Relative to the cut-offs used, Grantham 101, PolyPhen-2 0.851 and SIFT 0.05, a variant is considered more pathogenic the higher its Grantham and PolyPhen-2 scores and the lower its SIFT score [16][17][18]. The variants are then compared with Single Nucleotide Polymorphism (SNP) databases, which include all known variants, some of which might be pathogenic. Known variants and variants with a high frequency in the general population are excluded, on the assumption that high-frequency mutations are non-pathogenic. The next filter step is based on the fact that most metabolic traits are autosomal or X-linked recessive: only homozygous variants (two affected alleles) and compound heterozygous variants (two different mutations in the same gene, pathogenic in combination) are retained. Using these filtering steps the number of variants is reduced to a smaller subset of several hundred potentially pathogenic mutations, see figure 5 [19][6][7][11].
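The filter steps described above can be sketched as a small pipeline. This is a simplified illustration under stated assumptions, not the software used in this research: the `Variant` record and its field names are hypothetical, the frequency cut-off of 1% is an assumed value that the text does not specify, and a variant is treated as damaging when any one of the three scores passes its cut-off (the text does not say how the scores are combined).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    gene: str
    effect: str                  # e.g. "missense", "nonsense", "synonymous", "intronic"
    population_frequency: float  # frequency in the general population (0.0 if novel)
    zygosity: str                # "homozygous" or "heterozygous"
    grantham: Optional[int] = None
    polyphen2: Optional[float] = None
    sift: Optional[float] = None


MAX_FREQUENCY = 0.01  # assumed cut-off; not given in the text


def predicted_damaging(v: Variant) -> bool:
    """Apply the cut-offs from the text: Grantham 101, PolyPhen-2 0.851, SIFT 0.05."""
    if v.effect == "nonsense":
        return True
    return (
        (v.grantham is not None and v.grantham >= 101)
        or (v.polyphen2 is not None and v.polyphen2 >= 0.851)
        or (v.sift is not None and v.sift <= 0.05)
    )


def filter_variants(variants):
    # Step 1: keep novel and low-frequency variants.
    rare = [v for v in variants if v.population_frequency <= MAX_FREQUENCY]
    # Step 2: keep variants that change the protein.
    coding = [v for v in rare if v.effect in {"missense", "nonsense"}]
    # Step 3: recessive model, i.e. homozygous or two heterozygous hits in one gene.
    het_hits = Counter(v.gene for v in coding if v.zygosity == "heterozygous")
    recessive = [
        v for v in coding
        if v.zygosity == "homozygous" or het_hits[v.gene] >= 2
    ]
    # Step 4: keep variants predicted to be damaging.
    return [v for v in recessive if predicted_damaging(v)]


# Hypothetical example data.
example_variants = [
    Variant("GENE_A", "missense", 0.0, "homozygous", polyphen2=0.95),
    Variant("GENE_B", "missense", 0.20, "homozygous", polyphen2=0.99),
    Variant("GENE_C", "synonymous", 0.0, "homozygous"),
    Variant("GENE_D", "missense", 0.0, "heterozygous", sift=0.01),
    Variant("GENE_E", "missense", 0.001, "heterozygous", grantham=120),
    Variant("GENE_E", "nonsense", 0.0, "heterozygous"),
]
remaining = filter_variants(example_variants)
print([v.gene for v in remaining])
```

In this example GENE_B is dropped for its high population frequency, GENE_C because it is synonymous and GENE_D because it is a single heterozygous hit.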

2.3.1 Variant analysis

The first filter steps reduce the number of variants; additional analyses are used to shorten the resulting list of potentially pathogenic variants further. These analyses comprise the validation and segregation of the variant and the relation of the gene to the phenotype of the patient.

Total variants (±50,000)
Novel & low frequency in general population (±10,000)
Coding/non-synonymous mutations (±1,500)
Two alleles in one gene (homozygous, compound heterozygous) (±600)
Smaller subset of variants, due to prediction programs, homozygosity (±90)
Validation, segregation of small subset of potential pathogenic variants (±3-5)


Relating the function of the gene to the phenotype

During sequencing, variants are discovered in several genes; the mapping software maps the variants to known genes. Every gene family and subfamily has a unique function, and the functions of the genes known in the NCBI database are retrieved from NCBI [20]. Mutations in the protein-coding regions can alter the protein, resulting in a non-functional protein; the pathophysiological effect depends on the function of that protein. In mitochondrial myopathy patients the pathophysiological cause manifests itself in a specific phenotype. By relating the functions of the genes in which a potentially pathogenic mutation is found to the clinical and subclinical phenotype, the subset of possible pathogenic variants is reduced further. Mitochondria-related proteins are more likely to be associated with mitochondrial disorders than, for example, hair-related proteins.

Validation of the variant

NGS increases the throughput of sequence data; however, its accuracy is lower, which increases the error rate. To exclude false positives, the region of the gene in which the variant is found is Sanger sequenced. In this way the variants are validated: true variants are separated from false positive variants, which increases the reliability of the variants found [2].

Segregation of the variant

The segregation of the variants found is determined by analysing the family. Segregation refers to the separation of individuals with different traits, such as pathogenic variants; the segregation of genes explains how recessive disorders can skip a few generations. Pathogenic and non-pathogenic variants are in most cases traceable to one of the parents, although de novo mutations, which occur spontaneously, do exist. Variants which segregate incorrectly are excluded as potentially pathogenic variants. For example, when the child of two healthy parents, one homozygous for the normal allele (AA) and one heterozygous (Aa, a = variant), has a pathogenic phenotype but is heterozygous (Aa), the variant did not segregate correctly: the affected child has the same genotype as one of the healthy parents, so there is no separation of two distinct traits [25][26][27][19]. The variant is then excluded as a potentially pathogenic variant. Using these analyses, relating the function of the gene to the phenotype, validation and segregation, the number of variants is reduced to a manageable amount. A few of the remaining variants will be tested functionally.
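The segregation check can be sketched for the simplest case, a homozygous variant under an autosomal recessive model. This is an assumption-laden illustration: it does not cover compound heterozygous, X-linked or de novo variants, and the function `segregates_recessive` and the family members are hypothetical.

```python
def segregates_recessive(genotypes, affected):
    """Check segregation of a variant under an autosomal recessive model.

    genotypes: mapping from family member to the number of variant alleles
               (0 = AA, 1 = Aa, 2 = aa).
    affected:  set of family members with the pathogenic phenotype.
    The variant segregates correctly when every affected member is aa
    and no unaffected member is aa.
    """
    for person, variant_alleles in genotypes.items():
        if person in affected and variant_alleles != 2:
            return False
        if person not in affected and variant_alleles == 2:
            return False
    return True


# Both parents are carriers, both affected children are homozygous.
family_ok = {"father": 1, "mother": 1, "child_1": 2, "child_2": 2}
print(segregates_recessive(family_ok, {"child_1", "child_2"}))

# The example from the text: AA and Aa parents, affected child is Aa.
family_bad = {"father": 0, "mother": 1, "child_1": 1}
print(segregates_recessive(family_bad, {"child_1"}))
```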

Various types of defects in genes

Mitochondrial functions are inherited in two ways: maternal and biparental inheritance. Maternal inheritance relies on the mitochondrial DNA (mtDNA); during fertilization the paternal mtDNA, part of which ends up in the oocyte, is degraded [21]. Mitochondrial dysfunctions caused by mtDNA defects are always maternally inherited. The inheritance of a mitochondrial disorder depends on which types of mtDNA are present in the mitochondrion: a cell contains 10^2 to 10^5 mitochondria, and a mitochondrion contains two to ten copies of the mtDNA. The two types of mtDNA, normal and mutant, can be distributed in different ratios amongst cells, tissues and organs. When all mtDNA copies in a cell are of the same type, it is called homoplasmy. Heteroplasmy is the state in which normal and mutant copies of mtDNA are mixed in the mitochondrion or cell. The percentage of mutant mtDNA in comparison with normal mtDNA (the threshold) affects the severity of the disease phenotype [22][23]. Biparental inheritance relies on the inheritance of 23 chromosomes of


Defects in these genes on the gDNA, which encode proteins for the replication and maintenance of the mtDNA, can cause mutations in the mtDNA. Pathogenic phenotypes caused by a mutation in a single gene are called monogenic diseases. Such mutations in gDNA genes are inherited in a Mendelian way: a single copy of the gene is inherited from each parent. The mutations are inherited in an autosomal dominant, autosomal recessive or X-linked way. Autosomal mutations are inherited via the autosomes; X-linked mutations are inherited via the X chromosome. A mutation in a single copy of an autosomal gene which results in a pathogenic phenotype is an autosomal dominant mutation. Autosomal recessive traits are caused by mutations in both copies of an autosomal gene; affected individuals are homozygous for the pathogenic mutation. Heterozygous individuals carry one affected and one healthy copy of the gene and are carriers of the autosomal recessive trait. When both parents are carriers of an autosomal recessive trait, there is a one in four chance that a child has the recessive disorder; the same one in four chance applies to X-linked recessive traits when one parent is a carrier, see figure 6. With autosomal and X-linked dominant inheritance, there is a one in two chance that a child inherits the dominant disorder, see figure 6.
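The one in four and one in two chances follow from combining one allele of each parent, as in a Punnett square. A minimal sketch for autosomal inheritance, using the allele letters of figure 6 (C/c for the recessive trait, H/h for the dominant trait); the function name `offspring_genotypes` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent_1: str, parent_2: str) -> dict:
    """Return the probability of each offspring genotype (Punnett square).

    Each parent is given as a two-letter genotype, e.g. "Cc"; every child
    receives one allele from each parent with equal probability.
    """
    combinations = Counter(
        "".join(sorted(allele_1 + allele_2))
        for allele_1, allele_2 in product(parent_1, parent_2)
    )
    total = sum(combinations.values())
    return {genotype: Fraction(count, total) for genotype, count in combinations.items()}


# Autosomal recessive: both parents are carriers (Cc).
print(offspring_genotypes("Cc", "Cc"))

# Autosomal dominant: one affected heterozygous parent (Hh), one healthy parent (hh).
print(offspring_genotypes("Hh", "hh"))
```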

2.4 Functional tests

When the potentially pathogenic variants have been validated and the segregation of the variant has been determined, functional tests are performed. These functional tests include targeting, familial analyses and gene expression levels.

2.4.1 Targeting of the protein

By means of transcription and translation DNA is converted to protein. These proteins, which are synthesized at the ribosomes, each have a specific function. To function, a protein is transported to its organelle, the membrane of the organelle, the inner space of the organelle, the cell membrane or the extracellular matrix outside the cell. This transport is based on information in the protein itself: proteins contain a specific targeting signal, a signal peptide. Based on this signal peptide, proteins are transported and delivered to the correct organelle.

Figure 6: Overview of the various forms of inheritance. (1) Autosomal dominant inheritance: H (dominant trait) and h (healthy allele); 50% of the progeny inherits the autosomal dominant trait (red) and 50% is healthy (blue). (2) Autosomal recessive inheritance: both parents are carriers of an autosomal recessive trait (c; C is healthy); 50% of the progeny are carriers (Cc), 25% is healthy (CC) and 25% inherits the autosomal recessive trait (cc). (3) X-linked recessive inheritance: one of the parents is a carrier of a recessive X-linked trait (Xx); 50% of the progeny is healthy (XY and XX), 25% are carriers (Xx) and 25% inherits the recessive X-linked trait (xY). For dominant X-linked traits, 50% of the progeny is healthy and 50% inherits the X-linked dominant trait [27].


There are two types of targeting signals: pre-sequences and internal targeting peptides. Pre-sequence targeting peptides are located at the N-terminus, the beginning of the protein, or at the C-terminus, the end of the protein. Internal targeting peptides are enclosed by the rest of the protein [1]. The proteins encoded by genes which contain a potentially pathogenic variant can be localised using these targeting peptides. Mutations in a targeting peptide could alter its composition, which could lead to a non-functional targeting peptide. A non-functional targeting peptide results in proteins which are not targeted to their organelle.

Using a vector (pEGFP-N1, see appendix 2, figure 22), restriction enzymes and DNA ligase, the pre-amplified targeting peptide of a gene (protein) of interest is ligated into the vector. Competent cells, cells whose membrane is easily altered and can be crossed by DNA, are transformed and used to replicate the vectors. The replicated vectors are isolated and used to transfect human cells using the FuGene reagent, which forms a complex with DNA. The complex consists of DNA surrounded by a lipid membrane; the FuGene lipid membrane merges with the human cell membrane and the DNA ends up in the cell. The vector contains a promoter, so the targeting peptide fused to GFP (green fluorescent protein) is transcribed and translated by the cell. The targeting peptide, together with the GFP, is transported to its organelle, where the GFP emits green fluorescence. In this way the protein of the gene of interest, which contained a potentially pathogenic variant, is localised.

2.4.2 Analyses other patients

The pathogenicity of a mutation can be tested by sequencing other patients: the specific exon is sequenced in patients who might have a similar phenotype. The same analysis was also used to determine the inheritance of a variant (autosomal dominant or recessive, X-linked or de novo) during the validation and segregation in the family [6].

2.4.3 Gene expression levels

Using transcription and translation, cells convert DNA into protein; these proteins are important for the functioning of cells. Each protein has its own function, and some proteins are more important than others. The expression of a gene differs between tissues, because every tissue synthesizes the specific proteins it needs to function. Genes which are expressed in multiple tissues are called housekeeping genes. A mutation in a gene can have multiple consequences: it can cause the translation of a non-functional protein, an increase or decrease of expression, or a complete loss of expression. Using quantitative PCR (qPCR) it is possible to determine the expression of genes in various tissues. By analysing, in multiple tissues, the expression levels of genes which contain potentially pathogenic variants found by WES, the effect of the variant on the protein is determined.

2.5 Affected patients

During the graduation and apprenticeship internship, two families were of great importance: family DNA 07-2283 and family DNA 08-5759. Because of confidentiality no family names are used in this report. These families had one or more affected children, who were screened using WES. The bioinformaticians analyzed and filtered the majority of the dataset and reduced the number of potentially pathogenic variants. The remaining subset was to be validated, tested for segregation and functionally analyzed.


2.5.1 Family DNA 07-2283

Family DNA 07-2283 was a consanguineous family with two affected family members. The patients, who both died 6 months after birth, displayed the same phenotype with various clinical symptoms: hypertrophic cardiomyopathy (HCM), incorrect muscle development and underdevelopment. After WES and the filtering steps a subset of 195 variants remained. Six variants, in LAMA3, SYNPO2, DCHS2, SLC6A8, Zyxin (ZYX) and KIAA1109, were selected as potentially pathogenic. Four variants (LAMA3, SYNPO2, DCHS2 and KIAA1109) were selected based on the similarity between the two patients; SLC6A8 and ZYX were chosen because of their relatedness to the phenotype. A few variants were analyzed during the graduation term.

2.5.1.1 LAMA3

LAMA3 (laminin, alpha 3), encoded on chromosome 18, is one of the subunits which form the heterotrimeric glycoproteins laminin 5, 6 and 7. The laminins are components of the basement membrane, a thin layer of extracellular matrix which underlies epithelial and endothelial cells and surrounds muscle, fat and Schwann cells. The basement membrane, including the laminins, mediates the maintenance of skin integrity, filtration and various stages of development. The laminins play an important role during embryonic development by mediating the attachment, migration and organization of cells into tissues [28][29]. The LAMA3 mutation, c.2234G>T, was detected in exon 19 of the 76 exons and altered amino acid 745 from arginine to leucine. The nucleotide and amino acid were highly conserved, which increased the possibility that the mutation is pathogenic.
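The amino acid number of a coding variant can be checked against its cDNA position, because every codon spans three coding nucleotides. The sketch below is an illustration only (the function name is hypothetical); it assumes the position is counted from the A of the start codon, as in the c. notation used in this report.

```python
def codon_of(cdna_position):
    """Map a 1-based coding position (the number in c.2234G>T) to a tuple of
    (codon number, position of the nucleotide within that codon)."""
    codon_number = (cdna_position - 1) // 3 + 1
    position_in_codon = (cdna_position - 1) % 3 + 1
    return codon_number, position_in_codon


# LAMA3 c.2234G>T: second nucleotide of codon 745, matching p.Arg745Leu.
lama3 = codon_of(2234)
print(lama3)
```

The same check can be applied to the other variants in this chapter to confirm that the reported nucleotide and amino acid positions agree.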

2.5.1.2 SYNPO2

Synaptopodin 2 (myopodin), encoded on chromosome 4, is a multi-adapter protein which interacts and co-localizes with filamin and alpha-actinin during all stages of muscle development. Study [30] showed that the protein is expressed in early stages of in vitro differentiation of human skeletal muscle cells. Synaptopodin 2 is thought to mediate the early assembly and stabilization of the Z-disc, one of the major components of muscle cells. An affected Z-disc can impair the functioning of muscle cells and muscle, which could lead to severe problems. Mutations in several other genes which encode Z-disc related proteins have been found to cause myopathies and cardiomyopathies. The synaptopodin 2 variant is therefore a very interesting candidate for causing the severe phenotype [30][31]. The SYNPO2 mutation, c.1656A>C, was detected in exon 4 of the 5 exons and altered amino acid 552 from lysine to asparagine.

2.5.1.3 DCHS2

Dachsous 2 (DCHS2) is a cadherin (calcium dependent adhesion molecule) protein encoded on chromosome 4. Cadherins are adhesion proteins which mediate cell to cell contact and are responsible for tissue and organ organisation; they depend on calcium ions to function. The DCHS2 protein functions as a homophilic cell adhesion protein: it binds to an identical DCHS2 protein of an adjacent cell [32]. The DCHS2 mutation, c.8351G>A, was found in exon 25 of the 28 exons and alters amino acid 2784 from serine to asparagine.

2.5.1.4 SLC6A8

Solute carrier family 6 (neurotransmitter transporter, creatine), member 8 (SLC6A8) is a gene on chromosome X which encodes a transporter protein. The SLC6A8 protein transports creatine, which is a very important intermediate energy supply for muscle and nerve cells, into and out of the cell.


Creatine is used to store energy: a phosphate group is added to creatine, forming creatine phosphate (CP), and this phosphate group is later cleaved off and added to ADP, resulting in ATP (energy). A deficiency in the transport of creatine into the cell results in a reduced amount of creatine, which leads to less energy production. A creatine deficiency can be caused by a defect in the SLC6A8 gene; a reduced amount of creatine can have severe consequences for organs with a high energy demand. The SLC6A8 gene is linked to a specific trait, the X-linked creatine deficiency [33]. The mutation, c.691C>G, was found in exon 4 of the 14 exons and alters amino acid 231 from leucine to valine.

2.5.1.5 Zyxin

Zyxin, encoded on chromosome 7, is a zinc-binding phosphoprotein and a component of focal adhesions, actin-rich structures which connect the extracellular matrix to the cytoskeleton of the cell. These structures mediate the adhesion of the extracellular matrix to the cell and signal transduction. Zyxin binds to alpha-actinin and is thought to mediate the assembly and control of the actin cytoskeleton. Alpha-actinin is one of the components of the Z-disc [34][35]. The Zyxin mutation, c.263C>A, was detected in exon 3 of the 10 exons and alters amino acid 88 from alanine to aspartic acid.

2.5.1.6 KIAA1109

KIAA1109 is a gene on chromosome 4 which encodes a protein whose function is the regulation of epidermal growth and differentiation. The region in which the gene is found, on the long arm of chromosome 4, is associated with susceptibility to celiac disease [36]. The mutation, c.6664C>T, was detected in exon 40 of the 84 exons and alters amino acid 2222 from arginine to tryptophan.

2.5.1.7 Transformation fibroblast - Myogenesis

Myogenesis is the development of muscle tissue: muscle fibers are formed by the fusion of myoblasts into myotubes. Using specific myogenic regulatory factors, fibroblasts can be transformed into myotubes, so that failing muscle development can be detected. These specific factors are present in various vectors, which are incorporated into an adenovirus. The adenovirus is used to transfect the fibroblasts; because of the inserted vectors the fibroblasts are triggered to express muscle proteins. After the formation of the myoblasts and their fusion into myotubes, all sorts of tests (qPCR, electron microscopy, Western blots, antibody staining) can be conducted on the myotubes [37].

2.5.2 Family DNA 08-5759

The family DNA 08-5759 was not consanguineous and had one affected family member. The patient, who also died at a young age, had malfunctioning kidneys and rhabdomyolysis. Rhabdomyolysis is a severe condition in which skeletal muscle breaks down; it probably caused the kidney failure, because the breakdown products of the damaged muscle cells are toxic to the kidneys. The subset of potentially pathogenic variants contained 40 variants, most of which were excluded because of incorrect segregation. The potentially pathogenic variants which were validated and segregated correctly were in KRTAP10-6 and PSPH; these were selected because of, respectively, a high prediction score (Grantham, etc.) and a gene function related to the phenotype. During the graduation term this variant was trivial.


2.5.2.1 KRTAP10-6

Keratin is a fibrous structural protein located in hair, skin and nails. The keratin proteins form intermediate filaments; in specific types of cells, such as cells of the dermis, the keratin filaments are part of the cytoskeleton. In hair, the keratin filaments are combined with keratin associated proteins (KRTAP) to form rigid and resistant hair shafts. The KRTAPs consist of three groups: high cysteine, ultrahigh cysteine and high glycine-tyrosine. The KRTAP genes are divided into 27 subfamilies (KRTAP1 to 27). KRTAP10-6 is a high cysteine KRTAP encoded on chromosome 21; the protein is only expressed in hair [38][39][40].

The KRTAP10-6 mutations, c.184C>T and c.206C>G, were detected in exon 1, the only exon of this gene, and altered amino acid 62 from arginine to cysteine and amino acid 69 from proline to arginine.

2.5.2.2 PSPH

The phosphoserine phosphatase (PSPH) gene, located on chromosome 7, encodes an enzyme which catalyzes the formation of L-serine. It is a member of the haloacid dehalogenase (HAD) superfamily, whose members are highly conserved. PSPH catalyzes the last and irreversible step in the formation of serine. During the catalysis PSPH uses magnesium for the hydrolysis of L-phosphoserine, which results in two products: L-serine and inorganic phosphate (Pi) [40]. The mutations, c.81A>T and c.95A>G, altered the amino acids asparagine to cysteine and arginine to serine, and were found in exon 4 of the 8 exons.

2.6 Aim/ hypothesis

The aim of the project was to use WE-enrichment and Illumina sequencing to detect potentially pathogenic variants in the mtDNA and gDNA, to filter the dataset of thousands of detected variants by means of data and variant analyses, and to solve the genetic complexity of the mitochondrial myopathies. It is assumed that the protein-coding regions harbour the majority of the disease-causing mutations; with WES these regions are sequenced and the mutations are detected. The larger part of the genetic and pathophysiological complexity of the mitochondrial myopathies can hopefully be solved using WES, to eventually provide a more reliable diagnosis for the patient.

The thousands of detected variants are filtered to find the potentially pathogenic variants; the filtered subset is validated and tested for segregation, and the gene function is linked to the phenotype. The remaining variants are functionally tested (targeting, gene expression levels and familial analysis); using these tests the pathogenic variant will be discovered.


3. Methodology

The following methodologies are standard methods used for the detection and validation of a pathogenic variant. Which functional methods were performed during the research depended on the patient, or rather on the phenotype of the patient.

3.1 Whole Exome Enrichment & Illumina Genome Analyzer HiSeq 2000

Whole exome sequencing consists of multiple steps: first the WE enrichment kit, SureSelect Exon Enrichment (Agilent Technologies); second the cluster generation kit, TruSeq PE Cluster kit v3; and third the WE-sequencing kit, TruSeq SBS Kit v3 (200 cycles) (Illumina). During these three steps other methods were used, such as quality and quantity checks (Qubit, Bioanalyzer), fragmentation (Covaris S2) and purification/isolation using AMPure XP beads. These standard methods were copied from the apprenticeship report [42]; every method is included in the appendix.

3.1.1 Sample preparation

For each DNA sample that had to be sequenced, one library was prepared. First the gDNA was sheared using the Covaris S2 (appendix 3); after shearing the DNA sample was purified using AMPure XP beads (Agencourt) (appendix 4). The length of the fragmented and purified DNA was verified using the Bioanalyzer (Agilent 2100) (appendix 5). After the verification, the "End repair" mix was added to the sample, which was then purified using the AMPure XP beads. Then 10X Klenow Polymerase Buffer, dATP and Exo(-) Klenow were added to the DNA sample, which was purified once more using AMPure XP beads. The purified DNA sample was added to the prepared "ligation master mix" and purified using the AMPure XP beads. After purification the PCR components were added to the DNA sample; after PCR the sample was purified and its quantity, quality and fragment size (±100-120 bp) were assessed using the Bioanalyzer. The sample preparation was performed using the SureSelect Exon Enrichment [10].

3.1.2 Hybridization

Each prepared DNA library sample was hybridized and captured separately; the samples were not pooled at this stage. First the "hybridization buffers", the "SureSelect Capture Library" mix and the "SureSelect Block" mix were prepared. The prepped DNA library sample was pipetted into the wells of row B of a PCR plate; the "SureSelect Block" mix was added to the same wells and mixed by pipetting up and down. The hybridization buffers and the capture library mix were pipetted into the wells of row A and row C of the PCR plate, respectively. A fraction of the hybridization buffers and the entire content of the prepped DNA library sample were transferred to row C; the mixture was mixed by pipetting up and down and incubated for 24 hours at 65ºC. During incubation the magnetic beads were prepared using "SureSelect Binding Buffer" and "SureSelect Wash Buffer #2". The hybridization mixture was added to the magnetic bead solution, the beads were washed using "SureSelect Wash Buffer #1" and "#2", and the DNA was eluted using the "SureSelect Elution Buffer". After adding the "SureSelect Neutralization Buffer" to the captured DNA, the DNA sample was purified using the AMPure XP beads. The hybridization was performed using the SureSelect Exon Enrichment [10].


3.1.3 Addition of Index Tags by Post-Hybridization Amplification

For each hybrid capture one amplification reaction was prepared. The captured DNA was added to the "Herculase II Master Mix" in a PCR tube; after the addition of all the reagents the PCR tubes were loaded in the thermal cycler and the PCR program was run. After PCR the DNA sample was purified using the AMPure XP beads. The quality and quantity of the DNA were assessed with the Bioanalyzer and qPCR, respectively. After this quality and quantity check the samples were pooled in equimolar amounts. After pooling, the samples were prepared for cluster generation using the TruSeq Cluster Generation Kit, which included "HP3 (2 N NaOH)", "HT1 (Hybridization Buffer)" and the PhiX Control (a known virus genome, which serves as a control during sequence runs). The addition of index tags was performed using the SureSelect Exon Enrichment [10].
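Pooling in equimolar amounts means that each library contributes the same number of molecules, so a more concentrated library needs a smaller volume. The sketch below illustrates that calculation; the sample names, concentrations and the amount per sample are hypothetical values and are not taken from the protocol.

```python
def equimolar_volumes(conc_nM, fmol_per_sample):
    """Return the volume (in µl) of each library that contains the same
    molar amount. 1 nM equals 1 fmol/µl, so volume = amount / concentration."""
    return {name: fmol_per_sample / conc for name, conc in conc_nM.items()}


# Hypothetical library concentrations in nM, e.g. derived from the qPCR quantification.
libraries = {"sample_A": 10.0, "sample_B": 4.0, "sample_C": 8.0}

# Each library should contribute 20 fmol to the pool (illustrative value).
volumes = equimolar_volumes(libraries, fmol_per_sample=20.0)
for name, volume in volumes.items():
    print(f"{name}: {volume:.1f} µl")
```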

3.1.4 Cluster generation

First the cBot Reagent Plate was prepared by thawing the reagents in the 96-well plate. 120 µl of the denatured DNA from the last step of 3.1.3 was loaded in an eight-tube strip. After preparing the reagent plate and DNA, the eight-tube strip, the 96-well plate, the waste bottle and the flow cell were loaded in the cBot. The single disposable self-piercing sippers were attached to the cBot above the flow cell and the 96-well reagent plate. The specific cBot clustering program was run. The cluster generation was performed using the TruSeq PE Cluster kit v3 [43].

3.1.5 Sequencing by synthesis

The TruSeq SBS multiplexing reagents were thawed and prepared for the first sequencing read. Before sequencing with the HiSeq 2000 all the sippers were flushed with water; after flushing, the reagents and flow cell were loaded in the HiSeq 2000 and the first sequencing read began. After 4-5 days the paired-end reagents, which had been stored, were freshly prepared for the second read and loaded in the HiSeq 2000, replacing the reagents of the first read. After loading the reagents the program was run. The SBS was performed using the TruSeq SBS kit v3 (200 cycles) [44].

3.2 Data analysis

The obtained sequence dataset of all variants was filtered by the bioinformaticians; the filtered subset of possibly pathogenic variants was further reduced using additional analyses: validation, segregation of the variant and relating the function of the gene to the phenotype.

3.2.1 Validation & segregation

The validation and segregation of the variants were performed with the same methodologies. First primers were designed for the specific target gene, such that the amplicon contained the mutation. With these primers a PCR, including possible optimization, was conducted and the amplicon was visualized using gel electrophoresis. The amplicon was purified using AMPure XP beads and sequenced using Sanger sequencing. The Sanger sequencing data was compared with a reference and/or multiple family members. All these methods were copied from the apprenticeship report [42], and are included


3.2.2 Relate function gene to phenotype

The following text was entered in the NCBI database: "gene name, Homo sapiens", with the search database set to Gene. The gene record was opened and the summary, gene ontology or related articles were consulted. The function of the gene was linked to the phenotype; genes without mitochondrial or metabolic related functions were excluded.

3.3 Functional tests

The reduced subset of validated variants was functionally tested, which included targeting, familial and gene expression level analyses. Not all of these functional tests were used; which functional test was suitable depended on the type of variant, the patient and the phenotype of the patient.

3.3.1 Targeting

First primers were designed for the specific target gene, such that the amplicon contained the first 60-120 coding bp of the N-terminus. A restriction site and an overhang were also added to each primer; each primer contained one of the two restriction sites (for two different restriction enzymes) and an overhang. With these primers a slowdown PCR was conducted and the amplicon was visualized using gel electrophoresis. The sample was purified using the MSB Spin PCRapace (Invitek); after purification the amplicon and a vector (pEGFP-N1) were digested with the restriction enzymes KpnI and AgeI. The digested sample and vector were purified again using MSB Spin PCRapace, after which 1 µl T4 DNA ligase, ligase buffer and the digested and purified vector were added to the sample. The sample was incubated overnight at 4ºC, and the ligation was verified using a slowdown PCR and specific "vector" primers, see appendix 2. PCR amplicons of insert-vector and "empty" vector ligations were visualized using gel electrophoresis.

For each sample which contained an insert-vector ligation, an agar plate and 100 ml LB-medium were prepared containing kanamycin (30 µg/ml). During the autoclaving of the LB-agar and medium, one vial of competent cells (100 µl MAX Efficiency DH5α Competent Cells, Invitrogen) was transformed for each sample. The solidified agar plate was streaked with the competent cells and incubated overnight at 37ºC. The colonies were picked into 15 ml glass tubes containing 3 ml LB-medium (+ 30 µg/ml kanamycin); for each colony one tube was prepared. The tubes were incubated for 7-8 hours at 37ºC and 250 rpm, after which the vectors were isolated using the GeneJET Plasmid Miniprep Kit (Fermentas Life Sciences). The isolated vectors were quantified with the Nanodrop and verified using a slowdown PCR and an NcoI digestion.

After verification, the insert-vector samples were mixed, in the proper concentration, with the FuGene HD Transfection Reagent (Roche Applied Science) and used to transfect HeLa cells. For each sample a chamber slide containing 50-70% confluent HeLa cells was used. The transfected HeLa cells were incubated for 1-2 days at 37ºC, after which Mitotracker Red CMXRos (Molecular Probes, Invitrogen) was added to the chamber slides and the cells were fixed. The slides were air-dried and a cover glass was placed over the cells, enclosing a drop of DABCO DAPI. The slides were examined using a fluorescence microscope.
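The primer design step described above can be sketched as follows. This is an illustration under stated assumptions, not the design used in this project: the overhang sequence, the annealing length, the example coding sequence and the choice to place KpnI on the forward primer and AgeI on the reverse primer are all hypothetical. Only the recognition sites of KpnI (GGTACC) and AgeI (ACCGGT) are fixed facts. The sketch does not check that the insert stays in frame with the GFP.

```python
KPNI_SITE = "GGTACC"  # recognition site of KpnI
AGEI_SITE = "ACCGGT"  # recognition site of AgeI
OVERHANG = "ATAT"     # hypothetical 5' overhang so the enzyme can cut near the end

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper case A/C/G/T)."""
    return seq.translate(COMPLEMENT)[::-1]


def targeting_primers(coding_sequence, insert_length=90, anneal_length=20):
    """Build a primer pair that amplifies the first `insert_length` coding bp.

    Each primer is: overhang + restriction site + gene-specific part.
    """
    insert = coding_sequence[:insert_length]
    forward = OVERHANG + KPNI_SITE + insert[:anneal_length]
    reverse = OVERHANG + AGEI_SITE + reverse_complement(insert[-anneal_length:])
    return forward, reverse


# Hypothetical coding sequence (start codon followed by a repeat), for illustration only.
example_cds = "ATG" + "GCTGAC" * 20
forward_primer, reverse_primer = targeting_primers(example_cds)
print("Forward:", forward_primer)
print("Reverse:", reverse_primer)
```

An insert length that is a multiple of three keeps whole codons in the amplicon; whether the fusion is in frame also depends on the vector sequence between the cloning site and the GFP start.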

3.3.2 Analyses other patients

The analyses of other patients included the same protocols that were used during the validation and segregation of the variant. All these methods were copied from the apprenticeship report [42].


3.3.3 Gene expression levels

First the cultured cells were harvested and RNA was isolated from these cells using the High Pure RNA isolation kit (Roche Applied Science). 1-2 µg RNA was used for the reverse transcriptase reaction; the volume of the RNA was adjusted to 26 µl using DEPC-water. The following reagents were added to the RNA: 4 µl Reverse Transcriptase 200 U (Finnzymes), 1.5 µl oligo-dT 500 µg/ml (Invitrogen), 1.5 µl random hexamer primer 500 µg/ml, 5.0 µl dNTPs 10 mM, 1 µl RNasin 40 U (Promega) and 1 µl First strand buffer 10x (Finnzymes). After incubating for 1 hour at 42ºC and 5 minutes at 95ºC, the cDNA was used for a qPCR. The data was analyzed to estimate the relative expression level of the gene.
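One common way to estimate a relative expression level from qPCR data is the 2^-ΔΔCt method, in which the Ct value of the target gene is normalised to a reference (housekeeping) gene and then compared with a control sample. The report does not state which calculation was used, so the sketch below is an assumption, and the Ct values in it are hypothetical. The method assumes a PCR efficiency of approximately 100%.

```python
def relative_expression(ct_target_sample, ct_reference_sample,
                        ct_target_control, ct_reference_control):
    """Fold change of the target gene in a sample versus a control,
    calculated with the 2^-ddCt method (assumes ~100% PCR efficiency)."""
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** -delta_delta_ct


# Hypothetical Ct values: the target gene appears two cycles later in the
# patient than in the control, while the reference gene is unchanged.
fold_change = relative_expression(
    ct_target_sample=26.0, ct_reference_sample=18.0,
    ct_target_control=24.0, ct_reference_control=18.0,
)
print(fold_change)
```

A fold change below 1 indicates lower expression in the sample than in the control, a value above 1 indicates higher expression.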

3.3.4 Transformation fibroblasts

Two of the wells of a 6-well plate were first coated with a 1:50 dilution of matrigel, which was solidified during two hours of incubation; these two wells were used for the transfection. Then the patient fibroblasts and control fibroblasts were seeded, resulting in 50-70% confluent wells. 100 µl of a solution of 2.5*10^7 adenovirus particles was added to 3 ml DMEM medium (+10% FBS, 1% P/S), and 1.5 ml was added to each of the two wells containing matrigel. After approximately three hours the transfection medium was removed, the two wells were washed twice using PBS (1%) and 2 ml clean DMEM medium was added. During the 24-hour incubation the differentiation medium was made: 50 ml DMEM + 4.5 g/l glucose containing 0.5% BSA, 0.15 mg/µl creatine and 5 ng/ml insulin; the solution was filtered through a 0.2 µm filter and EGF (final conc.: 10 ng/µl) was added before use. After the 24-hour incubation the DMEM medium was removed and the differentiation medium was pipetted into the two wells containing matrigel. Every 1-2 days the medium was replaced by fresh differentiation medium [45].


4. Results & Discussion/ Conclusion

4.1 Family DNA 07-2283

4.1.1 Variants

During WES thousands of variants are detected; these variants are filtered down to a reduced and manageable subset of potentially pathogenic variants. The number of variants after each filter step is given in table 1.

Table 1: Number of variants of family DNA 07-2283 during each filter step.

Filter step                                      Number of variants
Total detected variants                          ± 39914
No RS number & low frequency                     ± 7241
Coding regions & non-synonymous                  ± 1472
Homozygous/ compound heterozygous (2 alleles)    ± 694
Only homozygous                                  ± 195
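The filter cascade of table 1 can be pictured as a series of tests which every variant has to pass in turn. The sketch below is not the pipeline of the bioinformaticians; the field names, the frequency cut-off and the example variants are hypothetical and only illustrate the order of the filter steps.

```python
def filter_variants(variants, max_frequency=0.01):
    """Apply the filter steps of table 1 one after another and count
    how many variants remain after each step."""
    steps = [
        ("No RS number & low frequency",
         lambda v: v["rs_id"] is None and v["frequency"] <= max_frequency),
        ("Coding regions & non-synonymous",
         lambda v: v["coding"] and not v["synonymous"]),
        ("Homozygous/ compound heterozygous",
         lambda v: v["zygosity"] in {"hom", "compound_het"}),
        ("Only homozygous",
         lambda v: v["zygosity"] == "hom"),
    ]
    counts = [("Total detected variants", len(variants))]
    remaining = list(variants)
    for label, keep in steps:
        remaining = [v for v in remaining if keep(v)]
        counts.append((label, len(remaining)))
    return remaining, counts


# Hypothetical variants; gene names and values are placeholders.
example = [
    {"gene": "GENE_A", "rs_id": None, "frequency": 0.0, "coding": True, "synonymous": False, "zygosity": "hom"},
    {"gene": "GENE_B", "rs_id": "known", "frequency": 0.2, "coding": True, "synonymous": False, "zygosity": "hom"},
    {"gene": "GENE_C", "rs_id": None, "frequency": 0.0, "coding": True, "synonymous": True, "zygosity": "hom"},
    {"gene": "GENE_D", "rs_id": None, "frequency": 0.0, "coding": True, "synonymous": False, "zygosity": "het"},
    {"gene": "GENE_E", "rs_id": None, "frequency": 0.0, "coding": True, "synonymous": False, "zygosity": "compound_het"},
]

remaining, counts = filter_variants(example)
for label, number in counts:
    print(f"{label}: {number}")
```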

Because the number of remaining variants was high, homozygosity mapping was performed on the family of the two patients. During the mapping of the data a few variants remained which were encountered in both patients: LAMA3, SYNPO2, DCHS2 and KIAA1109. Two further variants, ZYX and SLC6A8, were picked from the larger WES dataset because the function of these genes could be related to the phenotype. The damaging effect of the variants was determined using the Grantham, PolyPhen-2 and SIFT scores. This small subset of variants was chosen because of their prediction scores or their metabolic or pathway related function (see theoretical background). The prediction scores of the six variants (LAMA3, SYNPO2, DCHS2, SLC6A8, ZYX and KIAA1109) are given in table 2.

Table 2: The damaging effect of the six potential pathogenic variants, determined by various prediction scores.

Variant     Grantham score   PolyPhen-2                 SIFT
LAMA3       102              1.000 probably damaging    0 Damaging
SYNPO2      94               0.493 possibly damaging    0.43 Tolerated
DCHS2       46               0.067 benign               0.71 Tolerated
SLC6A8      32               0.003 benign               0.37 Tolerated
ZYX         126              0.079 benign               0.28 Tolerated
KIAA1109    101              0.999 probably damaging    0 Damaging


Discussion/ Conclusion

After the filtering steps the number of variants was still high, so the remaining subset had to be reduced further. The family is consanguineous, meaning that the parents are related to each other and could share a large part of their genome. Therefore an additional strategy was used: homozygosity mapping was performed on the family of the patients to find significant regions of homozygosity. This additional strategy was combined with the WES data and four variants remained; the other two variants were picked from the WES data. These two variants were chosen because of their relatedness to the phenotype: ZYX is a Z-disc related protein and SLC6A8 is related to creatine, which is used in muscles to store energy. Of the four variants which were found by combining homozygosity mapping and WES, only LAMA3 and SYNPO2 were thought to be interesting. The two other variants, KIAA1109, which is related to celiac disease, and DCHS2, which is a cadherin protein, might be less relevant because of their function.

Across the variants, the three prediction scores generally predict the same damaging effect; a variant is interesting whenever the scores predict it to be possibly or probably damaging. The cut-off values used were, for possibly damaging: Grantham 101, PolyPhen-2 0.151 and SIFT 0.05; and for probably damaging: Grantham 150, PolyPhen-2 0.851 and SIFT 0.05. The prediction scores of LAMA3 indicate that the alteration of the amino acid is a severe change and is possibly to probably damaging. SYNPO2, DCHS2, SLC6A8 and ZYX are more tolerated mutations, although SYNPO2 might be possibly damaging. The KIAA1109 variant is predicted to be probably damaging; despite the high prediction score the variant is thought to be less important because of its function. The most interesting variants, according to their relatedness to the phenotype and their prediction scores, are LAMA3 and SYNPO2. However, a prediction score is, as the name indicates, only a prediction of the damaging effect; only functional tests will prove whether the variant is damaging or not. Nevertheless, the prediction programs remain a useful tool to filter the potentially pathogenic variants.
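The cut-off values quoted above can be applied to the scores of table 2 as in the sketch below. The direction of each comparison is an assumption that the text does not state explicitly: a Grantham or PolyPhen-2 score at or above a cut-off is taken as damaging, and a SIFT score at or below 0.05 is taken as damaging. The function names are chosen for illustration.

```python
def graded(score, possibly, probably):
    """Grade a score for which a higher value means a more damaging change."""
    if score >= probably:
        return "probably damaging"
    if score >= possibly:
        return "possibly damaging"
    return "benign"


def classify(grantham, polyphen2, sift):
    """Classify one variant with the cut-off values quoted in the text."""
    return {
        "Grantham": graded(grantham, 101, 150),
        "PolyPhen-2": graded(polyphen2, 0.151, 0.851),
        "SIFT": "damaging" if sift <= 0.05 else "tolerated",
    }


# Scores from table 2: (Grantham, PolyPhen-2, SIFT).
table_2 = {
    "LAMA3": (102, 1.000, 0),
    "SYNPO2": (94, 0.493, 0.43),
    "DCHS2": (46, 0.067, 0.71),
    "SLC6A8": (32, 0.003, 0.37),
    "ZYX": (126, 0.079, 0.28),
    "KIAA1109": (101, 0.999, 0),
}

predictions = {gene: classify(*scores) for gene, scores in table_2.items()}
for gene, result in predictions.items():
    print(gene, result)
```

Under these assumptions the PolyPhen-2 and SIFT labels reproduce the labels listed in table 2.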

4.1.2 Sanger sequencing validation & segregation

The subset of potentially pathogenic variants was to be validated and tested for segregation. Not every gene has been validated or tested for segregation; the validation and segregation data of the genes that were tested is included, and an overview is given in table 3. The Sanger sequence data (forward and reverse strand) is mapped against the reference; the variant and its reference are marked between the two striped lines. The codon in which the mutation is located is underlined, and the genotype of the patient, mother and father is also depicted in the family tree. Primers for the Sanger sequencing are included in appendix 9.

Table 3: Overview of validation and segregation of the variants.

Variant    Status of validation                      Segregation check
LAMA3      Confirmed                                 Segregated correctly
SYNPO2     Confirmed                                 Segregated correctly
DCHS2      Confirmed                                 Not tested
SLC6A8     Unconfirmed                               Segregated incorrectly
ZYX        Confirmed as heterozygous, not homozygous Segregated correctly as a heterozygous variant


LAMA3, c.2234G>T, p.Arg745Leu (figure 7):

Patient (T/T):
Forward:   5' CTT C T A TTT 3'
Reference: 5' CTT C G A TTT 3'
Reverse:   5' AAA T A G AAG 3'
Reference: 5' AAA T C G AAG 3'

Father (G/T):
Forward:   5' CTT C T/G A TTT 3'
Reference: 5' CTT C G A TTT 3'
Reverse:   5' AAA T A/C G AAG 3'
Reference: 5' AAA T C G AAG 3'

Mother (G/T):
Forward:   5' CTT C T/G A TTT 3'
Reference: 5' CTT C G A TTT 3'
Reverse:   5' AAA T A/C G AAG 3'
Reference: 5' AAA T C G AAG 3'

In the forward strand the T is the mutation, in the reverse strand the A is the mutation.
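The reverse strand is the reverse complement of the forward strand, which is why the G>T change in the forward read appears as a C>A change in the reverse read. The short sketch below checks this for the LAMA3 sequences listed above; the function name is chosen for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper case A/C/G/T)."""
    return seq.translate(COMPLEMENT)[::-1]


# Forward-strand sequences around LAMA3 c.2234G>T, written without spaces.
forward_reference = "CTTCGATTT"
forward_patient = "CTTCTATTT"

reverse_reference = reverse_complement(forward_reference)
reverse_patient = reverse_complement(forward_patient)
print(reverse_reference)
print(reverse_patient)
```

The two results correspond to the reverse reference (AAA T C G AAG) and the reverse read of the patient (AAA T A G AAG).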

SYNPO2, c.1656A>C, p.Lys552Asn (figure 8):

Patient (C/C):
Forward:   5' GCA A C A GCT 3'
Reference: 5' GCA A A A GCT 3'
Reverse:   5' AGC T G T TGC 3'
Reference: 5' AGC T T T TGC 3'

Father (A/C):
Forward:   5' GCA A A/C A GCT 3'
Reference: 5' GCA A A A GCT 3'
Reverse:   5' AGC T G/T T TGC 3'
Reference: 5' AGC T T T TGC 3'

Mother (A/C):
Forward:   5' GCA A A/C A GCT 3'
Reference: 5' GCA A A A GCT 3'
Reverse:   5' AGC T G/T T TGC 3'
Reference: 5' AGC T T T TGC 3'

In the forward strand the C is the mutation, in the reverse strand the G is the mutation.

Figure 7: Family tree of the LAMA3 variant. The mother and father are both heterozygous (G/T; no trait, carriers, crossed symbol); both patients are homozygous (T/T; trait, fully colored symbol). The grandparents were not included in the genotyping (blank).

Figure 8: Family tree of the SYNPO2 variant. The mother and father are both heterozygous (A/C; no trait, carriers, crossed symbol); both patients are homozygous (C/C; trait, fully colored symbol). The grandparents were not included in the genotyping (blank).


DCHS2, c.8351G>A p.Ser2784Asn:

Patient (A/A):
Forward:   5’ GGC A A T AAA 3’
Reference: 5’ GGC A G T AAA 3’
Reverse:   5’ TTT A T T GCC 3’
Reference: 5’ TTT A C T GCC 3’

In the forward strand the A is the mutation, in the reverse strand the T is the mutation. The segregation of the DCHS2 mutation was not tested, because the other genes had priority.
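The relation between a coding position (c.) and the reported amino acid change (p.) can be checked with simple arithmetic: the codon number is the coding position divided by three, rounded up, and the remainder gives the position within the codon. The sketch below illustrates this with the standard genetic code. The function names are hypothetical, and the reference codons (for example AGT for DCHS2) are assumed from the sequence windows shown above; whether the assumed reading frame is right is confirmed when the result matches the reported protein change.

```python
# Standard genetic code, built from the usual TCAG ordering (one-letter amino acids).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_position(c_pos: int):
    """Map a 1-based coding position to (codon number, 0-based offset in codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3


def protein_change(c_pos: int, ref_codon: str, alt_base: str):
    """Return (reference amino acid, codon number, variant amino acid)."""
    number, offset = codon_position(c_pos)
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    return CODON_TABLE[ref_codon], number, CODON_TABLE[alt_codon]


# DCHS2 c.8351G>A with assumed reference codon AGT (Ser).
dchs2 = protein_change(8351, "AGT", "A")
print(dchs2)
```

For DCHS2 this gives serine at codon 2784 changing to asparagine, which agrees with p.Ser2784Asn. The same call with position 2234, codon CGA and base T reproduces p.Arg745Leu for LAMA3.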

SLC6A8, c.691C>G p.Leu231Val (figure 9):

Patient (G/G):
Forward:   5’ GCC C TC AAC 3’
Reference: 5’ GCC C TC AAC 3’
Reverse:   5’ GTT GA G GGC 3’
Reference: 5’ GTT GA G GGC 3’

Father (C/G):
Forward:   5’ GCC C TC AAC 3’
Reference: 5’ GCC C TC AAC 3’
Reverse:   5’ GTT GA G GGC 3’
Reference: 5’ GTT GA G GGC 3’

Mother (C/G):
Forward:   5’ GCC C TC AAC 3’
Reference: 5’ GCC C TC AAC 3’
Reverse:   5’ GTT GA G GGC 3’
Reference: 5’ GTT GA G GGC 3’

In the forward strand a G would be the mutation, in the reverse strand a C. The mutation was not observed in the Sanger sequence results: patient, father and mother all show only the reference sequence (C/C, see figure 9), not the genotypes given in the headings.

ZYX, c.263C>A p.Ala88Asp (figure 10):

Patient (C/A):
Forward:   5’ GGT G C/A T CTG 3’
Reference: 5’ GGT G C T CTG 3’
Reverse:   5’ CAG A G/T C ACC 3’
Reference: 5’ CAG A G C ACC 3’

Father (C/A):
Forward:   5’ GGT G C/A T CTG 3’
Reference: 5’ GGT G C T CTG 3’
Reverse:   5’ CAG A G/T C ACC 3’
Reference: 5’ CAG A G C ACC 3’

Mother (C/C):
Forward:   5’ GGT G C T CTG 3’
Reference: 5’ GGT G C T CTG 3’
Reverse:   5’ CAG A G C ACC 3’
Reference: 5’ CAG A G C ACC 3’

In the forward strand the A is the mutation, in the reverse strand the T is the mutation.

Figure 9: Family tree of the SLC6A8 variant. The mother and father are both homozygous for the wild type (C/C; no trait, blank symbol); both patients are also homozygous for the wild type (C/C; no trait, blank symbol). The grandparents were not included in the genotyping (blank).


KIAA1109, c.6664C>T p.Arg2222Trp (figure 12):

This variant has not been validated or tested for segregation.

Discussion/Conclusion:

The LAMA3, SYNPO2, DCHS2 and ZYX variants were all confirmed by Sanger sequencing. The SLC6A8 variant was not confirmed, which indicates that it is a false positive; it was therefore excluded. The ZYX (zyxin) variant turned out to be heterozygous, not homozygous. The KIAA1109 variant was neither validated nor tested for segregation, because it could not be linked to the phenotype. Only the LAMA3 variant (high prediction scores) and the SYNPO2, SLC6A8 and ZYX variants (all related to the phenotype) were tested for segregation. The DCHS2 variant was not tested for segregation, because the function of the gene did not relate to the phenotype. From the segregation tests it is concluded that both LAMA3 and SYNPO2 segregated correctly. The ZYX variant also segregated correctly, but as a heterozygous variant: the mother does not carry the mutation (C/C), so the variant can never be homozygous in the patients.

The parents, who had no trait, were both heterozygous carriers of the LAMA3 and SYNPO2 mutations (LAMA3, G/T; SYNPO2, A/C), and the two patients were homozygous for both mutations (LAMA3, T/T; SYNPO2, C/C). The SLC6A8 variant did not segregate correctly: both parents were homozygous for the wild type and carried no mutation (C/C). As mentioned before, the patients did not carry the mutation either; both were homozygous for the wild type (C/C). The validation and segregation of the SLC6A8 variant were done simultaneously, because running the validation and segregation of a single gene separately would be time-consuming.
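The segregation reasoning above can be written down as a small check. The sketch below assumes a recessive model in which unaffected parents are heterozygous carriers and patients are homozygous for the mutant allele, as described for LAMA3 and SYNPO2. The function names are hypothetical and the genotypes are the ones reported in this section; it is meant as an illustration of the logic, not as the method used.

```python
def alleles(genotype: str):
    """Split a genotype such as 'G/T' into a sorted tuple of alleles."""
    return tuple(sorted(genotype.split("/")))


def mendelian_consistent(child: str, mother: str, father: str) -> bool:
    """True if the child can have received one allele from each parent."""
    target = alleles(child)
    return any(
        tuple(sorted((m, f))) == target
        for m in alleles(mother)
        for f in alleles(father)
    )


def segregates_recessive(mutant: str, patients, mother: str, father: str) -> bool:
    """True if both parents are heterozygous carriers and every patient is
    homozygous for the mutant allele (assumed recessive model)."""
    parents_are_carriers = all(
        alleles(p).count(mutant) == 1 for p in (mother, father)
    )
    patients_homozygous = all(
        alleles(p) == (mutant, mutant) and mendelian_consistent(p, mother, father)
        for p in patients
    )
    return parents_are_carriers and patients_homozygous


# Genotypes as reported in this section.
results = {
    "LAMA3": segregates_recessive("T", ["T/T", "T/T"], "G/T", "G/T"),
    "SYNPO2": segregates_recessive("C", ["C/C", "C/C"], "A/C", "A/C"),
    "SLC6A8": segregates_recessive("G", ["C/C", "C/C"], "C/C", "C/C"),
    "ZYX": segregates_recessive("A", ["A/C", "A/C"], "C/C", "A/C"),
}
print(results)
```

Under these assumptions only LAMA3 and SYNPO2 pass. For ZYX, the heterozygous genotype of the patients is consistent with the parents, while a homozygous A/A genotype would not be, since the mother is C/C.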

Only the LAMA3 and SYNPO2 variants remain after the validation and segregation of the variants; these are thought to be the most plausible candidate variants. LAMA3 is related to the basal membrane and requires material that is not available at the moment; therefore SYNPO2 was chosen for the functional tests. SYNPO2 is linked to alpha-actin in muscle cells, or more precisely to Z-disc assembly, which makes SYNPO2 an easier variant to investigate.

Figure 10: Family tree of the ZYX variant. The mother is homozygous for the wild type (C/C; no trait, blank symbol) and the father is heterozygous (A/C; no trait, carrier, crossed symbol). Both patients are heterozygous (A/C; they do have the trait, but are drawn with a crossed symbol because of the heterozygosity). The grandparents were not included in the genotyping.
