
MicroRNA expression and functional analysis in Hodgkin lymphoma

Yuan, Ye

IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below.

Document Version

Publisher's PDF, also known as Version of record

Publication date: 2019

Link to publication in University of Groningen/UMCG research database

Citation for published version (APA):

Yuan, Y. (2019). MicroRNA expression and functional analysis in Hodgkin lymphoma. University of Groningen.

Copyright

Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons).

Take-down policy

If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons the number of authors shown on this cover page is limited to 10 maximum.


CHAPTER 3

Setting up a high-throughput screen in Hodgkin lymphoma

CHAPTER 3A

Feasibility testing of a high-throughput screen in Hodgkin lymphoma cell lines using a barcoded empty vector approach

Ye Yuan1,4, Joost Kluiver1, Jan Osinga2, Maria Sarkis Azkanaz2, Martijn Terpstra2, Jantine Sietzema1, Jasper Koerts1, Debora de Jong1, Leonid Bystrykh3, Klaas Kok2, Anke van den Berg1

Department of 1Pathology and Medical Biology, 2Genetics, 3European Research Institute for Biology of Ageing, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands. 4Institute of Clinical Pharmacology of the Second Affiliated Hospital, Harbin Medical University, Harbin, Heilongjiang Province, China.


Abstract

In this chapter, we aimed to test the feasibility of a high-throughput screening approach using the cellular barcoding principle in Hodgkin lymphoma (HL). Potential problems in this type of experiment are high background noise or the presence of cancer stem cells, either of which could mask true effects of modulated genes or microRNAs. We infected HL cell lines in duplicate with a mix of barcoded lentiviral constructs (n=500 in the first experiment and n=222 in the second experiment). GFP+ cells were sorted at day 5, day 13 and day 21 after infection for DNA isolation, and barcode fragments were amplified in duplicate or triplicate with tagged forward primers. PCR products were mixed and subjected to next generation sequencing. Data were de-multiplexed and read counts were normalized to 40,000. The variation in read counts per barcode in the duplicate PCRs was large in the first experiment due to various suboptimal technical conditions. After optimizing the experimental conditions, the barcode read counts between triplicate PCR results showed a much higher consistency in the second experiment. The average read count per barcode ranged from 46 to 572. Fold changes in construct abundance at day 13 and day 21 relative to day 5 showed limited variation, with an average fold change of 1.01 (SD=0.185). In conclusion, the limited variation in read counts per barcode indicated that the experimental variation is limited and that a high-throughput screening approach using cellular barcoding is feasible in HL.

Introduction

DNA barcoding in combination with high-throughput sequencing has become a popular tool to follow individual cells. [1, 2] The basic idea of barcoding is to create a cell-specific DNA mark to uniquely label each individual cell. This so-called cellular barcoding offers a powerful approach to characterize the growth pattern and differentiation potential of individual cells. [3] Some researchers have used cellular barcoding to track human cord blood (CB) cells upon transplantation into mice. This provided a detailed overview of the growth and differentiation of human CB cells in vivo. [4] Cellular barcoding has also been used to track progeny after transplantation of normal stem cells both in vitro and in vivo [1, 5] and to measure clonal contribution of individual stem cells to hematopoiesis. [6-8] Gerrits and colleagues tracked cell growth of primary BM cells upon transplantation into lethally irradiated C57BL/6 mice to track the differentiation potential of individual stem cells in vivo over time. [2] A similar barcoding approach was used to investigate how often HSCs divide, how divisional history relates to repopulating potential and how many individual HSCs contribute to hematopoiesis. [9]

A currently unexplored field of research is to study the effect of miRNA gain- and loss-of-function in a high-throughput approach in Hodgkin lymphoma. A potential problem of this approach in Hodgkin lymphoma cell lines could be the presence of cancer stem cells [10-12]. In this study, we used a lentiviral GFP barcoded empty vector library to study the feasibility of high-throughput screening in Hodgkin lymphoma cell lines and to determine the background of such an approach.

Materials and methods

Preparation of empty vector-barcodes (EV-BC) library

The barcodes were designed using a fixed strategy with six (type I) or seven (type II) variable nucleotide triplets separated by constant nucleotide doublets (Figure 1). The oligos were hybridized and ligated into the BsrGI and BamHI restriction sites of the lentiviral PeGZ2 vector [13]. In total, 500 individual colonies were randomly picked and cultured for DNA isolation. Plasmids were mixed (Mix 1) and the mix was used for the generation of lentiviral particles. To identify the sequence of each barcode, next generation sequencing was performed using the plasmid mix as described below. Data analysis of the plasmid mix revealed 522 unique EV-BC constructs, with a distance of at least two nucleotides in the variable triplet positions. Read counts of these 522 EV-BC constructs were used to analyze the data of the first experiment.

Based on the suboptimal results of the first experiment, we decided to verify the insert sequence of each individual construct by Sanger sequencing. To have a uniform insert size we decided to focus on type II inserts (seven random triplets, n=417). Type II constructs with concatemer BC inserts, identified by the size of the insert (n=24), were excluded. The remaining constructs (n=393) were sequenced individually to verify the insert and the primer binding sites. This revealed 222 constructs with a correct insert sequence and fully matching binding sites for the primers used to amplify the inserts. These constructs were used for generation of lentiviral particles (Mix 2).
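The requirement that barcodes differ by at least two nucleotides in the variable triplet positions can be checked with a pairwise Hamming distance. The sketch below is only an illustration of that check, not the procedure used in this study; the function names and the example sequences (21 nt, representing seven variable triplets with the constant doublets left out) are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def too_close(barcodes: dict, min_distance: int = 2) -> list:
    """Return (name_a, name_b, distance) for pairs closer than min_distance."""
    pairs = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(barcodes.items(), 2):
        d = hamming(seq_a, seq_b)
        if d < min_distance:
            pairs.append((name_a, name_b, d))
    return pairs


# Hypothetical variable parts (7 triplets = 21 nt); not sequences from the study.
example = {
    "BC_001": "ACGTTAGCATGCAATCGGATC",
    "BC_002": "ACGTTAGCATGCAATCGGATG",  # differs from BC_001 at one position
    "BC_003": "TTGACCGATACGGTAACGTCA",
}

print(too_close(example))
```

With these example sequences only the BC_001/BC_002 pair is flagged, because a single sequencing error could convert one into the other.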

Culture of Hodgkin lymphoma cell lines

L540 (nodular sclerosis, T-cell derived) and KM-H2 (mixed cellularity) cells were cultured in RPMI-1640 (Cambrex Biosciences, Walkersville, USA) medium at 37℃ in an atmosphere containing 5% CO2. Culture medium was supplemented with 2mM ultraglutamine (Cambrex Biosciences), 100 U/ml penicillin/streptomycin and 10% FBS (Cambrex Biosciences) for KM-H2 and 20% FBS for L540.

Lentiviral transduction

Lentiviral particles were produced in HEK-293T cells by calcium phosphate precipitation transfection of a mixture of the EV-BC constructs. Briefly, HEK-293T cells were seeded in a 6-well plate and grown until 70-80% confluence. A plasmid mix consisting of 15µl CaCl2 (2.5M), 1µg pMSCV-VSV-G, 1µg pRSV.REV, 1µg pMDL-gPRRE, 2µg lentiviral vector barcode pool and 150µl of 2x HBS was prepared to transfect the HEK-293T cells. Virus was harvested 48 hours after transfection and supernatants were filtered through a 0.45-µm filter and stored at -80℃. Two different virus pools were generated for plasmid mix 1 and one virus pool for plasmid mix 2. Around 5 million cells were transduced with different amounts of virus in the presence of 4 µg/ml polybrene. Approximately 1.5% of the cells were washed with PBS and analyzed by FACS to determine the percentage of GFP+ cells at day 4. For follow-up experiments, the cultures with <20% of GFP+ cells were selected to avoid multiple infections per cell. Each cell line was infected in duplicate.

Cell preparation and sorting

At day 5, 13 and 21, a total of >5 million cells were prepared for sorting, whereas the remainder of the cells (>2 million) was used to continue the culture. The cells were centrifuged at 1,000rpm for 5 minutes and washed three times with PBS. Cells were re-suspended in PBS, filtered and kept on ice until sorting. GFP+ cells were sorted on the MoFlo sorter using a 70-µm nozzle (BD Biosciences, San Jose, California, USA). The percentage of GFP+ cells in the sorted fraction was determined by FACS on a small aliquot of the sorted cells. Sorted cells were harvested by centrifugation at 1,000rpm for 5 minutes and cell pellets were stored at -20℃.

DNA isolation

Cell pellets were resuspended in 0.5ml SE buffer (75mM NaCl and 25mM EDTA). After adding 50µl 10% SDS and 2.5µl proteinase K (20mg/ml), cells were incubated at 55℃ overnight. Next, 180µl pre-heated 6M NaCl solution and 700µl chloroform were added to the cell lysate and mixed for 1 hour. The upper phase was collected after a centrifugation step of 1 hour at 5,500rpm and 4℃. DNA was precipitated by adding 700µl isopropanol and the pellet was washed with 70% ethanol. After air-drying, the DNA was dissolved overnight in a buffer containing 10mM Tris-HCl and 0.1mM EDTA adjusted to pH 8.0. The DNA concentration was measured with the NanoDrop™ 1000 Spectrophotometer (Thermo Fisher Scientific Inc., Waltham, Massachusetts, USA) and the quality was checked on a 1% agarose gel.

Polymerase chain reaction (PCR)

For the first experiment, we used the PCR kit from GE Healthcare (Little Chalfont, UK). Briefly, each DNA sample was amplified in duplicate using an input of 240ng DNA in a PCR mixture consisting of 1x PCR buffer, 0.2mM dNTP, 1.5mM MgCl2 and 1U Taq DNA polymerase in a total volume of 30µl. The PCR consisted of 32 cycles of 94℃ for 30 seconds, 60℃ for 30 seconds and 72℃ for 30 seconds, with an initial denaturation step of 5 minutes and a final extension step of 7 minutes. Two universal forward primers (eGFPfwd3E 5’-CTCGGCATGGACGAGCTG-3’ and eGFPfwd3L 5’-GGCATGGACGAGCTGTACAAG-3’) with a unique flag (Supplementary Table 1) for each individual PCR and five different reverse primers (rev-L 5’-GGGGGATCCTCACTGGCC-3’; rev-L+1 5’-TGGGGGATCCTCACTGGCC-3’; BC-rev-L+2 5’-ATGGGGGATCCTCACTGGCC-3’; BC-rev-L+5 5’-AATATGGGGGATCCTCACTGGCC-3’; and BC-rev-L+6 5’-TAATATGGGGGATCCTCACTGGCC-3’) were used randomly. PCR primers were randomized amongst all samples irrespective of the infected cell line, virus pool and PCR replicate. By performing duplicate PCR reactions, 24 PCR products were derived from 12 DNA samples (2 cell lines, 3 time points, duplicate infection). PCR products were analyzed on 2% agarose gel to check the amplification (93-99bp) and to estimate the yield. PCR products were mixed depending on band intensities.

After optimization of the procedures, we performed a triplicate PCR using AmpliTaq DNA Polymerase (Thermo Fisher, Waltham, Massachusetts, USA) in the second set of experiments. The DNA input was 400ng in a PCR mixture consisting of 1x PCR buffer, 0.2mM dNTP, 1.5mM MgCl2 and 1U AmpliTaq DNA polymerase in a total volume of 30µl. The PCR consisted of 32 cycles of 94℃ for 30 seconds, 58℃ for 30 seconds and 72℃ for 30 seconds, with an initial denaturation step of 10 minutes and a final extension step of 7 minutes. We used one universal forward primer (5’-TCTCGGCATGGACGAGCTG-3’) with a unique 9nt flag (Supplementary Table 2) for each individual PCR and two different reverse primers, i.e. BC-rev-L+1-TIIBC (5’-TGGGGGATCGTCACTGGCC-3’) for all KM-H2 samples and BC-rev-L+2-TIIBC (5’-ATGGGGGATCGTCACTGGCC-3’) for all L540 samples. Amplification reactions were performed in one run to reduce experimental variation. In total, 36 PCR products were derived from 12 DNA samples (2 cell lines, 3 time points, duplicate infection). PCR products were analyzed on 2% agarose gel to check efficiency of the amplification (97-99bp) and to estimate the yield. PCR products were mixed depending on band intensities.

Next generation sequencing and read processing

The PCR product mix was purified using the DNA Clean & Concentrator™-5 kit (Zymo Research, Irvine, CA) following the manufacturer’s protocol. Ligation of adapters was done using the NEBNext Multiplex Oligos for Illumina kit (#E7335; New England Biolabs, Ipswich, Massachusetts, USA) following the manufacturer’s instructions. Paired-end sequencing was performed on the MiSeq™ (Illumina, San Diego, CA). Reads were assigned to a sample using the sample-specific adapter and aligned to the predefined barcode sequences of the 522 or 222 constructs. The alignment was performed using BWA (version 0.7.12; https://github.com/lh3/bwa) and processing of the reads was done with SAMtools (version 1.3; http://www.htslib.org/) [14].
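The logic of assigning reads to a sample and then to a barcode can be illustrated with a small sketch. Note that this is not the pipeline used here (which relied on BWA alignment and SAMtools); it swaps in exact string matching as a simplification, and all flags, barcodes, reads and sample names below are hypothetical.

```python
from collections import Counter


def count_barcodes(reads, flag_to_sample, barcodes, flag_len=9):
    """Count barcode hits per sample.

    Assumes each read starts with a sample flag of flag_len nucleotides and
    contains one known barcode as an exact substring (a simplification).
    """
    counts = {sample: Counter() for sample in flag_to_sample.values()}
    for read in reads:
        sample = flag_to_sample.get(read[:flag_len])
        if sample is None:
            continue  # flag not recognized: read is discarded
        for name, seq in barcodes.items():
            if seq in read[flag_len:]:
                counts[sample][name] += 1
                break  # one barcode per read
    return counts


# Hypothetical inputs for illustration only.
flags = {"AAACCCGGG": "KMH2_d5_pcr1", "TTTGGGCCC": "L540_d5_pcr1"}
barcodes = {"BC_A": "ACGTACGTAC", "BC_B": "TGCATGCATG"}
primer = "CTGTACAAG"
reads = [
    "AAACCCGGG" + primer + "ACGTACGTAC",
    "AAACCCGGG" + primer + "TGCATGCATG",
    "TTTGGGCCC" + primer + "ACGTACGTAC",
    "GGGGGGGGG" + primer + "ACGTACGTAC",  # unknown flag, ignored
]

counts = count_barcodes(reads, flags, barcodes)
for sample, per_barcode in counts.items():
    print(sample, dict(per_barcode))
```

Exact matching discards reads with sequencing errors in the flag or barcode, which is one reason an aligner is the more robust choice in practice.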

Data analysis

In the first experiment, we performed each PCR with 240ng genomic DNA, which corresponds to approximately 40,000 cells assuming a DNA content of approximately 6pg per cell (24,400 GFP+ cells or more per PCR with a GFP+ purity after sorting of at least 61%). Thus, we had on average at least 47 (24,400/522) unique copies per barcode present in each PCR.

In the second experiment, we performed the PCR with 400ng of genomic DNA, which is the content of approximately 66,667 cells. As the lowest GFP percentage after sorting was 57%, the number of GFP+ cells per PCR was about 38,266. The expected average number of unique copies per barcode in the PCR will thus be ≥172 (38,266/222).
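The two estimates above follow from the same arithmetic: DNA input divided by DNA content per cell, multiplied by the GFP+ purity, divided by the number of constructs. A minimal sketch of that calculation is given below; the function name is ours, and the purity of 0.574 for the second experiment is an assumption taken from the lowest post-sort value, which reproduces the reported 38,266 cells.

```python
def copies_per_barcode(dna_ng, gfp_purity, n_barcodes, pg_per_cell=6.0):
    """Estimate cells, GFP+ cells and average unique copies per barcode in one PCR."""
    cells = dna_ng * 1000.0 / pg_per_cell  # convert ng to pg, then to cell equivalents
    gfp_cells = cells * gfp_purity         # only GFP+ cells carry a barcode
    return cells, gfp_cells, gfp_cells / n_barcodes


first = copies_per_barcode(240, 0.61, 522)    # first experiment
second = copies_per_barcode(400, 0.574, 222)  # second experiment

print(round(first[0]), round(first[1]), round(first[2], 1))
print(round(second[0]), int(second[1]), round(second[2], 1))
```

These are lower bounds on the average; individual barcodes can be present at fewer copies if the pool is not evenly represented.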

For data analysis, we normalized total read counts per sample to 40,000 and excluded EV-BC constructs with an average read count in all samples of less than 10. We next calculated the average read counts of duplicate or triplicate PCRs per construct per sample. Based on this, changes in abundance at day 13 and day 21 were calculated relative to day 5. The fold change for constructs with decreased read counts was calculated by dividing the day 5 read counts by the read counts at day 13 and 21, and for constructs with increased abundance by dividing the read counts at day 13 and 21 by the read counts at day 5. The average fold changes of the two independent infections were used for the final comparison.
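The normalization and the direction-dependent fold change described above can be sketched as follows. This is an illustration under our own assumptions: the function names are hypothetical, the example counts are invented, and the handling of zero counts (raising an error) is a choice made for the sketch rather than something stated in the text.

```python
def normalize(counts, total=40_000):
    """Scale the read counts of one sample so that they sum to `total`."""
    raw_total = sum(counts.values())
    return {bc: c * total / raw_total for bc, c in counts.items()}


def directional_fold_change(day5, later):
    """Fold change >= 1 plus its direction, relative to day 5."""
    if day5 <= 0 or later <= 0:
        raise ValueError("read counts must be positive")
    if later >= day5:
        return later / day5, "increase"
    return day5 / later, "decrease"


# Invented raw counts for one sample.
norm = normalize({"BC_A": 300, "BC_B": 100, "BC_C": 100})
print(norm)

print(directional_fold_change(100, 150))  # more reads at the later time point
print(directional_fold_change(100, 50))   # fewer reads at the later time point
```

Expressing decreases as day 5 over the later time point keeps increases and decreases on the same scale, so a 2-fold loss and a 2-fold gain are both reported as 2.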

Results

Initial EV-BC pool (1st screen)

Two HL cell lines were infected with the EV-BC lentiviral pool in duplicate. The workflow for infection and sorting is shown in Figure 1. The GFP percentages before sorting were 12.1% and 9.5% in KM-H2 and 7.8% and 13.5% in L540 at day 4. We saw no obvious changes in the percentage of GFP+ cells over time. GFP+ cells were sorted at three time points and used for DNA isolation. The GFP percentages after sorting ranged from 65.6% to 87.7% in KM-H2 (Table 1) and from 64.9% to 86.1% in L540 (Table 1). After isolation of genomic DNA of the sorted cells, duplicate PCRs (240ng DNA as input) were performed to amplify EV-BC fragments using flag-labeled forward primers (Table 4). PCR products were mixed based on band intensities and subjected to next generation sequencing.

The total numbers of mapped reads per PCR are shown in Table 1. After normalization of the total read counts to 40,000 and calculation of the average read counts of the duplicate PCRs, an average read count per sample of more than 10 was observed for 410 of the 522 EV-BC constructs. The average read counts of these 410 EV-BCs ranged from 10 to 1,713.

To assess read count changes of EV-BC constructs over time, we calculated the fold changes of the normalized read counts of each construct at day 13 and day 21 relative to day 5. The maximal decrease and increase in abundance in KM-H2 were 21.1- and 15.3-fold, respectively. For L540, the maximal decrease and increase in abundance were 13.3- and 8.3-fold, respectively. The cutoff for defining true vector-induced changes was set at the average fold change (avg FC=1.11) plus or minus 2x the SD (SD=1.03), which resulted in cutoff values of a 2.96-fold decrease (=1/1.11+1.03*2) and a 3.16-fold increase (=1.11+1.03*2) (Figure 5A). The fold changes of 44 (10.7%) constructs in KM-H2 and 80 (19.5%) in L540 were larger than this cutoff. The overlap in constructs with a fold change larger than the cutoff between the two independent infections was 1 in both cell lines, whereas the overlap in constructs with a consistent decrease in both infections was 2 in KM-H2 and 1 in L540. Thus, although the overlap is limited, the overall number of constructs with changes in relative abundance over time is quite extensive.

The unexpectedly large variation in the abundance of the EV-BC constructs prompted us to explore which factors caused these differences. Upon generating heatmaps of all samples based on normalized read counts, we observed a first main clustering of samples based on the two general forward primers used for the amplification of the insert. In addition, we observed a clustering of samples amplified with the same reverse primer (data not shown). No clustering based on duplicate PCRs, time points or cell lines was observed. Besides the variation caused by the forward and reverse primers used for the amplification reaction, we anticipated that other problems related to the inserts might also have contributed to the high experimental variation, as we observed read counts above 10 for only 410 of the 522 constructs.

Based on these findings, we decided to set up a new experiment using only those constructs with appropriate inserts and fully matching primer binding sites. We aimed to minimize PCR artifacts by increasing the DNA input, using the same forward and reverse primer design for all time points per cell line, and performing a triplicate PCR per time point per cell line.
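As a worked check of the cutoff arithmetic for the first screen, the sketch below recomputes both thresholds from the reported average fold change and SD. The helper name is hypothetical, and the decrease cutoff is written as 1/mean + 2 x SD only because that is the formula quoted in the text.

```python
def noise_cutoffs(mean_fc, sd, k=2.0):
    """Return (fold-decrease cutoff, fold-increase cutoff).

    Both are computed as in the text: the mean fold change (or its
    reciprocal for decreases) plus k times the SD.
    """
    decrease = 1.0 / mean_fc + k * sd
    increase = mean_fc + k * sd
    return decrease, increase


dec, inc = noise_cutoffs(mean_fc=1.11, sd=1.03)
print(f"decrease cutoff: {dec:.2f}-fold, increase cutoff: {inc:.2f}-fold")
```

With these rounded inputs the helper gives 2.96 and 3.17; the reported value of 3.16 presumably reflects unrounded averages. Applied to the second-screen values (avg FC=1.01, SD=0.185), the same helper gives 1.36 and 1.38.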

EV-BC screen with optimized pool (2nd screen)

In the second screen, using an optimized EV pool (see methods), the initial GFP percentages at day 4 were 11.2% and 12.1% in KM-H2 and 8.7% and 7.7% in L540. No obvious changes were seen in the percentage of GFP over time. After sorting, the GFP percentages ranged from 64.9% to 78.5% in KM-H2 (Figure 2 and Table 2) and from 57.4% to 90.9% in L540 (Table 2). After genomic DNA isolation, PCR was performed in triplicate. To get a better representation of unique BC constructs, we amplified EV-BC fragments from 400 ng of input DNA using sample-specific flag-labeled forward primers (Table 5). A representative example of the PCR products is shown in Figure 3A. PCR products were mixed based on band intensities on gel and subjected to next generation sequencing (Figure 3B).

The total numbers of mapped reads per PCR are shown in Table 2. After normalization of the total read counts per sample to 40,000, an average read count of more than 10 was obtained for 214 of the 222 included EV-BC constructs. Comparison of the distribution in read counts per construct between the two experiments revealed a much smaller read count range in the second experiment (Figure 4). This suggests that using a high-confidence, Sanger sequencing verified pool of BCs, as in the second experiment, enhances the quality of such a screen. Next, we calculated the fold changes in read counts at day 13 and day 21 to determine changes in abundance of EV-BC constructs in the infected cells over time. The average fold change in read counts was 1.011 in KM-H2 and 1.008 in L540. The fold changes varied between 1.46-fold decrease and 1.48-fold increase in KM-H2 and between 2.27-fold decrease and 1.78-2.27-fold increase in L540. Using the average fold change in read counts (avg FC=1.01) plus or minus 2x the SD (SD=0.19) resulted in cutoff values of a 1.36-fold decrease (=1/1.01+2*0.185) and a 1.38-fold increase (=1.01+2*0.185). None of the EV-BC constructs showed a consistent increase or decrease in abundance in KM-H2, while 2 EV-BC constructs showed a consistent increase in L540 (Figure 5B).
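One way to apply such cutoffs to per-construct data is sketched below. It assumes that the fold change is the ratio of normalized read counts at a later time point to those at day 5, and that ratios below 1 are expressed as fold decreases by taking the reciprocal. The function names and example counts are illustrative only; the two cutoff values are those reported for the second screen.

```python
DECREASE_CUTOFF = 1.36  # fold decrease, second screen (from the text)
INCREASE_CUTOFF = 1.38  # fold increase, second screen (from the text)


def fold_change(day5, later):
    """Ratio of normalized read counts at a later time point to day 5."""
    return later / day5


def classify(ratio):
    """Label a ratio as 'increase', 'decrease' or 'unchanged'."""
    if ratio >= 1.0:
        return "increase" if ratio > INCREASE_CUTOFF else "unchanged"
    # Ratios below 1 are expressed as a fold decrease (reciprocal).
    return "decrease" if 1.0 / ratio > DECREASE_CUTOFF else "unchanged"


# Hypothetical normalized counts: construct -> (day 5, day 21)
counts = {
    "BC001": (200.0, 210.0),
    "BC002": (150.0, 300.0),
    "BC003": (180.0, 90.0),
}
calls = {bc: classify(fold_change(d5, d21)) for bc, (d5, d21) in counts.items()}
```

Here BC001 stays within the background range, whereas BC002 (2-fold up) and BC003 (2-fold down) exceed it.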

Changes in EV-BC using a predefined cutoff of 40%

In GFP competition assays for individual miRNA constructs, we analyze the GFP percentages every other day for a period of 22 days and observe a relative change in GFP percentage of 40-80% for miRNAs whose expression or inhibition can affect cell growth. For the first experiment, a 40% change as a predefined cutoff revealed 12 (2.9%) consistently increased and 15 (3.7%) consistently decreased constructs in KM-H2, while 25 (6.1%) were increased and 31 (7.6%) were decreased in L540. In contrast, we did not observe any EV-BC constructs with consistent increases/decreases in abundance over time in KM-H2 and only 2 (0.9%) with consistent increases and no consistent decreases in L540 in the second experiment (Table 3). This clearly shows that the false positive rate in the second experiment was much lower than in the first experiment.
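The notion of a consistent change, i.e. one that exceeds the predefined 40% cutoff in both independent infections, can be illustrated with the sketch below. It assumes that the change is measured as the relative difference between day 21 and day 5; the function names and read counts are hypothetical.

```python
THRESHOLD = 0.40  # predefined relative change (40%)


def changed(day5, day21, threshold=THRESHOLD):
    """Return 'increased', 'decreased' or None for one construct."""
    rel = (day21 - day5) / day5
    if rel > threshold:
        return "increased"
    if rel < -threshold:
        return "decreased"
    return None


def consistent(infection1, infection2, direction):
    """Constructs that change in `direction` in both infections."""
    hits1 = {bc for bc, (a, b) in infection1.items() if changed(a, b) == direction}
    hits2 = {bc for bc, (a, b) in infection2.items() if changed(a, b) == direction}
    return hits1 & hits2


# Hypothetical normalized counts: construct -> (day 5, day 21)
inf1 = {"BC001": (100.0, 150.0), "BC002": (100.0, 50.0), "BC003": (100.0, 160.0)}
inf2 = {"BC001": (120.0, 180.0), "BC002": (100.0, 95.0), "BC003": (100.0, 110.0)}

up = consistent(inf1, inf2, "increased")
down = consistent(inf1, inf2, "decreased")
```

Only BC001 counts as consistently increased here; BC002 and BC003 change in one infection only and would therefore appear in the per-infection columns of a table such as Table 3 but not in the overlap.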

Discussion and conclusion

Next generation sequencing applications have been broadly implemented in multiple lines of research over the past years. Various platforms were commonly used and allowed generation of large datasets in a short time at relatively low costs, albeit at the expense of increased error rates and shorter read lengths as compared to Sanger sequencing based approaches. [15] The cellular barcoding approach has been widely used to track cell progeny after transplantation both in vitro and in vivo. [16, 17] This technique has been shown to be valuable in studies such as cancer stem cell identification and outperforms conventional methods relying on single-cell transplantation. [18, 19]

In this study, we explored the feasibility of performing a high-throughput screen in HL cells using a lentiviral eGFP EV-BC library. In the first experiment, non-optimal experimental conditions, such as the use of different forward and reverse primers in the PCR, different lentiviral EV-BC library virus pools and inappropriate EV-BC inserts, resulted in a broad variation in read counts across constructs and marked changes in read counts per construct over time. After this relatively unsuccessful initial experiment, we optimized several steps of the screening procedure. First, we restricted our pool to 222 Sanger sequencing verified EV-BC constructs that could all be amplified with similar forward and reverse primers. Second, we minimized experimental variation by performing all PCRs per cell line in the same experiment using the same reverse primer. Third, we used a triplicate instead of a duplicate PCR and increased the DNA input per PCR. These changes in the setup resulted in considerably less variation in EV-BC read counts, with read counts ranging between 46 and 572 in the second experiment as compared to between 10 and 1,713 in the first experiment (Figure 4). Moreover, the strongly reduced false positive rate using a predefined cutoff indicates feasibility of the procedure and supports the potential value of a high-throughput screen to identify miRNAs or genes that affect growth of HL cells. The background noise cutoff values defined in this chapter can be used to define the cutoff value for the identification of true miRNA-induced effects in the miRNA overexpression and inhibition screens described in chapters 3B, 4 and 5. Based on the higher range of read count fold changes in L540 as compared to KM-H2, it seems advisable to perform such an EV-BC experiment for every cell line.

The presence of cancer stem cells (CSCs) in cell lines might interfere with high-throughput screening results. HL cell lines have been reported to express the stem cell marker aldehyde dehydrogenase (ALDH) in a proportion of the cells, indicative of CSC properties. [15, 16] In addition, the cells with a so-called Hodgkin phenotype, i.e. the smaller mononucleated cells, have been shown to have an enhanced proliferative capacity as compared to the bi- or multinucleated Reed-Sternberg cells. [10] Whether the presence of CSCs in HL cell lines could interfere with the results of high-throughput screening approaches using cellular barcoding was unknown. We did not observe any obvious changes in read counts of specific EV-BC constructs that could be attributed to the targeting of CSCs in our experiments. This may very well be due to the setup of our experiments, in which we infected around 1,000 independent cells per EV-BC construct. The percentages of CSCs in HL cell lines were reported to be in the range from 0.1% to 1.0% of the total cells. [20] So, to optimally study the presence and growth characteristics of potential CSCs in the HL cell lines, a much larger pool of unique EV-BC constructs would be required.
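A rough calculation makes the scale of this limitation concrete. The sketch below assumes that CSCs are infected at the same rate as all other cells, which is not established in the text; the cell number and CSC fractions are the approximate values quoted above, and the function name is hypothetical.

```python
CELLS_PER_CONSTRUCT = 1_000     # approximate number of infected cells per construct
CSC_FRACTIONS = (0.001, 0.010)  # reported range: 0.1% to 1.0% of total cells


def expected_cscs(cells, fraction):
    """Expected number of CSCs among `cells` infected cells."""
    return cells * fraction


low, high = (expected_cscs(CELLS_PER_CONSTRUCT, f) for f in CSC_FRACTIONS)
print(f"Expected CSCs per construct: {low:.0f} to {high:.0f}")
```

Under these assumptions each construct marks roughly 1 to 10 CSCs among about 1,000 cells, so any CSC-specific growth behavior would be diluted in the read counts of that construct.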

In conclusion, we showed feasibility of performing a high-throughput screen for identification of miRNAs or genes involved in HL cell growth. Moreover, we defined cutoff values that will allow discrimination between false positive and true read count changes over time for future high-throughput screens in HL cell lines.


Figure 1. Schematic representation of the barcoded lentiviral vector and overview of the experimental workflow. The 33 bp DNA barcode was cloned into the vector upstream of eGFP between the BsrGI and BamHI restriction sites. HL cells were infected with barcoded lentiviral particles. GFP+ cells were sorted and used to isolate genomic DNA at different time points. Barcode fragments were amplified and subjected to next generation sequencing.

Figure 2. Infection percentages and purity after sorting in KM-H2 cells. KM-H2 cells were infected with the first and second EV-BC virus pool in duplicate (Infection 1 and 2) in the 2nd screening. GFP+ cells were sorted at day 5, day 13 and day 21 after infection. Panels A and B show the results of the first and second infection, respectively. The percentages of GFP+ cells before and after sorting are indicated in each graph. Similar results were obtained for the L540 cell line.


Figure 3. PCR of EV-BC inserts and schematic representation of the procedure for library preparation and next generation sequencing. (A) Agarose gel of the PCR products of the barcode fragments of the KM-H2 cell line infected with the second virus pool at three time points. Expected amplicon sizes range from 97 to 99 bp (the thick fragment in the 25 bp ladder indicates a 125 bp fragment). DNA barcode fragments were amplified from genomic DNA of sorted cells at three time points, i.e. day 5, day 13 and day 21. PCR products were mixed based on band intensities, with 5 µl for bands with a strong intensity and up to 20 µl for bands with a weak intensity. (B) Schematic representation of the Illumina next generation sequencing approach. The P7 reverse Illumina sequencing adapter and the sequencing by synthesis (SBS) sequence were integrated next to the barcode, and the P5 Illumina sequencing adapter was added during the PCR amplification step. DNA clusters were amplified on the flow cell, which is covered with fixed P7 and P5 oligos. After amplification, fluorescent dye is imaged to identify each base one by one.

Figure 4. Average read counts per construct in the first and second experiment in the two HL cell lines. 1st is an overview of the read counts per EV-BC construct of the first screen and 2nd is an overview of the read counts per EV-BC construct of the second screen. Each dot represents the average read count across all samples for an EV-BC construct. The red lines indicate the average read counts and SD in each cell line. The distribution of the average read counts is much narrower in the second screen than in the first screen.


Figure 5. Fold changes of each EV-Barcode construct over time per cell line. The constructs are sorted from low to high average read counts per construct in all samples (indicated below the graphs). The fold change of each construct was calculated based on normalized read counts at day 13 or day 21 relative to day 5. (A) The cutoff values in the first experiment, using the mean ratio ±2x the SD, were -2.96 and 3.16. Note that the fold change in read counts was outside the limits of the y-axis for 25 EV-BC constructs in KM-H2 and for 57 constructs in L540. (B) Results of the second experiment revealed an average ratio ±2x the SD of -1.36 and 1.38. None of the EV-BC constructs of this experiment had a fold change outside the limits of the y-axis.

Table 1. An overview of the sequencing results of the EV-BC pool infected samples before normalization for the first experiment

Columns: time point (day); GFP (%); sorted cells; mapped read counts and percentage for PCR 1, N (%); mapped read counts and percentage for PCR 2, N (%)

KM-H2 1st screen

day 5    79.8%    560,000      28,377 (61.4%)    76,921 (70.0%)

day 13   85.3%    745,000      36,102 (63.6%)    102,284 (71.8%)

day 21   83.9%    1,000,000    9,060 (55.9%)     103,794 (68.2%)

KM-H2 2nd screen

day 5    85.2%    500,000      56,737 (66.0%)    81,937 (60.9%)

day 13   65.6%    800,000      19,892 (67.5%)    63,523 (62.2%)

day 21   87.7%    800,000      83,151 (63.7%)    70,727 (56.9%)

L540 1st screen

day 5    64.9%    300,000      42,248 (76.1%)    63,360 (59.5%)

day 13   81.3%    350,000      31,785 (70.8%)    60,518 (59.1%)

day 21   86.1%    600,000      46,997 (74.7%)    86,789 (57.6%)

L540 2nd screen

day 5    87.9%    434,000      103,925 (72.1%)   105,118 (71.6%)

day 13   68.7%    759,000      305,908 (76.8%)   80,910 (83.7%)

day 21   84.8%    730,000      148,903 (89.8%)   123,322 (84.7%)


Table 2. An overview of the sequencing results of the EV-BC pool infected samples before normalization for the second experiment

Columns: time point (day); GFP (%); sorted cells; mapped read counts and percentage for PCR 1, N (%); PCR 2, N (%); PCR 3, N (%)

KM-H2 1st screen

day 5    73.8%    2,000,000    116,841 (92.7%)   93,267 (92.4%)    135,002 (92.5%)

day 13   78.2%    2,000,000    157,463 (92.4%)   183,180 (92.4%)   149,366 (92.5%)

day 21   63.9%    2,000,000    160,855 (93.0%)   148,527 (93.0%)   176,961 (92.9%)

KM-H2 2nd screen

day 5    72.4%    2,000,000    55,124 (92.0%)    120,749 (92.9%)   140,012 (92.9%)

day 13   80.2%    2,000,000    160,190 (92.6%)   137,003 (92.7%)   123,926 (92.7%)

day 21   72.8%    2,000,000    81,761 (92.2%)    101,287 (92.8%)   89,748 (92.5%)

L540 1st screen

day 5    77.3%    1,270,000    109,080 (92.3%)   120,355 (92.3%)   135,186 (92.4%)

day 13   60.1%    1,200,000    111,375 (92.5%)   121,134 (92.1%)   107,458 (92.4%)

day 21   90.8%    1,320,000    122,168 (92.2%)   89,641 (92.5%)    95,989 (91.8%)

L540 2nd screen

day 5    77.9%    1,010,000    139,801 (92.7%)   92,307 (93.0%)    115,886 (91.8%)

day 13   57.4%    1,100,000    133,505 (92.2%)   278,249 (92.6%)   158,827 (92.4%)

day 21   90.9%    1,160,000    79,069 (92.0%)    128,486 (92.4%)   159,826 (92.3%)

Table 3. An overview of the number of EV-BC constructs with a change in read counts of more than 40% (increase/decrease) at day 21 compared to day 5

First experiment 1st infection (%) 2nd infection (%) Overlap (%)

KM-H2 increased 71 (17.3%) 56 (13.7%) 12 (2.9%)

KM-H2 decreased 106 (25.9%) 50 (12.2%) 15 (3.7%)

L540 increased 89 (21.7%) 75 (18.3%) 25 (6.1%)

L540 decreased 81 (19.8%) 125 (30.5%) 31 (7.6%)

Second experiment 1st infection (%) 2nd infection (%) Overlap (%)

KM-H2 increased 0 (0.0%) 3 (1.4%) 0 (0.0%)

KM-H2 decreased 0 (0.0%) 0 (0.0%) 0 (0.0%)

L540 increased 13 (6.1%) 12 (5.6%) 2 (0.9%)


Table 4. Flag sequences of forward primers used as sample specific identifiers in the first experiment

No. Flag Sequence (5’-3’) No. Flag Sequence (5’-3’) No. Flag Sequence (5’-3’)

1 TTTATTGAGT 13 CCAGCATCTT 25 TATAATCCGT

2 AAACAAGATT 14 CCCGCCTGAT 26 TCAACACCTT

3 AACCACGCAT 15 CCGGCGTGCT 27 TCCACCCGAT

4 GGCTGCATAT 16 TTAATACTTT 28 TAAAAACATT

5 GGGTGGATCT 17 TTCATCGAAT 29 AGACGAGGTT

6 GTATTAATTT 18 TTGATGGACT 30 AGCCGCGTAT

7 GTCTTCCAAT 19 CACGACTCAT 31 AGGCGGGTCT

8 GTGTTGCACT 20 CAGGAGTCCT 32 TGTATGGTCTG

9 GTTTTTCAGT 21 CATGATTCGT 33 TATAAGCA

10 ACCCCCGGAT 22 AAGCAGGCCT 34 TTCCAAGTTTT

11 ACGCCGGGCT 23 AATCATGCGT 35 CCGACCCG

12 ACTCCTGGGT 24 ACACCAGCTT 36 CAAGAGTC

Flag-sequences 1-31 were coupled to the forward primer eGFPfwd3E and flag sequences 32-36 were coupled to forward primer eGFPfwd3L.
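To illustrate how such flags can serve as sample-specific identifiers, the sketch below assigns sequencing reads to samples by matching the flag at the start of each read. It assumes that the flag sits at the very beginning of the read and that an exact match is required; neither detail is specified here. The sample labels and reads are invented, and only the three flag sequences (No. 1-3) are taken from Table 4.

```python
from collections import defaultdict

# Flags 1-3 from Table 4; the sample labels are hypothetical.
FLAG_TO_SAMPLE = {
    "TTTATTGAGT": "sample_1",
    "AAACAAGATT": "sample_2",
    "AACCACGCAT": "sample_3",
}


def demultiplex(reads, flag_to_sample=FLAG_TO_SAMPLE):
    """Group reads by exact match of their leading flag and strip the flag."""
    by_sample = defaultdict(list)
    for read in reads:
        for flag, sample in flag_to_sample.items():
            if read.startswith(flag):
                by_sample[sample].append(read[len(flag):])
                break
        else:
            # No known flag at the start of the read.
            by_sample["unassigned"].append(read)
    return dict(by_sample)


reads = ["TTTATTGAGTACGTACGT", "AAACAAGATTGGCCTTAA", "CCCCCCCCCCACGT"]
result = demultiplex(reads)
```

Exact matching keeps the example simple; tolerating sequencing errors in the flag would require a mismatch-aware comparison.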

Table 5. Flag sequences used as sample specific identifiers on the forward primers in the PCR of the second experiment

No. Flag Sequence (5’-3’) No. Flag Sequence (5’-3’) No. Flag Sequence (5’-3’)

1 CATGATTCG 13 CTTGTTAAG 25 GTATTAATT

2 CCAGCATCT 14 GAATAAAAT 26 GTCTTCCAA

3 CCCGCCTGA 15 GACTACACA 27 GTGTTGCAC

4 CCGGCGTGC 16 GAGTAGACC 28 GTTTTTCAG

5 CCTGCTTGG 17 GATTATACG 29 TAAAAACAT

6 CGAGGATGT 18 GCATCAACT 30 TAGAAGCCC

7 CGCGGCTTA 19 GCCTCCAGA 31 TATAATCCG

8 CGGGGGTTC 20 GCGTCGAGC 32 TCAACACCT

9 CGTGGTTTG 21 GCTTCTAGG 33 TCCACCCGA

10 CTAGTATTT 22 GGATGAAGT 34 TCGACGCGC

11 CTCGTCAAA 23 GGCTGCATA 35 TCTACTCGG
