
University of Groningen Development of genetic manipulation tools in Macrostomum lignano for dissection of molecular mechanisms of regeneration Wudarski, Jakub


Academic year: 2021


University of Groningen

Development of genetic manipulation tools in Macrostomum lignano for dissection of molecular mechanisms of regeneration

Wudarski, Jakub

IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below.

Document Version

Publisher's PDF, also known as Version of record

Publication date: 2019

Link to publication in University of Groningen/UMCG research database

Citation for published version (APA):

Wudarski, J. (2019). Development of genetic manipulation tools in Macrostomum lignano for dissection of molecular mechanisms of regeneration. Rijksuniversiteit Groningen.

Copyright

Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons).

Take-down policy

If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons the number of authors shown on this cover page is limited to 10 maximum.


CHAPTER 6

Fluorescence-activated cell sorting is an efficient approach for establishing transcriptional profiles of various tissues in the flatworm Macrostomum lignano

Jakub Wudarski¹, Stijn Mouton¹, Eugene Berezikov¹*

¹ European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands. * Corresponding author


ABSTRACT

Flatworms are among the most popular models for studying regeneration. Detailed characterization of flatworm cell types is crucial for deciphering the mechanisms that govern this process. Despite recent progress in establishing a planarian cell atlas using single-cell sequencing methods, the possibility to select desired cell populations for in-depth analysis is still lacking. We applied fluorescence-activated cell sorting (FACS) to several transgenic lines of the flatworm Macrostomum lignano expressing GFP in specific tissues and isolated cells from gut, ovaries, testes, and a novel cell type. RNA sequencing of the selected cell populations yielded detailed transcriptional profiles of these tissues and allowed identification of tissue-specific signature genes and refinement of the stem cell (neoblast) markers. This work demonstrates that FACS is a productive approach for isolating specific cells in M. lignano, and the generated data lay the groundwork for future transcriptomic studies in this animal, including single-cell sequencing.

INTRODUCTION

Flatworms have long been used as model organisms to study regeneration and stem cell biology. Their remarkable regeneration capabilities caught the attention of scientists as early as the late 19th century [1, 2]. One of the most intriguing aspects of the biology of regenerating flatworms is the presence of adult stem cells called neoblasts. These small, round cells are the only proliferating cells in flatworms and possibly hold the key to the regenerative abilities of these animals [3, 4]. Continuous attempts to characterize neoblasts revealed that their population is heterogeneous, with a pluripotent fraction, referred to as clonogenic neoblasts (cNeoblasts), driving the regenerative potential [5–7]. The neoblast system is very complex and needs to be addressed together with a more thorough and detailed analysis of other cell types. Only then can we gain insight into the cell differentiation mechanisms that are tightly connected to the regenerative potential of stem cells. This topic has been studied for some time, but the advent of single-cell RNA sequencing propelled it to a new level. As a result, a single-cell transcriptomics-based cell type atlas and lineage tree for Schmidtea mediterranea were recently generated [8, 9].

These single-cell RNA-seq discoveries, although valuable, do not offer the possibility to select desired cell populations for in-depth follow-up or parallel functional analysis. Fluorescence-activated cell sorting (FACS) allows such selection and has been the method of choice in fields like hematopoiesis for decades [10]. In flatworms, the use of FACS is currently limited, and the best strategies rely on DNA content as a marker for dividing cells, which are then selected as neoblasts [11]. One of the major reasons for this limitation is the lack of selection markers for FACS, which stems from the low number of available planarian antibodies and the absence of methods to generate transgenic planarians in which specific cell types could be labeled. However, recent advances in flatworm transgenesis present an opportunity to overcome this hurdle, enabling efficient selection and further in-depth characterization of flatworm cell populations and providing insight into the mechanisms of cell differentiation. The flatworm that gives access to transgenic techniques is Macrostomum lignano, a free-living marine member of the genus Macrostomum. The worm offers a range of research opportunities: apart from having well-characterized neoblast and germline transcriptional signatures, it also provides already established tissue-specific transgenic lines. It is therefore well suited for the isolation and analysis of specific cell populations by FACS [12–14].

Here we provide a proof-of-principle for RNA sequencing of tissue-specific cell populations isolated by FACS from transgenic M. lignano animals expressing GFP in gut, testes, ovaries, and a newly identified cell cluster we named ‘PW75’. The results improve the current knowledge about the transcriptomic signatures in neoblasts and other cell types and pave the way for future development of the approach.

RESULTS

Transcriptomes of FACS-isolated cell populations

To characterize different cell populations of Macrostomum lignano we used transgenic lines expressing GFP in specific tissues (Fig 1). We selected four previously described lines with distinct patterns confirmed by both fluorescence imaging and in situ hybridization [13]. As the positive control we picked the NL20 line, in which the elongation factor alpha (EFA) promoter drives the expression of the H2B histone fused with eGFP. Because EFA is a ubiquitous promoter and the fusion partner is a histone, all cells of the NL20 line express GFP in their nuclei. We also chose the homolog of apolipoprotein B (APOB) with gut-specific expression (line NL22), the embryonic lethal abnormal vision (ELAV) homolog expressed in the testes (line NL21), and the NL23 line expressing GFP under the ovary-specific promoter of a homolog of calcium binding protein 7 (CABP7). In addition, we included one new transgenic line with an expression pattern that was difficult to attribute to a single tissue. This PW75 line was established using the same random integration approach as the previous lines. The promoter was selected based on the high level of gene expression in the developing embryo and the indication of specific expression in the neoblasts. The fluorescence pattern obtained in the adult worm is, however, unclear and seems to be present mainly in the epidermis (Fig 1). Finally, we used the wild type NL10 line as the negative control.

Isolation of cells by FACS can influence their gene expression and introduce biases [15]. To estimate this influence, we compared transcriptomes from RNA isolated directly from whole animals (WHOLE), from whole animals macerated for FACS but not sorted (MACER), and from bulk-sorted cells gating out only debris (BULK).


[Figure 1 image. Panel A labels: Maceration to single cells, FACS Sorting, RNA-seq. Panel B labels: Control, EFA, ELAV, CABP7, PW75, APOB.]

Figure 1. FACS approach for establishing transcriptional profiles of tissues in M. lignano. (A) Strategy, schematic outline. (B) Transgenic lines used, representative pictures. Scale bars are 100 μm.

Worms were macerated into single cells and selected based on GFP signal using fluorescence-activated cell sorting (FACS). The sorting procedure yielded between 5000 and 15000 cells, which is insufficient for standard RNA library preparation. To overcome this limitation we switched to a single-cell RNA-seq protocol called CEL-seq (Cell Expression by Linear amplification and Sequencing), which we previously used successfully to characterize neoblast and germline signatures in M. lignano starting from very low amounts of RNA [12]. In this approach a primer containing a T7 promoter, an Illumina 5' adaptor, a unique molecular identifier (UMI), a barcode and a polyT stretch is used in a reverse transcription reaction. The reverse transcription is followed by in vitro transcription and a bead clean-up step. Next, in a second reverse transcription step with primers containing the Illumina 3' adaptor and random hexamers, the RNA is converted to DNA. Finally, PCR selective for fragments containing both adaptors is used to create the libraries, and paired-end sequencing is performed. CEL-seq is selective for polyadenylated fragments due to the polyT anchor in the primers. The advantage of this feature is a low level of rRNA and a more specific selection of mRNA. The downside is the limited detection of alternative splice forms and the need to trim the polyA stretch prior to mapping the reads [16, 17]. Most of the generated libraries had a very high read count, largely exceeding the 1.5 million threshold typically used in single-cell RNA-seq experiments (Table 1). In most libraries more than 90% of reads mapped to the minus strand of genes, as expected for this type of strand-specific library, and only a very low fraction of reads mapped to the ERCC (External RNA Controls Consortium) spike-ins, thus confirming the quality of the libraries [17, 18]. Two libraries that did not pass these quality thresholds were eliminated from further study (Table 1, highlighted in red). However, we decided to keep PW75_1 and PW75_3 (Table 1, highlighted in blue) for the analysis, despite their rather low read counts (891,699 and 1,280,119, respectively) and high ERCC spike-in fractions (21.30% and 25.81%, respectively) compared to other libraries. This decision was made because no substitute material was available for the PW75 samples and because analyzing these two samples could still be of value.
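The polyA trimming step mentioned above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function name `classify_read`, the minimum tail length of 10 and the minimum remaining length of 20 are illustrative assumptions. The three outcomes mirror the "Trimmed polyA", "No polyA" and "Short" columns of Table 1.

```python
def classify_read(read, min_tail=10, min_len=20):
    """Trim a trailing polyA stretch and classify the read.

    min_tail and min_len are assumed, illustrative values.
    Returns (sequence or None, category).
    """
    body = read.rstrip("A")            # read without its trailing A's
    tail_len = len(read) - len(body)   # length of the trailing A stretch

    if tail_len >= min_tail:
        trimmed, category = body, "trimmed_polyA"
    else:
        # Too few trailing A's to call a polyA tail: keep the read as is.
        trimmed, category = read, "no_polyA"

    if len(trimmed) < min_len:
        # Reads that are too short are discarded before mapping.
        return None, "short"
    return trimmed, category


if __name__ == "__main__":
    for seq in ("ACGT" * 6 + "A" * 15, "ACGT" * 6, "ACG" + "A" * 30):
        print(classify_read(seq))
```

Under these assumptions, the reads passed on to mapping are the "trimmed_polyA" and "no_polyA" categories together, which is how the "Used for mapping" column of Table 1 adds up.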

Table 1 Summary of generated libraries

Library   Raw reads  Trimmed polyA  No polyA    Short  Used for mapping  minus_tr  plus_tr  minus_ercc  plus_ercc     tot
APOB_1      4721919        1859904   2006739   855276           3866643    89.84%    9.96%       0.20%      0.00%  70.71%
APOB_2      4190900        1268698   2492010   430192           3760708    93.58%    5.75%       0.67%      0.00%  70.25%
APOB_3     19730122        6188758  11475466  2065898          17664224    94.97%    3.89%       1.14%      0.00%  69.34%
BULK_1      7820775        2171122   5002850   646803           7173972    89.52%   10.47%       0.01%      0.00%  64.06%
BULK_2      5260143        1488818   3282322   489003           4771140    92.05%    7.92%       0.03%      0.00%  68.53%
BULK_3      5012963        1323619   3308172   381172           4631791    87.98%   11.96%       0.05%      0.00%  52.65%
BULK_4      7588063        1958135   5038612   591316           6996747    90.75%    9.21%       0.04%      0.00%  64.00%
CABP7_1    16884057        4157254  11274414  1452389          15431668    91.92%    8.07%       0.01%      0.00%  71.66%
CABP7_2     2720486         740282   1721661   258543           2461943    88.74%   11.19%       0.08%      0.00%  63.77%
CABP7_3     3604558         970067   2340270   294221           3310337    92.55%    7.39%       0.06%      0.00%  70.43%
CABP7_4     3973126        1060215   2561261   351650           3621476    92.29%    7.67%       0.04%      0.00%  67.59%
ELAV_1       371124         103595    181564    85965            285159    33.52%    2.71%      63.73%      0.03%  45.80%
ELAV_2      5667452        1790023   3030898   846531           4820921    93.62%    5.67%       0.71%      0.00%  58.44%
ELAV_3     26163541        7455049  15809161  2899331          23264210    96.60%    2.63%       0.77%      0.00%  71.24%
EFA_1       4159699        1533369   1855580   770750           3388949    90.69%    5.68%       3.63%      0.00%  69.79%
EFA_2      60280351       20733209  30243285  9303857          50976494    95.20%    4.71%       0.09%      0.00%  55.70%
EFA_3      26466702        7971030  15623541  2872131          23594571    95.64%    3.72%       0.64%      0.00%  74.40%
MACER_1     5584747        1829876   3256873   497998           5086749    93.43%    6.56%       0.01%      0.00%  59.98%
MACER_2     9519230        2882028   5608669  1028533           8490697    93.08%    6.91%       0.00%      0.00%  69.27%
MACER_3     8045080        2533687   4661834   849559           7195521    95.48%    4.52%       0.00%      0.00%  74.79%
MACER_4    13671687        4129401   8007629  1534657          12137030    93.02%    6.98%       0.00%      0.00%  65.80%
WHOLE_1    22449636        8464536  12220244  1764856          20684780    96.51%    3.49%       0.00%      0.00%  49.02%
WHOLE_2     3075480        1032521   1644874   398085           2677395    92.16%    7.84%       0.01%      0.00%  72.75%
WHOLE_3     2253433         667371   1367902   218160           2035273    94.42%    5.57%       0.01%      0.00%  69.82%
WHOLE_4    12281284        3812844   7256170  1212270          11069014    93.73%    6.27%       0.00%      0.00%  54.90%
PW75_1      1080603         365301    526398   188904            891699    74.34%    4.35%      21.30%      0.01%  51.34%
PW75_2       185247          51999    112696    20552            164695    65.58%    4.17%      30.25%      0.00%  52.16%
PW75_3      1484959         445851    834268   204840           1280119    70.97%    3.22%      25.81%      0.00%  64.66%
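The quality criteria applied to Table 1 (read count, fraction of reads on the minus strand, ERCC spike-in fraction) can be expressed as a simple filter. The sketch below is only an illustration: the text gives 1.5 million reads as the usual threshold, whereas the minus-strand cutoff of 85% and the ERCC cutoff of 5% are assumed values chosen for the example, and the `Library` class and `passes_qc` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Library:
    name: str
    mapped_reads: int        # "Used for mapping" column of Table 1
    minus_strand_pct: float  # "minus_tr" column
    ercc_pct: float          # "minus_ercc" column


def passes_qc(lib, min_reads=1_500_000, min_minus_pct=85.0, max_ercc_pct=5.0):
    """Return True if the library meets all three (assumed) thresholds."""
    return (
        lib.mapped_reads >= min_reads
        and lib.minus_strand_pct >= min_minus_pct
        and lib.ercc_pct <= max_ercc_pct
    )


if __name__ == "__main__":
    libraries = [
        Library("APOB_2", 3760708, 93.58, 0.67),
        Library("ELAV_1", 285159, 33.52, 63.73),
        Library("PW75_1", 891699, 74.34, 21.30),
    ]
    for lib in libraries:
        print(lib.name, "pass" if passes_qc(lib) else "fail")
```

A filter of this kind only flags libraries. As described above, PW75_1 and PW75_3 were retained despite falling below the usual thresholds, so the final decision remains a judgment call.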

The efficiency of our approach is indicated by the number of detected transcripts in each of the sequenced libraries (Table 2). Out of the total of 50673 transcript clusters annotated in the current transcriptome assembly [19], we detected 40255, 40995, 34615 and 42715 transcript clusters in the WHOLE, MACER, BULK and EFA control libraries, respectively, of which 29247, 29733, 27822 and 29136 had more than 1 count per million (cpm). The latter numbers are much higher than the usual threshold of 1000 transcript clusters reported for single-cell sequencing [7, 20, 21], which further supports the reliability of our approach. The WHOLE and MACER values are very similar, and the difference between these two preparations and the sorted controls EFA and BULK indicates the influence of the FACS procedure on the final sequencing result. We need to point out that the BULK and WHOLE controls lack almost 10000 transcript clusters compared to the total number of annotated transcripts. We can attribute part of the missed transcripts to a limitation of the CEL-seq protocol: the use of the poly(T) stretch eliminates transcripts lacking a poly(A) tail. Some of the missed transcripts could be due to a temporally restricted expression pattern (i.e. they are not expressed in the adult animals used in the analysis). Other transcripts, which are expressed in adults but still missing from our dataset, could be explained by the difficulty of correctly annotating the 3'UTRs of genes: the 3'UTR ends of transcripts might be split from the body of the genes due to the presence of, for example, repeats and AU-rich sequences, and hence be annotated as independent transcripts. Since the CEL-seq protocol has a strong 3' bias, it will miss transcripts with such incorrect 3'UTR annotations.

Table 2 Number of transcript clusters detected in each library

Library  Transcript clusters with at least 1 CPM*  Total detected transcript clusters  Total transcript clusters in the assembly
APOB     22043                                     33574                               50673
BULK     27822                                     34615                               50673
CABP7    24307                                     32336                               50673
EFA      29136                                     42715                               50673
ELAV     20779                                     33502                               50673
MACER    29733                                     40995                               50673
PW75     17271                                     17271                               50673
WHOLE    29247                                     40255                               50673

* Average CPM values were calculated across library replicates
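The two counts reported per library in Table 2 follow from a simple normalization. The sketch below shows, under stated assumptions, how counts per million (CPM) and the two summary numbers could be derived from a table of raw counts per transcript cluster. The function names are hypothetical, and the choice of ">= 1 CPM" follows the table header rather than a documented cutoff rule.

```python
def counts_per_million(counts):
    """Scale raw counts so that they sum to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no counts")
    return {cluster: n * 1e6 / total for cluster, n in counts.items()}


def summarize_library(counts, cpm_threshold=1.0):
    """Return (clusters detected at all, clusters at or above the CPM threshold)."""
    cpm = counts_per_million(counts)
    detected = sum(1 for n in counts.values() if n > 0)
    above = sum(1 for value in cpm.values() if value >= cpm_threshold)
    return detected, above


if __name__ == "__main__":
    # Toy library of 2,000,000 counts spread over four hypothetical clusters.
    toy = {"cluster_a": 1_999_998, "cluster_b": 1, "cluster_c": 1, "cluster_d": 0}
    print(summarize_library(toy))
```

In the toy library three clusters are detected, but only one reaches 1 CPM, because a single read in a library of two million corresponds to 0.5 CPM.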

We next performed principal component analysis (PCA) on all of the generated libraries to assess the relations between our samples (Fig. 2A). All of the replicates clustered together, providing evidence for good reproducibility of the approach. The first principal component explained 31.33% of the variation and separated the samples containing all tissues (EFA, BULK, MACER and WHOLE) from the samples where cells were selected based on their tissue-specific expression of GFP (ELAV, CABP7, APOB, PW75). Notably, the sorted samples (EFA, BULK) clustered clearly apart from the non-sorted ones (MACER, WHOLE). The second principal component explained 22.89% of the variation and distinguished the CABP7 and PW75 clusters as being different from the other samples and from each other. This is consistent with the fact that the PW75 construct is expressed in the early developing embryo, even though it is absent in the ovaries of the adults. To further analyze the libraries we visualized the results using a heatmap with hierarchical clustering (Fig. 2B). Once again, the samples containing all tissues cluster together, with a distinction between the sorted and unsorted ones. Notably, the germline samples (ELAV and CABP7) clustered together and separately from the two samples coming from other tissues (APOB, gut; PW75, possibly epidermis).

[Figure 2 image. Panel A: PCA plot, x-axis PC 1 (31.33%), y-axis PC 2 (22.89%); plotted libraries: APOB_1 to APOB_3, ELAV_2, ELAV_3, PW75_1, PW75_3, CABP7_1 to CABP7_4, EFA_1 to EFA_3, BULK_1 to BULK_4, MACER_1 to MACER_4, WHOLE_1 to WHOLE_4. Panel B: heatmap.]

Figure 2. Assessment of the relation between the samples. (A) Principal Component Analysis of all generated libraries. Numbers following the underscores represent the technical replicates. (B) Heatmap with hierarchical clustering of all generated libraries.
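For readers unfamiliar with how the percentages on the PCA axes arise, the following sketch computes principal component scores and the fraction of variance explained using a singular value decomposition. It assumes a samples-by-genes matrix of log-transformed expression values and uses NumPy. The function name `pca` and the toy data are illustrative and unrelated to the libraries analyzed here.

```python
import numpy as np


def pca(expression, n_components=2):
    """PCA via SVD on a samples x genes matrix.

    Returns (scores, explained) where scores has one row per sample and
    explained holds the fraction of total variance per component.
    """
    x = np.asarray(expression, dtype=float)
    centered = x - x.mean(axis=0)           # center each gene across samples
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2                       # proportional to component variance
    explained = variance / variance.sum()
    scores = u * s                          # sample coordinates on the components
    return scores[:, :n_components], explained[:n_components]


if __name__ == "__main__":
    # Two hypothetical groups of samples differing in a single gene.
    toy = [[1.0, 0.0, 0.0], [1.1, 0.0, 0.0], [5.0, 0.0, 0.0], [5.1, 0.0, 0.0]]
    scores, explained = pca(toy)
    print(scores[:, 0], explained)
```

In the toy example all variance lies along one gene, so the first component explains essentially all of it and separates the two groups by the sign of their scores.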

Influence of cell isolation on gene expression

As indicated earlier, the FACS procedure can influence gene expression and introduce biases. We compared the negative control samples in order to identify substantial discrepancies that the sorting might introduce. For the statistical analysis we used generalized linear models (GLMs) with quasi-likelihood F-tests, a stringent approach that provides robust and reliable error rate control when the number of replicates is small [22]. The comparison of the two unsorted preparations (MACER and WHOLE) did not reveal any significantly differentially expressed genes (Fig. 3C). We therefore concluded that the cell maceration step does not introduce significant biases in the sequencing results. The comparison of the sorted cells (BULK) to the macerated cells (MACER) resulted in 16 depleted and no enriched transcripts (Fig. 3A), while the comparison of BULK to whole animals (WHOLE) gave 100 depleted and 25 enriched transcripts (Fig. 3B). Of note, the majority of the affected transcripts are expressed at low levels. These relatively small differences show that the sorting procedure has a limited influence on the final outcome of the sequencing.

[Figure 3 image. Three scatter plots with x-axis "Counts, log2(CPM)" and y-axis "Fold change, log2"; transcript classes in the legend: Neoblasts, Germline, All. Panel A, BULK - MACER: Enriched: 0, Depleted: 16. Panel B, BULK - WHOLE: Enriched: 25, Depleted: 100. Panel C, MACER - WHOLE: Enriched: 0, Depleted: 0.]

Figure 3. Influence of cell isolation on gene expression. Comparison of RNA-seq data from differently treated worms. (A) Comparison of bulk sorted cells (BULK) and macerated unsorted cells (MACER). (B) Comparison of bulk sorted cells (BULK) and whole worms (WHOLE). (C) Comparison of macerated unsorted cells (MACER) and whole worms (WHOLE).
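The two quantities plotted in Figure 3, log2(CPM) and log2 fold change, can be computed as in the sketch below. This is only the descriptive part of the comparison: it does not reproduce the quasi-likelihood F-test used to call significance, which requires a dedicated statistical package. The prior count of 0.25, added to avoid taking the logarithm of zero, is an assumed value, and the function names are hypothetical.

```python
import math


def log2_cpm(cpm, prior=0.25):
    """log2 of a CPM value, offset by an assumed prior count."""
    return math.log2(cpm + prior)


def log2_fold_change(cpm_a, cpm_b, prior=0.25):
    """log2 ratio of condition A over condition B with the same offset."""
    return math.log2((cpm_a + prior) / (cpm_b + prior))


if __name__ == "__main__":
    # Hypothetical transcript: 3.75 CPM in BULK, 0.75 CPM in MACER.
    print(log2_cpm(3.75), log2_fold_change(3.75, 0.75))
```

A positive value means the transcript is more abundant in the first condition, and a negative value means it is depleted there.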

Gene expression signatures of specific cell types

The EFA line expresses GFP under a ubiquitous promoter. We used it to control for potential differences caused by GFP expression and compared its transcript levels to the BULK sorted cells (Fig 4A). Only 17 enriched and 11 depleted transcripts were detected, indicating a low influence of GFP on the procedure. We next compared each of the tissue-specific samples to the BULK control. All the samples had more than 1000 genes significantly enriched compared to BULK, and between 500 and 2500 depleted genes (Fig 4B-E). Interestingly, the largest number of depleted genes (2456) was observed in ovaries (Fig 4C). All of the genes that were used as selection markers were present and enriched in their respective comparisons to BULK, indicating their reliability and confirming their presence in the selected tissues (Fig 4B-E, green dots). Importantly, for the germline transgenic lines (CABP7 and ELAV markers) we observed a substantial number of transcripts previously annotated as germline-specific [12] among the enriched transcripts (Fig 4B,C, dark blue dots). We next overlapped the enriched genes from each of the selected groups, and only 13 genes were enriched in all four lines (Fig 4F). As one might expect, the highest overlaps were between the two germline libraries. Interestingly, a relatively large overlap of 240 enriched transcript clusters between the gut-specific APOB and the ovary-specific CABP7 could be observed, pointing at new directions for future studies (Fig 4F).


[Figure 4 image. Panels A to E are scatter plots with x-axis "Counts, log2(CPM)" and y-axis "Fold change, log2" relative to BULK. Panel A, EFA - BULK: Enriched: 17, Depleted: 11. Panel B, ELAV - BULK: Specific: 26, Enriched: 1252, Depleted: 517. Panel C, CABP7 - BULK: Specific: 61, Enriched: 1550, Depleted: 2456. Panel D, PW75 - BULK: Specific: 7, Enriched: 1071, Depleted: 611. Panel E, APOB - BULK: Specific: 43, Enriched: 1310, Depleted: 961. Panel F: Venn diagram of APOB, CABP7, ELAV and PW75.]

Figure 4. Differential gene expression analysis of the isolated cell types. Comparison of RNA-seq data from different tissues sorted based on the GFP signal. Light blue dots represent transcripts that are significantly enriched or depleted. Dark blue dots represent transcripts previously described as germline specific. Orange dots point to the transcripts enriched only in the cells positive for the marker used. Green dot represents the gene used as selection marker. Red dots represent transcripts previously described as neoblast specific. (A) Comparison of EFA positive cells (EFA) and bulk sorted cells (BULK). (B) Comparison of testes specific line (ELAV) and BULK. (C) Comparison of ovary specific line (CABP7) and BULK. (D) Comparison of the PW75 line (PW75) and BULK. (E) Comparison of gut specific line (APOB) and BULK. (F) Venn diagram representing overlaps between the tested cell populations.
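The overlaps between enriched gene lists summarized in the Venn diagram can be computed with plain set operations. The sketch below is an illustration with made-up gene identifiers. The function name `shared_genes` is hypothetical, and it reports inclusive intersections (genes shared by at least the listed populations) rather than the exclusive regions drawn in a Venn diagram.

```python
from itertools import combinations


def shared_genes(enriched):
    """Map every combination of two or more populations to their shared genes.

    enriched: dict mapping a population label to a set of gene identifiers.
    """
    labels = sorted(enriched)
    result = {}
    for size in range(2, len(labels) + 1):
        for combo in combinations(labels, size):
            result[combo] = set.intersection(*(enriched[label] for label in combo))
    return result


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    toy = {
        "APOB": {"g1", "g2", "g3"},
        "CABP7": {"g2", "g3", "g4"},
        "ELAV": {"g3", "g4", "g5"},
        "PW75": {"g3", "g6"},
    }
    for combo, genes in shared_genes(toy).items():
        print(" & ".join(combo), sorted(genes))
```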

We next tested for the enrichment of known cell markers across the available data sets from Macrostomum lignano and the planarian Schmidtea mediterranea (Table 3). We used two previously published signature annotations: the neoblast/germline classification in M. lignano, ‘Grudniewska2016’ [12], and the S. mediterranea classification, ‘Wurtzel2015’ [23]. We then supplemented the list with two additional data sets from the recent publications on single-cell sequencing in S. mediterranea from the labs of Peter Reddien (‘Fincher2018’) and Nikolaus Rajewsky (‘Plass2018’) [8, 9]. Upregulated transcripts from the CABP7 line were found in the M. lignano germline list with a 2.27-fold higher overlap than expected from a random distribution. Similarly, transcripts from the ELAV line had a 5.44-fold higher overlap with the germline list. Additionally, the ELAV upregulated transcripts were 5.01 times more abundant among the ‘Wurtzel2015’ transcripts described as testes enriched. APOB-enriched transcripts overlapped significantly with transcripts considered specific for phagocytes (7.52-fold, ‘Plass2018’), late epidermal progenitors (2.48-fold, ‘Plass2018’), parenchymal cells (5.26-fold, ‘Plass2018’), intestine (3.57-fold, ‘Fincher2018’), gamma neoblasts (5.85-fold, ‘Wurtzel2015’) and the gut (4.62-fold, ‘Wurtzel2015’). These results confirm the origin of the isolated cell populations and the specificity of our tissue-specific promoters. However, we were unable to properly define the PW75 line. Its upregulated transcripts overlapped significantly with ‘Grudniewska2016’ neoblasts (3.21-fold), with ‘Plass2018’ epidermal progenitors (8.78-fold), parenchymal cells (10.73-fold), neurons (10.16-fold), neoblasts (5.85-fold) and phagocytes (2.76-fold), but also with ‘Wurtzel2015’ gamma neoblasts (4.17-fold), testes (4.55-fold) and the gut (1.97-fold), providing no clear indication of their tissue specificity.
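The fold values quoted above compare the observed overlap between two transcript lists with the overlap expected by chance. A minimal sketch of that calculation follows. The text does not state which significance test was applied, so the hypergeometric upper-tail probability computed here is an assumption, as are the function name `overlap_enrichment` and the toy numbers.

```python
from math import comb


def overlap_enrichment(n_universe, n_list, n_markers, n_overlap):
    """Fold enrichment of an overlap and its hypergeometric upper-tail p-value.

    n_universe: number of transcripts that could be drawn (background)
    n_list:     size of the enriched list from the sorted population
    n_markers:  size of the published marker list
    n_overlap:  number of transcripts present in both lists
    """
    expected = n_list * n_markers / n_universe   # overlap expected at random
    fold = n_overlap / expected

    # P(overlap >= n_overlap) when n_list transcripts are drawn at random.
    upper = min(n_list, n_markers)
    tail = sum(
        comb(n_markers, k) * comb(n_universe - n_markers, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return fold, tail / comb(n_universe, n_list)


if __name__ == "__main__":
    # Toy numbers: 100 transcripts, a list of 10, 20 markers, 6 shared.
    fold, p_value = overlap_enrichment(100, 10, 20, 6)
    print(f"fold = {fold:.2f}, p = {p_value:.4f}")
```

With these toy numbers the expected overlap is 2 transcripts, so an observed overlap of 6 corresponds to a 3-fold enrichment. Values below 1 in Table 3 correspond to overlaps smaller than expected.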

Table 3 Enrichment of known cell markers

                                APOB          CABP7         ELAV          PW75
Cell type                       UP     DOWN   UP     DOWN   UP     DOWN   UP     DOWN

Mlig2016
  Germline                      0.40   0.65   2.27   0.03   5.44   0.00   0.21   0.33
  Neoblast                      ns     0.31   0.39   0.07   ns     ns     3.21   ns
  Neoblast, stringent           0.05   2.35   ns     0.23   0.28   ns     ns     ns

Plass2018
  epidermis                     ns     ns     ns     4.39   ns     ns     ns     ns
  otf+ cells 2                  ns     ns     ns     3.12   ns     ns     ns     ns
  GABA neurons                  ns     6.73   ns     7.02   ns     8.33   ns     ns
  pharynx cell type             ns     ns     ns     2.07   ns     ns     2.74   ns
  early epidermal progenitors   ns     ns     ns     ns     ns     ns     8.78   ns
  pgrn+ parenchymal cells       ns     ns     ns     ns     ns     ns     10.73  ns
  cav-1+ neurons                ns     ns     ns     7.39   ns     ns     10.16  ns
  goblet cells                  ns     ns     ns     4.13   ns     ns     ns     ns
  neoblast 1                    ns     ns     ns     ns     ns     ns     5.85   ns
  secretory 4                   ns     ns     ns     7.02   ns     ns     ns     ns
  phagocytes                    7.52   ns     ns     2.67   ns     ns     2.76   ns
  late epidermal progenitors 2  2.48   ns     ns     ns     ns     ns     ns     ns
  psap+ parenchymal cells       5.26   ns     ns     ns     ns     ns     ns     ns
  secretory 3                   ns     10.76  ns     ns     ns     13.33  ns     ns

Fincher2018
  Intestine                     3.57   ns     ns     ns     ns     ns     ns     ns
  Neoblast                      ns     ns     ns     0.19   ns     ns     ns     ns
  Cathepsin+ cell               ns     ns     ns     0.58   ns     ns     ns     ns
  Neural                        0.42   ns     0.68   1.32   0.56   ns     ns     ns
  Muscle                        ns     ns     ns     1.65   ns     2.07   ns     ns
  Parapharyngeal                ns     2.90   ns     ns     ns     ns     ns     ns

Grudniewska2016
  Sigma neoblasts               ns     ns     ns     2.15   ns     ns     ns     ns
  Asexual_biased                ns     ns     0.67   1.29   0.68   ns     ns     ns
  Gamma neoblasts               5.85   ns     ns     4.16   ns     ns     4.17   ns
  Zeta neoblasts                ns     ns     ns     2.53   ns     ns     ns     ns
  Testis                        ns     ns     ns     ns     5.01   ns     4.55   ns
  Sexual_biased                 1.32   ns     ns     ns     ns     ns     ns     ns
  Gut                           4.62   0.21   ns     1.88   ns     ns     1.97   ns
  Neoblast                      ns     ns     ns     ns     2.69   ns     ns     ns
  Epidermis II                  ns     ns     ns     2.26   ns     ns     ns     ns
  Parapharyngeal                ns     2.76   ns     1.84   ns     ns     ns     ns
  Muscle                        0.27   ns     ns     1.97   ns     ns     ns     ns

non significant (ns in all eight columns)
  epidermis DVb; aqp+ parenchymal cells; pigment; ChAT neurons 1; secretory 2; activated early epidermal progenitors; epidermis DVb neoblast; muscle body; neoblast 8; ldlrr-1+ parenchymal cells; otf+ cells 1; ChAT neurons 2; neoblast 3; late epidermal progenitors 1; neoblast 2; neoblast 7; protonephridia; neoblast 11; pharynx cell type progenitors; spp-11+ neurons; psd+ cells; gut progenitors; secretory 1; glia; neoblast 10; npp-18+ neurons; Protonephridia; Epidermal; Pharynx; Sexual_specific; Protonephridia; Epidermis I; Xins; X2; X1


Finally, we identified tissue-specific transcripts by requiring a significant enrichment of the transcript in a given tissue compared to BULK, a relatively high level of expression of the transcript in that tissue (a minimum of 1 cpm), and a complete absence (0 reads) of the transcript in the other tissues. We detected 61 such transcripts specific for the CABP7 line, 26 for ELAV, 43 for APOB and 7 for PW75 (Fig 4B-E). These transcripts represent tissue-specific transcriptional signatures (Table 4) and provide a useful resource for future functional studies in Macrostomum lignano.
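The three selection criteria described above translate directly into a filter. The sketch below assumes that the differential expression step has already produced a set of significantly enriched transcripts per tissue. The function name `tissue_specific`, the data layout and the transcript identifiers are hypothetical.

```python
def tissue_specific(tissue, enriched, cpm, raw_counts, min_cpm=1.0):
    """Select transcripts specific to one tissue.

    enriched:   dict tissue -> set of transcripts significantly enriched vs BULK
    cpm:        dict tissue -> dict transcript -> average CPM in that tissue
    raw_counts: dict tissue -> dict transcript -> read count in that tissue
    """
    other_tissues = [name for name in raw_counts if name != tissue]
    return sorted(
        transcript
        for transcript in enriched[tissue]
        # criterion 2: expressed at 1 CPM or more in the tissue itself
        if cpm[tissue].get(transcript, 0.0) >= min_cpm
        # criterion 3: zero reads in every other tissue
        and all(raw_counts[other].get(transcript, 0) == 0 for other in other_tissues)
    )


if __name__ == "__main__":
    # Made-up transcripts t1..t3 in two tissues, for illustration only.
    enriched = {"APOB": {"t1", "t2", "t3"}, "ELAV": set()}
    cpm = {"APOB": {"t1": 12.0, "t2": 0.4, "t3": 8.0}, "ELAV": {"t3": 2.0}}
    raw = {"APOB": {"t1": 300, "t2": 10, "t3": 200}, "ELAV": {"t3": 5}}
    print(tissue_specific("APOB", enriched, cpm, raw))
```

In the toy data only t1 is retained: t2 falls below 1 CPM and t3 has reads in the other tissue. Requiring zero reads elsewhere is a strict criterion, which is consistent with the small number of signature transcripts reported per tissue.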

Table 4 Transcriptional signatures of the isolated cell types

Cluster FC CPM Human Homolog Transcripts

APOB

MligTCv3c-12075.0 8,19 6,906 Mlig009612.g1

MligTCv3c-12043.0 6,27 4,407 NPDC1 Mlig008605.g1

MligTCv3c-6692.2 9,69 22,115 Mlig020967.g1

MligTCv3c-3516.2 5,5 2,568 SUSD2 Mlig021480.g1

MligTCv3c-8203.0 8,58 9,265 YSK4; ESPN; ULK3 Mlig050454.g1, Mlig012489.g7, Mlig055330.g2, Mlig020020.g1

MligTCv3c-8350.0 7,15 3,945 DYNLL2 Mlig026248.g2, Mlig026248.g3, Mlig026248.g1, Mlig014838.g1

MligTCv3c-6147.1 7,06 1,030 SLC16A12 Mlig020713.g1

MligTCv3c-562.620 7,58 15,808 Mlig025132.g1

MligTCv3c-10177.2 6,55 1,817 Mlig040913.g2, Mlig026440.g5, Mlig006363.g11

MligTCv3c-9345.1 8,93 11,540 AMY1B Mlig019390.g1

MligTCv3c-562.7808 6,02 5,399 FGG Mlig020063.g6, Mlig020063.g1, Mlig020063.g7

MligTCv3c-11032.1 9,41 15,141 IFI30 Mlig007477.g1, Mlig002590.g1

MligTCv3c-8625.0 9,66 23,435 SLC6A1 Mlig030040.g3

MligTCv3c-9617.0 7,51 4,264 GLIPR2 Mlig019666.g1

MligTCv3c-562.2235 6,69 18,423 Mlig005528.g1, Mlig005528.g2, Mlig015418.g1

MligTCv3c-19933.0 11,86 40,533 C6orf168 Mlig002469.g1

MligTCv3c-8796.1 9,53 14,421 Mlig012851.g2

MligTCv3c-17663.1 5,51 20,007 Mlig015580.g2, Mlig034043.g1

MligTCv3c-7595.0 10,07 31,848 AC091742.1 Mlig033733.g4

MligTCv3c-8234.0 8,21 7,291 EML2 Mlig000089.g1, Mlig005285.g2

MligTCv3c-1324.0 6,51 8,550 Mlig018718.g1, Mlig034504.g1

MligTCv3c-3463.0 7,33 4,238 Mlig014858.g1, Mlig007780.g8

MligTCv3c-17367.0 5,99 16,408 CAPN9 Mlig002403.g2

MligTCv3c-562.2825 7,68 5,235 C11orf54 Mlig022135.g1

MligTCv3c-562.10112 9,8 14,654 AC010859.1 Mlig047157.g1

MligTCv3c-562.2339 6,92 10,407 SLK Mlig022558.g2

MligTCv3c-8004.0 9,05 13,935 Mlig055744.g1

MligTCv3c-12659.2 9,39 20,754 Mlig003743.g2

MligTCv3c-7253.0 10,69 28,975 Mlig025591.g2, Mlig025591.g1, Mlig014196.g2

MligTCv3c-20390.0 9,28 11,517 Mlig006568.g1, Mlig023270.g3, Mlig032900.g1


MligTCv3c-562.6455 7,96 4,100 Mlig008096.g3

MligTCv3c-3018.0 6,79 2,084 Mlig027552.g4

MligTCv3c-562.4540 10,14 24,269 Mlig005797.g1

MligTCv3c-13409.0 8,24 6,912 ECEL1 Mlig014428.g2, Mlig014428.g3

MligTCv3c-562.3260 8,41 3,835 CA9 Mlig023444.g1

MligTCv3c-17178.0 9,17 8,109 RAC1 Mlig001392.g2

MligTCv3c-3455.0 7,82 3,766 Mlig045262.g1

MligTCv3c-16506.0 9,31 12,095 CHRNA3 Mlig014979.g2, Mlig014979.g1

MligTCv3c-16308.2 8,52 8,149 Mlig001967.g1

MligTCv3c-562.1882 8,13 4,244 Mlig048587.g1, Mlig046484.g2, Mlig017292.g6

MligTCv3c-9856.1 7,73 4,992 Mlig001094.g1, Mlig001094.g8

MligTCv3c-10342.0 10,15 23,116 Mlig028263.g1

CABP7

MligTCv3c-15903.2 12,54 47,340 Mlig021384.g2, Mlig021384.g3

MligTCv3c-14520.0 7,35 1,092 C10orf27 Mlig021258.g7, Mlig005401.g1, Mlig021258.g2

MligTCv3c-15047.0 7,88 1,381 Mlig012046.g1, Mlig012046.g2

MligTCv3c-562.6793 8,47 2,374 PCDH9; PCDH19 Mlig016424.g3, Mlig016424.g4

MligTCv3c-562.7209 10,71 11,010 BEST4 Mlig032475.g1

MligTCv3c-11952.2 10,19 9,810 PPARG Mlig010675.g2

MligTCv3c-562.6963 6,68 1,472 SH3GLB2 Mlig049889.g1

MligTCv3c-6677.0 9,06 2,883 Mlig053171.g1

MligTCv3c-7218.0 9,43 12,918 Mlig051984.g1, Mlig051963.g1, Mlig047696.g1

MligTCv3c-7001.1 4,7 103,216 GSK3B; GSK3A Mlig023110.g6, Mlig023110.g3, Mlig031187.g3, Mlig031187.g2

MligTCv3c-20242.0 8,02 1,472 AKTIP Mlig038026.g2, Mlig058743.g1

MligTCv3c-11913.0 10,03 4,497 Mlig013659.g7

MligTCv3c-20034.3 8,05 2,468 ANKRD44 Mlig013365.g1

MligTCv3c-5429.1 6,26 1,043 SCP2 Mlig019212.g3, Mlig019212.g6

MligTCv3c-12430.0 8,33 19,273 Mlig002429.g3, Mlig035464.g1, Mlig058004.g1

MligTCv3c-7001.0 4,77 38,118 GSK3B Mlig031187.g1

MligTCv3c-7226.0 7,99 10,832 ZFP36L1 Mlig037170.g2, Mlig028414.g1

MligTCv3c-15441.0 10,65 70,423 ZC3H4 Mlig021954.g3

MligTCv3c-8785.0 7,38 1,333 PHKG2 Mlig012191.g4

MligTCv3c-5558.0 10,46 3,845 Mlig030385.g2

MligTCv3c-20417.0 10,18 27,628 Mlig022570.g2, Mlig022570.g1

MligTCv3c-562.7412 7,01 4,382 Mlig046759.g1

MligTCv3c-562.5961 7,75 74,024 Mlig029014.g3

MligTCv3c-7922.1 8,23 1,785 Mlig035953.g1, Mlig028523.g1

MligTCv3c-562.9345 6,07 1,049 C20orf27 Mlig049036.g1

MligTCv3c-562.1820 9,6 20,617 SLC47A1 Mlig015378.g2, Mlig015378.g3

MligTCv3c-10662.0 9,35 43,062 Mlig049219.g1, Mlig015656.g2

MligTCv3c-13912.0 9,76 1,984 Mlig049871.g1

MligTCv3c-562.6723 7,76 1,472 Mlig056482.g2

MligTCv3c-16104.0 7,63 2,508 Mlig054778.g1

MligTCv3c-7372.0 8,73 2,174 Mlig013762.g1, Mlig013762.g3, Mlig013762.g2

MligTCv3c-16608.0 7,54 13,597 ARHGAP11B Mlig004668.g3

MligTCv3c-11310.2 9,92 5,118 EIF4E1B Mlig001559.g2

MligTCv3c-2724.0 9,41 2,590 ZFP36L1 Mlig046228.g1, Mlig046227.g1

MligTCv3c-562.10165 8,65 1,902 TRIM3 Mlig028814.g2

MligTCv3c-6519.0 9,88 2,810 MPDZ Mlig050271.g1, Mlig042432.g2, Mlig009575.g2

MligTCv3c-562.8485 9,07 4,513 Mlig020178.g2, Mlig020178.g1

MligTCv3c-4032.0 7,88 2,344 Mlig003087.g2, Mlig003087.g1, Mlig003087.g3

MligTCv3c-7912.3 11,04 11,607 Mlig055325.g1, Mlig041663.g1

MligTCv3c-20289.1 9,04 2,786 Mlig020048.g1

MligTCv3c-6103.0 11,83 16,405 Mlig017850.g4

MligTCv3c-6901.1 7,83 25,514 Mlig001781.g1, Mlig001781.g3

MligTCv3c-4838.0 8,13 1,526 Mlig031152.g3, Mlig004042.g3, Mlig058530.g1, Mlig053095.g1

MligTCv3c-18275.0 7,48 4,178 WDR72 Mlig003212.g21

MligTCv3c-8789.0 10,07 4,606 GINS1 Mlig017777.g1

MligTCv3c-16665.2 7,25 4,521 Mlig039668.g1, Mlig057119.g1

MligTCv3c-17417.0 12,17 54,774 Mlig009351.g1

MligTCv3c-9239.0 9,74 10,149 GIN1 Mlig043815.g1

MligTCv3c-10984.0 6,9 3,385 Mlig019965.g1

MligTCv3c-4463.0 8,52 5,153 Mlig019737.g1

MligTCv3c-6712.0 8,91 1,595 ZC4H2 Mlig001061.g6, Mlig039001.g1, Mlig001506.g33, Mlig042659.g1

MligTCv3c-12977.1 8,55 1,655 C9orf21 Mlig003404.g1, Mlig031782.g1

MligTCv3c-10907.0 7,42 1,564 STAC2 Mlig039231.g1

MligTCv3c-16496.0 8,63 5,060 Mlig007893.g8

MligTCv3c-562.826 7,22 7,837 PIF1 Mlig041668.g1

MligTCv3c-562.4204 7,67 4,798 Mlig043733.g1, Mlig054021.g2, Mlig051014.g1

MligTCv3c-7340.0 8,6 3,725 Mlig008249.g2, Mlig008249.g1, Mlig008249.g3

MligTCv3c-18534.0 7,61 1,472 ABHD4 Mlig032339.g2

MligTCv3c-15903.0 8,3 4,039 Mlig021384.g1

MligTCv3c-9885.4 7,04 1,280 PCIF1 Mlig010768.g5, Mlig049987.g1

ELAV

MligTCv3c-14996.0 8,97 4,584 Mlig019890.g2, Mlig019890.g1, Mlig016297.g1

MligTCv3c-2460.0 12,21 28,775 Mlig037783.g1, Mlig047960.g1, Mlig044611.g1

MligTCv3c-18992.0 13,2 6,625 Mlig054964.g1

MligTCv3c-12810.1 12,89 12,883 Mlig008040.g1

MligTCv3c-3813.0 12,39 7,195 Mlig030222.g1

MligTCv3c-562.5586 9,06 6,021 Mlig045715.g1, Mlig000957.g6

MligTCv3c-14394.0 14,11 37,885 Mlig009897.g1

MligTCv3c-562.3458 10,47 2,573 Mlig017201.g1, Mlig057258.g1

MligTCv3c-562.8815 10,43 7,527 Mlig034325.g2, Mlig046840.g1, Mlig047349.g1

MligTCv3c-12810.0 10,48 15,284 Mlig014177.g2, Mlig014177.g1

MligTCv3c-11443.1 17,09 19,583 Mlig026800.g4, Mlig003893.g2

MligTCv3c-14948.0 14,44 5,218 ACCN2 Mlig001027.g10, Mlig042765.g1, Mlig049074.g1

MligTCv3c-15369.0 11,1 3,863 RABGGTA Mlig033390.g3

MligTCv3c-3744.1 11,25 1,820 Mlig048023.g1, Mlig036126.g1

MligTCv3c-17286.0 10,19 4,480 Mlig009336.g3

MligTCv3c-17892.1 12,96 6,982 Mlig039416.g1

MligTCv3c-9649.0 13,55 14,740 Mlig007616.g1

MligTCv3c-13630.0 10,72 4,887 Mlig034391.g1

MligTCv3c-3889.0 11,68 11,365 ESRRB; NR4A1; NR1D1 Mlig006912.g3, Mlig027806.g1, Mlig051033.g1

MligTCv3c-533.0 14,07 17,251 Mlig048648.g1, Mlig026890.g5

MligTCv3c-15066.0 14,25 7,475 TCF7L2 Mlig042368.g1

MligTCv3c-6705.4 13,88 11,662 Mlig031488.g2, Mlig031488.g7, Mlig032494.g2, Mlig013976.g10

MligTCv3c-562.2844 12,28 21,841 Mlig001484.g4, Mlig035998.g1

MligTCv3c-562.1661 10,69 1,497 Mlig001907.g1

MligTCv3c-7590.0 11,35 5,566 ZBED4 Mlig002362.g2, Mlig002362.g1

PW75

MligTCv3c-562.8230 14,69 121,899 CRB2 Mlig035376.g1

MligTCv3c-17542.0 10,81 13,626 ANKUB1 Mlig021153.g2

MligTCv3c-9611.1 12,35 27,208 Mlig057497.g1

MligTCv3c-17862.0 8,75 90,445 ANKS1A Mlig024756.g1, Mlig024756.g2

MligTCv3c-18151.0 6,3 175,152 CAPN9 Mlig013791.g1

MligTCv3c-14146.0 10,81 30,859 Mlig037355.g1

MligTCv3c-562.10075 10,92 63,776 Mlig029598.g3

Refinement of neoblast classification

Neoblasts are the adult stem cells of regenerating flatworms and are considered the only dividing somatic cells [4]. However, distinguishing neoblasts from germline cells, which also proliferate in the adult animal, can be very difficult. Previous approaches to characterizing neoblasts relied on sorting dividing cells, depleting neoblasts by irradiation, or single-cell sequencing [7, 11, 12]. The approach presented here has clear advantages over these. FACS isolation based on GFP signal is a highly specific method [24]. We used CABP7 and ELAV, specific tissue markers that allowed us to efficiently isolate and sequence germline cells. Since some of the previously classified ‘stringent neoblast’ genes are enriched in these cell types, we can confidently filter them out and reclassify them. In total, only 26 transcript clusters previously classified as ‘stringent neoblast’ and 78 classified as ‘neoblast’ are detected in the analyzed tissues, confirming the overall good stringency of the previous classification and providing further refinements (Table 5). This will allow for better identification of neoblasts, including the prospect of tagging these important cells.
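The reclassification logic described above amounts to cross-referencing an earlier gene list against per-tissue enrichment calls. The following is a minimal sketch of that idea, not the scripts used in this study: the function names and the dictionary layout are hypothetical, and the example uses three rows copied from Table 5 plus one made-up cluster that shows no enrichment.

```python
# Illustrative sketch only: function names and data layout are hypothetical.
GERMLINE_LINES = {"CABP7", "ELAV"}  # lines used in the text to capture germline cells


def annotate(previous_class, enriched_in):
    """List (tissues, previous label, cluster) for clusters enriched in any sorted tissue."""
    rows = []
    for cluster in sorted(previous_class):
        tissues = enriched_in.get(cluster, set())
        if tissues:
            rows.append((";".join(sorted(tissues)), previous_class[cluster], cluster))
    return rows


def germline_flagged(previous_class, enriched_in):
    """Clusters from the earlier list that are enriched in CABP7 or ELAV cells."""
    return sorted(
        cluster for cluster in previous_class
        if enriched_in.get(cluster, set()) & GERMLINE_LINES
    )


# Three rows copied from Table 5 plus one made-up cluster with no enrichment.
previous_class = {
    "MligTCv3c-10934.1": "Neoblast, stringent",
    "MligTCv3c-11775.0": "Neoblast, stringent",
    "MligTCv3c-18459.1": "Neoblast, stringent",
    "hypothetical-cluster": "Neoblast, stringent",
}
enriched_in = {
    "MligTCv3c-10934.1": {"PW75"},
    "MligTCv3c-11775.0": {"CABP7", "ELAV"},
    "MligTCv3c-18459.1": {"ELAV"},
}

for row in annotate(previous_class, enriched_in):
    print(*row, sep="\t")
print("germline-flagged:", germline_flagged(previous_class, enriched_in))
```

Clusters that appear in no tissue-enriched set are simply left out of the annotated list, which mirrors the fact that Table 5 reports only the clusters detected in the analyzed tissues.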

Table 5. Reclassification of the ‘Stringent neoblast’ genes

This study | Previous classification (Grudniewska 2016) | Cluster | Transcripts

PW75 Neoblast, stringent MligTCv3c-10934.1 Mlig010454.g2

CABP7;ELAV Neoblast, stringent MligTCv3c-11775.0 Mlig015387.g2

PW75 Neoblast, stringent MligTCv3c-12558.0 Mlig011148.g5

CABP7 Neoblast, stringent MligTCv3c-12632.0 Mlig014817.g1

CABP7 Neoblast, stringent MligTCv3c-12632.1 Mlig031617.g1

APOB;PW75 Neoblast, stringent MligTCv3c-12991.1 Mlig001299.g5

PW75 Neoblast, stringent MligTCv3c-12991.3 Mlig001299.g12

CABP7 Neoblast, stringent MligTCv3c-13584.1 Mlig024481.g6, Mlig030448.g3

CABP7 Neoblast, stringent MligTCv3c-14113.4 Mlig027683.g4

PW75 Neoblast, stringent MligTCv3c-14199.0 Mlig013954.g2, Mlig013954.g3, Mlig013954.g1

CABP7 Neoblast, stringent MligTCv3c-14993.0 Mlig006966.g3, Mlig006966.g2, Mlig006966.g1

CABP7 Neoblast, stringent MligTCv3c-16607.0 Mlig009805.g1

PW75 Neoblast, stringent MligTCv3c-16964.1 Mlig009917.g3, Mlig009917.g2

CABP7 Neoblast, stringent MligTCv3c-17072.0 Mlig012115.g2

PW75 Neoblast, stringent MligTCv3c-17447.0 Mlig028945.g1, Mlig011801.g1, Mlig028945.g2

CABP7 Neoblast, stringent MligTCv3c-17646.0 Mlig033384.g2, Mlig033384.g1

PW75 Neoblast, stringent MligTCv3c-17650.1 Mlig015287.g3

ELAV Neoblast, stringent MligTCv3c-18459.1 Mlig007542.g2

ELAV Neoblast, stringent MligTCv3c-19203.2 Mlig018329.g3, Mlig018329.g1

CABP7;ELAV;PW75 Neoblast, stringent MligTCv3c-19651.1 Mlig015935.g1

ELAV Neoblast, stringent MligTCv3c-19653.2 Mlig030213.g2

PW75 Neoblast, stringent MligTCv3c-20335.0 Mlig032040.g5, Mlig032040.g3, Mlig032040.g1

PW75 Neoblast, stringent MligTCv3c-562.1499 Mlig003338.g1, Mlig024774.g6

PW75 Neoblast, stringent MligTCv3c-6068.0 Mlig033462.g1, Mlig012679.g2

PW75 Neoblast, stringent MligTCv3c-6189.1 Mlig024991.g2

CABP7 Neoblast, stringent MligTCv3c-9967.2 Mlig013271.g2

PW75 Neoblast MligTCv3c-10579.1 Mlig017282.g2, Mlig017282.g1

APOB;ELAV Neoblast MligTCv3c-10730.1 Mlig016053.g1

ELAV Neoblast MligTCv3c-11597.1 Mlig034073.g3, Mlig003527.g4

APOB;CABP7;ELAV Neoblast MligTCv3c-11847.4 Mlig006428.g2

PW75 Neoblast MligTCv3c-12081.3 Mlig012117.g1

PW75 Neoblast MligTCv3c-12264.0 Mlig028327.g4, Mlig028327.g6, Mlig028327.g2

PW75 Neoblast MligTCv3c-12827.0 Mlig009629.g2, Mlig009629.g1

CABP7;ELAV Neoblast MligTCv3c-13041.0 Mlig002955.g2

CABP7 Neoblast MligTCv3c-13321.1 Mlig003937.g4

PW75 Neoblast MligTCv3c-13371.1 Mlig000869.g1, Mlig003095.g1

APOB;PW75 Neoblast MligTCv3c-13371.2 Mlig009577.g1

PW75 Neoblast MligTCv3c-13462.0 Mlig013926.g2

PW75 Neoblast MligTCv3c-13462.1 Mlig013926.g1, Mlig013926.g3

PW75 Neoblast MligTCv3c-13653.1 Mlig021607.g4, Mlig021607.g2, Mlig021607.g1

PW75 Neoblast MligTCv3c-13692.0 Mlig034397.g3, Mlig012357.g4, Mlig018348.g1

APOB;PW75 Neoblast MligTCv3c-13880.2 Mlig000285.g6

PW75 Neoblast MligTCv3c-14039.2 Mlig033784.g3

PW75 Neoblast MligTCv3c-14127.1 Mlig001018.g3

PW75 Neoblast MligTCv3c-14258.1 Mlig004678.g1

CABP7;ELAV Neoblast MligTCv3c-14656.1 Mlig022589.g5

ELAV;PW75 Neoblast MligTCv3c-15036.0 Mlig010034.g1, Mlig010034.g2, Mlig010811.g5

APOB;PW75 Neoblast MligTCv3c-15079.1 Mlig009193.g3, Mlig009193.g2

PW75 Neoblast MligTCv3c-15364.0 Mlig012350.g4, Mlig012350.g1, Mlig012350.g3

PW75 Neoblast MligTCv3c-15432.7 Mlig018787.g1

APOB;PW75 Neoblast MligTCv3c-15493.20 Mlig022192.g2

PW75 Neoblast MligTCv3c-15493.38 Mlig022192.g1

PW75 Neoblast MligTCv3c-15683.0 Mlig032548.g2, Mlig032548.g1

APOB Neoblast MligTCv3c-15761.1 Mlig001606.g3, Mlig028397.g3

PW75 Neoblast MligTCv3c-15761.2 Mlig008434.g1

PW75 Neoblast MligTCv3c-16252.1 Mlig013126.g1, Mlig013126.g3

PW75 Neoblast MligTCv3c-17095.0 Mlig019578.g1

PW75 Neoblast MligTCv3c-17095.1 Mlig009105.g1, Mlig026192.g1

PW75 Neoblast MligTCv3c-17106.1 Mlig031719.g2, Mlig033452.g2

CABP7 Neoblast MligTCv3c-17343.2 Mlig004759.g5, Mlig004759.g6, Mlig004759.g4

APOB Neoblast MligTCv3c-17425.0 Mlig010061.g5

PW75 Neoblast MligTCv3c-17585.0 Mlig013573.g3, Mlig020541.g1

CABP7 Neoblast MligTCv3c-17613.0 Mlig008175.g1, Mlig003008.g4, Mlig003008.g1, Mlig003008.g2

APOB Neoblast MligTCv3c-17943.2 Mlig013282.g2

CABP7;ELAV Neoblast MligTCv3c-17984.0 Mlig012058.g4, Mlig012058.g2, Mlig012058.g1

PW75 Neoblast MligTCv3c-18212.1 Mlig014021.g3, Mlig014021.g2

PW75 Neoblast MligTCv3c-18228.2 Mlig015545.g1, Mlig015545.g2

APOB Neoblast MligTCv3c-18460.1 Mlig032511.g1

APOB Neoblast MligTCv3c-18562.0 Mlig015479.g1

PW75 Neoblast MligTCv3c-18610.1 Mlig023668.g4, Mlig023668.g2

CABP7 Neoblast MligTCv3c-19380.1 Mlig018915.g2

APOB Neoblast MligTCv3c-19563.1 Mlig004624.g1

PW75 Neoblast MligTCv3c-19740.0 Mlig027701.g3, Mlig027701.g1

PW75 Neoblast MligTCv3c-19740.4 Mlig027701.g2

PW75 Neoblast MligTCv3c-19820.0 Mlig034055.g3, Mlig034055.g2, Mlig034055.g1

APOB Neoblast MligTCv3c-19846.4 Mlig014446.g2

PW75 Neoblast MligTCv3c-20106.0 Mlig033886.g8, Mlig033886.g4, Mlig033886.g3

PW75 Neoblast MligTCv3c-20147.0 Mlig011727.g4, Mlig017609.g3

APOB;PW75 Neoblast MligTCv3c-2016.6 Mlig009967.g1, Mlig010697.g1

APOB;PW75 Neoblast MligTCv3c-4819.2 Mlig027547.g1, Mlig027547.g4

PW75 Neoblast MligTCv3c-5545.0 Mlig027541.g2, Mlig027541.g1

PW75 Neoblast MligTCv3c-562.1547 Mlig004498.g1

PW75 Neoblast MligTCv3c-562.1548 Mlig004498.g4

CABP7 Neoblast MligTCv3c-562.2503 Mlig028427.g1

ELAV;PW75 Neoblast MligTCv3c-562.379 Mlig032511.g4

PW75 Neoblast MligTCv3c-562.4244 Mlig020888.g1

PW75 Neoblast MligTCv3c-562.899 Mlig024657.g1, Mlig047945.g1, Mlig011396.g1

APOB Neoblast MligTCv3c-562.9187 Mlig027071.g3

PW75 Neoblast MligTCv3c-6068.2 Mlig006812.g3

PW75 Neoblast MligTCv3c-6305.0 Mlig010075.g2, Mlig010075.g1

PW75 Neoblast MligTCv3c-6634.2 Mlig019152.g2

PW75 Neoblast MligTCv3c-7279.0 Mlig031723.g3

PW75 Neoblast MligTCv3c-7583.1 Mlig023292.g4, Mlig023292.g2

APOB Neoblast MligTCv3c-7824.1 Mlig025031.g3

CABP7 Neoblast MligTCv3c-8123.0 Mlig035811.g1, Mlig051641.g1

PW75 Neoblast MligTCv3c-8151.3 Mlig014777.g1, Mlig013865.g1, Mlig014777.g2, Mlig004290.g1

APOB;ELAV;PW75 Neoblast MligTCv3c-8315.0 Mlig002721.g6

PW75 Neoblast MligTCv3c-8388.0 Mlig043477.g1, Mlig023195.g1

PW75 Neoblast MligTCv3c-9379.0 Mlig003872.g2, Mlig003872.g1

PW75 Neoblast MligTCv3c-9620.6 Mlig014067.g2

PW75 Neoblast MligTCv3c-9866.0 Mlig018910.g3

PW75 Neoblast MligTCv3c-9866.1 Mlig018910.g10, Mlig018910.g6

DISCUSSION

In this work we aimed to improve the understanding of the transcriptional profiles of different cell types in flatworms, and we introduced a new approach that should facilitate future research in this field. Using tissue-specific expression of a fluorescent protein, we were able to isolate the desired cells, sequence them and calculate their expression profiles based on transcript enrichment. Finally, we updated the known ‘neoblast’ list by reclassifying genes upregulated in the germline.

The FACS approach that we chose proved to be effective. Comparing differently treated cells enabled us to correct for bias introduced during the sorting step. The M. lignano transgenic lines chosen for this research reliably expressed GFP in a specific manner. In the APOB, CABP7 and ELAV lines the respective tissue-specific marker genes were upregulated, confirming the efficiency of the method. We could select the transcripts enriched in these tissues and, by cross-checking them against each other, identify those that could be used as transcriptomic signatures. We used these unique markers to improve the known data sets, mainly by eliminating germline-specific genes on the basis of their presence in either ELAV or CABP7. The improved ‘neoblast stringent’ gene list points the way for future research using M. lignano and reinforces the advantage given by the availability of transgenic techniques in this model organism.

Additionally, we were able to identify a wealth of potentially important genes in which tissue specificity was not clearly visible. The fourth transgenic line used was not specific for the epidermis, making it a valuable addition to the tested markers. The most interesting finding was the high overlap of the PW75 transcripts with the neoblast-specific cell markers from previous studies. This was all the more intriguing given the non-significant overlap between PW75 and ‘neoblast stringent’ transcripts, suggesting that the selected cells might be in the process of differentiation into somatic cells.

All of the presented findings confirm the great potential of using transgenic flatworms to improve the currently available methods and data sets. We believe that, with continued development in the field of flatworm transgenics, M. lignano can soon become an invaluable model organism for flatworm research.

MATERIALS AND METHODS

M. lignano lines and cultures

The NL10 wild-type line and the NL20, NL21, NL22 and NL23 lines were previously described [13]. Animals were cultured under laboratory conditions in plastic Petri dishes (Greiner) filled with nutrient-enriched artificial sea water (Guillard's f/2 medium). Worms were fed ad libitum on the unicellular diatom Nitzschia curvilineata (Heterokontophyta, Bacillariophyceae) (SAG). Climate chamber conditions were set to 25 °C with constant aeration and a 14/10 h day/night cycle.

Cloning of the PW75 plasmid

The PW75 plasmid was made by cloning the GFP::3'UTR fragment from the optiMac plasmid [13] into the pGEM-T vector (Promega) and then cloning the 741 bp upstream region of the Mlig032396.g1 transcript at the 5' end of GFP using HindIII and XbaI restriction sites.

Establishing the PW75 transgenic line

The PW75 line was established from a single transgenic individual crossed with a single wild-type NL10 worm. The transgenic worm was obtained by microinjecting the PW75 plasmid (no irradiation was used) following the previously described microinjection protocol [13]. Positive progeny were selected and backcrossed to obtain homozygous individuals, which were used to establish the transgenic line.

FACS and RNA isolation

Each sample was prepared from 50 worms. Worms were starved for 20–24 h prior to FACS and RNA isolation to minimize the possibility of diatom RNA contamination. Starved worms were macerated into single cells by pipetting in CMF-ASW medium (31 g NaCl, 0.8 g KCl, 1.6 g Na2SO4, 0.2 g NaHCO3 in 1 l ddH2O, pH 8), followed by filtration through a 35 µm nylon mesh (BD). The cells were then sorted based on their size and the intensity of the signal in the green channel. A gating strategy was developed to eliminate dead cells and debris. Between 5000 and 15000 cells per sample were collected in RNase-free tubes. RNA was extracted using Direct-zol RNA Kits (Zymo Research) following the manufacturer's manual.
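For readers who prefer molar concentrations, the CMF-ASW recipe above can be converted from grams per litre with a short calculation. This is a minimal sketch: the molar masses are standard rounded values supplied here rather than taken from the text, and the salts are assumed to be anhydrous.

```python
# Standard molar masses in g/mol (rounded; assumed anhydrous salts).
MOLAR_MASS = {"NaCl": 58.44, "KCl": 74.55, "Na2SO4": 142.04, "NaHCO3": 84.01}

# CMF-ASW recipe from the text, in grams per litre of ddH2O.
RECIPE_G_PER_L = {"NaCl": 31.0, "KCl": 0.8, "Na2SO4": 1.6, "NaHCO3": 0.2}


def millimolar(grams_per_litre, molar_mass):
    """Convert a mass concentration (g/l) to millimolar."""
    return grams_per_litre / molar_mass * 1000.0


concentrations = {
    salt: millimolar(grams, MOLAR_MASS[salt]) for salt, grams in RECIPE_G_PER_L.items()
}

for salt, mm in concentrations.items():
    print(f"{salt}: {mm:.2f} mM")
```

The medium is dominated by NaCl at roughly 530 mM, with the remaining salts each in the low millimolar range.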

Preparation and sequencing of RNA-seq libraries

RNA-Seq libraries were made using the CEL-Seq method [12, 17] and paired-end sequenced on an Illumina NextSeq machine (Illumina), with 12 cycles for Read1, which contained the CEL-Seq barcode sequences, and 80 cycles for Read2. No Illumina Index read was used. The resulting raw reads were split into samples using custom scripts and the Read1 barcode data.
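The custom demultiplexing scripts are not reproduced here, but the general idea of splitting reads by a Read1 barcode can be sketched as follows. The barcode position and length within the 12-cycle Read1 are not stated in the text, so `bc_start` and `bc_len` are assumptions, as are the function name, the exact-match lookup and the toy sequences.

```python
from collections import defaultdict


def demultiplex(read_pairs, barcode_to_sample, bc_start=0, bc_len=6):
    """Group Read2 sequences by the sample barcode found in Read1.

    bc_start and bc_len are hypothetical defaults; adjust them to the
    actual Read1 layout. Only exact barcode matches are assigned.
    """
    by_sample = defaultdict(list)
    for read1, read2 in read_pairs:
        barcode = read1[bc_start:bc_start + bc_len]
        sample = barcode_to_sample.get(barcode, "unassigned")
        by_sample[sample].append(read2)
    return dict(by_sample)


# Toy data: (Read1, Read2) sequence pairs and a made-up barcode sheet.
barcode_to_sample = {"ACGTAA": "sample_1", "TTGGCC": "sample_2"}
read_pairs = [
    ("ACGTAACCCCCC", "GATTACA"),
    ("TTGGCCAAAAAA", "CATCAT"),
    ("NNNNNNAAAAAA", "TTTT"),
    ("ACGTAAGGGGGG", "GGGA"),
]

split = demultiplex(read_pairs, barcode_to_sample)
for sample, reads in split.items():
    print(sample, reads)
```

A production script would typically stream FASTQ records from disk and might tolerate one barcode mismatch; both are left out to keep the sketch short.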

Differential expression analysis of RNA-Seq data

The raw reads were mapped to the Mlig_3_7 genome assembly [13] using STAR version 2.6.0c [25] and gene models from the Mlig_RNA_3_7_v3 transcriptome assembly [19]. Read counts were assigned to the previously established transcript clusters (sets of overlapping transcripts) using Corset v. 1.06 [26]. Transcript clusters with at least 1 cpm in at least 3 samples were used in the analysis. Before differential expression was calculated, the read counts were corrected with the RUVSeq package (k=1) to remove unwanted variation. Differential gene expression was analyzed with generalized linear models and the quasi-likelihood F-test, as implemented in the glmQLFit and glmQLFTest functions of the edgeR package [22]. A false discovery rate (FDR) of 0.05 was used as the significance cutoff. The heatmap was constructed using the heatmap.2 function from the gplots R package. Principal component analysis was performed with the plotPCA function from the BiocGenerics R package [27].
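Two of the steps above are simple enough to illustrate directly: the expression filter (at least 1 cpm in at least 3 samples) and the FDR cutoff. The sketch below is written in plain Python for illustration and is not the R code used in the analysis. It assumes cpm is computed from raw library sizes and that the FDR is controlled with the Benjamini-Hochberg procedure; the RUVSeq correction and the quasi-likelihood F-test are not reimplemented, and all names and numbers are hypothetical.

```python
def cpm(counts):
    """counts: dict of cluster -> list of raw read counts, one per sample."""
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[i] for row in counts.values()) for i in range(n_samples)]
    return {
        cluster: [row[i] / lib_sizes[i] * 1e6 for i in range(n_samples)]
        for cluster, row in counts.items()
    }


def keep_expressed(counts, min_cpm=1.0, min_samples=3):
    """Clusters reaching min_cpm in at least min_samples samples."""
    return [
        cluster for cluster, row in cpm(counts).items()
        if sum(value >= min_cpm for value in row) >= min_samples
    ]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy counts for three clusters across four samples (made-up numbers).
counts = {
    "high": [999990, 999990, 999990, 999990],
    "mid": [9, 9, 9, 0],
    "low": [0, 0, 1, 10],
}
kept = keep_expressed(counts)

# Toy p-values; keep the tests whose adjusted value is below 0.05.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
significant = [i for i, q in enumerate(adjusted) if q < 0.05]
print(kept, significant)
```

In the toy data the "low" cluster passes the 1 cpm threshold in only two samples and is dropped, while only the first p-value survives the 0.05 FDR cutoff.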

REFERENCES

[1] T. H. Morgan, “Experimental studies of the regeneration of Planaria maculata,” Arch. für Entwickelungsmechanik der Org., vol. 7, no. 2–3, pp. 364–397, 1898.

[2] H. Randolph, “Observations and experiments on regeneration in Planarians,” Arch. für Entwickelungsmechanik der Org., pp. 352–372, 1897.

[3] M. Morita, J. B. Best, and J. Noel, “Electron microscopic studies of planarian regeneration,” J. Ultrastruct. Res., vol. 27, no. 1–2, pp. 7–23, 1969.

[4] J. C. Rink, “Stem cell systems and regeneration in planaria,” Development Genes and Evolution, 2013.

[5] D. E. Wagner, I. E. Wang, and P. W. Reddien, “Clonogenic neoblasts are pluripotent adult stem cells that underlie planarian regeneration,” Science, vol. 332, no. 6031, pp. 811–816, 2011.

[6] J. C. van Wolfswinkel, D. E. Wagner, and P. W. Reddien, “Single-cell analysis reveals functionally distinct classes within the planarian stem cell compartment,” Cell Stem Cell, vol. 15, no. 3, pp. 326–339, Sep. 2014.

[7] A. Zeng, H. Li, L. Guo, X. Gao, S. McKinney, Y. Wang, Z. Yu, J. Park, C. Semerad, E. Ross, L. C. Cheng, E. Davies, K. Lei, W. Wang, A. Perera, K. Hall, A. Peak, A. Box, and A. Sánchez Alvarado, “Prospectively Isolated Tetraspanin+ Neoblasts Are Adult Pluripotent Stem Cells Underlying Planaria Regeneration,” Cell, pp. 1593–1608, 2018.

[8] M. Plass, J. Solana, F. Alexander Wolf, S. Ayoub, A. Misios, P. Glažar, B. Obermayer, F. J. Theis, C. Kocks, and N. Rajewsky, “Cell type atlas and lineage tree of a whole complex animal by single-cell transcriptomics,” Science, vol. 360, no. 6391, 2018.

[9] C. T. Fincher, O. Wurtzel, T. de Hoog, K. M. Kravarik, and P. W. Reddien, “Cell type transcriptome atlas for the planarian Schmidtea mediterranea,” Science, vol. 360, no. 6391, 2018.

[10] M. van de Rijn, S. Heimfeld, G. J. Spangrude, and I. L. Weissman, “Mouse hematopoietic stem-cell antigen Sca-1 is a member of the Ly-6 antigen family,” Proc. Natl. Acad. Sci. USA, vol. 86, pp. 4634–4638, Jun. 1989.

[11] J. Solana, D. Kao, Y. Mihaylova, F. Jaber-Hijazi, S. Malla, R. Wilson, and A. Aboobaker, “Defining the molecular profile of planarian pluripotent stem cells using a combinatorial RNA-seq, RNA interference and irradiation approach,” Genome Biol., vol. 13, no. 3, p. R19, 2012.

[12] M. Grudniewska, S. Mouton, D. Simanov, F. Beltman, M. Grelling, K. de Mulder, W. Arindrarto, P. M. Weissert, S. van der Elst, and E. Berezikov, “Transcriptional signatures of somatic neoblasts and germline cells in Macrostomum lignano,” eLife, vol. 5, pp. 1–23, 2016.

[13] J. Wudarski, D. Simanov, K. Ustyantsev, K. de Mulder, M. Grelling, M. Grudniewska, F. Beltman, L. Glazenburg, T. Demircan, J. Wunderer, W. Qi, D. B. Vizoso, P. M. Weissert, D. Olivieri, S. Mouton, V. Guryev, A. Aboobaker, L. Schärer, P. Ladurner, and E. Berezikov, “Efficient transgenesis and annotated genome sequence of the regenerative flatworm model Macrostomum lignano,” Nat. Commun., vol. 8, no. 1, p. 2120, 2017.

[14] S. Mouton, J. Wudarski, M. Grudniewska, and E. Berezikov, “The regenerative flatworm Macrostomum lignano, a model organism with high experimental potential,” Int. J. Dev. Biol., vol. 558, pp. 551–558, 2018.

[15] J. C. Boisset, J. Vivié, D. Grün, M. J. Muraro, A. Lyubimova, and A. van Oudenaarden, “Mapping the physical network of cellular interactions,” Nat. Methods, vol. 15, no. 7, pp. 547–553, 2018.

[16] T. Hashimshony, F. Wagner, N. Sher, and I. Yanai, “CEL-Seq: Single-Cell RNA-Seq by Multiplexed Linear Amplification,” Cell Rep., vol. 2, no. 3, pp. 666–673, 2012.

[17] T. Hashimshony, N. Senderovich, G. Avital, A. Klochendler, Y. De Leeuw, L. Anavy, D. Gennert, S. Li, K. J. Livak, O. Rozenblatt-Rosen, Y. Dor, and A. Regev, “CEL-Seq2: sensitive highly-multiplexed single-cell RNA-Seq,” Genome Biol., pp. 1–7, 2016.

[18] The External RNA Controls Consortium, “The External RNA Controls

[19] M. Grudniewska, S. Mouton, M. Grelling, A. H. G. Wolters, J. Kuipers, B. N. G. Giepmans, and E. Berezikov, “A novel flatworm-specific gene implicated in reproduction in Macrostomum lignano,” Sci. Rep., vol. 8, no. 1, pp. 1–10, 2018.

[20] D. Grün, L. Kester, and A. van Oudenaarden, “Validation of noise models for single-cell transcriptomics,” Nat. Methods, vol. 11, no. 6, pp. 637–640, 2014.

[21] D. Grün, A. Lyubimova, L. Kester, K. Wiebrands, O. Basak, N. Sasaki, H. Clevers, and A. van Oudenaarden, “Single-cell messenger RNA sequencing reveals rare intestinal cell types,” Nature, vol. 525, no. 7568, pp. 251–255, 2015.

[22] M. D. Robinson, D. J. McCarthy, and G. K. Smyth, “edgeR: A Bioconductor package for differential expression analysis of digital gene expression data,” Bioinformatics, vol. 26, no. 1, pp. 139–140, 2010.

[23] O. Wurtzel, L. E. Cote, A. Poirier, R. Satija, A. Regev, and P. W. Reddien, “A Generic and Cell-Type-Specific Wound Response Precedes Regeneration in Planarians,” Dev. Cell, vol. 35, no. 5, pp. 632–645, 2015.

[24] H. M. Shapiro, Practical Flow Cytometry. Wiley, 2005.

[25] A. Dobin, C. A. Davis, F. Schlesinger, J. Drenkow, C. Zaleski, S. Jha, P. Batut, M. Chaisson, and T. R. Gingeras, “STAR: Ultrafast universal RNA-seq aligner,” Bioinformatics, vol. 29, no. 1, pp. 15–21, 2013.

[26] N. M. Davidson and A. Oshlack, “Corset: Enabling differential gene expression analysis for de novo assembled transcriptomes,” Genome Biol., vol. 15, no. 7, pp. 1–14, 2014.

[27] W. Huber, V. J. Carey, R. Gentleman, S. Anders, M. Carlson, B. S. Carvalho, H. Corrada Bravo, S. Davis, L. Gatto, T. Girke, R. Gottardo, F. Hahne, K. D. Hansen, R. A. Irizarry, et al., “Orchestrating high-throughput genomic analysis with Bioconductor,” Nat. Methods, vol. 12, no. 2, pp. 115–121, 2015.
