Finding messenger RNA without a poly(A) tail: An argument for using rRNA depletion over mRNA enrichment for RNA-Seq sample preparation


Abstract

RNA-Seq is a new technology in the field of transcriptomics. It has several benefits over existing hybridization techniques; most importantly, hybridization techniques require prior knowledge of the region of the genome to be tested, whereas RNA-Seq does not. However, because total RNA contains a large amount of ribosomal RNA (rRNA), samples must be prepared in such a way that most of the rRNA is removed; otherwise most of the reads generated by RNA-Seq come from the same rRNA. Two methods exist to achieve this: mRNA enrichment, in which the poly(A) tails of mRNA are used to capture the mRNA and enrich it from the sample, and rRNA depletion, in which specifically designed probes target the rRNA and remove it from the total RNA.

In the following experiment we tested RiboNix, an rRNA removal method. We also investigated whether Arabidopsis thaliana contains mRNAs that lack a poly(A) tail. We found that A. thaliana indeed has several mRNAs without a poly(A) tail. This evidence makes rRNA removal the preferred sample preparation step for RNA-Seq, since it gives a less biased view of the transcriptome.

Remy Jorna 5998352

Green Student Lab


Introduction

Understanding the genome has been a hot topic in biological research over the last decades. Nature defines transcriptomics as a specific study within genomics that investigates the complete set of RNA transcripts produced by the genome under specific circumstances or in a specific cell (Nature Publishing Group, 2015). Knowledge of the transcriptome can lead to the identification of genes that are expressed under specific conditions or in certain cells. Being able to identify these conditions can give more insight into the mechanisms of stem cells and cancer cells. Furthermore, transcriptomics is used to understand the molecular reactions underlying embryonic development and could therefore eventually become a rich resource for embryo selection in in vitro fertilization (Schwanhausser et al., 2011). In addition, knowledge of all stages of gene expression can help to find biomarkers for use in the risk assessment of specific compounds (Szabo, 2014).

Several technologies have been developed to quantify the transcriptome, such as microarrays and sequence-based methods (Wang, Gerstein, & Snyder, 2009). The microarray approach, also known as the hybridization-based method, builds on the technology of Southern blotting, in which fragments of DNA are attached to a substrate and then probed with a known DNA sequence (Maskos & Southern, 1992). The same principle is applied in microarrays: fluorescently labelled messenger RNA (mRNA) is incubated with customized microarrays, and after scanning, the fragments of mRNA present in the original sample can be evaluated (Wang et al., 2009). This method is not very time consuming, it is inexpensive, and the necessary equipment is often already available in the laboratory (Gundogdu & Elmi, 2015). However, like most assays, this approach has limitations. It depends on current knowledge of the genome sequence, because the probes used determine which mRNA sequences can be found (Wang et al., 2009). Furthermore, background levels of cross-hybridization can be high and the probes can become saturated, which limits the dynamic range of detection (Wang et al., 2009).

These limitations of the hybridization-based method do not apply to the sequence-based method. In this approach no prior knowledge of the sequence is needed, because no probe is used for sequencing. However, because the vast majority of total RNA (>90%) consists of ribosomal RNA (rRNA), the vast majority of reads come from the same rRNA if no sample preparation is performed (Wilhelm & Landry, 2009). Two sample preparation methods address this: mRNA enrichment and rRNA depletion. In the first method, mRNA enrichment, the poly(A) tails of the mRNA are targeted with biotinylated poly d(T) probes and the bound mRNA is then pulled out of the sample using magnetism. The benefit of this procedure is that the same probes can be used for a wide range of organisms; the only requirement is that the mRNA has a poly(A) tail. This requirement can also be a problem, as not all mRNAs have a poly(A) tail and some are even found to be bimorphic, that is, present both with and without a poly(A) tail (Yang, Duff, Graveley, Carmichael, & Chen, 2011). This in turn can lead to a loss of information (Cui et al., 2010). In the second method, rRNA depletion, specifically designed probes are used to target the rRNA, which is then removed from the total RNA by magnetism. This approach is more costly because the probes have to be designed specifically for each species tested. It is also less flexible, as basic knowledge of the genome is required to design the probes.
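The effect of rRNA on the composition of a sample can be illustrated with a small calculation. The sketch below is not part of the protocol used in this study. It assumes that reads are sampled in proportion to the amount of each RNA class, takes an rRNA share of 90% (the lower bound mentioned above), and uses hypothetical removal efficiencies; the function name is likewise only illustrative.

```python
def non_rrna_fraction(rrna_fraction: float, removal_efficiency: float) -> float:
    """Share of the remaining RNA that is not rRNA after depletion.

    rrna_fraction:      share of rRNA in total RNA before depletion (0..1)
    removal_efficiency: share of the rRNA that the depletion removes (0..1)
    """
    if not 0.0 <= rrna_fraction <= 1.0:
        raise ValueError("rrna_fraction must lie between 0 and 1")
    if not 0.0 <= removal_efficiency <= 1.0:
        raise ValueError("removal_efficiency must lie between 0 and 1")

    remaining_rrna = rrna_fraction * (1.0 - removal_efficiency)
    other_rna = 1.0 - rrna_fraction
    total = remaining_rrna + other_rna
    if total == 0.0:
        raise ValueError("no RNA left after depletion")
    return other_rna / total


# Hypothetical removal efficiencies, for illustration only.
for efficiency in (0.0, 0.5, 0.9, 0.99):
    share = non_rrna_fraction(0.90, efficiency)
    print(f"removal {efficiency:.0%}: non-rRNA share {share:.1%}")
```

Under these assumptions, even a depletion that removes 90% of the rRNA leaves a sample in which rRNA still makes up close to half of the RNA, which is why incomplete depletion remains visible in the read counts.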

This study examines the issue associated with mRNA enrichment in the sequence-based approach: we use the microarray method to test whether all mRNA is actually polyadenylated. To do so, two identical rRNA-depleted fractions are created. The first fraction is treated with poly(A) polymerase, which adds poly(A) tails to all mRNA present. The second fraction does not receive this enzyme, possibly leaving several mRNAs without a poly(A) tail. During the subsequent in vitro transcription (IVT) reaction, mRNAs lacking a poly(A) tail will not be labelled or amplified. When the microarray results are compared, we expect to find genes that are present in the poly(A) polymerase-treated fraction but absent from the untreated fraction. We also investigate the possibility of designing probes for the rRNA depletion method that are not species-specific.

Materials and Methods

Probe Design: First the species were checked for homology. This was done using the annotated rRNA sequences of the large subunit (25S, 5.8S and 5S), the small subunit (18S) and the chloroplast (23S and 16S) of Arabidopsis thaliana found on the TAIR website (TAIR, 2015), which were BLASTed against the entire genomes of both Cucumis sativus and Solanum lycopersicum. The obtained sequences were then aligned using CLC-Workbench. Several probes were designed on the aligned sequences of the chloroplast, small-subunit and large-subunit rRNAs, aiming for a good spread of their target locations.

RNA Isolation: Total RNA from samples of Solanum lycopersicum, Cucumis sativus and Arabidopsis thaliana was isolated using the RNeasy Kit from Qiagen and stored at -40 °C until use in the rRNA depletion.

rRNA depletion: RiboNix was performed on 1 µg of total RNA for each of the three chosen species. RiboNix consists of several steps. In the first step, the hybridization step, the biotinylated probes are hybridized to their target rRNA: the mix of probes and total RNA is incubated at 70 °C for 5 minutes and then cooled at 0.02 °C/s to 37 °C. In the capture step that follows, the biotin of the probes binds to the streptavidin coating of the magnetic beads. Finally, in the depletion step, the rRNA-probe complexes are separated from the total RNA using magnetism.
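To illustrate the timing of the hybridization step, the following sketch computes how long the cooling ramp takes with the settings given above. The function name is hypothetical and the calculation assumes a perfectly linear ramp; an actual thermocycler may deviate from this.

```python
def ramp_duration_s(start_c: float, end_c: float, rate_c_per_s: float) -> float:
    """Time in seconds needed to go from start_c to end_c at a constant rate."""
    if rate_c_per_s <= 0:
        raise ValueError("rate_c_per_s must be positive")
    return abs(start_c - end_c) / rate_c_per_s


# Settings taken from the hybridization step described in the text.
hold_s = 5 * 60                              # 5 minutes at 70 °C
ramp_s = ramp_duration_s(70.0, 37.0, 0.02)   # cooling from 70 °C to 37 °C
total_min = (hold_s + ramp_s) / 60

print(f"ramp:  {ramp_s / 60:.1f} min")
print(f"total: {total_min:.1f} min")
```

Under these assumptions the cooling ramp alone takes about 27.5 minutes, so the hybridization step lasts roughly 32.5 minutes including the 5 minute incubation.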

Polyadenylation: The rRNA-depleted samples of Arabidopsis thaliana were then split in two. Half of the samples were treated with poly(A) polymerase, ensuring the presence of a poly(A) tail, and half were not. This step results in two samples, a poly(A)+ sample and a poly(A)- sample.

IVT: An in vitro transcription (IVT) was performed on both the poly(A)+ and the poly(A)- sample. This was done to create cDNA and to fluorescently label the samples so that they could be detected and quantified on the microarray. Only poly d(T) primers were used. This ensured that mRNAs lacking a poly(A) tail would not be amplified or labelled and therefore should not be found on the microarray.

Microarray: Both IVT products were analyzed by hybridizing them to two separate GeneChip® Arabidopsis ATH1 Genome Arrays from Affymetrix.

Results and Discussion

Probe Design: The homology of the rRNA in A. thaliana, C. sativus and S. lycopersicum is high, as can be seen from the aligned sequences of the 18S rRNA (17S for S. lycopersicum) in Figure 1.

Figure 1 Aligned 18S sequences of A. thaliana, C. sativus and S. lycopersicum.

We were unable to design probes for the smaller 5.8S and 5S rRNAs. This was due to their small size, 156 nt and 121 nt respectively (TAIR, 2015); with such a limited size, the chance of finding a suitable probe of an appropriate length is quite small, so this came as no surprise. For the remaining 4 rRNAs, 9 probes were designed (Table 1), ensuring a decent spread in the positions of the probes on the target rRNA.

Table 1 Overview of the probes designed with CLC-Workbench on the aligned sequences of A. thaliana, C. sativus and S. lycopersicum.

Set  Probe  Sequence (5' to 3')          Target rRNA  Probe length (nt)  Target length (nt)  Position of probe
1    1      CTGTCCCTGTTAATCATTACTCC      18S (17S)    23                 1902                938-960
1    2      CAAATCGCTCCACCAACTAAGAA      18S (17S)    23                 1902                1370-1392
1    3      GTTTCTTTTCCTCCGCTTATTG       25S          22                 4310                892-913
1    4      CTTCCCTTGCCTACATTGTTCC       25S          22                 4310                2740-2761
1    5      CCACTCTGCCACTTACAATACC       25S          22                 4310                4186-4207
2    6      CTTTTGCTTTCTTTTCCTCTGGCTACT  23S          27                 2850                189-215
2    7      TTTCACCCCTAACCACAACTCATCC    23S          25                 2850                771-795
2    8      TTTCCAGCTGTTGTTCCCCTCCC      16S          23                 1514                133-155
2    9      GTGCTTTCGCCGTTGGTGTTCTT      16S          23                 1514                699-722

RiboNix, rRNA depletion: The results of the rRNA depletion are shown in Figures 2-4. Figure 2 shows the TapeStation results for all three species. For this particular run only the first set of probes was used (see Table 1). As expected, we only saw depletion of the 25S and 18S rRNA, which were the target sequences of the probes used.
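The probe lengths and positions listed in Table 1 can be cross-checked programmatically. The sketch below is an illustration rather than part of the original workflow: the data are transcribed from Table 1, the class and function names are hypothetical, and positions are assumed to be 1-based and inclusive. It compares the length of each probe sequence with the span implied by its position and reports where along the target the probe sits.

```python
from typing import NamedTuple


class Probe(NamedTuple):
    number: int
    sequence: str       # written 5' to 3', as in Table 1
    target: str
    target_length: int  # nt
    start: int          # first position on the target (assumed 1-based, inclusive)
    end: int            # last position on the target (assumed inclusive)


# Values transcribed from Table 1.
PROBES = [
    Probe(1, "CTGTCCCTGTTAATCATTACTCC", "18S", 1902, 938, 960),
    Probe(2, "CAAATCGCTCCACCAACTAAGAA", "18S", 1902, 1370, 1392),
    Probe(3, "GTTTCTTTTCCTCCGCTTATTG", "25S", 4310, 892, 913),
    Probe(4, "CTTCCCTTGCCTACATTGTTCC", "25S", 4310, 2740, 2761),
    Probe(5, "CCACTCTGCCACTTACAATACC", "25S", 4310, 4186, 4207),
    Probe(6, "CTTTTGCTTTCTTTTCCTCTGGCTACT", "23S", 2850, 189, 215),
    Probe(7, "TTTCACCCCTAACCACAACTCATCC", "23S", 2850, 771, 795),
    Probe(8, "TTTCCAGCTGTTGTTCCCCTCCC", "16S", 1514, 133, 155),
    Probe(9, "GTGCTTTCGCCGTTGGTGTTCTT", "16S", 1514, 699, 722),
]


def span_length(probe: Probe) -> int:
    """Number of target positions covered by the listed start and end."""
    return probe.end - probe.start + 1


def relative_midpoint(probe: Probe) -> float:
    """Midpoint of the probe as a fraction of the target length (0 = 5' end)."""
    return (probe.start + probe.end) / 2 / probe.target_length


def inconsistent_probes(probes: list[Probe]) -> list[int]:
    """Probe numbers whose sequence length differs from the listed span."""
    return [p.number for p in probes if span_length(p) != len(p.sequence)]


for p in PROBES:
    print(f"probe {p.number} ({p.target}): {len(p.sequence)} nt, "
          f"midpoint at {relative_midpoint(p):.0%} of the target")
print("length/span mismatch:", inconsistent_probes(PROBES))
```

As transcribed here, the check flags only probe 9, whose listed position (699-722) spans 24 nt while the sequence is 23 nt long. It cannot be determined from the table which of the two coordinates is off by one.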

Figure 2 TapeStation results from the rRNA depletion method where only the first probe set was used (see Table 1).

Figure 3 shows the results of the rRNA depletion on A. thaliana using both sets of probes. We expected to see depletion of all the major rRNAs; however, we only saw a slight depletion of the 23S and 16S rRNA. The results are also shown in an electropherogram (Figure 4). Note that the concentrations of the sampled RNA fractions (total RNA, rRNA and mRNA) are not the same, so only the relative peak heights can be used to judge the success of the depletion. The electropherogram shows that a considerable amount of the 25S and 18S rRNA is depleted. However, only very little 23S and 16S rRNA is found in the captured rRNA fraction, whereas a relatively high amount of these rRNAs is still found in the 'rRNA depleted' fraction. This indicates a low removal rate for these rRNAs. Even after several further removal attempts we were unable to deplete them.

Figure 3 Gel / TapeStation results from the rRNA removal procedure performed on A. thaliana, using both sets of probes.


Figure 4 Electropherogram of the rRNA depletion on A. thaliana showing total RNA (blue), rRNA depleted RNA (red) and the captured rRNA fraction (green).

IVT and polyadenylation: The rRNA-depleted samples of Arabidopsis thaliana were used for the final part of the experiment. Figure 5 shows the electropherogram of the IVT products of both the polyadenylated fraction (poly(A)+) and the non-polyadenylated fraction (poly(A)-).

Figure 5 Electropherogram of the IVT products for both the poly(A)+ fraction (blue) and the poly(A)- fraction (red).

Microarray: Figure 6 shows the microarray results. Every dot represents a gene on the array; its value on the X-axis is the intensity found for the poly(A)- fraction and its value on the Y-axis the intensity found for the poly(A)+ fraction. Plotting the results in this manner gives a clear view of genes whose mRNA lacks a poly(A) tail: these genes have a high signal on the Y-axis and a low value on the X-axis. Several genes were used as controls. Both mitochondrial and chloroplast mRNA have been found to lack poly(A) tails unless targeted for degradation (Schuster, Lisitsky, & Klaff, 1999; Slomovic, Laufer, Geiger, & Schuster, 2005), which makes these genes suitable positive controls for this experiment (red dots). Genes known to have a poly(A) tail are plentiful (Yang et al., 2011), and a selection of these was used as negative controls (green dots).

Figure 6 The measured intensities of both samples plotted against each other. Several genes show a significantly higher intensity in the polyadenylated sample, indicating that their mRNA lacks a poly(A) tail.
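The way Figure 6 is read can be expressed as a simple filter. The sketch below is an illustration under stated assumptions, not the analysis used in this study: it assumes that the intensities are on a log2 scale, uses a hypothetical cut-off of 2 units (a four-fold difference), and runs on made-up gene identifiers and values. The criterion by which genes were called significantly different in this study is not reproduced here.

```python
def flag_poly_a_minus_candidates(
    intensities: dict[str, tuple[float, float]],
    min_difference: float = 2.0,
) -> list[str]:
    """Return genes whose poly(A)+ intensity exceeds the poly(A)- intensity.

    intensities:    gene id -> (poly(A)- intensity, poly(A)+ intensity),
                    both assumed to be log2 values
    min_difference: hypothetical cut-off in log2 units
    """
    flagged = []
    for gene, (poly_a_minus, poly_a_plus) in intensities.items():
        # A gene far above the diagonal A+ = A- is labelled only after
        # a poly(A) tail has been added enzymatically.
        if poly_a_plus - poly_a_minus >= min_difference:
            flagged.append(gene)
    return sorted(flagged)


# Made-up example values, for illustration only.
example = {
    "gene_A": (10.1, 10.3),  # on the diagonal: polyadenylated in vivo
    "gene_B": (3.2, 9.8),    # far above the diagonal: candidate
    "gene_C": (6.0, 6.4),
    "gene_D": (2.5, 7.1),    # candidate
}

print(flag_poly_a_minus_candidates(example))
```

A fixed cut-off is the simplest possible choice; with replicate arrays, a statistical test per gene would be a more defensible way to decide which genes lie significantly above the diagonal.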

Table 2 gives an overview of the mRNAs lacking a poly(A) tail. Most of these were expected: as mentioned before, genes located on the chloroplast genome are expected to lack a poly(A) tail (Schuster et al., 1999). The same can be said for the genes located on chromosome 2, which were transferred there from the mitochondrial DNA (mtDNA) (Huang, Ayliffe, & Timmis, 2003).

[Figure 6 plot: "Poly A+ and Poly A- expression comparison", with controls and outliers highlighted. X-axis: intensity, non-polyadenylated treatment. Y-axis: intensity, polyadenylated treatment. Legend: non-polyadenylated controls, polyadenylated controls, non-polyadenylated mRNAs, linear A+ = A-.]


Table 2 Overview of the genes with a significantly higher value for the poly(A)+ treatment than for the poly(A)- treatment.

Conclusion

We have shown that the same probes can be used for several species, as long as the rRNA of these species is homologous (Figure 2). We have also shown that using rRNA removal as the sample preparation step for RNA-Seq, to ensure a low noise level, gives a more accurate profile of the transcriptome, because mRNA with a short poly(A) tail or without one will not be present in the sample if an mRNA enrichment procedure is chosen. However, we also encountered several problems with the rRNA depletion method. Firstly, we were unable to design probes for the 5S and 5.8S rRNAs of A. thaliana, C. sativus and S. lycopersicum, which was to be expected given their relatively small size. Secondly, when both sets of probes were used, the success of the rRNA removal diminished; even after several attempts we were unable to remove the rRNA. In the TapeStation results we still saw clear rRNA bands in the 'rRNA-depleted' samples, as well as some mRNA signal in the 'Captured rRNA' samples, which shows that some unintended non-specific capture also occurs. Although we were only slightly successful in removing the rRNA from total RNA, we still consider rRNA removal, as a technique, superior to mRNA enrichment: sample preparation by mRNA enrichment introduces a bias in the data obtained, as shown here.


References

Cui, P., Lin, Q., Ding, F., Xin, C., Gong, W., Zhang, L., . . . Yang, J. (2010). A comparison between ribo-minus RNA-sequencing and polyA-selected RNA-sequencing. Genomics, 96(5), 259-265.

Gundogdu, O., & Elmi, A. (2015). Microarray overview. Retrieved from http://grf.lshtm.ac.uk/microarrayoverview.htm

Huang, C. Y., Ayliffe, M. A., & Timmis, J. N. (2003). Direct measurement of the transfer rate of chloroplast DNA into the nucleus. Nature, 422(6927), 72-76.

Maskos, U., & Southern, E. M. (1992). Oligonucleotide hybridizations on glass supports: A novel linker for oligonucleotide synthesis and hybridization properties of oligonucleotides synthesised in situ. Nucleic Acids Research, 20(7), 1679-1684.

Nature Publishing Group. (2015). Transcriptomics. Retrieved from http://www.nature.com/subjects/transcriptomics#close

Schuster, G., Lisitsky, I., & Klaff, P. (1999). Polyadenylation and degradation of mRNA in the chloroplast. Plant Physiology, 120(4), 937-944.

Schwanhausser, B., Busse, D., Li, N., Dittmar, G., Schuchhardt, J., Wolf, J., . . . Selbach, M. (2011). Global quantification of mammalian gene expression control. Nature, 473(7347), 337-342.

Slomovic, S., Laufer, D., Geiger, D., & Schuster, G. (2005). Polyadenylation and degradation of human mitochondrial RNA: The prokaryotic past leaves its mark. Molecular and Cellular Biology, 25(15), 6427-6435.

Szabo, D. T. (2014). Chapter 62 - transcriptomic biomarkers in safety and risk assessment of chemicals. In R. C. Gupta (Ed.), Biomarkers in toxicology (pp. 1033-1038). Boston: Academic Press. doi:10.1016/B978-0-12-404630-6.00062-2

TAIR. (2015). The arabidopsis information resource. Retrieved from http://www.arabidopsis.org/index.jsp

Wang, Z., Gerstein, M., & Snyder, M. (2009). RNA-seq: A revolutionary tool for transcriptomics. Nature Reviews. Genetics, 10(1), 57-63.

Wilhelm, B. T., & Landry, J. (2009). RNA-seq - quantitative measurement of expression through massively parallel RNA-sequencing. Methods, 48(3), 249-257.

Yang, L., Duff, M. O., Graveley, B. R., Carmichael, G. G., & Chen, L. (2011). Genomewide characterization of non-polyadenylated RNAs. Genome Biol, 12(2), R16.
