
Finding messenger RNA without a poly(A) tail
An argument for using rRNA depletion over mRNA enrichment for RNA-Seq sample preparation


Academic year: 2021



Abstract  

RNA-Seq is a new technology in the field of transcriptomics. It has several benefits over existing hybridization techniques: whereas hybridization techniques require prior knowledge of the region of the genome to be tested, RNA-Seq does not. However, because total RNA contains a large amount of ribosomal RNA (rRNA), the samples must be prepared so that most of the rRNA is removed; otherwise most of the reads generated by RNA-Seq come from rRNA. Two methods exist to achieve this: mRNA enrichment, in which the poly(A) tails of mRNA are used to isolate and enrich the mRNA from the sample, and rRNA depletion, in which specifically designed probes target and remove rRNA from the total RNA.

In the following experiment we tested RiboNix, an rRNA removal method. We also investigated whether Arabidopsis thaliana contains mRNAs that lack a poly(A) tail. We found that A. thaliana indeed has several mRNAs without a poly(A) tail. This evidence makes rRNA removal the preferred sample preparation step for RNA-Seq, since it gives a less biased view of the transcriptome.

Remy Jorna 5998352

Green Student Lab


Introduction  

Understanding the genome has been a central topic in biological research over the last decades. Nature defines transcriptomics as a specific field within genomics that investigates the complete set of RNA transcripts produced by the genome under specific circumstances or in a specific cell (Nature Publishing Group, 2015). Knowledge of the transcriptome can lead to the identification of genes that are expressed under specific conditions or in certain cells. Identifying these conditions can give more insight into the mechanisms of stem cells and cancer cells. Furthermore, transcriptomics is used to understand the molecular processes underlying embryonic development and could therefore eventually become a rich resource for embryo selection in in vitro fertilization (Schwanhausser et al., 2011). In addition, knowledge of all stages of gene expression can help to find biomarkers for use in the risk assessment of specific compounds (Szabo, 2014).

Several technologies have been developed to quantify the transcriptome, such as microarrays and sequence-based methods (Wang, Gerstein, & Snyder, 2009). The microarray approach, also known as the hybridization-based method, builds on the principle of Southern blotting, in which fragments of DNA are attached to a substrate and then probed with a known DNA sequence (Maskos & Southern, 1992). Microarrays apply the same principle: fluorescently labelled messenger RNA (mRNA) is incubated with a customized microarray, and after scanning, the mRNA fragments present in the original sample can be evaluated (Wang et al., 2009). This method is relatively quick and inexpensive, and the equipment needed for the assay is often already available in the laboratory (Gundogdu & Elmi, 2015). However, like most assays, this approach has limitations. It depends on current knowledge of the genome sequence, because the probes used determine which mRNA sequences can be detected (Wang et al., 2009). Furthermore, background levels of cross-hybridization can be high and probes can become saturated, which limits the dynamic range of detection (Wang et al., 2009).
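To illustrate why probe saturation limits the dynamic range, the sketch below uses a deliberately simplified, hypothetical model in which the measured signal rises linearly with transcript abundance until it reaches a fixed ceiling. The ceiling value, the function names and the abundances are arbitrary assumptions for illustration, not properties of any real array.

```python
def measured_signal(abundance: float, saturation: float = 1000.0) -> float:
    """Toy probe response: linear up to a hypothetical saturation ceiling."""
    return min(abundance, saturation)


def observed_fold_change(a: float, b: float, saturation: float = 1000.0) -> float:
    """Fold change between two transcripts as seen through saturating probes."""
    return measured_signal(a, saturation) / measured_signal(b, saturation)


# Two pairs of transcripts whose true abundances both differ 10-fold
# (arbitrary units, invented for illustration).
low_pair = (50.0, 5.0)         # both below the ceiling
high_pair = (20000.0, 2000.0)  # both above the ceiling

print(observed_fold_change(*low_pair))   # true 10-fold difference preserved
print(observed_fold_change(*high_pair))  # difference collapses to 1.0
```

Real probe responses are not a hard cut-off, but the same effect appears with any response curve that flattens at high abundance.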

These limitations of the hybridization-based method do not apply to the sequence-based method. In this approach no prior knowledge of the sequence is needed, because no probe is used for sequencing. However, because the vast majority of RNA (>90%) consists of ribosomal RNA (rRNA), most reads come from rRNA if no sample preparation is performed (Wilhelm & Landry, 2009). Two such preparation methods are mRNA enrichment and rRNA depletion. In the first method, mRNA enrichment, the poly(A) tails of the mRNA are targeted with biotinylated poly d(T) probes and the captured mRNA is then isolated using magnetism. The benefit of this procedure is that the same probes can be used for a wide range of organisms; the only requirement is that the mRNA has a poly(A) tail. This can also be a problem, as not all mRNAs have a poly(A) tail and some are even found to be bimorphic, that is, present in both polyadenylated and non-polyadenylated forms (Yang, Duff, Graveley, Carmichael, & Chen, 2011). This in turn can lead to a loss of information (Cui et al., 2010). In the second method, rRNA depletion, specifically designed probes target the rRNA, which is then removed from the total RNA by magnetism. This approach is more costly because the probes have to be specifically designed for each species tested. This also makes it less flexible, as basic knowledge of the genome is required to design the probes.
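The effect of rRNA abundance on sequencing yield can be made concrete with simple arithmetic. The sketch below uses the >90% rRNA figure from the text; the sequencing depth of 20 million reads and the 5% residual rRNA after depletion are hypothetical values chosen only for illustration, and `informative_reads` is an illustrative helper, not part of any tool.

```python
def informative_reads(total_reads: int, rrna_fraction: float) -> int:
    """Number of reads left for non-rRNA transcripts."""
    if not 0.0 <= rrna_fraction <= 1.0:
        raise ValueError("rrna_fraction must be between 0 and 1")
    return round(total_reads * (1.0 - rrna_fraction))


TOTAL = 20_000_000  # hypothetical sequencing depth

untreated = informative_reads(TOTAL, 0.90)  # no sample preparation
depleted = informative_reads(TOTAL, 0.05)   # assumed residual rRNA after depletion

print(untreated)  # 2000000
print(depleted)   # 19000000
```

Under these assumptions, removing rRNA multiplies the number of usable reads almost tenfold at the same sequencing cost.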

This study examines the issue with the mRNA enrichment method described above: using microarrays, we test whether all mRNA is actually polyadenylated. To do this, two identical rRNA-depleted fractions are created. The first fraction is treated with poly(A) polymerase, which adds poly(A) tails to all mRNA present. The second fraction does not receive this enzyme, possibly leaving several mRNAs without a poly(A) tail. In the subsequent in vitro transcription (IVT) reaction, these mRNAs lacking a poly(A) tail will not be labelled or amplified. When the microarray results are compared, we expect to find genes that are present in the fraction treated with poly(A) polymerase but absent from the fraction not treated with this enzyme. We also investigate whether probes that are not species-specific can be designed for the rRNA depletion method.

 

Materials and Methods

 

Probe design: First, the rRNA homology between the species was checked. The annotated rRNA sequences of the large subunit (25S, 5.8S and 5S), the small subunit (18S) and the chloroplast (23S and 16S) of Arabidopsis thaliana were obtained from the TAIR website (TAIR, 2015) and BLASTed against the entire genomes of both Cucumis sativus and Solanum lycopersicum. The sequences obtained were then aligned using CLC Workbench. Several probes were designed on the aligned small- and large-subunit sequences, both cytosolic and chloroplast, aiming for a good spread of target locations.
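The alignment-based probe search described above can be sketched in Python. The sketch assumes a gapped multiple alignment given as equal-length strings, looks for windows that are identical and gap-free in every species, and returns the reverse complement of each window, since a capture probe must be complementary to the rRNA it binds. It is an illustrative approach, not the CLC Workbench procedure used in this study; the function names, the toy alignment and the window length are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Antisense (reverse-complement) DNA sequence of a gap-free window."""
    return seq.translate(COMPLEMENT)[::-1]


def conserved_windows(alignment: list[str], length: int) -> list[tuple[int, str]]:
    """Return (start, sequence) for every window of `length` alignment columns
    that is identical in all aligned sequences and contains no gap ('-').
    Start positions are 0-based alignment columns."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must have equal length")
    width = len(alignment[0])
    hits = []
    for start in range(width - length + 1):
        windows = {seq[start:start + length] for seq in alignment}
        if len(windows) == 1:          # identical in every species
            window = windows.pop()
            if "-" not in window:      # skip windows spanning a gap
                hits.append((start, window))
    return hits


# Hypothetical toy alignment; real rRNA alignments are thousands of columns
# long and real probes are 22-27 nt (Table 1).
toy_alignment = [
    "ACGTACGTAA",  # species 1
    "ACGTACGTAC",  # species 2
    "ACGTACG-AA",  # species 3, with a gap
]

for start, window in conserved_windows(toy_alignment, 4):
    print(start, window, "probe:", reverse_complement(window))
```

Choosing probes from well-separated hits would then give the spread of target locations aimed for here.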

RNA isolation: Total RNA from samples of Solanum lycopersicum, Cucumis sativus and Arabidopsis thaliana was isolated using the RNeasy Kit (Qiagen) and stored at -40 °C until use in the rRNA depletion.

rRNA depletion: RiboNix was performed on 1 µg of total RNA for each of the three species.

RiboNix consists of several steps. In the first, the hybridization step, the biotinylated probes are hybridized to their target rRNA: the probe-total RNA mix is incubated at 70 °C for 5 minutes and then cooled to 37 °C at 0.02 °C/s. In the subsequent capture step, the biotin on the probes binds to the streptavidin coating of the magnetic beads. Finally, in the depletion step, the rRNA-probe complexes are separated from the total RNA using magnetism.
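The duration of the cooling ramp in the hybridization step follows directly from the temperatures and the rate given above, assuming the rate is held constant over the whole range:

```latex
t_{\text{ramp}} = \frac{T_{\text{start}} - T_{\text{end}}}{r}
               = \frac{70\,^{\circ}\mathrm{C} - 37\,^{\circ}\mathrm{C}}{0.02\,^{\circ}\mathrm{C\,s^{-1}}}
               = 1650\ \mathrm{s} = 27.5\ \mathrm{min}
```

Together with the 5-minute incubation at 70 °C, the hybridization step therefore takes roughly 32.5 minutes.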

Polyadenylation: The rRNA-depleted samples of Arabidopsis thaliana were then split: half of the samples were treated with poly(A) polymerase, ensuring the presence of a poly(A) tail, and half were not treated with poly(A) polymerase. This step results in two samples, a poly(A)+ sample and a poly(A)- sample.

IVT: An in vitro transcription (IVT) was performed on both the poly(A)+ and the poly(A)- sample. This was done to create cDNA and to fluorescently label the samples so that they could be detected and quantified on the microarray. Only poly d(T) primers were used. This ensured that mRNAs lacking a poly(A) tail would not be amplified and labelled, and therefore should not be detected on the microarray.
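The logic of the polyadenylation and poly d(T)-primed IVT steps can be summarised in a small simulation. It assumes, for illustration only, that a transcript is labelled if and only if it carries a 3' poly(A) stretch of at least a minimum length; the 10 nt threshold, the 50 nt added tail, the `Transcript` class and the toy transcripts are all hypothetical.

```python
from dataclasses import dataclass

MIN_TAIL = 10  # hypothetical minimum poly(A) length for poly d(T) priming


@dataclass
class Transcript:
    name: str
    sequence: str  # 5'->3', DNA alphabet for simplicity


def tail_length(seq: str) -> int:
    """Length of the run of A's at the 3' end."""
    return len(seq) - len(seq.rstrip("A"))


def polyadenylate(t: Transcript, tail: int = 50) -> Transcript:
    """Mimic poly(A) polymerase: append a poly(A) tail to the 3' end."""
    return Transcript(t.name, t.sequence + "A" * tail)


def labelled(transcripts: list[Transcript]) -> set[str]:
    """Names of transcripts that a poly d(T) primer could prime."""
    return {t.name for t in transcripts if tail_length(t.sequence) >= MIN_TAIL}


sample = [
    Transcript("nuclear_gene", "ATGGCTTCGTAG" + "A" * 30),  # already polyadenylated
    Transcript("chloroplast_gene", "ATGACCGGTTAG"),        # no poly(A) tail
]

poly_a_plus = labelled([polyadenylate(t) for t in sample])
poly_a_minus = labelled(sample)

# Genes detected only after poly(A) polymerase treatment lack a native tail.
print(sorted(poly_a_plus - poly_a_minus))
```

The set difference at the end corresponds to the genes expected above the diagonal in the microarray comparison.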

Microarray: Both IVT products were analyzed by hybridizing them to two separate GeneChip® Arabidopsis ATH1 Genome Arrays from Affymetrix.

Results and Discussion

Probe design: The homology of rRNA in A. thaliana, C. sativus and S. lycopersicum is high, as can be seen from the aligned 18S rRNA sequences (17S for S. lycopersicum) in Figure 1.

 

Figure 1 Aligned 18S sequences of A. thaliana, C. sativus and S. lycopersicum.

We were unable to design probes for the smaller 5.8S and 5S rRNAs. This was due to their small size, 156 nt and 121 nt respectively (TAIR, 2015); with such a limited length, the chance of finding a suitable probe of appropriate length is small, so it came as no surprise that we could not find a good probe for these rRNAs. For the remaining four rRNAs, nine probes were designed (Table 1), ensuring a decent spread in probe position along each target rRNA.
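The probes in Table 1 can be checked with a few lines of code. The sketch below recomputes each probe's length and GC content and estimates a rough melting temperature with the basic formula Tm = 64.9 + 41 × (nG + nC − 16.4) / N. This salt-independent approximation is chosen here for illustration only; it is not necessarily how CLC Workbench evaluates probes, and actual hybridization also depends on buffer conditions.

```python
# Probe sequences (5'->3') and lengths as listed in Table 1.
PROBES = {
    1: ("CTGTCCCTGTTAATCATTACTCC", 23),
    2: ("CAAATCGCTCCACCAACTAAGAA", 23),
    3: ("GTTTCTTTTCCTCCGCTTATTG", 22),
    4: ("CTTCCCTTGCCTACATTGTTCC", 22),
    5: ("CCACTCTGCCACTTACAATACC", 22),
    6: ("CTTTTGCTTTCTTTTCCTCTGGCTACT", 27),
    7: ("TTTCACCCCTAACCACAACTCATCC", 25),
    8: ("TTTCCAGCTGTTGTTCCCCTCCC", 23),
    9: ("GTGCTTTCGCCGTTGGTGTTCTT", 23),
}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def basic_tm(seq: str) -> float:
    """Rough Tm in degrees C: 64.9 + 41 * (G + C - 16.4) / N (salt-independent)."""
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41 * (gc - 16.4) / len(seq)


for probe, (seq, listed_length) in PROBES.items():
    # Flag any mismatch between the sequence and the length given in Table 1.
    status = "ok" if len(seq) == listed_length else "length mismatch"
    print(f"probe {probe}: {len(seq)} nt, GC {gc_content(seq):.0%}, "
          f"Tm ~{basic_tm(seq):.1f} C, {status}")
```

Probes with similar estimated Tm values are more likely to hybridize efficiently under a single temperature program such as the 70 °C to 37 °C ramp used here.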

Table 1 Overview of the probes designed with CLC Workbench on the aligned sequences of A. thaliana, C. sativus and S. lycopersicum.

 

 

Set  Probe  Sequence                           Target rRNA  Probe length (nt)  Target length (nt)  Position of probe
1    1      5' CTGTCCCTGTTAATCATTACTCC 3'      18S (17S)    23                 1902                938-960
1    2      5' CAAATCGCTCCACCAACTAAGAA 3'      18S (17S)    23                 1902                1370-1392
1    3      5' GTTTCTTTTCCTCCGCTTATTG 3'       25S          22                 4310                892-913
1    4      5' CTTCCCTTGCCTACATTGTTCC 3'       25S          22                 4310                2740-2761
1    5      5' CCACTCTGCCACTTACAATACC 3'       25S          22                 4310                4186-4207
2    6      5' CTTTTGCTTTCTTTTCCTCTGGCTACT 3'  23S          27                 2850                189-215
2    7      5' TTTCACCCCTAACCACAACTCATCC 3'    23S          25                 2850                771-795
2    8      5' TTTCCAGCTGTTGTTCCCCTCCC 3'      16S          23                 1514                133-155
2    9      5' GTGCTTTCGCCGTTGGTGTTCTT 3'      16S          23                 1514                699-722

RiboNix, rRNA depletion: The results of the rRNA depletion are shown in Figures 2-4. Figure 2 shows the TapeStation results for all three species; for this particular run only the first set of probes was used (see Table 1). As expected, we only saw depletion of the 25S and 18S rRNA, which were the targets of the probes used.

 

 

Figure 2 TapeStation results of the rRNA depletion in which only the first probe set was used (see Table 1).

Figure 3 shows the results of the rRNA depletion using both probe sets on A. thaliana. We expected to see depletion of all the major rRNAs. However, we only saw a slight depletion of the 23S and 16S rRNA. The results are also shown as an electropherogram (Figure 4). Note that the concentrations of the sampled RNA fractions (total RNA, rRNA and mRNA) are not the same, so only the relative peak heights can be used to assess the success of depletion. The electropherogram shows that a considerable amount of 25S and 18S rRNA is depleted. However, only very little 23S and 16S rRNA is found in the captured rRNA fraction, whereas a relatively high amount of these rRNAs remains in the 'rRNA-depleted' fraction. This indicates a low removal rate for these rRNAs. Even after several further removal attempts we were unable to deplete them.
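Because the fractions were loaded at different concentrations, peak heights can only be compared after normalizing within each trace. A minimal sketch of that idea, assuming each trace has been reduced to one signal value per rRNA peak, expresses every peak as a share of the summed rRNA signal in its own trace. The function name and the numbers below are invented placeholders for illustration, not the measurements behind Figure 4.

```python
def relative_peaks(trace: dict[str, float]) -> dict[str, float]:
    """Express each peak as a fraction of the summed peak signal in one trace."""
    total = sum(trace.values())
    if total <= 0:
        raise ValueError("trace must contain positive signal")
    return {peak: value / total for peak, value in trace.items()}


# Invented example values (arbitrary units), NOT data from this study.
total_rna = {"25S": 40.0, "18S": 30.0, "23S": 20.0, "16S": 10.0}
depleted = {"25S": 2.0, "18S": 2.0, "23S": 4.0, "16S": 2.0}

before = relative_peaks(total_rna)
after = relative_peaks(depleted)
for peak in before:
    # A share that rises after depletion means that rRNA was removed
    # less efficiently than the others.
    print(peak, round(before[peak], 2), round(after[peak], 2))
```

In this invented example the 23S share doubles after depletion, which is the kind of pattern that points to poor removal of a specific rRNA.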

 

Figure 3 Gel/TapeStation results of the rRNA removal procedure performed on A. thaliana using both probe sets.


 

Figure 4 Electropherogram of the rRNA depletion on A. thaliana showing total RNA (blue), rRNA-depleted RNA (red) and the captured rRNA fraction (green).

IVT and polyadenylation: The rRNA-depleted samples of Arabidopsis thaliana were used for the final part of the experiment. Figure 5 shows the electropherogram of the IVT products of both the polyadenylated fraction (poly(A)+) and the non-polyadenylated fraction (poly(A)-).

 

Figure 5 Electropherogram of the IVT products of the poly(A)+ fraction (blue) and the poly(A)- fraction (red).

Microarray: Figure 6 shows the microarray results. Every dot represents a gene on the array, with the intensity for the poly(A)- fraction on the x-axis and the intensity for the poly(A)+ fraction on the y-axis. This presentation gives a clear view of the genes that lack a poly(A) tail: these genes have a high value on the y-axis but a low value on the x-axis. Several genes were used as controls. Mitochondrial and chloroplast mRNAs do not have poly(A) tails unless they are targeted for degradation (Schuster, Lisitsky, & Klaff, 1999; Slomovic, Laufer, Geiger, & Schuster, 2005), which makes these genes suitable positive controls for this experiment (red dots). The negative controls (green dots), genes that are known to have a poly(A) tail, are plentiful (Yang et al., 2011), and a selection of these was used.
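The reading of the scatter plot described above can be expressed as a simple rule: flag genes whose intensity in the poly(A)+ sample exceeds that in the poly(A)- sample by more than a chosen margin. The sketch assumes the plotted intensities are on a log2 scale, as is common for summarized Affymetrix data; the 2.0 margin, the function name and the gene identifiers are hypothetical, and this rule is not necessarily the criterion used to compile Table 2.

```python
def flag_missing_tail(intensities: dict[str, tuple[float, float]],
                      margin: float = 2.0) -> list[str]:
    """Return genes whose poly(A)+ intensity exceeds the poly(A)- intensity
    by more than `margin` log2 units (i.e. more than 2**margin-fold)."""
    return sorted(
        gene
        for gene, (plus, minus) in intensities.items()
        if plus - minus > margin
    )


# Hypothetical log2 intensities: gene -> (poly(A)+, poly(A)-).
example = {
    "GENE_A": (12.1, 11.8),  # similar in both: consistent with a native tail
    "GENE_B": (11.5, 6.2),   # strong only after polyadenylation
    "GENE_C": (9.0, 8.4),
}

print(flag_missing_tail(example))
```

Genes flagged this way lie well above the A+ = A- line, where the chloroplast positive controls are expected.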

 

Figure 6 The measured intensities of both samples plotted against each other, showing several genes with a significantly higher intensity in the polyadenylated sample, which indicates a missing poly(A) tail.

Table 2 gives an overview of the mRNAs lacking a poly(A) tail. Most of these were expected: as mentioned before, genes located in the chloroplast are expected to lack a poly(A) tail (Schuster et al., 1999). The same can be said for the genes located on chromosome 2, which were transferred there from the mitochondrial DNA (mtDNA) (Huang, Ayliffe, & Timmis, 2003).

[Figure 6 plot: "Poly A+ and Poly A- Expression Comparison" (expression comparison poly+ vs poly- with controls and outliers highlighted). X-axis: intensity, non-polyadenylated treatment; y-axis: intensity, polyadenylated treatment. Legend: non-polyadenylated controls, polyadenylated controls, non-polyadenylated mRNAs, line A+ = A-.]


Table 2 Overview of the genes with a significantly higher value for the poly(A)+ treatment than for the poly(A)- treatment.

 

Conclusion  

We have shown that the same probes can be used for several species, as long as their rRNA sequences are sufficiently homologous (Figure 2). We have also shown that using rRNA removal as a sample preparation step for RNA-Seq, to ensure a low noise level, gives a more accurate profile of the transcriptome. This is because mRNA with a short poly(A) tail or none at all will not be present in the sample if an mRNA enrichment procedure is chosen. However, we also encountered several problems with the rRNA depletion method. First, we were unable to design probes for the 5S and 5.8S rRNAs of A. thaliana, C. sativus and S. lycopersicum; this was to be expected given their relatively small size. Second, when both probe sets were used, the success of rRNA removal diminished, and even after several attempts we were unable to remove the rRNA. The TapeStation results still showed clear rRNA bands in the 'rRNA-depleted' samples, as well as some mRNA signal in the 'captured rRNA' samples. This shows that some unintended nonspecific capture also occurs. Although we were only slightly successful in removing rRNA from total RNA, we still consider rRNA removal, as a technique, superior to mRNA enrichment: mRNA enrichment as a sample preparation step introduces a bias in the data, which this study has now demonstrated.

       


References  

 

Cui, P., Lin, Q., Ding, F., Xin, C., Gong, W., Zhang, L., ... Yang, J. (2010). A comparison between ribo-minus RNA-sequencing and polyA-selected RNA-sequencing. Genomics, 96(5), 259-265.

Gundogdu, O., & Elmi, A. (2015). Microarray overview. Retrieved from http://grf.lshtm.ac.uk/microarrayoverview.htm

Huang, C. Y., Ayliffe, M. A., & Timmis, J. N. (2003). Direct measurement of the transfer rate of chloroplast DNA into the nucleus. Nature, 422(6927), 72-76.

Maskos, U., & Southern, E. M. (1992). Oligonucleotide hybridizations on glass supports: A novel linker for oligonucleotide synthesis and hybridization properties of oligonucleotides synthesised in situ. Nucleic Acids Research, 20(7), 1679-1684.

Nature Publishing Group. (2015). Transcriptomics. Retrieved from http://www.nature.com/subjects/transcriptomics#close

Schuster, G., Lisitsky, I., & Klaff, P. (1999). Polyadenylation and degradation of mRNA in the chloroplast. Plant Physiology, 120(4), 937-944.

Schwanhausser, B., Busse, D., Li, N., Dittmar, G., Schuchhardt, J., Wolf, J., ... Selbach, M. (2011). Global quantification of mammalian gene expression control. Nature, 473(7347), 337-342.

Slomovic, S., Laufer, D., Geiger, D., & Schuster, G. (2005). Polyadenylation and degradation of human mitochondrial RNA: The prokaryotic past leaves its mark. Molecular and Cellular Biology, 25(15), 6427-6435. doi:25/15/6427 [pii]

Szabo, D. T. (2014). Chapter 62 - Transcriptomic biomarkers in safety and risk assessment of chemicals. In R. C. Gupta (Ed.), Biomarkers in toxicology (pp. 1033-1038). Boston: Academic Press. doi:http://dx.doi.org/10.1016/B978-0-12-404630-6.00062-2

TAIR. (2015). The Arabidopsis Information Resource. Retrieved from http://www.arabidopsis.org/index.jsp

Wang, Z., Gerstein, M., & Snyder, M. (2009). RNA-Seq: A revolutionary tool for transcriptomics. Nature Reviews Genetics, 10(1), 57-63.

Wilhelm, B. T., & Landry, J. (2009). RNA-Seq—quantitative measurement of expression through massively parallel RNA-sequencing. Methods, 48(3), 249-257.

Yang, L., Duff, M. O., Graveley, B. R., Carmichael, G. G., & Chen, L. (2011). Genomewide characterization of non-polyadenylated RNAs. Genome Biology, 12(2), R16.

 
