Chromatin Immunoprecipitation and High throughput sequencing (ChIP-Seq): Tips and tricks regarding the laboratory protocol and initial downstream data analysis


University of Groningen

Chromatin Immunoprecipitation and High throughput sequencing (ChIP-Seq) Patten, Darren K. ; Corleone, Giacomo ; Magnani, Luca

Published in: Epigenome Editing

IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below.

Document Version

Final author's version (accepted by publisher, after peer review)

Publication date: 2018

Link to publication in University of Groningen/UMCG research database

Citation for published version (APA):

Patten, D. K., Corleone, G., & Magnani, L. (2018). Chromatin Immunoprecipitation and High throughput sequencing (ChIP-Seq): Tips and tricks regarding the laboratory protocol and initial downstream data analysis. In M. G. Rots, & A. Jeltsch (Eds.), Epigenome Editing: Methods and Protocols (pp. 271-288). (Methods in Molecular Biology; Vol. 1767). Springer.


Chromatin Immunoprecipitation and High throughput sequencing (ChIP-Seq): Tips and tricks regarding the laboratory protocol and initial downstream data analysis

Darren K. Patten 1,2, Giacomo Corleone 1 and Luca Magnani 1

1 Department of Surgery and Cancer, Imperial College London, Du Cane Road, London, W12 0NN, U.K.

2 Department of Emergency General Surgery, Homerton University Hospital, Homerton Row, E9 6SR, U.K.

E-mail: l.magnani@imperial.ac.uk; darren.patten@imperial.ac.uk

Abstract

Chromatin immunoprecipitation coupled with high throughput sequencing (ChIP-Seq) has become an essential tool for epigenetic scientists. ChIP-Seq is used to map protein-DNA interactions and epigenetic marks such as histone modifications at the genome-wide level. Here we describe a complete ChIP-Seq laboratory protocol (tailored towards processing tissue samples as well as cell lines), and the bioinformatic pipelines utilised for handling raw sequencing files through to peak calling.

Keywords Chromatin immunoprecipitation and high throughput sequencing, ChIP-Seq, Antibodies, DNA library assembly, ChIP-Seq data processing, Bioinformatics, Bioinformatic pipelines, Genome alignment, Peak calling

Running title: The ChIP-Seq laboratory protocol

1 Introduction

Mapping of genome-wide protein-DNA interactions is an extremely powerful tool to provide insights into the process of transcriptional regulation in cells. Obtaining the binding sites for transcription factors (TFs), along with core transcriptional factors/co-factors and other DNA-binding interactions, enables the detection and deciphering of the gene regulatory machinery required to regulate important biological processes. In parallel, mapping histones is an essential step in the annotation of the genome into functional domains (chromatin states), including gene bodies, promoters and enhancers. Chromatin states can affect transcription either by altering the packaging of DNA to allow or prevent access for proteins to bind DNA, or by changing the nucleosome surface to facilitate or inhibit the recruitment of effector protein complexes [1]. It has been suggested that the interplay between chromatin and transcription is a dynamic process and is more complicated than once postulated [2]. Profiling differences in epigenomes (in the cell type of interest) at different time points or conditions, using ChIP-Seq, also provides key information about developmental and pathological processes [3].

ChIP is an immunoprecipitation technique used to study the interplay between proteins and DNA in cells; it aims to identify the association of proteins with genomic loci. ChIP-Seq is directly derived from ChIP-chip [4-8] (i.e. DNA hybridisation to microarray), with ChIP instead being coupled with sequencing of the enriched DNA fragments [9-11]. ChIP-Seq allows biologists to sequence myriads of small fragments of DNA in a single sequencing run, allowing for large-scale experiments to be conducted [1]. The ultimate goal of ChIP-Seq is to completely map genome-wide enriched loci (i.e. TF binding sites, histone modifications, nucleosome positioning and other protein-DNA interactions) with maximal signal-to-noise ratio [12]. In this Chapter we first describe the laboratory protocol for the cross-linked version of ChIP (i.e. the use of formaldehyde for crosslinking proteins to DNA), which can be applied to tissue and cell lines. Note that cross-linking is not performed in native ChIP, which is often used to study the distribution of histone post-translational modifications (PTMs). Secondly, we describe how to process ChIP-Seq data for peak calling and downstream analysis.

2 Materials

All solutions should be prepared with ultrapure water (e.g., MilliQ® integral water purification system, for ultrapure water; 18.2 MΩ·cm at 25 °C) and high purity analytical grade reagents. All reagents and prepared solutions must be kept at room temperature unless stated otherwise. Good laboratory practice and institute-guided methods of waste disposal of reagents must be adhered to at all times. The ChIP-Seq laboratory protocol described in this Chapter has been adapted from Schmidt et al. [13].

2.1 Reagents required for ChIP-Seq

Table 1 highlights the composition and storage temperatures of the reagents required to conduct the ChIP aspect of the ChIP-Seq experiment. Before starting the ChIP-Seq experiment, it is advisable to prepare all required reagents in advance, because some reagents must be stored and kept at 4 °C prior to usage (Table 1).

In addition, the following equipment and reagents are also required:

1. Calibrated pipettes (see Note 1).

2. Sterile clear (frost-free) microcentrifuge tubes (see Note 2).

3. Magnetic rack for 1.5 mL microcentrifuge tubes.

4. Phase-lock microcentrifuge tubes.

5. Sterile scalpel for tissue processing.

6. Sterile petri-dishes (10 cm²).

7. Horizontal electrophoresis system for DNA electrophoresis (i.e. including combs and power pack).

8. Electrophoresis loading dye (e.g., 6X).

9. GelRed™.

10. TAE buffer (1X).

11. TE buffer (1X), pH 8.0.

12. Triton-X (10 %).

13. 100 base pair (bp) DNA ladder (marker).

14. Sonicator that is either stored in a 4 °C cold room or one that has an inbuilt [...] Bioruptor® Pico). If using a bench top sonicator, please ensure the correct type of tubes are used in line with manufacturer instructions. For example, using frosted tubes for sonicating DNA will prevent ultrasonic waves reaching the DNA and result in highly inefficient sonication.

15. Dry ice.

16. SYBR Green™.

17. Nuclease-free water.

18. Climate controlled microcentrifuge kept at 4 °C.

19. Molecular grade ethanol. Store at -20 °C.

20. RNase A (1 mg/mL). Store at -20 °C.

21. Proteinase K (20 mg/mL). Store at -20 °C.

22. Protease inhibitor cocktail (EDTA-free) (1X) dissolved in ultrapure water.

23. Phenol-chloroform. Store at 4 °C.

24. Sodium Chloride (5 M).

25. Liquid nitrogen + Dewar flask for safe transportation of liquid nitrogen.

26. DNA quantification kits (i.e., Qubit™ and/or Quant-iT™ Picogreen™).

27. Magnetic beads (e.g., Dynabeads™). Please check the ligand type (i.e. Protein A or Protein G) associated with the Dynabeads™ to ensure maximal crosslinking with the ChIP antibody of interest. (Dynabeads™ should be stored at 4 °C).

28. Phosphate buffered saline (PBS)/Bovine serum albumin (BSA) solution (5 mg/mL); i.e., dissolve 250 mg of BSA into 50 mL of cold PBS by vortexing the mixture until clear. This solution should be stored at 4 °C (see Note 3).

29. Antibodies used for a ChIP-Seq experiment should ideally be termed “ChIP grade” by the manufacturer and validated for purpose. It is advisable to divide the stock antibody into 4 µg or 8 µg aliquots and store at manufacturer recommended temperatures to prevent freezing/thawing of the stock antibody.

30. Library kit for preparation of ChIP samples for ChIP-Seq for various sequencers. For example, the NEBNext® ultra™/ultra II™ kits are used for the Illumina® sequencer (see Note 4).

Table 1 Relative compositions and storage temperatures of reagents used for ChIP-Seq

Reagents | Composition/Concentrations | Storage
Solution A + Formaldehyde | 50 mM Hepes-KOH, 100 mM NaCl, 1 mM EDTA, 0.5 mM EGTA, 1% formaldehyde | Room temperature
Glycine | 1 M | 4 °C
Lysis Buffer 1 | 50 mM Hepes-KOH, pH 7.5; 140 mM NaCl; 1 mM EDTA; 10% Glycerol; 0.5% NP-40 or Igepal CA-630; 0.25% Triton X-100 | 4 °C
Lysis Buffer 2 | 10 mM Tris-HCl, pH 8.0; 200 mM NaCl; 1 mM EDTA; 0.5 mM EGTA | 4 °C
Lysis Buffer 3 | 10 mM Tris-HCl, pH 8; 100 mM NaCl; 1 mM EDTA; 0.5 mM EGTA; 0.1% Na-Deoxycholate; 0.5% N-lauroylsarcosine | 4 °C
RIPA buffer | 50 mM Hepes-KOH, pH 7.5; 500 mM LiCl; 1 mM EDTA; 1% NP-40 or Igepal CA-630; 0.7% Na-Deoxycholate | Room temperature
De-crosslinking buffer | 1% SDS, 0.1 M NaHCO3 | Room temperature

3 Methods

Before tissue samples and cells are to be used for ChIP, it is good practice to thoroughly clean the laboratory bench and associated equipment with 70 % ethanol to ensure sterility. Some biologists use DNAZap™ or RNAseZap™ solutions for cleaning laboratory equipment prior to ChIP, which is encouraged after an initial clean with 70 % ethanol. Furthermore, aseptic techniques should be applied throughout the course of the experiment. All steps should be performed over ice unless stated.

3.1 Incubation of antibody with magnetic beads

This step is designed to pair the magnetic beads used in immunoprecipitation with the antibody of choice. The antibody-bead concentration should be optimized empirically for each experiment. The amount indicated in this protocol works well with histone modifications, but may underperform for some transcription factors.

1. The Dynabeads™ should be vortexed (at least 30 s) when initially taken from the fridge to allow the beads to homogenise within solution.

2. 50 µL of Dynabeads™ are used per ChIP experiment. The beads are placed into a 1.5 mL microcentrifuge tube and 1 mL of PBS/BSA is added and the contents pipetted up and down to ensure thorough washing of the beads. The tube is then placed on a magnetic rack and the beads are allowed to collect to one side of the tube. The supernatant is then discarded and the process is repeated two more times.

3. The beads are then suspended in 150 µL PBS/BSA (i.e. 150 µL of PBS/BSA per 50 µL of magnetic beads) and the tube placed in ice to keep the suspension cool.

4. Antibody is added at 4 µg-8 µg per 50 µL of beads (see Note 5).

5. Once the antibody is added to the suspension of magnetic beads, seal the microcentrifuge tube with Parafilm™ and place on a rotating platform for at least 6 hours.
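The amounts in this section scale linearly with the number of ChIP reactions. The following is a minimal Python sketch of that arithmetic, assuming the per-reaction amounts given above and assuming that the beads for each reaction are washed in their own tube. The function name and the antibody stock concentration are illustrative assumptions and are not part of the protocol.

```python
def bead_prep_volumes(n_chips, antibody_ug=4.0, antibody_stock_ug_per_ul=1.0):
    """Scale the per-reaction amounts of Section 3.1 to n_chips reactions.

    antibody_stock_ug_per_ul is a hypothetical stock concentration; use the
    value from the datasheet of the antibody actually being used.
    """
    if n_chips < 1:
        raise ValueError("n_chips must be at least 1")
    if not 4.0 <= antibody_ug <= 8.0:
        raise ValueError("the protocol suggests 4-8 ug antibody per 50 uL beads")
    total_antibody_ug = antibody_ug * n_chips
    return {
        "beads_ul": 50 * n_chips,                  # 50 uL beads per ChIP
        "wash_pbs_bsa_ml": 3 * 1.0 * n_chips,      # three 1 mL washes per tube
        "resuspension_pbs_bsa_ul": 150 * n_chips,  # 150 uL per 50 uL beads
        "antibody_ug": total_antibody_ug,
        "antibody_ul": total_antibody_ug / antibody_stock_ug_per_ul,
    }


if __name__ == "__main__":
    for reagent, amount in bead_prep_volumes(3, antibody_ug=4.0).items():
        print(f"{reagent}: {amount}")
```

Because the optimal antibody-bead ratio is determined empirically, the 4-8 µg check is only a guard around the range quoted in step 4.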

3.2 Preparation of tissue for ChIP

This step allows for direct processing of tissues including freshly collected material such as surgical or diagnostic biopsies.

1. Determine the amount of Solution A required by calculating 1 mL per sample and add an extra 1 mL to account for volume loss during pipetting. To the total amount of Solution A (Table 1), add formaldehyde to make a 1 % final formaldehyde concentration by volume. The mixture is then quickly vortexed and placed for at least 15 minutes in a water bath at 37 °C.

2. Tissue samples should ideally be snap-frozen (see Note 6) and stored immediately at -80 °C, at the time of tissue collection, to prevent DNA degradation, which will inevitably affect any downstream experiments. The tissue sample should be placed onto a sterile petri-dish which is positioned over dry ice to maintain the tissue at sub-zero temperatures. Using a disposable scalpel, macroscopic adipose tissue, which is yellow in colour, should be carefully dissected from the tissue and disposed of appropriately. The tissue is then finely cut into small shavings and transferred to a chilled, sterile 1.5 mL microcentrifuge tube which is placed in dry ice.

3. Add the warmed Solution A with 1 % formaldehyde to the 1.5 mL microcentrifuge tube, bringing the volume to 1 mL. The tubes are then vortexed for 30 seconds every 5 minutes for a total duration of 20 minutes. Cold 1 M glycine is added to the fixed tissue at 1/10 the volume (i.e., 100 µL) and the sample is incubated at 4 °C for 10 minutes. This process quenches the fixation reaction. After 10 minutes, the sample is vortexed and centrifuged at 2000 x g for 5 minutes at 4 °C and the supernatant discarded.

4. The tissue fragments are then placed into a ceramic mortar (pre-chilled over dry ice). Liquid nitrogen (approximately 30 to 40 mL) is then carefully poured over the tissue fragments and the tissue homogenised using a pestle until a fine powder consistency is obtained. The homogenised powder is then placed into a new sterile 1.5 mL microcentrifuge tube and placed in ice until required.
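The fixation volumes in steps 1 and 3 follow from a simple dilution (C1 x V1 = C2 x V2). The sketch below illustrates the calculation in Python. The formaldehyde stock concentration is an assumption (commercial stocks differ, so it is left as a parameter), and the function and key names are illustrative only.

```python
def fixation_mix(n_samples, stock_percent=37.0, final_percent=1.0,
                 volume_per_sample_ml=1.0, spare_ml=1.0):
    """Volumes for Solution A with 1 % formaldehyde (Section 3.2, step 1).

    stock_percent is an assumed formaldehyde stock concentration; check the
    bottle in use. Adding the stock slightly dilutes the other Solution A
    components, which this sketch ignores.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    if not 0 < final_percent < stock_percent:
        raise ValueError("final concentration must be below the stock")
    total_ml = n_samples * volume_per_sample_ml + spare_ml
    formaldehyde_ml = total_ml * final_percent / stock_percent  # C1V1 = C2V2
    return {
        "total_ml": total_ml,
        "formaldehyde_stock_ml": formaldehyde_ml,
        "solution_a_ml": total_ml - formaldehyde_ml,
        # 1 M glycine is added at 1/10 of the fixation volume (step 3)
        "glycine_1m_ul_per_sample": volume_per_sample_ml * 1000 / 10,
    }


if __name__ == "__main__":
    print(fixation_mix(n_samples=3, stock_percent=16.0))
```

The same function can be reused for cell lines by setting `volume_per_sample_ml=10.0` and `spare_ml=10.0`, as described in Section 3.3.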

3.3 Preparation of cell lines for ChIP

1. Cell lines are allowed to reach 75-85 % confluency on a 15 cm² petri-dish before harvesting for ChIP, which equates to approximately 8-10 × 10⁶ cells.

2. Solution A with 1 % formaldehyde should be pre-warmed (37 °C) as explained above. For each 15 cm² petri-dish, 10 mL of warmed Solution A with 1 % formaldehyde is required. Again, calculate an extra 10 mL in addition to the total volume of Solution A with 1 % formaldehyde required for the experiment. Carefully aspirate the culture media and then add 10 mL of Solution A with 1 % formaldehyde to each petri-dish containing cells (see Note 7). Place the petri-dish into a sterile incubator at 37 °C for 10 minutes. After this, add 1/10 the volume of cold 1 M glycine (i.e. 1 mL) to the petri-dish. Add the glycine to the side of the petri-dish so as not to disrupt the adherent fixed cells. Place the petri-dish in a 4 °C fridge for 10 minutes.

3. The supernatant is then carefully discarded and the dish is washed with cold autoclaved PBS 3 times; ensure that the PBS covers the dish. Take care when performing the washing steps to add the cold PBS to the side of the dish and not directly to the cells. Discard the supernatant and perform this step another two times.

4. 500 µL of PBS + (1X) protease inhibitor cocktail is then added to the dish and the cells are scraped until all the cell content has been collected to one side of the dish. Using a pipette, aspirate the cells and transfer to a sterile 1.5 mL microcentrifuge tube. The latter is then microcentrifuged at 2000 x g at 4 °C for 5 minutes. The supernatant is then discarded and the tube placed in ice to keep cool.

3.4 ChIP

Before proceeding to the lysis stage, it is advised that 1 mL of lysis buffer is prepared for each sample. Again, include an extra 1 mL to account for volume losses. In addition, add 1X protease inhibitor cocktail to all three aliquots of lysis buffers (LB1/LB2/LB3) (Table 1), vortex for 30 s and place in ice until required.

1. The 1.5 mL microcentrifuge tube containing the fixed cells is taken from the ice box and lysis buffer 1 (LB1) (Table 1) is added up to 1 mL. The tube is then vortexed for 30 s and then placed on a rotating platform for 10 minutes. The sample is then microcentrifuged (2000 x g) at 4 °C for 5 minutes and the supernatant discarded.

2. Lysis buffer 2 is then added up to 1 mL and the tube vortexed and subsequently placed on a rotating platform for 10 minutes. The sample is then microcentrifuged (2000 x g) at 4 °C for 5 minutes and the supernatant discarded.

3. If the cell pellet reaches more than 1/3 the volume of the 1.5 mL tube, split the sample in half and add lysis buffer 3, ensuring that the volume in each tube reaches 300 µL. If using the Diagenode Bioruptor® Pico machine, ensure that the manufacturer supplied tubes are used and not the standard 1.5 mL frost-free microcentrifuge tubes.

4. Tumour tissue is usually sonicated for a minimum of 20 cycles (30 s on and 30 s off) whereas cell lines are sonicated for 12-15 cycles (30 s on and 30 s off) at high frequency settings.

5. The tube is then briefly microcentrifuged and 30 µL of 10 % Triton-X is added. The suspension is mixed by vortexing for 20 s and the tube microcentrifuged at full speed at 4 °C for 10 minutes.

6. Three new sterile 1.5 mL microcentrifuge tubes are prepared and labelled for a) the DNA-gel-sample, used to assess sonication efficiency; b) the ChIP-sample, which is incubated with antibody; and c) the Input-sample, which is not incubated with antibody (i.e., the internal control). 5 µL of supernatant is carefully added to the DNA-gel-sample tube, 15 µL of supernatant is added to the Input-sample tube and 280 µL of the supernatant is added to the ChIP-sample tube. Do not disturb the cell pellet when aspirating and transferring the above volumes to the new tubes.

7. The magnetic bead-antibody complex is taken from the rotating platform at 4 °C and is then placed onto the magnetic rack. The supernatant is discarded and the beads washed three times in PBS/BSA as described in Section 3.1. 100 µL of LB3 + 1X protease inhibitor cocktail is added to the beads and the tube is placed carefully in ice.

8. To the ChIP-sample lysate, 800 µL of LB3 + 1X protease inhibitor cocktail, 90 µL of 10 % Triton-X and the 100 µL of antibody-bead complex (in LB3) prepared above are added. Both ChIP and Input samples are then placed on a rotating platform at 4 °C overnight.

9. The DNA-gel sample must undergo de-crosslinking. 100 µL of de-crosslinking buffer (Table 1) is added. The tube is then vortexed for 30 s and placed in a water bath at 65 °C. For the first 30 minutes, the sample should be vortexed for 30 s every 5 minutes and then left in the water bath overnight.

10. Phenol-chloroform extraction of DNA is performed the following day. 100 µL of TE buffer is added to the DNA-gel sample. Phenol-chloroform is added at a ratio of 1:1 (i.e. 205 µL in this case). This process should be performed in a fume cupboard and not on the laboratory bench. The sample is then vortexed for at least 30 s to allow the phenol to mix thoroughly with the DNA-gel sample. The tube is then placed in a microcentrifuge and spun at full speed (4 °C) for 5 minutes. The contents of the tube will have formed two layers; carefully extract the upper aqueous layer containing the DNA and place it into a new sterile 1.5 mL microcentrifuge tube. Discard the remainder of the phenol-chloroform mixture.

11. The sample can now undergo DNA precipitation using the following formula, per sample: (X) µL of DNA-gel sample + (X/10) µL of NaCl (5 M) + 3(X) µL of cold molecular grade ethanol. The sample is then vortexed for 30 s and placed at -80 °C for at least 30 minutes; it can usually be left overnight. After a minimum of 30 minutes, the sample is microcentrifuged at full speed (4 °C) for 30 minutes. The supernatant is carefully removed and a small pellet should be observed to one side of the 1.5 mL tube. The pellet is washed by adding (do not mix) 300 µL of cold 70 % molecular grade ethanol, followed by microcentrifugation (full speed) for 5 minutes (4 °C). The supernatant is discarded and the pellet is allowed to air dry for 10-15 minutes. The DNA pellet is resuspended in ultrapure nuclease-free water and loading dye (e.g., 6X). The final volume of the suspension should equate to 12 µL (i.e., 2 µL of loading dye and 10 µL of water). A 1 % agarose gel with GelRed™ added should be prepared using TAE buffer (1X). The DNA sample should then be loaded into the well of the gel, which is covered by TAE buffer. Adjust the power settings to 70 volts for 40 minutes, after which the gel can be viewed using a gel imaging system. Ideally most of the sonicated DNA content should be below 500 bp (Figure 1).

Figure 1. DNA gel electrophoresis of three independent tumour samples (labelled in red) undergoing assessment of sonication efficiency. The DNA is required to be below 500 bp, as indicated by the 100 bp DNA ladder. Samples 2 and 3 have been more effectively sonicated compared to sample 1. For sample 1, resonication can be performed to obtain more DNA fragments below 500 bp.

12. Provided that most of the DNA, as visualised on the gel imaging system, is below 500 bp, the ChIP and Input samples can be processed. This is because small inserts (i.e. between 200 and 300 bp), which can be recognised by the Illumina™ sequencer, will be size selected during the library preparation phase (see Note 8).

13. The ChIP and Input samples are then removed from the rotator after 12 to 18 hours (18 hours being the upper limit of incubation time) and placed on ice. The ChIP-sample is vortexed then placed on a magnetic rack to allow the beads to collect to one side. The supernatant is then discarded and the beads washed with RIPA buffer (Table 1) with the tube off the magnetic rack. 300 µL of RIPA buffer is added to the beads and the suspension is pipetted up and down 5 to 6 times. The tube is then placed back on the magnetic rack and the supernatant discarded. This process is repeated 5 more times. 300 µL of TE buffer is then added to wash the beads in the same way as for the RIPA buffer washes. This step is repeated once. 100 µL of de-crosslinking buffer is added to both Input and ChIP samples. Both samples are vortexed for 30 s and placed on a shaking heat block set at 65 °C for at least 6 hours or overnight.

14. The following day, or after 6 hours, both Input and ChIP samples are vortexed and briefly microcentrifuged to collect drops from the lid. The ChIP sample is placed on a magnetic rack and after 2-5 minutes, the supernatant is collected in a new sterile 1.5 mL microcentrifuge tube. 200 µL of TE buffer is added to both samples, after which 8 µL of RNase A (1 mg/mL) is added to both samples. The samples are then incubated at 37 °C for 30 minutes to 1 hour. 4 µL of proteinase K is added to each sample and incubated at 55 °C for 1-2 hours.

15. Phenol-chloroform extraction is performed as described above, but now transferring the supernatants to phase-lock tubes. This ensures that no mixing of the aqueous phase (top layer) with the organic phase (bottom layer) occurs. Ensure that new sterile 1.5 mL microcentrifuge tubes are used when transferring the top layer supernatant from the phase-lock tubes. DNA precipitation followed by washing of the DNA pellet is carried out as described in step 11, Section 3.4 (DNA-gel sample purification).

16. Once the DNA of both Input and ChIP-samples has been obtained, the ChIP sample undergoes quantification using either the Qubit™ system (more appropriate for histone marks) or the Picogreen™ assay (used either when there is a low output reading from Qubit or when performing ChIP-Seq for TFs). Adhere to the manufacturer protocols for both of the DNA quantification methods.

17. To control for efficient immunoprecipitation, ChIP-qPCR should be performed as described by Schmidt et al. [13]. Briefly, reactions should be carried out in 10 µL volumes. A three-step cycle programme and a melting analysis need to be applied. An example of the cycling steps is as follows: 10 s at 95 °C, 30 s at 60 °C and 30 s at 72 °C, repeated 40 times. ChIP-qPCR results should be normalised to the measured DNA concentrations of the samples and the corresponding Input samples.

18. Relative to the negative control ChIP-qPCR primers used, a minimum enrichment of 1.5-fold should be observed for positive control amplicons (i.e. positive target regions) in the ChIP-sample compared to the Input sample.

19. Once adequate enrichment is seen in the ChIP-sample over the Input sample, library preparation and subsequent sequencing are carried out. Library preparation is not described here owing to the various types of library kits available. Our group has successfully used the NEB Ultra 2 Kit for ChIP-Seq with low DNA inputs. Regardless of the method used, it is worth noting that after library preparation, a repeat ChIP-qPCR should be performed to ensure that there is no loss of enrichment, which can occur after library preparation. After enrichment has been confirmed (via ChIP-qPCR), and the desired fragment size captured (e.g., measured using an Agilent™ Bioanalyser), samples can be submitted for sequencing (see Note 9).
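Two pieces of arithmetic in the steps above lend themselves to a short illustration: the precipitation volumes of step 11 and the enrichment check of steps 17 and 18. The Python sketch below is one possible way to compute them. The fold-enrichment function uses a ΔΔCt calculation that assumes equal amounts of ChIP and Input DNA per reaction and 100 % primer efficiency; this is an assumption made for illustration and is not necessarily the calculation used in [13]. All function names are hypothetical.

```python
def precipitation_volumes(sample_ul):
    """Step 11: X uL sample + X/10 uL 5 M NaCl + 3X uL cold ethanol."""
    if sample_ul <= 0:
        raise ValueError("sample volume must be positive")
    return {"nacl_5m_ul": sample_ul / 10, "ethanol_ul": 3 * sample_ul}


def fold_enrichment(ct_chip_pos, ct_input_pos, ct_chip_neg, ct_input_neg):
    """Enrichment of a positive region over a negative control region.

    Assumes 100 % amplification efficiency (a factor of 2 per cycle) and
    equal DNA input per qPCR reaction; both are simplifying assumptions.
    """
    delta_pos = ct_chip_pos - ct_input_pos  # ChIP vs Input, positive region
    delta_neg = ct_chip_neg - ct_input_neg  # ChIP vs Input, negative region
    return 2.0 ** -(delta_pos - delta_neg)


def passes_enrichment(fold, minimum=1.5):
    """Step 18 asks for at least 1.5-fold enrichment."""
    return fold >= minimum


if __name__ == "__main__":
    print(precipitation_volumes(200))
    fold = fold_enrichment(25.0, 27.0, 30.0, 27.0)
    print(fold, passes_enrichment(fold))
```

Ct values here are placeholders chosen only to exercise the functions; real values come from the qPCR instrument.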

The general workflow of the data analysis described in this chapter is depicted in Figure 2.

Figure 2. ChIP-Seq analysis workflow. This is the workflow suggested in this chapter. The tools used at each step are coded in orange.

3.5.1 Data pre-processing

The visualization of the enrichment of a genomic location, for a specific antibody or histone mark, is the end point of a ChIP-Seq experiment. Once the library samples have been processed and sequenced, raw reads are produced and ready for downstream analysis. Quality control, alignment to the genome, peak calling and visualization are the main steps of ChIP-Seq data processing. The sequencer generates raw reads in a format called “FASTQ” [14], in which each read is stored with its read name, an optional description and the PHRED quality score of every nucleobase. Raw reads have to be quality checked in order to identify any factors that might reduce the performance of the alignment of the raw reads to the reference genome assembly. FASTQC (Andrews 2010) is a publicly available and user-friendly tool which performs a quality analysis of the raw reads. It provides basic statistics such as total number of reads, average read length, quality encoding and GC content, together with summary graphs with which the user is able to evaluate the data. A quick look at the “per base sequence” content and quality (see Note 10) can suggest whether the removal of a portion of the reads (trimming) is necessary. A significant amount of overrepresented sequences suggests the presence of contaminants or, more frequently, adapters. It is strongly suggested to delete reads with a poor PHRED quality score throughout, while a drop in quality at the first and last bases suggests that trimming is required, using the procedure described below. In general, each experiment should contain more than 20 million uniquely mapped fragments (see Note 11). Saturation plots (see Note 12) are useful tools to estimate the actual coverage and are produced by subsampling the reads and calculating the number of peaks acquired. The curve relating the number of reads to the number of called peaks should plateau once the optimal depth of sequencing has been reached. If the experiment is underpowered, it is always possible to re-sequence the libraries to reach the desired depth and optimal saturation.
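To make the FASTQ layout and the PHRED score concrete, the following Python sketch reads four-line FASTQ records and converts the quality string into numeric scores. It assumes the Phred+33 encoding used by current Illumina instruments and an uncompressed, unwrapped four-line-per-record file; it is an illustration only and is not a substitute for FASTQC.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a four-line FASTQ file."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, optionally repeating the name
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], sequence, quality


def phred_scores(quality_string, offset=33):
    """Convert an ASCII quality string into a list of PHRED scores."""
    return [ord(char) - offset for char in quality_string]


def mean_quality(quality_string, offset=33):
    """Average PHRED score of one read (0.0 for an empty read)."""
    scores = phred_scores(quality_string, offset)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nAC\n+\n!5\n")
    for name, sequence, quality in read_fastq(example):
        print(name, sequence, mean_quality(quality))
```

A read whose mean quality is low throughout would be a candidate for removal, whereas low scores confined to the read ends point towards trimming, as described above.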

Many tools [15] have been developed for read and adapter trimming. Here, we give an introduction to Trimmomatic [16], a highly flexible and efficient Java-based trimmer. It provides a broad range of options which allow fine tuning of the read trimming process. Based on the results of the QC analysis, the user can easily set the most effective parameters in order to improve the quality of the dataset, enhancing the performance of the read alignment (see Note 13). Common practice dictates that a [...] to the selected assembly genome. Once satisfactory results are obtained from FASTQC, the next step is to proceed to alignment, i.e. alignment of the FASTQ file to the reference genome of interest. The choice of the alignment program and genome assembly (see Note 14) are the two main decisions the user needs to take into consideration moving forward, since these will affect peak calling and further downstream analysis. For human samples, the GRCh37/hg19 genome assembly is the standard reference, which at present is the most comprehensive and annotated. The GRCh38/hg38 reference genome is the most recently published version but still lacks annotations. If required for a particular pipeline of downstream analysis, conversion of one genome assembly version to the other is possible using the liftover tool CrossMap [17].
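As an illustration of the trimming step, the sketch below assembles a single-end Trimmomatic command line in Python without running it. The trimming steps and their values (ILLUMINACLIP, LEADING, TRAILING, SLIDINGWINDOW, MINLEN) are taken from the example in the Trimmomatic documentation and are assumptions here, not the parameters recommended in this chapter (see Note 13); the file names and the jar path are hypothetical.

```python
import shlex


def trimmomatic_se_command(jar, fastq_in, fastq_out, adapters_fasta,
                           threads=4):
    """Build (but do not run) a single-end Trimmomatic command."""
    return [
        "java", "-jar", jar, "SE",
        "-threads", str(threads),
        "-phred33",                                # Phred+33 quality encoding
        fastq_in, fastq_out,
        f"ILLUMINACLIP:{adapters_fasta}:2:30:10",  # adapter clipping
        "LEADING:3",                               # trim low-quality 5' bases
        "TRAILING:3",                              # trim low-quality 3' bases
        "SLIDINGWINDOW:4:15",                      # window-based quality trim
        "MINLEN:36",                               # drop reads that are too short
    ]


if __name__ == "__main__":
    command = trimmomatic_se_command(
        "trimmomatic.jar",         # hypothetical path to the jar file
        "sample.fastq.gz",
        "sample.trimmed.fastq.gz",
        "TruSeq3-SE.fa",
    )
    # Print the command for inspection; subprocess.run(command, check=True)
    # could be used to execute it once the paths have been verified.
    print(shlex.join(command))
```

Building the command as a list keeps each argument separate, so file names containing spaces do not need manual quoting. FASTQC should be re-run on the trimmed file to confirm the improvement.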

3.5.2 Alignment to the genome of choice

In recent years, a wide range of alignment programs has been developed for next generation sequencing [18] in order to accommodate various research ideas. ChIP sequencing generates short reads of 50 to 75 bp in length which are expected to map uniquely to the reference genome. In human samples, a percentage of aligned reads below 80 % should be a reason for concern, given the high specificity of the DNA fragments, and further investigation is recommended. Bowtie1 [19] was developed for aligning short reads of less than 50 bp, while Bowtie2 [20] is highly efficient for longer reads of 75 bp (see Note 15). Bowtie1 and Bowtie2 store the aligned reads in “SAM” format, a human-readable format containing the specification of each aligned read. SAMTOOLS [21] is a powerful software package which enables users to process SAM files, which can be several gigabytes in size. The best way to reduce the size of the file is to convert the “SAM” file into a “BAM” file, a binary format which contains the same information as the SAM file. The SAMTOOLS “view” function writes the new BAM file that will be used for the analysis. It is suggested to sort and then index the BAM file using the SAMTOOLS commands “sort” and “index”.
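The sequence of alignment, SAM-to-BAM conversion, sorting and indexing can be sketched as an ordered list of commands. The Python example below only builds the command lines for Bowtie2 and SAMTOOLS and includes a helper for the 80 % alignment threshold mentioned above. The index prefix, file names and thread count are hypothetical, and only basic options are shown.

```python
def alignment_commands(index_prefix, fastq, sample, threads=4):
    """Commands for single-end alignment followed by BAM sort and index."""
    sam = f"{sample}.sam"
    bam = f"{sample}.bam"
    sorted_bam = f"{sample}.sorted.bam"
    return [
        # Align single-end reads (-U) against a pre-built Bowtie2 index (-x)
        ["bowtie2", "-p", str(threads), "-x", index_prefix,
         "-U", fastq, "-S", sam],
        # Convert the human-readable SAM into binary BAM
        ["samtools", "view", "-b", "-o", bam, sam],
        # Sort by coordinate, then index the sorted file
        ["samtools", "sort", "-o", sorted_bam, bam],
        ["samtools", "index", sorted_bam],
    ]


def alignment_rate_ok(aligned_reads, total_reads, threshold=0.80):
    """Flag samples whose alignment rate falls below the 80 % guide value."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return aligned_reads / total_reads >= threshold


if __name__ == "__main__":
    for command in alignment_commands("hg19", "sample.trimmed.fastq", "sample"):
        print(" ".join(command))
    print(alignment_rate_ok(17_500_000, 20_000_000))
```

Each command could be passed to `subprocess.run(command, check=True)` in order; the overall alignment rate is reported by Bowtie2 in its summary output.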

3.5.3 Peak-calling

Once all the sorted BAM files have been obtained, it is possible to call ChIP-Seq peaks. All the available methods developed for ChIP-Seq peak calling process the aligned reads in order to identify the peaks, separating the ChIP signal from the background and then assigning statistical significance to the peaks. Although many algorithms are available [22], the most widely used is MACS and, in particular, the updated version, MACS2 [23-24]. MACS2 is designed to build a peak model from the mapped reads, taking into account the strand of the reads. MACS2 then calls the peaks, identifies the summits within each peak and computes a p-value and a q-value. Peaks are called by the “callpeak” function. This is the main function of MACS2 and accepts a treatment (“t”), which is required, and a control (“c”) as input parameters. The “t” file is the BAM file containing the immunoprecipitated reads, while the “c” file contains the sonicated DNA fragments. A wide range of parameters, which give the user more power to customise the analysis, can be set before performing the peak call. However, the default parameters work efficiently in most cases (see Note 17). When the latter parameters are combined, the MACS2 output is normalized per 1 million reads, allowing a direct visualization of peak shape amongst samples. In addition, the parameter “--call-summits” allows the user to easily identify the subpeak summits. MACS2 will produce 6 output files, of which the “.xls”, “.bdg” and “.narrowPeak” files are predominantly used for a standard analysis. The “.xls” file provides a header followed by the peak calls. The header contains all the information about the commands used to produce the run and useful statistics, which can be checked for further quality control. The peak calls table shows the location of each peak and other statistical information, which is explained in the MACS2 documentation. The “.narrowPeak” and “.bdg” files can be imported directly into the UCSC genome browser [25] or the Integrative Genomics Viewer (IGV) [26], in order to easily visualize the location and the shape of the peak. Although the UCSC genome browser is widely used and integrated with thousands of genomic databases, we recommend that the user visualizes the called peaks with IGV. It provides a user-friendly interface, allowing the comparison of calls between replicates. Furthermore, a wide variety of data can be integrated to allow for an interactive exploration of the genome. Once ChIP-Seq peaks that fulfil the QC criteria have been obtained, data pre-processing can be considered complete and the results are ready for the second line analyses.
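A hedged sketch of the peak-calling step is given below: one Python function builds a MACS2 “callpeak” command and another parses a line of the resulting “.narrowPeak” file. The use of “-B --SPMR” to obtain bedGraph output normalized per million reads is an assumption about the parameters referred to in Note 17, and “-g hs” assumes a human sample; the file and sample names are hypothetical.

```python
def macs2_callpeak_command(treatment_bam, control_bam, name, outdir=".",
                           genome_size="hs", normalised_bedgraph=True,
                           call_summits=True):
    """Build (but do not run) a MACS2 callpeak command."""
    command = [
        "macs2", "callpeak",
        "-t", treatment_bam,  # immunoprecipitated reads (required)
        "-c", control_bam,    # Input (sonicated DNA) control
        "-f", "BAM",
        "-g", genome_size,    # 'hs' is the MACS2 shortcut for human
        "-n", name,
        "--outdir", outdir,
    ]
    if normalised_bedgraph:
        command += ["-B", "--SPMR"]  # bedGraph scaled per million reads
    if call_summits:
        command.append("--call-summits")
    return command


def parse_narrowpeak_line(line):
    """Parse one line of a narrowPeak (BED6+4) file into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    return {
        "chrom": fields[0],
        "start": int(fields[1]),
        "end": int(fields[2]),
        "name": fields[3],
        "score": int(fields[4]),
        "strand": fields[5],
        "signal_value": float(fields[6]),
        "neg_log10_pvalue": float(fields[7]),
        "neg_log10_qvalue": float(fields[8]),
        "summit_offset": int(fields[9]),  # summit position relative to start
    }


if __name__ == "__main__":
    print(" ".join(macs2_callpeak_command("chip.sorted.bam",
                                          "input.sorted.bam", "sample")))
    print(parse_narrowpeak_line(
        "chr1\t100\t400\tsample_peak_1\t250\t.\t8.5\t30.2\t27.1\t150"))
```

The parsed start and end coordinates are the same values needed later when peak lengths are calculated for the second line analyses.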

3.5.4 Second Line analyses

The comparison between two different conditions is a common type of analysis in ChIP-Seq experiments. The purpose of this experimental procedure is to evaluate changes at the epigenetic level between cells in control and treatment states. In the last few years, many algorithms [27] have been designed to systematically identify differences between ChIP-Seq samples. This has produced a great variety of approaches which, unfortunately, show poor agreement in their results. Here we propose a computational pipeline called Ranking Indexing (RI). RI determines the variation of peaks between samples, using the assumption that the enrichment of a peak is proportional to the number of cells in the sample carrying that particular information. The main output of the pipeline is the assignment of an RI to each peak call, allowing the identification of peaks varying in size between conditions.

The following are the steps for calculating the RI of each peak call:

1. Remove duplicates from each “.bam” file that contains the immunoprecipitated reads. This can be achieved using Picard “MarkDuplicates” with the parameter “REMOVE_DUPLICATES=true”. Then sort and index the resulting “.bam” file with SAMTOOLS.

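2. Count the number of reads in the “.bam” file obtained from step 1. The reads can be counted using “samtools view -c -F 0x904 TOT_READS.bam”, where “-c” prints the number of alignments and “-F 0x904” excludes unmapped, secondary and supplementary alignments. The resulting count is the “TOT_READS” value used in step 5.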

3. Calculate the read coverage breadth and depth of each peak call to obtain the “COV_VALUE”. The count can be performed with the BEDTOOLS “multicov” tool [28] using the standard parameters. The command requires a “.bed” file and a “.bam” file as input. The “.bed” file should contain the coordinates of the peak calls of a sample, while the “.bam” file should be the one obtained in step 1 from the same sample.

4. Calculate the “LENGTH” of each peak call by subtracting the “start” coordinate of the peak from its “end” coordinate.

5. Normalize the peak calls, calculating the “NORM_SCORE”.

$$\mathrm{NORM\_SCORE}_i = \frac{\left(\mathrm{COV\_VALUE}_i / \mathrm{LENGTH}_i\right) \cdot 10^{6} \cdot 10^{3}}{\mathrm{TOT\_READS}_i}$$

6. Sort the peak calls according to the “NORM_SCORE” from the highest to the lowest.

7. Assign to each “NORM_SCORE” the corresponding percentile value. This value is the RI associated with each peak.
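Steps 4 to 7 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that “COV_VALUE” and “TOT_READS” have already been obtained in steps 2 and 3, and it assumes one particular percentile definition (the percentage of peaks in the sample whose score is less than or equal to the peak's own score), since the text does not specify one. The class name, function names and example values are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass
class PeakCount:
    chrom: str
    start: int
    end: int
    cov_value: int  # COV_VALUE from step 3


def norm_score(peak: PeakCount, tot_reads: int) -> float:
    """Steps 4 and 5: length-normalized and depth-normalized score."""
    length = peak.end - peak.start  # step 4: LENGTH = end - start
    if length <= 0 or tot_reads <= 0:
        raise ValueError("peak length and TOT_READS must be positive")
    return (peak.cov_value / length) * 1e6 * 1e3 / tot_reads


def ranking_index(scores: List[float]) -> List[float]:
    """Step 7 (assumed definition): percentage of scores <= each score."""
    ordered = sorted(scores)
    n = len(ordered)
    return [100.0 * bisect_right(ordered, s) / n for s in scores]


# Hypothetical example values, for illustration only.
tot_reads = 20_000_000
peaks = [
    PeakCount("chr1", 1000, 1500, 200),
    PeakCount("chr1", 5000, 6000, 100),
    PeakCount("chr2", 300, 700, 400),
    PeakCount("chr2", 9000, 9250, 50),
]
scores = [norm_score(p, tot_reads) for p in peaks]
ri = ranking_index(scores)

# Step 6: sort the peak calls from the highest to the lowest NORM_SCORE.
table = sorted(zip(peaks, scores, ri), key=lambda row: row[1], reverse=True)
for peak, score, index in table:
    print(f"{peak.chrom}\t{peak.start}\t{peak.end}\t{score:.1f}\t{index:.1f}")
```

Because the RI is a rank within each sample, it can be compared between samples sequenced at different depths. A different percentile convention would shift the values slightly without changing the ordering of the peaks.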

Figure 3. Visual example of second line analysis. After assigning the RI, following the commands in italic, it will be possible to identify the lost peaks, the acquired peaks and the peaks which show differences between 2 conditions.

Once the RI has been calculated in all samples, a table with the significant peak coordinates and the associated RI is available for each sample and ready to be compared with the others. This approach is very useful for monitoring changes in the enrichment of the same peak across samples and for identifying peaks that are acquired or lost between conditions. Again, the BEDTOOLS suite is helpful here: “bedtools intersect -wa -wb -a Control.bed -b Treatment.bed” will identify the matching peaks, reporting the RI. The peaks that are lost between control and treatment are identified by adding the option “-v” to the above command, while the peaks acquired in the treatment are determined by using “-v” and switching the Treatment file to “-a” and the Control file to “-b”.
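The logic behind these comparisons can be sketched in plain Python. This is an illustrative re-implementation of the overlap test under the assumption of half-open BED coordinates and a minimum overlap of 1 bp. It is not a replacement for BEDTOOLS, which is far more efficient on real peak sets. The function names and peak values are hypothetical.

```python
from typing import List, Tuple

# (chrom, start, end, RI), using half-open BED coordinates
Peak = Tuple[str, int, int, float]


def overlaps(a: Peak, b: Peak) -> bool:
    """True when two peaks on the same chromosome share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def compare_peaks(control: List[Peak], treatment: List[Peak]):
    # Matching peaks: every overlapping (control, treatment) pair.
    matched = [(c, t) for c in control for t in treatment if overlaps(c, t)]
    # Lost peaks: control peaks with no overlap in the treatment.
    lost = [c for c in control if not any(overlaps(c, t) for t in treatment)]
    # Acquired peaks: treatment peaks with no overlap in the control.
    acquired = [t for t in treatment if not any(overlaps(t, c) for c in control)]
    return matched, lost, acquired


# Hypothetical peaks, for illustration only.
control = [("chr1", 100, 200, 90.0), ("chr1", 500, 600, 40.0), ("chr2", 50, 150, 70.0)]
treatment = [("chr1", 150, 250, 60.0), ("chr2", 400, 500, 80.0)]

matched, lost, acquired = compare_peaks(control, treatment)
for c, t in matched:
    print(f"{c[0]}:{c[1]}-{c[2]} RI change: {t[3] - c[3]:+.1f}")
print("lost:", lost)
print("acquired:", acquired)
```

For matched peaks, the difference between the two RI values gives a simple measure of the change in enrichment between conditions.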

4 Notes

1. Pipettes should be calibrated prior to performing the ChIP experiment to ensure accurate measuring and transfer of micro-volumes.

2. Frost-free microcentrifuge tubes ensure more efficient sonication.

3. This should be prepared freshly and can be used for up to 7 days before disposal.

4. Check with the sequencing facility that will sequence your samples and order the appropriate library assembly kit. It is also worthwhile checking which reagents are required alongside the library kits for preparation of samples for ChIP-Seq. For example, SPRIselect® or magnetic beads are used in conjunction with NEBNext® library kits for DNA size selection.

5. It is advised to start with 4 µg of antibody initially. This can be titrated in subsequent experiments following the assessment of the ChIP-qPCR results.

6. If the collected tissue is not used straight away, snap-freezing and storage at -80 °C or colder until future use will ensure maximal DNA preservation.

7. Add the solution to the side of the Petri dish so as not to disrupt the adherent cells.

8. If there are few or no DNA fragments below 500 bp, the samples may require further sonication, or the experiment may need to be repeated with a higher number of sonication cycles.

9. Ensure that pipettes are calibrated and sterilised for the library preparation phase.

10. An average read quality score higher than 30 (Illumina 1.9 encoding) is considered very good quality, while a score between 20 and 29 is considered reasonable quality. A quality score under 20 is considered very poor.

11. This should be higher for histone modifications but may be lower for transcription factors. In general, the higher the expected number of peaks, the higher the number of mapped reads should be.

12. Saturation plots can be easily obtained using ACT suite [29].

13. Generally, removing adapters, indices, 5 bp at the 5′ and 3′ ends, and reads with a PHRED score below 30 throughout is sufficient to obtain satisfactory results.

14. The chosen genome assembly can be downloaded from Illumina’s iGenomes (https://support.illumina.com/sequencing/sequencing_software/igenome.html).

15. Both tools are open source and default parameters work well in most cases. A good practice is to carefully examine and save the report of the mapping statistics produced at the end of the alignment.

16. A full explanation of the SAM format specification is available on the SAMTOOLS webpage.
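The quality thresholds in Note 10 can be illustrated with a short Python sketch that decodes a FASTQ quality string and classifies its mean score. It assumes Phred+33 encoding, which is the encoding used by Illumina 1.8 and later. The treatment of boundary values (a mean of exactly 30, or between 29 and 30) is an assumption, since the note does not define them. The function names are hypothetical.

```python
def mean_phred(quality_string: str, offset: int = 33) -> float:
    """Mean Phred score of one read, assuming Phred+33 ASCII encoding."""
    if not quality_string:
        raise ValueError("empty quality string")
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def classify_quality(mean_q: float) -> str:
    """Apply the thresholds of Note 10 (boundary handling is an assumption)."""
    if mean_q >= 30:
        return "very good"
    if mean_q >= 20:
        return "reasonable"
    return "very poor"


# "I" encodes Q40, "5" encodes Q20 and "+" encodes Q10 in Phred+33.
for quality in ("IIIIIIII", "55555555", "++++++++"):
    q = mean_phred(quality)
    print(quality, q, classify_quality(q))
```

In practice these statistics are reported per sample by dedicated quality control software. The sketch only shows how the scores relate to the characters stored in the FASTQ file.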

References

1. Park PJ (2009) ChIP-seq: advantages and challenges of a maturing technology. Nature reviews Genetics 10 (10):669-680.

2. Heintzman ND, Hon GC, Hawkins RD, Kheradpour P, Stark A, Harp LF, Ye Z, Lee LK, Stuart RK, Ching CW, Ching KA, Antosiewicz-Bourget JE, Liu H, Zhang X, Green RD, Lobanenkov VV, Stewart R, Thomson JA, Crawford GE, Kellis M, Ren B (2009) Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature 459 (7243):108-112.

3. Tolstorukov MY, Kharchenko PV, Goldman JA, Kingston RE, Park PJ (2009) Comparative analysis of H2A.Z nucleosome organization in the human and yeast genomes. Genome research 19 (6):967-977.

4. Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, Simon I, Zeitlinger J, Schreiber J, Hannett N, Kanin E, Volkert TL, Wilson CJ, Bell SP, Young RA (2000) Genome-wide location and function of DNA binding proteins. Science (New York, NY) 290 (5500):2306-2309.

5. Iyer VR, Horak CE, Scafe CS, Botstein D, Snyder M, Brown PO (2001) Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature 409 (6819):533-538.

6. Lieb JD, Liu X, Botstein D, Brown PO (2001) Promoter-specific binding of Rap1 revealed by genome-wide maps of protein-DNA association. Nat Genet 28 (4):327-334.

7. Horak CE, Snyder M (2002) ChIP-chip: a genomic approach for identifying transcription factor binding sites. Methods in enzymology 350:469-483

8. Weinmann AS, Yan PS, Oberley MJ, Huang TH, Farnham PJ (2002) Isolating human transcription factor targets by coupling chromatin immunoprecipitation and CpG island microarray analysis. Genes & development 16 (2):235-244.

9. Robertson G, Hirst M, Bainbridge M, Bilenky M, Zhao Y, Zeng T, Euskirchen G, Bernier B, Varhol R, Delaney A, Thiessen N, Griffith OL, He A, Marra M, Snyder M, Jones S (2007) Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nature methods 4 (8):651-657.

10. Johnson DS, Mortazavi A, Myers RM, Wold B (2007) Genome-wide mapping of in vivo protein-DNA interactions. Science (New York, NY) 316 (5830):1497-1502.

11. Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, Wang Z, Wei G, Chepelev I, Zhao K (2007) High-resolution profiling of histone methylations in the human genome. Cell 129 (4):823-837.

12. Landt SG, Marinov GK, Kundaje A, Kheradpour P, Pauli F, Batzoglou S, Bernstein BE, Bickel P, Brown JB, Cayting P, Chen Y, DeSalvo G, Epstein C, Fisher-Aylor KI, Euskirchen G, Gerstein M, Gertz J, Hartemink AJ, Hoffman MM, Iyer VR, Jung YL, Karmakar S, Kellis M, Kharchenko PV, Li Q, Liu T, Liu XS, Ma L, Milosavljevic A, Myers RM, Park PJ, Pazin MJ, Perry MD, Raha D, Reddy TE, Rozowsky J, Shoresh N, Sidow A, Slattery M, Stamatoyannopoulos JA, Tolstorukov MY, White KP, Xi S, Farnham PJ, Lieb JD, Wold BJ, Snyder M (2012) ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome research 22 (9):1813-1831.

13. Schmidt D, Wilson MD, Spyrou C, Brown GD, Hadfield J, Odom DT (2009) ChIP-seq: using high-throughput sequencing to discover protein-DNA interactions. Methods (San Diego, Calif) 48 (3):240-248.

14. Cock PJA, Fields CJ, Goto N, Heuer ML, Rice PM (2009) The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Research 38:1767–1771.

15. Fabbro CD, Scalabrin S, Morgante M, Giorgi FM (2013) An Extensive Evaluation of Read Trimming Effects on Illumina NGS Data Analysis. PLoS ONE.

16. Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120.

17. Zhao H, Sun Z, Wang J, Huang H, Kocher J-P, Wang L (2013) CrossMap: a versatile tool for coordinate conversion between genome assemblies. Bioinformatics 30:1006–1007.

18. Li H, Homer N (2010) A survey of sequence alignment algorithms for next-generation sequencing. Briefings in Bioinformatics 11:473–483.

19. Langmead B (2010) Aligning Short Sequencing Reads with Bowtie. Current Protocols in Bioinformatics.

20. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nature Methods 9:357–359.

21. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078–2079.

22. Wilbanks EG, Facciotti MT (2010) Evaluation of Algorithm Performance in ChIP-Seq Peak Detection. PLoS ONE.

23. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nussbaum C, Myers RM, Brown M, Li W, Liu XS (2008) Model-based Analysis of ChIP-Seq (MACS). Genome Biology.

24. Feng J, Liu T, Qin B, Zhang Y, Liu XS (2012) Identifying ChIP-seq enrichment using MACS. Nature Protocols 7:1728–1740.

25. Tyner C, Barber GP, Casper J, Clawson H, Diekhans M, Eisenhart C, Fischer CM, Gibson D, Gonzalez JN, Guruvadoo L, Haeussler M, Heitner S, Hinrichs AS, Karolchik D, Lee BT, Lee CM, Nejad P, Raney BJ, Rosenbloom KR, Speir ML, Villarreal C, Vivian J, Zweig AS, Haussler D, Kuhn RM, Kent WJ (2017) The UCSC Genome Browser database: 2017 update. Nucleic Acids Res 45 (D1):D626-D634.

26. Thorvaldsdottir H, Robinson JT, Mesirov JP (2012) Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings in Bioinformatics 14:178–192.

27. Steinhauser S, Kurzawa N, Eils R, Herrmann C (2016) A comprehensive comparison of tools for differential ChIP-seq analysis. Briefings in Bioinformatics.

28. Quinlan AR, Hall IM (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26:841–842.

29. Jee J, Rozowsky J, Yip KY, Lochovsky L, Bjornson R, Zhong G, Zhang Z, Fu Y, Wang J, Weng Z, Gerstein M (2011) ACT: aggregation and correlation toolbox for analyses of genome tracks. Bioinformatics 27:1152–1154.
