
MSc Chemistry

Analytical Sciences

Literature Thesis

Advances in Data Dependent and Data Independent

Acquisition for data analysis in proteomic research

by

Florian L. R. Lucas

11198877

September 2016

12 EC

Supervisor:

Examiner:

dr. Irena Dapic

dr. Garry L. Corthals

dr. Garry L. Corthals

dr. Wim Th. Kok


Abstract

Modern proteomic research is commonly based on liquid chromatography – tandem mass spectrometry1. Generally, the proteins are digested and the resulting peptides are ionized by electrospray ionization2. The peptide ions are sent to the first mass analyser, where a window of ions is selected to be fragmented and analysed by the second mass analyser1,2. The sequence of operations used to obtain a spectrum is called acquisition. Two main streams of acquisition exist: data-dependent (DDA) and data-independent (DIA)1–3. The main difference between them is that the window selection of the first mass analyser changes dynamically with the data during DDA, whereas during DIA the window is used to scan the complete spectrum.

As with acquisition, data analysis consists of two main streams of methods: the database search, which compares the measured spectra with a known database, and the de novo search, in which candidate sequences are built to match the measured spectrum2,3.

In this review, several DDA and DIA methods are discussed to show which methods are best used and which type of acquisition is expected to become the main method in the years to come. The analysis methods are also compared on usability; however, no further prospects are given for them.

The parallel accumulation-serial fragmentation (PASEF) method is shown to perform best in both identification and quantification; however, sequential window acquisition of all theoretical fragment ion mass spectra (SWATH MS) is shown to offer the best trade-off when accessibility is taken into account.

Of the analysis methods, Andromeda is shown to be the most widely applicable, the easiest to use and the most widely accepted.

The general conclusion is that DDA is fully developed, whereas DIA is still in its final development phase. The advantages of DIA over classical DDA approaches on modern instruments are visible in both identification and quantification. A rise in applications of DIA methods is therefore expected in the coming years.


Abbreviations

ESI : electrospray ionization

HPLC : high-performance liquid chromatography

MS : mass spectrometer

DDA : data-dependent acquisition

AMEx : accurate mass exclusion-based data-dependent acquisition

LC-MS : liquid chromatography – mass spectrometry

m/z : mass-to-charge ratio

SWATH MS : sequential window acquisition of all theoretical fragment ion mass spectra

PASEF : parallel accumulation-serial fragmentation

DIA : data-independent acquisition

TOF : time-of-flight

QMF : quadrupole mass filter

FT-ICR : Fourier transform ion cyclotron resonance

TIMS : trapped ion mobility spectrometry

ETD : electron transfer dissociation

XDIA : extended data-independent acquisition

CID : collision-induced dissociation

AIF : all ion fragmentation

HCD : higher-energy collisional dissociation (also higher-energy C-trap dissociation)

qTOF : quadrupole time-of-flight (RF-only, transmitting quadrupole)

TQMF : triple quadrupole mass filter

FID : free induction decay

FDR : false discovery rate

NRP : nonribosomal peptides

HDA : hybrid data acquisition

DT DDA : decision tree-driven data-dependent acquisition

PAcIFIC : precursor acquisition independent from ion count

qPAcIFIC : quantitation precursor acquisition independent from ion count

PQD : pulse-Q-dissociation

BSA : bovine serum albumin

UPS : universal protein sample

LIT-Orbitrap : linear ion trap – Orbitrap

FT-ARM : Fourier transform all reaction monitoring

MSX : multiplexed data-independent acquisition

pSMART : hybrid data acquisition and processing strategy

EDTA : ethylenediaminetetraacetic acid

QTOF : quadrupole time-of-flight

topN : selection of the N most intense precursor ions for fragmentation

OMSSA : open mass spectrometry search algorithm


List of figures

FIGURE 1 TIMELINE BETWEEN 2007 AND 2016

FIGURE 2 TRAPPED ION MOBILITY SPECTROMETER

FIGURE 3 SCHEME FOR THE PASS THROUGH OF THE PRIMARY MASS SPECTROMETER IN DDA AND DIA

FIGURE 4 SCHEMATIC WORKFLOW OF DATABASE SEARCH ALGORITHMS

FIGURE 5 SCHEMATIC WORKFLOW OF DE NOVO SEARCH ALGORITHMS

FIGURE 6 SCHEME OF DT PROBABILISTIC DECISION TREE

FIGURE 7 COMPARISON OF AMEX AND STANDARD DDA WORKFLOW

FIGURE 8 SCHEME OF ORBITRAP MS, REGULAR AND AIF

FIGURE 9 SWATH MS SCHEME

FIGURE 10 PSMART SEQUENCE

FIGURE 11 SCHEME OF PASEF


Table of Contents

Abstract

Abbreviations

List of figures

1. Introduction

1.1 Aim of the study

1.2 MS instrumentation

Fragmentation methods

Mass analyzers

1.3 Data acquisition in proteomics: an introduction

1.4 Data analysis in proteomics: a general overview

2. Advances in LC-MS/MS data acquisition and analysis

2.1 Advances in LC-MS/MS data Acquisition

2.1.1 DT DDA (2008) [DDA]

2.1.2 AMEx (2009) [DDA]

2.1.3 PAcIFIC (2009/2011) [DIA]

2.1.4 AIF (2010) [DIA]

2.1.5 XDIA (2010) [DIA]

2.1.6 SWATH-MS (2012) [DIA]

2.1.7 FT-ARM (2012) [DIA]

2.1.8 MSX (2013) [DIA]

2.1.9 pSMART (2014) [HDA]

2.1.10 PASEF (2015) [DIA]

2.2 Database search algorithms

2.2.1 SEQUEST/Comet (1994/2013)

2.2.2 Andromeda (2011)

2.2.3 MassWiz (2011)

2.2.5 X!Tandem (2004 renewed in 2008)

2.3 De novo search algorithms

2.3.1 MSNovo (2007)

2.3.2 ADEPT (2010)

2.3.3 Antilope (2012)

2.3.4 pNovo+ (2013)

2.3.5 UniNovo (2013)

3. Discussion

3.1 Comparison of acquisition methods

3.2 Comparison of data analysis methods

4. Conclusions

5. Future proposals

Acknowledgements

Literature


1. Introduction

The field of proteomics is relatively young: the term ‘proteome’ was coined approximately two decades ago by Wilkins et al.4,5. The proteome is defined as the entire protein complement expressed by a genome, or by a cell or tissue type4,6.

Measuring proteins by mass spectrometry became feasible after electrospray ionization (ESI) was introduced for proteins7 in 1988. ESI is a soft ionization technique, which allows large molecules to be ionized with little fragmentation1. Because ESI ionizes peptides reproducibly, two main approaches to peptide analysis emerged.

A method for the structural analysis of peptides by high-performance liquid chromatography (HPLC) combined with ESI – tandem mass spectrometry (MS) was proposed by Griffin et al.8 in 1991. This method grew into what is now known as data-dependent acquisition (DDA), although it took several more years to become a practical method.

In 1994, Wilm and Mann9 showed another promising method for peptide analysis that exploits the ESI capillary. They loaded a peptide mixture into a capillary and used field evaporation to separate the peptides. Compared with the method of Griffin et al., this method showed several flaws, and it is of minor use in modern proteomics.

Improvements in HPLC and the reduced cost of HPLC instruments allowed DDA to become the standard for proteomic research. Several methods based on the original concept of Griffin et al. have been proposed in recent years8,10.

The Google Scholar search engine shows a steady increase in publications in proteomic journals (see Supplementary material 1). Proteomics is defined as the large-scale study of proteins and the systematic study of protein structure1,10–12. Much of the proteomic research of the last decade is based on liquid chromatography – mass spectrometry (LC-MS)1,10–12. The most common form of MS in proteomics is tandem MS, as described by Griffin et al.8,10.


Currently, proteomic LC-MS/MS follows two major directions: top-down and bottom-up. In a top-down measurement, the complete protein is ionized and analyzed by mass spectrometry2,13. A bottom-up LC-MS/MS measurement consists of first digesting the protein(s) in question and then analyzing the protein fragments (peptides) by LC-MS/MS2. This review focuses on the acquisition and analysis of peptide-based (bottom-up) proteomics.
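The digestion step of a bottom-up workflow is often simulated in silico when a search database is built. The sketch below assumes trypsin as the protease (the text does not name one) and applies the simplified rule that trypsin cleaves after K or R unless the next residue is P; the function name `tryptic_digest` and its `missed_cleavages` parameter are illustrative and not taken from any particular tool.

```python
def tryptic_digest(sequence: str, missed_cleavages: int = 0) -> list[str]:
    """Split a protein sequence into peptides with a simplified trypsin rule.

    Assumption: cleave C-terminal to K or R, except when the next residue is P.
    """
    # Positions after which the chain is cut (never after the last residue).
    sites = [
        i + 1
        for i, residue in enumerate(sequence[:-1])
        if residue in "KR" and sequence[i + 1] != "P"
    ]
    bounds = [0] + sites + [len(sequence)]
    peptides = []
    for start in range(len(bounds) - 1):
        # Join up to `missed_cleavages` neighbouring pieces to model incomplete digestion.
        last = min(start + 1 + missed_cleavages, len(bounds) - 1)
        for end in range(start + 1, last + 1):
            peptides.append(sequence[bounds[start]:bounds[end]])
    return peptides


print(tryptic_digest("AAKPLLRGGKCCR"))
# ['AAKPLLR', 'GGK', 'CCR']
```

The K in "AAKP" is followed by P and is therefore not cleaved, which is why AAKPLLR stays intact in this toy sequence.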

Proteins can be fragmented in numerous ways, providing vast amounts of data that are analyzed with methods developed specifically for proteomic research (see Chapter 2.1). Protein analysis requires a tremendous number of operations and calculations, as even smaller proteins consist of several hundred to thousands of amino acids14. Both the operations performed during measurement (acquisition) and the post-processing calculations needed to determine protein structure have become steadily more available15. Figure 1 shows a timeline of several acquisition and analysis techniques developed in the last decade.

Figure 1. Timeline between 2007 and 2016 with several major data acquisition and analysis techniques. The header between brackets shows the type of technique and the subheader shows the abbreviated name of the method in question.


To give a clear understanding of the techniques discussed here, the basic workings of the MS are briefly described first. ESI is assumed as the ion source unless stated otherwise.

Acquiring a useful LC-MS/MS spectrum is not trivial. The first mass analyzer serves as a feed of parent ions for fragmentation: only a window of parent ions is allowed to pass and undergo fragmentation. If this window is too wide, too many fragments reach the detector after the second mass analyzer. Such a large number of fragments is likely to contain overlapping mass-to-charge ratios (m/z), rendering the spectrum useless for identification or quantification. A narrower window makes the spectrum more selective, but this selectiveness carries a high risk of losing valuable data, mainly because fewer parent ions are analyzed within a given time frame.
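The trade-off between window width and time per window can be made concrete with a back-of-the-envelope calculation. The sketch below assumes a fixed cycle time that must cover the whole m/z range and precursors spread uniformly over m/z; the function `window_tradeoff`, the 2 s cycle and the density of 0.5 precursors per m/z unit are hypothetical, and real precursor distributions are far from uniform.

```python
import math


def window_tradeoff(mz_start, mz_end, width, cycle_time_s, precursors_per_mz):
    """Return (windows per cycle, time per window in s, expected precursors per window).

    Assumes a fixed cycle time and a uniform precursor density over m/z.
    """
    n_windows = math.ceil((mz_end - mz_start) / width)
    time_per_window = cycle_time_s / n_windows
    coisolated = precursors_per_mz * width  # expected precursors fragmented together
    return n_windows, time_per_window, coisolated


for width in (2, 10, 25, 100):
    n, t, c = window_tradeoff(400, 1400, width, cycle_time_s=2.0, precursors_per_mz=0.5)
    print(f"{width:>4} m/z: {n:>4} windows, {1000 * t:6.1f} ms each, ~{c:.1f} precursors per window")
```

Narrow windows keep the number of co-isolated precursors low but leave less measurement time per window, which is the loss of information described above.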

To address this issue, many publications over the last decade have introduced new (or renewed) techniques to both acquire and analyze LC-MS/MS spectra for proteomics11,16–19.

As will be shown in Chapter 2, both acquisition and data analysis play a major role in modern proteomics. Acquisition techniques such as sequential window acquisition of all theoretical fragment ion mass spectra (SWATH MS or SWATH) (Chapter 2.1.6) and parallel accumulation-serial fragmentation (PASEF) (Chapter 2.1.10) show that not all problems can be solved with a single technique. Even when one acquisition technique suffices, the choice of data analysis algorithm can make a difference: Chapters 2.2 and 2.3 show that there are many ways to interpret an LC-MS/MS spectrum, and they do not always lead to the same result. How to combine acquisition and analysis is shown in Chapter 3, where all techniques are critically compared. The final part of this review addresses the question ‘which acquisition and analysis technique(s) are best suited for solving generic proteomic problems?’.


1.1 Aim of the study

This study aims to bring the reader up to date with the advances made between 2006 and mid-2016 in data-dependent acquisition (DDA) and data-independent acquisition (DIA), as well as in their respective data analysis techniques. The acquisition and analysis techniques are compared to help select the appropriate acquisition and analysis technique for a given experimental question, allowing rapid selection of an experimental setup. The central topic of this research is what has been published in the last decade and what can be expected within the next few years.

1.2 MS instrumentation

The first step in acquiring an MS spectrum is ionization of the molecules, followed by mass analysis of the different ions1,10. The most widely used ionization technique in peptide analysis is ESI20. After ionization, the peptides are analyzed by mass analyzers such as time-of-flight (TOF)21, quadrupole mass filter (QMF)22, Fourier transform ion cyclotron resonance (FT-ICR)23 or (trapped) ion mobility devices (TIMS) in combination with TOF1,10,22.

A regular mass spectrometer uses a detector that registers the number of counts for any given m/z within a pre-defined window1. In proteomics, tandem MS (MS/MS) is widely used to identify peptides by their fragment (and parent) ions1. The first mass analyzer of a tandem MS selects the parent ion by passing only a window of m/z to the detector or to the second mass analyzer. In the latter case, the parent ion is fragmented and the fragments are measured by the second mass analyzer. The recorded fragmentation spectrum allows the structure of the parent ion to be determined, and thus the identity of the peptide eluted from the LC10.
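How a fragmentation spectrum encodes the peptide sequence can be illustrated with the b- and y-ion series produced by backbone cleavage; the text does not name ion types, so this is an added illustration. The sketch computes singly charged b and y ions from standard monoisotopic residue masses, assuming unmodified residues and no neutral losses; the function names are illustrative.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids, unmodified.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # Da
PROTON = 1.007276  # Da


def precursor_mh(peptide: str) -> float:
    """Singly protonated mass [M+H]+ of the intact peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def fragment_ions(peptide: str) -> tuple[list[float], list[float]]:
    """Singly charged b1..b(n-1) and y1..y(n-1) ions."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    total = 0.0
    for mass in masses[:-1]:  # N-terminal fragments
        total += mass
        b_ions.append(total + PROTON)
    total = 0.0
    for mass in reversed(masses[1:]):  # C-terminal fragments keep the water
        total += mass
        y_ions.append(total + WATER + PROTON)
    return b_ions, y_ions


b, y = fragment_ions("PEPTIDE")
print([round(m, 3) for m in b])
# [98.06, 227.103, 324.155, 425.203, 538.287, 653.314]
```

The mass difference between consecutive ions of one series is the mass of a single residue, which is what makes the sequence readable from the spectrum.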

Electrospray ionization (ESI)

In ESI, the eluate from the LC is sprayed as charged droplets, and the molecules are ionized as the solvent evaporates on the way from atmospheric pressure into the lower-pressure region of the instrument3,24. As the droplets shrink, Coulomb fission decreases their size further, allowing charge to be transferred from the solvent to the analyte. The charged peptides are accelerated towards the exit slit of the ESI source (the entry slit of the mass analyzer)3,24; a scheme of ESI is shown in Supplementary material 2.

Fragmentation methods

Electron transfer dissociation (ETD)

ETD can be used to fragment large, multiply charged molecules in the gas phase. In ETD, radical anions are mixed with the positively charged target peptides. Upon electron transfer from the radical anions to the peptides, bonds in the relatively weak peptide backbone can break25. Because the bond cleavage is rather direct, post-translational modifications are kept intact, which is required for extended data-independent acquisition (XDIA) (Chapter 2.1.5)26.

Collision-induced dissociation (CID)

CID is another method for fragmenting target peptides in the gas phase. During CID, the target peptides are accelerated (usually by an electrical potential) to a high kinetic energy; once their energy is high enough, they are allowed to collide with neutral gas molecules. CID has an advantage over ETD for singly charged molecules and allows better control over fragment size27. CID is used in approximately half of the acquisition techniques described in Chapter 2.1.

During all ion fragmentation (AIF) (Chapter 2.1.4), a special type of CID called higher-energy collisional dissociation (HCD, also higher-energy C-trap dissociation) is used. HCD is specific to Orbitrap instruments, in which the fragmentation takes place outside the Orbitrap analyzer28,29.


Mass analyzers

Time-of-flight (TOF)

After ionization, the ions are separated by m/z in a mass analyzer. The most widely used mass analyzer is TOF. A TOF in its simplest (linear) form accelerates ions with one (or more) electric fields towards a detector. The velocity of the ions in the field-free flight path is a function of their m/z, so the time between acceleration and detection can be used to calculate the m/z of the ionized peptides. This method is relatively fast, as a spectrum can be measured within milliseconds21; as will be shown in Chapter 2.1.10, the PASEF method uses sub-millisecond detection19. Another advantage is that a spectrum can be measured with different acceleration speeds, providing higher-resolution spectra because the acceleration errors average out21.
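The relation between flight time and m/z follows from the energy balance of the accelerated ion, zeU = mv²/2, with t = L/v. The sketch below assumes an idealized linear TOF in which all ions start at rest from the same point (no reflectron, no initial energy spread); the 20 kV acceleration voltage and 1 m drift length are hypothetical values chosen only for illustration.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge, C
DALTON = 1.66053906660e-27  # unified atomic mass unit, kg


def flight_time(mz: float, accel_voltage: float, drift_length: float) -> float:
    """Flight time (s) of an ion with the given m/z (Th) in an idealized linear TOF."""
    m_over_q = mz * DALTON / E_CHARGE  # kg per coulomb
    velocity = math.sqrt(2 * accel_voltage / m_over_q)  # from z*e*U = m*v**2 / 2
    return drift_length / velocity


def mz_from_flight_time(t: float, accel_voltage: float, drift_length: float) -> float:
    """Invert the relation: m/q = 2*U*t**2 / L**2, converted back to Th."""
    m_over_q = 2 * accel_voltage * t**2 / drift_length**2
    return m_over_q * E_CHARGE / DALTON


t = flight_time(1000.0, accel_voltage=20e3, drift_length=1.0)
print(f"{t * 1e6:.1f} us")  # about 16 us for these assumed settings
```

Because the flight time scales with the square root of m/z, a four-fold higher m/z only doubles the flight time.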

Quadrupole mass filter (QMF)

QMFs are among the most commonly used mass analyzers10, and a QMF is often coupled to a TOF (qTOF). A quadrupole mass analyzer consists of four parallel rods, between which the ions travel along the z direction. The potential on the rods is varied at a radio frequency, which makes the trajectories of most ions unstable as they travel along z; an ion with an unstable trajectory collides with the rods. Only ions whose m/z gives a stable trajectory along the entire z axis reach the detector30.

A QMF that selects a given m/z is abbreviated as Q, whereas an RF-only quadrupole that transmits (is stable for) a wide range of m/z is abbreviated as q.

QMFs can be coupled in series; a common configuration is the triple quadrupole (TQMF), which consists of three quadrupoles, usually in a QqQ arrangement in which the middle quadrupole allows CID (Chapter 1.2).

Compared with TOF, a QMF is more precise in m/z selection; a TOF, however, can measure a broader m/z range in less time.
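Whether an ion passes a quadrupole can be expressed with the Mathieu stability parameter q; this formalism is an addition, as the text describes stability only qualitatively. The sketch assumes the common convention q = 4eV/(m r0² Ω²), with V the zero-to-peak RF amplitude, r0 the field radius and Ω the angular RF frequency, and covers only the RF-only case (DC term a = 0) that the text abbreviates as q. The rod radius, RF amplitude and frequency are hypothetical. In mass-filter (Q) mode a DC voltage is added so that only a narrow m/z band is transmitted, which this sketch does not model.

```python
import math

E_CHARGE = 1.602176634e-19  # C
DALTON = 1.66053906660e-27  # kg
Q_STABILITY_LIMIT = 0.908  # edge of the first stability region on the a = 0 axis


def mathieu_q(mz, rf_amplitude_v, r0_m, rf_frequency_hz):
    """Mathieu q = 4*e*V / (m * r0**2 * Omega**2) for an ion of the given m/z (Th)."""
    m_over_q = mz * DALTON / E_CHARGE
    omega = 2 * math.pi * rf_frequency_hz
    return 4 * rf_amplitude_v / (m_over_q * r0_m**2 * omega**2)


def transmitted_rf_only(mz, rf_amplitude_v, r0_m, rf_frequency_hz):
    """With no DC voltage (a = 0), a trajectory is stable if q < 0.908."""
    return mathieu_q(mz, rf_amplitude_v, r0_m, rf_frequency_hz) < Q_STABILITY_LIMIT


def low_mass_cutoff(rf_amplitude_v, r0_m, rf_frequency_hz):
    """Smallest m/z (Th) that still has a stable trajectory in RF-only mode."""
    omega = 2 * math.pi * rf_frequency_hz
    m_over_q = 4 * rf_amplitude_v / (Q_STABILITY_LIMIT * r0_m**2 * omega**2)
    return m_over_q * E_CHARGE / DALTON


cutoff = low_mass_cutoff(rf_amplitude_v=500.0, r0_m=4e-3, rf_frequency_hz=1e6)
print(f"ions below ~{cutoff:.0f} Th are unstable and hit the rods")
```

Since q is inversely proportional to m/z, an RF-only quadrupole passes everything above a low-mass cutoff, which is why it is "stable for a wide range of m/z".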

Fourier transform ion cyclotron resonance (FT-ICR)

The third most common type of mass analyzer is FT-ICR23. Although FT-ICR is a rather complicated technique, it can be reduced to a few basic concepts31. The ions, held in a strong magnetic field, initially move incoherently. They are excited by a uniform field rotating at the cyclotron frequency of the m/z to be measured (the excitation field), which makes them rotate in phase (coherently). After the excitation field is removed, the coherent ion packets induce an image current in the detection plates that decays as the ions lose coherence; this signal is recorded as the free induction decay (FID). A Fourier transform converts the time-domain FID into frequencies, which are inversely proportional to the m/z of the species in the sample31. Complex forms of FT-ICR are used in acquisition techniques for the primary MS1, as this detection is essentially nondestructive. Accurate mass exclusion-based data-dependent acquisition (AMEx, Chapter 2.1.2) and XDIA (Chapter 2.1.5) use an Orbitrap, which works on a principle similar to FT-ICR, to acquire the MS1 spectrum quickly.
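The chain described above (cyclotron motion, FID, Fourier transform, m/z) can be simulated for a single ion species. The sketch assumes the unperturbed cyclotron relation f = zeB/(2πm), ignoring trapping-field shifts, and uses a hypothetical 7 T magnet, sampling rate and decay time; a naive discrete Fourier transform from the standard library stands in for the FFT used in practice.

```python
import cmath
import math

E_CHARGE = 1.602176634e-19  # C
DALTON = 1.66053906660e-27  # kg


def cyclotron_frequency(mz: float, b_field_t: float) -> float:
    """Unperturbed cyclotron frequency (Hz): f = z*e*B / (2*pi*m)."""
    return b_field_t / (2 * math.pi * mz * DALTON / E_CHARGE)


def mz_from_frequency(freq_hz: float, b_field_t: float) -> float:
    """m/z is inversely proportional to the measured frequency."""
    return b_field_t / (2 * math.pi * freq_hz) * E_CHARGE / DALTON


def simulate_fid(mz, b_field_t, sample_rate_hz, n_samples, decay_time_s):
    """Damped cosine: image current of one coherent ion packet losing coherence."""
    f = cyclotron_frequency(mz, b_field_t)
    return [
        math.cos(2 * math.pi * f * k / sample_rate_hz)
        * math.exp(-k / (sample_rate_hz * decay_time_s))
        for k in range(n_samples)
    ]


def dominant_frequency(signal, sample_rate_hz):
    """Naive DFT (O(n^2)); returns the frequency of the strongest positive bin."""
    n = len(signal)
    best_bin, best_magnitude = 1, 0.0
    for k in range(1, n // 2):
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        if abs(coefficient) > best_magnitude:
            best_bin, best_magnitude = k, abs(coefficient)
    return best_bin * sample_rate_hz / n


fid = simulate_fid(1000.0, b_field_t=7.0, sample_rate_hz=5e5, n_samples=1024, decay_time_s=1e-3)
freq = dominant_frequency(fid, 5e5)
print(f"{freq / 1e3:.1f} kHz -> m/z {mz_from_frequency(freq, 7.0):.1f}")
```

The recovered m/z is close to, but not exactly, 1000 because the short transient limits the frequency resolution to sample_rate/n (about 490 Hz here); longer transients give finer frequency resolution and therefore higher mass resolution.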

(Trapped) ion mobility spectrometer (TIMS)

Figure 2. Trapped ion mobility spectrometer. The ions enter the TIMS device from the left side, where they are brought into an increasing negative field gradient (purple) by means of the positive electric field (green). Pressure is built up in the TIMS chamber, the valve at the right side of the TIMS is opened and the charge on the negative and positive poles is decreased, releasing the ions in order of their ion mobility19.

Another mass analyzer, used in fewer applications than the quadrupole, is TIMS19. In simplified form, TIMS works as follows. The peptides eluting from the LC are ionized, and the ions are transferred into a vacuum system and focused towards the entrance slit of the TIMS tunnel, which is sealed and consists of pairs of electrodes. Upon entering, the ions experience the drag of the incoming gas as well as a counteracting electric field that slows them down, which separates them by ion mobility. Accumulation is performed by closing the TIMS tunnel and inverting the potential of the deflection plate. The ions in the tunnel are then released by lowering the potential of the electrode pairs. The output of the TIMS can be detected by a TOF: the ions leave the TIMS at different times according to their ion mobility, which depends on their charge and collisional cross section (size and shape) rather than on m/z alone.

TIMS as described here allows faster analysis, which PASEF requires to achieve sub-millisecond analysis (Chapter 2.1.10).
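The trapping and release described above can be captured in a simple model: each ion is held where the gas drag balances the electric force (v_gas = K·E, with K the ion mobility) and is released once the decreasing field falls below v_gas/K. This is a sketch under idealized assumptions (linear field ramp, no diffusion, no space charge); the function `tims_elution` and all mobilities, the gas velocity and the field values are hypothetical.

```python
def tims_elution(mobilities, gas_velocity, field_start, field_end, ramp_time):
    """Idealized TIMS ramp: return (ion, release time in s) pairs in elution order.

    mobilities: ion name -> mobility K in m^2/(V*s) at the tunnel pressure.
    An ion is held where gas_velocity == K * E and elutes once the field,
    ramped linearly from field_start down to field_end (V/m), drops below
    gas_velocity / K.
    """
    released = []
    for name, k in mobilities.items():
        e_release = gas_velocity / k
        # Only ions trapped at the start of the ramp and released before its end.
        if field_end < e_release <= field_start:
            fraction = (field_start - e_release) / (field_start - field_end)
            released.append((name, fraction * ramp_time))
    return sorted(released, key=lambda item: item[1])


# Hypothetical values; lower mobility (larger cross section) means earlier release.
order = tims_elution({"A": 0.030, "B": 0.040, "C": 0.050}, gas_velocity=100.0,
                     field_start=4000.0, field_end=1500.0, ramp_time=0.1)
print(order)
```

Ions with lower mobility need a stronger field to be held against the gas flow, so they are the first to leave as the field is lowered.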


1.3 Data acquisition in proteomics: an introduction

Figure 3. Scheme of the ions passed by the primary mass spectrometer in DDA (A) and DIA (B). Consider a single m/z channel to be most abundant between t1 and t8. In A, the isolation window is focused on a single m/z window that can move according to pre-defined rules. In B, the m/z window is stepped through the range regardless of the data.

In general, LC-MS/MS proteomic data acquisition can be divided into two main classes: data-dependent acquisition (DDA) and data-independent acquisition (DIA)11,32,33.

Many DDA measurements acquire a spectrum as follows. The tandem MS selects the m/z window of interest in the first MS to be fragmented in the second MS; this selection is usually made dynamically by pre-defined rules, e.g. highest signal intensity. Only the m/z windows that satisfy these rules are recorded, so a large portion of the lower-abundance ions is usually not recorded. This imposes a large bias, although it should increase sensitivity, because a single (parent) peptide can be measured for a relatively long time. As mentioned, the selectiveness of DDA methods can cause lower-abundance peptides to be missed34–37, making DDA less suitable for most qualitative proteomics.

For protein quantification, DDA is generally expected to provide higher sensitivity than its data-independent counterpart. As shown in Figure 3A, the m/z window is selected for a known period of time.
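The pre-defined selection rules of DDA can be sketched as a top-N selection with dynamic exclusion, a common rule set. The text names highest intensity as one example rule; the dynamic exclusion step, the function name `topn_dda` and all parameter values and peaks below are assumptions made for illustration.

```python
def topn_dda(survey_scans, top_n=3, exclusion_s=30.0, mz_tol=0.01):
    """Pick precursors per survey scan: the N most intense, skipping excluded m/z.

    survey_scans: list of (time_s, [(mz, intensity), ...]) in acquisition order.
    Returns a list of (time_s, [selected mz values]).
    """
    exclusions = []  # (mz, time until which it stays excluded)
    schedule = []
    for time_s, peaks in survey_scans:
        # Drop exclusions that have expired by the time of this scan.
        exclusions = [(mz, until) for mz, until in exclusions if until > time_s]
        selected = []
        for mz, _intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
            if len(selected) == top_n:
                break
            if any(abs(mz - ex_mz) <= mz_tol for ex_mz, _ in exclusions):
                continue  # fragmented recently: give other precursors a turn
            selected.append(mz)
            exclusions.append((mz, time_s + exclusion_s))
        schedule.append((time_s, selected))
    return schedule


peaks = [(500.27, 1e6), (612.31, 4e5), (733.40, 2e4), (845.45, 3e3)]
for time_s, chosen in topn_dda([(0.0, peaks), (1.0, peaks)], top_n=2):
    print(time_s, chosen)
# 0.0 [500.27, 612.31]
# 1.0 [733.4, 845.45]
```

Without the exclusion list the same two abundant ions would be selected in every scan, which illustrates the bias towards high-abundance ions described above.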

In contrast to DDA measurements, DIA does not require prior knowledge of the (protein) composition of the sample. This makes it useful for the identification of “unknown proteins”, which is also the aim of de novo search (‘de novo’ meaning ‘anew’)32. A DIA measurement works similarly to a DDA measurement; in DIA, however, the m/z window of the primary MS is pre-defined to step through the whole m/z range for fragmentation, allowing the complete mixture to be measured rather than just high-intensity precursors32. Figure 3B shows a SWATH MS analysis (Chapter 2.1.6), in which the open window is moved in time.

In identification measurements, the advantage of DIA is clear: it is less likely than DDA to miss species that deviate from what is expected. In quantification, however, DIA can suffer from low sensitivity, because the complete spectrum must be scanned, reducing the acquisition time per data point.
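In contrast with the DDA rules, a DIA cycle can be written down before the run starts. The sketch below generates fixed isolation windows across an assumed 400 to 1400 m/z range; the optional overlap reproduces the 20 m/z windows overlapping by 1 m/z described for XDIA in Chapter 2.1.5, although the m/z range used there is not stated and is assumed here. The function name `dia_windows` is illustrative.

```python
def dia_windows(mz_start: float, mz_end: float, width: float, overlap: float = 0.0):
    """Fixed isolation windows stepped across [mz_start, mz_end], independent of the data."""
    if mz_end <= mz_start or not 0.0 <= overlap < width:
        raise ValueError("need mz_end > mz_start and 0 <= overlap < width")
    step = width - overlap
    windows = []
    k = 0
    while True:
        low = mz_start + k * step  # computed from k to avoid accumulating rounding errors
        high = min(low + width, mz_end)
        windows.append((low, high))
        if high >= mz_end:
            return windows
        k += 1


cycle = dia_windows(400, 1400, 25)
xdia_like = dia_windows(400, 1400, 20, overlap=1)
print(len(cycle), len(xdia_like), xdia_like[:2])
```

Because the schedule never looks at the survey spectrum, every precursor in the range is fragmented once per cycle; the cost is that the cycle time is shared among all windows, which lowers the time per data point.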

1.4 Data analysis in proteomics: a general overview

Different acquisition techniques come with distinct techniques for analyzing the obtained data. The two main groups of proteomic spectral data analysis algorithms are database search and de novo search algorithms38.

Database search algorithms use a pre-defined database of known protein sequences. Many algorithms have been proposed to achieve the highest selectivity and efficiency in the assignment of spectra. Examples include the Andromeda algorithm as implemented in the MaxQuant environment16 (Chapter 2.2.2) and the Comet algorithm39 (Chapter 2.2.1).

The general workflow of these algorithms is as follows. First, a database of possible spectra, known as the target spectra, is generated or loaded. Second, the target spectra are compared with the measured spectrum using a scoring function. The scores yield a list of the peptides most likely to be present in the sample. This workflow is drawn schematically in Figure 4.
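The comparison step can be illustrated with one of the simplest scoring functions, the shared peak count: the number of target fragment m/z values matched by a measured peak within a tolerance. This is an illustration only; it is not the score used by SEQUEST/Comet, Andromeda or any other algorithm in Chapter 2.2, and the function names, peptide names and peak lists are hypothetical.

```python
from bisect import bisect_left


def shared_peak_count(measured, theoretical, tol=0.02):
    """Count theoretical fragment m/z values that have a measured peak within +/- tol."""
    peaks = sorted(measured)
    matches = 0
    for mz in theoretical:
        i = bisect_left(peaks, mz - tol)  # first peak not below the tolerance window
        if i < len(peaks) and peaks[i] <= mz + tol:
            matches += 1
    return matches


def rank_targets(measured, target_db, tol=0.02):
    """Score every target spectrum and return (score, name) pairs, best first."""
    scored = [(shared_peak_count(measured, spectrum, tol), name)
              for name, spectrum in target_db.items()]
    return sorted(scored, key=lambda item: (-item[0], item[1]))


measured = [175.119, 262.151, 363.199, 476.283]
targets = {
    "PEP_A": [175.119, 262.151, 363.199],
    "PEP_B": [175.119, 300.000, 400.000],
    "PEP_C": [100.000, 200.000],
}
print(rank_targets(measured, targets))
# [(3, 'PEP_A'), (1, 'PEP_B'), (0, 'PEP_C')]
```

Production search engines replace the raw count with more discriminating scores (for example, weighting by intensity or estimating the probability of random matches), but the load, compare and rank structure is the same.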


Figure 4. Schematic workflow of database search algorithms. The target spectra are loaded and compared to the measured spectrum to produce a list of probable peptides in the sample.

While these algorithms provide rapid identification of proteins, they are subject to errors. The false discovery rate (FDR)17 is the expected fraction of identifications that are false, i.e. proteins that are identified but are not present in the sample. The FDR therefore reflects the probability that proteins have been falsely assigned. While the FDR is a good indicator of the average rate of false assignment, it cannot show which peptides have been assigned falsely.

To estimate the FDR in database search algorithms, target-decoy searches can be used17. Target-decoy algorithms add generated (or pre-built) decoy spectra to the target database. The decoy spectra are similar to the target spectra but are assumed not to be present in the sample40,41. The number of assignments to decoy spectra is monitored and used to estimate the FDR. The MassWiz algorithm (Chapter 2.2.3) is one example that exploits target-decoy search42.
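The estimate can be sketched in a few lines. The sketch assumes decoys made by simply reversing the target sequences and a combined target + decoy search, and estimates the FDR above a score threshold as the ratio of decoy hits to target hits; other decoy constructions and FDR formulas are in use, and the function names and scores below are hypothetical.

```python
def reversed_decoys(target_peptides):
    """Decoy sequences by simple reversal (one common strategy; others exist)."""
    return [peptide[::-1] for peptide in target_peptides]


def estimate_fdr(psms, threshold):
    """Estimated FDR above a score threshold = decoy hits / target hits.

    psms: (score, is_decoy) peptide-spectrum matches from a search against
    the combined target + decoy database.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 1.0 if decoys else 0.0
    return decoys / targets


def threshold_for_fdr(psms, max_fdr=0.01):
    """Most permissive score threshold whose estimated FDR does not exceed max_fdr."""
    for threshold in sorted({score for score, _ in psms}):
        if estimate_fdr(psms, threshold) <= max_fdr:
            return threshold
    return None


psms = [(10, False), (9, False), (8, False), (7, False), (6, False),
        (5.5, True), (5, False), (4, True)]
print(threshold_for_fdr(psms))
# 6
```

With a 1% limit the threshold has to rise above the best-scoring decoy, so the target match at score 5 is discarded as well; as noted above, the estimate says how many matches are likely false, not which ones.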

The de novo family of search algorithms uses the characteristics of molecular fragments in the tandem MS spectrum43. These algorithms therefore do not require a database of exactly known spectra; instead, they use partial information for assignment and provide general information about the spectrum at hand. The downside of de novo search algorithms is that they are usually iterative and may not always converge to the same answer. Algorithms of this family include MSNovo44

The general workflow of de novo search algorithms is as follows. First, a list of possible sequences is generated from the measured spectrum. Hypothetical spectra are then generated from these sequences and compared with the measured spectrum in a way similar to a database search44.
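The first step, proposing sequences from the spectrum itself, can be illustrated by reading residues from the mass differences between consecutive peaks of one ion series. This is a minimal sketch assuming a clean, singly charged b-ion ladder with no noise or missing peaks; the function `read_tag` is illustrative, and real de novo algorithms such as those in Chapter 2.3 also handle both ion series, gaps spanning several residues and noise before scoring full candidate sequences.

```python
# Monoisotopic residue masses (Da); I is omitted because it has the same mass as L.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_tag(ladder, tol=0.02):
    """Read residues from mass differences between consecutive peaks of one ion series.

    A gap that matches no single residue is written as '?'.
    """
    peaks = sorted(ladder)
    tag = []
    for low, high in zip(peaks, peaks[1:]):
        delta = high - low
        hits = [aa for aa, mass in RESIDUE_MASS.items() if abs(mass - delta) <= tol]
        tag.append(hits[0] if len(hits) == 1 else "?")
    return "".join(tag)


# Hypothetical clean b-ion ladder (b1 to b6 of PEPTIDE, singly charged).
b_ladder = [98.0600, 227.1026, 324.1554, 425.2031, 538.2871, 653.3141]
print(read_tag(b_ladder))
# EPTLD
```

The tag reads L where the peptide contains I, because the two residues cannot be distinguished by mass; the first and last residues would have to be inferred from b1 and the precursor mass.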

Figure 5. Schematic workflow of de novo search algorithms. Several possible sequences are generated from the measured spectrum. The spectra of the generated sequences are compared to the measured spectrum to determine the true sequence.

De novo search algorithms tend to be more robust to anomalies and allow unknown proteins to be analyzed; however, they tend to be slower than their database search counterparts and perform less well in quantification. The PEAKS46 algorithm was, and still is, a major de novo search algorithm and is commonly used as a benchmark for newer de novo search algorithms. Because it was released before 2006, PEAKS is not discussed here.

For nonribosomal peptides (NRP), the search algorithms CycloBranch47 and CYCLONE48 can be used; however, as these are a specific type of peptide, these


2. Advances in LC-MS/MS data acquisition and analysis

2.1 Advances in LC-MS/MS data Acquisition

Many data acquisition methods have been proposed over the last decade; the most useful, unique and widely used methods are described in this chapter in the chronological order in which they were first published. Although there is a clearly defined border between DDA and DIA, some acquisition methods use hybrid data acquisition (HDA), which generally combines a DDA and a DIA method to obtain the benefits of both.


2.1.1 DT DDA (2008) [DDA]

Decision tree-driven DDA (DT DDA) automatically selects the dissociation method during acquisition49. Developed by Swaney et al. in 2008, the method switches between CID and ETD (Chapter 1.2) to increase the number of identified peptides compared with CID or ETD alone49. As described in Chapter 1.2, CID works well for singly charged species, whereas ETD is better at fragmenting multiply charged molecules. Higher m/z ranges have a higher probability of containing multiply charged species, so a decision tree can specify, for a given m/z window, which fragmentation method should be used (see Figure 6).

DT uses the high-accuracy MS1 data to assign the dissociation method that is expected to give the best identification. The decision rules are based on prior measurements; as shown in Figure 6, certain regions of the spectrum are assigned to a certain method. Whether CID or ETD is selected depends on the total ion count and the m/z of the spectrum.
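The idea of assigning spectrum regions to a dissociation method on the basis of prior measurements can be sketched as a lookup table keyed by precursor charge and m/z bin, filled with whichever method gave the higher identification rate in earlier runs. This is a hypothetical, data-driven illustration: it does not reproduce the decision rules of Swaney et al., and the function names, bin width and training data are invented.

```python
from collections import defaultdict


def build_decision_table(prior_results, bin_width=100.0):
    """prior_results: (charge, mz, method, identified) tuples from earlier runs.

    Returns {(charge, mz_bin): method}, choosing the method with the higher
    identification rate in each region.
    """
    # stats[region][method] = [identified, attempted]
    stats = defaultdict(lambda: {"CID": [0, 0], "ETD": [0, 0]})
    for charge, mz, method, identified in prior_results:
        counts = stats[(charge, int(mz // bin_width))][method]
        counts[0] += int(identified)
        counts[1] += 1
    table = {}
    for region, by_method in stats.items():
        rates = {m: (ok / n if n else 0.0) for m, (ok, n) in by_method.items()}
        table[region] = max(rates, key=rates.get)  # ties fall back to CID
    return table


def choose_method(table, charge, mz, bin_width=100.0, default="CID"):
    """Dissociation method for a new precursor; unseen regions use the default."""
    return table.get((charge, int(mz // bin_width)), default)


prior = [(2, 450.2, "CID", True), (2, 455.8, "ETD", False),
         (3, 420.5, "ETD", True), (3, 431.0, "CID", False)]
table = build_decision_table(prior)
print(choose_method(table, 2, 470.0), choose_method(table, 3, 480.0))
# CID ETD
```

The lookup at acquisition time is a single dictionary access, which matters because the choice has to be made between survey scan and fragmentation scan.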

DT DDA is mostly applied as a complement to the ‘main method’. For example, XDIA (Chapter 2.1.5) uses ETD; to increase identification by XDIA, DT could be considered (Chapter 5).

Figure 6. Scheme of DT probabilistic decision tree where CAD stands for CID49. It was found that certain charge


2.1.2 AMEx (2009) [DDA]

Accurate mass exclusion-based data-dependent acquisition (AMEx) is a DDA strategy that captures low-intensity signals in subsequent scans50. Published by Rudomin et al. in 2009, AMEx builds the mass exclusion list for the primary MS from prior scans50. Using the information from previous scans, the entire spectrum can be measured, provided that enough scans are performed.

AMEx first acquires a standard DDA spectrum, building a mass exclusion list by spectral counting. In the second step, the peptides in the obtained spectrum are identified, and the peptides that are identified and validated are placed on the exclusion list for subsequent acquisitions. In the final step, the new exclusion list is merged with the retention time clusters and uploaded for the second acquisition50. These operations are shown schematically in Figure 7.

Because of the updated exclusion list, AMEx allows better identification of proteins. AMEx was not designed to quantify the identified peptides. However, the identified peptides can be expected to be quantifiable with approximately 75% of the confidence of standard DDA, as high-intensity peptides are scanned approximately 3 times in AMEx versus 4 times in standard DDA50.
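The exclusion step that carries information from one injection to the next can be sketched as follows. The sketch assumes that each validated identification is excluded by accurate mass (within a ppm tolerance) inside a retention time window around its observed elution; the function names, tolerances and masses are hypothetical, and the retention time clustering used by AMEx is not modelled.

```python
def build_exclusion_list(identified, rt_window_min=1.0):
    """(mz, rt) of validated identifications -> (mz, rt_low, rt_high) exclusion entries."""
    return [(mz, rt - rt_window_min, rt + rt_window_min) for mz, rt in identified]


def is_excluded(mz, rt_min, exclusion_list, ppm_tol=10.0):
    """True if a precursor matches an excluded accurate mass within its retention time window."""
    return any(
        abs(mz - ex_mz) / ex_mz * 1e6 <= ppm_tol and low <= rt_min <= high
        for ex_mz, low, high in exclusion_list
    )


# The first injection identified a peptide at m/z 500.2500 eluting at 12.0 min (hypothetical).
exclusions = build_exclusion_list([(500.2500, 12.0)])
print(is_excluded(500.2520, 12.3, exclusions))  # 4 ppm, same elution window -> True
print(is_excluded(500.2600, 12.3, exclusions))  # about 20 ppm away -> False
print(is_excluded(500.2500, 20.0, exclusions))  # same mass, later elution -> False
```

Combining accurate mass with retention time keeps the exclusion specific: a different peptide with nearly the same m/z that elutes at another time is still selected in the next injection.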

2.1.3 PAcIFIC (2009/2011) [DIA]

In 2009, Goodlett and co-workers51 described precursor acquisition independent from ion count (PAcIFIC), which they updated two years later52. PAcIFIC uses multiple injections; in each injection, a 15 m/z range is scanned in the tandem MS, and together the injections cover the range of 400 to 1400 m/z. To increase accuracy, each 15 m/z range is divided into 10 segments of 1.5 m/z, each measured for 0.05 s. The method therefore needs a total of 67 injections to cover the complete range of 400 to 1400 m/z, which translates to several days of instrument time.
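The numbers above are internally consistent, as a quick calculation shows: ten 1.5 m/z segments make up each 15 m/z block, and covering 1000 m/z in 15 m/z blocks requires 66.7, rounded up to 67, injections. The sketch reproduces this arithmetic with the values from the text as defaults; the function name `pacific_plan` is illustrative, and it does not estimate the total instrument time, because the LC gradient length per injection is not given here.

```python
import math


def pacific_plan(mz_start=400.0, mz_end=1400.0, block_width=15.0, segments=10, dwell_s=0.05):
    """Injections needed and per-injection scan layout for the PAcIFIC values in the text."""
    injections = math.ceil((mz_end - mz_start) / block_width)
    segment_width = block_width / segments  # m/z per isolation segment
    block_cycle_s = segments * dwell_s  # time to step once through one block
    return injections, segment_width, block_cycle_s


print(pacific_plan())
# (67, 1.5, 0.5)
```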

The measured spectra have a higher spectral resolution than those of regular DIA53, in which the window is stepped over the whole range within one scan; however, this is essentially an artificial increase in spectral resolution, as smaller fractions are analysed at a time.

In 2011, Goodlett and co-workers modified their method to allow quantitation (qPAcIFIC). qPAcIFIC uses isobaric labelling and a complementary pulse-Q-dissociation (PQD) scan to quantify the proteins identified from the CID scan52. PQD was developed and patented by Thermo Scientific and works similarly to CID. Unlike CID, however, PQD pulses the precursor ions, allowing low-Q-factor ions to be fragmented54. PQD in the ion trap allows the detection of low-m/z ions55. Compared with CID, PQD scans are dominated by poorly fragmented molecules, which limits their usability56.

The qPAcIFIC method was shown to allow accurate quantification of a broad range of proteins in a sample of P. aeruginosa.


2.1.4 AIF (2010) [DIA]

In 2010, Geiger et al. described the first use of all ion fragmentation (AIF)28. AIF allows peptide identification without precursor selection, using higher-energy collisional dissociation (HCD, Chapter 1.2) for fragmentation. The peptides eluting from the LC are electrosprayed directly into the MS and passed through the C-trap into the HCD cell, where packets of peptides are fragmented. The peptide fragments are transferred back to the C-trap and then to the Orbitrap for analysis28.

Figure 8. Scheme of the Orbitrap MS, regular (left) and AIF (right). A regular Orbitrap MS (left) loads an inlet of peptides into the C-trap and releases them into the Orbitrap after a defined accumulation time. In AIF (right), the peptides are loaded through the C-trap into the HCD cell. The fragments of these peptides are transferred back from the HCD cell into the C-trap, from which they are injected into the Orbitrap through a deflector plate.

Originally, the AIF method used a single collision energy for peptide fragmentation; however, Geiger et al. modified this feature to use a ramped collision energy to increase fragmentation28.

AIF scans can be alternated with MS scans to guarantee the detection of all elution peaks. This allowed the efficient identification of proteins in mixtures of bovine serum albumin (BSA), a 48-protein universal protein sample (UPS) and HeLa cells28; here, a database search was performed in the MaxQuant environment. The HeLa cells were pre-treated and separated by SDS-PAGE, allowing Geiger et al. to identify 120 distinct peptides belonging to 20 proteins.


Geiger et al. identified all known peptides in BSA, as well as 5 and 11 contaminants without and with ramped collision energy, respectively.

From the equimolar 48-protein UPS, 45 proteins were identified, together with 2 proteins that were not present. These two false identifications were considered consistent with the FDR of approximately 1% applied in the MaxQuant software. The unidentified proteins were low in molecular weight and were probably missed because their digestion products were not distinguishable from those of other proteins in the sample.

2.1.5 XDIA (2010) [DIA]

Extended data-independent acquisition (XDIA) is an extension of the DIA strategy developed by Venable et al.26,53. The DIA method described by Carvalho et al. in 2010 allows the extraction of multiplexed spectra26. Multiplexed spectra occur when multiple precursor ions are fragmented in the same ion window, which compromises the approach of Venable et al.26,53. XDIA uses ETD (Chapter 1.2) to fragment the target molecules. ETD is effective in preserving post-translational modification information during the fragmentation of larger molecules.

Ion dissociation in XDIA is achieved by ETD followed by CID fragmentation. Acquisition in XDIA consists of two MS stages. The primary MS acquires a high-resolution spectrum using a linear ion trap-Orbitrap (LIT-Orbitrap). The second MS performs a series of consecutive scans with 20 m/z ion windows that overlap by 1 m/z. The resulting data are analysed by the complementary XDIA processor algorithm.

The XDIA processor converts the obtained MS spectra into a format that can be analysed like a DDA spectrum (e.g. by SEQUEST/Comet, as shown in Chapter 2.2.1).

XDIA was shown to achieve higher quantification rates than conventional DDA (approximately 250 percent more) at a lower FDR. This increased quantification rate suggests that XDIA improves quantitation confidence.


2.1.6 SWATH-MS (2012) [DIA]

Gillet et al. described SWATH MS57 in 2012. In SWATH MS, the precursor range of the primary MS is stepped through in consecutive swaths (fixed-width Da windows), so that the complete m/z range of interest is covered.

SWATH MS is a self-described extension of the DIA approach originally described by Venable et al.53. In that approach, the primary MS provides a data-independent scan sequence in which the range of 400 to 1400 m/z is scanned in steps (in this case steps of 10 m/z). SWATH MS extends this by stepping through the primary spectrum (400 to 1200 m/z), changing the m/z selection of the first quadrupole of the QqTOF instrument in user-defined increments called swaths (initially 25 Da wide); this provides precursor maps of the peptides in each swath (e.g. 425 to 450 Da). A scheme of the SWATH method is shown in Figure 9.

Figure 9. SWATH MS scheme, in which the black bars show the open 25 m/z swath windows. The complete spectrum is stepped through in increments of 25 m/z until the full cycle is completed and the next cycle starts57.

The SWATH MS spectrum of HeLa cells, pre-treated and separated using SDS-PAGE (similar to Chapter 2.1.4), was analysed with the Andromeda algorithm (Chapter 2.2.2). This resulted in the identification of 101,726 peptide features; however, the proteins corresponding to these features were not identified.


2.1.7 FT-ARM (2012) [DIA]

Fourier transform all reaction monitoring (FT-ARM) allows the rapid determination of all peptide fragments eluting from the LC and was introduced by Weisbrod et al.58 in 2012. The method was developed as a complement to data-dependent shotgun analysis and works by searching empirical or theoretical peptide fragmentation spectra.

An FT-ARM scan involves the fragmentation of all eluted peptides from the chromatogram. The fragmentation spectra obtained are matched against the target spectra to find spectral matches as shown in Chapter 2.2.4.

FT-ARM has been shown to be applicable to yeast, E. coli and BSA samples, and quantitation was demonstrated on clean BSA and yeast samples. In a spiking experiment with BSA in a yeast sample, 2 BSA peptides were quantitated successfully, which shows the applicability of the method in complex matrices.

2.1.8 MSX (2013) [DIA]

Multiplexed data-independent acquisition (MSX) was developed by Egertson et al. to increase precursor selectivity59.

During an MSX scan, five randomly selected isolation windows of 4 m/z, taken from the complete range (500-900 m/z), are measured together. The five windows thus contain multiplexed information about the complete 400 m/z range, which is de-multiplexed to obtain a spectrum similar to a complete scan of 100 windows of 4 m/z.

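MSX uses simple regression to build the complete de-multiplexed spectrum from several scans. The appeal of MSX is that the computational requirements are modest and that the result can be used to guide follow-up scans. In Equation 1, B holds the observed multiplexed spectra, A is the design matrix recording which isolation windows were combined in each scan, and X holds the de-multiplexed spectra of the individual windows:

$$B = A X \qquad (1)$$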

MSX was shown to be applicable to a yeast sample (S. cerevisiae), giving a threefold improvement in the limit of detection compared with MS1 detection.


2.1.9 pSMART (2014) [HDA]

In 2014, Prakash et al. described the hybrid data acquisition and processing strategy (pSMART), which combines the high sensitivity of DDA with the selectivity of DIA60. pSMART acquires DIA spectra of the LC eluate, interrupted by DDA acquisitions. The exclusion list for the DDA measurement is determined from the DIA scan to improve quantification, as shown in Figure 10.

Figure 10. pSMART sequence, with 5 Da DIA acquisitions acquired in independent cycles. The narrow DIA cycles are interrupted by HR/AM MS scan events (at a user-defined time)60.

The pSMART sequence was compared with SWATH MS (as a standard DIA method) and HR/AM MS (as a standard DDA method) on a human plasma sample stabilized with ethylenediaminetetraacetic acid (EDTA). Using a spectral library containing DDA spectra, the pSMART sequence showed a lower decoy hit rate while maintaining a high spectral match rate.

As expected, the pSMART method shows a higher sensitivity than DDA and a higher reproducibility than DIA. Unexpectedly, the overlap between DDA, DIA and pSMART showed that pSMART does not converge towards a linear combination of DDA and DIA spectra.


pSMART belongs to the newly developed HDA methods and is not yet widely known or accepted. For these reasons, HDA methods like pSMART have not been applied to samples other than those for which they were developed.

2.1.10 PASEF (2015) [DIA]

One of the problems that Parallel Accumulation-Serial Fragmentation (PASEF)19 overcomes is the detection of multiple precursors eluting from the LC column. In the article published by Mann et al.19 in 2015, trapped ion mobility spectrometry (TIMS, Chapter 1.2) is combined with a QTOF (Chapter 1.2) to increase the parallel accumulation speed to tens of milliseconds.

In standard TIMS-MS/MS, the eluting ions are recorded using a TOF, and the collected spectra are used to create a topN list that sets the quadrupole isolation window for accumulation. In PASEF, however, the mass selection of the quadrupole is switched rapidly to target several ions during accumulation, as shown in Figure 11.

While the PASEF method shows a great advantage over standard TIMS-MS/MS, the cost of the required equipment must be taken into account. Both ion release from the trap and precursor selection must occur on a sub-millisecond time scale, because the total scan time cannot exceed the regular accumulation time of tens of milliseconds if PASEF is to provide a distinct advantage. The TIMS described was an experimental model and requires commercialisation to become readily available.

A major bottleneck described by Mann et al.19 is that the instrument controller of the quadrupole is much slower than required for PASEF accumulation, which must be considered when measuring with PASEF.


Figure 11. Scheme of PASEF: the ions captured by the TIMS are released, and the quadrupole selects the m/z eluting from the TIMS and transmits it towards the detector, switching in the millisecond range. As published by Meier et al.19

PASEF was demonstrated on a HeLa digest. PASEF detected 250,000 peptides, of which 45,000 were fragmented and 30,000 were identified. The unidentified peptides were described as too low in abundance to be fragmented and/or identified. The total acquisition time was approximately 90 minutes, making PASEF one of the fastest methods discussed19.


2.2 Database search algorithms

Database search algorithms are the most straightforward type of matching. During a database search, the measured spectra are matched to a database of spectra (target spectra). The combination of target spectra that can build the measured spectrum can be identified and quantified, as shown in Chapter 1.4. The most important part of a database search algorithm is the scoring of the target spectra (the spectra in the database); the score reflects the relative probability that the measured spectrum contains the target. To increase the sensitivity of the analysis, several search strategies can be performed and compared with one another.

The first widely used database search algorithm was SEQUEST61. SEQUEST still stands as the main database search algorithm and is discussed in Chapter 2.2.1. The second most widely used database search algorithm is Mascot62. Mascot will not be discussed in detail, as the algorithm is well known and not of historical significance.

The Comet and MassWiz algorithms discussed in this chapter are both compared with the open mass spectrometry search algorithm (OMSSA). OMSSA was published in 2004 by Geer et al.63 and will not be explained in depth. Essentially, OMSSA uses a Poisson probability distribution to score the target spectrum against the experimental spectrum. The authors of OMSSA claim that OMSSA is most useful for large data sets, where Mascot is insufficient63.


2.2.1 SEQUEST/Comet (1994/2013)

Developed by Yates et al. in 1994, the SEQUEST algorithm was designed to correlate target spectra with experimental measurements.

SEQUEST tries to find the linear combination of target spectra that builds the measured spectrum by comparing several characteristics of the measured spectrum. The score is increased if the immonium ion of a peptide is measured and decreased if not. The number of matching ions is highly important, as it shows the similarity between the target and measured spectrum.

As shown in Equation 2, the preliminary score (S_p) of a peptide (p) in the measured spectrum depends on the summed intensities of the fragment ions that match the predicted spectrum within a user-defined tolerance (i_m) and on the number of matched fragment ions (n_i). The β parameter rewards the continuity of an ion series, and the ρ parameter is an immonium-ion factor that is increased when the expected immonium ion is measured and decreased if not. The total number of predicted sequence ions is denoted n_t.

$$S_p = \frac{\left(\sum i_m\right) n_i (1+\beta)(1+\rho)}{n_t} \qquad (2)$$
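Equation 2 can be transcribed directly. In the sketch below, fragment matching within a fixed tolerance (one peak per predicted ion), the default tolerance of 0.5 m/z and caller-supplied β and ρ values are assumptions for illustration; SEQUEST's own preprocessing and parameter values are not reproduced, and the function names are hypothetical.

```python
def match_intensities(predicted_mz, observed, tol=0.5):
    """Intensities of observed peaks within `tol` of each predicted fragment m/z.

    Each predicted ion contributes at most one match (its most intense hit).
    """
    matched = []
    for mz in predicted_mz:
        hits = [inten for obs_mz, inten in observed if abs(obs_mz - mz) <= tol]
        if hits:
            matched.append(max(hits))
    return matched

def sequest_sp(predicted_mz, observed, beta=0.0, rho=0.0, tol=0.5):
    """Preliminary score S_p of Equation 2 (beta and rho supplied by the caller)."""
    matched = match_intensities(predicted_mz, observed, tol)
    n_i = len(matched)           # number of matched fragment ions
    n_t = len(predicted_mz)      # total number of predicted sequence ions
    return sum(matched) * n_i * (1 + beta) * (1 + rho) / n_t

predicted = [175.1, 262.1, 375.2, 488.3]
observed = [(175.2, 50.0), (262.0, 30.0), (600.0, 100.0)]
print(sequest_sp(predicted, observed))  # 40.0
```

Two of four predicted ions match (intensities 50 and 30), so S_p = 80 × 2 / 4 = 40; a positive β for a continuous ion series would raise this further.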

The SEQUEST algorithm as published by Yates et al. has seen commercial application by Thermo Fisher Scientific and Sage-N Research39.

The SEQUEST algorithm, written in 1994 for academic use, was updated for modern computers by Eng et al. in 2013 and republished as Comet39.

Comet differs from SEQUEST in that it fixes a calculation error in which the expectation value (E-value) was calculated from the log-transformed score distribution rather than from the cumulative score distribution.

Eng et al. claim that Comet performed approximately 10% better than OMSSA39,63 and X!Tandem on low-resolution MS/MS data, with a runtime between those of OMSSA and X!Tandem. With high-resolution MS/MS, Comet and X!Tandem both outperformed OMSSA by over 10%; however, X!Tandem was four times faster.


2.2.2 Andromeda (2011)

In 2011, Cox et al. described a novel method for protein identification called Andromeda16. The algorithm uses a probabilistic scoring model that links the observed protein peaks to known protein profiles.

The score of a peptide sequence, S_p, is given in Equation 3. S_p depends on the number of theoretical fragment ions (n) and the number of those ions matched in the spectrum (k). The q parameter is the number of most intense peaks retained per 100 Da window, so q/100 is the probability that a theoretical ion matches a peak by chance. If the peak density is high, matching ions are therefore treated as less significant, because they are more likely to occur at random.

$$S_p = -10 \log_{10} \sum_{j=k}^{n} \binom{n}{j} \left(\frac{q}{100}\right)^{j} \left(1 - \frac{q}{100}\right)^{n-j} \qquad (3)$$
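Equation 3 is a binomial tail probability: with q peaks kept per 100 Da, a theoretical ion matches a peak by chance with probability q/100, and the score is -10 log10 of the chance of k or more such matches out of n. The sketch below (hypothetical function name) evaluates it with Python's `math.comb`; how Andromeda chooses q in practice is not reproduced.

```python
import math

def andromeda_score(n, k, q):
    """Equation 3: -10 log10 of the binomial probability of >= k random matches.

    n: number of theoretical fragment ions, k: number matched,
    q: peaks retained per 100 Da window, so q/100 is the per-ion match chance.
    """
    p = q / 100.0
    tail = sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))
    return -10.0 * math.log10(tail)

print(round(andromeda_score(n=2, k=2, q=10), 6))  # 20.0
```

For fixed n and k, a larger q lowers the score, reflecting that matches in dense spectra are more likely to be random.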

A comparison by Cox et al. of the Andromeda algorithm with the widely used Mascot algorithm shows similar results16, suggesting that, for the tested proteins, both methods perform equally well.

The Andromeda search algorithm is freely available as a stand-alone program and as part of the MaxQuant environment. Cox et al. state that the stand-alone program runs the algorithm identically to the version implemented in MaxQuant; however, the MaxQuant version also performs FDR estimation to determine the false identification rate16.


2.2.3 MassWiz (2011)

The robust database search algorithm MassWiz was proposed by Yadav et al.42 in 2011. MassWiz compares the summed intensity of the experimental peaks matched by the target spectrum with the summed intensity of all experimental peaks. The score of a peptide sequence using MassWiz is given in Equations 4 and 5. In Equation 4, the intensity of the ith peak is denoted I_i, the number of peaks in the experimental spectrum after processing is denoted n, and the number of matched peaks is denoted k.

The final score S_p in Equation 4 scales the primary score S(P) of a peptide, which is given in Equation 5. S(P) is calculated by iterating over the y, b and a ion series, where j is the index of the peak (i in Equation 4). X_ij is the score for the jth peak and C_ij is a continuity score. If X_ij is not zero, N_ij and W_ij score the peaks corrected for the neutral loss of ammonia and water, respectively. The mass difference between the theoretical and experimental peaks is denoted Δm_ij. Q_j scores a matched immonium ion peak in the spectrum, summed over the h immonium ions considered.

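$$S_p = S(P) \sqrt{\frac{\sum_{i=1}^{k} I_i}{\sum_{i=1}^{n} I_i}} \qquad (4)$$

$$S(P) = \sum_{i \in \{y,b,a\}} \sum_{j=1}^{n} \left[ \frac{X_{ij} + C_{ij}}{e^{|\Delta m_{ij}|}} + \frac{N_{ij}}{e^{|\Delta m_{ij}|}} + \frac{W_{ij}}{e^{|\Delta m_{ij}|}} \right] + \sum_{j=1}^{h} \frac{Q_j}{e^{|\Delta m_{j}|}} \qquad (5)$$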

Equations 4 and 5 essentially show that if the experimental and theoretical spectra differ only because of easily lost water or ammonia groups, the score remains high. This differs from the Andromeda and SEQUEST algorithms, where such losses are penalised just as harshly as the loss of carbon-containing groups. Yadav et al. show that in multiple cases MassWiz outperforms SEQUEST (2.2.1), OMSSA63 and X!Tandem (2.2.5)42. Mascot62 was shown to perform better than MassWiz.
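Equations 4 and 5 can be sketched as below. The per-peak scores X, C, N, W and Q are supplied by the caller, because how MassWiz assigns them is not described here; the rule that N and W count only when X is non-zero follows the text. Function names are hypothetical.

```python
import math

def primary_score(peaks, immonium=()):
    """Simplified Equation 5.

    peaks: (X, C, N, W, delta_m) tuples for matched y/b/a ion peaks;
    immonium: (Q, delta_m) tuples for matched immonium peaks.
    Every term is divided by e^{|delta_m|}, so larger mass errors count less.
    N and W (ammonia/water loss) only contribute when X is non-zero.
    """
    score = 0.0
    for x, c, n, w, dm in peaks:
        losses = (n + w) if x != 0 else 0.0
        score += (x + c + losses) / math.exp(abs(dm))
    score += sum(q / math.exp(abs(dm)) for q, dm in immonium)
    return score

def masswiz_sp(s_p, matched_intensities, all_intensities):
    """Equation 4: scale S(P) by the square root of the explained intensity fraction."""
    return s_p * math.sqrt(sum(matched_intensities) / sum(all_intensities))

sp = primary_score([(1.0, 1.0, 0.0, 0.0, 0.0), (1.0, 0.0, 0.5, 0.5, 0.0)],
                   immonium=[(1.0, 0.0)])
print(sp, masswiz_sp(sp, [25.0], [100.0]))  # 5.0 2.5
```

Because the matched peaks explain only a quarter of the total intensity here, the final score is halved (square root of 0.25).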


2.2.4 FT-ARM (2012)

The FT-ARM algorithm was developed for the acquisition method of the same name (Chapter 2.1.7). FT-ARM's search algorithm uses a dot-product comparison to find matches. The scoring is performed as shown in Figure 12 and Equation 6: the dot-product score (S_p) of the experimental spectrum vector R and the hypothetical spectrum vector T is summed over all peaks (n) in the experimental spectrum.

$$S_p = \sum_{i=1}^{n} R_i T_i \qquad (6)$$
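Equation 6 is a plain dot product once both spectra are expressed as vectors over the same m/z axis. The sketch assumes 1 m/z bins and unnormalised intensities, which may differ from the preprocessing used by Weisbrod et al.; the function names are hypothetical.

```python
import numpy as np

def bin_spectrum(peaks, mz_min, mz_max, bin_width):
    """Turn (m/z, intensity) peaks into a fixed-length intensity vector."""
    n_bins = int(np.ceil((mz_max - mz_min) / bin_width))
    vector = np.zeros(n_bins)
    for mz, intensity in peaks:
        index = int((mz - mz_min) // bin_width)
        if 0 <= index < n_bins:          # ignore peaks outside the range
            vector[index] += intensity
    return vector

def dot_product_score(R, T):
    """Equation 6: S_p = sum_i R_i * T_i."""
    return float(np.dot(R, T))

R = bin_spectrum([(100.2, 1.0), (102.7, 2.0)], 100.0, 103.0, 1.0)
T = bin_spectrum([(100.4, 0.5), (101.5, 1.0), (102.1, 1.0)], 100.0, 103.0, 1.0)
print(dot_product_score(R, T))  # 2.5
```

Only bins populated in both vectors contribute, so the hypothetical peak at 101.5 m/z adds nothing to the score.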

Figure 12. Illustration of the FT-ARM strategy: (A) all ions are fragmented to produce a total ion chromatogram; (B) a complex fragmentation spectrum is produced; (C) hypothetical spectra; (D) dot-product analysis58.

Applications of the FT-ARM algorithm are described in Chapter 2.1.7. The proposed advantage of FT-ARM (acquisition and analysis) over DDA-Mascot (acquisition-analysis) is questionable: in a direct comparison, DDA-Mascot achieved more identifications and more stable quantification than FT-ARM at the same FDR58. However, FT-ARM identified different peptides than DDA-Mascot and is therefore suggested as a complementary method.


2.2.5 X!Tandem (2004 renewed in 2008)

X!Tandem was originally introduced in 2004 by Craig and Beavis64 and was updated in 2008.

X!Tandem scores the target spectra using Equation 7, in which the expectation score (S_p) is calculated given the number of mass spectra generated (s). For an inferred protein sequence, the number of unique peptide sequences is denoted n, each with an expectation value e_j. The peptide sequence score is denoted N, and β is this peptide sequence score divided by the number of peptides in the proteome considered.

$$S_p = \left(\frac{\beta^{n} (1-\beta)^{s-n}}{s N^{n-1}}\right) \left(\prod_{j=1}^{n} e_j\right) \left(\prod_{i=0}^{n-1} \frac{s-i}{n-i}\right) \qquad (7)$$

The scoring in Equation 7 is biased towards including as many unique peptides as possible, as only the number of unique peptides increases the score. Owing to the expectation score of certain peptide sequences, the algorithm is expected to perform well on generic peptide sequences, even with a low number of spectra. In 2008, X!!Tandem appeared, which performs essentially the same as X!Tandem but with the added benefit of being usable in parallel processing65.
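Equation 7 can be evaluated term by term, as sketched below. The sketch transcribes the equation with the symbols defined above and is not X!Tandem's implementation; the function name is hypothetical. Note that the last product equals the binomial coefficient C(s, n).

```python
import math

def xtandem_expectation(s, e_values, N, beta):
    """Equation 7 evaluated term by term.

    s: number of spectra, e_values: expectation values e_j of the n unique
    peptides assigned to the protein, N and beta as defined in the text.
    """
    n = len(e_values)
    if n < 1:
        raise ValueError("at least one peptide expectation value is required")
    prefactor = beta**n * (1 - beta)**(s - n) / (s * N**(n - 1))
    product_e = math.prod(e_values)
    # prod_{i=0}^{n-1} (s - i) / (n - i) equals the binomial coefficient C(s, n)
    arrangements = math.prod((s - i) / (n - i) for i in range(n))
    return prefactor * product_e * arrangements

print(xtandem_expectation(s=1, e_values=[0.01], N=1000, beta=0.5))  # 0.005
```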


2.3 De novo search algorithms

The de novo search algorithms are entirely different from database search algorithms and cannot be simplified to a more generic workflow than the one shown in Chapter 1.4. The de novo search algorithms are highly complex and will therefore not be explained in great detail. Most of these algorithms were validated against the popular de novo algorithm PepNovo66, which is outside the scope of this review.

2.3.1 MSNovo (2007)

The de novo search algorithm MSNovo was developed by Mo et al.44 in 2007. As described in Chapter 1.4, de novo algorithms generate sequences. In MSNovo, the sequences are generated using prior knowledge of the spectra: the probability of finding a certain peak at a position other than expected is learned from a database of known spectra. The probability of finding certain sequences is learned in a similar way.

MSNovo was shown to achieve more accurate and precise sequences when compared to PepNovo on several different spectra.

2.3.2 ADEPT (2010)

ADEPT is a search algorithm, developed by He et al.67 in 2010, that relies on two tandem MS/MS spectra to determine the peptide sequence. ADEPT uses PEAKS46 to determine the initial sequences and scores the resulting sequence spectra using a Lagrange loss function.

Using the scoring function described by He et al., ADEPT scores more accurately than PepNovo.


2.3.3 Antilope (2012)

Antilope is a de novo search algorithm developed by Andreotti et al.45 in 2012.

Antilope works by exploiting a spectrum graph. Rather than finding all fragments, Antilope tries to find only the largest fragments of the peptide; given these larger fragments, the smaller fragments can be filled in using spectral matching.

Antilope was shown to perform worse than PepNovo.

2.3.4 pNovo+ (2013)

The pNovo+ search algorithm, developed by Chi et al. in 2013, is capable of deducing the top-ranked sequence candidates from HCD + ETD spectra68. pNovo+ tries to find pairs of complementary larger fragments: for example, if a peptide has a mass of 1350 Da and a singly charged y ion is found at 800 m/z, the complementary b ion must be found at approximately 550 m/z. pNovo+ was compared with PEAKS46 and was shown to identify more peptide sequences.

2.3.5 UniNovo (2013)

UniNovo is a de novo search algorithm developed by Jeong et al. for universal peptide sequencing69. UniNovo uses a unique combination of Bayesian inference and the generation of a spectrum graph for de novo peptide reconstruction. UniNovo was shown to achieve a higher precision and recall than PEAKS46.


3. Discussion

The following chapter will discuss and compare acquisition methods (Chapter 3.1) and analysis methods (Chapter 3.2).

The acquisition methods will be compared on the applicability, availability and ease of use of the methods in question. Because the methods were not tested on the same samples, a comparison based on identification and quantification cannot be performed. A flow chart for finding the most suitable method is given in Supplementary Material 4.

The analysis methods will be compared in terms of applicability, availability and false discovery rate of the method.

In 2007, Balgley et al. reported a comparison of many available tandem mass spectrometry peptide identification algorithms using a target-decoy search, all showing an FDR of approximately 5-10%17.

Although the target-decoy search is open to criticism because of the assumptions made in building the decoy database, it is safe to say that in many cases these assumptions hold41, so that the target-decoy search can be used as an indicator of the FDR and as a good basis for comparing algorithms. One of these assumptions is that the decoy spectra are not present in the experimental spectrum.


3.1 Comparison of acquisition methods

DT DDA (Chapter 2.1.1) is useful as a complement to methods that use CID or ETD for fragmentation. DT DDA is easy to implement on mass analysers that allow switching between CID and ETD fragmentation.

AMEx (Chapter 2.1.2) is easy to implement compared with the other methods discussed. The use of the dynamic topN list allows it to be used on most modern instruments. There seems to be no reason why AMEx would not be usable for identification. The use of AMEx for quantification seems doubtful, as the result is weighted, inducing a greater bias. Even so, quantification using AMEx still seems more applicable than quantification by FT-ARM (Chapter 2.1.7). FT-ARM and AMEx were both compared with classical DDA, and AMEx identified more peptides than FT-ARM.

As mentioned in Chapter 2.1.7, FT-ARM is to be used as a complement to DDA. Remarkably, the acquisition sequences of FT-ARM and AIF (Chapter 2.1.4) are similar, yet AIF identified all BSA peptides, including contaminants. This observation suggests that the FT-ARM software is the limiting factor. FT-ARM can be implemented on most LC-MS/MS instruments, whereas AIF requires an HCD-Orbitrap mass analyser.

A comparison of HeLa peptide identification by AIF and SWATH MS (Chapter 2.1.6) shows that AIF identified less than one percent of the number of peptides found by SWATH MS. While SWATH MS identified over one hundred thousand peptides, it was unable to resolve the proteins from the fragments. SWATH MS requires a QqTOF instrument.

The PAcIFIC (Chapter 2.1.3) and MSX (Chapter 2.1.8) methods both break the spectrum into smaller parts that are analysed in multiple runs. These two methods were not tested on comparable samples and therefore cannot be compared directly. The MSX method was released several years after PAcIFIC. The non-linear acquisition proposed in MSX could be applied to PAcIFIC acquisition to decrease acquisition time. This speed gain arises because the complete spectrum need not be measured, and overlapping segments can increase sensitivity for certain regions. Both methods were demonstrated on Orbitrap instruments; however, application on other mass analysers such as QTOF is conceivable. XDIA (Chapter 2.1.5) used an unspecified yeast lysate to present the method. The resulting number of peptides is similar to that for the yeast lysate used in the PAcIFIC demonstration; however, this is inconclusive, as the strains may differ. The XDIA processor can convert its DIA spectra into spectra that represent DDA spectra, making it a good complement to standard DDA. XDIA is as easy to implement as the PAcIFIC and MSX methods, requiring only a LIT-Orbitrap.

The most advanced methods discussed are PASEF (Chapter 2.1.10) and pSMART (Chapter 2.1.9). Both acquire high-quality spectra for quantification and identification. The pSMART acquisition is easy to implement on a hybrid LIT-Orbitrap mass analyser. Identification by pSMART is similar to standard DIA; however, the confidence of the quantified peptides is higher. Owing to this higher quantification confidence, the identification of measured peptides was approximately 50 percent better than with standard DIA60.

PASEF allows the rapid separation of complex protein mixtures; however, a trapped IMS device is required and the software is not readily available. The gain in acquisition speed (several milliseconds) allows PASEF to be used for both quantification and identification. PASEF detected two hundred and fifty thousand peptides in a HeLa sample, about two and a half times as many as SWATH MS. In contrast to SWATH MS, PASEF identified thirty thousand peptides in the HeLa sample.


3.2 Comparison of data analysis methods

As shown in Chapter 2.2, the acquired data can be analysed by a database search or de novo. Database search algorithms are generally faster than de novo algorithms and are the better choice in targeted analysis (where the target peptides are in the database). When dealing with unknown peptides, a de novo algorithm might be more suitable.

Among the database search algorithms, Comet (Chapter 2.2.1) was shown to outperform X!Tandem (Chapter 2.2.5), as were MassWiz (Chapter 2.2.3) and, indirectly, Andromeda (Chapter 2.2.2). Comet seems more suitable for DDA problems, while the dynamics of Andromeda make it more suitable for DIA problems. MassWiz outperformed SEQUEST (and, indirectly, Comet) but performed worse than Mascot. Mascot, on the other hand, performed equally well as Andromeda. Thus MassWiz might be able to replace Comet; however, this must be shown in future research.

In the preceding chapter, the FT-ARM analysis (Chapter 2.2.4) was suggested to be the limiting factor of the FT-ARM acquisition (Chapter 2.1.7). The FT-ARM analysis is therefore recommended as a complement to a database search in the MaxQuant environment28.

The de novo search algorithms were compared with either PEAKS or PepNovo. Antilope (Chapter 2.3.3) was shown to perform worse than PepNovo, while MSNovo (Chapter 2.3.1) and ADEPT (Chapter 2.3.2) were shown to perform better. Antilope is therefore never the best choice. ADEPT has the advantage over MSNovo that it does not use an entirely new search algorithm, which makes ADEPT the safe choice.

pNovo+ (Chapter 2.3.4) and UniNovo (Chapter 2.3.5) both have their advantages. UniNovo identifies peptide sequences with higher precision and recall than PEAKS, while pNovo+ identified more peptide sequences overall.


4. Conclusions

To recall the original question posed in Chapter 1.2, this review was intended to help select the best acquisition and analysis technique.

As the sample types in proteomic research differ in complexity, several search strategies were compared in Chapter 3. This leads to the following conclusions. For the analysis of samples of minor complexity (e.g. yeast extract), the XDIA, MSX and PAcIFIC acquisitions suffice. For larger peptides of up to approximately 1200 Da with equal abundance, the XDIA method will probably yield the best-quality spectra. As the XDIA processor converts the DIA spectra into DDA format, analysis by Comet seems the most logical choice.

Peptide samples with varying abundance that are clustered in the spectrum are best analysed by MSX, which resolves the complete spectrum from several measurements in the ranges where peptides can be measured. If, however, the eluting peptides are not clustered in m/z, then PAcIFIC will produce higher-quality spectra at the cost of longer instrument time. Both MSX and PAcIFIC spectra are best resolved using the Andromeda algorithm. PAcIFIC also allows untargeted identification, for which the ADEPT algorithm is a safe choice.

For peptide samples of higher complexity (e.g. whole-cell extracts), the SWATH MS and PASEF methods are best used. SWATH MS performs at lower identification and quantification rates than PASEF; however, SWATH MS is currently more readily available. The Andromeda algorithm is best used for targeted analysis with both SWATH MS and PASEF.

If identification and quantitation are both required, then the pSMART acquisition is the best option for human plasma or equally complex samples. The PASEF acquisition is the overall best choice for identification and quantitation, as it was the only acquisition method able to identify approximately thirty thousand peptides. The general conclusion is that DDA is fully developed, whereas DIA is in its final phase of development. The advantages that DIA offers over classical DDA approaches on modern instruments are visible in both identification and quantification. An increase in applications of DIA methods is therefore expected in the coming years. HDA methods that combine the advantages of DDA and DIA seem ideal; however, they are still at an early stage of development and are not expected to become a major method until DIA approaches are in general use.

5. Future proposals

LC-MS/MS in proteomics is still dominated by DDA, owing to its distinct advantages and to conservatism. A switch from the dominant DDA to DIA is not expected; however, HDA shows potential to become a new standard. HDA allows the combination of conservative DDA methods with already developed DIA methods and combines the advantages of both.

The easiest method combination to be expected is DT-DDA (Chapter 2.1.1) with XDIA (Chapter 2.1.5). DT-DDA was originally developed for DDA methods but can be used with XDIA, provided that the instrument allows it. Because XDIA was developed to increase identification rates compared with DIA, a DT-XDIA approach could further increase identification and thus quantitation confidence.

Acknowledgements

Financial support came from the European Research Council. Guidance and support from the University of Amsterdam are gratefully acknowledged, in particular from dr. I. Dapic and dr. G. L. Corthals.
