MSc Chemistry
Analytical Sciences

Literature Review

Data Processing Methods for 2D Chromatography

Muzi Li (1022 6761)

Supervisor: dr. Gabriel Vivó-Truyols

University of Amsterdam


ABSTRACT

Two-dimensional liquid and gas chromatography have become increasingly popular in many application fields, such as metabolomics, petroleum and food analysis, owing to their substantial resolving power for separating complex samples. However, the additional separation dimension, combined with a variety of detection instruments, increases the order of the data generated and thus the complexity of the data sets, even though the higher order also yields more useful information, for instance through second-order advantages. Such high-order data require data processing methods that transform chromatograms and spectra into useful information in several steps. Data processing consists of two stages: data pre-processing and the actual data processing. Pre-processing, which includes baseline correction, peak detection, smoothing and derivatives, alignment and normalization, aims to reduce variations unrelated to the chemical variation of interest, caused by interferences such as noise. This step is important because it prepares the raw data sets for the actual data processing, such as classification, identification and quantification. If pre-processing fails, the data may remain obscured by unrelated variations, causing the subsequent data processing to fail as well. In the data processing step, methods such as PCA, GRAM and PARAFAC are used for classification and quantification of sample compounds. This literature review presents, though not in full detail, the most popular data processing methods used for two-dimensional chromatography, and mentions their applications.


CONTENTS

INTRODUCTION
Order of Instrument
Data pre-processing
Baseline correction
Smoothing and Derivatives
Peak detection
Alignment
Normalization
Data processing
Supervised and unsupervised learning
Unsupervised
Supervised
Conclusion & Future work
Reference


INTRODUCTION

Two-dimensional (2D) chromatography (2D liquid chromatography or 2D gas chromatography) refers to a procedure in which some or all of the sample components to be separated are subjected to two separation steps governed by different separation mechanisms. In planar chromatography, 2D chromatography refers to procedures in which components first migrate in one direction and subsequently in a direction at right angles to the first, using two different eluents. [1] Compared to 1D chromatography, 2D chromatography possesses substantial resolving power, providing high separation efficiency and selectivity. In 1D chromatography, resolution (RS) is often used to quantify the degree of separation between two components A and B; RS is defined as the retention time difference of two adjacent peaks divided by the average of their peak widths (equivalently, twice the time difference divided by the sum of the widths). [2] However, it is difficult to achieve acceptable resolution of all peaks for complex samples consisting of numerous components. Therefore, the peak capacity (PC) is introduced to measure the overall separation, particularly for complex samples analyzed with gradient elution in 2D chromatography. [3] The peak capacity of a separation is defined as the total number of peaks that can be fit into a chromatographic window when every peak is separated from its neighbors with RS = 1. [3] Since the fractions from the first separation are further resolved in the second, orthogonal separation, the peak capacity of a 2D separation equals the product of the peak capacities of the individual separations. [3] For instance, if the isocratic peak capacity in each dimension is PC = 100, the total 2D peak capacity would be PC = 100 × 100 = 10,000. [3] Owing to this high resolving power [5-14], the use of 2D chromatographic separation has risen substantially in the biochemical field [15-18]; a review of multidimensional LC in proteomics [19] and a review of the application of 2D chromatography in food analysis [20] have been published. After instrumental analysis, the resulting data sets need to be interpreted in terms of identification, classification and quantification. The process of transforming data into useful information, such as inferring a property of interest (typically involving the search for biomarkers) or classifying a sample into one of several categories, is termed chemometrics. An example of sample classification is given in Figure 1.
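For reference, the two quantities just defined can be restated compactly; this is a sketch using the common convention based on baseline peak widths, since the text above states the definitions only in words:

```latex
% Resolution between two adjacent peaks with retention times t_{R,1}, t_{R,2}
% and baseline peak widths w_1, w_2
R_S = \frac{2\,(t_{R,2} - t_{R,1})}{w_1 + w_2}

% 2D peak capacity as the product of the peak capacities of the two dimensions
n_{c,\mathrm{2D}} = {}^{1}n_c \times {}^{2}n_c = 100 \times 100 = 10\,000 \quad \text{(example above)}
```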


Figure 1. Classification of chromatograms is based on the relative abundance of all the peaks in the mixture. [21]

Although an enormous amount of information can be extracted from 2D chromatography, the complexity of the raw data generated by 2D instruments makes data interpretation time-consuming, and there is a risk of overlooking information in the data. In order to extract the most useful information from the raw data, a well-defined data processing procedure needs to be performed. Daszykowski et al. [22] searched paper titles and keywords containing chemometrics and chromatography and summarized the results listed in Table 1, which show a promising scope for solving chromatographic problems. Over the past decades, multiple chemometric methods have been developed, applied and improved for each type of problem in chromatography, each with its own advantages and disadvantages. The application of chemometrics in chromatography now extends from drug identification in the pharmaceutical field [23] and beer/wine quality control and classification [24-27], to proving economic fraud in the food field [28], identification of microbial species by evaluation of cell wall material [29-30], predicting disease state in clinical medicine [31-33], and oil exploration (oil-oil correlation, oil-source rock correlation) in the petroleum field [34].


Table 1. Results of a keyword search in the SCOPUS system, using a quick search ("keyword(s)" and chromatography). [22]

No.  Keyword(s)                      Score
1    Multivariate curve resolution      44
2    Alternating least squares          34
3    MCR-ALS                            18
4    Chemometrics                      403
5    Experimental design               605
6    Multivariate analysis             275
7    Pattern recognition               280
8    Classification                   1029
9    PCA                               556
10   QSPR                               51
11   QSAR                              111
12   Topological indices                38
13   Topological descriptors            11
14   Modeling retention                  5
15   Fingerprints                      802
16   Clustering                        219
17   Peak shifts                        16
18   Deconvolution                     244
19   Background correction              21


Order of Instrument

Before introducing data processing methods for 2D chromatography, a classification based on data type should be defined, so that the methods to be applied can be better understood. Analytical instruments (or methods) are classified according to the type of data they generate, using existing mathematical terminology, as follows [35]: a zero-order instrument is one which generates a single datum per sample, since a single number is a zero-order tensor. Examples of zero-order instruments are ion-selective electrodes and single-filter photometers. A first-order instrument, which includes all types of spectrometers, chromatographs and even arrays of zero-order sensors, is one which generates multiple measurements at one time for one sample, where the measurements can be arranged in an ordered array as a vector of data (also termed a first-order tensor). Likewise, a second-order instrument generates a matrix of data per sample; this is mostly, but not exclusively, encountered in hyphenated techniques such as gas chromatography-mass spectrometry (GC-MS), LC-MS/MS and GC×GC. Data of even higher order can be generated by more complex instruments, and there is no limit to the maximum order of the data. [35] Table 2 depicts these concepts of data order.

Table 2. Different arrays that can be obtained for a single sample and for a set of samples. [36]

Data order   Array (one sample)   Array (a sample set)   Calibration
Zero         Scalar               One-way                Univariate
First        Vector               Two-way                Multivariate
Second       Matrix               Three-way              Multi-way
Third        Three-way            Four-way               Multi-way
Fourth       Four-way             Five-way               Multi-way

Table 3. Advantages and disadvantages of different calibration paradigms. [35]

Zero order: required selectivity: full; maximum analytes: 1; minimum standards (with offset): 1; interferences: cannot detect, analysis biased; signal averaging: none; statistics: simple, well defined.
First order: required selectivity: net analyte signal; maximum analytes: number of sensors; minimum standards (with offset): 1 per species (1 + 1 per species present); interferences: can detect, analysis biased; signal averaging: J; statistics: complex, defined.
Second order: required selectivity: net analyte rank; maximum analytes: min(I, J); minimum standards (with offset): 1; interferences: can detect, analysis accurate; signal averaging: I·J; statistics: complex, not fully investigated; something extra: first-order profiles.

For 2D chromatographic analysis, the data are at least first-order (e.g. LC×LC); however, a single-wavelength detector or a hyphenated detector (LC×LC-MS) is usually employed, which complicates the data by raising the data order to second or third order, while providing more detailed and precise information through second-order advantages (Table 3). The primary second-order advantage is higher selectivity, even in the presence of unknown interferences. [35, 37] In the following sections, the different methods are categorized and elucidated to give an overview of the current data processing methods for 2D chromatography.


Data pre-processing

In 2D data processing, pre-processing must be applied to the raw data before quantitative and/or qualitative data analysis, because the relevant chromatographic variation is obscured by irrelevant variation caused by interferences such as noise (high frequency) and background (low frequency); the intermediate frequencies constitute the component signal. Pre-processing is crucial because it reduces the variation unrelated to the chemical variation studied in chemometric analysis, and it has become a critical step that can determine success or failure in many applications. [38-40] Particularly in metabolomics, the choice of pre-processing methods has become a primary, difficult issue with a large influence on the final results. [41] In most cases, variables located adjacent to each other in a data set are related and contain similar information; methods for filtering noise and correcting the background exploit this relationship to remove interferences.

Baseline correction

The interference in the baseline consists of background (low frequency) and noise (high frequency). General baseline correction procedures are designed to reduce the low-frequency contribution, while some procedures and smoothing techniques specifically target the high-frequency variation so as to improve the signal-to-noise (S/N) ratio. [38] Note that "background" tends to be used in a more general sense to designate any unwanted signal, including noise and chemical components, while "baseline" is associated with a smooth line reflecting a "physical" interference. [42] Figure 2 illustrates the difference between background, noise and the component signal.


Figure 2. Components of analytical signal: (a) overall signal; (b) relevant signal; (c) background; and, (d) noise. [22]

Baseline correction is typically the first pre-processing step, because baseline drift and interferences contributed by solvents, impurities, etc. assign imprecise signals to each component at a fixed concentration. After baseline correction, the baseline noise signal should be numerically centered around zero. [38] The simplest way of baseline correction is to run a "blank" analysis and subtract its chromatogram from the sample chromatogram. However, several blank runs must then be performed to obtain a confidence level on the blank chromatogram, since variations can occur from run to run. A second approach is to use polynomial least squares fitting to simulate a blank chromatogram and then subtract it from the sample chromatogram. This method is effective to some extent but requires user intervention and is prone to variability, particularly in low-S/N environments. [43] An alternative is to use a penalized least squares algorithm with some adaptation. The penalized least squares algorithm was first published by Whittaker [44] in 1922 as a flexible smoothing method (noise reduction). Silverman et al. [45-46] later developed another smoothing method named the roughness penalty method. The penalized least squares algorithm can be regarded as a roughness-penalty smoother by least squares, which balances fidelity to the original data against the roughness of the fitted data. [43] The asymmetric least squares (ALS) approach was widely applied by Eilers et al. for smoothing [47], for background correction in hyphenated chromatography [48] and for finding new features in large spectral data sets [49]. In order to apply the penalized least squares algorithm to baseline correction, both Cobas et al. [50] and Zhang et al. [51] introduced a weight vector based on the original data. However, in both cases peak detection has to be performed before baseline correction, while the presence of a baseline negatively affects peak detection. The method proposed by Cobas et al. [50] does not cope with complex baselines, while the method of Zhang et al. [51], though more accurate than that of Cobas, is time-consuming, particularly for two-dimensional data sets.

Eilers et al. [52] proposed an alternative algorithm, termed asymmetric weighted least squares, based on the original asymmetric least squares (AsLS) [53]. It combines a Whittaker smoother with asymmetric weighting of deviations from the (smooth) trend to obtain an effective baseline estimator. The advantages of this method are [52]: 1) the baseline position can be adjusted by varying two parameters (p for asymmetry and λ for smoothness), and the flexibility of the baseline can be tuned with p; 2) it is fast and effective while keeping the analytical peak signal intact; 3) no prior information about peak shape or baseline (polynomial) is needed. With this method, GC chromatograms and MS, Raman and FTIR spectra were successfully baseline-corrected. However, it is difficult to find the optimal value of λ, and the method does not provide a fully automatic procedure to set the optimal parameter values, so user judgment and experience are needed. To overcome these problems, a novel algorithm termed adaptive iteratively reweighted penalized least squares (airPLS) was proposed by Zhang et al. [43]. According to Zhang et al. [43], the adapted method is similar to weighted least squares and iteratively reweighted least squares, but it calculates the weights differently and adds a penalty term to control the smoothness of the fitted baseline. The airPLS algorithm has proved effective for baseline correction while preserving the primary information useful for classification, and the R2, Q2 and RMSECV of regression models pretreated with airPLS were clearly better than those of models pretreated with the methods of Cobas et al. [50] or Eilers et al. [52], or left uncorrected, especially when the number of principal components in principal component analysis (PCA) is small. [43] (R2: coefficient of determination, indicating how well data points fit a line or curve; it normally ranges from 0 to 1, and the higher the value, the better the model. Q2: second quartile. RMSECV: root-mean-square error of cross-validation, a measure of a model's ability to predict new samples; the smaller the value, the better the model.)

Recently, a new method was developed by Reichenbach et al. [54] for two-dimensional GC (GC×GC) and incorporated into the GC Image software system. The algorithm is based on a statistical model of the background values, tracking the neighborhood around the smallest values as a function of time and noise. The estimated background level is then subtracted from the entire image, producing a chromatogram in which the peaks rise above a near-zero mean background. The algorithm effectively removes the background level of GC×GC data, but it does not remove some artifacts observed in these images. This approach was soon adapted to LC×LC in two important respects. Since the background signal in gradient LC can either decrease or increase as a slope due to the changing solvent composition, the baseline correction should track the "middle" value instead of the smallest. [55] Another problem is that the variance of the background in the second dimension (2D) is significant, so the correction algorithm should fit the model in both dimensions. [55] An example of an LC chromatogram before and after baseline correction is given in Figure 3.
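A minimal sketch of the asymmetric least squares idea described above (a Whittaker smoother with asymmetric weights, in the spirit of Eilers et al. [52]); the function name and parameter values are illustrative assumptions, not a reference implementation of any of the cited algorithms.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def asls_baseline(y, lam=1e6, p=0.01, n_iter=10):
    """Estimate a baseline by asymmetrically weighted penalized least squares.

    lam controls the smoothness of the baseline; p controls the asymmetry:
    points above the fitted trend (peaks) get weight p, points below get 1 - p.
    """
    m = len(y)
    # Second-order difference matrix for the roughness penalty
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(m - 2, m))
    w = np.ones(m)
    for _ in range(n_iter):
        W = sparse.diags(w)
        z = spsolve((W + lam * D.T @ D).tocsc(), w * y)
        w = np.where(y > z, p, 1.0 - p)   # down-weight points that lie above the trend
    return z

# Usage: corrected = chromatogram - asls_baseline(chromatogram)
```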

Other algorithms, such as weighted least squares (WLS), have also been proposed and applied to baseline correction, mainly for spectra. A commonly used method for 2D chromatography is the linear least squares algorithm, which can be applied in multiple dimensions, whereas other common algorithms such as polynomial least squares are limited to one dimension. De Rooi et al. [56] developed a two-dimensional baseline estimation method based on penalized regression, originally for spectroscopy but claimed to be applicable to two-dimensional chromatographic data as well. Filgueira et al. [57] developed a new method called orthogonal background correction (OBGC), which is particularly useful for correcting complex DAD background signals in fast online LC×LC. This method builds on two existing baseline correction methods for one-dimensional liquid chromatography (the moving-median filter and polynomial fitting) and is combined with either of them to correct the two-dimensional background in LC×LC. The newly developed method was compared with the two basic methods and with dummy blank subtraction on second-dimension (2D) chromatograms; the results are illustrated in Figure 4.


Figure 3. (a) Background values before (solid line) and after (dashed line) correction along a single row in the first dimension. A row with no analyte peaks was selected so that the values reflect only the baseline and noise. After correction, the values fluctuate in a small range centered very close to zero. (b) Background values before (solid line) and after (dashed line) correction along a single column in the second dimension. This secondary chromatogram with no analyte peaks was selected so that the values reflect only the baseline and noise. After correction, the values in the region of analysis are very close to zero. [55]

Figure 4. Comparison of estimated baselines using the different methods on a typical single 2D chromatogram. The chromatograms are intentionally offset by 7 mAU to aid visualization. (a) Conventional baseline correction methods: the blue solid line is the real single 2D chromatogram; the black dashed line is the baseline estimated with the moving-median filter, and the red dot-dashed line is the baseline estimated with the polynomial fitting method. (b) The two methods applied in combination with the OBGC method; the line formats are the same as in (a). [57]

Furthermore, compared to dummy blank subtraction, the reproducibility of the measured peak heights was significantly enhanced by applying OBGC. This robust new baseline correction method has proved effective for LC×LC and is considered applicable to any 2D technique in which the first dimension (1D) has lower-frequency baseline fluctuations than the second (2D). However, the authors did not clearly explain the principle of the newly developed method in the article; they only mentioned that it was developed on the basis of the existing methods.

Table 4. Summary of methods for baseline correction and smoothing (√ advantages, × limitations).

1. Dummy blank subtraction - √ simple; × manual, time-consuming, more errors. Usage: baseline correction.
2. Polynomial least squares fitting - √ effective; × user intervention, not suitable for low S/N. Usage: baseline correction.
3. Penalized least squares - √ balances fidelity to the original data against roughness of the fitted data; × needs peak detection. Usage: smoothing [44].
4. Roughness penalty method - Usage: smoothing [45-46].
5. Asymmetric least squares (AsLS) - √ effective; × two parameters to optimize, constant weights for the entire region. Usage: smoothing [47], background correction [48].
6. Weighted vector AsLS-1 - √ no need for peak detection; × not for complex baselines. Usage: baseline correction [50].
7. Weighted vector AsLS-2 - √ no need for peak detection, better accuracy; × time-consuming. Usage: baseline correction [51].
8. Asymmetric weighted least squares - √ easy to perform by tuning two parameters, fast and effective, no prior information needed; × optimal value of one parameter difficult to find, requires user judgment and experience. Usage: baseline correction [52].
9. Adaptive iteratively reweighted penalized least squares (airPLS) - √ effective while preserving the primary information, particularly for a small number of principal components in classification; extremely fast for large datasets. Usage: baseline correction [43].
10. LC/GC Image incorporated - √ powerful, accurate, quick. Usage: baseline correction [55].
11. Orthogonal background correction (OBGC) - √ effective in 2D, highly reproducible peak heights. Usage: baseline correction [57].

Smoothing and Derivatives

Smoothing is a low-pass filter for removing high-frequency noise from the signal and is sometimes termed noise reduction. As mentioned above, some of the baseline algorithms can also be applied to smoothing. Smoothing can also be performed with a linear Kalman filter, which is mostly used as an alternative to linear least squares for estimating the concentrations of mixture components [59], often in 1D chromatography. The most classic smoothing method is the Savitzky-Golay smoother (SGS) [60], which fits a low-order polynomial to each data point and its neighbors and then replaces the signal at that point with the value provided by the polynomial fit. [38] In practice, however, missing values and the boundaries of the data domain complicate the computation when using SGS. The Whittaker smoother, based on penalized least squares, has several advantages over SGS: it is said to be extremely fast, to adapt automatically to boundaries, and even to allow fast leave-one-out cross-validation. [61]

Digital filters are well suited to signal processing, as they can eliminate undesired frequencies without distorting the frequency region containing the crucial information. [22] Digital filters can operate in either the time domain or the frequency domain. The windowed Fourier transform (FT) is often used to analyze the signal in both time and frequency domains, studying the signal segment by segment. [22] However, FT has a severe disadvantage rooted in the Heisenberg uncertainty principle: precision in time and frequency cannot be achieved simultaneously. The narrower the window, the better localized the peak signals, at the cost of less precision in frequency, and vice versa for a broader window. [22, 62] To obtain precision in both the time and frequency domains, the wavelet transform (WT) is preferable, particularly for non-stationary signals (non-stationary: features change with time or in space). WT exploits the intermediate cases of the uncertainty principle so as to capture precision in both time and frequency with only a small sacrifice in each. [22, 63]

In contrast to smoothing, taking derivatives acts as a high-pass filter with frequency-dependent scaling. Derivatives are a common way to remove unimportant baseline signals by differentiating the measured responses with respect to the variable number or another relevant axis such as wavelength. They are used either when lower-frequency features (baseline) are interferences or when higher-frequency features contain the signal of interest. [58] The method assumes that the variables are strongly related to each other and that adjacent variables contain similar, correlated signal. [58] The Savitzky-Golay algorithm is often used to smooth the data while simultaneously taking the derivative, which improves the utility of the derivatized data. [19] Vivó-Truyols et al. [64] developed a method to select the optimal window size for the Savitzky-Golay algorithm in smoothing, which was successfully applied to NMR, chromatography and mass spectrometry data and shown to be robust.
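For instance, SciPy's savgol_filter applies Savitzky-Golay smoothing and can return the derivative in the same pass; the window length and polynomial order below are illustrative guesses, and in practice the window size can be optimized as in Vivó-Truyols et al. [64].

```python
import numpy as np
from scipy.signal import savgol_filter

t = np.linspace(0.0, 10.0, 2001)                        # retention time axis (min)
signal = np.exp(-(t - 5.0) ** 2 / 0.02) + 0.02 * np.random.randn(t.size)

# Smoothing: fit a 3rd-order polynomial in a 21-point moving window
smoothed = savgol_filter(signal, window_length=21, polyorder=3)

# First derivative obtained simultaneously with the smoothing
# (delta is the spacing of the time axis, so the derivative is per time unit)
derivative = savgol_filter(signal, window_length=21, polyorder=3,
                           deriv=1, delta=t[1] - t[0])
```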



Peak detection

Peak detection is also a key step in data pre-processing: it distinguishes the important information, sometimes the most important information, from the noise, particularly in the search for biomarkers. Peak detection methods are well developed for 1D chromatography with single-channel detection; they are based on detecting signal changes in the detector and applying the condition of unimodality. [65] Peak detection methods fall into two main families [66]: those that make use of matched filters and those that make use of derivatives. Only a few peak detection methods for two-dimensional chromatography (performed in time, as opposed to 2D-PAGE in space) have been reported in the literature [67-68], and only two main families of methods are available [65]: those based on the extension of 1D peak detection algorithms [69-70] and those based on the watershed algorithm [71].

In general, the former methods follow a two-step procedure [65]: first, peaks are detected in one-dimensional form using the raw detector signal, a step that has the advantage of avoiding the sub-peak discontinuity from which the drain algorithm suffers; second, a collection of criteria is applied to decide whether one-dimensional peaks should be merged into a single two-dimensional peak. Despite slight differences in these criteria, they are all based on peak profile similarity (i.e. peaks detected in the first dimension that elute at the same time in the second dimension).
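A rough sketch of this two-step idea is given below: 1D peaks are detected in every second-dimension chromatogram and detections from neighbouring modulations at similar 2D times are merged. The merging criterion is a deliberately simplified assumption and does not reproduce any specific published algorithm.

```python
import numpy as np
from scipy.signal import find_peaks

def detect_2d_peaks(chrom2d, height=5.0, tol=3):
    """chrom2d: array of shape (n_modulations, n_points_per_modulation)."""
    # Step 1: 1D peak detection in each second-dimension chromatogram
    detections = []                               # (modulation index, 2D position)
    for i, row in enumerate(chrom2d):
        idx, _ = find_peaks(row, height=height)
        detections.extend((i, j) for j in idx)

    # Step 2: merge detections from adjacent modulations with similar 2D times
    peaks, used = [], set()
    for d in detections:
        if d in used:
            continue
        cluster = [d]
        used.add(d)
        for e in detections:                      # greedily grow the cluster
            if e not in used and any(abs(e[0] - c[0]) <= 1 and abs(e[1] - c[1]) <= tol
                                     for c in cluster):
                cluster.append(e)
                used.add(e)
        peaks.append(cluster)                     # one cluster = one 2D peak
    return peaks
```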

Reichenbach et al. [71-72] adapted the watershed algorithm to peak detection in GC×GC, and the resulting method is termed the drain algorithm. The drain algorithm, which has been applied in 2D LC and 2D GC [71, 73], is an inversion of the watershed algorithm. [65, 71] When applied to two-dimensional chromatography, the chromatographic peak (a mountain) is treated as a negative peak (a basin), so the algorithm detects peaks from the top down to the surrounding valleys, with user-defined minimum thresholds in the two dimensions. [65, 73] Noise artifacts lead to over-segmentation, i.e. detection of multiple regions that should be segmented as a single region; this can, however, be solved by smoothing. [71] The fatal drawback of the drain algorithm is the sub-peak discontinuity, which makes a peak "appear" and "disappear" several times during the course of elution; moreover, peak splitting occurs because the algorithm does not tolerate retention time variation in the second dimension. [65] Since variation in the second dimension is unavoidable, Peters et al. [69] proposed a method for peak detection in two-dimensional chromatography using the algorithm (termed the C-algorithm) developed by Vivó-Truyols et al. [74], which was originally designed for one dimension; they extended its use to two-dimensional GC with some modification. Vivó-Truyols et al. [65] then built a model suitable for both LC×LC and GC×GC and compared the C-algorithm and the watershed algorithm. In their study, the watershed algorithm had a 20% probability of failure under normal GC×GC conditions, using the C-algorithm as a reference.

Alignment

Alignment of retention times is also very important in pre-processing, since retention time variations can be caused by pressure, temperature and flow rate fluctuations as well as column bleeding. The purpose of alignment (also called warping) is to synchronize the time axes so as to construct consistent representations of the signals of n chromatograms (corresponding to n samples) for further data analysis such as calibration and classification. [22] To obtain reproducible analyses, peak position shifts should be corrected by alignment algorithms. [38] When higher-order instruments are used, the data become more complicated to process. Alignment methods developed specifically for 2D can be categorized into two groups [75]. The first group seeks the maximum correlation or the minimum distance between chromatograms on the basis of a one-dimensional benefit function; examples are correlation optimized warping (COW) [76-77] and dynamic time warping (DTW). The second group of methods focuses on second-order instruments, which generate a matrix of data per sample; examples are rank minimization (RM), which has the remarkable advantage that interferences coeluting with the analytes of interest hardly affect the alignment performance [75], iterative target factor analysis coupled to COW (ITTFA-COW), and parallel factor analysis (PARAFAC). Yu et al. [75] developed a new alignment method named abstract subspace difference (ASSD), based on RM with some modifications. The performance of this new method is comparable to that of RM on both simulated and experimental data, but it is more advantageous because of its higher intelligence and its suitability for dealing with analytes coeluting with multiple interferences. Furthermore, ASSD can be combined with trilinear decomposition to obtain the second-order advantages. Eilers [78] developed a fast and stable parametric model for the warping function which consumes little memory and avoids the artifacts of DTW, which is time- and memory-consuming. This method is very useful for quality control and is easily interpolated, allowing alignment of batches of chromatograms with a limited number of calibration samples. [78]
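As a much simpler cousin of COW and DTW, the sketch below aligns a chromatogram to a reference with a single rigid shift found at the cross-correlation maximum; it cannot correct non-linear retention-time drift, which is exactly what the warping methods above address.

```python
import numpy as np

def align_by_shift(reference, sample):
    """Shift `sample` so that its cross-correlation with `reference` is maximal."""
    ref = reference - reference.mean()
    sam = sample - sample.mean()
    corr = np.correlate(sam, ref, mode="full")
    shift = int(np.argmax(corr)) - (len(ref) - 1)   # > 0: sample elutes later than reference
    return np.roll(sample, -shift), shift           # note: np.roll wraps around the edges
```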


Figure 5. Aligning homogeneous TIC images. (a) Shown are the contours and peaks of TIC chromatograms for a pair of FA + AA samples, which are respectively from the first and last GC×GC/TOF-MS analyses. [79]

Most alignment methods are based on procedures similar to those originally developed for 1D chromatograms; in 2D chromatograms, however, alignment becomes more critical because of the higher relative variability of the retention time in the very short 2D time window. [80] Therefore, Castillo et al. [80] proposed an alignment algorithm called score alignment, which uses the two retention times in the two dimensions to improve the precision of retention time alignment. Zhang et al. [79] developed an alignment method for GC×GC-MS data termed 2D-COW, which works on the two dimensions simultaneously; an example of its application is presented in Figure 5. This method can handle both homogeneous and heterogeneous chemical samples, with a slightly different procedure beforehand, and it was claimed to be applicable in principle to any 2D separation images, such as LC×LC, LC×GC, LC×CE and CE×CE data. Pierce et al. [81] developed a comprehensive 2D retention time alignment algorithm using a novel indexing scheme. This comprehensive alignment algorithm was demonstrated by correcting GC×GC data, but it was designed for all kinds of 2D instruments. After alignment, classification by PCA gave 100% accurate score clustering. However, future work is still needed to combine this algorithm with spectral detection in order to preserve the spectral information, since alignment is performed on the retention times; furthermore, the acceptable range of shifting should be investigated, as well as perturbations in pressure and temperature, since in the data used in their work peaks shifted beyond the nearest-neighbor peaks. A second-order retention time alignment algorithm was originally developed for second-order data from hyphenated instruments (e.g. LC-UV, GC-MS).


However, it has been applied successfully to GC×GC data because the second GC column possesses the signal precision to act like a spectrometric detector. [82-83] This method requires an estimate of the number of chemical components in the time window of the sample being analyzed; this is not a real disadvantage, because much of the literature covers the estimation of the rank or pseudorank of a bilinear data matrix (e.g. Prazen et al. [37] used PCA with singular value decomposition (SVD) to estimate the pseudorank). [84-86] This alignment algorithm is not objective for first-order chromatographic analysis, because retention time is then the only qualitative information. [37] Fraga et al. [87] proposed a 2D alignment algorithm based on their previous alignment method developed for 1D [88]. This 2D alignment method can objectively correct run-to-run retention time variation in both dimensions in an independent, stepwise way, and it has been proved to be robust. They also claimed that this 2D alignment combined with the generalized rank annihilation method (GRAM) was successfully extended to high-speed 2D separation conditions with a reduced data density.

Normalization

Normalization is also an important pre-processing step, because of the bias introduced by sample preparation and by the poor injection volume precision of injectors. The most commonly used approach is the internal standard method. [89] However, the choice of internal standard is limited, since it must be inert and fully resolved from all native components while possessing a structure similar to the sample analytes. As with baseline correction, several normalization algorithms exist. When no standard solution is used, the responses depend strongly on the detector. For example, flame ionization detector (FID) responses in GC depend largely on the carbon content of the solute; if samples of similar type are analyzed by FID, a normalization algorithm will introduce the least error into the data analysis. In general, normalization is often used when determining the components of a sample mixture, because the response of each component may vary over the course of the analysis, making comparisons difficult. Other normalization methods mathematically force the mean signal of each chromatogram to equal 1 [90], or force the maximum peak volume to equal 1 [91], so that the sum of all signals in a chromatogram constitutes 100% and each component takes a certain percentage of the total.
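The simple scaling conventions mentioned above take only a few lines; the array sizes and the internal-standard position below are assumptions for illustration.

```python
import numpy as np

chromatograms = np.abs(np.random.randn(10, 500))   # assumed: 10 samples x 500 points

# Force the mean signal of each chromatogram to equal 1 [90]
mean_normalized = chromatograms / chromatograms.mean(axis=1, keepdims=True)

# Express each signal as a percentage of the total, so each chromatogram sums to 100%
percent_of_total = 100.0 * chromatograms / chromatograms.sum(axis=1, keepdims=True)

# Internal standard method: divide by the area of an internal-standard peak
# (here assumed to occupy columns 42-44 of every chromatogram)
istd_area = chromatograms[:, 42:45].sum(axis=1, keepdims=True)
istd_normalized = chromatograms / istd_area
```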

To summarize, the pre-processing procedures are delicate and somewhat controversial, since together with user choices they determine how useful the raw data become. Given the advantages and limitations of each method in every pre-processing step, there is no single best method for all situations, and when pre-processing methods are not used in the right way, unwanted variation can be introduced. [40] Currently, some software tools developed for data processing come with certain algorithms chosen by the manufacturer for commercial use (e.g. GC Image), while many researchers prefer to use in-house written routines [92]. However, there are no clear-cut guidelines for choosing the optimal methods. A good review of pre-processing methods with critical comments is given by Engel et al. [40].

Data processing

The advantage of 2D instrumental analysis is that the data provide more information, in the form of second-order, third-order or even higher-order arrays, despite the presence of unwanted interferences. However, this advantage comes at the cost of a more complicated extraction of the useful information. Once the data have been properly pre-processed, the next step is the actual data processing, such as data reduction, data decomposition and classification.

Supervised and unsupervised learning

In data analysis, statistical learning falls into two categories: supervised and unsupervised. Grouping serves as a good example to understand these two categories. In chemometrics, the aim of classification is to separate a number of samples into different groups according to their distinguishing characteristics, i.e. their similarities. However, the word classification is ambiguous in the field of pattern recognition; to clarify, "grouping" is used here to indicate grouping in general. There are two types of grouping in pattern recognition: supervised and unsupervised. Supervised pattern recognition requires a training set with known groupings in advance and tries to assign an unknown sample to one of these groups as precisely as possible. [93] In short, supervised pattern recognition rests on the prerequisite that the number of groups is known beforehand, and this kind of grouping is termed classification. Unsupervised grouping is applied to explore the data when the number of groups is not known beforehand, the aim being to find the similarities and dissimilarities between samples. For instance, given a large number of wine chromatograms that the researchers want to separate into several groups by origin, the number of origins is unknown; this exploratory grouping is termed clustering.

Returning to the two original categories: supervised learning is the process of looking for a model that fits the observations of the predictor measurements (xi) and relates them to the associated response measurements (yi). With such a model, one expects either to predict the response for future observations accurately (prediction) or to better understand the relationship between the response and the predictors. In contrast, unsupervised learning deals with the more challenging situation in which there is no response associated with the observations. It is then not possible to fit a linear regression model, since there is no response variable to predict.


The available methods are not all described here in detail, and some of them are not popular in 2D chromatography; for the statistical methods in general, the reader is referred to [94]. In this literature review, only the most popular methods applied in 2D chromatography are explained.

Unsupervised

HCA

HCA, short for hierarchical clustering analysis, is an unsupervised data mining method. Unlike K-means clustering, which requires the number of clusters K to be specified in advance, HCA does not require this. HCA also has the advantage over K-means clustering that it results in a clear tree-like representation of the observations, called a dendrogram. HCA works by putting objects/variables with a small distance (high similarity) together in the same cluster. The HCA algorithm is simple and is based on the calculation of distances, mostly the Euclidean distance (the Euclidean distance between points p and q is the length of the line segment connecting them).

Suppose there are n observations, each initially treated as its own cluster. The clustering starts by finding the two observations with the smallest Euclidean distance among all pairs and fusing them. The process is iterative: the calculation and fusion of the closest clusters continue until all observations have been merged.

The dissimilarity between two fused clusters indicates the height in the dendrogram at which the fusion is placed. The notion of dissimilarity between clusters of observations, as opposed to between individual observations, is termed linkage. There are four common types of linkage: complete, average, single and centroid (Table 5).


Linkage    Description
Complete   Maximal intercluster dissimilarity. Compute all pairwise dissimilarities between the observations in cluster A and the observations in cluster B, and record the largest of these dissimilarities.
Single     Minimal intercluster dissimilarity. Compute all pairwise dissimilarities between the observations in cluster A and the observations in cluster B, and record the smallest of these dissimilarities. Single linkage can result in extended, trailing clusters in which single observations are fused one at a time.
Average    Mean intercluster dissimilarity. Compute all pairwise dissimilarities between the observations in cluster A and the observations in cluster B, and record the average of these dissimilarities.
Centroid   Dissimilarity between the centroid for cluster A (a mean vector of length p) and the centroid for cluster B. Centroid linkage can result in undesirable inversions.

Table 5. A summary of the four most commonly used types of linkage in hierarchical clustering. [95]

Average and complete linkage are generally preferred over single linkage because they tend to yield more balanced dendrograms, while centroid linkage is often used in genomics but suffers from the drawback of inversion, where two clusters are fused at a height below either of the individual clusters. [95] Centroid linkage is also often used in the chromatography field. In general, the resulting dendrograms depend strongly on the linkage used.

This clustering method is very popular in 2D chromatography applications and is said to be suited to sample sets smaller than 250. [96] Ru et al. [97] applied both HCA and principal component analysis (PCA, explained later) to peptide data sets analyzed by 2D LC-MS for peptide feature profiling of human breast cancer and breast disease sera. Schmarr et al. [98] also applied HCA and PCA for profiling volatile compounds from fruits, and Groger et al. [99] applied HCA and PCA for profiling illicit drug samples.
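A minimal sketch of hierarchical clustering on a (samples × features) peak table using SciPy; the random data and the choice of average linkage are illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from scipy.spatial.distance import pdist

X = np.random.rand(30, 200)                   # assumed: 30 samples x 200 aligned peak areas

distances = pdist(X, metric="euclidean")      # pairwise Euclidean distances
Z = linkage(distances, method="average")      # average linkage (cf. Table 5)

tree = dendrogram(Z, no_plot=True)            # dendrogram structure (set no_plot=False to draw)
labels = fcluster(Z, t=3, criterion="maxclust")   # cut the tree into three clusters
```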

PCA

Principal component analysis (PCA) is a very useful statistical method that can be applied to simplification, data reduction, modeling, outlier detection, variable selection, classification, prediction and unmixing. [101]


PCA decomposes a data matrix X into a small number of principal components that capture the essential data patterns of X and are used to interpret the information. [101] Generally, the objects are the samples and the variables are the measurements. The decomposition in PCA can be performed by eigenvalue decomposition or by singular value decomposition (SVD). An illustration of PCA by eigenvalue decomposition is given in Figure 6.

Figure 6. A data matrix X with its first two principal components.

The PC model in matrix form can be expressed mathematically as

X = T·P^T + E    (2)

where T is the scores matrix, which has the same number of rows as the original data matrix; P is the loadings matrix, which has the same number of columns as the original data matrix; and E is the residual matrix, the part not explained by the PC model. The vectors of the scores and loadings matrices are denoted ti and pj, respectively.

From a geometric perspective, a data matrix X (N rows × K columns) can be represented as an ensemble of N points distributed in a K-dimensional space. This space may be termed M-space (for measurement space, or multivariate space) or K-space, to indicate its dimensionality. [101] It is difficult to visualize for K > 3, but not mathematically. The number of principal components describes the dimensionality of the PCA model: a one-component PC model sufficient to explain the data is a straight line; a two-component PC model is a plane defined by two orthogonal lines; a three-component PC model is a three-dimensional space spanned by three orthogonal lines; and, in general, an A-component PC model is an A-dimensional hyperplane (a hyperplane being a subspace of one dimension less than its ambient space). In data reduction, this is useful when the original data set is large and complex, because PCA can then approximate it by a moderately complex model structure.


The PCA algorithm searches for the axis onto which the data points can be projected with the minimum loss of information (variability). In other words, since PCA is a least squares model, the PC model is built such that the sum of squared residuals is as small as possible. The first principal component (PC1) captures the largest variance and thus contains the most useful information; the model is then rotated to search for PC2, which is orthogonal to PC1 and captures the largest part of the remaining variance. This process continues until all useful variance has been captured by PCs, leaving only the residuals (noise). A rule of the PC model is that all PCs must be orthogonal. The number of principal components is determined by the total contribution of the PCs to explaining the data matrix, which depends on the size of each component. After the data matrix has been transformed into a number of PCs, the size of each component is measured. This size is the eigenvalue, which quantifies the variance captured by the PC: the more significant the component, the larger its eigenvalue. [101] The eigenvalue contribution can be calculated as the sum of squares of each PC scores vector (Sk) divided by the total sum of squares of the data (Stotal). A basic assumption in PCA is that the scores and loadings vectors corresponding to the largest eigenvalue contain the most useful information relating to the specific problem. [100] A simple example is presented in Table 6: PC1 explains 44.78% of the total data matrix, and the first three PCs account for 95.37% of the total, so in this case three principal components are sufficient to explain the information in the data matrix.

              Total    PC1     PC2     PC3     PC4     PC5
Eigenvalue    670      300     230     109     20      8
%                      44.78   34.34   16.27   2.99    1.19
Cumulative %           44.78   79.11   95.37   98.36   99.55

Table 6. Illustration of the size of eigenvalues in PCA. [101]

A common rule is to choose the number of principal components at which the cumulative value exceeds a cut-off of 95%. However, this does not work in every case, because Stotal depends on the variance of the raw data. Sometimes the data require pre-treatment before PCA is applied, which can be achieved through scaling (e.g. mean-centering the data by subtracting the column averages, which corresponds to moving the coordinate system to the centre of the data). Scaling is essential because PCA is a least squares method, meaning that variables with large variances obtain large loadings, which enlarges the scale of the corresponding coordinate axes. Common ways to avoid this bias are standardization, mean-centering or log-centering of the matrix columns so that the variance of each column becomes 1. [94] Scaling to unit variance gives all coordinate axes the same length, so that each variable has the same influence on the PC model. [94] In general, therefore, the 95% cut-off rule is not the only criterion for choosing the number of principal components.
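A compact sketch of PCA on a mean-centered data matrix via SVD, including the cumulative-variance rule discussed above; the data are random placeholders.

```python
import numpy as np

X = np.random.rand(40, 300)                  # assumed: 40 samples x 300 variables
Xc = X - X.mean(axis=0)                      # mean-centering (column-wise)
# Optional autoscaling to unit variance:  Xc = Xc / Xc.std(axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s                               # T: one row of scores per sample
loadings = Vt                                # P^T: one row of loadings per component

explained = s**2 / np.sum(s**2)              # fraction of variance per PC (cf. Table 6)
cumulative = np.cumsum(explained)
n_components = int(np.searchsorted(cumulative, 0.95)) + 1   # 95% cut-off rule
```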


PCA provides two kinds of plots: the scores (T) plot and the loadings (PT) plot, which investigate the relationships among the objects and among the variables, respectively. The value of object i (its projection) on a principal component is termed its score.

Figure 7. (A) Scores plot obtained from PCA of 18 basil, 18 peppermint, and 18 stevia GC×GC-TOFMS m/z 73 chromatograms, demonstrating differentiation between species based on metabolite profiles. [176] (B) PCA plot of 2D GC-TOFMS data for serum samples; the group A samples were stored at -20 °C and the group B samples at -80 °C. [80]

As mentioned at the beginning, PCA can be applied to grouping, and it is an unsupervised method. PCA has been widely used in chromatography and works efficiently with 2D chromatographic data. [80, 100, 102-103] With 54 chromatograms of three different plant species (18 per species), Pierce et al. [102] used PCA to quickly and objectively discover differences between complex samples analyzed by 2D GC-MS. PCA compared the metabolite profiles of the 54 submitted chromatograms, and the 54 scores of the m/z 73 data sets were successfully clustered into three groups according to plant type; furthermore, the highly loaded variables corresponded to chemical differences between the plants, providing information complementary to the m/z 73 signal. Before that work, this approach had not been demonstrated for 2D GC-TOFMS metabolite data.

PCA has also been used in quality control to detect possible outliers by Castillo et al. [80]. Sixty human serum samples were analyzed by 2D GC-MS, and the total metabolite profiles were used in the evaluation. All samples separated into two clusters according to storage temperature, as can be seen in Figure 7(B), indicating no outliers in this case. An example of PCA applied to 2D GC data by Schmarr et al. [98] for profiling volatile compounds from fruits is given in Figure 8.


Figure 8. PCA analysis: In the first/second principal component plot (panel A), except for “Cox-Orange” and with much lower distance “Pinova”, all apples (reddish and yellowish color shades) are projected into the center. Pears, which are encoded by green color hue, appear on the upper left, while “Alexander Lucas” and “Conference” are clearly distinguishable. The group of quince fruit samples appears at largest distance to the other samples on the upper right. [98]

Indeed, PCA has proved to be a very popular method in 2D chromatography. In addition, another method called hierarchical PCA (H-PCA) was suggested by Pierce et al. [102] as conceivably applicable to this type of data. The principle of H-PCA is basically the same, but it provides more information because it deals with higher-dimensional data sets. It works by constructing several PCA models, each based on a subset of the entire higher-order data set (e.g. all the mass channels of 2D GC-MS); the scores from all PCA models can then be combined into a new matrix. The analogous extension of PLS is termed H-PLS, and both methods are well explained by Wold et al. [104].

MPCA

Multiway principal component analysis (MPCA), an extension of PCA, has recently become a promising exploratory data analysis method for 2D GC data. [102-103, 105-107] The principle of MPCA is basically the same as that of PCA, only extended to the higher-order data generated by such instruments. In short, MPCA is an unfolding method in which the two-way data for each sample are unfolded row-wise and PCA is performed on the unfolded data. [107] It has been applied to matrices extracted from raw 3D data arrays to determine the exact compounds that distinguish classes of samples. [108-110]
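A sketch of the row-wise unfolding step that MPCA performs before ordinary PCA; the array sizes are assumptions.

```python
import numpy as np

# Assumed three-way data set: 20 samples, each a 50 x 400 GCxGC chromatogram
data = np.random.rand(20, 50, 400)

# Unfold each sample's two-way chromatogram into one long row vector
unfolded = data.reshape(data.shape[0], -1)            # shape (20, 20000)

# Ordinary PCA is then applied to the unfolded, mean-centered matrix
Xc = unfolded - unfolded.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s                                        # MPCA scores, one row per sample
```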


Supervised

PLS

Partial least squares analysis (PLS) is a dimension reduction method which first identifies a new set of M features Zi (i = 1, 2, ..., M) that are linear combinations of the original variables and then fits a linear model by least squares. PLS differs from principal component regression (PCR) in being supervised: as explained earlier, it takes the response into account. In other words, PLS looks for linear combinations of highly loaded chromatographic signals that capture the largest covariance between the variables and the response. A comparison between PCR and PLS is presented in Figure 9.

Figure 9. Comparison between PCR and PLS; where the first PLS direction (solid green line) and first PCR direction (dotted green line) are shown. [203]

In contrast to PCA, PLS places the highest weight on the variables that are most strongly related to the response. From Figure 9 and the principle of PLS, it is clear that the PLS directions do not fit the predictors as closely as the PCA directions do, but they explain the relationship with the response better, which is very important for quantification; furthermore, PLS can handle multivariate response variables.
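For orientation, a hedged sketch of PLS regression using scikit-learn; the unfolded chromatographic matrix, the response vector and the choice of two latent variables are placeholder assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

X = np.random.rand(30, 500)          # assumed: 30 samples x 500 chromatographic variables
y = np.random.rand(30)               # assumed: one measured property/concentration per sample

pls = PLSRegression(n_components=2)  # two latent variables (illustrative choice)
pls.fit(X, y)

y_pred = pls.predict(X[:5])          # predict the response for (here, the first five) samples
```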

Multi-way partial least squares (N-PLS) is an extension of PLS into multiple dimensions. [111-112]

Interval multi-way partial least squares (iNPLS), as the name says, uses intervals of multi-way data sets to build calibration models. [113] iNPLS is an extension of interval partial least squares (iPLS), proposed by Norgaard et al. [114], which was developed for first-order data: the data set is split into a collection of user-defined intervals, a PLS model is calculated for each interval, and the interval with the lowest root mean square error of cross-validation (RMSECV) is selected. There is, however, no such algorithm for second-order data, although iPLS, like many other methods, could be applied to it by unfolding the data as PCA does. Unfolding GC×GC data in this way also raises problems (e.g. it introduces bias into the calibration, because untargeted peaks coeluting with the targeted ones are also included in the intervals), and this led to the development of iNPLS. iNPLS follows the same principle, but splits the data matrix into intervals in both dimensions using a multi-way algorithm. Like NPLS, iNPLS does not have the second-order advantage, but it is able to analyze an unknown sample containing interferences that are not present in the calibration. As a supervised pattern recognition method, partial least squares discriminant analysis (PLS-DA) has been used for modelling, classification and prediction with 2D GC-TOFMS data [115].

Fisher ratio analysis (FRA), which calculates the ratio of the variance between groups to the variance within groups as a function of an independent variable, is a robust method for classification. In chromatography, the independent variable used for classification is the retention time. The scheme for reducing 4D data to 2D for the Fisher ratio calculation has been well depicted by Pierce et al. [116] (Figure 10). FRA has been applied to breast cancer tumor data analyzed by 2D GC-MS [117], and Guo et al. [118] applied FRA to 2D MS data for metabolite profiling. FRA has also been successfully applied to a 2D GC-TOFMS dataset by Pierce et al. [116] and proved better than PCA at handling biodiversity, by differentiating regions of large within-class variance from regions of large class-to-class variance.
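The Fisher ratio itself is easy to state in code: for every variable (e.g. every retention-time point) it is the variance of the class means divided by the pooled within-class variance. The sketch below is a generic point-by-point implementation, not the specific indexing scheme of Pierce et al. [116].

```python
import numpy as np

def fisher_ratios(X, classes):
    """X: (samples x variables) matrix; classes: one class label per sample."""
    classes = np.asarray(classes)
    labels = np.unique(classes)
    class_means = np.array([X[classes == c].mean(axis=0) for c in labels])
    within_vars = np.array([X[classes == c].var(axis=0, ddof=1) for c in labels])
    between = class_means.var(axis=0, ddof=1)     # variance between the class means
    within = within_vars.mean(axis=0)             # pooled within-class variance
    return between / (within + 1e-12)             # small constant avoids division by zero
```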

Figure 10. Schematic of the novel indexing scheme to reduce 4D data to 2D data for the calculation of Fisher ratios. Ultimately, the entire set of collected data is automatically (i.e., not manually) submitted to Fisher ratio analysis.


Peak Deconvolution

Peak deconvolution aims to resolve overlapping peaks in complex mixtures, enhancing the selectivity of a chromatographic technique when the separation cannot be improved further by optimizing the separation conditions. [22] This is necessary for quantification of the components of interest. Several chemometric approaches can achieve deconvolution with a single-channel detector, but they are not used in everyday practice because they require advanced knowledge. [22] Typical data sets obtained by a 1D technique can be presented as a two-way data table as in Figure 11. This type of data can be decomposed into two matrices containing the concentration (chromatographic) profiles and the spectral profiles. An example is to use the orthogonal projection approach (OPA) [119] followed by alternating least squares (ALS) [120]. Two alternatives to OPA are window factor analysis (WFA) and evolving factor analysis (EFA). [121] Because of the limitation of EFA in predicting peak shapes, a non-iterative EFA (iEFA) was developed [122]. EFA has been successfully applied to LC-DAD and GC-MS data. A good review of mixture-analysis approaches for bilinear data matrices, with comparisons of the different methods, has been published. [123]
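The core alternating-least-squares step of such a bilinear resolution can be sketched as follows; this is a bare-bones illustration with random initial estimates and non-negativity enforced by simple clipping, whereas OPA/EFA-based implementations use purer initial estimates and properly constrained regression. All data below are synthetic.

# Minimal ALS sketch for bilinear data X ~ C @ S.T (chromatographic profiles C,
# spectral profiles S), with non-negativity imposed by clipping.
import numpy as np

def als_bilinear(X, n_components, n_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    C = np.abs(rng.normal(size=(X.shape[0], n_components)))
    for _ in range(n_iter):
        S = np.linalg.lstsq(C, X, rcond=None)[0].T      # update spectral profiles
        S = np.clip(S, 0, None)
        C = np.linalg.lstsq(S, X.T, rcond=None)[0].T    # update concentration profiles
        C = np.clip(C, 0, None)
    return C, S

# Synthetic two-component example: Gaussian elution profiles times simple spectra
t = np.linspace(0, 1, 100)
C_true = np.stack([np.exp(-0.5 * ((t - 0.45) / 0.05) ** 2),
                   np.exp(-0.5 * ((t - 0.55) / 0.05) ** 2)], axis=1)
S_true = np.abs(np.random.default_rng(1).normal(size=(40, 2)))
X = C_true @ S_true.T + 0.01 * np.random.default_rng(2).normal(size=(100, 40))

C_hat, S_hat = als_bilinear(X, n_components=2)
print("reconstruction error:", np.linalg.norm(X - C_hat @ S_hat.T))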



Figure 11. Illustration of the bi-linear chromatographic data. Columns of matrix X contain the concentration profiles (chromatograms) and rows contain the spectral profiles. [123]

The methods mentioned above are mostly designed for and applied to 1D chromatography data sets, including those acquired with multichannel detectors (e.g. DAD), which produce second-order data. For 2D chromatography with a single-channel detector (e.g. a single-wavelength UV detector), each sample yields second-order (bilinear) data, and the order increases to third order (trilinear, decomposable as an I × J × K array) or higher when multichannel detectors (e.g. MS) are used. The trilinear structure is beneficial because signals that are not sufficiently resolved by the instrument can be resolved mathematically. Zeng et al. [124] developed a method (termed simultaneous deconvolution) using non-linear least squares curve fitting (NLLSCF). The NLLSCF analysis is based on the Levenberg-Marquardt algorithm, owing to its satisfactory performance for multi-parameter systems. The method allows simultaneous deconvolution and reconstruction of the peak profiles in both dimensions, which enables accurate quantification and determination of the retention parameters of target components. It was originally designed for GC × GC datasets but can also be used for LC × LC datasets.
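The spirit of such a simultaneous fit can be sketched with SciPy's Levenberg-Marquardt solver; the Gaussian-times-Gaussian peak model, the synthetic overlapped region and the starting values below are simplifying assumptions and not the exact model of ref. [124].

# Minimal sketch of simultaneous deconvolution of an overlapped GC x GC region:
# each component is an amplitude times the outer product of a Gaussian in each
# dimension, and all parameters are fitted at once with Levenberg-Marquardt.
import numpy as np
from scipy.optimize import least_squares

t1 = np.arange(30.0)                                # first-dimension retention axis (a.u.)
t2 = np.arange(50.0)                                # second-dimension retention axis (a.u.)

def peak(params):
    a, m1, s1, m2, s2 = params
    g1 = np.exp(-0.5 * ((t1 - m1) / s1) ** 2)
    g2 = np.exp(-0.5 * ((t2 - m2) / s2) ** 2)
    return a * np.outer(g1, g2)

def residuals(theta, Z, n_peaks):
    model = sum(peak(theta[i * 5:(i + 1) * 5]) for i in range(n_peaks))
    return (model - Z).ravel()

# Synthetic overlapped two-component region plus noise
rng = np.random.default_rng(0)
Z = peak([1.0, 14, 2.5, 24, 4]) + peak([0.6, 17, 2.5, 28, 4]) + 0.01 * rng.normal(size=(30, 50))

theta0 = [0.8, 13, 2, 22, 3, 0.5, 18, 3, 30, 5]     # rough initial guesses
fit = least_squares(residuals, theta0, args=(Z, 2), method="lm")
print("fitted parameters:", np.round(fit.x, 2))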



Figure 12. Illustration of bilinear data structure. The data matrix, in the form of a contour plot, depicts the overlapped GC × GC signals of three components, A-C. Each component’s signal is bilinear when its noise-free signal can be modeled as the product of its pure column 2 chromatographic profile vector and its pure column 1 chromatographic profile vector. Here, the bilinear portion of the data matrix is modeled as the product of two matrices, each matrix containing the pure chromatographic profile vectors for components A-C on a given column. The nonbilinear portion of the data matrix is grouped into one matrix designated as noise. Concentration information for each component is incorporated within each pure chromatographic profile vector. [124].

The generalized rank annihilation method (GRAM), introduced by Sanchez and Kowalski [66], has been successfully applied to 1D chromatography data, for example to deconvolute peaks in LC-DAD [88, 125]. It was the first deconvolution method applied to comprehensive 2D chromatography [83, 126]. The schematic bilinear data structure of 2D chromatography is depicted in Figure 12. The extension of GRAM from 1D GC to 2D GC is based on the fact that the second column of a GC × GC system can be treated as a multichannel detector. [83] Since GRAM can be applied successfully to 2D GC data, full resolution of all the analytes of interest is not necessary. [82-83, 126] It has even been demonstrated that, under favorable conditions, GRAM can mathematically resolve overlapped GC × GC signals without any alignment preprocessing of the data sets. [82] The deconvoluted peaks in GC × GC are presented in Figure 13.



Figure 13. GRAM deconvolution of the overlapped ethylbenzene and p-xylene components in a white gas sample comprehensive GC × GC chromatogram, using a white gas standard for comparison. (A) First GC column and (B) second GC column estimated pure profiles. Deconvolution is successful despite the low resolution (0.09) of the peaks on the second column, because retention times are very precise within and between GC runs. [82]

Nevertheless, there are some prerequisites for applying GRAM to 2D chromatographic data [83]: first, the detector response must be linear with concentration; secondly, the peak shapes must remain unchanged, meaning that there is no overloading effect on the column; thirdly, the convoluted peaks must have some resolution in both dimensions; finally, the concentrations of two compounds within the analyzed data window cannot covary perfectly between the standard and the sample. A key advantage of GRAM over other analysis methods is that the unknown sample may contain overlapped peaks that are not present in the calibration standard. [83] Whereas GRAM can only quantify two injections at a time, one of which must be a standard, PARAFAC does not have these limitations and can be used to analyze more than two samples, for both three-way and four-way LC × LC data.
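The central computation of GRAM, a generalized eigenvalue problem between the projected sample and standard matrices, can be sketched as follows; rank selection, noise handling and the recovery of the pure profiles are simplified here, and the synthetic data are for illustration only.

# Compact sketch of the central GRAM step: project the sample matrix M and the
# standard matrix N onto a common truncated basis and solve Mp t = lambda Np t.
import numpy as np
from scipy.linalg import svd, eig

def gram_ratios(M, N, rank):
    """Return estimated concentration ratios (sample / standard) per component."""
    U, s, Vt = svd(M + N, full_matrices=False)
    U, V = U[:, :rank], Vt[:rank, :].T              # truncated bases of both modes
    Mp, Np = U.T @ M @ V, U.T @ N @ V               # project to rank x rank matrices
    eigvals, _ = eig(Mp, Np)                        # generalized eigenvalue problem
    return np.sort(np.real(eigvals))

# Synthetic bilinear data: two components with Gaussian profiles on each column
t1, t2 = np.arange(40.0), np.arange(60.0)
X = np.stack([np.exp(-0.5 * ((t1 - 18) / 3) ** 2),
              np.exp(-0.5 * ((t1 - 22) / 3) ** 2)], axis=1)   # column-1 profiles
Y = np.stack([np.exp(-0.5 * ((t2 - 25) / 5) ** 2),
              np.exp(-0.5 * ((t2 - 33) / 5) ** 2)], axis=1)   # column-2 profiles
N = X @ np.diag([1.0, 1.0]) @ Y.T                   # standard (known concentrations)
M = X @ np.diag([0.5, 2.0]) @ Y.T                   # sample (unknown concentrations)

print("estimated sample/standard ratios:", gram_ratios(M, N, rank=2))  # about [0.5, 2.0]

In this sketch the eigenvalues are the sample-to-standard amount ratios, which is why a single standard injection suffices for quantification.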

Figure 14. Schematic overview of PARAFAC. [106]

Parallel factor analysis (PARAFAC) is a generalization of PCA that uses an iterative process to resolve trilinear signals, optimizing initial estimates by ALS under signal constraints. [127] It has been applied successfully to comprehensive 2D chromatographic data, as discussed below.


PARAFAC initialized by trilinear decomposition (TLD) was shown to be powerful for multivariate deconvolution in 2D GC-TOFMS data analysis: partially resolved components in complex mixtures can be deconvoluted and identified without requiring a standard dataset, assumptions about signal shape, or any fully selective mass signals. [128] PARAFAC is also applicable to data of higher order than three-way datasets. [120]
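A bare-bones three-way PARAFAC fit by alternating least squares can be sketched as below; real implementations (for example the N-way toolbox) add constraints, proper initialization and convergence criteria, and the synthetic data dimensions here are arbitrary.

# Minimal three-way PARAFAC (CP) decomposition by alternating least squares,
# written out with explicit Khatri-Rao products.
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: output[j*K + k, r] = A[j, r] * B[k, r]."""
    return np.einsum("jr,kr->jkr", A, B).reshape(A.shape[0] * B.shape[0], A.shape[1])

def parafac_als(X, rank, n_iter=200, seed=0):
    """Fit X[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r] and return the factor matrices."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    X1 = X.reshape(I, J * K)                        # mode-1 unfolding
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)     # mode-2 unfolding
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)     # mode-3 unfolding
    for _ in range(n_iter):
        A = X1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X3 @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C

# Synthetic trilinear data (e.g. injections x 1st-D retention x 2nd-D retention)
rng = np.random.default_rng(1)
A0, B0, C0 = rng.random((5, 2)), rng.random((30, 2)), rng.random((40, 2))
X = np.einsum("ir,jr,kr->ijk", A0, B0, C0) + 0.01 * rng.standard_normal((5, 30, 40))
A, B, C = parafac_als(X, rank=2)
print("fit error:", np.linalg.norm(X - np.einsum("ir,jr,kr->ijk", A, B, C)))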

Figure 15. (A) PARAFAC deconvoluted column 1 pure component profiles of the low-resolution isomer data set. (B) PARAFAC deconvoluted column 2 pure component profiles of the low-resolution isomer data set. [128]


Compared to PARAFAC, PARAFAC2 does not require alignment beforehand. [250-251] PARAFAC2 allows peaks to shift between chromatograms by relaxing the bilinearity constraint on the dimension containing the chromatographic data. When analyzing 2D LC-DAD samples, PARAFAC was not capable of analyzing the entire sample at once. 2D GC-TOFMS datasets treated by PARAFAC (with alignment) and by PARAFAC2 were compared [131]: PARAFAC was found to be more robust at lower S/N and lower concentrations, while PARAFAC2 did not require alignment. However, that study was performed on fully resolved peaks rather than overlapped peaks. Both methods are based on ALS minimization of the residual matrix and yield direct estimates of the concentrations without bias. [106] However, PARAFAC2 only relaxes the inner-structure (trilinearity) constraint in one direction.

Figure 16. Accuracy of the various quantitation methods based on the analysis of a reference mixture with known analyte concentrations. [106].

Van Mispelaar et al. [106] compared several calibration methods using a standard mixture; the results are shown in Figure 16. The model given by PARAFAC2 overestimated the concentrations, while PARAFAC was the most accurate method of all. With real samples, the results also showed that PARAFAC2 overestimated the concentration in all cases. It was furthermore shown that PARAFAC2, which was supposed to be able to handle retention-time shifts thanks to its inner-product structure, was neither accurate nor robust. [198]



Conclusion & Future work

For both data pre-processing and the subsequent data processing, a variety of methods can be chosen for two-dimensional chromatography datasets, and they all have their advantages and disadvantages. Unfortunately, the choice of method mostly depends on user experience and preference: people tend to use what they have learned. There are no clear-cut guidelines for choosing the optimal methods. A future search for such guidelines would be substantially beneficial for the development of chemometrics.

Tool box

Currently, most of the pre-processing algorithms and the data processing methods for classification and quantification can be applied directly through toolboxes available for software such as Matlab and R (e.g. the PLS algorithm in the PLS Toolbox by Eigenvector Research Inc., Wenatchee, WA, and the N-PLS algorithm in the N-Way Toolbox by Rasmus Bro, www.models.life.ku.dk/source/nwaytoolbox/). This has stimulated the development of algorithms through user experience and provides a variety of methods to choose from according to user preference.



Reference

1. Nomenclature for Chromatography (IUPAC Recommendations 1993)
2. D. Harvey, Modern Analytical Chemistry, 1st ed., p. 549
3. L.R. Snyder, Introduction to Modern Liquid Chromatography, 3rd ed., p. 76
4. K.S. Booksh, B.R. Kowalski, Anal. Chem., 66 (1994) 782-791
5. J. Blomberg, High Resolut. Chromatogr., 20 (1997) 539
6. R.B. Gaines, Environ. Sci. Technol., 33 (1999) 2106
7. R.B. Gaines, in: Z. Wang, S. Stout (Eds.), Spill Oil Fingerprinting and Source Identification, Academic Press, New York, 2006, p. 169
8. G.S. Frysinger, High Resolut. Chromatogr., 23 (2000) 197
9. G.S. Frysinger, Environ. Sci. Technol., 37 (2003) 1653
10. J. Beens, J. High Resolut. Chromatogr., 23 (2000) 182
11. C.M. Reddy, Environ. Sci. Technol., 36 (2002) 4754
12. C.M. Reddy, J. Chromatogr. A, 1148 (2007) 100
13. G.T. Ventura, PNAS, 104 (2007) 14261
14. G.T. Ventura, Org. Geochem., 39 (2008) 846
15. A. Motoyama, Anal. Chem., 80 (2008) 7187
16. J. Peng, J. Proteome Res., 2 (2003) 43
17. M. Gilar, Anal. Chem., 77 (2005) 6426
18. J. Peng, J. Proteome Res., 2 (2003) 43
19. X. Zhang, Anal. Chim. Acta, 664 (2010) 101
20. P.Q. Tranchida, J. Chromatogr. A, 1054 (2004) 3
21. InforMetrix, Chemometrics in Chromatography, 1996
22. M. Daszykowski, Trends in Anal. Chem., 25 (2006) 11
23. G. Musumarra, J. Anal. Toxicol., 11 (1987) 154
24. I. Moret, J. Sci. Food Agric., 35 (1984) 100
25. I. Moret, Riv. Vitic. Enol., 38 (1985) 254
26. L.E. Stenroos, J. Am. Soc. Brew. Chem., 42 (1984) 54
27. P.C. Van Rooyen, Dev. Food Sci., 10 (1985) 359
28. B.E.H. Saxberg, Anal. Chim. Acta, 103 (1978) 201
29. H. Engman, J. Anal. Appl. Pyrolysis, 6 (1984) 137
30. W.R. Butler, J. Clin. Microbiol., 29 (1991) 2468
31. B.R. Kowalski, Anal. Chem., 47 (1975) 1152
32. R.J. Marshall, J. Chromatogr., 297 (1984) 235
33. J.A. Pino, Anal. Chem., 57 (1985) 295
34. J.E. Zumberge, Cosmochim. Acta, 51 (1987) 1625
35. K.S. Booksh, B.R. Kowalski, Anal. Chem., 66 (1994) 782
36. M. Escandar, Anal. Chim. Acta, 806 (2014) 8
37. B.J. Prazen, Anal. Chem., 70 (1998) 218
38. K.M. Pierce, J. Chromatogr. A, 1255 (2012) 3
39. K.M. Pierce, Sep. & Purif. Rev., 41 (2012) 143
40. J. Engel, Trends in Anal. Chem., 50 (2013) 96
41. R.G. Brereton, Applied Chemometrics for Scientists
42. docs.google.com/viewer?url=http%3A%2F%2Fwww.chemometrics.ru%2Fmaterials
