Local Ion Signatures (LIS) for Forensic Comparison of GC×GC−MS data

Steffan Jonkers^{a,∗}

^{a} Van 't Hoff Institute for Molecular Sciences, University of Amsterdam, PO Box 94248, 1090 GE, Amsterdam, The Netherlands

∗ Corresponding author
Email address: steffan.jonkers@student.uva.nl (Steffan Jonkers)

Abstract

Detection and classification of ignitable liquid residue (ILR) in fire debris is a complex task due to a number of complicating factors. Challenges include the wide range of ignitable liquids that exist, the low concentrations of ILRs present in fire debris samples and the omnipresence of interfering or uninformative matrix compounds in the samples. Comprehensive two-dimensional gas chromatography-mass spectrometry (GC×GC−MS) allows for extensive sample separation and analysis; however, retention time shifting and peak integration issues make manual and automated evaluation of these chromatograms a complex task. A recent publication [1] presents one approach to overcome these problems in the context of automated ILR detection and classification in fire debris samples by using a combination of data reduction and feature selection. Data reduction is performed by a method introduced as the Local Ion Signature (LIS), which is the sum of intensities per MS m/z channel for all points in a certain chromatographic region. Coarsely dividing chromatograms into twelve LIS regions enabled detection of ILRs in fire debris samples with an accuracy of 84%.

Building on the foundations laid down in that work, we evaluate the detection and classification of ILR in laboratory-prepared fire debris samples using various classifiers (decision tree, k-NN and PC-LDC) and a wide range of LIS region selection strategies. ILR detection was performed by many classifiers with >96% accuracy and with false positive rates of <5%. Overall classification of the type of IL (white spirit, gasoline or lamp oil) in ILR-containing samples was performed at >84%, demonstrating that automated sample classification can assist the fire debris analyst in his work.

Keywords: Local Ion Signature (LIS), Fire Debris Analysis (FDA), Ignitable Liquid Residue (ILR), Chemometrics, Comprehensive two-dimensional gas chromatography-mass spectrometry (GC×GC−MS), k-NN, PC-LDC, Decision tree

Contents

1 Introduction
1.1 Forensic relevance
1.2 Recent developments
1.3 Total Ion Spectrum
1.4 Local Ion Signatures
1.5 Current work
2 Materials and methods
2.1 Datasets
2.2 Workflow
2.3 Cross-validation
2.4 GC×GC−MS region selection strategies
2.4.1 Equal-sized regions
2.4.2 Different-sized regions
2.5 LIS calculation
2.6 LIS normalization
2.7 Classifiers
2.7.1 Decision tree classifiers
2.7.2 Nearest-neighbour classifiers
2.7.3 PC-LDC
3 Results and discussion
3.1 Performance in different region selection strategies
3.2 Classifiers
3.2.1 Decision tree classifiers
3.2.2 Nearest neighbour classifiers
3.2.3 PC-LDC
3.3 Normalization
3.4 Future work
4 Conclusions
5 Acknowledgments

1. Introduction

1.1. Forensic relevance

In 2013, more than 36000 fires were reported in the Netherlands [2]. These fires injured almost 700 people and led to the death of 92 people. Causes were determined for nearly half of the fires and 37% of those fires were determined to be caused by arson or vandalism. Besides costs in terms of lives and health, financial costs were huge, with damages to companies alone exceeding 630 million euros [3]. Aggregated societal costs associated with arson for matters like fire-fighting, police investigations, paramedics, health care and insurance have not been estimated, but will presumably be in the order of hundreds of millions to billions of euros a year. Physical evidence that could otherwise be used to investigate arson, e.g. finger marks, fibres and DNA, is often destroyed by the fire or by fire-fighting efforts. As a consequence investigators often turn to the analysis of fire debris to detect ignitable liquid residues (ILRs) which might demonstrate arson or provide other information valuable to the investigation.

Identification of ILRs is often deemed a complicated and challenging procedure, inter alia because of a wide and increasing range of ignitable liquids (ILs) that exist, low concentrations of ILRs in fire debris samples and the omnipresence of matrix compounds that can impede the detection and identification of ILRs [4, 5, 6, 7], either because they are uninformative and cause extra noise during analysis, or because they are constituents of ILs, in which case it is not always clear if the compound originates from an IL or from other materials.

The American Society for Testing and Materials (ASTM) dictates in its E1618 standard [8] that detection and classification of ILRs in fire debris should involve gas chromatography-mass spectrometry (GC−MS) and should be based on visual pattern matching against reference ILs. In common practice this implies a subjective interpretation of analysis results, which can lead to erroneous decisions, especially for complex fire debris samples. Automated sample classification, when implemented correctly, may decrease inter-analyst variation and may increase evaluation performance.

1.2. Recent developments

Two routes towards improvement of the analysis of complex fire debris samples have been explored in the past years. The first development is the introduction of GC×GC−MS in the field [9, 10, 11, 1]. Secondly, chemometric techniques are often applied in recent FDA research [12, 13, 14, 15, 16, 1, 17, 18, 19, 20].

Compound separation is a necessary step in fire debris analysis (FDA) and since ILRs generally consist of volatile compounds, GC is a chromatographic technique well-suited for use in FDA. Detection is often performed by a mass spectrometer, yielding structural information on the separated compounds. Comprehensive two-dimensional gas chromatography-mass spectrometry (GC×GC−MS) was introduced in FDA little over a decade ago and provides superior separation compared to GC−MS, while enabling the detection of compounds even if they are present in very low abundance [1, 9, 10, 11]. Set against these advantages, retention time shifting and peak integration issues are often more prominent in GC×GC−MS than in GC−MS, making it more difficult to compare chromatographic data between laboratories.

Chemometric techniques play an important role across multiple domains of analytical chemistry [21] and many of these techniques have been applied in the analysis of pure ILs, either aiding in the interpretation of analytical data, such as Principal Component Analysis (PCA) [15, 16, 13, 14] and Hierarchical Cluster Analysis [15], or performing automated classification based on the data, such as Discriminant Analysis [16], Partial Least Squares Discriminant Analysis (PLS-DA) [12], Soft Independent Modelling of Class Analogy (SIMCA) [13, 14] and Self-Organising Feature Mapping [15]. PCA was also used as a feature reduction technique in the analysis of ILR (as opposed to IL) [17, 18, 22]; ILR classification has been demonstrated using Linear Discriminant Analysis (LDA) [1, 17, 22], Quadratic Discriminant Analysis [17], soft Bayesian classification [20], PLS-DA [19] and SIMCA [18, 19], with classification accuracy ranging from 70% to 90%.

1.3. Total Ion Spectrum

Classification of GC−MS and GC×GC−MS samples, either manually or automatically, can be done on the basis of various representations of those samples, e.g. peak tables [23], total ion current (TIC) [10, 24, 25] or extracted ion chromatograms (EIC) [25, 26, 1]. The Total Ion Spectrum (TIS) or Summed-Ion Mass Spectrum, as it was called when it was first explored in FDA [27], is another representation. A TIS is generated by summing the intensities across each m/z channel over the entire chromatographic range and entails an immense dimensionality reduction [27]. A TIS does not include any chromatographic information and therefore does not suffer from retention time shifting and peak integration issues. Despite the loss of chromatographic information, the TIS approach was found to be promising in quickly and robustly characterizing differences between pure ignitable liquid samples, with correct classification performance of 85-95% [27]. Worse classification performance of about 70-81% was obtained with more realistic fire debris samples [17, 18, 28] due to the presence of matrix compounds in the samples, which contribute towards the TIS but impede classification.

1.4. Local Ion Signatures

A hybrid approach, called the Local Ion Signature (LIS) approach, was recently published [1]. It aims both to make use of the extensive separation space obtained by GC×GC−MS separation and to incorporate the advantages of the TIS representation (dimensionality reduction and the absence of retention time shifting and peak integration issues), while extending application to realistic fire debris samples. The plural form of Local Ion Signature, i.e. Local Ion Signatures, will be indicated in this work by LISes. A LIS is obtained in the same way as a TIS, by summing the intensity per m/z channel over a certain chromatographic range. Differing from the TIS approach, a LIS is summed over a limited chromatographic range instead of over the entire chromatogram. Care must be taken here to ensure that these ranges are not so limited in size that retention time shifting becomes an issue. LIS calculation will be further illustrated in Section 2.5.

Applying a score-based likelihood ratio (LR) feature selection technique and LDA, LISes of twelve equal-sized regions that collectively covered the entire chromatographic space were found to enable an 84% accuracy in ILR detection and a 59-93% performance in ILR classification, varying per class of ignitable liquid, in 155 GC×GC−MS laboratory-created arson samples [1].

1.5. Current work

The current work builds upon the foundations laid in [1] and will aim to elevate the ILR detection and ILR classification performance delivered in that research. A wide range of strategies that define chromatographic regions for which LISes will be calculated is explored. Additionally, various classifiers working in the resulting LIS space will be evaluated. The relationships between the chemical data and those combinations of LIS region selection strategies and classifiers that maximize ILR detection and ILR classification performance will be explored. Modelling will be performed on a GC×GC−MS dataset of 90 laboratory-created fire debris samples.

2. Materials and methods

2.1. Datasets

Realistic fire debris samples were obtained by creating small-scale laboratory fires according to the procedure described in Appendix A. A total of 19 different substrates were used in these fires; each of the 90 samples consisted of a mixture of 5 substrates in varying proportions. White spirit, gasoline or lamp oil was added as an IL to 45 of the samples. For each sample the substrate constitution and the presence and class of IL, if added, is summarized in Appendix B. GC×GC−MS data was obtained for all samples; instrumental details can be found in Appendix A. All 90 samples were used to evaluate ILR detection; in ILR classification only the 45 samples to which IL was added were used. An example chromatogram is visualised in Figure 1.

2.2. Workflow

A schematic overview of the workflow adhered to in this research is presented in Figure 2. It involves the selection of regions for which LISes are calculated, the subsequent LIS calculation, LIS normalization, stratified partitioning and cross-validated ILR detection and ILR classification. The stages in this workflow will be explained in the following sections.

2.3. Cross-validation

Stratified 5-fold cross-validation was employed in order to reduce model overfitting and to evaluate the generalizability of the classifiers. The samples were randomly permuted and subsequently split into 5 partitions or folds of equal size and of similar composition with respect to IL presence or IL class, i.e. in ILR detection each fold contained equal proportions of samples containing ILR and samples without ILR, and in ILR classification each fold contained an equal number of samples per IL class. The classifiers were trained on training sets that consisted of 4 of these 5 folds, the other fold being the test set which was used to evaluate the performance of the classifiers. In one Monte Carlo (MC) iteration each of the 5 partitions served as a test set once; all of the results presented in this paper are aggregated over 100 MC iterations and only include the performance of ILR detection and ILR classification in test sets.
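The partitioning scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PRTools/MATLAB code used in this work: the function names `stratified_folds` and `monte_carlo_cv`, the round-robin dealing of shuffled indices and the toy label list are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, rng=None):
    """Split sample indices into n_folds folds with similar class composition."""
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # random permutation within each class
        # Deal the shuffled indices round-robin so every fold gets an equal share.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


def monte_carlo_cv(labels, n_iterations=100, n_folds=5, seed=0):
    """Yield (iteration, train_indices, test_indices) for repeated stratified CV."""
    rng = random.Random(seed)
    for iteration in range(n_iterations):
        folds = stratified_folds(labels, n_folds, rng)
        for test_fold in folds:  # each fold serves as the test set once
            held_out = set(test_fold)
            train = [i for i in range(len(labels)) if i not in held_out]
            yield iteration, train, sorted(test_fold)


# Hypothetical labels mirroring the detection task: 45 ILR+ and 45 ILR- samples.
labels = ["ILR+"] * 45 + ["ILR-"] * 45
splits = list(monte_carlo_cv(labels, n_iterations=2))
```

With 90 samples and 5 folds, every split holds out 18 samples (9 per class) and trains on the remaining 72; setting `n_iterations=100` would reproduce the number of MC iterations reported here.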

2.4. GC×GC−MS region selection strategies

One of the aims of this work was to explore various strategies for the selection of chromatographic regions of which LISes will be calculated. The explored strategies could be categorized into two groups, one that employed equal-sized and one that employed different-sized LIS regions.

[Figure: x-axis: retention time, first chromatographic dimension (minutes); y-axis: retention time, second chromatographic dimension (seconds); colour scale: natural log of TIC]

Figure 1: Example TIC GC×GC−MS chromatogram of sample 48, a mixture of substrate materials (untreated oak floor, area rug, sofa, magazines and computer monitor case) burned in the presence of white spirit as IL

[Figure: workflow diagram with the stages Dataset, LIS region selection, LIS calculation, LIS normalization, Partitioning (training set, test set), Train model, Test model, ILR detection/classification]

Figure 2: Schematic overview of the workflow for ILR detection and ILR classification in GC×GC−MS data in this research. The dataset used to model ILR detection contained 90 samples; in ILR classification a subset of 45 of these samples was used

2.4.1. Equal-sized regions

Previous work employed a strategy that divided the chromatographic space into 12 equal-sized rectangular regions or tiles in a 4 (first chromatographic dimension) × 3 (second chromatographic dimension) grid [1]. The m×n region selection strategies explored in the current work are summarized in Table 1 and an example strategy is visualized in Figure 3. In the special case of m = n = 1, the LIS equals the TIS of the chromatogram.

Table 1: Equal-sized, m×n region selection strategies explored in this work, where m is the number of tiles along the first chromatographic dimension and n is the number of tiles along the second chromatographic dimension. Explored strategies are indicated by ×

n \ m    1    2    3    4    5    6    8   10   12   14   17   20   30   40   50
1        ×    −    −    ×    ×    ×    ×    ×    ×    ×    ×    ×    ×    ×    ×
2        −    −    −    ×    ×    ×    ×    ×    ×    ×    ×    ×    ×    ×    ×
3        −    −    −    ×    ×    ×    ×    ×    ×    ×    ×    ×    ×    ×    −
4        −    −    −    ×    ×    ×    ×    ×    ×    ×    ×    ×    ×    −    −

[Figure: chromatographic space divided into a grid of rectangular tiles; horizontal axis: first chromatographic dimension (m tiles); vertical axis: second chromatographic dimension (n tiles)]

Figure 3: Example LIS region selection strategy using m×n equal-sized rectangular regions. In this example m = 8 and n = 4

2.4.2. Different-sized regions

Strategies employing rectangles of different sizes were explored as well. Since ILs consist for a significant part of volatile compounds, and these compounds elute early along the first chromatographic dimension, a relatively large portion of compounds informative to the classification tasks might elute early along the first chromatographic dimension as well. Hence, a higher region resolution along the beginning of the first chromatographic dimension could benefit classification. Explored strategies that exploited this concept are summarized in Table 2 and an example strategy is depicted in Figure 4.
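Both families of strategies can be expressed with one helper that turns a list of proportional widths into index boundaries; equal-sized tiling is then the special case in which all proportions are 1. The sketch below is a hypothetical illustration: the names `region_edges` and `grid_regions`, the rounding of boundaries to the nearest data point and the example chromatogram size are assumptions, not details taken from this work.

```python
def region_edges(n_points, proportions):
    """Return index boundaries that split n_points into regions whose widths
    follow the given proportions, e.g. [1, 1, 2], or [1] * 4 for equal sizes."""
    total = sum(proportions)
    edges = [0]
    cumulative = 0
    for weight in proportions:
        cumulative += weight
        # Assumption: boundaries are rounded to the nearest data point.
        edges.append(round(n_points * cumulative / total))
    return edges


def grid_regions(n_first, n_second, m_proportions, n_tiles):
    """List (first_start, first_stop, second_start, second_stop) for every tile."""
    first = region_edges(n_first, m_proportions)
    second = region_edges(n_second, [1] * n_tiles)
    return [
        (first[i], first[i + 1], second[j], second[j + 1])
        for i in range(len(first) - 1)
        for j in range(len(second) - 1)
    ]


# Hypothetical chromatogram of 800 x 400 points with m = 1:1:2 and n = 4.
edges = region_edges(800, [1, 1, 2])           # [0, 200, 400, 800]
tiles = grid_regions(800, 400, [1, 1, 2], 4)   # 3 x 4 = 12 tiles
```

In this example the first half of the first dimension is covered by two narrow regions and the second half by a single wide one, which is the higher early resolution that the different-sized strategies aim for.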

Table 2: Different-sized, m × n region selection strategies explored in this work. Each scalar in m indicates the proportional width of a region along the first chromatographic dimension; n is the number of tiles along the second chromatographic dimension. Explored strategies are indicated by ×

m                    n = 1   n = 2   n = 3   n = 4
1:1:2                  ×       ×       ×       ×
1:1:1:2                ×       ×       ×       ×
1:1:1:1:2:2            ×       ×       ×       ×
1:1:1:1:1:2:2          ×       ×       ×       ×
1:1:1:1:2:2:4          ×       ×       ×       ×
1:1:1:1:1:1:2:2        ×       ×       ×       ×
1:1:1:1:2:2:4:4        ×       ×       ×       ×

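[Figure: chromatographic space divided into rectangular tiles of different widths along the first chromatographic dimension (m) and n tiles along the second chromatographic dimension]

Figure 4: Example LIS region selection strategy using different-sized rectangular regions. In this example m = 1 : 1 : 1 : 1 : 2 : 2 : 4 : 4 and n = 4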

2.5. LIS calculation

Introduced in Section 1.4, LISes are calculated by summing the intensity per m/z channel over a certain chromatographic range. This concept is visualized for GC−MS data in Figure 5 but was generalized to accommodate GC×GC−MS data. In our workflow a region selection strategy determined q, the number of LISes which were calculated for each of the r samples in a dataset. A LIS is a [1 × s] vector, s being the number of mass spectral features (m/z values) represented in the data. In this work, all the LISes calculated for a sample were concatenated into a [1 × [s × q]] vector. By representing the chromatographic data as LISes the number of features per sample was reduced from [total number of MS scans × s] to [q × s], at the cost of a lower resolution in the separation space, i.e. unlike a regular MS scan, which can be pinpointed to a specific location (pixel) in the chromatographic space, a LIS can only be linked to a certain region in this space.
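As a minimal sketch of this calculation, the snippet below sums the mass spectra inside each rectangular region and concatenates the resulting LISes. The nested-list data layout (`chromatogram[i][j]` holding the spectrum at one pixel), the function names and the toy data are assumptions made for illustration only.

```python
def local_ion_signature(chromatogram, region):
    """Sum the intensity per m/z channel over one rectangular region.

    chromatogram[i][j] is the mass spectrum (a list of s intensities) recorded
    at first-dimension index i and second-dimension index j.
    """
    i_start, i_stop, j_start, j_stop = region
    n_channels = len(chromatogram[0][0])
    signature = [0.0] * n_channels
    for i in range(i_start, i_stop):
        for j in range(j_start, j_stop):
            for channel, intensity in enumerate(chromatogram[i][j]):
                signature[channel] += intensity
    return signature


def lis_feature_vector(chromatogram, regions):
    """Concatenate the LISes of all q regions into one vector of length q * s."""
    features = []
    for region in regions:
        features.extend(local_ion_signature(chromatogram, region))
    return features


# Toy chromatogram: 4 x 2 pixels, s = 3 m/z channels, every intensity equal to 1.
toy = [[[1.0, 1.0, 1.0] for _ in range(2)] for _ in range(4)]
regions = [(0, 2, 0, 2), (2, 4, 0, 2)]  # q = 2 regions of 4 pixels each
vector = lis_feature_vector(toy, regions)
tis = local_ion_signature(toy, (0, 4, 0, 2))  # one region covering everything
```

Here `vector` has q × s = 6 entries, each equal to 4, and the single all-covering region returns the TIS of the toy chromatogram, with 8 in every channel.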

2.6. LIS normalization

Each LIS was normalized three times following three different strategies: csum-normalization, TIS-normalization and autoscale-normalization, all of which enabled the comparison of LISes for which the original data was measured on distinct instruments, under different parameters or with variable sample injection volume. ILR detection and ILR classification were always performed on LISes that were normalized following the same strategy. The strategies are explained in Figure 6; some notes will be made here.

Csum-normalization operates on the collection of all features of a single sample and causes this collection to sum to some constant. The strategy does not attenuate any differences in mass signal magnitude of different features caused by variation in the abundance of masses in the samples and therefore preserves the information of ratios between features that was present in the unnormalized LIS feature space.

[Figure: matrix of example GC−MS intensities per m/z channel and scan, with the chromatographic region used for the LIS highlighted, and the resulting LIS shown as a bar plot]

Figure 5: LIS calculation for example GC−MS data. The chromatographic region for which a LIS is calculated is coloured blue. The LIS is calculated by summing the intensity per m/z channel over the indicated chromatographic range and is visualized as a bar plot

TIS-normalization operates on each m/z channel that is represented in a single sample separately and ensures that each of these channels sums, across all LISes, to 1, thereby eliminating any differences between signal magnitudes of m/z channels.
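The two per-sample strategies can be sketched as follows. The snippet assumes that a sample is stored as the concatenation of q LISes of s channels each; taking the csum constant to be the number of features follows the explanation in Figure 6. The function names and the handling of empty channels are hypothetical choices.

```python
def csum_normalize(features):
    """Scale a sample so that its features sum to the number of features."""
    total = sum(features)
    n_features = len(features)
    return [value * n_features / total for value in features]


def tis_normalize(features, n_channels):
    """Divide every feature by the per-sample total of its m/z channel."""
    # TIS of the sample: sum of each m/z channel across all LISes.
    tis = [sum(features[c::n_channels]) for c in range(n_channels)]
    normalized = []
    for position, value in enumerate(features):
        channel_total = tis[position % n_channels]
        # Assumption: channels with no signal anywhere are left at zero.
        normalized.append(value / channel_total if channel_total else 0.0)
    return normalized


sample = [2.0, 0.0, 6.0, 6.0, 0.0, 2.0]  # q = 2 LISes, s = 3 channels
csum = csum_normalize(sample)
tis_scaled = tis_normalize(sample, n_channels=3)
```

For this toy sample `csum` keeps the 1:3 ratio between the first and third feature, whereas `tis_scaled` makes every non-empty channel sum to 1 across the two LISes.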

[Figure: worked example of the three normalization strategies for a sample X with features X1 to X12 (4 LISes × 3 m/z channels: 40, 47 and 59).
csum normalization: (1) multiply each feature value by the number of features present in a sample; (2) divide each feature value by the sum of all feature values in a sample (X1 + X2 + ... + X11 + X12).
TIS normalization: divide each LIS of a sample by the TIS of that sample, which is equivalent to dividing each feature value by the sum, across all LISes of a sample, of the m/z channel to which that feature corresponds (e.g. X1 + X4 + X7 + X10 for m/z 40).
autoscale normalization: (1) subtract from each feature value the mean of that feature across samples; (2) divide each feature value by the standard deviation of that feature across samples.]

Figure 6: Explanation of the three different normalization techniques that have been applied in this work. In this simplified example, 4 LISes are calculated per sample, and each LIS contains 3 features that correspond with m/z channels 40, 47 and 59, respectively. Csum-normalization operates on the collective of all features of a single sample together; TIS-normalization operates on each m/z channel represented in a single sample separately; and autoscale-normalization operates on each feature separately, across samples

Autoscale-normalization works on each feature that is present in the data separately and subtracts from each feature the mean of that feature across all samples and subsequently divides by the standard deviation of that feature across all samples. This causes the data for each feature to be centred around zero and to have a variance of 1, thereby attenuating differences in signal magnitude between highly abundant and less abundant masses. Since this strategy is the only one of the three strategies that uses information from all samples to normalize a single sample, care must be taken to ensure that normalization does not take place until after the samples have been divided into partitions for cross-validation (introduced in Section 2.3). Autoscale-normalization should be applied separately on the training set and on the test set to prevent any interdependence of the two normalized data sets, which would bias any classifier subsequently trained and tested on these sets.
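A minimal sketch of autoscaling applied after partitioning, as described above, is given below. The use of the sample standard deviation (n − 1 denominator), the mapping of constant features to zero and the toy data are assumptions; the text does not specify these details.

```python
from statistics import mean, stdev


def autoscale(samples):
    """Centre every feature at zero and scale it to unit variance across samples."""
    n_features = len(samples[0])
    columns = [[sample[f] for sample in samples] for f in range(n_features)]
    means = [mean(column) for column in columns]
    # Assumption: sample standard deviation (n - 1); constant features map to 0.
    stds = [stdev(column) for column in columns]
    return [
        [(sample[f] - means[f]) / stds[f] if stds[f] else 0.0
         for f in range(n_features)]
        for sample in samples
    ]


# Following the text: partition first, then autoscale each set on its own.
train = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
test = [[4.0, 5.0], [6.0, 7.0]]
train_scaled = autoscale(train)
test_scaled = autoscale(test)
```

Because the means and standard deviations are computed within each set, no statistic of the test samples enters the normalized training data.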

2.7. Classifiers

The two classes in the ILR detection task were ILR present (ILR+) and ILR absent (ILR−), corresponding to samples that were known to have been burnt in the presence of IL and in the absence of IL, respectively; the three classes of IL which were added to the ILR+ samples (one class of IL per sample) correspond to three classes in the ILR classification task: white spirit (WS), gasoline (GAS) and lamp oil (LO). Three types of classifiers were trained to differentiate between these classes: decision tree classifiers, nearest-neighbour classifiers and linear discriminant classifiers operating in principal component space (PC-LDC). The ILR detection classifiers operated in a feature space of dimensionality [q × s] with 90 samples, where q is the number of LISes per sample and s is the number of mass spectral features (m/z values) represented in the data. ILR classification was carried out in the same [q × s] feature space on 45 samples.

All classifiers were supervised learning classifiers, i.e. they were modelled on samples of known classes, and all classifiers were trained using Pattern Recognition Tools (PRTools) version 5.1.1 [29]. All computations in this work were programmed and carried out in MATLAB™ version 8.4.0.150421 (R2014b) 64-bit (MathWorks Inc., Natick, Massachusetts, U.S.A.) on a computer with an Intel® i7 3.6 GHz CPU and 32 GB RAM running Microsoft© Windows 7 Professional.

2.7.1. Decision tree classifiers

[Figure: decision tree. The root node holds classes A, B and C and splits on feature 1 at split value x into a terminal node (class A) and an intermediate node (classes B and C); the intermediate node splits on feature 2 at split value y into two terminal nodes (class B and class C).]

Figure 7: Example decision tree which classifies a sample into class A, B or C based on two features and two corresponding split values x and y

Decision tree classifiers are non-parametric: they operate in the exhaustive, normalized LIS feature space. The classifier creates decision trees which generally contain a top-level root node; one or more intermediate-level internal nodes, if necessary for classification; and multiple bottom-level leaves or terminal nodes [30]. A sample that is to be classified starts at the root node; the root node and internal nodes are decision stages at which one or more features and corresponding split values determine to which lower-level node a sample is forwarded. Eventually a sample reaches a terminal node, at which point it is assigned to a class. A simple example decision tree is visualised in Figure 7. Generally, the features and split values that are used in the model are chosen such that the combination discriminates well among classes [31]; a decision tree does not necessarily need to use all features present in the data to obtain 100% classification performance. Many measures exist that can be used to define the features and split values that best partition the data [31]; in the current work a maximum entropy criterion was used which was implemented in PRTools based on [32].
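To make the idea of an entropy-based split concrete, the sketch below selects, for a single feature, the split value that maximizes the reduction in Shannon entropy (information gain). It is a generic textbook illustration and an assumption about what such a criterion looks like; it is not the PRTools implementation used in this work, and the function names and toy values are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in counts.values())


def best_split(values, labels):
    """Return (threshold, information_gain) of the best binary split on one feature."""
    parent = entropy(labels)
    best = (None, 0.0)
    ordered = sorted(set(values))
    # Candidate thresholds lie halfway between consecutive distinct values.
    for low, high in zip(ordered, ordered[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = parent - weighted
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Hypothetical values of one normalized LIS feature for four ILR+ samples.
values = [0.1, 0.2, 0.8, 0.9]
labels = ["WS", "WS", "GAS", "GAS"]
threshold, gain = best_split(values, labels)
```

In this toy case a split at 0.5 separates the two classes perfectly, so the gain equals the full parent entropy of 1 bit; a tree builder would repeat this search over all features at every node.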

Decision trees provide readily understandable information on the features that are used by the classifier but are very sensitive to overfitting on the underlying data since the classifier generally operates on a small subset of features. To guard against overfitting the trees may be pruned, i.e. one or more of the lower-level nodes may be removed from the tree and split values might be adjusted in order to create a simpler tree which is more likely to capture structure that is inherent in the problem [33]. The decision trees in this work were pruned using a pessimistic strategy defined by Quinlan [34].

2.7.2. Nearest-neighbour classifiers

A nearest-neighbour classifier determines the k samples that are closest, according to some metric, to a sample that is to be classified; the classes to which these nearest neighbours (NNs) belong determine the class that is assigned to the query sample [35]. The number of NNs that is taken into account in this process, k, is a parameter of the classifier. In this work, a wide range of k = {1 : 25, 27, 29, 31, 34} was explored. The distance metric that was used in this research is the Euclidean distance and samples were assigned to the class of the majority of their k NNs. Nearest neighbour classifiers are non-parametric and in this research the classifiers operated in the exhaustive, normalized LIS feature spaces; no feature selection was applied. This prevented the samples from being classified based on a small subset of features and hence increased model generalizability; however, since the exhaustive set of features was considered, features merely contributing noise to the classification problem were considered as well, possibly lowering classification performance [35]. Unlike decision tree classifiers, k-NN classifiers do not provide information on the discriminative power of single features; they do, however, provide an intuitive approach to the classification problem.

2.7.3. PC-LDC

Unlike the decision tree and k-NN classifiers, the PC-LD classifiers did not operate in the exhaustive, normalized LIS feature space. Instead, the samples were projected onto their principal components (PCs), obtained using Principal Component Analysis (PCA) [36]. A linear discriminant was subsequently trained on the projected data.

PCA is an unsupervised dimension reduction method which constructs PCs, being linear combinations of the original features of the data that cover a maximum proportion of variance across all samples and over the exhaustive original feature space. The first PC that is constructed covers a maximum proportion of the initial variance; the second PC is defined orthogonal to the first PC and covers a maximum proportion of remaining variance, et cetera [36]. The method is unsupervised in the sense that sample class labels are not taken into account in the construction of the PCs. In many datasets a large proportion of the total variance can be explained by only a few PCs. By projecting all samples onto the first x PCs, where x is smaller than the number of features, a dataset with reduced dimensionality was obtained, on which the classifiers were trained and tested. PCA was performed separately on the training and on the test set. In this work values of x = {10, 25, 40} were explored.
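The construction of the first x PCs and the projection onto them can be illustrated with a small, dependency-free sketch. It uses power iteration with deflation, which is an assumption chosen for readability and may converge slowly when variances along successive PCs are similar; it is not the routine used in this work, and the function names and toy data are hypothetical.

```python
import random


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def principal_components(samples, n_components, n_iter=200, seed=0):
    """Return (mean, components) estimated by power iteration with deflation."""
    n, d = len(samples), len(samples[0])
    mean = [sum(row[f] for row in samples) / n for f in range(d)]
    centred = [[row[f] - mean[f] for f in range(d)] for row in samples]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        vector = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(n_iter):
            # Remove directions already found, then apply X^T X to the vector.
            for pc in components:
                overlap = _dot(vector, pc)
                vector = [v - overlap * p for v, p in zip(vector, pc)]
            scores = [_dot(row, vector) for row in centred]
            vector = [sum(s * row[f] for s, row in zip(scores, centred))
                      for f in range(d)]
            norm = _dot(vector, vector) ** 0.5
            if norm == 0:  # no variance left: the component stays a zero vector
                break
            vector = [v / norm for v in vector]
        components.append(vector)
    return mean, components


def project(samples, mean, components):
    """Express each sample by its scores on the given principal components."""
    return [[_dot([x - m for x, m in zip(row, mean)], pc) for pc in components]
            for row in samples]


# Toy data lying on the line y = 2x, so a single PC captures all variance.
data = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
mean_vector, pcs = principal_components(data, n_components=1)
scores = project(data, mean_vector, pcs)
```

For the toy data the first PC points along (1, 2) up to sign, and each two-feature sample is reduced to a single score. In practice a numerical library routine would normally replace the hand-written iteration.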

Subsequent to PC projection, the LD classifier further projects the data onto dimensions that maximize inter-class scatter in order to separate classes as much as possible [37]. The LD classifier in PRTools is based on [38, 39, 37] and assumes normal class densities with equal covariance matrices in the LD projection.

PC-LDC may perform well, given that the PC dimension reduction usually reduces noise in the data [37]; depending on the structure of the data, the parametric assumptions of PC-LDC may or may not increase generalizability and classification performance. The approach, however, is less intuitive than k-NN and does not provide an easily comprehensible overview of features that are informative to the classification problems.

3. Results and discussion

All results in this work were produced by classifiers which were trained on data produced in a set of laboratory fire experiments. Although these experiments were organized such that a reasonable range of substrate materials and ILs were included, they cover only a small part of the variation in complex fire debris samples encountered in forensic casework. Consequently, any classifier trained on this data might not demonstrate consistent performance when classifying fire debris samples that do not originate from these fire experiments. Nevertheless, the results present a proof-of-concept and provide some insight into the space in which the classifiers operate.

Figure 8: The top 100 classifiers according to ILR detection, WS classification, GAS classification and LO classification performance, plotted in ROC space. The x-axes represent False Positive Rate (FPR), which is the proportion of all true negative samples that is wrongly classified as positive; the y-axes represent True Positive Rate (TPR), which is the proportion of all true positive samples that is correctly classified as positive. Ideal classifiers are situated in the top left corners of the plots
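The two rates that span ROC space follow directly from the counts of correct and incorrect predictions. The snippet below is a small illustration of these definitions; the function name `roc_point` and the example predictions are hypothetical and do not correspond to any result reported in this work.

```python
def roc_point(true_labels, predicted_labels, positive="ILR+"):
    """Return (FPR, TPR) for one classifier, i.e. one point in ROC space."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return fp / (fp + tn), tp / (tp + fn)


# Hypothetical test-set outcome: one missed ILR+ sample and one false alarm.
truth = ["ILR+", "ILR+", "ILR+", "ILR+", "ILR-", "ILR-", "ILR-", "ILR-"]
predicted = ["ILR+", "ILR+", "ILR+", "ILR-", "ILR-", "ILR-", "ILR-", "ILR+"]
fpr, tpr = roc_point(truth, predicted)
```

This example classifier would be plotted at FPR = 0.25 and TPR = 0.75; an ideal classifier sits at FPR = 0 and TPR = 1.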

Figure 8 provides an impression of the type of performance that was encountered in this research. Table 3 summarizes, for each classifier family, the top-performing classifiers that were tested in this research. ILR detection performance can be considered very good for all classifier families; overall ILR classification performance was markedly worse for the tree classifiers (72.2%) than for the k-NN and PC-LD classifiers (87.5% and 87.3%, respectively). Both maximum detection and maximum classification performance are higher than in previous work [1, 27, 17, 18].

3.1. Performance in different region selection strategies

Region selection strategy properties of the top 5% classifiers according to detection and classification performance are summarized in Table 4. For all classifier families that were tested, top-performing region selection strategies according to ILR detection performance were different from the top-performing ILR classification strategies. This suggests that a two-step approach to optimizing the combined ILR detection and subsequent ILR classification performance may be desirable for casework application. The mean number of LIS regions in the top detection strategies was around double to quadruple the mean number of LIS regions in the top classification strategies, predominantly because the top classification strategies used fewer tiles along the first chromatographic dimension. Different chemical information is useful to the classifiers in ILR detection compared to ILR classification; the lower mean size of LIS regions used in ILR detection suggests that smaller sets of compounds are indicative of ILR presence whereas for the prediction of ILR class larger sets of compounds are more informative.

Besides a lower mean number of tiles along the first chromatographic dimension, the top decision tree and k-NN strategies according to ILR classification performance also had a lower mean number of tiles along the second separation dimension compared to the top ILR detection strategies, whereas for PC-LD the opposite was observed. It must be noted that the top ILR detection and classification strategies, regardless of the classifier family, do


Table 3: Properties of the best-performing classifiers of each classifier family

                          Best classifier according to:
Family   Property         Detection performance   Detection FPR        Overall classification

Tree     Normalization    TIS                     TIS                  Csum
         Regions          40 × 3                  [1 : 1 : 2] × 3      6 × 4
         Pruning          Quinlan pessimistic     Quinlan pessimistic  Quinlan pessimistic
         Score            94.8%                   4.6%                 72.2%

k-NN     Normalization    TIS                     TIS                  TIS
         Regions          30 × 4                  [1 : 1 : 2] × 4      [1 : 1 : 1 : 1 : 2 : 2 : 4 : 4] × 1
         k                3                       3                    20
         Score            97.6%                   3.9%                 87.5%

PC-LD    Normalization    Csum                    Csum                 TIS
         Regions          6 × 1                   6 × 1                4 × 1
         # PCs used       25                      25                   10
         Score            96.8%                   4.2%                 87.3%

Table 4: Properties of the top 5% region selection strategies according to detection (Det) and classification (Class) performance

Classifier family                           Tree           k-NN           PC-LD
Top 5% according to                         Det    Class   Det    Class   Det    Class

Mean detection performance [%]              93.7   89.5    96.6   84.8    96.3   93.9
Mean detection FPR [%]                      6.7    12.8    4.8    28.6    5.1    9.0
Mean classification performance [%]         65.3   70.0    67.9   84.2    75.3   85.7
Mean # tiles                                30.6   13.1    61.0   16.0    35     16.2
Mean # tiles in dimension 1                 10.2   5.0     23.7   7.2     18.2   5.8
Mean # tiles in dimension 2                 3      2.6     3.0    2.2     1.8    2.8
Strategies using equal-sized tiles [%]      60     90      98     35      100    43
Strategies using different-sized tiles [%]  40     10      2      65      0      56
TIS normalized data [%]                     100    60      15     100     6      100
Csum normalized data [%]                    0      40      85     0       94     0


contain some strategies with no tiling along the second dimension (i.e. the entire second dimension is covered by each LIS region), suggesting that information on the polarity of the sample compounds, which is provided by the second separation dimension, is not required for high classification performance. However, when examining the top 5% best-performing classifiers, a strong tendency towards multiple tiles in the second chromatographic dimension was observed. This suggests that a second orthogonal separation mechanism is beneficial to the automated classification of complex fire debris samples. Additionally, the decision tree and k-NN classifiers with the lowest false positive rates (FPRs) on average have a higher region resolution along the second chromatographic dimension than the classifiers with the highest performance, indicating that extensive tiling along the second dimension may help to secure a low FPR, which is an important property of a classifier that is to be used in a forensic setting. Strategies using more than four tiles along the second chromatographic dimension have not been explored but may be able to decrease the FPR further.
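The bracketed region notation used in Table 3, such as [1 : 1 : 2] × 3, can be read as a list of relative tile widths along the first dimension. The sketch below converts such a list into tile boundaries on a normalized axis; this reading of the notation and the function name are assumptions made for illustration, since the notation itself is defined earlier in the paper.

```python
from fractions import Fraction


def tile_edges(ratios):
    """Return cumulative tile boundaries on [0, 1] for relative tile widths."""
    total = sum(ratios)
    edges = [Fraction(0)]
    for r in ratios:
        edges.append(edges[-1] + Fraction(r, total))
    return edges


# Different-sized tiles [1 : 1 : 2] versus four equal-sized tiles
print([str(e) for e in tile_edges([1, 1, 2])])
print([str(e) for e in tile_edges([1, 1, 1, 1])])
```

Under this reading, the first two tiles of a [1 : 1 : 2] strategy coincide with the first two tiles of a strategy with four equal tiles, while the last tile spans the remaining half of the axis.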

For the k-NN and PC-LD classifiers, the proportion of top region selection strategies in the different-sized category is much higher in the classification stage than in the detection stage. For both classifier families, a [1 : 1 : 1 : 1 : 2 : 2] × 1 tiling strategy allows for almost the same performance as an 8 × 1 tiling strategy, while only the first four tiles along the first chromatographic dimension of these strategies correspond in size. For ILR classification this suggests either that fewer compounds in the second half of the chromatogram are deemed informative, or that the informative compounds in the second half of the chromatogram demonstrate less overlap in the calculated LIS features, i.e. they have less overlap in the m/z dimensions.

3.2. Classifiers

3.2.1. Decision tree classifiers

Decision tree classifiers delivered the worst ILR detection and ILR classification performance of all classifiers, with up to 94.8% ILR detection and 72.3% ILR classification accuracy, respectively. The lowest FPR that was observed for a combination of LIS region selection strategy and decision tree was reasonable at 4.6%, but averaged over the top 5% ILR detection classifiers this increased to 6.7%. For comparison, the best decision tree classifier operating on a TIS (1 × 1) tiled feature space was able to detect ILR with 89.9% accuracy and 10.6% FPR, and classified ILR with 58.2% overall accuracy. The best decision tree ILR detection and ILR classification classifiers operated in an extensive 40 × 3 tiled, TIS normalized and a 6 × 4 tiled, csum normalized feature space, respectively. Example trees modelled on these spaces are depicted in Figure 9. Decision trees provide readily understandable information on the m/z features that are used in classification decisions, their chromatographic location at the resolution of the LIS regions, and the split values on which the classification decisions are based. Decision trees are very sensitive to overfitting on the underlying data, which may be guarded against by pruning the decision trees; indeed, as expected, over extensive cross-validated testing the pruned decision trees performed better than their unpruned equivalents. The detection trees in Figure 9 demonstrate the effects pruning had on model complexity. It was introduced in Section 2.7.1 that decision tree classifiers generally operate on a small subset of features, and the example in Figure 9 shows that the pruned ILR detection tree classifies a sample based on a single feature out of the 55920 features present in the modelled space. This can be problematic, also intuitively, when classifying new samples, especially if these originate from outside the laboratory fire experiments. Even in ILR detection and ILR classification of the laboratory fire experiments it seems unwise to base a classification on such a small subset of the original feature space, which is evidenced by the lower performance of the decision tree classifiers compared to the other classifiers.

Figure 9: Example ILR detection and ILR classification decision trees. The ILR detection trees are modelled on a 40 × 3 tiled, TIS normalized feature space; the ILR classification trees are modelled on a 6 × 4 tiled, csum normalized feature space. For each stage both an unpruned and a Quinlan pessimistic pruned tree is shown. [Panels: ILR detection tree; pruned ILR detection tree; ILR classification tree; pruned ILR classification tree. The pruned ILR detection tree consists of a single split on tile #21, m/z 99, at a value of 0.0079157: samples above this value are classified as ILR−, all other samples as ILR+.]
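The figure of 55920 features can be reproduced from the tiling and the acquired mass range. The sketch below assumes one feature per tile and per unit-mass m/z channel over the 35-500 m/z range stated in Appendix A; the unit-mass channel width and the function name are assumptions for illustration.

```python
def n_features(tiles_dim1, tiles_dim2, mz_min=35, mz_max=500):
    """Number of LIS features: one per tile and per unit-mass m/z channel."""
    n_channels = mz_max - mz_min + 1  # 466 channels for 35-500 m/z (assumed unit resolution)
    return tiles_dim1 * tiles_dim2 * n_channels


print(n_features(40, 3))  # tiling of the best decision tree ILR detection classifier
print(n_features(6, 4))   # tiling of the best decision tree ILR classification classifier
print(n_features(1, 1))   # TIS (1 x 1) feature space
```

Under these assumptions the 40 × 3 tiling gives 120 regions of 466 channels each, which matches the 55920 features mentioned above.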

3.2.2. Nearest neighbour classifiers

Nearest neighbour classifiers delivered the highest performance and lowest FPR of all classifiers that were tested. The best-performing ILR detection classifier operated in a very extensive, 30 × 4 tiled, TIS normalized feature space and used 3 nearest neighbours in its model, delivering 97.6% accuracy. The extensive tiling means that at the spatial resolution of tiles with a size of [2 min 38 s (first dimension) × 1 s (second dimension)] the chemical characteristics enabling the highest ILR detection accuracy are most prominent.

Considerably more nearest neighbours (k = 20) are used in the top ILR classification model, operating in a [1 : 1 : 1 : 1 : 2 : 2 : 4 : 4] × 1 tiled, TIS normalized feature space and classifying at 87.5% accuracy, again suggesting a two-step approach to optimizing the combined ILR detection and subsequent ILR classification performance. Example distributions for these top-performing strategies are shown in Figure 10. The average detection FPR of the top 5% k-NN classifiers was very reasonable at 4.8%, with a minimum of 3.9%. For comparison, the best k-NN classifier operating on a TIS (1 × 1) tiled feature space detected ILR with 92.6% accuracy and classified ILR with 52.6% overall accuracy. The lowest detection FPR observed with such a strategy was 10.2%.

Figure 10: Example distributions of Euclidean distances between a query sample that is to be classified and samples of known classes. Left: ILR detection (classes ILR− and ILR+) in a 30×4, TIS-normalized feature space. Right: ILR classification (classes WS, GAS and LO) of a gasoline-containing sample in a [1 : 1 : 1 : 1 : 2 : 2 : 4 : 4] × 1, TIS-normalized feature space. [Axes: Euclidean distance to other samples versus number of samples in bin.]

As introduced in Section 2.7.2, k corresponds to the number of nearest neighbours whose classes determine the query sample's class. Models using a low value of k, such as our best-performing ILR detection classifier, are sensitive to local variation in the data, which is both a known strength of k-NN classifiers [28] and a possible source of model overfitting. Overfitting can be guarded against by using higher values of k, which leads to slightly lower classification performance on the training and testing data (e.g. in the 30×4 tiled, TIS-normalized feature space, the ILR detection performance for the dataset drops from 97.5% for k = 3 to 97.0% for k = 6), but might benefit model generalizability to unknown samples.
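The effect of k can be illustrated with a minimal k-NN classifier using Euclidean distance and majority voting. This is a generic sketch and not the PRTools implementation used in this work; the two-dimensional feature vectors and the outlying ILR− sample are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Classify query by majority vote among its k nearest training samples."""
    ranked = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set: one ILR- sample lies inside the ILR+ cluster
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (0.4, 0.4)]
train_y = ["ILR+", "ILR+", "ILR+", "ILR-", "ILR-", "ILR-", "ILR-"]
query = (0.5, 0.5)

print(knn_predict(train_x, train_y, query, k=1))  # decided by the single outlier
print(knn_predict(train_x, train_y, query, k=3))  # outlier is outvoted
```

With k = 1 the prediction follows the single closest sample, whereas with k = 3 the two neighbouring ILR+ samples outvote it, which illustrates the sensitivity to local variation discussed above.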

3.2.3. PC-LDC

The PC-LDC classifier family was able to perform the ILR detection and ILR classification tasks with high accuracy and low FPR. The best-performing PC-LDC ILR detection model operated in a simple 6 × 1 tiled, csum normalized feature space using the projection of the samples on the first 25 PCs of the data, at 96.8% accuracy and 4.2% FPR. A maximum ILR classification accuracy of 87.3% was observed for a 4×1 tiled, TIS normalized strategy using the first 10 PCs of the data. An illustrative example is depicted in Figure 11. For comparison, the best PC-LDC classifier operating on a TIS (1 × 1) tiled feature space detected ILR at an impressive 96.5% accuracy and 4.8% FPR using 25 PCs. Top ILR classification performance using such a tiling strategy was 82.5%. Because of the PC projection, query samples were classified based on features that were known to capture almost all variance in our laboratory fire samples; consequently, model generalizability towards unknown samples is expected to be better than for the decision tree classifiers.

Figure 11: Example linear discriminant fitted between ILR+ and ILR− samples in the space of the first two orthogonal PCs of the data. Samples that are located to the left of the line are classified by the linear discriminant classifier as ILR+, samples that are located to the right as ILR−. This principle is generalized to a higher-dimensional space in which the PC-LD classifiers operate (e.g. to a 25-dimensional space). [Axes: PC1 (×10^-5) and PC2 (×10^-5).]
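The principle shown in Figure 11 can be written down in general form. The equations below are a textbook sketch of a projection on the first p principal components followed by a two-class linear discriminant with pooled covariance; the symbols are introduced here for illustration, and it is an assumption that the classifier used in this work takes exactly this form.

```latex
\begin{aligned}
\mathbf{z} &= \mathbf{W}_p^{\top}\,(\mathbf{x} - \bar{\mathbf{x}}), \\
g(\mathbf{z}) &= \mathbf{w}^{\top}\mathbf{z} + w_0, \qquad
\mathbf{w} = \mathbf{S}^{-1}\,(\mathbf{m}_{+} - \mathbf{m}_{-}), \\
w_0 &= -\tfrac{1}{2}\,\mathbf{w}^{\top}(\mathbf{m}_{+} + \mathbf{m}_{-})
       + \ln\frac{\pi_{+}}{\pi_{-}}
\end{aligned}
```

Here x is the LIS feature vector of a sample, x̄ the mean training vector, W_p the matrix holding the first p principal component loadings (e.g. p = 25), m₊ and m₋ the class means of the ILR+ and ILR− training samples in PC space, S their pooled covariance matrix, and π₊ and π₋ the class priors. A sample is assigned to ILR+ when g(z) > 0 and to ILR− otherwise, so that g(z) = 0 describes the separating line (or hyperplane) of Figure 11.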


3.3. Normalization

Regardless of classifier type, none of the top 5% ILR detection and ILR classification classifiers operated on autoscale-normalized data. This is surprising for the PC-LD classifiers because autoscaling is a customary data pretreatment method in PCA, since PCA is a least squares method and autoscaling gives each feature the same influence on the model [36]. Good ILR detection and reasonable ILR classification of up to 96.0% and 77.6%, respectively, were observed for PC-LD classifiers operating in autoscaled feature spaces; these numbers were simply not high enough for the PC-LD classifiers to be considered among the top 5%. PCA is sensitive to outliers, and autoscaling might scale up noisy features if these features are almost constant [36], which might explain the sub-optimal performance in autoscaled feature spaces.
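The scaling-up of nearly constant, noisy features can be shown with a small sketch of autoscaling (mean-centring each feature and dividing by its standard deviation across samples). The function name and the two example features are hypothetical and serve only to illustrate the effect.

```python
import statistics


def autoscale(column):
    """Mean-centre one feature and scale it to unit sample standard deviation."""
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)
    return [(value - mean) / sd for value in column]


# Hypothetical feature values over three samples
informative = [0.10, 0.50, 0.90]          # large, meaningful variation
nearly_constant = [0.2000, 0.2001, 0.1999]  # tiny fluctuations, e.g. noise

print(autoscale(informative))
print(autoscale(nearly_constant))
```

After autoscaling both features span the same range, so the tiny fluctuations of the nearly constant feature obtain the same influence on a subsequent PCA as the informative feature.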

The top 5% k-NN and PC-LD ILR detection classifiers operated mainly on csum-normalized data, whereas the top 5% ILR classification classifiers of these types operated solely on TIS-normalized data. The main difference between csum- and TIS-normalized data is that the relative ratios of m/z channels represented in the data are constant in the latter but not in the former. In ILR detection, the relative mass abundances at some chromatographic resolution seem to relate to certain compounds that are indicative of ILR presence, which is further evidenced by the much smaller mean size of tiles in ILR detection vs ILR classification, capturing variation in the data at a more local level and hence being able to capture information on a small set of compounds. It was suggested in Section 3.1 that lower chromatographic resolution benefits ILR classification because larger sets of compounds seem indicative of ILR class compared to ILR presence. In agreement with this, it is suggested here that when summing mass abundances over a larger chromatographic space, any noise in the relative mass abundances present in csum-normalized data compounds to a level at which it becomes too dominant, compared to TIS-normalized data, to allow top-performing ILR classification. Even so, classifiers operating in csum-normalized space were still able to reach 82.5% ILR classification performance.

In the decision trees a reverse trend is observed, with the top ILR detection classifiers operating on TIS-normalized data and the top ILR classification models on both TIS- and csum-normalized data. Decision trees tend to focus on a very small number of features and, being able to select the most informative features, they suffer less from compounding noise in the data.

3.4. Future work

In this work no (serious) retention time shifting issues were observed. Sample preparation was performed in a consistent manner and all data were measured under similar circumstances, on the same analytical instrument. Since representing chromatographic data as LISes entails a great dimensionality reduction and thus a great reduction in storage space, LISes seem to be a good candidate for inter-laboratory and database GC×GC−MS comparison. However, retention time shifting issues are expected to arise when comparing GC×GC−MS data obtained from different sources or obtained on the same machine under different parameters or over time [40], which might also be reflected in the LISes generated from those data. These issues can be obviated by adding a mix of unique anchor compounds to the samples during sample preparation, so that each sample contains a set of compounds whose absolute location in the chromatographic space might differ when analysed under different parameters, but whose relative location will remain the same (given that similar separation columns are used). These anchors with variable location can be used to define LIS regions that are not necessarily similar in absolute shape or size but that do cover a similar chromatographic range. Since the original chromatographic location is reduced to a location at the resolution of the LIS regions, the LISes of two such corresponding regions will be comparable.
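The idea of anchor-defined regions can be sketched for a single chromatographic dimension: region boundaries are placed at the anchor retention times of each run, so that a compound eluting between the same two anchors falls in the same region even when absolute retention times have shifted. All retention times below are hypothetical, and the function is an illustration rather than one of the region selection strategies of Appendix C.

```python
from bisect import bisect_right


def region_index(anchor_times, t):
    """Index of the anchor-delimited region containing retention time t.

    anchor_times must be sorted; region 0 lies before the first anchor and
    region i lies between anchor i-1 and anchor i.
    """
    return bisect_right(anchor_times, t)


# Hypothetical first-dimension retention times (minutes) in two runs
anchors_run_a = [5.0, 12.0, 20.0]
anchors_run_b = [5.6, 13.1, 21.9]   # same anchors, shifted retention times

peak_run_a = 14.0                   # a compound of interest in run A
peak_run_b = 15.3                   # the same compound in run B

print(region_index(anchors_run_a, peak_run_a))
print(region_index(anchors_run_b, peak_run_b))
```

Although the absolute retention time of the compound differs between the two runs, it is assigned to the same anchor-delimited region in both, so the corresponding LISes remain comparable.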

In Appendix C a list of candidate deuterated anchors and some possible LIS region selection strategies that make use of these anchors are presented.

4. Conclusions

The results presented in this work demonstrate the applicability of the LIS approach for the detection and classification of ILR in fire debris samples. While this research is exploratory, many valuable insights have been gained.

The differences between tiling strategies that enable optimal ILR detection and ILR classification performance in the dataset used in this work reveal that distinct chemical markers are useful in accomplishing these tasks. The large variation observed in performance for classifiers operating under different parameters on differently tiled and normalized data provides suggestions on how to organize a fully automated workflow providing excellent classification results, which might be developed in the future. Using Local Ion Signatures instead of Total Ion Signatures improved ILR detection and ILR classification performance for all tested classifiers, with the exception of ILR detection using PC-LDC. Perhaps most importantly, it has been demonstrated that high-quality, reliable ILR detection and ILR classification results can be achieved without the need for (complex) peak integration. GC×GC−MS data can be vastly reduced in dimensionality while preserving important chemical characteristics by intelligent computation of Local Ion Signatures.

5. Acknowledgments

The author would like to thank Martin Lopatka for all the supervision and guidance that he provided over the course of this work; Arian van Asten for stimulating discussions on the matter; Andjoe Sampat for performing the laboratory fire experiments and collecting the analytical data; and the people of the Harynuk group at the University of Alberta for collecting (preliminary) analytical data on the samples with added deuterated anchors.

References

[1] M. Lopatka, Statistical interpretation of chemical evidence pertaining to fire debris, Ph.D. thesis, University of Amsterdam, ISBN: 9789402801842 (June 2016). URL http://hdl.handle.net/11245/1.535144

[2] CBS, Brandweerstatistiek 2013, Centraal Bureau voor de Statistiek, Den Haag, 2014.

[3] Verbond van Verzekeraars, Brandweer en verzekeraars slaan handen ineen (2015).

[4] J. Baerncopf, K. Hutches, A review of modern challenges in fire debris analysis, Forensic Science International 244 (2014) e12–e20. doi:10.1016/j.forsciint.2014.08.006.

[5] L. I. Ying-yu, L. Dong, S. Hao, An Analysis of Background Interference on Fire Debris, Procedia Engineering 52 (2013) 664–670. doi:10.1016/j.proeng.2013.02.203.

[6] C. Cherry, J. J. Lentini, J. A. Dolan, The Petroleum-Laced Background, Journal of Forensic Science 45 (5) (2000) 968–989.

[7] S. Jhaumeer-Laulloo, J. Maclean, L. Ramtoola, K. Duyman, A. Toofany, Characterisation of Background and Pyrolysis Products that May Interfere with Forensic Analysis of Fire Debris in Mauritius, Pure and Applied Chemical Sciences 1 (2) (2013) 51–61.

[8] ASTM International, ASTM E1618-14, Standard Test Method for Ignitable Liquid Residues in Extracts from Fire Debris Samples by Gas Chromatography-Mass Spectrometry, Vol. i, West Conshohocken, PA, 2014. doi:10.1520/E1618-11.2.

[9] G. S. Frysinger, R. B. Gaines, Forensic analysis of ignitable liquids in fire debris by comprehensive two-dimensional gas chromatography, Journal of forensic sciences 47 (3) (2002) 471–482. doi:10.1520/JFS15288J.

[10] C. M. Taylor, A. K. Rosenhan, J. M. Raines, J. M. Rodriguez, An Arson Investigation by using Comprehensive Two-dimensional Gas Chromatography-Quadrupole Mass Spectrometry, Journal of Forensic Research 3 (169). doi:10.4172/2157-7145.1000169.

[11] K. L. Organtini, A. L. Myers, K. J. Jobst, J. Cochran, B. Ross, B. McCarry, E. J. Reiner, F. L. Dorman, Comprehensive characterization of the halogenated dibenzo-p-dioxin and dibenzofuran contents of residential fire debris using comprehensive two-dimensional gas chromatography coupled to time of flight mass spectrometry, Journal of Chromatography A 1369 (2014) 138–146. doi:10.1016/j.chroma.2014.09.088.

[12] L. Adutwum, J. Harynuk, Unique ion filter: A data reduction tool for gc/ms data preprocessing prior to chemometric analysis, Analytical chemistry 86 (15) (2014) 7726–7733.

[13] B. Tan, J. K. Hardy, R. E. Snavely, Accelerant classification by gas chromatography/mass spectrometry and multivariate pattern recognition, Analytica Chimica Acta 422 (1) (2000) 37–46.

[14] E. S. Bodle, J. K. Hardy, Multivariate pattern recognition of petroleum-based accelerants by solid-phase microextraction gas chromatography with flame ionization detection, Analytica chimica acta 589 (2) (2007) 247–254.

[15] W. N. M. Desa, N. N. Daéid, D. Ismail, K. Savage, Application of unsupervised chemometric analysis and self-organizing feature map (sofm) for the classification of lighter fuels, Analytical chemistry 82 (15) (2010) 6395–6400.

[16] M. Monfreda, A. Gregori, Differentiation of unevaporated gasoline samples according to their brands, by spme–gc–ms and multivariate statistical analysis, Journal of forensic sciences 56 (2) (2011) 372–380.

[17] E. E. Waddell, E. T. Song, C. N. Rinke, M. R. Williams, M. E. Sigman, Progress toward the determination of correct classification rates in fire debris analysis, Journal of forensic sciences 58 (4) (2013) 887–896.

[18] E. E. Waddell, M. R. Williams, M. E. Sigman, Progress toward the determination of correct classification rates in fire debris analysis ii: utilizing soft independent modeling of class analogy (simca), Journal of forensic sciences 59 (4) (2014) 927–935.

[19] N. A. Sinkov, P. M. L. Sandercock, J. J. Harynuk, Chemometric classification of casework arson samples based on gasoline content, Forensic science international 235 (2014) 24–31.

[20] M. R. Williams, M. E. Sigman, J. Lewis, K. M. Pitan, Combined target factor analysis and bayesian soft-classification of interference-contaminated samples: forensic fire debris analysis, Forensic science international 222 (1) (2012) 373–386.

[21] E. Szymanska, J. Gerretzen, J. Engel, B. Geurts, L. Blanchet, L. M. C. Buydens, Chemometrics and qualitative analysis have a vibrant relationship, TrAC - Trends in Analytical Chemistry 69 (2015) 34–51. doi:10.1016/j.trac.2015.02.015.

[22] P. Sandercock, E. D. Pasquier, Chemical fingerprinting of unevaporated automotive gasoline samples, Forensic Science International 134 (1) (2003) 1–10. doi:http://dx.doi.org/10.1016/S0379-0738(03)00081-1.

[23] W. S. Wingert, G.c.-m.s. analysis of diamondoid hydrocarbons in smackover petroleums, Fuel 71 (1) (1992) 37–43. doi:http://dx.doi.org/10.1016/0016-2361(92)90190-Y.

[24] H. C. Reinardy, A. G. Scarlett, T. B. Henry, C. E. West, L. M. Hewitt, R. A. Frank, S. J. Rowland, Aromatic naphthenic acids in oil sands process-affected water, resolved by gcxgc-ms, only weakly induce the gene for vitellogenin production in zebrafish (danio rerio) larvae, Environmental science & technology 47 (12) (2013) 6614–6620.

[25] R. Shellie, P. Marriott, P. Morrison, Comprehensive two-dimensional gas chromatography with flame ionization and time-of-flight mass spectrometry detection: qualitative and quantitative analysis of west australian sandalwood oil, Journal of chromatographic science 42 (8) (2004) 417–422.

[26] A. M. Hupp, L. J. Marshall, D. I. Campbell, R. W. Smith, V. L. McGuffin, Chemometric analysis of diesel fuel for forensic and environmental applications, Analytica Chimica Acta 606 (2) (2008) 159–171.

[27] M. E. Sigman, M. R. Williams, J. A. Castelbuono, J. G. Colca, C. D. Clark, Ignitable Liquid Classification and Identification Using the Summed-Ion Mass Spectrum, Instrumentation Science & Technology 36 (4) (2008) 375–393. doi:10.1080/10739140802151440.

[28] M. Lopatka, M. E. Sigman, M. J. Sjerps, M. R. Williams, G. Vivó-Truyols, Class-conditional feature modeling for ignitable liquid classification with substantial substrate contribution in fire debris analysis, Forensic Science International 252 (2015) 177–186. doi:10.1016/j.forsciint.2015.04.035.

[29] R. Duin, P. Juszczak, P. Paclik, E. Pekalska, D. De Ridder, D. Tax, S. Verzakov, A matlab toolbox for pattern recognition, PRTools version 5.1.1.

[30] P. H. Swain, H. Hauska, The decision tree classifier: Design and potential, Geoscience Electronics, IEEE Transactions on 15 (3) (1977) 142–147.

[31] P.-N. Tan, M. Steinbach, V. Kumar, Intro. to data mining, Michigan State University and University of Minnesota (2006) 207–223.

[32] L. Breiman, J. Friedman, C. J. Stone, R. A. Olshen, Classification and regression trees, CRC press, 1984.

[33] J. R. Quinlan, Induction of decision trees, Machine learning 1 (1) (1986) 81–106.

[34] J. R. Quinlan, Simplifying decision trees, International journal of man-machine studies 27 (3) (1987) 221–234.

[35] P. Cunningham, S. J. Delany, k-nearest neighbour classifiers, Multiple Classifier Systems (2007) 1–17.

[36] S. Wold, K. Esbensen, P. Geladi, Principal component analysis, Chemometrics and intelligent laboratory systems 2 (1-3) (1987) 37–52.

[37] C. Liu, H. Wechsler, Robust coding schemes for indexing and retrieval from large face databases, IEEE Transactions on Image Processing 9 (1) (2000) 132–137.

[38] R. O. Duda, P. E. Hart, D. G. Stork, Pattern Classification Second Edition, John Wiley & Sons, 2001.

[39] A. R. Webb, Statistical pattern recognition, John Wiley & Sons, 2003.

[40] V. G. van Mispelaar, Chromametrics, Amsterdam: Universiteit van Amsterdam, 2005.

[41] M. Woldegebriel, J. Gonsalves, A. C. van Asten, G. Vivó-Truyols, Robust bayesian algorithm for targeted compound screening in forensic toxicology, Analytical Chemistry.

Appendix A: Analytical methodology

This section was reproduced from [1] with permission from the original authors.

Sample preparation

Fire debris samples were first analyzed according to a standard screening protocol. In order to perform headspace analysis, the jars were placed in an oven at 70 °C for at least 4 hours. Subsequently, a screening measurement was performed using gas chromatography coupled to a flame-ionization detector (GC-FID) to determine the appropriate sampling volume for GC×GC−MS extraction. Extracts for GC×GC−MS analysis were prepared by drawing a suitable headspace volume through a sorbent tube containing activated charcoal using a 100 mL syringe. Samples were collected by removing both ends of the sorbent tube using a glass cutter and connecting a needle on one end and the syringe on the other, using two pieces of silicone tubing. After pulling sample headspace through the tube, it was cut through and the charcoal contents were collected in plastic microcentrifuge vials. Each 1 mL vial of dichloromethane (DCM) (Biosolve, Valkenswaard, The Netherlands) was spiked with 0.01 mg/mL chlorobenzene as an internal standard (IS) (Grace-Alltech, Breda, The Netherlands). After centrifuging for 15 min (at 13000 rpm) the supernatant was transferred using a glass Pasteur pipette to a GC vial, capped and stored for further analysis.

The complete composition of each experimental burn is provided in Appendix B. The three types of ignitable liquids (WS, GAS, LO) were each used in 15 burn experiments. DCM was used for desorption from the charcoal adsorption tubes. Neat ignitable liquids were also analyzed by diluting these with spiked DCM to a final concentration of 0.01 mg/mL.

GC−FID screening

GC-FID screening analyses were performed on an Agilent Technologies 6890A Network GC System equipped with an Agilent Technologies 7683B Series Injector and an Agilent Technologies 7683B Series Autosampler (all from Agilent, Santa Clara, CA, USA). An Agilent medium-polarity DB-624 (6% cyanopropyl-phenyl, 94% dimethyl polysiloxane) column (30 m × 320 µm i.d.; 1.8 µm film thickness) was used. The injection volume was 0.5 mL of headspace with a split ratio of 20 : 1. Helium was used as the carrier gas at a constant flow rate of 2 mL/min. The following temperature program was applied for the separation: 80 °C with a 2 min hold, followed by a linear ramp of 40 °C/min to 255 °C with a 5 min hold. Both the injector temperature and the detector temperature were set to 250 °C.

GC×GC−MS analysis

GC×GC−MS analyses were carried out on an Agilent Technologies 6890N GC System with a LECO dual-stage, quad-jet thermal modulator coupled to a Pegasus III ToF-MS (LECO, St. Joseph, MI, USA). The sample injection order was randomized and samples were run in batches over the course of 4 weeks. A Gerstel (Mülheim an der Ruhr, Germany) autosampler was used for all injections. An Agilent DB-1 (100% dimethyl polysiloxane) first-dimension column (30 m × 250 µm i.d.; 0.5 µm film thickness) was used in combination with an Agilent DB-17 ((50%-phenyl)-methyl polysiloxane) second-dimension column (1 m × 100 µm i.d.; 0.2 µm film thickness). The columns were connected via a universal press-tight connector (Restek, USA). For the analysis, extracts were injected as such. Splitless injections were carried out for the extracts and performance test mixtures with an injection volume of 1 µL. For the analysis of the diluted intact ignitable liquids, split injections were carried out with an injection volume of 1 µL and a split ratio of 30 : 1. The temperature program of the PTV injector started at 40 °C with a hold time of 0.1 min, followed by a linear ramp of 12 °C/s to a final temperature of 250 °C. Helium was used as the carrier gas at a constant pressure of 110 kPa. The temperature for the first-dimension separation was initially set at 45 °C for 0.5 min, followed by a linear ramp of 1 °C/min to 80 °C, followed by a second linear ramp of 3 °C/min to 130 °C, and finally a linear ramp of 5.5 °C/min to a final temperature of 255 °C with a hold time of 10 min. An offset of +5 °C was used for a parallel temperature program in the second column oven. The modulator temperature offset was 15 °C. The inlet temperature was held at 250 °C. A modulation time of 4 s was used during the entire run with a hot-pulse duration of 400 ms. The MS transfer-line temperature was maintained at 225 °C. The ion-source temperature was 250 °C with an electron ionization (EI) energy of 70 eV. A solvent delay of 350 seconds was used. The acquisition rate was 200 scans per second, covering a 35-500 m/z range.
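The dimensions of the acquired data follow from the settings above. The sketch below computes the duration of the first-dimension oven program, the number of spectra per modulation, and the width of one of 30 equal tiles along the first dimension. That the tiles span the acquired part of the run after the solvent delay is an assumption made here; under that assumption the result agrees with the tile size of 2 min 38 s quoted in Section 3.2.2.

```python
def program_minutes(start_temp, initial_hold, ramps, final_hold):
    """Total oven-program duration in minutes.

    ramps is a list of (rate in degrees C per minute, target temperature).
    """
    minutes = initial_hold
    temp = start_temp
    for rate, target in ramps:
        minutes += (target - temp) / rate
        temp = target
    return minutes + final_hold


# First-dimension oven program as described in this appendix
run_min = program_minutes(45, 0.5, [(1, 80), (3, 130), (5.5, 255)], 10)

acquired_s = run_min * 60 - 350     # acquisition starts after the 350 s solvent delay
tile_s = acquired_s / 30            # assumed: 30 equal tiles over the acquired run
spectra_per_modulation = 200 * 4    # 200 scans/s over a 4 s modulation period

print(f"oven program: {run_min:.2f} min")
print(f"tile width: {int(tile_s // 60)} min {tile_s % 60:.0f} s")
print(f"spectra per modulation: {spectra_per_modulation}")
```

Four tiles along the second dimension correspond to 1 s each for the 4 s modulation time, in line with the second-dimension tile size reported for the 30 × 4 strategy.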


Appendix B: Details of sample composition of laboratory re ex-periments

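Substrates

| # | Substrate |
|---|---|
| 1 | Floor skirting (painted and glued) |
| 2 | Pine floor (treated) |
| 3 | Oak floor (untreated) |
| 4 | Laminate flooring |
| 5 | Carpet material |
| 6 | Under carpet material |
| 7 | Area rug |
| 8 | Vinyl flooring |
| 9 | Curtains |
| 10 | Gypsum board (painted) |
| 11 | Ikea particle board material |
| 12 | Sofa |
| 13 | Wooden chair |
| 14 | Mattress |
| 15 | Magazines |
| 16 | Newsprint |
| 17 | Computer monitor case |
| 18 | Clothing (mix of materials) |
| 19 | Electric cable and extension socket |

Ignitable liquids

| Code | Ignitable liquid |
|---|---|
| WS | White Spirit |
| GAS | Gasoline |
| LO | Lamp Oil |

Substrate mixture burns (EXP# 1 to 45) and substrate mixtures with IL burns (EXP# 46 to 90)

| EXP# | Primary substrate | Secondary substrates | EXP# (IL burn) | Substrate composition equal to EXP# | Added IL | Bottle# |
|---|---|---|---|---|---|---|
| 1 | 1 | 10, 12, 16, 18 | 46 | 1 | WS | 1 |
| 2 | 2 | 7, 10, 17, 19 | 47 | 2 | WS | 2 |
| 3 | 3 | 7, 12, 15, 17 | 48 | 3 | WS | 3 |
| 4 | 4 | 10, 13, 16, 18 | 49 | 4 | WS | 4 |
| 5 | 5 | 3, 6, 17, 19 | 50 | 5 | WS | 5 |
| 6 | 7 | 2, 4, 6, 10 | 51 | 6 | WS | 6 |
| 7 | 8 | 9, 13, 15, 18 | 52 | 7 | WS | 7 |
| 8 | 9 | 4, 12, 16, 18 | 53 | 8 | WS | 8 |
| 9 | 10 | 1, 5, 6, 17 | 54 | 9 | WS | 9 |
| 10 | 11 | 7, 9, 18, 19 | 55 | 10 | WS | 10 |
| 11 | 12 | 5, 6, 16, 18 | 56 | 11 | WS | 11 |
| 12 | 13 | 2, 5, 12, 14 | 57 | 12 | WS | 12 |
| 13 | 14 | 4, 7, 9, 11 | 58 | 13 | WS | 13 |
| 14 | 17 | 1, 2, 9, 13 | 59 | 14 | WS | 14 |
| 15 | 19 | 7, 10, 13, 15 | 60 | 15 | WS | 15 |
| 16 | 1 | 8, 11, 12, 19 | 61 | 16 | GAS | 1 |
| 17 | 2 | 11, 14, 15, 16 | 62 | 17 | GAS | 2 |
| 18 | 3 | 6, 7, 14, 19 | 63 | 18 | GAS | 3 |
| 19 | 4 | 12, 14, 18, 19 | 64 | 19 | GAS | 4 |
| 20 | 5 | 2, 3, 6, 16 | 65 | 20 | GAS | 5 |
| 21 | 7 | 3, 12, 17, 18 | 66 | 21 | GAS | 6 |
| 22 | 8 | 5, 14, 15, 17 | 67 | 22 | GAS | 7 |
| 23 | 9 | 4, 9, 11, 15 | 68 | 23 | GAS | 8 |
| 24 | 10 | 1, 9, 13, 18 | 69 | 24 | GAS | 9 |
| 25 | 11 | 3, 10, 13, 16 | 70 | 25 | GAS | 10 |
| 26 | 12 | 7, 11, 13, 14 | 71 | 26 | GAS | 11 |
| 27 | 13 | 9, 12, 14, 19 | 72 | 27 | GAS | 12 |
| 28 | 14 | 3, 7, 11, 19 | 73 | 28 | GAS | 13 |
| 29 | 17 | 5, 9, 13, 18 | 74 | 29 | GAS | 14 |
| 30 | 19 | 5, 8, 15, 17 | 75 | 30 | GAS | 15 |
| 31 | 1 | 6, 10, 14, 16 | 76 | 31 | LO | 1 |
| 32 | 2 | 5, 13, 15, 18 | 77 | 32 | LO | 2 |
| 33 | 3 | 10, 14, 15, 17 | 78 | 33 | LO | 3 |
| 34 | 4 | 11, 12, 13, 14 | 79 | 34 | LO | 4 |
| 35 | 5 | 10, 14, 16, 18 | 80 | 35 | LO | 5 |
| 36 | 7 | 10, 11, 16, 19 | 81 | 36 | LO | 6 |
| 37 | 8 | 1, 5, 13, 14 | 82 | 37 | LO | 7 |
| 38 | 9 | 2, 13, 16, 19 | 83 | 38 | LO | 8 |
| 39 | 10 | 3, 5, 14, 16 | 84 | 39 | LO | 9 |
| 40 | 11 | 8, 12, 16, 17 | 85 | 40 | LO | 10 |
| 41 | 12 | 5, 6, 14, 15 | 86 | 41 | LO | 11 |
| 42 | 13 | 2, 14, 15, 16 | 87 | 42 | LO | 12 |
| 43 | 14 | 1, 3, 16, 19 | 88 | 43 | LO | 13 |
| 44 | 17 | 5, 6, 9, 10 | 89 | 44 | LO | 14 |
| 45 | 19 | 3, 12, 16, 17 | 90 | 45 | LO | 15 |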
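The regular part of the design tabulated above can be expressed compactly. The sketch below reproduces only the columns that follow a visible pattern in the table (primary substrate, base experiment, IL type and bottle number). The secondary substrates show no such pattern and are not reproduced. The function names are hypothetical and chosen for illustration only.

```python
# Primary substrates in the order they appear in the table; this cycle of
# 15 repeats three times over EXP# 1..45. Substrates 6, 15, 16 and 18 are
# never used as a primary substrate.
PRIMARY_CYCLE = [1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 17, 19]

# IL codes from the legend: white spirit, gasoline, lamp oil.
IL_TYPES = ["WS", "GAS", "LO"]


def primary_substrate(exp):
    """Primary substrate number of a substrate mixture burn (EXP# 1..45)."""
    if not 1 <= exp <= 45:
        raise ValueError("substrate mixture burns are EXP# 1..45")
    return PRIMARY_CYCLE[(exp - 1) % 15]


def il_burn(exp):
    """Return (base EXP#, IL code, bottle#) for an IL burn (EXP# 46..90)."""
    if not 46 <= exp <= 90:
        raise ValueError("IL burns are EXP# 46..90")
    base = exp - 45                    # substrate composition equal to EXP#
    il = IL_TYPES[(base - 1) // 15]    # 15 burns per ignitable liquid
    bottle = (base - 1) % 15 + 1       # bottle numbers run 1..15 per liquid
    return base, il, bottle
```

For example, `il_burn(70)` yields the base experiment 25 with gasoline from bottle 10, matching the corresponding table row.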


Appendix C: Anchored LIS approach

A set of 14 deuterated compounds was selected to elute over a wide range in both the first and second chromatographic dimensions, which corresponded to a 30 m × 0.25 mm, 1 µm Rtx-5MS (Restek, Bellefonte, PA) column and a 1 m × 0.10 mm, 0.1 µm Rxi-17 (Restek) column, respectively. The compounds are summarized in Table 5. An example chromatogram with the locations of these anchors marked is shown in Figure 12.

Table 5: Suggested deuterated compounds to be used in an anchored LIS approach

| Analyte name | CAS number |
|---|---|
| Anthracene-d10 | 120-12-7 |
| Benzene-d6 | 1076-43-3 |
| Benzophenone-d10 | 112-61-9 |
| Methylnaphthalene-d10 | 38072-94-5 |
| n-Heneicosane-d44 | 39756-37-1 |
| n-Heptadecane | 39756-35-9 |
| n-Heptane | 33838-52-7 |
| n-Nonadecane-d40 | 39756-36-0 |
| n-Nonane-d20 | 121578-11-8 |
| n-Pentadecane-d32 | 36340-20-2 |
| n-Tridecane | 121578-12-9 |
| n-Undecane-d24 | 164858-54-2 |
| Naphthalene-d8 | 1146-65-2 |
| Toluene-d8 | 2037-26-5 |

Anchor detection

To facilitate anchored LIS region selection, the locations of these anchors have to be determined for every chromatogram. The first step in this procedure is to obtain reference retention time values for each anchor. This step consists of performing a chromatographic separation of a solution containing the anchors and determining the retention times of the anchor peak apexes from this chromatogram.

Figure 12: Locations of the deuterated anchors in a GC×GC−MS chromatogram, marked by white circles. Axes: retention time d1 (0 to 50 minutes) and retention time d2 (0.0 to 2.0 seconds).

The second step in the anchor detection procedure is performed by a method adapted from the targeted compound screening method presented in [41]. Briefly, this method entails defining a probability density function for each pixel in the chromatographic space such that regions closer to the expected location of an anchor compound (i.e. its reference retention time obtained in the first step) are assigned a higher probability and regions further away from this point are assigned a monotonically decreasing probability. Rather than using a distance in the mass domain, we define a probability density function for the mass similarity between the pixels in the chromatographic space (and their respective mass channel vectors) and the reference spectra, expressed as the likelihood of a particular cosine distance between like-samples. In this way, each point in the chromatogram may be evaluated based on its chromatographic proximity to the expected location of a particular anchor and its likelihood of exhibiting a particular mass similarity to that anchor. The successful application of this technique allows the locations of the anchors to be determined in all experimental chromatograms of the three datasets.

Figure 13: LIS region selection strategy using the r closest points to anchors present in the samples. Anchors are depicted by asterisks. Example shown for an arbitrary value of r. Axes: first chromatographic dimension and second chromatographic dimension.
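The scoring idea behind the second anchor detection step can be sketched as follows. This is a minimal illustration, not the method of [41]. It assumes a Gaussian weighting for chromatographic proximity and an exponentially decaying likelihood for the cosine distance, neither of which is specified in the text. The names `locate_anchor`, `sigma1`, `sigma2` and `scale` are hypothetical, and the chromatogram is represented as a plain dictionary that maps pixel coordinates to mass channel vectors.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two mass channel vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # an empty spectrum is treated as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def retention_prior(pixel, expected, sigma1, sigma2):
    """Assumed Gaussian weight: 1 at the expected location, decreasing
    monotonically with distance in either chromatographic dimension."""
    d1 = (pixel[0] - expected[0]) / sigma1
    d2 = (pixel[1] - expected[1]) / sigma2
    return math.exp(-0.5 * (d1 * d1 + d2 * d2))


def spectral_likelihood(distance, scale):
    """Assumed stand-in for the likelihood of observing this cosine
    distance between like-samples (small distances are most likely)."""
    return math.exp(-distance / scale)


def locate_anchor(chromatogram, expected, reference_spectrum,
                  sigma1=2.0, sigma2=2.0, scale=0.1):
    """Return (pixel, score) of the pixel with the highest combined score.

    chromatogram: dict mapping (d1 index, d2 index) -> list of intensities.
    expected: reference location of the anchor from the first step.
    """
    best_pixel, best_score = None, -1.0
    for pixel, spectrum in chromatogram.items():
        score = (retention_prior(pixel, expected, sigma1, sigma2)
                 * spectral_likelihood(
                     cosine_distance(spectrum, reference_spectrum), scale))
        if score > best_score:
            best_pixel, best_score = pixel, score
    return best_pixel, best_score
```

Multiplying the two terms means that a pixel must be both close to the expected location and spectrally similar to the anchor to score highly, so a matching spectrum far from the expected retention times is not mistaken for the anchor.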

Anchored LIS region selection strategies

One of the LIS region selection strategies that could be explored defines the LIS regions as the r points in the chromatographic space closest to each anchor, visualized as equal-sized ovals for an example strategy in Figure 13. One could also define irregular tetragons between the 14 anchors, as depicted in Figure 14. By projecting the anchors onto the edges of the chromatogram as in Figure 15, an additional 18 tetragons are defined.
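The r-closest-points strategy can be sketched as below, combined with the LIS definition from the abstract (the sum of intensities per m/z channel over all points in a region). The sketch assumes that distance is measured in pixel index units and that the data are held as a nested list `cube[i][j][k]` (first dimension, second dimension, m/z channel). Both are illustrative assumptions, and the function names are hypothetical.

```python
def closest_pixels(shape, anchor, r):
    """Return the r pixels closest to the anchor location.

    shape: (n1, n2) size of the chromatographic plane in pixels.
    anchor: (i, j) pixel location of the anchor.
    Distance is the squared Euclidean distance in pixel index units
    (an assumption); ties are resolved in row-major pixel order because
    the sort is stable.
    """
    n1, n2 = shape
    pixels = [(i, j) for i in range(n1) for j in range(n2)]
    pixels.sort(key=lambda p: (p[0] - anchor[0]) ** 2
                              + (p[1] - anchor[1]) ** 2)
    return pixels[:r]


def local_ion_signature(cube, region):
    """Sum the intensities per m/z channel over all pixels in the region."""
    n_channels = len(cube[0][0])
    signature = [0.0] * n_channels
    for i, j in region:
        for k, intensity in enumerate(cube[i][j]):
            signature[k] += intensity
    return signature
```

Scaling the two index differences by the retention time step of each dimension before squaring would give regions that are circular in retention time rather than in pixel units.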

Figure 14: LIS region selection strategy using 7 irregular tetragonal regions defined around anchors present in the sample. Anchors are depicted by asterisks. Axes: first chromatographic dimension and second chromatographic dimension.

Figure 15: LIS region selection strategy using 25 irregular tetragonal regions defined around the anchors present in the sample. Anchors are depicted by asterisks. Axes: first chromatographic dimension and second chromatographic dimension.
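Assigning pixels to an irregular tetragonal region amounts to a point-in-polygon test with the four anchor (or projected anchor) locations as vertices. The sketch below uses the standard even-odd ray casting rule. It is one possible way to build such regions rather than the procedure used in this work, and the function names are hypothetical.

```python
def point_in_polygon(point, vertices):
    """Even-odd ray casting test for a point against a simple polygon.

    vertices: list of (x, y) corners in order around the polygon.
    Points lying exactly on an edge may be classified either way.
    """
    x, y = point
    inside = False
    n = len(vertices)
    for k in range(n):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % n]
        # Only edges that straddle the horizontal line through the point
        # can be crossed by a ray cast towards +x.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def tetragon_region(shape, vertices):
    """Return all pixels (i, j) of an (n1, n2) plane inside the tetragon."""
    n1, n2 = shape
    return [(i, j) for i in range(n1) for j in range(n2)
            if point_in_polygon((i, j), vertices)]
```

The resulting pixel list can be summed per m/z channel to obtain the LIS of the tetragon. Because the even-odd rule does not require convexity, the same test also covers non-convex tetragons.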