The effects of parameter choice in ICA-FIX preprocessing

Academic year: 2021



Introduction

Since the first fMRI neuroimaging technique was introduced in the early nineties (Ogawa, Lee, Kay, & Tank, 1990), it has played an increasingly important role as one of the main tools for cognitive scientists to test cognitive phenomena in a laboratory setting. fMRI research offers high spatial resolution with negligible negative effects for participants, making it more feasible as a study method compared to invasive techniques (Haynes, 2015). The host of topics studied using fMRI techniques has also steadily increased, ranging from academic research into affective networks in the brain (Feldman Barrett, Bhaskar Satpute, & Feldman, 2013) to finding neurological markers for diseases such as Alzheimer's (Griffanti, Dipasquale, Laganà, & Nemni, 2015). This surge in the complexity of research topics has consequently led to greater scrutiny of the validity of research methods, as well as a wider discrepancy in how researchers deal with the lack of convention on how to treat and prepare fMRI data.

One major contention within the neuroscientific field is the question of how to correctly preprocess fMRI data before doing any final analysis (Caballero-Gaudes & Reynolds, 2016; Liu, 2016; Shaw et al., 2003). To preprocess fMRI data means to transform the original data points in a way that theoretically improves the signal-to-noise ratio (Caballero-Gaudes & Reynolds, 2016). Motion artifacts, changes in signal strength due to unwanted movement during data collection, are an example of noise that would be taken out during preprocessing. While many different preprocessing steps exist, it is important to note that all such steps constitute transformations of the original data that only theoretically increase the signal-to-noise ratio. This assumption has been questioned since the very inception of fMRI research (Ogawa et al., 1990). Very little agreement has yet been reached on what the optimal choices in preprocessing are (Caballero-Gaudes & Reynolds, 2016). Even for singular issues that are sub-parts of preprocessing, such as nuisance regression, a multitude of 'optimal' methods exist, such as ICA-FIX (Griffanti et al., 2014), RETROICOR (Glover, Li, & Ress, 2000), and GLMdenoise (Kay, Rokem, Winawer, Dougherty, & Wandell, 2013). The variance created by choosing different preprocessing tools can vary wildly (Pruim et al., 2015; Shirer, Jiang, Price, Ng, & Greicius, 2015). Different techniques can easily lead to a change in results, thus putting into question the validity of both the method of preprocessing and the research itself.

Beyond differences in method of preprocessing, it could be argued that any transformation of the original data is problematic, since there is a possibility that signal is removed rather than noise. This is especially true for highly affective tasks, where nuisance variables such as heart rate could reasonably be thought to correlate with conditions evoking an emotional response (Iacovella & Hasson, 2011). Many such preprocessing issues exist, but most can be viewed through six general points of discussion: (1) using affective data; (2) using physiological data versus classifier-based preprocessing approaches; (3) the pluriformity of preprocessing tools; (4) liberal versus conservative approaches to preprocessing; (5) the use of data-specific versus data-general preprocessing tools; and finally (6) the overarching lack of agreed-upon tools to study the effectiveness of preprocessing, for which this study will propose a solution.

Affective data

Working with affective stimuli poses several issues when considering preprocessing. As mentioned before, respiratory and cardiac signals can reasonably be assumed to positively correlate with the level of affectivity of any particular stimulus (Iacovella & Hasson, 2011). Thus, when regressing out such noise sources, signal is likely to be reduced as well, which decreases any variability when contrasting affective conditions with neutral conditions.
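The risk described above can be made concrete with a small simulation (a hedged sketch with made-up numbers, not data from this study): when a nuisance regressor such as heart rate tracks the affective condition, regressing it out also removes condition-related variance, shrinking the estimated effect.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
condition = np.repeat([0.0, 1.0], n // 2)                 # neutral vs affective blocks
heart_rate = 0.8 * condition + 0.2 * rng.normal(size=n)   # nuisance tracking the condition
bold = condition + 0.5 * heart_rate + rng.normal(scale=0.3, size=n)

def condition_beta(y):
    """Estimated condition effect from a simple GLM with an intercept."""
    X = np.c_[np.ones(n), condition]
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

beta_raw = condition_beta(bold)
# "Clean" the data by regressing out heart rate before the analysis:
hr = np.c_[np.ones(n), heart_rate]
bold_clean = bold - hr @ np.linalg.lstsq(hr, bold, rcond=None)[0]
beta_clean = condition_beta(bold_clean)
# beta_clean comes out far smaller than beta_raw: the variance the condition
# shares with the nuisance was removed along with it.
```

Nothing here depends on fMRI specifics; the same arithmetic applies whenever a nuisance regressor correlates with the condition of interest.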

Furthermore, affective stimuli are thought to work on a different scale than basic sensory input, operating on a network level rather than in separate brain regions (Feldman Barrett et al., 2013). This difference in effective scope becomes problematic when considering preprocessing steps that operate on the brain as a whole, such as proportional global signal scaling (Junghöfer, Schupp, Stark, & Vaitl, 2005). Proportional global signal scaling is a preprocessing step in which global variations across the brain, which are assumed to represent background processes rather than task-specific signal, are regressed out. As in the case of respiratory and cardiac noise, however, affective tasks tend to correlate with such global variance, especially in contrast to basic sensory input. This regression therefore reduces any true variance that occurs as a product of experimental conditions and increases the chance of erroneous results. An example of how this works in practice concerns recent findings on the role of the amygdala in emotional processing. Evidence has recently been found that vascular structure might have been influencing research into affective cognitive phenomena, especially those linked to the amygdala (Boubela et al., 2015). Boubela et al. found that many of the results linking emotional processing to the amygdala could be attributed to venous drainage previously masked due to


preprocessing steps such as global signal scaling or cardiac regression. The current research aims to shed light on the effects of using affective data in preprocessing by training a classifier-based cleaning procedure on affective data and on non-affective data separately.

Physiology vs classifiers

A further point of contention concerning preprocessing is whether it is better to regress out a transformed version of obtained physiological data, or to use classifier-based algorithms to remove noisy variance. As mentioned before, many methods for removing nuisance variables exist, but all such methods can be divided into physiology-based approaches and classifier-based approaches (Caballero-Gaudes & Reynolds, 2016). Early preprocessing tools were often physiologically based, the most well known of which is RETROICOR (Glover et al., 2000). RETROICOR fits low-order Fourier transformations of the measured cardiac and respiratory cycles to the corresponding fMRI data, which are then regressed out. These approaches are still widely popular today because of the theoretical simplicity of the method. Criticism often concerns the high reliance on good physiological data, as well as the possibly low relevance of cardiac and respiratory data to noise variance (Caballero-Gaudes & Reynolds, 2016). In response to these criticisms, many new approaches have adopted classifier algorithms to detect noise sources in their data. One well-known such method is ICA-FIX (Griffanti et al., 2014; Salimi-Khorshidi et al., 2014). Independent component analysis (ICA) decomposes a dataset into a set number of distinguishable components, which can then be marked as either noise or signal. ICA-FIX can then train a classifier to distinguish these chosen noise components, which can be applied to a larger dataset, possibly even to other datasets. The components classified as noise are then regressed from the data. These types of approaches allow for a hands-on view of the data, which should translate into more specific regression of noise. Criticism is often aimed at the high level of expertise necessary for correctly classifying components, as well as the difficulty in determining the degree to which signal is regressed out together with the noisy components (Caballero-Gaudes & Reynolds, 2016).
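The decompose-label-regress logic behind such classifier approaches can be sketched with scikit-learn's FastICA on toy signals rather than real fMRI data (all variables here are illustrative; in ICA-FIX the noise label would come from hand labeling or a trained classifier, not from a known artifact):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 500)
signal = np.sin(2 * np.pi * 5 * t)              # "task" source
artifact = np.sign(np.sin(2 * np.pi * 11 * t))  # structured "noise" source
data = np.c_[signal, artifact] @ rng.normal(size=(2, 2)).T  # observed mixtures

ica = FastICA(n_components=2, random_state=0)
components = ica.fit_transform(data)            # estimated sources
# Identify the noise component; here by correlation with the known artifact,
# whereas in practice this is the hand-labeling / classification step:
corrs = [abs(np.corrcoef(components[:, i], artifact)[0, 1]) for i in range(2)]
noise_idx = int(np.argmax(corrs))
components[:, noise_idx] = 0                    # drop the noise component
cleaned = ica.inverse_transform(components)     # reconstruct without the noise
```

The reconstruction retains the task-like source while the structured artifact is removed, which is the intended effect of the cleanup.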

Pluriformity of methods

Indeed, both types of nuisance regression have their strengths and criticisms, yet this does not necessarily have to be problematic. In neuroscience as a field, many methods of research similarly exist in tandem, all with their respective criticisms, but not in competition with one another. In the case of preprocessing tools, however, many papers exist arguing the superiority of particular choices made during preprocessing (Griffanti et al., 2014; Kay et al., 2013; Pruim et al., 2015). This superiority often relies on a proxy measurement of the signal-to-noise ratio, as defined by the researchers (Liu, 2016; Power, 2016). Many attempts at defining this concept have been proposed, yet many have pointed out that such measures are never more than proxies, and often unreliable ones (Caballero-Gaudes & Reynolds, 2016). Thus, the best an fMRI researcher can currently do is go through all the existing literature and weigh all pros and cons according to their own research scheme. This, of course, becomes extremely time-consuming, and thus unfeasible, due to the large variety of choices and options.

Conservative & Liberal

To further narrow down the problem of pluriformity, preprocessing choices can generally be split into two groups: conservative and liberal approaches. These terms refer to where the emphasis is placed: on keeping in as much signal as possible, or on removing as much noise as possible, respectively. As an example, conservative approaches generally leave out any conventionally optional preprocessing measures (e.g. global signal scaling, 24 motion regressors instead of 6, etc.), while liberal approaches would apply these tools. As no true measurement of signal to noise exists, these types of approaches are purely theoretical, and many criticisms of either choice exist. This research aims to shed light on this issue by including a threshold parameter for the transformation of the data, to measure the effects of conservative and liberal approaches.

Data specific & data general

Additionally, much has been written on the use of data-specific preprocessing measures rather than data-general tools. This divide touches on many of the previously mentioned points as well, exemplified by the fact that physiological-data-driven tools are by definition data-specific, whereas classifier-based tools can be either data-specific or data-general, as classifiers can be trained on a subset of the data or a priori, respectively. Similarly, many of the choices in preprocessing are naturally discussed in the context of data-specific or data-general superiority. Most importantly for this discussion, it is believed that data-specific tools generally improve analysis performance, while data-general tools are theoretically more valid, as noise should be generalizable in many cases. This again comes down to the theoretical notion of the signal-to-noise trade-off, since improved results do not necessarily mean improved signal-to-noise ratios, and might be less reproducible and thus less valid. This research will aim to shed light on this issue by introducing a third training set for the preprocessing classifier, namely a combined training set of affective and non-affective data. This set, being trained on data-specific as well as non-specific data, is then representative of the data-general approaches, as the classifier should theoretically perform on a more pluriform set of data.

Lack of consensus

Finally, an issue that encompasses all previous contentions is that there is currently no consensus on the method used to compare the effectiveness of preprocessing tools (Caballero-Gaudes & Reynolds, 2016; Liu, 2016). While all methodological tools that claim superior effectiveness will point to one or more papers to prove their claim, the methods used to arrive at such conclusions vary widely. An example of this is the 2014 paper by Salimi-Khorshidi et al., which discusses the usefulness of ICA-FIX in the preprocessing of standard fMRI datasets as well as high-quality fMRI datasets. In this study, the researchers used a 'leave one out' (LOO) testing procedure to test classifier accuracy in detecting noisy components. The LOO method is widely used for this type of testing, yet quite specifically in classifier-based analyses. Alternatively, RETROICOR's effectiveness was originally rated on the increase in signal strength in regions that seemed to be activated prior to preprocessing (Glover et al., 2000). Many more such methodologies exist; however, since they vary so wildly in their scope of inquiry, differences in the resulting claims become uninterpretable. Multiple attempts have been made to introduce a method for determining the quality of fMRI data. One noteworthy example of such an attempt is the method proposed by Jonathan Power (2016). While a complete explanation of this method exceeds the scope of this paper, it is important to note that it has been positively received by the neuroscientific community, but has yet to be used for any comprehensive assessment of preprocessing, partly due to its complexity, both in practice and in interpretability when comparing preprocessing methods.

Instead, this paper will aim to devise and apply a novel method for comparing preprocessing tools and parameter choices, namely the use of a classifier-based preprocessing tool that is trained on multiple occasions with different datasets and then applied to a non-affective fMRI dataset using varying degrees of strictness. The cleaned-up data will then be used in a multi-voxel pattern analysis, which results in an interpretable outcome that can shed light on some of the abovementioned issues.

ICA-FIX

FMRIB's ICA-based Xnoiseifier, or FIX, is a popular auto-classifier-based preprocessing tool (Griffanti et al., 2014; Salimi-Khorshidi et al., 2014). The classification specification used when FIX is applied to data is created from a hand-labeled subset of that data, or from a similar dataset deemed representative of the dataset it is applied to. This chosen subset is hand-labeled on a single-session Multivariate Exploratory Linear Optimized Decomposition into Independent Components, or MELODIC for short (Griffanti et al., 2017). MELODIC divides a complex 4D data structure into discriminable components, each explaining a varying proportion of the variance in the total dataset. These components can then be hand-labeled as noise or signal, according to the guidelines provided in the explanatory paper by Salimi-Khorshidi et al. (2014), as well as expert knowledge (figure 1). The resulting hand-made classification is then used to train the automatic classification of components in the FIX cleanup.

FIX offers multiple parameter choices for how the subsequent cleanup process is applied and how strict the cleanup is, as well as options to test for successful classification. Firstly, a threshold level is required as input, ranging from 0 (extremely liberal cleanup) to 100 (extremely conservative cleanup), with recommended threshold scores lying between 5 and 20. Secondly, an extra motion confound correction can be applied using a high-pass filter. This can increase data quality if the classifier is not correctly trained on motion noise. Finally, the option is given to apply full-variance cleanup (aggressive) or unique-variance cleanup (non-aggressive). This means that the components classified as noise are either removed from the data completely, or that only the variance uniquely attributable to those noise components is removed, respectively.
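The difference between the two cleanup modes can be sketched as a pair of regressions (a conceptual illustration, not FSL's actual implementation; the function name and array shapes are assumptions):

```python
import numpy as np

def fix_style_cleanup(data, noise, signal, aggressive=True):
    """data: (timepoints, voxels); noise, signal: (timepoints, k) component time courses.

    Conceptual sketch of aggressive vs non-aggressive component regression.
    """
    if aggressive:
        # Aggressive: remove ALL variance the noise components can explain.
        beta = np.linalg.lstsq(noise, data, rcond=None)[0]
        return data - noise @ beta
    # Non-aggressive: fit noise and signal components together, then subtract
    # only the variance uniquely attributed to the noise regressors.
    X = np.hstack([noise, signal])
    beta = np.linalg.lstsq(X, data, rcond=None)[0]
    return data - noise @ beta[: noise.shape[1]]
```

In the aggressive case, any signal variance shared with a noise component is lost along with it, which is exactly the liberal-versus-conservative trade-off discussed above.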

The main criticism of FIX is its high reliance on good training sets, since the effectiveness of the automated process is determined by the comparability between the training set and the set it is applied to. Similarly, it relies heavily on the quality of the initial classification, since the classifier rests entirely on the assumption that the training set was correctly classified.

For the purpose of this study, FIX offers multiple parameters that map onto the aforementioned contentions concerning preprocessing. The training set can be varied to test for the effects of data specificity on the resulting classification scheme, as well as the difference in effect between using affective and neutral data. The size of the training set can also be varied to examine possible variation in the resulting classification. Secondly, the thresholding parameter is a strong theoretical proxy for the liberal-conservative dichotomy. Finally, the aggressiveness parameter can similarly give clues as to what conservative cleanup entails.

MVPA

Multi-voxel pattern analysis (MVPA) is a relatively novel method for analyzing fMRI data. MVPA utilizes all voxel information, or a chosen subset of it, to train a classifier to identify trial information or other discernible traits such as linked behavioral data. This is in opposition to the standard univariate analysis typically done in fMRI research, where each voxel is individually tested as a contributing factor to a certain condition. While many ways of applying the MVPA scheme exist, and many are still being invented, the purpose of this paper is to utilize the variability of an MVPA scheme to derive a simple determinant of the effectiveness of the preprocessing measures.

MVPA typically works by training a classifier on a subset of trials in an fMRI experiment, then testing the resulting classifier on a left-out test set and measuring the accuracy of its classifications. This research, for example, will test whether fluid intelligence can be predicted from fMRI data collected during a working memory task. In this case, a subset of the participants' collected working memory data is used to train weights distinguishing between high and low scores on a Raven's matrices task, a proxy for fluid intelligence, performed by the same participants. The remaining data are then fed into the classifier, where the trained weights categorize individual participants into expected high or low Raven's matrices scores. This process is then repeated, with the train-test division of participants varied in a K-fold structure. This is done to avoid one-off results and to more accurately represent the true accuracy of the MVPA scheme. The reason MVPA analyses are well suited to the study of preprocessing measures is that they result in a single interpretable accuracy score, which can be higher or lower as a result of the preprocessing tools used. It is thus implied that it is the quality of the data fed into the classifier that results in differences in accuracy. This only holds if the classification itself is significantly correct, i.e. information in the fMRI data can predict the type of trial or the behavioral data used to determine accuracy. The FIX-MVPA scheme used in this research will thus produce accuracy scores indicating how successful the signal-to-noise trade-off resulting from the preprocessing was. While the expected directionality of the results is somewhat contested, as mentioned in the previous sections, some general expectations can be drawn from the current literature: data-specific preprocessing increases accuracy scores, use of affective data for preprocessing decreases accuracy scores, high thresholding decreases accuracy, and finally, aggressive cleanup decreases accuracy.

Methods

This research will perform multiple MVPA analyses in which fluid intelligence, as measured by a Raven's matrices task, is predicted from fMRI data collected during a working memory task. The MVPA schemes will vary in the exact fMRI data used, with preprocessing varying in parameter settings. All MVPA analyses will be tested for significance, after which the resulting accuracy scores will be tested against each other to assess the effectiveness of the preprocessing schemes.

MRI specifications

217 subjects were scanned in a Philips Achieva 3T MRI scanner equipped with a 32-channel SENSE head coil. Spatial planning was manually determined from a short initial survey scan. Structural data were collected with a T1-weighted scan using 3D fast field echo (TR = 82 ms, TE = 38 ms, flip angle: 8°, FOV: 240 mm x 188 mm, voxel size: 1.0 x 1.0 x 1.0 mm, 220 slices acquired using single-shot ascending slice order). Multiple T2*-weighted scans were performed in the subsequent scanning procedure, all using single-shot gradient-echo, echo-planar imaging (TR = 2000 ms, TE = 27.63 ms, flip angle: 76.1°, FOV: 240 mm x 240 mm, in-plane resolution: 64 x 64, voxel size: 3.0 x 3.0 x 3.0 mm, 37 slices with ascending acquisition, slice thickness 3 mm, slice gap 0.3 mm). The working memory task used in this experiment comprised 162 volumes per participant.

fMRI task

During the scanning procedure, the participants were presented with 40 trials, of which 32 were active working memory trials and 8 were control trials in which no active working memory was required. Each active trial consisted of a short (1 s) fixation cross, followed by a sample visual display (0.5 s), a delay period (4 s or 6 s), and a probe display (0.5 s). Sample displays consisted of a gray background covered with eight white bars presented in a circle around the center (figure 2). Participants were asked to remember the orientation of all 8 bars. After the delay period, participants were presented with a single probe bar and asked to identify whether this bar was congruent or incongruent with the original sample display. Trials were equally divided between congruent and incongruent probes. Control trials consisted of a short fixation cross (1 s), followed by a change in luminance. Trial order was optimized to maximize BOLD-response estimation.

MRI general preprocessing

All preprocessing steps performed prior to ICA-FIX were done using FSL 5.0 (Jenkinson, Beckmann, Behrens, Woolrich, & Smith, 2012; Salimi-Khorshidi et al., 2014). Functional data were initially motion-corrected using FSL's MCFLIRT. Spatial smoothing was applied using a 5 mm isotropic kernel, as well as a Savitzky-Golay high-pass filter (polynomial degree: 5, window size: 100/TR). Functional data were co-registered to T1 image space using boundary-based registration and transformed non-linearly to MNI152 (2 mm) space using FSL's FNIRT.

15 participants were then randomly selected for a MELODIC independent component analysis. The resulting components were hand-labeled as either noise or signal, and these labels were used to train 3 separate ICA classifiers: one from the working memory data, one from the affective data, and a combined classifier for which both datasets were used. These classifiers were then individually applied to the entire working memory dataset, with two additional parameters varied across folds. The first was the strictness threshold, ranging from extremely conservative (40) to extremely liberal (1) removal thresholds, with a total of 5 levels. Additionally, aggressive versus non-aggressive cleanup was implemented, meaning that the noisy components were either removed entirely, or removed according to their correlation with the classifier model, respectively. This resulted in a 3x5x2 FIX cleanup scheme, with a total of 30 individually cleaned datasets (figure 3). Additionally, a control dataset was produced to which no FIX cleanup was applied.
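The resulting 3 x 5 x 2 design can be enumerated directly (the dataset names below are illustrative labels; the threshold levels follow the five levels used in this study):

```python
from itertools import product

train_sets = ["working_memory", "affective", "combined"]
thresholds = [1, 5, 10, 20, 40]                 # liberal ... conservative
aggressiveness = ["aggressive", "non-aggressive"]

# Every combination of training set, threshold, and aggressiveness:
cleanup_runs = list(product(train_sets, thresholds, aggressiveness))
# 3 x 5 x 2 = 30 cleaned datasets; the no-cleanup control is kept separately.
```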

Individual time series were modeled with a double-gamma HRF in an event-related GLM design using FSL's FEAT. All resulting beta values were converted to t-values, yielding a whole-brain map of t-values per trial.

Behavioral data

Behavioral data measuring intelligence were collected using the Raven Standard Progressive Matrices test (SPM). The SPM test is used as a standard assessment of the ability to adapt to novel cognitive tasks and is highly correlated with general intelligence (Carpenter, Just, & Shell, 1990; Schoelles, Bringsjord, Burrows, Colder, & Hamilton, 2003; Stephenson & Halpern, 2013). The version of the SPM conducted consisted of multiple 3 x 3 graphic matrices containing patterns, with the bottom-right square of each matrix left blank. Participants were asked to select the correct remaining square from 8 options shown beneath the matrix. Patterns consisted of one to five graphical elements (lines, shapes, textures, etc.). 36 matrices were presented, with behavioral scores reflecting the number of correctly identified pattern solutions. Participants had 30 minutes to solve all presented matrices. The resulting SPM scores were calculated and normalized.


MVPA specifications

The data of all 217 subjects were entered into a multi-voxel pattern analysis script coded in Python using the packages SKlearn (Pedregosa et al., 2011) and SKbold (Snoek, 2017).

Behavioral data were binarized, removing the middle twenty percent (40th-60th percentile) of scores. Standard feature selection retaining the top five percent most relevant voxel values was performed prior to classification due to limitations in computational power. After behavioral selection and feature selection, a standard support vector classification was applied to the data in a stratified K-fold structure dividing the data into train and test sets. A majority undersampler (Snoek, 2017) was applied to correct for any differences in other behavioral variables that could result in a lopsided classification. Stratification was applied ten times, with each stratification instance resulting in ten train and test sets, thus yielding one hundred individual classification scores.

Results were entered into a standard confusion matrix, from which the final accuracy of the multi-voxel pattern analysis was calculated.
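The pipeline above can be sketched with scikit-learn alone (a simplified sketch on random placeholder data: the SKbold-specific steps such as the majority undersampler are omitted, and feature selection is refit within each fold to avoid leakage):

```python
import numpy as np
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(217, 5000))              # per-subject t-value patterns (placeholder)
spm = rng.normal(size=217)                    # normalized SPM scores (placeholder)

# Binarize the behavioral scores, dropping the middle 20% (40th-60th percentile):
lo, hi = np.percentile(spm, [40, 60])
keep = (spm < lo) | (spm > hi)
X, y = X[keep], (spm[keep] > hi).astype(int)  # 0 = low scorers, 1 = high scorers

# Top-5% univariate feature selection followed by a linear SVC, evaluated in a
# 10 x 10 stratified K-fold scheme -> one hundred accuracy scores:
clf = make_pipeline(SelectPercentile(f_classif, percentile=5), SVC(kernel="linear"))
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = [clf.fit(X[tr], y[tr]).score(X[te], y[te]) for tr, te in cv.split(X, y)]
mean_acc = float(np.mean(scores))             # near chance level on random data
```

On real t-value maps carrying intelligence-related information, `mean_acc` would be the single interpretable outcome that the preprocessing variants are compared on.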

Analysis tools

Analysis of the results was done using permutation testing (Conroy, Walz, & Sajda, 2013; Stelzer, Chen, & Turner, 2013) to gauge the accuracy and significance of each individual MVPA scheme, after which a bootstrap testing method (Auffermann, Ngan, & Hu, 2002) was used to determine the significance of the differences in mean between the preprocessing parameter choices.

Permutation testing was done individually for each MVPA accuracy score. An array randomizer (Snoek, 2017) was used to shuffle the participant numbers linked to the fMRI data, which should result in a truly random dataset on which the MVPA analysis performs at chance level. This randomization and subsequent MVPA accuracy testing was performed a thousand times to determine the null distribution of the MVPA scheme used. Finally, the p-value is determined as the proportion of the thousand accuracy scores in this null distribution that are higher than the accuracy found for the MVPA preprocessing scheme.
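The permutation scheme just described reduces to a few lines (a hedged sketch; `permutation_p_value` and `score_fn` are illustrative names, with `score_fn` standing in for one full MVPA accuracy computation):

```python
import numpy as np

def permutation_p_value(X, y, score_fn, n_perm=1000, seed=0):
    """Shuffle the labels n_perm times and score each shuffle; the p-value is
    the proportion of null accuracies at least as high as the observed one."""
    rng = np.random.default_rng(seed)
    observed = score_fn(X, y)
    null = np.array([score_fn(X, rng.permutation(y)) for _ in range(n_perm)])
    return observed, float((null >= observed).mean())
```

With 1,000 permutations the resolution of the resulting p-value is 0.001, which suffices for the 0.05 and 0.01 thresholds used in this study.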

To test for significant differences in mean scores between MVPA preprocessing schemes, a bootstrap method of testing was used. Because the assumption of independent samples was not met, no parametric testing could be performed, hence this choice of test.

Bootstrap testing is done in a fashion similar to the array permutation testing described above. A standard t-test is performed between the compared MVPA results, which yields a t-statistic. Subsequently, the individual mean of each MVPA result set is removed and the grand mean of both datasets is added, to ensure that no true mean difference exists. The resulting datasets are then resampled, with 600 random accuracy scores drawn from each dataset and used in a standard t-test. This procedure is performed 20,000 times to determine the distribution of t-test results under the assumption that no mean difference exists. The proportion of the 20,000 samples resulting in a higher t-statistic than the true t-statistic found in the original data is then taken as the p-value.

Results & Conclusion

Permutation test

All MVPA analyses were found to be significant, though with low accuracy scores, all around 55 percent. The results of each individual permutation average were pooled into groups corresponding to the preprocessing elements that were used. This resulted in table 1, containing the average scores of each group with their corresponding standard deviations. Figures 4-6 similarly show the resulting means to better visualize these results.

Bootstrap testing

Bootstrap t-test results are presented in tables 2-4. All results were pooled in the same way as the permutation test results and successively tested against one another. All pooled results were also tested against the no-clean condition. When considering a liberal significance threshold of 0.05, three significant results were found: the low threshold compared to the high threshold, the working memory ICA type compared to the combined ICA type, and finally, the working memory ICA type compared to the anticipation-only ICA type. Under a more conservative threshold of 0.01, only the last of these passes significance testing. While many of the results appear to be inconclusive on the major contentions outlined in the previous sections, some general conclusions can be drawn. The significant difference found between the low and high threshold parameter settings seems to be part of a trend of accuracy increasing with more liberal thresholding (fig. 6). However, it does not naturally follow that thresholding should be kept at the bare minimum, since even the most conservative threshold outperformed the no-clean group. Another clear finding is the underperformance of the affective data compared to the other data types. ICA schemes trained on only affective data were outperformed both by those trained on working memory data and by those trained on a combination of both data types. These results also show a preference for data-specific training data over data-general approaches. While the combined, larger training set was not significantly outperformed by the data-specific ICA type, the latter did result in the highest accuracy and clearly trended in favor of data-specific superiority. This cannot, however, be stated conclusively.


Figure 4: Aggressiveness results


Discussion

This research has aimed to present a novel method for distinguishing superior from inferior preprocessing methods. Initial results using this method give insight into some of the main debates in the current literature. The first of these is the use of affective data for training classifier-based preprocessing tools: when testing on neutral stimuli, affective-trained classifier preprocessing was significantly outperformed by non-affective classifiers. This result could also be attributed to the non-affective training data being data-specific rather than data-general, since the purely data-specific ICA type outperformed all other ICA types. Further research using non-affective, data-general datasets could test which of these hypotheses caused these results, or whether both hold true.

Furthermore, initial results showed clearly superior performance under low-threshold preprocessing, suggesting that liberal approaches generate a better signal-to-noise ratio. Importantly, our best results, and the only results significantly outperforming other results, came from the lowest-threshold ICA, which was set even lower than suggested by the creators of the ICA preprocessing approach (“FIX/UserGuide - FslWiki,” n.d.). It is also important to note that the highest threshold chosen for the current research was still far below the maximum possible threshold. Raising the threshold further would likely have resulted in lower accuracy scores; this would, however, need to be confirmed in further research. No significant results were found for the aggressiveness parameter, so no conclusions can be drawn about this preprocessing choice. Further research could, however, examine the effect of this parameter in combination with other preprocessing choices.
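What the aggressiveness parameter controls can be illustrated with a minimal numpy sketch, assuming the standard ICA-FIX behavior: aggressive cleanup removes all variance the noise-component timecourses can explain, while non-aggressive ("soft") cleanup fits noise and signal components together and removes only the noise components' unique contribution. All matrices here are synthetic placeholders.

```python
# Sketch of aggressive vs. non-aggressive removal of ICA noise components.
import numpy as np

def aggressive_clean(Y, M_noise):
    """Remove everything the noise timecourses can explain."""
    beta, *_ = np.linalg.lstsq(M_noise, Y, rcond=None)
    return Y - M_noise @ beta

def soft_clean(Y, M_noise, M_signal):
    """Remove only the unique contribution of the noise timecourses."""
    M = np.hstack([M_noise, M_signal])          # fit all components jointly
    beta, *_ = np.linalg.lstsq(M, Y, rcond=None)
    return Y - M_noise @ beta[: M_noise.shape[1]]  # subtract noise betas only

rng = np.random.default_rng(1)
T, V = 120, 50                       # timepoints x voxels
M_signal = rng.normal(size=(T, 2))   # "good" component timecourses
M_noise = 0.7 * M_signal[:, :1] + rng.normal(size=(T, 3))  # correlated noise
Y = M_signal @ rng.normal(size=(2, V)) + M_noise @ rng.normal(size=(3, V))

resid_soft = soft_clean(Y, M_noise, M_signal)
resid_aggr = aggressive_clean(Y, M_noise)
print(np.linalg.norm(resid_soft), np.linalg.norm(resid_aggr))
```

Because the aggressive fit is the least-squares minimizer over the noise subspace, it always removes at least as much variance as the soft variant; when noise and signal timecourses are correlated, as simulated above, the difference is shared signal variance that aggressive cleanup discards.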

One as yet unexplained oddity in the data is the similarity in resulting MVPA score between the data that received only basic preprocessing and the data set preprocessed with the highest threshold in this study. Considering the proportion of components removed during ICA cleaning (Graphs 1 & 2), the number of removed components does appear to increase in proportion to the height of the threshold. It is therefore likely that preprocessing effectiveness follows an inverted-U model, rising quickly towards and falling quickly away from a central optimum. Further research would have to investigate the nature of the components removed at each level to explain why this effect occurs.

Graph 1: Proportion of removed components per threshold level (T1: 0.0097; T5: 0.0175; T10: 0.0222; T20: 0.0285; T40: 0.0401).

While this research alone should not be taken as definitive proof of the superiority of any particular preprocessing choice, it does highlight the usefulness of testing preprocessing measures with MVPA classifier-based approaches to determine superior methods. This method could be expanded to include a multitude of training sets to compare, giving insight into data-specific superior methods and resolving previously found contending claims of superior preprocessing choices. Ideally, this type of preprocessing testing could also be expanded to include parameters such as data quality (Power, 2016), comparison with true physiology-based approaches, and further analysis methods that could clarify the trends in the results that currently remain inconclusive. These additions were left out of the current research due to time restraints as well as a lack of resources, computational and otherwise.
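The inverted-U hypothesis sketched above could be tested directly by fitting a quadratic to decoding accuracy as a function of threshold and inspecting the curvature. The accuracy values below are made-up placeholders, not the study's results.

```python
# Sketch: quadratic fit of accuracy vs. threshold to probe an inverted-U shape.
import numpy as np

thresholds = np.array([1, 5, 10, 20, 40], dtype=float)
accuracy = np.array([0.62, 0.66, 0.65, 0.61, 0.55])  # hypothetical values

# Fit accuracy ~ a*t^2 + b*t + c; a < 0 indicates concavity (an inverted U),
# and the fitted optimum sits at t* = -b / (2a).
a, b, c = np.polyfit(thresholds, accuracy, deg=2)
t_opt = -b / (2 * a)
print(a < 0, t_opt)
```

With real accuracy scores at a denser grid of thresholds, the sign of the curvature term and the location of the fitted optimum would give a concrete test of whether effectiveness peaks at an intermediate threshold.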

It is imperative that methods such as MVPA classifier-based preprocessing testing are developed and used to establish common standards within the field of neuroscience. Contending ideas causing disputes about results, and retractions of findings when new preprocessing guidelines are adopted, cost the neuroscientific community time, money, and manpower. With the resources and methods currently commonplace in labs across the globe, such a common framework could be established and agreed upon. Reaching this common ground is ultimately a matter of time, since optimal preprocessing choices will eventually produce more reproducible results; it is, however, a matter of willpower to shorten this process by using methods such as the one described in this paper, propelling forward the validity, and thereby the utility, of Cognitive Neuroscience.

Graph 2: Proportion of removed components per ICA type (Both: 0.0388; WM: 0.0413; Anticipation: 0.0379).


References

Auffermann, W. F., Ngan, S. C., & Hu, X. (2002). Cluster significance testing using the bootstrap. NeuroImage, 17(2), 583–591. https://doi.org/10.1006/nimg.2002.1223

Boubela, R. N., Kalcher, K., Huf, W., Seidel, E.-M., Derntl, B., Pezawas, L., … Moser, E. (2015). fMRI measurements of amygdala activation are confounded by stimulus correlated signal fluctuation in nearby veins draining distant brain regions. Scientific Reports, 5, 10499. https://doi.org/10.1038/srep10499

Caballero-Gaudes, C., & Reynolds, R. C. (2016). Methods for cleaning the BOLD fMRI signal. NeuroImage. https://doi.org/10.1016/j.neuroimage.2016.12.018

Carpenter, P. A., Just, M. A., & Shell, P. (1990). What one intelligence test measures: A theoretical account of the processing in the Raven Progressive Matrices Test. Psychological Review, 97(3), 404–431. https://doi.org/10.1037/0033-295X.97.3.404

Conroy, B. R., Walz, J. M., & Sajda, P. (2013). Fast bootstrapping and permutation testing for assessing reproducibility and interpretability of multivariate fMRI decoding models. PLoS ONE, 8(11). https://doi.org/10.1371/journal.pone.0079271

Feldman Barrett, L., Bhaskar Satpute, A., & Feldman, L. (2013). Large-scale brain networks in affective and social neuroscience: towards an integrative functional architecture of the brain. Current Opinion in Neurobiology, 23, 361–372. https://doi.org/10.1016/j.conb.2012.12.012

FIX/UserGuide - FslWiki. (n.d.). Retrieved August 4, 2017, from https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FIX/UserGuide

Glover, G. H., Li, T. Q., & Ress, D. (2000). Image-based method for retrospective correction of physiological motion effects in fMRI: RETROICOR. Magnetic Resonance in Medicine, 44(1), 162–167.

Griffanti, L., Dipasquale, O., Laganà, M. M., & Nemni, R. (2015). Effective artifact removal in resting state fMRI data improves detection of DMN functional connectivity alteration in Alzheimer's disease. Frontiers in Human Neuroscience, 9, 1–11. https://doi.org/10.3389/fnhum.2015.00449

Griffanti, L., Douaud, G., Bijsterbosch, J., Evangelisti, S., Alfaro-Almagro, F., Glasser, M. F., … Smith, S. M. (2017). Hand classification of fMRI ICA noise components. NeuroImage, 154, 188–205. https://doi.org/10.1016/j.neuroimage.2016.12.036

Griffanti, L., Salimi-Khorshidi, G., Beckmann, C. F., Auerbach, E. J., Douaud, G., Sexton, C. E., … Smith, S. M. (2014). ICA-based artefact removal and accelerated fMRI acquisition for improved resting state network imaging. NeuroImage, 95, 232–247. https://doi.org/10.1016/j.neuroimage.2014.03.034

Haynes, J. D. (2015). A primer on pattern-based approaches to fMRI: Principles, pitfalls, and perspectives. Neuron, 87(2), 257–270. https://doi.org/10.1016/j.neuron.2015.05.025

Iacovella, V., & Hasson, U. (2011). The relationship between BOLD signal and autonomic nervous system functions: Implications for processing of "physiological noise." Magnetic Resonance Imaging, 29(10), 1338–1345. https://doi.org/10.1016/j.mri.2011.03.006

Jenkinson, M., Beckmann, C. F., Behrens, T. E. J., Woolrich, M. W., & Smith, S. M. (2012). FSL. NeuroImage, 62, 782–790. https://doi.org/10.1016/j.neuroimage.2011.09.015

Junghöfer, M., Schupp, H. T., Stark, R., & Vaitl, D. (2005). Neuroimaging of emotion: Empirical effects of proportional global signal scaling in fMRI data analysis. NeuroImage, 25(2), 520–526. https://doi.org/10.1016/j.neuroimage.2004.12.011

Kay, K. N., Rokem, A., Winawer, J., Dougherty, R. F., & Wandell, B. A. (2013). GLMdenoise: A fast, automated technique for denoising task-based fMRI data. Frontiers in Neuroscience, 7, 1–15. https://doi.org/10.3389/fnins.2013.00247

Liu, T. T. (2016). Noise contributions to the fMRI signal: An overview. NeuroImage, 143, 141–151. https://doi.org/10.1016/j.neuroimage.2016.09.008

Ogawa, S., Lee, T. M., Kay, A. R., & Tank, D. W. (1990). Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proceedings of the National Academy of Sciences of the United States of America, 87(24), 9868–72. https://doi.org/10.1073/pnas.87.24.9868

Power, J. D. (2016). A simple but useful way to assess fMRI scan qualities. NeuroImage, 1–9. https://doi.org/10.1016/j.neuroimage.2016.08.009

Pruim, R. H. R., Mennes, M., van Rooij, D., Llera, A., Buitelaar, J. K., & Beckmann, C. F. (2015). ICA-AROMA: A robust ICA-based strategy for removing motion artifacts from fMRI data. NeuroImage, 112, 267–277. https://doi.org/10.1016/j.neuroimage.2015.02.064

Salimi-Khorshidi, G., Douaud, G., Beckmann, C. F., Glasser, M. F., Griffanti, L., & Smith, S. M. (2014). Automatic denoising of functional MRI data: Combining independent component analysis and hierarchical fusion of classifiers. NeuroImage, 90, 449–468. https://doi.org/10.1016/j.neuroimage.2013.11.046

Schoelles, M. J., Bringsjord, S., Burrows, K., Colder, B., & Hamilton, B. A. (2003). Sage: five powerful ideas for studying and. Human Factors, 1019–1023.

Shaw, M. E., Strother, S. C., Gavrilescu, M., Podzebenko, K., Waites, A., Watson, J., … Egan, G. (2003). Evaluating subject specific preprocessing choices in multisubject fMRI data sets using data-driven performance metrics. NeuroImage, 19(3), 988–1001. https://doi.org/10.1016/S1053-8119(03)00116-2

Shirer, W. R., Jiang, H., Price, C. M., Ng, B., & Greicius, M. D. (2015). Optimization of rs-fMRI pre-processing for enhanced signal-noise separation, test-retest reliability, and group discrimination. NeuroImage, 117, 67–79. https://doi.org/10.1016/j.neuroimage.2015.05.015

Snoek, L. (2017). skbold Documentation Release 0.1. Retrieved August 4, 2017, from https://media.readthedocs.org/pdf/skbold/latest/skbold.pdf

Stelzer, J., Chen, Y., & Turner, R. (2013). Statistical inference and multiple testing correction in classification-based multi-voxel pattern analysis (MVPA): Random permutations and cluster size control. NeuroImage, 65, 69–82. https://doi.org/10.1016/j.neuroimage.2012.09.063


Stephenson, C. L., & Halpern, D. F. (2013). Improved matrix reasoning is limited to training on tasks with a visuospatial component. Intelligence, 41(5), 341–357.
