
Literature Thesis – 12 EC

October 2020 – December 2020

Clinical EEG and MEG Research

with Machine Learning Applications:

A Brief Systematic Review

Ana Radanovic

1288720 – anaradanovica@gmail.com

Supervisor: Dr. Denis A Engemann

French National Institute of Computer Science (INRIA)

Co-Assessor: Dr. Jelle Zuidema

University of Amsterdam – Institute for Logic, Language and Computation


Abstract

Electroencephalography (EEG) and Magnetoencephalography (MEG) are important neuroimaging tools that have undoubtedly shaped research both inside and outside of clinical settings. While they have contributed substantially to our understanding of brain disorders such as schizophrenia and Alzheimer’s disease, they have not been firmly established in the diagnostic and prognostic process for most brain disorders. Machine learning techniques used in combination with clinical E/MEG research have started to show very promising results in areas such as brain-computer interfaces for rehabilitation and seizure prediction. For diagnosis or the prediction of clinical outcomes (subject-level analysis), however, the use of machine learning to build predictive models is still in its early stages. In this systematic review, we survey the state of the field regarding machine learning applications in clinical E/MEG research. We begin with a broad overview of the tool (E/MEG) used and the pathology studied. We then narrow the analyzed papers to the last few years in order to conduct a more in-depth analysis, reviewing the aim and performance of the models, the cross-validation scheme used, and the details of data collection (sample size, number of channels recorded from). These analyses reveal trends known from the present literature, such as decreasing performance with increasing sample size, but also signs of overly optimistic reported model performance. We review methodological pitfalls apparent from the analysis and discuss the need for well-established practices. Finally, we make suggestions for good open science practices that advance the development of these models into generalizable and accessible tools for clinical E/MEG research.


Introduction

The use of Electroencephalography (EEG) and Magnetoencephalography (MEG) in clinical neuroscience research is vast and has yielded impactful insights into a wide variety of brain disorders. The established use of E/MEG in clinical settings, however, remains narrow due to some inherent challenges of E/MEG research. Machine learning (ML) provides a promising path for developing tools that combine E/MEG with ML to aid in the diagnosis or prognosis of brain disorders.

The introduction begins with an overview of E/MEG, its current applications in clinical settings, and its challenges, and then introduces ML and how it could be useful in the field of clinical neuroscience.

Brief Introduction to the Similarities and Differences of MEG and EEG

EEG and MEG are two powerful methods of non-invasive brain imaging. The two methods share similar fundamental measurement principles in that both record, with millisecond precision, signals at the surface of the head generated by neuronal currents. Both EEG and MEG are used inside and outside of clinical research. EEG measures electric potentials generated by extracellular currents from both tangentially and radially oriented pyramidal neurons firing together near the surface of the brain. MEG, on the other hand, measures fields generated by intracellular currents from only tangentially oriented pyramidal neurons (Singh, 2014).

The subtle differences between the two methods give each individual strengths and weaknesses, granting researchers the freedom to choose the method that suits their question. For instance, EEG, with its lower spatial specificity, has a lead field that is more sensitive to scalp and skull conductivity (Singh, 2014). EEG also has the strength of mobility, both in terms of the instrumentation itself, which can move between labs and can be developed as a portable device, and in terms of the participant being recorded (although participant movement is strongly discouraged). In contrast, MEG, with its higher spatial specificity, is more costly and requires more complex instrumentation that is not accessible to all labs (Hämäläinen et al., 1993). EEG and MEG can also be recorded simultaneously, potentially increasing the value of the experimental results (Hämäläinen et al., 1993). Neither technique is considered the “better option” in terms of measurement accuracy, so the use and validity of research conducted with E/MEG lies in the hands of the experimenter.

Clinical Applications of E/MEG and its Challenges

Clinical research using E/MEG measurements has a long and diverse history. E/MEG have been used to investigate conditions such as attention deficit hyperactivity disorder (ADHD; Lenartowicz & Loo, 2014), schizophrenia (Jalili & Knyazeva, 2011), epilepsy (Acharya et al., 2013; Hämäläinen et al., 1993; Singh, 2014), Parkinson’s disease (Geraedts et al., 2018), and disorders of consciousness (DOC; Harrison & Connolly, 2013). The simultaneous use of EEG and MEG is most common in epilepsy and seizure diagnostics (Singh, 2014). E/MEG are valuable tools with a wide range of experimental targets.

Conducting clinical E/MEG research is not without its challenges. First, the skull is not perfectly spherical, and variations in scalp and skull thickness alter electrical conductivity. Many signal sources in the brain can contribute to the electric potential measured at a given electrode (Michel & Brunet, 2019). Importantly, this leads to an issue in source localization known as the inverse problem. The issue is amplified when field spread varies among participants, which makes regression across participants more difficult than within the same participant. Second, E/MEG suffer from a low signal-to-noise ratio (Acharya et al., 2013; Jas et al., 2017) that requires strategic “bad data” removal for proper analysis. Third, some clinical populations cannot focus on the task or are unable to keep still, such as patients with DOC (Cruse et al., 2011; Goldfine et al., 2013). Lastly, a major challenge in clinical E/MEG research is that the interpretation of results for diagnostics is subject to moderate-to-high inter-rater variability (Azuma et al., 2003; Halford et al., 2015; Struve et al., 1975; van Donselaar et al., 1992; Woody, 1968), and in many cases a delayed or inaccurate diagnosis from EEG can be harmful for patients (Friedman et al., 2009; Oddo et al., 2009). An accurate initial diagnosis is also a prerequisite for an accurate prognosis. Overall, these challenges limit the clinical application of E/MEG, where interpretability and reproducibility are essential.

Introducing Machine Learning as a Tool for Clinical E/MEG Research

A promising way to alleviate these challenges is to introduce ML as a tool for assisting clinicians in diagnosis or prognosis. Put simply, ML can be defined as function approximation: finding a function that best maps inputs to outputs (Hastie et al., 2009). These functions can serve a variety of goals, including predicting outcomes from new data or classifying data. The structure of ML algorithms is equally diverse, ranging from binary linear or non-linear classifiers such as the support-vector machine (SVM; Cortes & Vapnik, 1995) to more complex multi-layered deep learning architectures such as convolutional neural networks (CNN; Lecun et al., 1998). The boundary between traditional statistical techniques and ML can be blurry, but a distinction can be made: traditional statistical methods aim at making inferences about data, whereas ML is geared towards making generalizable predictions rather than inferring parameters (Bzdok et al., 2018, 2020). ML can also learn to represent a signal by means of pattern recognition techniques. As such, when analyzing E/MEG data proves challenging, ML can provide a “summary” of the signal, making the data more tractable. ML also relies less on a priori hypotheses regarding brain responses, as feature search can be automated within the algorithm. Once an algorithm has learned a pattern, it will make the same prediction for the same subject every time, removing the issue of inter-rater variability. In this way, we believe ML can be a very valuable tool for clinical neuroscience research.
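To make the function-approximation view concrete, the sketch below, in R (the language used for the analyses in this review), fits a linear SVM to simulated per-subject band-power features and evaluates it on held-out subjects. The data, the feature names, and the choice of the e1071 package are illustrative assumptions, not the setup of any reviewed paper.

```r
# Minimal sketch: a linear SVM mapping hypothetical EEG band-power
# features to a diagnostic label. All data here are simulated.
library(e1071)  # install.packages("e1071") if needed

set.seed(1)
n <- 60
X <- data.frame(alpha_power = rnorm(n),  # made-up per-subject features
                theta_power = rnorm(n))
y <- factor(rep(c("patient", "control"), each = n / 2))
X$alpha_power <- X$alpha_power + (y == "patient")  # inject a weak signal

train <- sample(n, 40)                   # hold out subjects for testing
fit <- svm(x = X[train, ], y = y[train], kernel = "linear")
pred <- predict(fit, X[-train, ])
mean(pred == y[-train])                  # held-out accuracy
```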

The use of ML methods in clinical neuroscience has grown rapidly in popularity over recent years (Roy et al., 2019; Segato et al., 2020; Woo et al., 2017). ML has substantially impacted clinical E/MEG research with applications in brain-computer interfaces (BCI; Selim et al., 2008). BCIs use machine learning algorithms that translate brain patterns into messages or commands directed at the user’s environment (Lotte et al., 2018). BCI systems are used clinically for a variety of aims, including rehabilitation of stroke patients (Ang & Guan, 2013), wheelchair control for paralyzed patients (Pfurtscheller et al., 2008), and gait and movement preparation analysis for Parkinson’s patients (Velu & de Sa, 2013). Thus far, much of the ML application in E/MEG research, including BCI, has focused on event-level analyses, such as particular brain responses to novel auditory stimuli or brain dynamics of movement intentions. This is logical, as developing well-performing algorithms for such classification problems is much easier than generating subject-level predictions such as the diagnosis of a schizophrenia patient or the likelihood of treatment response (Sabbagh et al., 2020). These subject-level predictions are very important for clinical care and thus an area worthy of focused development. It is unclear, however, how ML techniques are currently being used to tackle these analyses. In order to develop predictive models for patient care, methodologies and developments should be shared, reviewed, and scrutinized by the community for stronger availability and generalizability.


Objective of This Review

In this systematic review, we first present an overview of the state of the field covering ML applications in clinical E/MEG research. We then give an overview of the methodological underpinnings of recent years regarding data collection, validation methods, and performance metrics. Our ultimate goal is to address the question: how is ML used in combination with E/MEG to model clinical endpoints on a subject level? Abbreviations used in this review can be found in Box 1.

Methods

The target papers of the literature search were research articles written in English that apply ML techniques to E/MEG research on a clinical population. The search terms were established by first running a preliminary search on PubMed and Web of Science based on the search terms used by Woo et al. (2017). Using the analysis tool litsearchr (Grames et al., 2019), the preliminary search results were analyzed for popular key terms used in titles, abstracts, and keywords. Many of the terms found in this analysis were not aligned with the current research goal and were therefore added to the list of exclusion terms. Our final search terms included a broader range of conditions and pathologies than those targeted by Woo et al. (2017), as well as supervised machine learning techniques, in order to retrieve a broad range of ML applications in clinical E/MEG. The search was conducted on November 2nd, 2020, and the results include all papers published up until that date. Search terms used for each source (PubMed and Web of Science) are detailed in Table 1 (for the full search query per database, see Supplementary Table 1).

Exclusion terms were decided with multiple considerations (an illustration of how the grouped terms are assembled into a query follows below). First, we were not interested in retrieving brain-computer interface (BCI) or related studies: although BCI is widely implemented in clinical populations, its clinical aim is different, and its processing and validation pipelines are unique to that field and do not align with the current research goal. Similarly, epilepsy and seizures were excluded because most ML research on these conditions consists of event-level analyses using unique processing and validation paradigms aimed at predicting seizures. Ultimately, while both clinical BCI and epilepsy research are important and have positively influenced clinical neuroscience research in general, the current paper is geared towards predictive models for diagnosis, prognosis, and other subject-level analyses. Animal models were also excluded due to our interest in human populations. The remaining exclusion terms were those derived from the litsearchr key term analysis. The final terms were used to screen papers by their titles, abstracts, and author keywords.
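As an illustration of the query construction, the base-R sketch below assembles grouped terms into a Web of Science-style Boolean string; the term vectors are abbreviated and purely illustrative, and the full queries actually used are given verbatim in Supplementary Table 1.

```r
# Minimal sketch: turning grouped terms (as in Table 1) into a Boolean query.
# The term vectors are abbreviated and purely illustrative.
brain <- c("Brain", "neural", "neuro*")
ml    <- c('"machine learning"', '"support vector"', "SVM", "classif*")
tool  <- c("EEG", "MEG", "electroencephalograph*", "magnetoencephalograph*")
excl  <- c("BCI", "epilep*", "seizure*")

or_group <- function(terms) paste0("(", paste(terms, collapse = " OR "), ")")
query <- paste(or_group(brain), "AND", or_group(ml), "AND", or_group(tool),
               "NOT", or_group(excl))
cat(query)
```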

Box 1. Abbreviations used in this review.

ADHD – Attention Deficit Hyperactivity Disorder
AUC – Area Under the Curve
BCI – Brain-Computer Interface
CNN – Convolutional Neural Network
DOC – Disorders of Consciousness
EEG – Electroencephalography
MEG – Magnetoencephalography
ML – Machine Learning
PTSD – Post-Traumatic Stress Disorder
SVM – Support Vector Machine


Table 1. Terms used to construct Boolean searches for literature retrieval. The asterisks indicate any characters that can complete the word.

Brain Sciences – Brain, Neuro*, Neural

Machine Learning Terms – machine learning, statistical learning, multivariate pattern, pattern classif*, pattern recog*, support vector, SVM, classif*, random forest, LASSO, ridge, trees, tree-based, linear discriminant analysis, LDA, deep learning, representation learning, neural network*, convolutional neural network*, ConvNet, CNN, recurrent neural network*, RNN, long short-term memory, LSTM

Data Collection Tool – EEG, MEG, electroencephalograph*, magnetoencephalograph*

Pathology or Condition – Parkinson*, PTSD, Post-traumatic, ADHD, attention-deficit, anxiety, schizo*, Psychosis, Bipolar, Depress*, Autis*, Pain, Alzheimer*, brain age, DOC, disorders of consciousness, coma, concuss*, stroke, anoxia, dementia, multiple sclerosis, insomnia, narcolepsy, chronic fatigue, obsessive compulsive, OCD, vegetative state, minimally conscious state, traumatic brain injury, unresponsive wakefulness

Exclusions – BCI, brain-computer interface, epilep*, seizure*, emotion regulation, driver fatigue, HCI, human-computer interaction, position paper, mouse, mice, rat*, piglet, BMI, brain-machine interface

Table 2. Inclusion and exclusion criteria.

Inclusion:
• Peer-reviewed research articles
• English
• Machine learning application to a clinical population using E/MEG aimed at clinical outcome analysis
• Supervised learning
• Clear description of dataset accumulation

Exclusion:
• Reviews, conference and symposium papers, opinion articles, protocol guidelines
• Animal models
• Neurofeedback papers
• Unsupervised learning
• No sample size or no clear dataset origin provided

Papers were first screened for duplicates among the search results from the two databases using revtools (Westgate, 2019); records with duplicate titles, abstracts, DOIs, or authors were rejected. Eligibility was then assessed by screening titles and abstracts against the predetermined exclusion criteria (Table 2). Only papers using supervised learning techniques were targeted, because classification and regression (prediction) are both supervised learning problems and modeling clinical endpoints lends itself to supervised learning.
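A minimal sketch of this deduplication step with revtools (Westgate, 2019) is shown below. The export file names are hypothetical, and the matching options should be checked against the revtools documentation for the installed version.

```r
# Minimal sketch: merging database exports and removing duplicates with revtools.
library(revtools)

refs <- read_bibliography(c("pubmed_export.ris", "wos_export.ris"))  # hypothetical files
matches <- find_duplicates(refs, match_variable = "title")  # flag likely duplicates
unique_refs <- extract_unique_references(refs, matches)
# Titles and abstracts can then be screened interactively, e.g.:
# screened <- screen_abstracts(unique_refs)
```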


From the resulting database of articles, the data considered for each study were as follows:

a) Year of publication
b) Data collection method used (E/MEG)
c) Pathology or condition studied

After the initial analysis, full-text in-depth analysis was reserved for papers published between 2017 and 2020 that included the terms in Table 3; this restriction was made due to time constraints. These terms were used to narrow down papers that studied the same pathologies as those analyzed by Woo et al. (2017), except for the inclusion of DOC out of personal interest.

Table 3. Selected terms for screening papers used for full-text analysis. Terms were screened in titles and abstracts.

Machine Learning Tool – machine learning, statistical learning, multivariate pattern, pattern classif*, pattern recog*, support vector, SVM, random forest, deep learning, LASSO, representation learning, convolutional neural network*, ConvNet, CNN, recurrent neural network*, RNN, long short-term memory, LSTM

Pathology or Condition – Parkinson*, post-traumatic, PTSD, attention deficit, ADHD, anxiety, schizophrenia, psychosis, bipolar, depress*, autis*, pain, Alzheimer*, disorders of consciousness, DOC, minimally conscious state, vegetative state, unresponsive wakefulness

The data considered for the full-text analysis were as follows:

a) Sample size
b) Highest accuracy reached
c) Other model performance measurements (area under the curve (AUC), f-score)
d) Number of E/MEG channels
e) Result verification scheme / cross-validation scheme
f) Research aim

The highest accuracy reached is reported as the highest true accuracy rate in the testing phase between the target pathology and healthy controls, unless the sample did not include healthy controls. The sample size reported is the one affiliated with the highest accuracy, as some studies included multiple datasets. The AUC and f-score are two other common measures of performance of a machine learning algorithm, taking into account sensitivity and specificity, and precision and recall, respectively; an illustrative calculation of all three measures is sketched below. When reported by the authors, they are recorded as the values affiliated with the highest accuracy reached, for consistency. The research aim was determined under six categories: a) Diagnosis, which refers to classification of patients versus healthy controls or of one psychological disorder from another (e.g., schizophrenia vs. bipolar disorder); b) Prognosis, which refers to prediction of disease progression; c) Risk group, which refers to predicting whether an individual is at risk of developing a disorder; d) Subtype, which refers to classifying subtypes of a neurological disorder; e) Symptom, which refers to predicting progression of symptom scores; and f) Treatment response, which refers to predicting an individual’s likelihood of responding to treatment.
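As referenced above, the base-R sketch below computes the three performance measures on a toy set of classifier outputs: accuracy from hard labels, the F-score from confusion counts, and the AUC via its rank-based (Mann-Whitney) formulation. The data are invented for illustration.

```r
# Minimal sketch: accuracy, F-score, and AUC on toy classifier outputs.
perf <- function(truth, pred_label, score) {
  acc <- mean(pred_label == truth)
  tp  <- sum(pred_label == 1 & truth == 1)
  fp  <- sum(pred_label == 1 & truth == 0)
  fn  <- sum(pred_label == 0 & truth == 1)
  f1  <- 2 * tp / (2 * tp + fp + fn)            # harmonic mean of precision/recall
  r   <- rank(score)                            # AUC from ranks of the scores
  n1  <- sum(truth == 1); n0 <- sum(truth == 0)
  auc <- (sum(r[truth == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  c(accuracy = acc, f_score = f1, auc = auc)
}

truth <- c(1, 1, 1, 0, 0, 0)                    # toy labels
score <- c(0.9, 0.8, 0.4, 0.6, 0.3, 0.2)        # toy continuous outputs
perf(truth, as.numeric(score > 0.5), score)
```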

Linear regression analyses were conducted to relate channel number and sample size to the highest recorded performance of the models (a minimal sketch follows below). These factors were analyzed because they are attributes of model development that can broadly indicate where the field is headed in terms of developing well-performing, generalizable, and accessible models for clinical use.
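A minimal sketch of these regressions in base R is shown below; the data frame `papers` and its column names are hypothetical stand-ins for the extracted study-level data.

```r
# Minimal sketch: regressing best reported performance on sample size and
# channel count. 'papers' is a hypothetical data frame of the extracted data.
fit_auc <- lm(auc ~ log10(sample_size), data = papers)  # as plotted in Figure 5C
summary(fit_auc)   # slope, p-value, adjusted R-squared

fit_acc <- lm(accuracy ~ n_channels, data = papers)
summary(fit_acc)
```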


Results

Literature Search Results

The Boolean search yielded 1048 results on PubMed and 742 results on Web of Science. After duplicate removal, 1227 unique papers remained, and screening against the inclusion and exclusion criteria left a dataset of 518 papers. Narrowing these down to papers published in 2017-2020 that include the terms listed in Table 3 yielded 126 papers for the full-text review. The detailed workflow can be seen in Figure 1. Data manipulation, plotting, and statistical analyses were conducted in R (https://www.R-project.org/), using the packages ggplot2 (Wickham, 2016), dplyr (https://CRAN.R-project.org/package=dplyr), and revtools (Westgate, 2019).

Figure 1. Workflow diagram for screening database results. The search results (PubMed n = 1048, Web of Science n = 742) were first screened for duplicates (n = 1227 remaining), then titles and abstracts were screened using the previously decided exclusion criteria to establish a set of 518 papers for initial analysis. This set was further narrowed to the years 2017-2020 using selected key terms, yielding the 126 papers used for full-text in-depth analysis.


Initial Analysis - Broad Overview

As seen in Figure 2, the trend of increasing popularity of ML methods in combination with clinical E/MEG continues, with the large bulk of papers published within the last 10 years.

Figure 2. Number of papers published per year in our dataset of 518 papers, including all articles published up to November 2nd, 2020.


In terms of data acquisition, there has been a large increase in the use of EEG within the last 10 years, but the same trend does not hold for MEG (Figure 3). The combined use of MEG and EEG is rare: our database contained only 2 papers using both methods.

Figure 3. The number of papers published each year using each tool (EEG, MEG, or both) in our database of 518 papers.

Certain conditions dominate the field’s interest, however (Figure 4A). The four most commonly studied conditions are Alzheimer’s (122), depression (75), schizophrenia (68), and dementia (62). Research interest in Alzheimer’s, depression, and schizophrenia appears to have been steadily increasing over the past 10 years (Figure 4B), possibly following the general trend of increased publications. ADHD research has grown considerably over the last five years despite not having the highest overall count, suggesting that ML applications to ADHD are a more recent development.


Figure 4. Conditions studied, by overall count and by yearly publications, in our database of 518 papers. “ADHD” refers to attention deficit hyperactivity disorder and “PTSD” to post-traumatic stress disorder; “Sleep” refers to sleep disorders and “Pain” to any chronic pain disorder. A. The number of times each condition was studied in our dataset, counting each condition separately when a paper studied several (e.g., a paper studying anxiety and depression adds one count to depression and one to anxiety). B. The number of publications per year for each condition separately, counted with the same procedure (e.g., a paper published in 2015 studying anxiety and depression adds a 2015 count to both conditions).


Full-Text Analysis Results

The most common research goal was diagnosis, followed by subtype classification (Figure 5B). In terms of model validation, the most common cross-validation scheme was k-fold, followed by leave-one-out (Figure 5A). Data regarding classification accuracy or f-score as a function of sample size or number of channels were inconclusive (Figure 5C). However, a simple linear regression showed that sample size was significantly negatively correlated with AUC scores (β = -3.809, p = 0.01, adjusted R² = 0.1165; Figure 5C).

The in-depth analysis reveals that in the last few years the field has been dominated by diagnosis-aimed model development, including biomarker development and diagnostic tools. These models are most commonly validated with k-fold techniques. An increase in sample size was correlated with a decrease in the AUC performance measure only. Sample size does not show a strong effect on accuracy, and with only a few data points the effect on f-score cannot be determined. The number of channels recorded from, although ranging from 1 to more than 300, does not appear to affect model performance. In the following section these factors are discussed together with their impact on the field.


Figure 5. Results from the in-depth analysis of research goal, cross-validation scheme, sample size, model performance, and number of channels recorded from. A. The number of papers using each cross-validation scheme in our dataset of 126 papers; all k-fold strategies were counted in the same overarching group “k-fold”. B. Research goals for the papers in our dataset of 126 papers (Diagnosis 80.16%, Subtype 7.14%, Treatment response 4.76%, Prognosis 3.97%, Risk group 3.17%, Symptom 0.79%). C. Performance metric scores according to sample size and number of channels. Sample size is shown on a log10 scale; the number of channels is represented by both the color and the size of the points, with larger and lighter points indicating higher channel numbers. The top graph shows the highest accuracy reached between the patient population and healthy controls, or between two pathological conditions if healthy controls were not included in the sample; the sample size is that associated with the highest accuracy. The middle and bottom graphs show the same comparison for AUC and F-score values, respectively; these are the values associated with the highest accuracy reached, or the highest AUC/F-score between patients and healthy controls if accuracy was not reported. Regression analysis revealed that sample size was correlated with AUC scores (β = -3.809, p = 0.01, adjusted R² = 0.1165).


Discussion

EEG and MEG are two valuable neuroimaging tools for clinical research and, when used in combination with ML, have great potential for driving the development of useful diagnostic and prognostic tools for a variety of brain disorders. In this review, we analyzed research articles that use ML techniques in combination with E/MEG to develop models for classification and prediction within a clinical population. Interest in this field continues to grow, as evident in the number of publications increasing every year (Figure 2). This growth is consistent with trends found in previous reviews (Roy et al., 2019; Segato et al., 2020; Woo et al., 2017). It is also not isolated to this field: the number of publications increases every year across most scientific disciplines (Bornmann & Mutz, 2015). The increasing number of studies on clinical applications of ML with E/MEG does, however, impose a necessity for systematic reviews in order to develop a clear understanding of methodologies (both their promises and their pitfalls), routes for further investigation, and the establishment of common practices. This discussion begins by reviewing the key findings of the broad initial analysis, then discusses and reflects on the key findings of the full-text in-depth analysis, and finishes with suggestions and recommendations for establishing good research practices to advance the field of ML applications in clinical E/MEG research.

Targeted Conditions and Pathologies

We found that the most commonly studied disorders in our set were Alzheimer’s, depression, schizophrenia, and dementia (Figure 4A). It is not surprising to find Alzheimer’s at the top, as neurological disorders such as Alzheimer’s account for some of the largest burden of deaths from neurological deficits globally (Feigin et al., 2020), with brain disorders increasing the economic burden every year in general (Olesen et al., 2012). For conditions such as ADHD, which has high rates of misdiagnosis (Schwandt & Wuppermann, 2016), ML could be beneficial not only in identifying biomarkers for the disorder but also in aiding proper and consistent diagnosis.

The Decision to Use EEG versus MEG

EEG, as opposed to MEG, shows a steep increase in use throughout the years, as seen in Figure 3. Plausibly, this could parallel the general increase in the number of publications. However, MEG does not follow the same trend, suggesting that the dominance of EEG in our sample may be due to some inherent strengths of EEG as a data collection tool. EEG is cheaper and less complex than MEG (Hämäläinen et al., 1993). EEG is also more common: nearly every hospital is likely to be equipped with an EEG system, whereas MEG availability is rare (Coquelet et al., 2020). When developing a model that relies on either EEG or MEG data and should be accessible to all clinicians, EEG may therefore be the more practical tool.

Research Aims

Upon analyzing the narrowed set of papers for in-depth analysis, we found that the most common research aim is diagnosis (Figure 5B), a trend found in similar previous reviews (Woo et al., 2017). A tool to aid in diagnosis is critical for some pathologies such as DOC, where many patients are incorrectly diagnosed (Schnakers et al., 2009; Wannez et al., 2017). An unreliable diagnosis leads to an even less reliable prognosis, so an accurate initial diagnosis is important. Diagnoses made from E/MEG data are human-made labels with high inter-rater variability (Azuma et al., 2003; Halford et al., 2015; Struve et al., 1975; van Donselaar et al., 1992; Woody, 1968), and ML can help alleviate this problem. However, for other pathologies such as dementia, which already have well-established diagnostic tools and where EEG may not be able to outperform molecular tests (Koenig et al., 2020), the need lies in developing models that predict subject-level clinical outcomes that are continuous variables, such as symptom progression or the likelihood of treatment response. On the other hand, a well-performing diagnostic model can aid in developing biomarkers not only for the disorder itself but possibly also for its subtypes.

Relation of Sample Size, Number of Channels, and Model Performance Measure

Accuracy seems to decrease with increasing sample size, an effect that is even more pronounced for AUC scores (Figure 5C). Only a select few papers in our dataset reported f-score values, so no trend could be interpreted. Decreasing model performance as a function of increasing sample size is in line with trends found in the literature (Vabalas et al., 2019; Whelan & Garavan, 2014). However, many of the models reach accuracies of up to 100%, which is encouraging but raises scepticism. For example, in clinical contexts, the diagnosis of disorders such as depression and anxiety has very high inter-rater variability and low diagnostic reliability (Freedman et al., 2013). ML could, in principle, improve diagnostic accuracy by finding patterns in the brain signal that classify patients more reliably than clinical profiles (Drysdale et al., 2017). However, the models are trained on human-labeled data, and many of the selected papers reached accuracies above 90% for diagnosing depression, for example. This indicates that the reported performance is likely biased, and the findings should be interpreted with care.

Optimistic bias in model performance can stem from multiple sources. First, analysis conducted before classification, such as predictor selection performed on the full dataset before splitting into training and test sets, removes the independence between the two sets (Whelan & Garavan, 2014; Woo et al., 2017); the test and training sets must be independent of one another to avoid optimistic bias (a minimal demonstration follows below). Second, many researchers trained multiple classifiers in order to find the highest classification rate and then reported the top-performing model, which results in overfitting. Lastly, the cross-validation scheme matters: especially with small datasets, it can provide an inaccurate picture of model performance. K-fold was found to be the most common cross-validation scheme (Figure 5A), but cross-validation across folds typically carries an error of about ±10% in the measured performance (Varoquaux, 2018). Reducing this error would require sample sizes upwards of 1000 participants, which is troubling, as only a handful of papers in our set surpassed a sample size of 400. Leave-one-out cross-validation is the second most common method, although it has been suggested that this form of validation should be avoided due to the higher variance of its performance estimates (Varoquaux et al., 2017). These issues are amplified in the current set, as most experiments maintain strict participant inclusion criteria, making the chosen population and the resulting data more homogeneous, whereas generalizable models require independent and variable data. To truly measure model performance, models would need to be trained on much larger samples and tested on completely independent datasets; otherwise, current performance estimates may be too optimistic.
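The sketch below demonstrates the first of these pitfalls on pure-noise data: selecting predictors on the full dataset before cross-validation yields accuracy well above chance, whereas selecting them inside each training fold does not. It is a base-R illustration under simulated data, not a pipeline from any reviewed paper.

```r
# Minimal sketch: feature-selection leakage vs. selection inside the folds.
set.seed(1)
n <- 40; p <- 1000
X <- matrix(rnorm(n * p), n, p)            # pure noise: true accuracy is ~50%
y <- rep(0:1, each = n / 2)

cv_acc <- function(select_inside) {
  folds <- sample(rep(1:5, length.out = n))
  if (!select_inside)                      # WRONG: selection sees the test data
    keep <- order(abs(cor(X, y)), decreasing = TRUE)[1:10]
  mean(sapply(1:5, function(k) {
    tr <- folds != k
    if (select_inside)                     # RIGHT: selection on training fold only
      keep <- order(abs(cor(X[tr, ], y[tr])), decreasing = TRUE)[1:10]
    fit <- glm(y[tr] ~ ., data = data.frame(X[tr, keep]), family = binomial)
    pred <- predict(fit, newdata = data.frame(X[!tr, keep])) > 0
    mean(pred == (y[!tr] == 1))
  }))
}
cv_acc(select_inside = FALSE)  # optimistically biased, well above chance
cv_acc(select_inside = TRUE)   # hovers near chance, as it should
```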

Inconsistency in the reported performance measures complicates the comparison of predictive models. Comparing two models is difficult when one reports only an AUC score and the other only an f-score. To make matters worse, each measure can be calculated in different ways (Forman & Scholz, 2010), and the calculation is not always explicitly stated; the sketch below illustrates how two common F-score calculations can disagree on the same folds. On the other hand, which measure is appropriate also depends on the dataset and the question at hand, so there is no one-size-fits-all approach. In our dataset of 126 papers, what the authors chose to report was highly variable: some reported only accuracy, some accuracy and AUC, and some only AUC. This makes it very difficult to compare two models developed for the same goal.
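As a small illustration of how the calculation itself can change the number, the sketch below (with invented per-fold confusion counts) computes an F-score in the two ways contrasted by Forman & Scholz (2010): averaging per-fold F-scores versus pooling the counts across folds.

```r
# Minimal sketch: two common F-score aggregations disagree on the same folds.
# Hypothetical per-fold confusion counts (tp, fp, fn) for three folds:
folds <- list(c(tp = 8, fp = 1, fn = 1),
              c(tp = 2, fp = 4, fn = 6),
              c(tp = 9, fp = 0, fn = 1))
f1 <- function(x) unname(2 * x["tp"] / (2 * x["tp"] + x["fp"] + x["fn"]))

mean(sapply(folds, f1))   # average of per-fold F-scores  (~0.71)
f1(Reduce(`+`, folds))    # F-score of the pooled counts  (~0.75)
```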


It may be reasonable to think that a larger number of channels provides more room for the removal of bad channels and better captures topological and spatial information that may be important for model training. Indeed, a study in DOC patients showed that an increased number of channels did improve model performance (Engemann et al., 2018). At the same time, it is encouraging to see researchers reach high model performance with a low number of channels (Cai et al., 2018; Khatun et al., 2019; Liang et al., 2020), although methods achieving high accuracy with fewer than four channels are questionable in terms of data manipulation and performance on independent datasets. The question then becomes: is it good practice to keep manipulating data and trying different classification algorithms in order to get the best performance with the fewest channels? On one hand, this process resembles p-hacking. On the other hand, achieving high accuracy with few channels could enable a tool usable on a multitude of recording devices regardless of channel count, and the ability to record on almost any device improves accessibility. Taken to the extreme, however, forcing a model to perform well with only a few channels may not yield a generalizable model, while a model that only generalizes with high-density EEG decreases accessibility. Overall, it is paramount to strike a balance between performance, accessibility, and generalizability.

Comments and Good Research Practice Suggestions

As Borsook et al. (2011) have noted, developing a model as a clinical tool in the clinical neurosciences is similar to developing a drug for treatment: it requires broad exploration, sensitivity and specificity, and many trials across multiple populations. On the level of the individual study, this requires clear descriptions of the methodology, from the details of data acquisition and data manipulation to model testing and validation. A good practice is to pre-register the project on a platform such as the one provided by the Center for Open Science (https://osf.io/prereg/), where detailed experimental and methodological pipelines are pre-registered and pre-approved. Pre-registration prevents bias in reported analyses, improves transparency and reproducibility, and encourages the publication of results that do not support the hypotheses. In the domain of ML applications in clinical neuroscience, openness to sharing code and other processing tools is key to improving model development. Most neuroscientists are not computer scientists and may lack the skills or the time to properly document code or have it tested for accuracy (Gorgolewski & Poldrack, 2016), but they should not be discouraged from sharing code, as doing so improves reproducibility. Sharing, building on, and validating models from shared code provides a promising route for model development, and replication is a powerful means of testing well-performing models on new independent datasets for generalizability. On a community level, this means contributing to public and shared datasets (further recommendations and details can be found in Gorgolewski & Poldrack (2016)). Good research practices are key to developing predictive models that can be utilized in a clinical setting.

Limitations of the Current Review

Due to time constraints, the analysis was conducted on limited information, and more detailed analyses of the methodologies remain possible. For example, analyses could address model specifications, such as the specific ML techniques used (SVM, random forest, CNN, etc.) and their construction, to compare individual techniques and their performance. Analyses could also address the datasets, such as whether public or personally acquired data (or both) were used, and then examine the validation of the models, in particular whether they were validated on completely independent datasets. These details could provide an even clearer understanding of the state of model development in clinical neuroscience, but the present review provides a useful and informative broad overview of current practices and a starting point for such future work.


Conclusion

In this review, we aimed to address the question: how is ML used in combination with E/MEG to model clinical endpoints on a subject level? Although the use of ML in combination with clinical E/MEG research is rapidly expanding, modeling clinical endpoints appears to still be in a very early and explorative stage. Reported model performance is optimistic, but it remains subject to bias and has rarely been properly validated. Most of the models developed are aimed at diagnosis, understandably so, as binary classifications are the easiest to conduct. It is clear, however, that EEG will be the more widely adopted tool in model development, as it is inherently cheaper and more accessible, two key advantages in a clinical setting. With the contribution of larger datasets, clear outlines of methodologies, and other open science practices, the field has a very promising outlook for aiding clinicians in providing better and more accurate diagnoses, prognoses, and treatment plans.

Acknowledgement

I would like to thank Dr. Denis Engemann for his supervision and support through the process of writing this thesis. I would also like to thank Franck Porteous for helping with basic R programming and formatting and design of the final product. Finally, I would like to thank Dr. Jelle Zuidema for taking the time to be the final assessor for this project.

References

Acharya, U. R., Vinitha Sree, S., Swapna, G., Martis, R. J., & Suri, J. S. (2013). Automated EEG analysis of epilepsy: A review. Knowledge-Based Systems, 45, 147–165. https://doi.org/10.1016/j.knosys.2013.02.014

Ang, K. K., & Guan, C. (2013). Brain-Computer Interface in Stroke Rehabilitation.

Azuma, H., Hori, S., Nakanishi, M., Fujimoto, S., Ichikawa, N., & Furukawa, T. A. (2003). An intervention to improve the interrater reliability of clinical EEG interpretations. Psychiatry and Clinical Neurosciences, 57(5), 485–489. https://doi.org/10.1046/j.1440-1819.2003.01152.x

Bornmann, L., & Mutz, R. (2015). Growth rates of modern science: A bibliometric analysis based on the number of publications and cited references. Journal of the Association for Information Science and Technology, 66(11), 2215–2222. https://doi.org/10.1002/asi.23329

Borsook, D., Becerra, L., & Hargreaves, R. (2011). Biomarkers for chronic pain and analgesia. Part 1: The need, reality, challenges, and solutions. Discovery Medicine, 11(58), 197–207.

Bzdok, D., Altman, N., & Krzywinski, M. (2018). Statistics versus machine learning. Nature Methods, 15(4), 233– 234. https://doi.org/10.1038/nmeth.4642

Bzdok, D., Engemann, D., & Thirion, B. (2020). Inference and Prediction Diverge in Biomedicine. Patterns, 1(8), 100119. https://doi.org/10.1016/j.patter.2020.100119

Cai, H., Han, J., Chen, Y., Sha, X., Wang, Z., Hu, B., Yang, J., Feng, L., Ding, Z., Chen, Y., & Gutknecht, J. (2018). A Pervasive Approach to EEG-Based Depression Detection. Complexity, 2018, 1–13. https://doi.org/10.1155/2018/5238028

Coquelet, N., De Tiège, X., Destoky, F., Roshchupkina, L., Bourguignon, M., Goldman, S., Peigneux, P., & Wens, V. (2020). Comparing MEG and high-density EEG for intrinsic functional connectivity mapping. NeuroImage, 210, 116556. https://doi.org/10.1016/j.neuroimage.2020.116556

Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297. https://doi.org/10.1007/BF00994018

Cruse, D., Chennu, S., Chatelle, C., Bekinschtein, T. A., Fernández-Espejo, D., Pickard, J. D., Laureys, S., & Owen, A. M. (2011). Bedside detection of awareness in the vegetative state: A cohort study. The Lancet, 378(9809), 2088–2094. https://doi.org/10.1016/S0140-6736(11)61224-5

Drysdale, A. T., Grosenick, L., Downar, J., Dunlop, K., Mansouri, F., Meng, Y., Fetcho, R. N., Zebley, B., Oathes, D. J., Etkin, A., Schatzberg, A. F., Sudheimer, K., Keller, J., Mayberg, H. S., Gunning, F. M., Alexopoulos, G. S., Fox, M. D., Pascual-Leone, A., Voss, H. U., … Liston, C. (2017). Resting-state connectivity biomarkers define neurophysiological subtypes of depression. Nature Medicine, 23(1), 28–38. https://doi.org/10.1038/nm.4246

Engemann, D. A., Raimondo, F., King, J.-R., Rohaut, B., Louppe, G., Faugeras, F., Annen, J., Cassol, H., Gosseries, O., Fernandez-Slezak, D., Laureys, S., Naccache, L., Dehaene, S., & Sitt, J. D. (2018). Robust EEG-based cross-site and cross-protocol classification of states of consciousness. Brain, 141(11), 3179–3192. https://doi.org/10.1093/brain/awy251

Feigin, V. L., Vos, T., Nichols, E., Owolabi, M. O., Carroll, W. M., Dichgans, M., Deuschl, G., Parmar, P., Brainin, M., & Murray, C. (2020). The global burden of neurological disorders: Translating evidence into policy. The Lancet Neurology, 19(3), 255–265. https://doi.org/10.1016/S1474-4422(19)30411-9

Forman, G., & Scholz, M. (2010). Apples-to-apples in cross-validation studies: Pitfalls in classifier performance measurement. ACM SIGKDD Explorations Newsletter, 12(1), 49–57. https://doi.org/10.1145/1882471.1882479

Freedman, R., Lewis, D. A., Michels, R., Pine, D. S., Schultz, S. K., Tamminga, C. A., Gabbard, G. O., Gau, S. S.-F., Javitt, D. C., Oquendo, M. A., Shrout, P. E., Vieta, E., & Yager, J. (2013). The initial field trials of DSM-5: New blooms and old thorns. The American Journal of Psychiatry, 170(1), 1–5. https://doi.org/10.1176/appi.ajp.2012.12091189


Friedman, D., Claassen, J., & Hirsch, L. J. (2009). Continuous electroencephalogram monitoring in the intensive care unit. Anesthesia and Analgesia, 109(2), 506–523. https://doi.org/10.1213/ane.0b013e3181a9d8b5

Geraedts, V. J., Boon, L. I., Marinus, J., Gouw, A. A., van Hilten, J. J., Stam, C. J., Tannemaat, M. R., & Contarino, M. F. (2018). Clinical correlates of quantitative EEG in Parkinson disease: A systematic review. Neurology, 91(19), 871–883. https://doi.org/10.1212/WNL.0000000000006473

Goldfine, A. M., Bardin, J. C., Noirhomme, Q., Fins, J. J., Schiff, N. D., & Victor, J. D. (2013). Reanalysis of “Bedside detection of awareness in the vegetative state: A cohort study.” The Lancet, 381(9863), 289–291. https://doi.org/10.1016/S0140-6736(13)60125-7

Gorgolewski, K. J., & Poldrack, R. A. (2016). A Practical Guide for Improving Transparency and Reproducibility in Neuroimaging Research. PLOS Biology, 14(7), e1002506. https://doi.org/10.1371/journal.pbio.1002506

Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.

Wickham, H., François, R., Henry, L., & Müller, K. (2020). dplyr: A Grammar of Data Manipulation. R package version 1.0.2. https://CRAN.R-project.org/package=dplyr

Halford, J. J., Shiau, D., Desrochers, J. A., Kolls, B. J., Dean, B. C., Waters, C. G., Azar, N. J., Haas, K. F., Kutluay, E., Martz, G. U., Sinha, S. R., Kern, R. T., Kelly, K. M., Sackellares, J. C., & LaRoche, S. M. (2015). Inter-rater agreement on identification of electrographic seizures and periodic discharges in ICU EEG recordings. Clinical Neurophysiology, 126(9), 1661–1669. https://doi.org/10.1016/j.clinph.2014.11.008

Hämäläinen, M., Hari, R., Ilmoniemi, R. J., Knuutila, J., & Lounasmaa, O. V. (1993). Magnetoencephalography—Theory, instrumentation, and applications to noninvasive studies of the working human brain. Reviews of Modern Physics, 65(2), 413–497. https://doi.org/10.1103/RevModPhys.65.413

Harrison, A. H., & Connolly, J. F. (2013). Finding a way in: A review and practical evaluation of fMRI and EEG for detection and assessment in disorders of consciousness. Neuroscience & Biobehavioral Reviews, 37(8), 1403–1419. https://doi.org/10.1016/j.neubiorev.2013.05.004

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning. Springer New York. https://doi.org/10.1007/978-0-387-84858-7

Jalili, M., & Knyazeva, M. G. (2011). EEG-based functional networks in schizophrenia. Computers in Biology and Medicine, 41(12), 1178–1186. https://doi.org/10.1016/j.compbiomed.2011.05.004

Jas, M., Engemann, D. A., Bekhti, Y., Raimondo, F., & Gramfort, A. (2017). Autoreject: Automated artifact rejection for MEG and EEG data. NeuroImage, 159, 417–429. https://doi.org/10.1016/j.neuroimage.2017.06.030

Khatun, S., Morshed, B. I., & Bidelman, G. M. (2019). A Single-Channel EEG-Based Approach to Detect Mild Cognitive Impairment via Speech-Evoked Brain Responses. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 27(5), 1063–1070. https://doi.org/10.1109/TNSRE.2019.2911970

Koenig, T., Smailovic, U., & Jelic, V. (2020). Past, present and future EEG in the clinical workup of dementias. Psychiatry Research: Neuroimaging, 306, 111182. https://doi.org/10.1016/j.pscychresns.2020.111182

Lecun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324. https://doi.org/10.1109/5.726791

Lenartowicz, A., & Loo, S. K. (2014). Use of EEG to Diagnose ADHD. Current Psychiatry Reports, 16(11), 498. https://doi.org/10.1007/s11920-014-0498-0

Liang, Z., Shao, S., Lv, Z., Li, D., Sleigh, J. W., Li, X., Zhang, C., & He, J. (2020). Constructing a Consciousness Meter Based on the Combination of Non-Linear Measurements and Genetic Algorithm-Based Support Vector Machine. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 28(2), 399–408. https://doi.org/10.1109/TNSRE.2020.2964819

Lotte, F., Bougrain, L., Cichocki, A., Clerc, M., Congedo, M., Rakotomamonjy, A., & Yger, F. (2018). A review of classification algorithms for EEG-based brain–computer interfaces: A 10 year update. Journal of Neural Engineering, 15(3), 031005. https://doi.org/10.1088/1741-2552/aab2f2

Michel, C. M., & Brunet, D. (2019). EEG Source Imaging: A Practical Review of the Analysis Steps. Frontiers in Neurology, 10, 325. https://doi.org/10.3389/fneur.2019.00325

Westgate, M. J. (2019). revtools: An R package to support article screening for evidence synthesis. Research Synthesis Methods. https://doi.org/10.1002/jrsm.1374

Oddo, M., Carrera, E., Claassen, J., Mayer, S. A., & Hirsch, L. J. (2009). Continuous electroencephalography in the medical intensive care unit. Critical Care Medicine, 37(6), 2051–2056. https://doi.org/10.1097/CCM.0b013e3181a00604

Olesen, J., Gustavsson, A., Svensson, M., Wittchen, H.-U., Jönsson, B., on behalf of the CDBE2010 study group, & the European Brain Council. (2012). The economic cost of brain disorders in Europe. European Journal of Neurology, 19(1), 155–162. https://doi.org/10.1111/j.1468-1331.2011.03590.x

Pfurtscheller, G., Muller-Putz, G. R., Scherer, R., & Neuper, C. (2008). Rehabilitation with Brain-Computer Interface Systems. Computer, 41(10), 58–65. https://doi.org/10.1109/MC.2008.432

R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

Roy, Y., Banville, H., Albuquerque, I., Gramfort, A., Falk, T. H., & Faubert, J. (2019). Deep learning-based electroencephalography analysis: A systematic review. Journal of Neural Engineering, 16(5), 051001. https://doi.org/10.1088/1741-2552/ab260c

Sabbagh, D., Ablin, P., Varoquaux, G., Gramfort, A., & Engemann, D. A. (2020). Predictive regression modeling with MEG/EEG: From source power to signals and cognitive states. NeuroImage, 222, 116893. https://doi.org/10.1016/j.neuroimage.2020.116893

Schnakers, C., Vanhaudenhuyse, A., Giacino, J., Ventura, M., Boly, M., Majerus, S., Moonen, G., & Laureys, S. (2009). Diagnostic accuracy of the vegetative and minimally conscious state: Clinical consensus versus standardized neurobehavioral assessment. BMC Neurology, 9(1), 35. https://doi.org/10.1186/1471-2377-9-35

Schwandt, H., & Wuppermann, A. (2016). The youngest get the pill: ADHD misdiagnosis in Germany, its regional correlates and international comparison. Labour Economics, 43, 72–86. https://doi.org/10.1016/j.labeco.2016.05.018

Segato, A., Marzullo, A., Calimeri, F., & De Momi, E. (2020). Artificial intelligence for brain diseases: A systematic review. APL Bioengineering, 4(4), 041503. https://doi.org/10.1063/5.0011697

Selim, A. E., Wahed, M. A., & Kadah, Y. M. (2008). Machine Learning Methodologies in Brain-Computer Interface Systems. 2008 Cairo International Biomedical Engineering Conference, 1–5. https://doi.org/10.1109/CIBEC.2008.4786106

Singh, S. P. (2014). Magnetoencephalography: Basic principles. Annals of Indian Academy of Neurology, 17(Suppl 1), S107-112. https://doi.org/10.4103/0972-2327.128676

Struve, F. A., Becka, D. R., Green, M. A., & Howard, A. (1975). Reliability of Clinical Interpretation of the Electroencephalogram. Clinical Electroencephalography, 6(2), 54–60. https://doi.org/10.1177/155005947500600202

Vabalas, A., Gowen, E., Poliakoff, E., & Casson, A. J. (2019). Machine learning algorithm validation with a limited sample size. PLOS ONE, 14(11), e0224365. https://doi.org/10.1371/journal.pone.0224365

van Donselaar, C. A., Schimsheimer, R.-J., Geerts, A. T., & Declerck, A. C. (1992). Value of the Electroencephalogram in Adult Patients With Untreated Idiopathic First Seizures. Archives of Neurology, 49(3), 231–237. https://doi.org/10.1001/archneur.1992.00530270045017

Varoquaux, G. (2018). Cross-validation failure: Small sample sizes lead to large error bars. NeuroImage, 180, 68–77. https://doi.org/10.1016/j.neuroimage.2017.06.061

Varoquaux, G., Raamana, P. R., Engemann, D. A., Hoyos-Idrobo, A., Schwartz, Y., & Thirion, B. (2017). Assessing and tuning brain decoders: Cross-validation, caveats, and guidelines. NeuroImage, 145, 166–179. https://doi.org/10.1016/j.neuroimage.2016.10.038

Velu, P. D., & de Sa, V. R. (2013). Single-trial classification of gait and point movement preparation from human EEG. Frontiers in Neuroscience, 7. https://doi.org/10.3389/fnins.2013.00084

Wannez, S., Heine, L., Thonnard, M., Gosseries, O., Laureys, S., & Coma Science Group collaborators. (2017). The repetition of behavioral assessments in diagnosis of disorders of consciousness. Annals of Neurology, 81(6), 883–889. https://doi.org/10.1002/ana.24962

Whelan, R., & Garavan, H. (2014). When Optimism Hurts: Inflated Predictions in Psychiatric Neuroimaging. Biological Psychiatry, 75(9), 746–748. https://doi.org/10.1016/j.biopsych.2013.05.014

Woo, C.-W., Chang, L. J., Lindquist, M. A., & Wager, T. D. (2017). Building better biomarkers: Brain models in translational neuroimaging. Nature Neuroscience, 20(3), 365–377. https://doi.org/10.1038/nn.4478

Woody, R. H. (1968). Inter-judge reliability in clinical electroencephalography. Journal of Clinical Psychology.


Supplementary

Supplementary Table 1. Boolean searches for each database.

PubMed:
((Brain[Title/Abstract]) OR (neural[Title/Abstract]) OR (neuro*[Title/Abstract])) AND ((machine learning[Title/Abstract]) OR (statistical learning[Title/Abstract]) OR (multivariate pattern[Title/Abstract]) OR (pattern classif*[Title/Abstract]) OR (pattern recog*[Title/Abstract]) OR (support vector[Title/Abstract]) OR (SVM[Title/Abstract]) OR (classif*[Title/Abstract]) OR (random forest[Title/Abstract]) OR (deep learning[Title/Abstract]) OR (neural network[Title/Abstract]) OR (LASSO[Title/Abstract]) OR (ridge[Title/Abstract]) OR (trees[Title/Abstract]) OR (tree-based[Title/Abstract]) OR (linear discriminant analysis[Title/Abstract]) OR (LDA[Title/Abstract]) OR (deep learning[Title/Abstract]) OR (representation learning[Title/Abstract]) OR (neural network*[Title/Abstract]) OR (convolutional neural network*[Title/Abstract]) OR (ConvNet[Title/Abstract]) OR (CNN[Title/Abstract]) OR (recurrent neural network*[Title/Abstract]) OR (RNN[Title/Abstract]) OR (long short-term memory[Title/Abstract]) OR (LSTM[Title/Abstract])) AND ((EEG[Title/Abstract]) OR (electroencephalograph*[Title/Abstract]) OR (magnetoencephalograph*[Title/Abstract]) OR (MEG[Title/Abstract])) AND ((Parkinson*[Title/Abstract]) OR (PTSD[Title/Abstract]) OR (Post-traumatic[Title/Abstract]) OR (ADHD[Title/Abstract]) OR (attention-deficit[Title/Abstract]) OR (attention deficit[Title/Abstract]) OR (anxiety[Title/Abstract]) OR (schizo*[Title/Abstract]) OR (Psychosis[Title/Abstract]) OR (Bipolar[Title/Abstract]) OR (Depress*[Title/Abstract]) OR (Autis*[Title/Abstract]) OR (Pain[Title/Abstract]) OR (Alzheimer*[Title/Abstract]) OR (brain age[Title/Abstract]) OR (DOC[Title/Abstract]) OR (disorders of consciousness[Title/Abstract]) OR (coma[Title/Abstract]) OR (concuss*[Title/Abstract]) OR (stroke[Title/Abstract]) OR (anoxia[Title/Abstract]) OR (dementia[Title/Abstract]) OR (multiple sclerosis[Title/Abstract]) OR (insomnia[Title/Abstract]) OR (narcolepsy[Title/Abstract]) OR (chronic fatigue[Title/Abstract]) OR (obsessive compulsive[Title/Abstract]) OR (OCD[Title/Abstract]) OR (vegetative state[Title/Abstract]) OR (minimally conscious state[Title/Abstract]) OR (traumatic brain injury[Title/Abstract]) OR (unresponsive wakefulness[Title/Abstract])) NOT ((BCI[Title/Abstract]) OR (brain-computer interface[Title/Abstract]) OR (epilep*[Title/Abstract]) OR (seizure*[Title/Abstract]) OR (emotion regulation[Title/Abstract]) OR (driver fatigue[Title/Abstract]) OR (HCI[Title/Abstract]) OR (human-computer interaction[Title/Abstract]) OR (Position Paper[Title/Abstract]) OR (mouse[Title/Abstract]) OR (mice[Title/Abstract]) OR (rat*[Title/Abstract]) OR (piglet[Title/Abstract]) OR (BMI[Title/Abstract]) OR (Brain-machine interface[Title/Abstract]))

Web of Science:
TI=((Brain OR neural OR neuro*) AND (“machine learning” OR “statistical learning” OR “multivariate pattern” OR “pattern classif*” OR “pattern recog*” OR “support vector” OR SVM OR classif* OR random forest OR “deep learning” OR “neural network*” OR LASSO OR ridge OR trees OR tree-based OR “linear discriminant analysis” OR LDA OR “deep learning” OR “representation learning” OR “neural network*” OR “convolutional neural network*” OR ConvNet OR CNN OR “recurrent neural network*” OR RNN OR “long short-term memory” OR LSTM) AND (EEG OR electroencephalography OR magnetoencephalography OR MEG) AND (Parkinson* OR PTSD OR Post-traumatic OR ADHD OR attention-deficit OR “attention deficit” OR anxiety OR schizo* OR Psychosis OR Bipolar OR Depress* OR Autis* OR Pain OR Alzheimer* OR “brain age” OR DOC OR “disorders of consciousness” OR coma OR concuss* OR stroke OR anoxia OR dementia OR “multiple sclerosis” OR “insomnia” OR “narcolepsy” OR “chronic fatigue” OR “obsessive compulsive” OR OCD OR “vegetative state” OR “minimally conscious state” OR “traumatic brain injury” OR “unresponsive wakefulness”) NOT (BCI OR brain-computer interface OR epilep* OR seizure* OR emotion regulation OR driver fatigue OR HCI OR human-computer interaction OR mouse OR mice OR rat OR piglet OR BMI OR brain-machine interface))

OR

AB=((Brain OR neural OR neuro*) AND (“machine learning” OR “statistical learning” OR “multivariate pattern” OR “pattern classif*” OR “pattern recog*” OR “support vector” OR SVM OR classif* OR random forest OR “deep learning” OR “neural network*” OR LASSO OR ridge OR trees OR tree-based OR “linear discriminant analysis” OR LDA OR “deep learning” OR “representation learning” OR “neural network*” OR “convolutional neural network*” OR ConvNet OR CNN OR “recurrent neural network*” OR RNN OR “long short-term memory” OR LSTM) AND (EEG OR electroencephalograph OR magnetoencephalograph OR MEG) AND (Parkinson* OR PTSD OR Post-traumatic OR ADHD OR attention-deficit OR “attention deficit” OR anxiety OR schizo* OR Psychosis OR Bipolar OR Depress* OR Autis* OR Pain OR Alzheimer* OR “brain age” OR DOC OR “disorders of consciousness” OR coma OR concuss* OR stroke OR anoxia OR dementia OR “multiple sclerosis” OR “insomnia” OR “narcolepsy” OR “chronic fatigue” OR “obsessive compulsive” OR OCD OR “vegetative state” OR “minimally conscious state” OR “traumatic brain injury” OR “unresponsive wakefulness”) NOT (BCI OR brain-computer interface OR epilep* OR seizure* OR emotion regulation OR driver fatigue OR HCI OR human-computer interaction OR mouse OR mice OR rat OR piglet OR BMI OR brain-machine interface))

OR

AK=((Brain OR neural OR neuro*) AND (“machine learning” OR “statistical learning” OR “multivariate pattern” OR “pattern classif*” OR “pattern recog*” OR “support vector” OR SVM OR classif* OR random forest OR “deep learning” OR “neural network*” OR LASSO OR ridge OR trees OR tree-based OR “linear discriminant analysis” OR LDA OR “deep learning” OR “representation learning” OR “neural network*” OR “convolutional neural network*” OR ConvNet OR CNN OR “recurrent neural network*” OR RNN OR “long short-term memory” OR LSTM) AND (EEG OR electroencephalograph OR magnetoencephalograph OR MEG) AND (Parkinson* OR PTSD OR Post-traumatic OR ADHD OR attention-deficit OR “attention deficit” OR anxiety OR schizo* OR Psychosis OR Bipolar OR Depress* OR Autis* OR Pain OR Alzheimer* OR “brain age” OR DOC OR “disorders of consciousness” OR coma OR concuss* OR stroke OR anoxia OR dementia OR “multiple sclerosis” OR “insomnia” OR “narcolepsy” OR “chronic fatigue” OR “obsessive compulsive” OR OCD OR “vegetative state” OR “minimally conscious state” OR “traumatic brain injury” OR “unresponsive wakefulness”) NOT (BCI OR brain-computer interface OR epilep* OR seizure* OR emotion regulation OR driver fatigue OR HCI OR human-computer interaction OR mouse OR mice OR rat OR piglet OR BMI OR brain-machine interface))
