
Towards Decoding Complex Mouth Movements: A high-field fMRI study


Towards Decoding Complex Mouth Movements

A high-field fMRI study

Jakob Andrée 6249450

MSc Brain and Cognitive Sciences, University of Amsterdam, Track Cognitive Neuroscience

Supervisors: MSc Martin Bleichner and Prof. Nicolas Ramsey, UMC Utrecht
Co-Assessor and UvA representative: Dr. Tomas Knapen


Abstract

One crucial aspect of developing a successful brain-computer interface (BCI) is identifying neural events that, when translated into commands, allow flexible control by the user. In this proof-of-principle study, we used high-field 7T fMRI to examine whether the cortical signatures of phoneme production could be reliably identified. A correlation analysis was used in which single trials were compared to averaged prototypes of four different speech sounds, and the prototype exhibiting the greatest similarity was chosen as the label. Classification performance was on average 49%. Reducing the number of categories to be distinguished to two or three phonemes did not lead to substantial increases in classifiability, and this and other results indicate excessive overlap between the cortical representations of phonemes. However, data from one participant performing single, isolated movements of the jaw, lips, tongue, or larynx reached classification accuracies of 100% and 95% for two separate data sets. These results demonstrate that the representations of unique speech muscles are stable and separable. The implications of these findings are discussed with regard to the organization of the primary motor cortex, and recommendations are given for further research seeking to identify multidimensional BCI control signals based on language production.

Introduction

The underlying aim of the present study was to gauge the feasibility of employing phoneme production, and its concomitant neural activation over the primary motor cortex (M1) and sensory cortex (S1), as the neurophysiological signal with which to control a brain-computer interface (BCI). By classifying and translating brain signals into commands, BCIs have great potential to assist patient groups suffering from loss of motor function (e.g., locked-in syndrome (LIS), spinal cord lesions, and amyotrophic lateral sclerosis (ALS)) (Hochberg et al., 2012). These neurophysiological signals can be measured with a number of different techniques, which generally involve trade-offs between invasiveness on the one hand, and signal resolution and signal power on the other. A promising approach is electrocorticography (ECoG)-based BCIs, in which a grid of electrodes is placed on top of the exposed cortical surface. A number of cognitive faculties have been shown to enable control of ECoG-based BCIs (cognitive control: Vansteensel et al., 2010; visuospatial attention: Andersson et al., 2012).


Multidimensional BCI control signals, such as the four directions of covert attention reported by Andersson and colleagues (2012), that allow for reliable differentiation between several discrete neural activation patterns are highly desirable since they enable greater control and flexibility for the user. One practical constraint imposed on such a multidimensional signal through ECoG is that it should be confined to a relatively small area of the cortex in order to minimize both the extent of the craniotomy needed for introducing the electrode grid, as well as the risk of infection inherent to invasive surgery.

Candidate neural events for an ECoG-based BCI control signal can be, and have been, evaluated beforehand with fMRI (Hermes et al., 2011). In that study, Hermes and colleagues employed fMRI and electroencephalography (EEG) to investigate activation patterns during motor imagery for potential use in ECoG-based BCI research. The area previously identified with fMRI was co-localized with significant EEG power changes, which were in turn classified to an excellent degree of accuracy (91%).

Identification of potential ECoG control signals with fMRI is possible due to the considerable overlap that has been reported between high-frequency ECoG components (upper gamma range: >60 Hz) and the BOLD response (Lachaux et al., 2007; Hermes et al., 2012). This relationship is of particular interest to the present study since high-gamma components are augmented over motor cortex during speech production (Korzeniewska et al., 2011; Cho-Hisamoto et al., 2012). Patterns of haemodynamic activation captured with high-field fMRI as subjects produce phonemes can therefore inform researchers of the optimal placement of ECoG grids on an individual basis.

Speech, and its associated patterns of neural activity over the primary motor cortex (M1), represents a potentially ideal candidate for a multidimensional ECoG-based BCI control signal for several reasons. Firstly, speech as a cognitive action is intuitive, effortless, and well-trained. Secondly, using a signal based on actual language processes would minimize the mapping needed between production of the control signal and the desired output. Further, the related activation is confined to a relatively small spatial area and is therefore accessible through a single, restricted craniotomy and recordable with a single high-density ECoG electrode grid.

The inherent structure of the functional somatotopic organization of M1 (discussed below) adds further advantages to this region as a candidate source for an effective BCI signal. Here, differentiation of the various speech effectors (unique muscle groups mediating speech) involved in the production of phonemes, based on their cortical representations, may extend the dimensionality of the signal in a manner not possible over areas that do not exhibit somatotopy or other forms of orderly organization. Since the manner in which these effectors contribute to an individual phoneme varies, each phoneme produces a distinct cortical activation pattern which may then, hypothetically, be classified.

The principles underlying the somatotopic organization of the primary motor cortex (M1), which maps parts of the body onto the cortical surface, have long been the subject of debate. That larger segments of the body are represented over discrete areas, with the lower limb in the dorsomedial portion of M1, the head most laterally, and the representation of the upper limb lying intermediately between the two, is essentially undisputed. Whether the smaller body parts within these segments are mapped in a similarly discrete fashion is, however, questionable. Results from numerous studies seeking to elucidate the underlying representational principles of different body parts indicate considerable overlap between adjacent body parts such as lips, tongue, fingers, hand, and arm (Schieber and Hibbard, 1993; Dechent and Frahm, 2003; Meier et al., 2008). This has been taken to indicate that the principles of somatotopic organization hold true for representations of larger segments, i.e., a between-limb distinction (Grafton et al., 1991), but not for representations of smaller body parts within those areas such as facial muscles and fingers, i.e., the within-limb distinction (Schieber and Hibbard, 1993; Lotze et al., 2000; Kapreli et al., 2007).

On the issue of between- and within-limb somatotopy there are essentially two opposing interpretations. One view holds that the distinction reflects a homuncular organization with discrete centers of representation that manifest in an overlapping manner due to, for example, bidirectional horizontal connections (Huntley and Jones, 1991). The other holds that it is indicative of a shared neural substrate which possesses a rough mapping of the major body parts but is overlapping and distributed for smaller parts (Sanes, 1995; Schieber, 2001). While this debate is ongoing, there is evidence supporting a functional somatotopy, that is, fine-scale muscle groups such as individual digits or facial muscles being mapped according to a somatotopic principle over an area that overlaps with adjacent muscles' maps (Dechent and Frahm, 2003; Rathelot and Strick, 2006; Plow et al., 2010). The cortical representation of a given muscle group would then be organized as a quantitative predominance of that muscle group over an area shared with adjacent groups (Dechent and Frahm, 2003).

Very little research has been done on the mapping of speech muscles over M1/S1, and the extant literature does not directly address whether adjacent speech muscles like the lips, tongue, and jaw display a variable overlap dependent on the target mouth movement being performed. The literature does suggest, however, that some speech effectors display a clearer somatotopic representation over M1 than others during normal speech and word production (Takai et al., 2010). These are the areas controlling the muscles of the lips, tongue, pharynx, and larynx (Lotze et al., 2000; Meier et al., 2008; Brown, Ngan, and Liotti, 2008), which are involved in nearly all human speech.

What complicates matters is that the areas identified for the different speech effectors are the result of contrasting different movements with each other or with rest, with the consequence that any overlap between movements is not taken into account (Brown et al., 2009; Takai et al., 2010). For example, Takai and colleagues (2010) performed an extensive literature review and calculated the activation likelihood estimate (ALE), a tool for comparing neuroimaging results in meta-analyses, for the reported foci of BOLD activity for each of four different speech-related movements, which were subsequently compared. The authors concluded that while there was significant overlap between the centers of the brain-activation clusters reported in the selected studies for respiration, lip movement, tongue movement, and swallowing, there seems to be considerable support for a distinct representation of each of the compared movements over the area of M1 representing the face. Whether these identified areas would be discernible for complex movements that involve the activation of several


to localize the vocal center and contrast activation between voiced or unvoiced speech (Brown et al., 2008; Brown et al., 2009; for a review see: Simonyan & Horwitz, 2011).

In this proof-of-principle study we first performed a pilot experiment to identify four phonemes that elicit maximally different muscle activity across the main speech effectors. Participants produced a large number of phonemes while their electromyographic (EMG) activity and voice signal were recorded and analyzed. The underlying assumption was that there is a relationship between speech effectors, their differential contribution to the production of phonemes, and their topographic representation in the motor cortex.

For the main experiment we employed high-field (7T) fMRI scanning of subjects as they produced the four phonemes. The rationale was to evaluate single-trial classification of the associated BOLD activation patterns and their suitability as a multidimensional control signal for an ECoG-based BCI. Additionally, to get an indication of whether isolated movements of the different speech effectors could be classified more accurately than the complex movements involved in phoneme production, one subject was scanned using a modified version of the task involving four single movements of speech-related muscle groups.

Materials and Methods

Pilot Study

Four native Dutch speakers were recruited as participants in a pilot study designed to identify a subset of four phonemes that are maximally different in terms of their required muscle activity. Participants were asked to produce 30 phonemes (the Dutch alphabet) while their muscle activity was recorded from selected facial muscles. The underlying assumption was a relationship between unique muscle groups (or speech effectors), their differential contribution to the production of phonemes, and their topographic representation in the motor cortex. Assuming that the primary motor cortex exhibits at least a rough somatotopic representation (as supported by several studies, e.g., Dechent & Frahm, 2003), identifying candidates that would hypothetically lead to the most discriminable BOLD patterns should be possible based on analysis of the electromyographic (EMG) responses from phoneme production. These EMG responses were recorded from the most prominent speech effectors as participants produced a range of speech sounds. At least one previous study has reported an overlap between EMG activity and BOLD activity, with EMG activity successfully predicting brain activity (van Rotselaar et al., 2007).

As the method for selecting the phonemes and electrode placements that were most informative for successful differentiation, we combined comparison of the averaged EMG responses from all ten trials of each phoneme with a clustering analysis.

In order to find out which phonemes were maximally similar to themselves over trials and maximally different from each other based on the EMG data, we performed an Affinity Propagation (AP) clustering analysis. AP clustering has been shown to result in lower errors than other approaches while taking only a fraction of the time, and has since its development been applied to a range of scientific fields, from image segmentation and gene-transcript analysis of microarray data to machine learning (Frey & Dueck, 2007; Leone et al., 2007; Lu, 2008).

The current fMRI setup allowed simultaneous recording of EMG activity from two channels during scanning, which makes reliable measurement of subject performance during the task possible. To ensure that the information captured by the EMG recordings during scanning would be maximized, we added the goal of identifying the optimal electrode-pair placement, defined as the two placements that contributed the most information about each phoneme's associated muscle-activation pattern in the pilot study.

Procedure

Four participants took part in this study (all male and native Dutch speakers). The task consisted of producing 10 trials of each of 30 phonemes (the Dutch alphabet) in response to a visual cue. The participants were seated in front of a computer screen that displayed a white fixation cross on a black background. Every two seconds a word was displayed in grey, in which the central letter(s) corresponding to the target speech sound was displayed in white. The target phonemes were presented in the context of a word to ensure correct pronunciation. In the instructions preceding the experiment, the participants were told to produce the sound in as relaxed a manner as possible in order to mimic conditions inside the fMRI scanner. While performing the task, the participants were fitted with ten (Ag/Au) bipolar EMG electrodes (see fig. 1 for electrode placement), of which eight were placed over five main speech effectors, one served as ground, and one as common reference. The latter two electrodes were located over the mastoid processes behind each ear. The placements were taken from Fridlund and Cacioppo's (1986) guidelines on electrode placement for facial EMG measurements. Additionally, a microphone mounted above the participants recorded each trial in order to provide voice onset times and reliable measurements of subject performance.

Figure 1 Electrode placements of the 8 EMG channels used in the pilot study (blue) and the 2 locations that were subsequently selected for the main study (red). Note: the ground and reference electrodes (placed over mastoids) are omitted from this figure. Adapted from Fridlund and Cacioppo, 1986


Data analysis EMG

All analyses of the EMG measurements were performed off-line in MATLAB 7.14 (R2012a; MathWorks Inc., Natick, MA). After initial file conversion, the EMG data was re-referenced; this removes noise unrelated to the task by subtracting the activity recorded by the common reference from the active channels and was performed with the re-referencing function supplied by the MATLAB toolbox FieldTrip (http://www.ru.nl/neuroimaging/fieldtrip). The data was then smoothed with a 70-point smoothing kernel and epoched into 1.5-second-long sequences centered on voice onset. The epochs were subsequently re-shifted to center each segment on the peak value found within that stretch of data, in order to ensure that the whole movement was captured within each 1.5 s sequence. The epochs then underwent trial rejection based on the recorded voice samples. We then calculated the average response for each phoneme along with its standard deviation (STD) to gauge which phonemes elicited the greatest responses and to get an indication of inter-trial variability in the manner of producing the sounds across the ten trials. This step was then repeated for all possible combinations of electrode pairs to identify the two placements that were most informative.
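For illustration, these pre-processing steps condense into a short script. The following is a minimal Python/NumPy sketch (the actual analyses were performed in MATLAB with FieldTrip); the sampling rate, the rectification step before smoothing, and all function and variable names are illustrative assumptions rather than details taken from the text.

import numpy as np
from scipy.ndimage import uniform_filter1d

FS = 1024                   # sampling rate in Hz (assumed; not stated in the text)
HALF = int(1.5 * FS / 2)    # half of the 1.5 s epoch length used in the text

def preprocess_emg(raw, ref, voice_onsets):
    """raw: (channels, samples) active EMG channels; ref: (samples,) common
    reference; voice_onsets: sample index of voice onset for each trial."""
    # Re-reference: subtract the common reference from every active channel
    data = raw - ref[np.newaxis, :]
    # Rectify (assumed) and smooth with a 70-point kernel, as in the text
    data = uniform_filter1d(np.abs(data), size=70, axis=1)
    epochs = []
    for onset in voice_onsets:
        if onset - HALF < 0 or onset + HALF > data.shape[1]:
            continue                                # skip trials at the recording edges
        seg = data[:, onset - HALF:onset + HALF]    # 1.5 s centred on voice onset
        # Re-shift so the epoch is centred on the peak found within the segment
        peak = onset - HALF + int(np.argmax(seg.max(axis=0)))
        peak = int(np.clip(peak, HALF, data.shape[1] - HALF))
        epochs.append(data[:, peak - HALF:peak + HALF])
    return np.stack(epochs)                         # (trials, channels, samples)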

An affinity propagation (AP) clustering analysis was next performed to serve as a measure of which phonemes' EMG patterns were most distinct from each other. Since it was unknown which aspects of a phoneme and its concurrent muscle activity would be optimal for differentiation, a clustering analysis could highlight which aspects of the responses were decisive for telling the phonemes apart. For example, would phonemes that activate relatively few muscle groups be easier to differentiate than phonemes that involve coordinated movements of more effectors? Would plosives, consonants that involve brief blocking of airflow through the vocal tract, be easier to tell apart due to their more explosive muscle movements?

Additionally, since the AP analysis classified each single trial independently, it also served as a measure of inter-trial variability, which could be taken into account along with the indications from the analysis of the averaged EMG responses. This was important since we wished to identify candidate phonemes that consistently exhibited robust muscle activity and differed maximally from each other.

The AP clustering analysis (Frey & Dueck, 2007) takes measures of similarity between pairs of data points (e.g., all EMG samples for each trial) as input and considers all data points as potential exemplars. Exemplars are data points that represent the ideal representative of each cluster and are formally defined as the points for which the sum of squared errors between the data points and their nearest centers is smallest. The potential exemplars are assigned weights which are iteratively updated until the ideal number of exemplars has been identified and each data point has been assigned to one cluster. The number of clusters that the analysis yields can be influenced by the preference value given as a parameter by the user. For data where all data points are equally suitable to serve as exemplars, a shared common value should be employed, the size of which influences the number of clusters identified. Using the median of the input similarities results in a moderate number of clusters, whereas inputting their minimum results in a conservative number of clusters. Since our ambition was to explore which phonemes would be most likely to be successfully classified in the main study (as measured by their tendency to be clustered separately in a consistent manner across subjects), we chose to use the median of the EMG responses as the preference value for the AP clustering analysis.
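As an illustration, the clustering step could be reproduced with scikit-learn's AffinityPropagation. This is a sketch under assumptions: the text does not specify the similarity measure or how trials are turned into feature vectors, so negative squared Euclidean distance between flattened epochs (the choice used by Frey & Dueck, 2007) is assumed here.

import numpy as np
from sklearn.cluster import AffinityPropagation

def ap_cluster(epochs):
    """epochs: (trials, channels, samples) smoothed EMG epochs.
    Returns a cluster label per trial and the exemplar trial of each cluster."""
    X = epochs.reshape(len(epochs), -1)                  # one feature vector per trial
    S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # negative squared distances
    # Preference set to the median similarity gives a moderate number of
    # clusters; the minimum would give a more conservative number (see above)
    ap = AffinityPropagation(affinity="precomputed",
                             preference=np.median(S), random_state=0)
    labels = ap.fit_predict(S)
    return labels, ap.cluster_centers_indices_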

(8)

We used an iterative approach in analyzing the data from the averaged responses and the clustering analysis. The information from the averaged EMG responses was inspected to identify suitable candidate phonemes that evinced a substantial and robust pattern over all trials. These were then compared to the output of the clustering analysis, and the number of candidates was refined based on which candidates were consistently clustered over all ten trials, either as independent clusters or together with other phonemes that did not overlap between the candidates. We then sought to identify the electrode pair that exhibited the largest response for each candidate phoneme by comparing the averaged EMG responses and their STDs. The channels that exhibited distinct responses in all trials for each phoneme along with small STDs were then used as input for a second iteration of the AP clustering analysis. The analysis continued by repeating these two steps until the ideal electrode pair and phonemes were identified (see the sketch below for one way to formalize the channel-pair ranking).
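The informal criterion of distinct responses with small STDs can be made concrete in several ways; the scoring function below is one hypothetical reading, ranking each channel pair by the mean peak response across trials divided by the across-trial STD, averaged over phonemes. It is not the exact procedure used in the study.

import numpy as np
from itertools import combinations

def rank_channel_pairs(epochs, phoneme_labels):
    """epochs: (trials, channels, samples); phoneme_labels: (trials,) array.
    Returns channel pairs sorted from most to least informative under a
    simple peak-to-variability score."""
    peaks = epochs.max(axis=2)                    # peak amplitude per trial and channel
    scores = {}
    for pair in combinations(range(epochs.shape[1]), 2):
        per_phoneme = []
        for ph in np.unique(phoneme_labels):
            p = peaks[phoneme_labels == ph][:, list(pair)]
            per_phoneme.append(p.mean() / (p.std() + 1e-12))
        scores[pair] = float(np.mean(per_phoneme))
    return sorted(scores.items(), key=lambda kv: -kv[1])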

Results

Analysis of the averaged responses led to the identification of electrode channels 2 (see fig. 2), capturing the activity of both the Zygomaticus Major and Minor muscles, whose function in speech is raising the upper lip, and 5, placed over the Mentalis muscle, which is involved in protruding and closing the lips (Ladefoged, 2005). The affinity propagation clustering analysis identified on average 29 clusters (for subjects 1-4: 31, 31, 28, and 27; see fig. 3) with the preference constant set to the median of the EMG activity for each trial and all channels. Re-performing the analysis with only the two electrodes identified through analysis of the averaged EMG responses led to 28 clusters but a significantly lower trial-by-trial accuracy for each phoneme.

The iterative analysis of the averaged EMG responses along with their STD and the clustering analysis resulted in the selection of phonemes j, o, l, and ee.


Figure 2 Representative examples of EMG responses for all ten trials of phoneme j, taken from two electrodes (6 and 8; see fig. 1 for the corresponding placements) that displayed weak responses (upper row) and two electrodes (2 and 5) that exhibited clear task-related responses. Shown in the upper portion of all four plots are the responses for each trial, with vertical lines denoting voice onset times. Plotted in the lower portion are the averaged responses in black, with the STD depicted in grey.


Figure 3 Result of the AP clustering analysis for subject 1. Depicted on the y-axis (and also plotted on the x-axis, though not visualized) is each of the 30 phonemes used in the pilot study. Each green dot represents one good trial, and the cell in which it is positioned along the y-axis corresponds to the phoneme it has been clustered with. Further, each cell contains ten steps, one for each trial per condition, with trial 1 plotted at the bottom of the cell. The height within each cell at which the cluster is plotted represents the trial that served as exemplar for that cluster. The red dot denotes a rejected trial. Encircled in red are the 4 identified phonemes that were selected for the main study: j, o, l, and ee.


Main study

Participants and Procedure

Native Dutch-speaking participants (n = 9, 2 female) were recruited for the main experiment, which was designed to investigate whether BOLD responses from phoneme production could be reliably classified. The study was approved by the ethical committee of the University Medical Center Utrecht; all subjects fulfilled the inclusion criteria established by the METC protocol and signed an informed-consent document. They subsequently underwent a brief training session while seated in front of a computer screen prior to entering the scanner. This was done in order to verify that subjects pronounced the phonemes phonemically (as speech sounds) and not phonetically (as letter sounds) and to familiarize them with the procedure. In the oral instructions given to each subject, emphasis was placed on pronouncing the phonemes in a relaxed manner in order to minimize head movement, yet enunciating as clearly as possible.

The participants were then prepared and fitted with four MRI-compatible EMG electrodes. Two channels recorded the electromyographic activity related to speech production from the Zygomaticus Major/Minor and Mentalis muscles (see fig. 1 for locations), while the other two served as ground and common reference, respectively, and were placed on the mastoid process behind each ear. Based on the outcome of the pilot study, the four phonemes selected as stimuli for the present experiment were: j, o, l, and ee.

Once inside the fMRI scanner, participants lay supine on the scanner bed with foam padding surrounding their heads to fixate their position. The visual cues were projected onto a plastic surface mounted on the head coil, which could be viewed through prism glasses worn by the participants via a mirror set at a 45-degree angle facing the projection surface. Presentation® (v.14.9, Neurobehavioral Systems, www.neurobs.com) was employed for stimulus presentation. Each trial consisted of a visual cue presented for 750 ms, indicating which phoneme was to be pronounced, followed by a central white fixation cross on a black background for the remainder of the trial (i.e., 14.85 s for the slow event-related sequence and a random duration between 2.6 and 18.2 s for the fast event-related sequence). The visual cues contained the letter (two letters in the case of condition ee) corresponding to the target phoneme in white on a black background. The participants were instructed to produce the sound as soon as they saw the visual cue.

For the one participant (8) who performed a different version of the task, with single movements instead of phonemes, the task involved four isolated movements of muscle groups intrinsic to speech production (clenching the jaw, puckering the lips, producing a voiced speech sound (i.e., humming), and touching the upper right and then left row of teeth with the tip of the tongue) in response to a visual cue. The muscle groups involved in these movements (jaw, lips, larynx, and tongue) are used to produce the majority of speech sounds and were selected based on findings in the literature indicating that they are discretely represented over M1 (Tamura et al., 2003; Hesselmann et al., 2004; Takai et al., 2010).


Participant 8 was the last to be scanned, and we hypothesized that these isolated movements would be easier to classify than the phonemes, since phoneme production involves complex interactions between many muscle groups that would be more likely to overlap in their cortical representation. This follows from the findings of core regions representing fine-scale within-limb segments being surrounded by mappings of adjacent, larger body parts (e.g., the distinction between orofacial muscles and muscles controlling neck movement) and yet, importantly, showing a great degree of overlap in their representations (Dechent and Frahm, 2003; Rathelot and Strick, 2006; Plow et al., 2010). Speech-related, highly interacting and coordinated movements would then suffer to a greater extent from representational overlap in M1 and be more difficult to classify correctly based on the BOLD response than movements mainly involving a single speech effector.

Data acquisition

fMRI

Functional data was acquired on a 7T Philips Achieva system with a 32-channel head coil using an EPI sequence (TR/TE = 1300/27 ms; flip angle = 70°; 20 slices; voxel size 1.5 × 1.5 × 1.5 mm). The field of view (FoV) covered the primary motor and sensory cortices of the left hemisphere (see fig. 4). The experiment consisted of one fast event-related sequence with 40 trials per condition and two slow event-related sequences with 10 trials per condition each. The fast event-related sequence comprised 535 volumes with a randomized inter-trial interval (ITI) and overlapping trials ordered according to an interleaved m-sequence (Buracas and Boynton, 2002). The slow event-related sequences comprised 481 volumes with a fixed ITI that allowed the BOLD response to return to baseline between trials. For both sequences the ordering of conditions was mixed throughout the experiment. The main reason for using two different scan sequences was that the fast event-related sequence allowed the collection of many trials while minimizing scan time. The slow event-related data, on the other hand, provided single trials for which the whole time course of the BOLD response was sampled with no overlap between trials.


Three of the subjects (see table 1 for an overview) were scanned with a high-density, multi-element 16-channel surface coil placed within the head coil over the same FoV as that scanned with the 32-channel head coil. The purpose was to evaluate whether the classification rate could be improved through the increase in signal-to-noise ratio (SNR) afforded by the high-density surface coil: smaller coil elements allow for greater SNR, and the high-density surface coil contains elements much smaller than those of the standard 32-channel coil. Since it is placed directly over the head of the participant, it is in effect much closer to the tissue of interest, which also boosts the SNR. In a recent study, Petridou and colleagues (2012) reported a four-fold increase in SNR for the surface coil over a standard 16-channel head coil, and even greater increases when the surface coil was combined with sensitivity encoding (SENSE; Pruessmann et al., 1999), which we employed. While the SNR of 16-channel head coils is lower than that of 32-channel head coils, there is still a significant gain in SNR with the high-density surface coil (Petridou, N., personal communication, August 6, 2012).

EMG

The EMG data was collected using four MRI-compatible electrodes (Precess, InVivo), sampled at a rate of 500 Hz and recorded directly by the software controlling the scan parameters. The Precess system was originally designed for heart-rate measurements during MRI scanning and sends out a gating TTL pulse designed to trigger the scanner. This gating signal is based on fitting the measured activity to known components of the electrocardiogram and thus introduces artifacts into the signal when the system is used for EMG purposes. Unfortunately, this component could not be separated from the actual EMG signal, which rendered any classification of the EMG responses as a measurement of subject performance unreliable and potentially misleading. Trial rejection of the fMRI data based on these EMG measurements was therefore not performed.

Table 1 Overview of the 7 participants scanned for the main study and the one participant (8) who performed single movements.

Figure 4 Field of view for the acquired fMRI scans. The chosen area covers primary motor and sensory cortices of the left hemisphere as well as the premotor area and parts of the frontal and parietal cortex.


Note: Movement correction parameters are expressed as the maximal displacement between two consecutive scans when all scans for the whole session are taken into account. For participant 8 movement parameters are reported for both slow data sets as they were classified individually as test sets.

Data analysis

fMRI

For classification of the BOLD patterns from phoneme production we employed a correlation analysis in which a prototype, a representative example, for each phoneme category was created from unique features taken from a training data set. These prototypes were then compared to single trials taken from a separate test set, and each trial was classified as the prototype to which it was most similar based on a similarity measure.

In this study the data from the fast event-related sequence served as the training set, from which averages of the task-dependent BOLD activation for each phoneme served as prototypes. The main reason for using the fast event-related data for this purpose is that it contained four times as many trials as the slow data set and thus represented a more robust measure of the BOLD activation for each phoneme. The fast data was analyzed using an interleaved m-sequence that generates a design matrix ensuring maximally independent regressors on which to model the haemodynamic response. We then performed a finite impulse response (FIR) analysis, which constructed 20 regressors for each phoneme, successively shifted versions of the stimulus sequence, and reconstructed an estimated BOLD response for each experimental condition based on the correlation between the data and the stimulus events (Gitelman et al., 2003). The major advantage of the FIR analysis is that it allowed for identification of voxels that showed task-dependent changes in activation without making any assumptions about the shape of the BOLD response itself.
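In outline, the FIR model amounts to a design matrix with one stick regressor per condition and per post-onset delay, fitted by ordinary least squares. The sketch below is a generic restatement in Python under assumed variable names, not the study's own MATLAB implementation; forming t-values as beta divided by the residual error follows the description in the next paragraph.

import numpy as np

def fir_design(onsets, n_scans, n_delays=20):
    """onsets: {condition: list of onset indices in scans}. Builds a design
    matrix with one regressor per condition and post-onset delay."""
    conds = sorted(onsets)
    X = np.zeros((n_scans, len(conds) * n_delays))
    for c, cond in enumerate(conds):
        for t in onsets[cond]:
            for d in range(n_delays):
                if t + d < n_scans:
                    X[t + d, c * n_delays + d] = 1.0
    return X, conds

def fit_glm(X, Y):
    """Y: (n_scans, n_voxels). Returns betas and the per-voxel residual
    error from which t-statistics can be formed."""
    beta, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    dof = X.shape[0] - np.linalg.matrix_rank(X)
    sigma = np.sqrt((resid ** 2).sum(axis=0) / dof)
    return beta, sigma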

The resulting beta values for each voxel represented the averaged, estimated BOLD signal amplitude evoked by each of the phonemes. These beta values were then divided by the residual error to obtain t-statistics indicating which voxels displayed a significant change in signal intensity due to phoneme production. Four condition-specific t-maps were created in which task-related activity in the fifth scan of each condition was contrasted with the other scans within the same and all other conditions to identify the voxels that showed task-dependent activity. The fifth scan corresponded to the predicted peak of the BOLD response: with a repetition time of 1.3 s, it covered 5.2-6.5 s (the variability being due to the m-sequence) following stimulus presentation, assuming an approximate reaction time of 0.5 s. Siero and colleagues (2011) demonstrated that the peak of the BOLD response over motor cortex was reached between 4 and 5 s following stimulus presentation on the very same 7T fMRI scanner.


For participant 8, who performed the modified task with isolated muscle movements, a mask was created based on the grey- and white-matter segmentation of the T1 structural scan in order to exclude visible muscle-related BOLD signal outside the skull (i.e., originating from the temples and associated with the jaw-clenching condition).

Pre-processing

All functional MRI data underwent standard pre-processing in SPM8 (Wellcome Trust Centre for Neuroimaging). To address differences in slice-acquisition time across the brain, all scans were slice-time corrected using sinc interpolation with the built-in function in SPM. Subsequently, head movement was corrected by realigning every functional scan of each session to the first scan of the first session. Following motion correction, one subject was excluded from further analysis due to excessive head movement.

Prototypes

To select the features to include in the prototype for each phoneme, the task-dependent voxels from the four t-maps were ordered by statistical significance (t-value). The 3000 voxels per condition that evinced the greatest signal change were then added to each prototype. The locations of these voxels were unconstrained, since we did not wish to make any assumptions about where the most informative voxels for classification might be situated. In some cases the identified voxels overlapped between prototypes.

For participant 8 the prototypes were based on the slow event-related data. Here, a general linear model (GLM) was employed which contained four contrasts that detailed the time points at which events occurred and convolved these with the canonical haemodynamic response function (HRF) supplied by SPM. The resulting beta values were then transformed into t-statistics, which were averaged across the ten trials making up each slow set. This was done to create a representative distribution from which to select the features to include in the prototypes. As for the prototypes created from the fast data sets, the activation at the fifth scan was used to identify voxels with task-related activity. To rule out the possibility that classification based on prototypes created from the slow data set would lead to significantly different results than classification based on prototypes created from the fast data set, classification was re-performed for all participants with two slow data sets. Here, the steps performed for creating the prototypes and the subsequent classification were identical to those described above for participant 8.

Due to the possibility that small movement artifacts would contribute to the classification results, the movement-correction parameters were added to the GLM as additional regressors. Any influence of movement artifacts on the classification would then be reflected in the classification results.


Single-trials

Following pre-processing, the data from the slow event-related sequences was de-trended, i.e., the slow drift caused by the scanner was removed, and normalized by dividing the activation of each voxel by the standard deviation (STD) of its activation over the whole set. The data was then epoched and ordered by condition. Beyond this, no further steps were applied to the single trials prior to classification.
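In array form this preparation is two operations; a minimal sketch follows, assuming a linear detrend (the text says only that the slow scanner drift was removed) and per-voxel normalization by the STD over the whole set.

import numpy as np
from scipy.signal import detrend

def prepare_run(Y):
    """Y: (n_scans, n_voxels) from one slow event-related run. Removes slow
    drift and divides each voxel by the STD of its activation over the set."""
    Y = detrend(Y, axis=0)              # remove slow scanner drift (linear, assumed)
    return Y / (Y.std(axis=0) + 1e-12)  # per-voxel normalization by the STD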

Classification

Each single trial's similarity to the four prototypes was calculated, and the trial was classified as the prototype to which it was most similar. The classification was performed iteratively, taking the most significant voxels into account at the start and gradually adding voxels of lower task-dependent significance. As the similarity measure we used the correlation coefficient between each trial to be classified and each of the four prototypes. By comparing the expected classification (based on the known condition of each trial) with the actual classification, a measure of performance could be established for all trials. To assess whether fewer categories of phonemes could be classified more accurately than all four phonemes, we also applied the analysis to all possible combinations of two and three phonemes. To compare the fit of the estimated BOLD response from the FIR analysis with the standardized HRF supplied by SPM, a linear regression analysis was performed on the training set, fitting the data to a canonical HRF, and the classification was then repeated.
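The classification rule itself is compact. The following is a minimal sketch, assuming trials and prototypes are given as flat voxel vectors and that voxel indices have been pre-sorted by decreasing t-value; all names are illustrative.

import numpy as np

def classify(trials, prototypes, voxel_order, n_voxels):
    """trials: (n_trials, n_total_voxels) single-trial maps; prototypes:
    {label: (n_total_voxels,)} averaged training maps; voxel_order: voxel
    indices sorted by decreasing task-dependent t-value. Each trial gets the
    label of the prototype with the highest Pearson correlation."""
    sel = voxel_order[:n_voxels]
    labels = sorted(prototypes)
    P = np.stack([prototypes[k][sel] for k in labels])
    predicted = []
    for trial in trials:
        r = [np.corrcoef(trial[sel], p)[0, 1] for p in P]
        predicted.append(labels[int(np.argmax(r))])
    return predicted

Accuracy is then the fraction of trials whose predicted label matches the known condition, evaluated as n_voxels grows from the most significant voxels outward.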

To investigate whether the locations of the voxels selected for the four phonemes differed in a systematic manner, we calculated the center of mass (CoM) for each condition. The CoM measure was developed to allow comparison of estimated centers of brain-activation clusters across fMRI, PET, TMS, and EEG studies and is advantageous in that it allows for localization of neighboring and partly overlapping functional areas (Fesl et al., 2008). The CoM calculation took the coordinates, in individual space, of the 150 most significant voxels from each prototype and computed the Euclidean distance between the centers of mass of each pair of phonemes/movements according to the following formula:

Estimated distance between phonemes j and o = √((X_j - X_o)² + (Y_j - Y_o)² + (Z_j - Z_o)²)

where (X_j, Y_j, Z_j) and (X_o, Y_o, Z_o) are the center-of-mass coordinates of phonemes j and o.

These estimated distances were then scaled by the voxel size (1.5 mm) to express them in millimeters and were subsequently compared across subjects and between the two tasks of phoneme production and single movements.
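A sketch of this computation under the stated parameters (150 voxels per condition, 1.5 mm isotropic voxels) follows; that the input coordinates are in voxel units and are scaled to millimeters at the end is an assumption, as the text states only that distances were normalized by the voxel size.

import numpy as np
from itertools import combinations

VOXEL_MM = 1.5   # isotropic voxel size of the acquisition

def com_distances(coords):
    """coords: {condition: (150, 3) array of voxel coordinates of the most
    significant voxels}. Returns the pairwise Euclidean distance between the
    centers of mass of all conditions, in millimeters."""
    com = {k: v.mean(axis=0) for k, v in coords.items()}
    return {(a, b): float(np.linalg.norm(com[a] - com[b]) * VOXEL_MM)
            for a, b in combinations(sorted(com), 2)}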

Results

Eight participants were included in the analysis, of whom seven performed phoneme production and one performed single movements. The results of the classification can be seen in table 2. Since classification of each single trial necessarily resulted in one selected match out of the four templates, the chance level was 25% for all trials. For the classification of the phonemes, the maximal accuracy for the 7 participants was highly variable, ranging from 35% to 80% over all trials. The average maximal classification accuracy for all subjects and all data sets reached 49.5% (STD 12.59%). Adding a linear regression analysis that fitted the canonical HRF used by SPM8 to the test data resulted in a variable change in classification accuracy across subjects, while the average accuracy over all subjects remained almost unchanged at 52.5% (STD 10.12%). The averaged mean classification over 1500 voxels for each data set was 38.5% (STD 13.25%) for the participants performing phoneme production.

For the classification of single movements, maximal accuracy reached 100% for one of the data sets and 95% for the other, with mean classifications of 96% and 92.5% when considering classification over 1500 voxels.

Figure 5 depicts the classification results for the 7 participants who performed phoneme production as an increasing number of voxels was used for classification. Maximal classification rates were obtained using between 50 and 900 voxels. To identify whether the activations of some phonemes were consistently confused, we plotted the classification results as confusion matrices. Shown in figure 6 are confusion matrices for the classification of all data sets, where ideal classification is represented by a perfect diagonal in which each prototype is selected for all ten trials of its phoneme.

Reducing the number of phonemes to be classified to three led to classification accuracies ranging from 34% to 77.5% with a chance level of 33.3% (see table 3). The average maximal accuracy was between 47.5% and 55.5% for the four possible combinations of three phonemes, while the average maximal classification rate over all participants and combinations was 50.75% (STD 3.47%). Further reducing the number of phonemes to differentiation between all combinations of two phonemes resulted in classification accuracies ranging from chance level, at 50%, to 95%. For each participant there was at least one pair of phonemes that could be classified with an accuracy of 70%, and for five of the participants the accuracy was higher than 80% for at least one pair. Differentiation between phonemes j and l, and between l and ee, reached mean accuracies of 72% and 78%, respectively. The average maximal classification was 70.16% (STD 5.68%) over all six possible combinations of two phonemes. For participant 8, classification of two and three single movements reached maximal accuracies of 100% and 97.5%, respectively, for the two training sets.

The results of the center-of-mass (CoM) calculation can be seen in figure 9, which illustrates the estimated distances between the centers of activation for all conditions. The estimated relative locations of the voxels for each prototype are clearly variable across participants but on average less than 3 mm apart for all four phonemes, corresponding to the length of two voxels (1.5 mm isotropic resolution). There is a considerable increase in distance for three of the CoM distances for single movements (participant 8), namely between the jaw and voiced conditions, the tongue and voiced conditions, and the lips and voiced conditions.


Analysis of the movement-correction parameters revealed that movement between consecutive scans was on average 0.118 (STD 0.038), 0.187 (STD 0.09), and 0.274 (STD 0.16) mm of translation, and 0.005 (STD 0.002), 0.002 (STD 0.001), and 0.003 (STD 0.003) degrees of rotation (see table 1). For six of the participants the maximal displacement between the first and last scan was less than 1.5 mm and 1.5 degrees. For three participants the maximal displacement in one translational or rotational dimension was around 2 mm/degrees.

The results of re-performing the classification using one of the slow data sets as training set and the other as test set are displayed in table 4. The new classification led to maximal accuracies ranging from 32.5% to 60% with a chance level of 25%. The average maximal classification accuracy for the 4 participants reached 40% (STD 12.42%).

Table 2 Classification accuracy in percentages for all 8 subjects. Results in parentheses denote performance with inclusion of a linear regression to the canonical HRF in SPM.

Note: Mean classification represents the performance when 1500 voxels were included in the analysis. The accuracy scores reported for participant 8 correspond to when that set has been used as test set and the other used as training set.



Figure 5 Classification accuracy for each data set and participant as an increasing number of voxels is taken into account. Drawn on the y-axis of each plot is the classification accuracy, from 0 to 1. Drawn on the x-axis is the number of voxels taken into account, ranging from 4 to 1800.


Figure 6 Confusion matrices from classification of the four phonemes for all data sets where the conditions j, o, l, and ee are plotted from top to bottom on the y-axis and left to right on the x-axis. Each column in each plot represents the correct phoneme to be classified and each row represents the resulting classification. The prototype that represents the most similar match for each trial is conveyed through brighter intensity values. Perfect performance would lead to a diagonal of entirely white cells.


Figure 8 Correlation matrices for two participants, presented to indicate the degree of similarity between prototypes, data sets, and single trials. Brighter cells denote higher r-values. Shown in figure 8a are the correlation matrices for data from participant 4 and in figure 8b from participant 8. Upper left: Correlation between each of the four prototypes and the other three. Upper right: Correlation between the averaged responses for the ten trials of each condition in the test set. Lower left: Correlation between the four prototypes and the averaged responses from the data in the test set. Lower right: Correlation between each of the single trials from the test set and all the others.



Table 3 Classification accuracy in percentages for all combinations of two and three categories of phonemes for participants 1-7.

Note: Mean accuracies in parentheses reflect the averaged maximal classification rate for the participants with two slow data sets.


[Figure 9: estimated distance (mm) between each phoneme/movement pair (J/Jaw vs O/Lips, J/Jaw vs L/Tongue, J/Jaw vs EE/Voiced, O/Lips vs L/Tongue, O/Lips vs EE/Voiced, L/Tongue vs EE/Voiced), plotted per participant (1-8).]

Figure 9 Estimated distances between centers of mass for all phonemes and single movements based on the 150 most significant voxels selected from the test set and used for classification.

Discussion

This study evaluated whether the neuronal activation over primary motor (M1) and sensory (S1) cortex associated with the production of phonemes can be used as a multidimensional control signal for a brain-computer interface (BCI). The main requirement for such a flexible control signal is that its underlying brain states can be reliably differentiated and correctly identified. To explore this issue we performed an fMRI experiment in which the BOLD activation of single trials of four speech sounds was classified using a basic correlation analysis. The single trials were compared to averaged representations of the production of each phoneme, and the match that yielded the highest correlation was chosen as the classification label. While classification of the four phonemes gave moderate results, classification of single movements of speech-related muscles was near-perfect. In the following discussion the results of the study are considered in light of our predictions and their theoretical rationale. The potential reasons for the reported difference in performance between distinguishing phonemes and distinguishing single movements are also discussed. We conclude by assessing the suitability of phoneme production as a multi-state control signal for BCI and by providing recommendations for the future direction of the project.


Classification of four phonemes

For classification of the four phonemes, our approach reached an average classification accuracy of 49.5% (chance level = 25%). Across participants, performance ranged from 35% to 80%. The acquired data allowed for individual classification of two separate data sets for four of the participants, and the difference in accuracy between the two sets was on average 15.6%, although it reached 37.5% for participant 2. The neural activation associated with each of the four categories was thus quite variable. These classification results were reached by including between 50 and 900 voxels in the comparisons, and classification generally remained stable for each data set up to the inclusion of 1800 voxels (fig. 5). This is further corroborated by the averaged mean classification of 38.5% over participants when 1500 voxels were included in the analysis (table 2). That the addition of a linear regression, which fitted the data to a canonical haemodynamic response function, did not improve classification performance indicates that the estimated BOLD response supplied by the finite impulse response analysis successfully captured the task-dependent activity of each data set.

While the results we obtained are clearly above chance, they fall far short of what would be required of a control signal that may represent the only means of communication for the target patient groups of an effective BCI system. A number of additional analyses were therefore performed to investigate why the individual phonemes could not be classified to a higher degree.

We first sought to elucidate whether some of the phonemes could be more easily differentiated than others by reducing the number of conditions from four to three. If certain speech sounds were easier or more difficult to distinguish, this would be reflected in the classification of combinations of a reduced number of categories; that is, certain combinations of phonemes would lead to better or worse performance than others. While overall performance increased slightly for classification of three speech sounds, average maximal performance for all participants remained essentially unchanged at ~51% (chance level 33%). The responses associated with each phoneme were thus equally difficult to distinguish at this level, and the number of conditions to be classified was further reduced to two phonemes. While at least one combination of two phonemes could be told apart at a rate of 80% or higher for five of the participants, the average classification accuracy remained virtually unchanged at 70% when the chance level of 50% was taken into account.

These results are interesting since they demonstrate that the limited classification accuracies were not a result of certain speech sounds being consistently misidentified as others. In general, participants with lower accuracies for all four phonemes also showed lower accuracies for distinguishing between two and three phonemes. This relationship is particularly clear for participants 5 and 6 (see table 3), whose performance was barely above chance level on all three levels of comparison. Performance reached 35% and 47.5% (chance level 25%).

That classification performance did not increase significantly for three instead of four categories, and only moderately for two, contrasts with findings from a similar study classifying the BOLD response of hand gestures over the same areas as in the present study (Bleichner et al., in preparation). In that study, reducing the number of categories to be classified led to robust increases in classification performance; this is to be expected if some of the prototypes are less unique with respect to the others. Our results indicate that the prototype for phoneme j was generally easier, and the prototype for o more difficult, to classify across participants in comparison to the others. This is reflected in the performance on classification of two and three categories of phonemes for the combinations that included these two speech sounds (table 3), as well as in the confusion matrices generated from the classification of all four phonemes (fig. 6). The confusion matrices clearly indicate that j was mistaken considerably less often for the other phonemes for the majority of participants, while the opposite is true for o.

The four phonemes were selected based on the results of a pilot study in which participants produced the Dutch alphabet while their speech-related muscle activity was recorded. The four speech sounds that were most dissimilar to each other were selected for use in the main study. The chosen phonemes are all voiced, meaning that they are produced with laryngeal activation, and they overlap with respect to the other speech effectors, i.e., the unique speech-related muscle groups they recruit. For example, the phonemes j and l involve tongue movements, while o and ee depend on changing the positions of the lips. That all four are voiced is surprising, since surface electromyographic (EMG) measurements of the larynx are difficult to perform given that the relevant muscles are embedded relatively deep within the vocal tract. Further, neither of the two electrodes placed over this area for the purpose of measuring laryngeal activation was maximally informative (see fig. 1 for the selected electrode placements). The pair of electrode placements that did provide the most information for differentiation based on EMG activity lay over muscle groups whose function involves modulating the positions of the lips (Ladefoged, 2005). None of the phonemes is a plosive, a class of consonants that depend on brief closure of the vocal tract and fast changes in the positions of the lips or jaw, which would hypothetically lead to a more distinct muscle response. The assumption that the classifiability of speech-related muscle activity would be predictive of the ability to classify speech-related haemodynamic activity over M1 and S1 may therefore be questionable. To the best of our knowledge, no previous studies have investigated this, and, indeed, the results of the present study speak against it.

Additionally, there was a high degree of variability between the data sets for those participants for whom independent classification was performed on two data sets. The prototypes for each complex movement, as well as the averaged responses from the single trials in the test set, seemed distinguishable for most participants based on the calculated correlation matrices (see fig. 8a for a representative example). Nevertheless, the similarity between prototypes and averaged single-trial responses, and the similarity between single trials from the same category, were both low. The BOLD activation was hence highly variable over trials and between data sets.

This raises the question of whether the manner in which participants produced the phonemes varied to a high extent. The same speech sound can be produced via many different motor actions, and studies of trial-by-trial variability of the main speech articulators, such as the upper lip, lower lip, and jaw, indicate that the motor schemas for language production are not formulated in terms of spatial coordinates but rather focused on the acoustic consequences of the movements (Gracco and Abbs, 1986). This would imply that the involvement of different speech muscles may have changed between trials and therefore led to variations in the distribution of the BOLD response over M1 and S1. Important to note, however, is that these findings concern the production of whole words and not individual phonemes, as in the present study. That coarticulation of speech sounds is characterized by a lack of invariance, even within subjects, is a well-known phenomenon in psycholinguistics (Harley, 2007). The lack of invariance has been explained by context-induced variation from the surrounding phonemes within a word and by differences in speech conditions where, for example, the speech tempo influences the degree of variance (Appelbaum, 1996). One would therefore not expect the lack of invariance to apply to single phonemes produced in a repetitive manner. Further, considering that participants were explicitly instructed and underwent a practice session prior to data collection, the variability in the involvement of the articulators was likely relatively restricted.

Another potential factor that might have contributed to the limited success of the classification is motion. For three of the participants the displacement between the first and last scans of the session reached 2 mm/degrees. Upon inspection of the applied motion correction, however, it was clear that the movement took the form of either slow, continuous drift throughout the whole session or jumps between scanning sequences (i.e., data sets). This motion should therefore have been dealt with during pre-processing and have affected classification minimally. Still, smaller speech-related movements present throughout the session might have caused displacements that affected the results. These speech-related movements were identified as a potential problem from the outset, and while precautions were taken (e.g., explicit instructions to the participants to lie still and the use of generous amounts of foam padding to fixate their heads), it cannot be ruled out that residual motion had an impact on the results.

Increasing the signal-to-noise ratio (SNR) of the data acquisition did not result in higher classification accuracy. This was contrary to the expectation that a higher SNR would improve the ability to identify task-dependent information conveyed by the BOLD response. It is demonstrated by the classification performance of participants 5, 6, and 7 (table 2), who were scanned using a high-density surface coil, shown to increase SNR (Petridou et al., 2012), compared to the other participants, who were scanned with a 32-channel head coil. Although the results obtained with the high-density surface coil are clearly below those obtained with the head coil, a potential confound of coil movement should be addressed. During the experiments, the surface coil was placed directly over the homologous cortical area to that scanned with the head coil and fixated with straps and foam padding. Despite this, it is likely that each participant's small, speech-related movements slightly shifted the position of the surface coil, thereby abolishing the benefit of the increased SNR for classification purposes. The high-density surface coil is of in-house design and is primarily employed for high-resolution scanning of the visual areas in the occipital cortex, where the coil is placed under the back of the head.


Functional somatotopy and classification of single movements

The long-standing issue of the degree of fine-scale somatotopy over M1 for within-limb segments remains unresolved. An increasing amount of data, however, supports the view of a functional somatotopy for within-limb segments (Kleinschmidt et al., 1997; Dechent and Frahm, 2003; Plow et al., 2010). This functional somatotopy has been proposed to be organized such that individual muscle groups, for example the muscles controlling the thumb, are mapped over the upper-limb representation in a manner that overlaps with the maps of adjacent muscle groups, such as those controlling the index finger. This interpretation is further supported by microstimulation findings, where cortico-motoneuronal (CM) cells projecting to individual muscles have been shown to be widely dispersed over a large area and to overlap with the CM cells of adjacent muscle groups (Huntley and Jones, 1991; Donoghue et al., 1992; Rathelot and Strick, 2006).

Given the results of the phoneme classification, we hypothesized that the complex movements elicited overlapping patterns of activation that were too broad to be successfully differentiated from each other. Gradients in the cortical representation of within-limb segments have been reported to appear when location markers, such as centers of mass, of the activation of adjacent muscle groups are contrasted (Plow et al., 2010). This led us to suspect that single movements of the speech effectors would improve classification performance, since the centers of activation for each movement should lie further apart.

To test this hypothesis we had one participant (8) perform four single speech-related movements and applied the same classification analysis. The results were 100% classification accuracy for one data set and 95% for the other (table 2). The prototypes for each movement were not only clearly distinguishable from each other, but also highly similar across the two data sets (fig. 8b). As expected, reducing the number of movements to be classified led to a slight increase in performance for triplets (100% for one data set and 97.5% (SD 2.41%) for the other) and to perfect accuracy for all combinations of pairs in both data sets. The data for single movements thus differed greatly from the data for the phonemes.

Calculation of the centers of mass (CoM) for each category revealed that the activation centers for participant 8 were, on average, situated at a greater distance from each other than those of the different phonemes (fig. 7). This increase in distance between categories was not uniform, however, but resulted mainly from one of the movements, laryngeal activation, being situated considerably farther (>7 mm) from the other muscle groups. This result is in line with recent studies examining the location of the laryngeal representation in M1 (Brown et al., 2008; Brown et al., 2009), which report it to lie ventral to the representation of the lips, which in turn lies ventral to the jaw and tongue representations. Contrasting findings have been reported, though, in which the laryngeal area is surrounded by the representations of the lips and jaw (Simonyan and Horwitz, 2011). Although our results correspond to the findings of Brown and colleagues, it should be emphasized that these data come from a single participant and should therefore be interpreted with caution.
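
For illustration, the distance between two activation centers of mass can be computed as in the sketch below, using SciPy; the maps, threshold, and blob positions are toy stand-ins, with the voxel size set to the 1.5 mm of the present acquisition.

```python
import numpy as np
from scipy.ndimage import center_of_mass

def com_distance_mm(map_a, map_b, voxel_size=(1.5, 1.5, 1.5), threshold=0.0):
    """Euclidean distance (mm) between the centers of mass of two 3-D
    statistical maps, each thresholded and weighted by its values."""
    a = np.where(map_a > threshold, map_a, 0.0)
    b = np.where(map_b > threshold, map_b, 0.0)
    delta = np.array(center_of_mass(a)) - np.array(center_of_mass(b))
    return np.linalg.norm(delta * np.asarray(voxel_size))

# Toy maps: two blobs offset by five voxels along x (7.5 mm at 1.5 mm iso).
vol_a = np.zeros((64, 64, 40)); vol_a[30:34, 30:34, 20:24] = 1.0
vol_b = np.zeros((64, 64, 40)); vol_b[35:39, 30:34, 20:24] = 1.0
print(com_distance_mm(vol_a, vol_b))  # ~7.5
```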

An increased distance between the centers of activation for phonemes was not, however, indicative of a higher degree of distinguishability. This can be illustrated by the results for participant 1, whose center of activation for the phoneme ee was situated farthest from the rest yet did not exhibit any increased classifiability. Since the selection of voxels for inclusion in the prototypes was unrestricted (discussed further below), these estimates of the activation centers for the different phonemes/movements will have resulted from averaging over voxels that were widely distributed over a large portion of the left hemisphere. While the largest task-related activation can be assumed to originate from the primary motor and sensory cortices, it cannot be excluded that other language-related foci, for instance Broca's area in the inferior frontal gyrus, also showed sizeable activation. This complicates a clear interpretation of the CoM results with regard to the organization of M1. Although the results of the present study support the idea of a functional somatotopy, the experiment was not designed to test this, and any such conclusions remain tentative.

A phonotopic organization, in which each phoneme is represented discretely by a different neural ensemble, has been identified for the auditory cortex (Blakely et al., 2008) and has been suggested to exist for M1 as well (Guenther et al., 2006; Pulvermüller et al., 2006). Previous studies seeking to classify phoneme production over M1 have shown a considerable degree of success using classification of either local field potentials or extracellular action potentials (Guenther et al., 2009; Kellis et al., 2010; Brumberg et al., 2011). These measurements correspond to a spatial resolution several orders of magnitude finer than that possible with high-resolution fMRI (1.5 mm3 in the present study). It might therefore be the case that the representation of the articulatory gestures of phonemes is too fine-scale to capture in the haemodynamic dimension. An argument against this interpretation comes from a study similar to the present one, in which researchers (Bleichner et al., in preparation) employed the same fMRI scanner and the same paradigm to decode complex hand gestures. That study reports that activation was confined to the hand representation of M1 and that the contribution of single digits to each movement was variable. With single-subject classification performance reaching 95% and averaging 68%, these findings argue against articulatory gestures being inaccessible with 7T fMRI. The hand movements employed were complex in terms of the contribution of several adjacent muscle groups, similar to the phonemes, yet each gesture could be identified with high accuracy. We can only speculate as to why this difference between decoding speech and hand gestures should exist. One factor that probably played a part is the overlap in the contribution of individual articulators inherent to the production of the four phonemes. As previously mentioned, each phoneme involved laryngeal activation and at least some degree of movement of the tongue, lips, and jaw. Using four phonemes selected instead for the isolated contribution of few, non-overlapping speech muscles might therefore have facilitated decoding. On the other hand, all speech sounds are produced by a complex interaction between these muscle groups to a greater or lesser degree, and no set of four phonemes shares no overlap at all. In summary, with the goal of identifying a suitable control signal for BCIs aimed at patient groups, phoneme production as performed in normal speech does not appear distinctive enough to allow reliable differentiation.


Employing a supervised learning algorithm such as a support-vector machine (SVM), which has been shown to successfully classify phoneme production over M1 (Brumberg et al., 2011), albeit with data from intracortical measurements, could very well lead to improved performance. Because it averages the task-dependent activity, the correlation analysis is vulnerable to a small number of voxels causing the means to be unrepresentative of the data as a whole. In the event of even moderate trial-by-trial variability in the manner of task execution, as discussed above, a technique like the SVM, which does not assume a specific model of the data for each category but instead seeks to maximize the margin between categories, might prove superior.

In a comparison between classifiers reported by Ku and colleagues (2008) on high-field (7T) data from object processing, SVM and correlation analysis performed comparably over the majority of the categories to be distinguished. However, both approaches improved significantly when outliers in the data were removed prior to feature selection, and the SVM then performed better than the correlation analysis on average. Thus, including outlier removal and comparing the results of an SVM with the current approach might prove beneficial for increasing the classification performance for phoneme production.
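
As a sketch of what such a comparison could look like, the snippet below pairs a simple z-score-based trial rejection (an illustrative stand-in for the outlier removal of Ku and colleagues, whose exact criterion is not reproduced here) with scikit-learn's linear SVM; the toy data carry no signal and serve only to show the pipeline.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

def drop_outlier_trials(X, y, z_max=3.0):
    """Discard trials whose mean absolute z-score across voxels is
    extreme; a simple stand-in for outlier removal."""
    z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    keep = np.abs(z).mean(axis=1) < z_max
    return X[keep], y[keep]

# Toy data: trials x voxels with four phoneme labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 500))
y = np.repeat(["j", "l", "o", "ee"], 20)

X_clean, y_clean = drop_outlier_trials(X, y)
svm = make_pipeline(StandardScaler(), LinearSVC())
print(cross_val_score(svm, X_clean, y_clean, cv=5).mean())
```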

Lastly, the origin of the voxels selected for classification was not constrained in any way in this study, unlike in other research that has successfully classified BOLD responses for BCI purposes (Andersson et al., 2012; Sorger et al., 2012). Limiting the area over which task-dependent voxels are selected, by defining a region of interest (ROI) centered on the primary motor and sensory cortices, could reduce the influence of language-related but category-nonspecific activation that otherwise decreases the uniqueness of the prototypes. Incorporating an ROI analysis would arguably also be more informative with respect to what the data can tell us about the organizational principles of M1 for complex mouth movements.
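
A minimal sketch of such an ROI restriction with nibabel is given below; both file names are hypothetical, and the mask is assumed to be a binary volume aligned with the functional data.

```python
import nibabel as nib

# Hypothetical inputs: a 4-D functional run and a binary sensorimotor mask.
func = nib.load("func_session1.nii.gz").get_fdata()  # x, y, z, time
roi = nib.load("m1_s1_mask.nii.gz").get_fdata() > 0  # boolean 3-D mask

# Keep only ROI voxels: a (time x voxels) matrix for classification.
roi_timeseries = func[roi].T
print(roi_timeseries.shape)
```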

Conclusion

Based on the findings of the present study, further evaluation of single movements as a potential multidimensional control signal for BCI is recommended. The results indicate that the cortical representations of isolated movements are stable and unique. The limited success of phoneme classification with the current approach does not warrant further data collection. However, employing supervised learning models, such as a support-vector machine, could lead to greater performance, in line with recent findings on similar data acquired with intracortical electrodes. A further approach that should be examined is restricting the analysis to the primary motor and sensory cortices, which would also reveal more about the representation of phonemes over primary motor cortex.

References

Andersson, P., Ramsey, N. F., Raemaekers, M., Viergever, M. A., and Pluim, J. P. W. (2012) Real-time decoding of the direction of covert visuospatial attention. Journal of Neural Engineering.

Appelbaum, I. (1996) The lack of invariance problem and the goal of speech perception. IEEE Spoken Language ICSLP 96 Proceedings 3: 1541-1544. DOI: 10.1109/ICSLP.1996.607912.

Blakely, T., Miller, K. J., Rao, R. P. N., Holmes, M. D., and Ojemann, J. G. (2008) Localization and classification of phonemes using high spatial resolution electrocorticography (ECoG) grids. IEEE Engineering in Medicine and Biology Society: 4964-4967.

Brown, S., Ngan, E., and Liotti, M. (2008) A larynx area in the human motor cortex. Cerebral Cortex 18: 837-845.

Brown, S., Laird, A. R., Pfordresher, P. Q., Thelen, S. M., Turkeltaub, P., and Liotti, M. (2009) The somatotopy of speech: phonation and articulation in the human motor cortex. Brain and Cognition 70: 31-41.

Brumberg, J. S., Wright, J. E., Andreasen, D. S., Guenther, F. H., and Kennedy, P. R. (2011) Classification of intended phoneme production from chronic intracortical microelectrode recordings in speech-motor cortex. Frontiers in Neuroscience 5: 65. DOI: 10.3389/fnins.2011.00065.

Buracas, G. T. and Boynton, G. M. (2002) Efficient design of event-related fMRI experiments using m-sequences. NeuroImage 16: 801-813.

Cho-Hisamoto, Y., Kojima, K., Brown, E. C., Matsuzaki, N., and Asano, E. (2012) Cooing- and babbling-related gamma-oscillations during infancy: intracranial recording. Epilepsy and Behavior 23: 494-496.

Cramer, S. C., Weisskoff, R. M., Schaechter, J. D., Nelles, G., Foley, M., Finklestein, S. P., and Rosen, B. R. (2002) Motor cortex activation is related to force of squeezing. Human Brain Mapping 16: 197-205.

Dechent, P. and Frahm, J. (2003) Functional somatotopy of finger representations in human primary motor cortex. Human Brain Mapping 18: 272-283.

Donoghue, J. P., Leibovic, S., and Sanes, J. N. (1992) Organization of the forelimb area in squirrel monkey motor cortex: representation of digit, wrist, and elbow muscles. Experimental Brain Research 89: 1-19.

Fesl, G., Braun, B., Rau, S., Wiesmann, M., Ruge, M., Bruhns, P., Linn, J., Stephan, T., Ilmberger, J., Tonn, J. C., and Brückmann, H. (2008) Is the center of mass (CoM) a reliable parameter for the localization of brain function in fMRI? European Radiology 18: 1031-1037.
