Computer Methods and Programs in Biomedicine


Citation/Reference Moeyersons J., Elena Smets, John Morales, Amalia Villa, Walter De Raedt, Dries Testelmans, Bertien Buyse, Chris Van Hoof, Rik Willems, Sabine Van Huffel, Carolina Varon (2019),

Artefact detection and quality assessment of ambulatory ECG signals. Computer Methods and Programs in Biomedicine, 182, 105050.

Archived version The content is identical to the content of the published paper

Published version https://reader.elsevier.com/reader/sd/pii/S0169260719312817?token=E5CF1 EEEC7D3F12ABF79478DF27A6CB62FD0184C656C40C76A4EDFC0C9574 02F7E20DFDFC7552E3C2D40A0D93B57D5FB

Journal homepage https://www.sciencedirect.com/journal/computer-methods-and-programs-in-biomedicine

Author contact Jonathan.moeyersons@kuleuven.be 0479427028

Abstract

IR Na.


Contents lists available at ScienceDirect

Computer Methods and Programs in Biomedicine

journal homepage: www.elsevier.com/locate/cmpb

Artefact detection and quality assessment of ambulatory ECG signals

Jonathan Moeyersons a,*, Elena Smets b, John Morales a, Amalia Villa a, Walter De Raedt b, Dries Testelmans c, Bertien Buyse c, Chris Van Hoof a,b, Rik Willems d, Sabine Van Huffel a, Carolina Varon a

a Department of Electrical Engineering, KU Leuven, B-3001 Leuven, Belgium

b imec, B-3001 Leuven, Belgium

c Department of Chronic Diseases, Metabolism and Ageing, KU Leuven, B-3001 Leuven, Belgium

d Department of Cardiovascular Sciences, KU Leuven, B-3001 Leuven, Belgium

Article info

Article history:

Received 1 August 2019
Revised 23 August 2019
Accepted 23 August 2019

Keywords:

ECG

Ambulatory monitoring

Artefacts

Quality assessment

Abstract

Background and Objectives: The presence of noise sources could reduce the diagnostic capability of the ECG signal and result in inappropriate treatment decisions. To mitigate this problem, automated algorithms to detect artefacts and quantify the quality of the recorded signal are needed. In this study we present an automated method for the detection of artefacts and quantification of the signal quality. The suggested methodology extracts descriptive features from the autocorrelation function and feeds these to a RUSBoost classifier. The posterior probability of the clean class is used to create a continuous signal quality assessment index. Firstly, the robustness of the proposed algorithm is investigated and secondly, the novel signal quality assessment index is evaluated.

Methods: Data were used from three different studies: a Sleep study, the PhysioNet 2017 Challenge and a Stress study. Binary labels, clean or contaminated, were available from different annotators with experience in ECG analysis. Two types of realistic ECG noise from the MIT-BIH Noise Stress Test Database (NSTDB) were added to the Sleep study to test the quality index. Firstly, the model was trained on the Sleep dataset and subsequently tested on a subset of the other two datasets. Secondly, all recording conditions were taken into account by training the model on a subset derived from the three datasets. Lastly, the posterior probabilities of the model for the different levels of agreement between the annotators were compared.

Results: AUC values between 0.988 and 1.000 were obtained when training the model on the Sleep dataset. These results were further improved when training on the three datasets and thus taking all recording conditions into account. A Pearson correlation coefficient of 0.8131 was observed between the score of the clean class and the level of agreement. Additionally, significant quality decreases per noise level for both types of added noise were observed.

Conclusions: The main novelty of this study is the new approach to ECG signal quality assessment based on the posterior clean class probability of the classifier.

© 2019 The Authors. Published by Elsevier B.V.

This is an open access article under the CC BY-NC-ND license.

(http://creativecommons.org/licenses/by-nc-nd/4.0/)

Corresponding author.

E-mail addresses: jonathan.moeyersons@esat.kuleuven.be (J. Moeyersons), elena.smets@imec.be (E. Smets), john.morales@esat.kuleuven.be (J. Morales), amalia.villagomez@esat.kuleuven.be (A. Villa), walter.deraedt@imec.be (W. De Raedt), dries.testelmans@uzleuven.be (D. Testelmans), bertien.buyse@uzleuven.be (B. Buyse), chris.vanhoof@imec.be (C. Van Hoof), rik.willems@kuleuven.be (R. Willems), sabine.vanhuffel@esat.kuleuven.be (S. Van Huffel), carolina.varon@esat.kuleuven.be (C. Varon).

1. Introduction

The electrocardiogram (ECG) is one of the primary screening tools of the cardiologist. It measures the electrical activity of the heart during a short period of time to obtain a better understanding of its functioning. During this screening, the subject is requested to lie still in a supine position to avoid possible signal distortions.

Despite high detection rates, some problems might remain unnoticed due to the limited intermittent sampling and lack of cardiac stress during the measurement. Therefore, the patient is often equipped with an ambulatory monitoring device [1].

https://doi.org/10.1016/j.cmpb.2019.105050

0169-2607/© 2019 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license. ( http://creativecommons.org/licenses/by-nc-nd/4.0/ )


Ambulatory devices allow patients to be monitored during daily life, instead of in a protected hospital environment. This results in a drastic increase of the detection window and, thereby, the likelihood of identifying dysfunctions. However, taking the recording procedure out of the hospital increases the exposure to noise.

Noise can originate from a variety of sources, such as the power line, muscle activity or electrode movement. All of these sources affect the recording in a different way. For instance, muscle activity and power line interference respectively cause abrupt and continuous alterations of the signal. All these alterations can be categorized as artefacts. Since artefacts could profoundly reduce the diagnostic quality of the recording, it is important to deal with them accordingly. Two distinct approaches can be used to handle artefacts: ECG denoising and assessment of the acceptability of the recorded ECG [2]. This study will focus on the latter approach.

Silva et al. stated that: "Rigorous quality control is essential for accurate diagnosis, since alterations during the actual recording might result in inappropriate treatment decisions" [3]. To mitigate this problem, automated algorithms are needed to detect artefacts and to quantify the quality of the recorded signal [4]. Two closely related approaches can be distinguished to assess the acceptability of the recorded ECG: detection and quantification. The first approach consists of a simple binary, clean or contaminated, classification. One example of where this might be used is at the front end of the acquisition process. The user could be provided with rapid, binary feedback and, if required, make adjustments in the recording set-up or, worst case scenario, re-start the recording.

A variety of signal quality indices and algorithms were proposed in line with this approach as a result of the PhysioNet/Computing in Cardiology (CinC) challenge of 2011 [5–8]. The challenge aimed at encouraging the development of software for mobile phones by recording an ECG and providing useful feedback about its quality [3]. The best performing algorithm was developed by Xia et al. [5]. In line with other proposed methods, their algorithm consists of multiple stages, namely: flat line detection, missing channel identification and auto- and cross-correlation thresholding. Amplitude features, such as minimum and maximum amplitude or range and differences with previous samples, were also frequently used. However, due to different saturation levels between recording devices, these features restrict the usability of the algorithms.

The rationale for the second approach, quality quantification, is that different study objectives require different quality levels. For instance, heart rate variability (HRV) studies do not require the same, high, quality level as T-wave alternans or beat classification studies. This prompted a number of authors to propose a multi-level quantification of the signal quality [9–11]. For example, Li et al. proposed a five level signal quality classification algorithm which divided the signals in five bins: clean, minor noise, moderate noise, severe noise and extreme noise [11].

In this study, we propose a novel method for the detection of artefacts and quantification of the signal quality. The autocorrelation function (ACF) is used to characterize the ECG signal, since it facilitates the separation of clean and contaminated segments [12]. From the ACF, descriptive features are extracted and fed to a RUSBoost classifier. We propose to use the posterior probability of the clean class as an indication of the quality of the signal.

The first objective of this study is to evaluate the robustness of the proposed artefact detection method by training it on one dataset and testing it on two independent datasets. The second objective is to evaluate the novel signal quality index by comparing it to the agreement of the annotators and the level of noise.

2. Methods

2.1. Data

ECG recordings were used from three different studies: (1) a sleep study conducted by the University Hospitals Leuven (UZ Leuven), Belgium, (2) the PhysioNet/Computing in Cardiology (CinC) Challenge of 2017 and (3) a stress study conducted by imec.

2.1.1. Sleep dataset

This dataset was intended for sleep apnea classification, but it has been previously used for artefact detection in [12]. All signals were reviewed for artefacts and annotated by a medical doctor with experience in interpreting polysomnographic signals, including ECG.

The dataset was recorded in the sleep laboratory of the UZ Leuven and consists of 16 single-lead (lead II) ECG recordings, originating from 16 different patients. A total of 152 hours and 12 minutes of signal was acquired with a sampling frequency of 200 Hz.

In sleep apnea research, it is common practice to first divide the recordings in one-minute segments and subsequently analyse each recording on a minute-by-minute basis [12]. Implementation of this procedure resulted in a total of 9132 one-minute segments, which included 295, or 3.2%, contaminated segments.
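The segment count and the contamination rate follow directly from the figures stated above. A quick arithmetic check, using only numbers from the text:

```python
# Figures taken from the text: 152 h 12 min of signal, 295 contaminated segments.
total_minutes = 152 * 60 + 12            # one segment per recorded minute
contaminated = 295
contaminated_pct = 100 * contaminated / total_minutes
clean_pct = 100 - contaminated_pct

print(total_minutes)                     # 9132
print(round(contaminated_pct, 1))        # 3.2
print(round(clean_pct, 1))               # 96.8
```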

2.1.2. CinC dataset

The PhysioNet/Computing in Cardiology Challenge of 2017 was intended for the differentiation of atrial fibrillation (AF) from noise, normal and other rhythms in short term ECG recordings. All recordings, lasting from 9 to 60 seconds, were sampled at 300 Hz and stored as 16-bit files with a bandwidth of 0.5–40 Hz.

In total, the dataset consists of 12,186 ECG recordings, of which 8528 recordings were provided for training. Only the normal rhythm and noisy classes were selected for this paper. They accounted for a total of 5334 recordings. A more detailed description of the dataset can be found in [13].

2.1.3. Stress dataset

This dataset was recorded to detect and quantify stress in daily life using different physiological metrics, including ECG. The single-lead (lead II) ECG signals were recorded with the Health Patch (imec, Leuven, Belgium), which has a sampling frequency of 256 Hz. The device was previously used in another stress study to measure ECG and acceleration [14].

The subjects were instructed to wear this patch for five consecutive days and to remove it only while working out. They were also provided with a waterproof cover to protect the patch while showering. One day of one subject was selected to test the proposed algorithm.

This ECG signal was divided in segments of 30 seconds, resulting in a total of 2879 segments.

2.2. (Re)labelling of the data

During the CinC challenge, participants noted that some recordings of the non-noisy classes were actually very noisy [13]. Therefore, the organizers decided to re-check all recordings and provide new labels if necessary. However, despite these adjustments, the organizers of the CinC challenge stated that: "The kappa value, κ, between many data remained low even after relabelling, indicating that the training data could be improved." [13]. Additionally, no labelling of the Stress dataset was present prior to this study. Therefore, we opted for our own (re)labelling procedure for the CinC and Stress datasets.

Each segment was (re)labelled by four independent annotators according to the following rule:

"If the annotator is able to confidently distinguish all R-peaks in the segment, then the segment is labelled as clean. Otherwise, it is labelled as contaminated."

Fig. 1. Comparison between a clean (top) and a contaminated (bottom) segment. A clear difference between the shape of both ACFs can be observed.

All annotators are engineers working in the signal processing domain with varying experience in ECG analysis. The most experienced annotator has seven years of experience and the least experienced has about one year. The annotators had no knowledge of the original labels.

2.3. Pre-processing

Each ECG segment was filtered by means of zero-phase, 2nd order high-pass and 4th order low-pass Butterworth filters with cut-off frequencies at 1 Hz and 40 Hz, respectively. These filters ensure the removal of baseline wander and high frequency noise, without altering the information contained in the characteristic waves of the ECG.

2.4. Feature extraction

First, we segmented the filtered ECG segments into epochs of five seconds, with 80% overlap. This window width was selected, since it provided the best results in [12].

Then, each five-second epoch is characterized by its ACF. An example of the difference between the ACFs of a clean and a contaminated segment is shown in Fig. 1. The maximum time lag was restricted to 250 ms to ensure inclusion of the different waveforms, without including consecutive heartbeats.
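The epoching and ACF computation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `split_epochs` and `acf` are hypothetical, and the mean removal, the normalisation by the lag-0 energy and the handling of a flat epoch are assumptions that the text does not specify.

```python
import math

def split_epochs(signal, fs, epoch_s=5.0, overlap=0.8):
    """Cut a segment into epochs of epoch_s seconds with the given overlap."""
    size = int(round(epoch_s * fs))
    step = max(1, int(round(size * (1.0 - overlap))))
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]

def acf(epoch, fs, max_lag_ms=250.0):
    """Autocorrelation for lags 0..max_lag_ms, normalised so that lag 0 equals 1.

    Assumption: the epoch mean is removed first; a flat epoch (zero energy)
    returns 1 at lag 0 and 0 elsewhere.
    """
    n = len(epoch)
    mean = sum(epoch) / n
    x = [v - mean for v in epoch]
    energy = sum(v * v for v in x)
    max_lag = int(round(max_lag_ms * fs / 1000.0))
    if energy == 0.0:
        return [1.0] + [0.0] * max_lag
    return [sum(x[i] * x[i + k] for i in range(n - k)) / energy
            for k in range(max_lag + 1)]

# Synthetic one-minute "segment" at 200 Hz (the Sleep dataset sampling rate).
fs = 200
segment = [math.sin(2 * math.pi * 1.2 * i / fs) for i in range(60 * fs)]
epochs = split_epochs(segment, fs)
first_acf = acf(epochs[0], fs)
print(len(epochs), len(first_acf))   # 56 51
```

At 200 Hz a five-second epoch holds 1000 samples and an 80% overlap gives a step of 200 samples, so a one-minute segment yields 56 epochs. The 250 ms limit corresponds to lags 0 to 50.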

Based on previous work, we selected three features from the ACFs to characterize the ECG segments [15]:

2.4.1. First (local) minimum

We assume that the shape of the ACF during the first time lags is primarily defined by the shift of the R-peak towards the S-wave. Following this assumption, the first (local) minimum of the ACF should coincide with the shift of the R-peak towards the deepest point of the S-wave.

As a first step, the first (local) minimum was selected in every ACF derived from the five-second epochs. Afterwards the overall minimum of the whole segment was computed, which results in a single value per segment.

Extensive lengthening of this interval could be an indication of a flat line, while extensive shortening could relate to high-frequency artefacts.
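A minimal sketch of this feature is given below. It assumes that the feature is the time lag of the first local minimum, which is how the reference to "lengthening" and "shortening" of the interval reads, and that epochs without a local minimum are skipped. Both function names are hypothetical.

```python
def first_local_min_lag_ms(acf_values, fs):
    """Lag (in ms) of the first local minimum of an ACF, or None if there is none."""
    for k in range(1, len(acf_values) - 1):
        if acf_values[k] < acf_values[k - 1] and acf_values[k] <= acf_values[k + 1]:
            return 1000.0 * k / fs
    return None

def segment_first_min(acfs, fs):
    """Smallest first-minimum lag over all epoch ACFs of one segment."""
    lags = [first_local_min_lag_ms(a, fs) for a in acfs]
    lags = [lag for lag in lags if lag is not None]   # assumption: skip epochs without a minimum
    return min(lags) if lags else None

# Two toy ACFs sampled at 200 Hz (5 ms per lag).
toy_acfs = [[1.0, 0.6, 0.2, 0.4, 0.5],
            [1.0, 0.5, 0.7, 0.1, 0.3]]
print(segment_first_min(toy_acfs, 200))   # 5.0
```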

2.4.2. Maximum amplitude at 35 ms

The second feature is highly related to the first feature, since it measures the ACF amplitude at 35 ms, an estimate of where the first (local) minimum occurs in an average clean segment.

To derive one value for the whole segment, a similar approach as for the first feature was applied. First, the amplitude at 35 ms was selected in every ACF derived from the five-second epochs. Afterwards, the overall maximum of these amplitudes was computed, which results in one value per segment.

High values of this feature could indicate a flat line or a technical artefact with a high amplitude and large width.
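The second feature can be sketched in the same way. Rounding 35 ms to the nearest sample is an assumption of this sketch (at 200 Hz it is exactly lag 7), and the function names are hypothetical.

```python
def amplitude_at_lag(acf_values, fs, lag_ms=35.0):
    """ACF amplitude at the sample nearest to lag_ms."""
    k = int(round(lag_ms * fs / 1000.0))
    return acf_values[k]

def segment_max_amplitude(acfs, fs, lag_ms=35.0):
    """Largest amplitude at lag_ms over all epoch ACFs of one segment."""
    return max(amplitude_at_lag(a, fs, lag_ms) for a in acfs)

# Two toy ACFs that decay linearly at different rates (200 Hz, 5 ms per lag).
toy_acfs = [[1.0 - 0.10 * k for k in range(10)],
            [1.0 - 0.05 * k for k in range(10)]]
feature = segment_max_amplitude(toy_acfs, 200)   # value of the slower-decaying ACF at lag 7
```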

2.4.3. Similarity

The third feature was selected based on the assumption that ECG signals do not have abrupt alterations between consecutive small windows. To detect these alterations, we crafted a feature that represents the similarity between the different five-second epochs.

It is defined by computing the maximum Euclidean distance between all ACFs, computed from the five-second epochs, in a time lag interval between 30 ms and 115 ms. These values were empirically determined.

The steepness of the first downward slope is an indication of erroneous segments. However, it is already taken into account in the first two features. Hence we did not include the first 30 ms for this feature. 115 ms was selected as an end point for the similarity feature since we observed no added value for an extended window length.

The larger the distance, the less similar the ACFs in that interval and the more likely that the segment contains abrupt alterations.
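As an illustration, the similarity feature can be computed as the largest pairwise Euclidean distance between the epoch ACFs restricted to the 30 ms to 115 ms lag window. Treating both interval ends as inclusive and rounding them to the nearest sample are assumptions of this sketch; `similarity_feature` is a hypothetical name.

```python
import math
from itertools import combinations

def similarity_feature(acfs, fs, start_ms=30.0, end_ms=115.0):
    """Maximum pairwise Euclidean distance between ACFs inside the lag window."""
    lo = int(round(start_ms * fs / 1000.0))
    hi = int(round(end_ms * fs / 1000.0))
    windows = [a[lo:hi + 1] for a in acfs]
    return max((math.dist(u, v) for u, v in combinations(windows, 2)), default=0.0)

# Three constant toy "ACFs" at 200 Hz: the window spans lags 6..23 (18 samples).
toy_acfs = [[0.0] * 51, [1.0] * 51, [0.5] * 51]
feature = similarity_feature(toy_acfs, 200)   # distance between the first two: sqrt(18)
```

A segment with a single epoch has no pairs to compare, in which case the sketch returns 0.0.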

2.5. Classification

Instead of randomly selecting the training set, we used the fixed-size algorithm [16]. This algorithm maximizes the Renyi entropy of the training set, such that the underlying distribution of the entire dataset is approximated. As a result, the main characteristics of the whole dataset are represented in the training set, which consists of 70% of the whole dataset. This algorithm is applied on the three datasets separately to derive three training and test sets.

The first objective of this study is to test the robustness of the proposed algorithm. Hence, one dataset was used to train the classifier and the other two were used for independent testing. The Sleep dataset was used for the initial classifier training, since a gold standard was provided by a medical doctor.

The percentage of contaminated segments of the Sleep dataset is substantially lower compared to the clean percentage, 3.2% vs. 96.8%. This class imbalance challenges traditional classification algorithms, since the majority class might be favoured [17]. In this study, we used the classification algorithm proposed by Seiffert et al., which combines random undersampling (RUS) and boosting into a hybrid, ensemble classification algorithm named RUSBoost [18]. Previous research has shown great reference results in case of class imbalance [17].
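To make the undersampling idea concrete, the sketch below shows only the random undersampling step, which RUSBoost applies to the training data in each boosting round. It is not a full RUSBoost implementation: the boosting and the weak learners are omitted, the 1:1 class ratio is an assumption, and `random_undersample` is a hypothetical helper.

```python
import random

def random_undersample(samples, labels, minority_label, rng):
    """Keep all minority samples and an equally sized random subset of the rest."""
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    kept = rng.sample(majority, min(len(majority), len(minority)))
    idx = sorted(minority + kept)
    return [samples[i] for i in idx], [labels[i] for i in idx]

# Toy data with roughly the imbalance reported for the Sleep dataset.
labels = ["clean"] * 97 + ["contaminated"] * 3
samples = list(range(len(labels)))
sub_x, sub_y = random_undersample(samples, labels, "contaminated", random.Random(0))
print(sub_y.count("clean"), sub_y.count("contaminated"))   # 3 3
```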

Decision trees were selected as weak learners, since they are well-suited for ensemble classifiers, such as the one we use [19]. Each decision tree was trained with the CART algorithm and deep trees were used with a minimal leaf size of five. The learning rate of the ensemble was set at 0.1, which requires more learning iterations, but often achieves better accuracy.


In addition to the selection of the right weak learner, one of the most important hyper-parameters is the number of weak learners of the ensemble. We used a standard 10-fold cross-validation approach to select the number of weak learners. The maximum number of weak learners was fixed at 500 and the mean squared classification error of the folds was used as decisive metric.

To evaluate the robustness of the proposed approach, we compared the classification performance of two models. The first model is trained on the Sleep dataset and the second model is trained on the entire dataset. Additionally, to illustrate the effectiveness of the proposed algorithm, the performance of two other algorithms, a heuristic [20] and a machine learning algorithm [21], were also evaluated and compared with the results obtained with our models.

2.6. Quality indicator

RUSBoost is a probabilistic classifier, which means that it does not only output the most probable class, but also the probability for each class. Hence, besides providing a clean or contaminated label, also the probability of each class is provided based on the relative weight across the set of decision trees.

For each sample, each class and each decision tree, this relative weight is obtained by dividing the number of training samples of each class by the total number of training samples in the selected leaf. These weights are averaged over all decision trees in the ensemble to obtain the overall class weight. When we normalize these weights, we obtain a probability for each class ranging between 0 and 1.
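The averaging described above can be sketched as follows. The input format (one dictionary of class counts per tree, for the leaf that the sample reaches) and the name `ensemble_posterior` are assumptions made for illustration. The sketch follows the unweighted average described in the text and does not model any per-learner weighting that a particular RUSBoost implementation may apply.

```python
def ensemble_posterior(leaf_counts):
    """Average per-tree leaf class fractions and normalise them to probabilities.

    leaf_counts: one dict per tree, mapping class -> number of training
    samples of that class in the leaf reached by the sample.
    """
    classes = sorted({c for counts in leaf_counts for c in counts})
    weights = {c: 0.0 for c in classes}
    for counts in leaf_counts:
        total = sum(counts.values())
        for c in classes:
            weights[c] += counts.get(c, 0) / total   # relative weight in this leaf
    n_trees = len(leaf_counts)
    weights = {c: w / n_trees for c, w in weights.items()}
    norm = sum(weights.values())
    return {c: w / norm for c, w in weights.items()}

# Toy example: the leaves reached in three trees.
posterior = ensemble_posterior([
    {"clean": 4, "contaminated": 1},
    {"clean": 5, "contaminated": 0},
    {"clean": 3, "contaminated": 3},
])
```

In this toy example the clean fractions are 0.8, 1.0 and 0.5, so the clean-class probability is their mean, roughly 0.77.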

Mounce et al. used the class probabilities to produce a more fine grained and focussed assessment of the risk of iron failure in drinking water [17]. In this study we propose a similar approach, but applied to the level of quality of ECG signals. The probability of the clean class was transformed to a quality score ranging from 0% to 100%, or in other words, from too contaminated to process to perfectly clean.

The obtained score is taken as the quality level of the signal. Since this score reflects the certainty of the algorithm, it is expected that it closely relates to the certainty/agreement of the annotators.

To evaluate the proposed algorithm, we created a dataset with known SNR levels. We corrupted the clean signals of the Sleep dataset with recorded artefacts from the PhysioNet noise stress test database (NSTDB) [22]. This database contains samples of three types of noise: electrode motion (EM), baseline wander (BW) and muscle artefact (MA). Only EM and MA were considered, since baseline wander is usually not a cause for erroneous R-peak detection. Contaminated signals were generated at five contamination levels, according to [11].

To enlarge the amount of segments, we divided the clean signals and artefact samples into 10-second segments. For each clean ECG segment, an artefact segment was randomly selected and a calibrated amount was added to the clean one. The SNR of the resulting signals was defined as described in [21]. An example of the resulting signals is shown in Fig. 2. The obtained results were compared with the modulation spectrum-based ECG quality index (MS-QI), described in [23].
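The idea of adding a calibrated amount of noise can be sketched as below. Note the assumption: the SNR is taken here as 10·log10 of the ratio between the variance of the clean signal and the variance of the scaled noise. The paper defers to [21] for its definition, which may differ. The names `power` and `mix_at_snr` are hypothetical.

```python
import math

def power(x):
    """Variance of a signal (mean-removed average power)."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + gain * noise has the requested SNR (in dB)."""
    gain = math.sqrt(power(clean) / (power(noise) * 10 ** (snr_db / 10.0)))
    return [c + gain * n for c, n in zip(clean, noise)], gain

# Synthetic stand-ins for a clean ECG segment and an artefact segment.
clean = [math.sin(2 * math.pi * i / 50) for i in range(500)]
noise = [((i * 37) % 11) - 5.0 for i in range(500)]
mixed, gain = mix_at_snr(clean, noise, snr_db=6.0)

# Check: the SNR of the added component matches the target.
added = [m - c for m, c in zip(mixed, clean)]
achieved_snr = 10 * math.log10(power(clean) / power(added))
```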

2.7. Performance metrics

To quantify the agreement between the different annotators, we computed two different metrics: (1) the percentage of agreement and (2) the Fleiss' kappa.

Fig. 2. Impact of Electrode Motion on ECG signal quality. EM: Electrode Motion noise, Level 0: Clean ECG signal, Level 1: Minor contamination, Level 2: Moderate contamination, Level 3: Severe contamination, Level 4: Extreme contamination. The quality of the ECG signal decreases as the electrode motion (EM) noise increases.

2.7.1. Percentage of agreement

All segments were classified into three classes: 'Perfect agreement', when all four annotators agreed, 'Moderate agreement', when three annotators agreed, or 'Disagreement', when no majority voting was possible.
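With four annotators and binary labels, the three classes correspond to 4-0, 3-1 and 2-2 splits. A minimal sketch, with the hypothetical helper `agreement_level`:

```python
from collections import Counter

def agreement_level(labels):
    """Classify the agreement between four annotators on one segment."""
    if len(labels) != 4:
        raise ValueError("expected exactly four annotator labels")
    top = Counter(labels).most_common(1)[0][1]   # size of the largest vote
    if top == 4:
        return "Perfect agreement"
    if top == 3:
        return "Moderate agreement"
    return "Disagreement"

print(agreement_level(["clean", "clean", "clean", "clean"]))                  # Perfect agreement
print(agreement_level(["clean", "clean", "contaminated", "clean"]))           # Moderate agreement
print(agreement_level(["clean", "contaminated", "contaminated", "clean"]))    # Disagreement
```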

2.7.2. Fleiss' kappa

One of the drawbacks of the percentage of agreement is the lack of correction for chance. The Fleiss' kappa is a relatively simple, yet powerful, metric that considers the possibility that the agreement has occurred by chance [24]. Kappa values can vary between -1 and 1, but are between 0 and 1 if the observed agreement is due to more than chance alone. The closer to 1, the stronger the agreement [25].
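For reference, the standard Fleiss' kappa computation is sketched below for a small toy example. It assumes that every segment is rated by the same number of annotators and that more than one category occurs in the data, since otherwise the denominator is zero. `fleiss_kappa` is a hypothetical helper.

```python
def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of per-item label lists (same raters per item)."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    counts = [[item.count(c) for c in categories] for item in ratings]
    # Observed agreement per item, then averaged over items.
    p_items = [(sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
               for row in counts]
    p_obs = sum(p_items) / n_items
    # Agreement expected by chance, from the overall category proportions.
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters)
             for j in range(len(categories))]
    p_exp = sum(p * p for p in p_cat)
    return (p_obs - p_exp) / (1.0 - p_exp)

c, n = "clean", "contaminated"
toy_ratings = [[c, c, c, c], [c, c, c, n], [n, n, n, n], [c, c, n, n]]
kappa = fleiss_kappa(toy_ratings, [c, n])
print(round(kappa, 3))   # 0.407
```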

To evaluate the classification model on the two independent datasets, we computed four different performance metrics: sensitivity (Se), which is the proportion of clean signals that have been correctly classified; specificity, which is the proportion of contaminated signals that have been correctly classified; negative predictive value (NPV), which is the proportion of correctly classified contaminated signals over all signals labeled as contaminated; and the area under the ROC curve (AUC). The latter is more suitable than the overall accuracy, since it is not sensitive to the class imbalance [18].
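These definitions can be written out with the clean class as the positive class. In this sketch, "labeled as contaminated" in the NPV definition is read as "predicted as contaminated", which is an interpretation. The AUC is computed through the pairwise ranking formulation, one of several equivalent ways to obtain it. The function names are hypothetical.

```python
def confusion_metrics(y_true, y_pred, positive="clean"):
    """Sensitivity, specificity and NPV with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {"Se": tp / (tp + fn), "Sp": tn / (tn + fp), "NPV": tn / (tn + fn)}

def auc(y_true, scores, positive="clean"):
    """Probability that a positive sample scores higher than a negative one."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

y_true = ["clean"] * 4 + ["contaminated"] * 2
y_pred = ["clean", "clean", "clean", "contaminated", "contaminated", "clean"]
scores = [0.9, 0.8, 0.7, 0.4, 0.2, 0.6]      # clean-class probabilities
metrics = confusion_metrics(y_true, y_pred)
print(metrics)                 # {'Se': 0.75, 'Sp': 0.5, 'NPV': 0.5}
print(auc(y_true, scores))     # 0.875
```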

The aforementioned metrics are applicable when the ground truth is clearly predefined or when all annotators agree. However, for the labelling of artefacts, it might be possible that different annotators have different opinions, and subsequently assign different labels for the same segment. Therefore, we assessed the performance by the weighted sensitivity (wSe), weighted specificity (wSp), and accordingly, the weighted AUC (wAUC), as proposed by Ansari et al. [26]. In contrast to majority voting, these metrics take all raters' opinions into account. Thus, also the minority votes, which would be ignored when using majority voting, are used to
