
Empirical research methods for technology validation: Scaling up to practice


Contents lists available at ScienceDirect

The Journal of Systems and Software

journal homepage: www.elsevier.com/locate/jss


Roel Wieringa∗

Department of Electrical Engineering, Mathematics, and Computer Science, University of Twente, The Netherlands¹

Article info

Article history: Received 30 November 2012; Received in revised form 11 November 2013; Accepted 16 November 2013; Available online xxx

Keywords: Empirical research methodology; Technology validation; Scaling up to practice

Abstract

Before technology is transferred to the market, it must be validated empirically by simulating future practical use of the technology. Technology prototypes are first investigated in simplified contexts, and these simulations are scaled up to conditions of practice step by step as more becomes known about the technology. This paper discusses empirical research methods for scaling up new requirements engineering (RE) technology.

When scaling up to practice, researchers want to generalize from validation studies to future practice. An analysis of scaling up technology in drug research reveals two ways to generalize, namely inductive generalization using statistical inference from samples, and analogic generalization using similarity between cases. Both are supported by abductive inference using mechanistic explanations of phenomena observed in the simulations. Illustrations of these inferences both in drug research and empirical RE research are given. Next, four kinds of methods for empirical RE technology validation are given, namely expert opinion, single-case mechanism experiments, technical action research and statistical difference-making experiments. A series of examples from empirical RE will illustrate the use of these methods, and the role of inductive generalization, analogic generalization, and abductive inference in them. Finally, the four kinds of empirical validation methods are compared with lists of validation methods known from empirical software engineering. The lists are combined to give an overview of some of the methods, instruments and data analysis techniques that may be used in empirical RE.

© 2013 Elsevier Inc. All rights reserved.

1. Introduction

Empirical assessment of technology comes in two flavors, which in this paper will be called technology validation and technology evaluation, respectively. Technology validation is defined here as the assessment of a simulation of the technology in a simulation of its intended context of use, in order to predict what would happen if the technology were actually used by stakeholders in this intended context. We take the term "simulation" in a very wide sense as the representation of the functioning of one system or process by means of the functioning of another.² For example, a new requirements prioritization technique may be tested by experimenting with it in a classroom. This is a validation if the classroom experiment represents some aspects of what would happen if the technique were used in practice.

Validation always involves scaling up to practice, which means that successive tests take place under increasingly realistic conditions. For example, the inventor of a requirements prioritization technique may use this technique in a real-world project. This validation would resemble real-world use of the technique more than a classroom experiment, except that it is still the inventor herself who uses the technique.

∗ Tel.: +31 53 489 4189.
E-mail address: r.j.wieringa@utwente.nl
¹ http://www.cs.utwente.nl/roelw.
² http://www.merriam-webster.com.

A technology has been transferred to practice if it has been packaged, marketed, distributed, sold or otherwise made available to users, and is now being used independently from the context in which it was invented or tested. After transfer to practice, people other than its inventors are using it, and they are using it to achieve their own goals, without help or any other kind of intervention from its inventors, and after investment of their own time and/or money to learn to use the technology.

Technology validation is to be contrasted with technology evaluation, defined here as the empirical assessment of a technology as and when used in practice. For example, an RE researcher may study how a prioritization technique is used in real-world projects by means of observational case studies. Whereas a validation study aims to make predictions, based on simulations, about how a technology would perform if transferred to practice, an evaluation study assesses what has happened in the actual use of the technology after it has been transferred to practice. This follows terminology commonly used in the social sciences, where an evaluation study is an empirical assessment of some social intervention that has been performed, such as a recently implemented teaching method in schools, to investigate its impact in practice (Babbie, 2007).

Fig. 1. The new drug development and review process of the U.S. Food and Drug Administration.

Technology validation is a process of scaling up to practice in all engineering sciences. For example, the inventors of the jet engine validated their designs by building increasingly realistic prototypes and testing them in increasingly realistic environments (Constant, 1980). In this paper I will summarize and analyze the ways in which we can scale requirements engineering (RE) technology up to practice.

The RE technology being validated could be techniques, methods, notations, algorithms, etc. used for various requirements engineering tasks such as requirements elicitation, goal analysis, requirements specification, requirements prioritization, traceability management, requirements maintenance, etc. Requirements in this paper are defined as desired properties of a system. Goals, by contrast, are states of the world that stakeholders desire and for whose achievement they have committed a budget (time or money). All requirements are goals because they are desired by stakeholders, and stakeholders have committed a budget to achieve them. But not all stakeholder goals are system requirements. Stakeholders have many goals not stated in terms of desired system properties at all.

Davis and Hickey (2004) proposed using the methodology of New Drug Development for scaling up RE technology. I will pursue this idea further in Section 2 and focus in particular on the inferences used in New Drug Development when generalizing from the object of validation research to instances of real-world use of the technology, and show that these inferences can be used in RE research too. In Section 3, I present four methods for empirical technology validation, and show how the generalization methods identified in Section 2 can be used in them. This is illustrated by a series of examples from empirical requirements engineering. Finally, in Section 4, I review the empirical software validation methods identified by Zelkowitz and Wallace (1997, 1998) and by Glass et al. (2001), show how they fit into the framework presented in this paper, and add a list of examples of techniques for measurement and data analysis used in empirical software engineering. Section 5 ends the paper with a brief summary and outlook.

2. Scaling up

2.1. Scaling up in drug research

Davis and Hickey (2004) were the first to apply the New Drug Development and Review Process of the U.S. Food and Drug Administration to RE technology transfer. I summarize the process in Fig. 1. The following description is based on information provided by the FDA³ and the explanations given by Davis and Hickey (2004), Cowan (2002) and Molzon and Pharm (2005). My analysis goes beyond that of Davis and Hickey by analyzing the three kinds of inference used in this process. I will indicate the analogy of each stage of the New Drug Development process with a stage in scaling up RE technology.

2.1.1. Pre-clinical research

Pre-clinical research is the exploration and validation of drugs before testing them on people. It consists of a synthesis and purification task, and of testing the drug on animals.

In synthesis and purification, a chemical is identified in the laboratory as a potentially effective drug, based on earlier experience reported in the literature, biochemical knowledge, and knowledge of the human body. A theory about why this could be an effective drug to improve a medical condition is postulated. This is the initial version of a design theory that will be tested and elaborated in the following stages of the New Drug Development process. It corresponds in RE research to the initial idea for a new RE technique and the initial justification that this idea might work to solve some RE problem. The rest of the process aims to validate and elaborate this design theory in increasingly realistic contexts.

Animal tests are done to show that the drug would be safe to use in people and to investigate in more detail the biochemical mechanisms that produce the drug's effects. If there are no negative effects in the investigated contexts (i.e. in animals), and if the mechanisms found in these contexts are expected to be similar to those in human bodies, then this is evidence that the drug is probably safe for humans, and a request is submitted to institutional review boards for permission to test the drug in humans. Usually two different animal species are used, because a drug usually has different effects in different species (Cowan, 2002). Short-term testing in animals can take up to three years but on average takes 18 months.

The animals are used for testing as natural models of the intended real-world context of the drug, namely the human body. The analog in RE research would be testing a new RE technique on students in a laboratory, to study the effects of the technique and the mechanisms by which these effects are produced. Although the goal of this research would not be to establish evidence for safety of the technique, the purpose would still be to assess whether the benefit of using the technique in practice would outweigh the cost and risk of doing so.

Long-term animal research investigates the long-term effects of using a drug, and may continue into the post-availability stage. This is an example of validation research that continues after transfer of the new technology into practice. Long-term animal research has the logic of validation research, as it simulates the effects of a drug by using a model, and is used to predict what would happen to human bodies. This can be done in RE research too. For example, the effect of using the UML on the comprehension of programs can be investigated in the laboratory, using students as subjects, long after the UML has been transferred to practice. Insights from such a study could be used to predict the effect of UML on comprehension of programs in practice.

2.1.2. Clinical research

In clinical research, the drug is tested on people. It consists of three phases.

• In phase 1, random samples of 20–80 healthy subjects are used to test the drug. The goal is to investigate side effects and the so-called mechanism of action of the drug, which is the biochemical interaction by which a drug produces pharmacological effects. If possible, the effectiveness (positive effects) of the drug is investigated too. This phase may last several months and ends when researchers are sufficiently certain that the drug is safe to use in patients.

• In phase 2, several hundred patients are used to investigate the effect of the drug in controlled studies by means of randomized controlled trials. The goal is to investigate the effectiveness in patients with a specific disease or condition, and to investigate short-term side-effects and identify possible risks. Phase 2 may last several months to two years and ends when knowledge about effectiveness, side-effects and risks is deemed sufficiently well established to do large-scale tests in phase 3.

• In phase 3, controlled and uncontrolled trials with several hundred to several thousand patients are done to gather additional evidence about effectiveness and safety. The goal is to find out if the effectiveness and safety claims can be generalized to the population of all possible patients. Phase 3 may last one to four years and ends when researchers think the claims about effectiveness and safety of the drug can be generalized to the population.

³ http://www.fda.gov/Drugs/DevelopmentApprovalProcess/SmallBusinessAssistance/ucm053131.htm.

As we will see below, when validating RE technology, similar research goals exist, and they are pursued, just as in drug research, by experimental research in the lab and in the field.

2.1.3. Post-availability studies

After a drug has been approved and is made available to a market, assessment continues in so-called phase 4 studies, for example by surveys or observational case studies of patients using the drug. In our terminology, these are evaluation studies. Post-availability studies are done in RE research too, for example when researchers investigate the effect of using the UML on coding errors in real-world projects.

2.2. Design theories: artifact in context produces effects by mechanisms

I will now analyze the logic of drug validation in more detail, with a view to drawing conclusions about the logic of RE technology validation. In other words, I treat the drug development process as a model of the RE technology transfer process that we can investigate to learn something about the RE technology validation process, just as we can use animals as models of people to learn something about how people respond to drugs. To abstract from whether we talk about drugs or RE technology, I will call the technology to be scaled up an artifact.

In what follows I present a number of observations of the process described in Section 2.1. The first observation is that the validation tasks in new drug development are divided into three stages: conceptual validation, modeling, and field tests. In conceptual validation (corresponding to synthesis and purification), an artifact is tested by observing its behavior in a very artificial context such as a test tube. Most of the validation is done on paper and consists of computations, worked examples, mathematical proofs, informal arguments tested out with colleagues, etc. In the modeling stage (corresponding to animal testing and phase-1 clinical research), the artifact is tested on a model. In drug development, these are animals first and healthy people next. There are important ethical constraints in both kinds of tests, and the NDA process recognizes the need for an ethical review board at least when transitioning to tests with people. In the field testing stage (corresponding to phase-2 and phase-3 clinical research), real-world cases are used to test the artifact. These real-world cases are treated as models of arbitrary population elements.

My second observation of drug validation is that what is validated is not the artifact but the artifact in a context, e.g. a drug in a body. Validation is the attempt to test the following prediction (Wieringa, 2009):

[Artifact × Context] will produce Effects.

Effects may change if the context changes, and so the artifact must be investigated in different contexts until it is clear in what range of contexts what range of effects is produced. For example, a new technique for eliciting requirements may be tested on its effects in small projects, large projects, embedded systems projects, information systems projects, etc. and be found to be effective in some but not all of these contexts.

Third, when validating an artifact in context, researchers should not only aim at identifying the regular production of an effect in certain contexts, they should also aim to explain this effect in terms of underlying mechanisms. In drug research these are called the mechanisms of action. This term indicates the interactions by which a drug produces a pharmacological effect, including the binding of the drug to molecular targets, its effect on these targets, and the effect on biochemical pathways in the body. For example, caffeine has several mechanisms of action, two of which are that it antagonizes a biochemical compound (adenosine) that inhibits neurotransmitters, and that it increases the activity of neurotransmitters such as dopamine (Nehlig et al., 1992). These mechanisms explain why caffeine has a psychostimulant effect.

The concept of a mechanism of action is similar to that of a principle of operation used in engineering methodology (Vincenti, 1990), which is the top-level theory of the mechanism by which an artifact produces an effect in a context. For example, the principle of operation of an airplane is that, because of the shape of its wings, air above the wing flows faster relative to the wing than air below it, which according to Bernoulli's principle produces upward lift. But whereas the principle of operation is the highest-level view of how an artifact produces some effect in a context, a mechanism of action is the actual realization of this principle in the interactions between components of the low-level architecture of a realized artifact in a real context. The principle of operation of an airplane explains why airplanes fly. The mechanism of action of a particular type of airplane consists of the detailed, low-level interactions among aircraft components and the surrounding air that actually produce the capability of this type of airplane to fly. The mechanisms of action exploit the Bernoulli principle, but do a lot more.

In RE too, mechanisms of action can be identified that explain observed effects. For example, Damian and Chisan (2006) describe the introduction of RE techniques in an organization and identify cross-disciplinary group meetings, and their interaction with other parts of the software engineering organization, as a mechanism that causes fewer defects, less rework, and improved effort estimates.

Validation research thus aims at making predictions of the form

[Artifact × Context] will produce Effects by Mechanisms.

We will call this a design theory (Wieringa et al., 2011). When researchers design and validate an artifact, they start from an initial idea about the principle of operation of the artifact, which is a solution idea, but not yet an implemented and working solution. This states the top-level principle of operation. When scaling up an artifact to conditions of practice, this initial theory is tested and elaborated, until finally a street-tested architecture with mechanisms of action is delivered that implements the top-level principle of operation.

The theory of an implemented artifact may be incomplete about the mechanisms that produce the effects, and in the extreme case be totally silent about them. For example, engineers may have found which detailed structure and texture of a wing surface is most conducive to fuel efficiency, without understanding the precise mechanisms by which this happens. If mechanisms are not understood, a slightly different design, or an existing design in a new, previously unencountered context, may fail for unknown reasons. For this reason, in the health sciences, evidence of regularity is not good enough to claim regular production of an effect: knowledge of the underlying mechanisms is needed too (Russo and Williamson, 2007). In engineering, in the absence of knowledge of underlying mechanisms, safety risks are managed by making design changes and changes of context only in small steps and testing each of them (Petroski, 1994).

• Design theory: [Artifact × Context] produces Effects by Mechanisms
• Value theory: [Effects × Stakeholders] produces Valuation.

Fig. 2. The structure of design theories and of value theories.

A fourth observation of the drug development process is that there is a second theory, which is stakeholder-related (Wieringa et al., 2011):

[Effects × Stakeholders] will produce Valuation.

I will call this a value theory. The theory states that various kinds of stakeholders who experience the effects will attach a positive, negative, or indifferent value to them.

The goal of drug research is not only to identify effects and mechanisms of an artifact in context, but also to identify the value of these effects for stakeholders. Stakeholders like some effects and dislike others. Effects that are liked are called "benefits" and effects that are disliked are often called "side-effects". Context properties that tend to produce disliked effects are called "contra-indications" in drug research.

Finally, like all scientific theories, design theories as well as value theories are fallible. The researcher is not totally certain about them and must state the extent of his or her uncertainty. The uncertainties with which effects, benefits, costs and side-effects can be predicted are called "risks".

All of these concepts are relevant for RE research too. For example, the use of mobile RE technology to elicit requirements has the benefit that user requirements may be more concrete, detailed and complete than is possible with other elicitation techniques. But that is not certain, and this is a risk (of a benefit not materializing). It may also have as a side-effect that the user expects that each and every need she enters will be satisfied in the near future. This too is a risk (of a negative outcome). Also, mobile RE technology may result in huge amounts of textual and multimedia data that must be analyzed manually, which is a cost. In short, when validating RE technology, the RE researcher is interested not only in the effects of an artifact in context and the mechanisms that produce them, but also in the benefit, cost and risk of using this technology in some contexts (Fig. 2). Both kinds of theory are important, but in the rest of this paper, I will focus on the development and validation of design theories.
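The two theory forms of Fig. 2 can be read as simple record types. The Python sketch below is only an illustration under stated assumptions: the class names `DesignTheory` and `ValueTheory`, the valuation scale (+1 liked, -1 disliked, 0 indifferent) and the stakeholder label "project team" are hypothetical, and the entries paraphrase the mobile RE example above rather than report study data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DesignTheory:
    """[Artifact x Context] produces Effects by Mechanisms."""
    artifact: str
    context: str
    effects: tuple[str, ...]
    # Left empty when the mechanisms are not (yet) understood.
    mechanisms: tuple[str, ...] = ()


@dataclass
class ValueTheory:
    """[Effects x Stakeholders] produces Valuation."""
    # Hypothetical scale: (effect, stakeholder) -> +1 liked, -1 disliked, 0 indifferent
    valuation: dict[tuple[str, str], int] = field(default_factory=dict)

    def benefits(self, stakeholder: str) -> list[str]:
        """Effects this stakeholder values positively."""
        return sorted(e for (e, s), v in self.valuation.items()
                      if s == stakeholder and v > 0)

    def side_effects(self, stakeholder: str) -> list[str]:
        """Effects this stakeholder values negatively."""
        return sorted(e for (e, s), v in self.valuation.items()
                      if s == stakeholder and v < 0)


# Hypothetical encoding of the mobile RE example; no mechanisms are stated.
mobile_re = DesignTheory(
    artifact="mobile RE tool",
    context="in-situ requirements elicitation",
    effects=("more concrete requirements", "raised user expectations"),
)
valuation = ValueTheory({
    ("more concrete requirements", "project team"): +1,
    ("raised user expectations", "project team"): -1,
})
print(valuation.benefits("project team"))
print(valuation.side_effects("project team"))
```

Keeping the two theories as separate types mirrors the point of Fig. 2: the same effects can be valued differently by different stakeholders without changing the design theory.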

2.3. Inferences that support design theories

Looking once more at the drug development and review process in Fig. 1, we see that two kinds of inferences are used to generalize from experiments to the population of potential patients: inductive generalization and analogic generalization. Inductive generalization is the statistical inference from a sample of test subjects to the population of subjects. Analogic generalization is the inference from models (such as animals and healthy volunteers) to patients. This is represented in Fig. 3, where inductive generalization is the horizontal dimension and analogic generalization is the vertical dimension.

Fig. 3. Two kinds of inferences when scaling up to practice: inductive generalization from samples to population (horizontal) and generalization by analogy from experimental cases to real-world cases (vertical).

We discuss these generalizations in the next two subsections. Next, we discuss a third kind of inference, called abductive inference, that can be used to support analogic as well as inductive inference. Finally, we combine these inferences in two kinds of reasoning:

• In case-based reasoning, analogic generalization about cases is supported by abductive inference (vertical dimension of Fig. 3);
• In sample-based reasoning, inductive generalization about samples is supported by abductive inference (horizontal dimension of Fig. 3).

2.3.1. Inductive generalization

Inductive generalization is the generalization from samples to populations using statistical inference, such as statistical hypothesis testing or statistical parameter estimation (Hacking, 2001). Sample sizes in drug research start at about 30 elements, and increase to hundreds or even thousands of elements. The larger the sample, the larger the power of the experiment to discern small effects.
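The link between sample size and power can be made concrete with a small calculation. This is a sketch that assumes a two-sided two-sample z-test with known, equal variances and a normal approximation; the function name `power_two_sample_z` and the standardized effect size d = 0.2 are illustrative choices, not figures taken from drug research.

```python
from statistics import NormalDist


def power_two_sample_z(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample z-test for a standardized
    mean difference d, with n_per_group subjects in each group.
    Assumes normal outcomes with known, equal variances."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the true effect is d.
    shift = d * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


for n in (30, 100, 500):
    print(n, round(power_two_sample_z(0.2, n), 2))
```

Under these assumptions, a small effect (d = 0.2) is detected in only a minority of experiments with 30 subjects per group, but in most experiments with 500 per group; with d = 0 the "power" collapses to the significance level itself.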

I call this kind of inference inductive. The term "induction" is given different meanings by different people, but here I follow Douven (2011) in calling an inference inductive if it is based purely on statistical data. In the context of this paper, this means that inductive inference is statistical inference, in which sample data are used to estimate a statistical population parameter or to test a statistical hypothesis about a population parameter. Inductive inference is the horizontal dimension in Fig. 3.

2.3.2. Generalization by analogy

The vertical dimension of Fig. 3 is generalization by analogy from the object of study (OoS) to the real-world population units to which the researcher wishes to generalize. The object of study has the structure

(model of the artifact) × (model of the context),

and is the entity studied by the researcher. See also Fig. 4. The model of the artifact is often an artifact prototype, and the model of the context can be a simulated context in the laboratory. In field research, the model of the context is a real-world context that stands as a model for other real-world contexts. The treatment and measurement elements of Fig. 4 will be discussed later.

Generalization by analogy reasons about cases. For example, if in one agile development project performed for a small company we have observed that the company lacked the resources to put a customer on-site, we may infer that in similar cases, a similar thing may happen. Each generalization by analogy reasons from one or more similar source cases to one or more similar target cases.

Fig. 4. The structure of validation research.

This contrasts with inductive generalization, which reasons from a sample of cases to the population of all cases. For example, if we have observed that in a random sample of 100 agile projects performed for small companies, 90% of the companies do not put a customer on-site, we may estimate from this a confidence interval for the proportion of the population in which no customer is put on-site.
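For the agile-project example, such a confidence interval can be computed directly. The sketch below uses the Wilson score interval, one of several standard intervals for a proportion (the simpler Wald interval is another); it assumes the 100 projects are a simple random sample from a much larger population, and the counts are the hypothetical ones from the example. The helper name `wilson_interval` is illustrative.

```python
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score confidence interval for a population proportion,
    assuming a simple random sample of size n."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * ((p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) ** 0.5) / denom
    return centre - half_width, centre + half_width


# 90 of 100 sampled projects had no customer on-site (hypothetical counts).
low, high = wilson_interval(90, 100)
print(f"95% CI for the proportion: [{low:.3f}, {high:.3f}]")
```

With 90 of 100 projects, the 95% interval runs from about 0.83 to 0.94. This is a statement about the population of projects, not about any single similar case, which is what distinguishes it from generalization by analogy.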

To add more illustrations, we discuss the role of analogic generalization in the New Drug Development process. In synthesis and purification, researchers build a prototype of the drug and let it interact with biochemical processes that are sufficiently similar to processes in the human body to draw some preliminary conclusions about the effect of the drug in the human body. In RE research, this would be similar to hand-testing a new technique, or formally proving some properties to show what the technique can do in an idealized context.

Next, drug researchers experiment with the drug in animals, used as natural models of the human body. As pointed out earlier, this would be analogous to the use of new RE techniques in student projects, which are then used as natural models of real-world projects with practitioners. There is detailed research in some fields of drug research to assess which animal species are valid models of humans with respect to which research questions, and for which research questions they are not (Willner, 1991). We find the same kind of "similarity research" in engineering research too. For example, to be able to reason from observations of a model in a wind tunnel to the behavior of airfoils in real flight, there must be a theory of similarity between wind-tunnel models and real-world flight (Vincenti, 1990). To my knowledge there is little research in this area in RE, but there has been some similarity research in software engineering that studies with respect to which research questions student behavior in student projects is similar or dissimilar to the behavior of professional software engineers in software projects (Höst et al., 2000; Runeson, 2003; Sjoberg et al., 2003; Svahnberg et al., 2008).

In the clinical phase, drug researchers start with healthy people, and then continue with ever larger samples of patients. In RE research this would correspond to using new technology first in pilot projects in companies with a mature RE process, continuing with ever larger samples of pilot projects in companies with low levels of RE maturity.

Generalization by analogy also includes reasoning by extreme cases, in which one case is known to be similar to other cases in some relevant aspects, but extremely different in another aspect. For example, from the observation that an RE technique is easy to understand and use by Master's students in software engineering, one might conclude that it will also be easy to understand and use by experienced software engineers. Master's students and software professionals are similar in some respects, but they are dissimilar in the extent of their experience in software engineering. Students are an extreme case with respect to extent of experience. The theory of similarity used to support this analogic inference is that an increase in experience of otherwise similar subjects preserves understandability and usability of a technique.

In general, generalization by analogy must be supported by some theory of similarity between the OoS and all population elements that explains why a phenomenon observed in a model could lead to conclusions about population units. Which theory is needed depends on the question asked. Students may be good models of practitioners when validating effort estimation techniques but not when validating multi-stakeholder prioritization techniques (cf. the experiment by Höst et al., 2000).

2.3.3. Abductive inference

There is a third kind of inference used in drug validation research, called abductive, that can be used to support both inductive and analogic generalization. Abductive inference is reasoning from observed phenomena to what is considered the best explanation of the phenomena (Douven, 2011). There are many kinds of abduction, and here I define one kind, which I call mechanistic abduction, in which observed phenomena are explained in terms of component-based mechanisms that produced them. I define a component-based mechanism, in turn, as a repeatable process in which system components interact to respond to a stimulus. This concept of mechanism is known in object-oriented software engineering, where a UML collaboration diagram represents a mechanism consisting of software objects that pass each other messages when responding to a stimulus (Cook and Daniels, 1994). But component-based mechanisms can occur in any kind of system, as we saw when we discussed the concept of mechanism of action of a drug. Component-based mechanisms are used to explain biological phenomena in terms of the interactions between cells and chemical substances, or the interactions between the organs of an organism (Bechtel and Richardson, 2010; Bechtel and Abrahamsen, 2005). In the social sciences, component-based mechanisms are used to explain social phenomena in terms of interactions between people, organizations, institutions and other social systems and their components (Bunge, 2004; Hedström and Ylikoski, 2010).
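To make the notion of a component-based mechanism concrete, the sketch below mimics what a UML collaboration diagram shows: components that respond to a stimulus by passing each other messages in a fixed order. The classes `Analyst`, `Stakeholder` and `RequirementsDocument` and the elicitation scenario are hypothetical; they illustrate the concept only and do not model any study cited here.

```python
class Stakeholder:
    def __init__(self, name: str, needs: dict[str, str]):
        self.name = name
        self.needs = needs  # hypothetical: topic -> stated need

    def answer(self, topic: str, trace: list[str]) -> str:
        trace.append(f"{self.name} -> analyst: answer({topic})")
        return self.needs.get(topic, "no need stated")


class RequirementsDocument:
    def __init__(self) -> None:
        self.entries: list[str] = []

    def record(self, requirement: str, trace: list[str]) -> None:
        trace.append(f"analyst -> document: record({requirement})")
        self.entries.append(requirement)


class Analyst:
    def __init__(self, stakeholder: Stakeholder, document: RequirementsDocument):
        self.stakeholder = stakeholder
        self.document = document

    def on_stimulus(self, topic: str) -> list[str]:
        """Respond to a stimulus (a topic to elicit) and return the message trace."""
        trace = [f"stimulus -> analyst: elicit({topic})",
                 f"analyst -> {self.stakeholder.name}: ask({topic})"]
        need = self.stakeholder.answer(topic, trace)
        self.document.record(need, trace)
        return trace


doc = RequirementsDocument()
analyst = Analyst(Stakeholder("user", {"login": "log in with a badge"}), doc)
first = analyst.on_stimulus("login")
second = analyst.on_stimulus("login")
for message in first:
    print(message)
```

Running the same stimulus twice yields the same message trace, which is the repeatability the definition of a component-based mechanism requires.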

In RE too, component-based mechanisms can explain the effects of an RE technology in terms of interactions between components of social, technical, physical, and digital systems. I already mentioned the mechanism identified by Damian and Chisan (2006), by which cross-disciplinary group meetings, and their interaction with other parts of the software engineering organization, resulted in fewer defects, less rework, and improved effort estimates. As another example, Seyff et al. (2010) identified two mechanisms that reduce the use of audio recording in mobile RE: participants felt uncomfortable if they voice-recorded their needs in a public place, and public places often contained too much background noise to do the recording.

To sum up, abductive inference is the identification of component-based mechanisms that explain effects. These mechanisms complete the prediction

[Artifact × Context] will produce Effects

with an explanation of the mechanisms by which the effects are produced. As indicated earlier, researchers will not always be able to explain all mechanisms of interaction between an artifact and a concrete, practical context. The fewer mechanisms are known, the less confidence there is that statistical regularities in behavior are stable under changes in context.

2.3.4. Case-based and sample-based reasoning

Adding abductive inferences to analogic and inductive generalization, respectively, we get case-based reasoning (CBR) and sample-based reasoning (SBR) (Fig. 5).

Case-based reasoning is the vertical dimension of our diagram of scaling up (Fig. 3). It consists of two steps, namely abduction and generalization by analogy. In the first step, a single case is analyzed to identify an architecture of the case in terms of its components and their interactions, so that this architecture may provide an explanation of observed effects in terms of component-based mechanisms. For example, if in a case of agile development performed by an independent developer for a small company no client representative is on-site, then this can be explained by the limited resources of the small company. This is an abductive inference.

[Figure: Observations, Explanations, Generalizations; CBR: (1) Abduction, (2) Analogy; SBR: (1) Induction, (2) Abduction, (3) Analogy]

Fig. 5. Reasoning from observations to explanations and generalizations in case-based reasoning (CBR) and in sample-based reasoning (SBR).

Next, in CBR we can generalize by analogy by postulating that in cases with the same or a similar architecture (an independent developer doing agile development for a small company), the same effects will be produced by the same or similar mechanisms. The theory of similarity that supports the generalization here is that small companies have limited resources and will prefer to trust the developer rather than spend their scarce resources on agile requirements engineering. In CBR, the theory of similarity that supports the analogic generalization is stated in terms of a component-based mechanism.
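The analogic step of case-based reasoning can be mimicked in a few lines. In this hypothetical sketch, a case is reduced to a set of architectural features, and the theory of similarity is represented as the set of features deemed relevant; the analogic step then selects target cases that share those features with the source case. The feature names and the function `analogic_targets` are illustrative assumptions, and the result is a conjecture to be tested, not a conclusion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    name: str
    architecture: frozenset[str]  # hypothetical architectural features


def analogic_targets(source: Case, candidates: list[Case],
                     relevant: frozenset[str]) -> list[str]:
    """Names of candidate cases that share every feature of the source case
    that the similarity theory marks as relevant. For these cases we
    conjecture, fallibly, that the same mechanism produces the same effect."""
    required = source.architecture & relevant
    return [c.name for c in candidates if required <= c.architecture]


observed = Case("observed project",
                frozenset({"agile", "independent developer", "small company"}))
# Hypothetical similarity theory: small clients of an independent agile
# developer lack the resources to put a customer on-site.
relevant = frozenset({"agile", "independent developer", "small company"})
candidates = [
    Case("A", frozenset({"agile", "independent developer", "small company", "web shop"})),
    Case("B", frozenset({"agile", "in-house team", "large company"})),
    Case("C", frozenset({"agile", "independent developer", "small company", "embedded"})),
]
print(analogic_targets(observed, candidates, relevant))
```

Weakening the similarity theory to the single feature "agile" would also admit case B, which shows how much of the inferential weight rests on the theory of similarity rather than on the observation itself.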

Sample-based reasoning, the horizontal dimension in our diagram of scaling up (Fig. 3), is more complex. It consists of three steps (Fig. 5). From observations of a statistically meaningful sample of the population (e.g. a random sample of at least 30 elements), the researcher infers statistically that the population has some characteristics. For example, from an experiment with a sample of student projects, the researcher may be able to infer statistically that there is a correlation between the use of some requirements notation and the quality of requirements specifications in the population of student projects. This inference is fallible, because the sample may coincidentally show a pattern that is absent from almost all other samples from this population.

Second, the researcher may then list all possible causes of such a correlation, and be able to argue that the best explanation is that the notation caused the difference in quality. This is an inference to the best explanation, i.e. an abduction. If the researcher can explain the postulated impact of notation on quality by some intermediate cognitive mechanism that is postulated by a previously established theory, then this is a second, mechanistic abduction that increases the support for the first one.

Third, the researcher may want to generalize the claim about the population further to similar populations, by analogy. For example, from a statistical generalization about student projects, the researcher may want to generalize further to the population of all real-world projects with junior software engineers, and justify this generalization by the similarity of the architecture and mechanisms of the student projects to the architecture and mechanisms of real-world projects with junior software engineers. This analogic generalization too may be supported by a mechanistic explanation, if the mechanism that explains the phenomenon in student projects can also explain that phenomenon in professional software engineering projects.
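The first, statistical step of sample-based reasoning can be illustrated with a permutation test, one of many possible tests for a difference between two groups. Everything here is hypothetical: the scores are made up, the groups are smaller than the 30 elements mentioned above purely for readability, and `permutation_test` is an illustrative helper rather than a library function. A small p-value supports only the inductive step; the abductive steps (ruling out alternative explanations, finding a mechanism) still have to be argued separately.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns an estimated p-value: the share of random relabelings whose
    absolute mean difference is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            at_least_as_extreme += 1
    # Add-one correction: the observed labeling counts as one permutation.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


# Hypothetical quality scores (0-10) of student requirements specifications.
with_notation = [7, 8, 6, 9, 8, 7, 9, 8]
without_notation = [5, 6, 7, 5, 6, 4, 6, 5]
p = permutation_test(with_notation, without_notation)
print(f"estimated p-value: {p:.4f}")
```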

Double support for causal claims, by statistical evidence provided by statistical difference-making experiments and by independently verified mechanisms that can explain the causal relationships inferred from the statistical experiments, seems to be common practice in the health sciences (Russo and Williamson, 2007). Thus, sample-based reasoning and case-based reasoning have a useful complementary relationship. After providing support for an inductive generalization about the effect of an artifact, the researcher may do some case studies, or some single-case mechanism experiments as described later, in an attempt to find and understand the mechanisms that produce this effect. Or, the other way around, after finding that a mechanism has produced an effect in a few cases, the researcher may do a statistical difference-making experiment to support the claim that this effect can be generalized statistically to the population. Thus, the two generalization dimensions in the diagram of scaling up (Fig. 3) must be traveled together.

2.4. Validity of inferences

All three kinds of inferences discussed are fallible, meaning that their conclusions could be false even if their premises are true. The researcher must therefore spell out the reasons that support the conclusion, and also summarize the reasons why the conclusion could be false after all. This is called a discussion of the validity of the conclusions. Since “validity” suggests “justifiability” or even “truth”, this term is misleading. A less misleading term would have been “plausibility” or “support”. However, I will stick to the accepted terminology.

In Table 1 we can see that the three kinds of inferences correspond to three well-known kinds of validity. Conclusion validity is the support for the conclusion of a statistical inference. Threats to conclusion validity include low power, small samples, non-random sampling, non-random allocation, violation of the assumptions of statistical algorithms, etc. Note that even if conclusion validity is sufficiently well argued, it is still possible that, at a 5% significance level, the experiment is one of the roughly 5% of experiments that show a statistically significant difference by chance, i.e. without there being a mechanism that produces the difference.
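A small simulation can make the last point concrete. The illustrative sketch below is not taken from the cited literature: it repeatedly draws two groups from the same population, so that no mechanism produces a difference, applies a two-sided z-test that assumes a known standard deviation, and counts how often the result is nevertheless significant. The sample size, number of trials, and seed are arbitrary choices.

```python
import math
import random
from statistics import NormalDist, fmean


def z_test_p_value(a, b, sigma):
    """Two-sided p-value of a z-test for a difference in means, assuming a
    known common standard deviation sigma in both groups."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(trials=2000, n=30, alpha=0.05, seed=1):
    """Share of simulated experiments that come out significant although both
    groups are drawn from the same population, i.e. there is no real effect."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p_value(a, b, sigma=1.0) < alpha:
            significant += 1
    return significant / trials


print(false_positive_rate())  # a value near 0.05
```

Every significant result in this simulation is a chance finding, since nothing in the data-generating process distinguishes the two groups.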

Internal validity is the support for an explanation of a phenomenon by the causal mechanisms that produced the phenomenon. A major threat to internal validity is that the outcomes of an experiment may be explained not only by a mechanism that leads from treatment to outcome, but by other mechanisms too. For example, if the OoS contains people, then history, maturation, and attrition may influence the outcome, in addition to the influence of instruments, tests, the experimenter, semantic ambiguities, etc. in the experiment (Shadish et al., 2002, page 54ff.). For the reader of a research report to assess the support for the abductive inference that the observed outcome is produced by some mechanisms, alternative explanations must be listed explicitly.

External validity appears in two flavors in Fig. 5: in case-based reasoning and in sample-based reasoning. In case-based reasoning, external validity is the validity of the analogic inference from a single-case explanation to all similar cases. For example, a mechanism observed in [(artifact prototype) × (simulated context)] in the laboratory is generalized to all [artifact × context] cases in the real world. In sample-based reasoning, external validity is the validity of the analogy from one population to another population. For example, a conclusion about the population of student projects is generalized to a conclusion about the population of real-world projects. In both flavors, external validity is the validity of the inference from the studied OoS to all similar cases in the real world. As observed by Gigerenzer (1984), determining external validity is an empirical question. If conclusions from an experiment in context A are generalized, fallibly, to context B, then one can test this generalization by repeating the experiment in context B. This is in fact what is done when scaling up from the lab to the real world.

Threats to external validity are sensitivity of the effects of an artifact to the context in which it is used, dissimilarity of the treatment used in the lab to treatments used in practice, interference of other mechanisms with the mechanism of interest, absence of a theory of similarity that could justify the generalization, etc. Shadish et al. (2002, pages 86ff.) and Wohlin et al. (2012, page 110) provide detailed discussions.

Table 1
The three kinds of inference and some validity considerations.

Inductive inference: Estimation of a population parameter, or decision about a statistical hypothesis about the population, based on observations of a sample.
Conclusion validity: Are the assumptions of the statistical algorithms satisfied? Random sample? Homogeneous sample? Random allocation? Statistical power and effect size? Reliable measures? Logical errors in reasoning from sample statistics to population hypotheses? Etc.
Abductive inference: Explaining a phenomenon by identifying the causal mechanisms that produced it.
Internal validity: Are there alternative explanations? Is there a common cause that could explain the phenomena? Can the context of the experiment, the behavior of the experimenter, or phenomena in the sample of subjects explain the outcome of the experiment? Etc.
Analogic inference: Concluding that a target will have the same properties as a source (the experiment) because of some similarity between them.
External validity: Is there a theory of similarity? Does the theory of similarity justify the conclusions? Are the mechanisms in the target the same as those in the source? Are there other mechanisms that could interfere with the mechanism of interest? Is the effect context-sensitive? Etc.
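Sensitivity of an effect to its context, a threat to external validity named above, can be illustrated with a toy simulation. In the hypothetical outcome model below, which is an assumption for illustration only, the artifact's effect is scaled by how strongly the context supports the mechanism the artifact relies on. The same experiment then yields a clear effect in a supportive lab context A and a much smaller one in a less supportive context B, which is what repeating the experiment in context B would reveal.

```python
import random
from statistics import fmean


def outcome(treated, mechanism_support, rng):
    """Hypothetical outcome model: the artifact adds up to 2 points, but only
    to the degree (0..1) that the context supports the mechanism it relies on."""
    effect = 2.0 * mechanism_support if treated else 0.0
    return 10.0 + effect + rng.gauss(0.0, 1.0)


def estimated_effect(mechanism_support, n=500, seed=0):
    """Difference in mean outcome between treated and untreated cases in one context."""
    rng = random.Random(seed)
    treated = [outcome(True, mechanism_support, rng) for _ in range(n)]
    untreated = [outcome(False, mechanism_support, rng) for _ in range(n)]
    return fmean(treated) - fmean(untreated)


lab_effect = estimated_effect(mechanism_support=0.9)    # context A: the lab
field_effect = estimated_effect(mechanism_support=0.2)  # context B: practice
print(round(lab_effect, 2), round(field_effect, 2))
```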

3. Methods for validation research

We will discuss the empirical validation methods using the structure of Fig. 4. We have used this structure earlier to make a checklist for empirical research reports (Wieringa et al., 2012). The researcher uses an object of study (OoS) to represent elements of the population, where in our case the population elements have the structure [artifact × context]. Therefore, the OoS has this structure too, consisting of a model of the artifact and a model of the context. The OoS is a model of an arbitrary population element in the sense that it is similar to population elements and can be studied by the researcher to learn something about population elements (Apostel, 1961). An example would be an OoS that consists of a prototype of a software product interacting with a simulation of a problem context, or an RE technique (the artifact) interacting with a student project (the context).

In statistical research, the researcher studies a sample of OoS’s of statistically meaningful size. In case research, the researcher studies a small sample or even a single OoS.

In experimental research, the researcher applies a treatment to an OoS and then measures what happens. In observational research, the researcher measures what happens, but does not apply a treatment. Measurement as well as treatment usually requires instruments.
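The component-based structure described above can be sketched as a small data model. The class and function names below are hypothetical, chosen for this illustration: an object of study pairs a model of the artifact with a model of the context, a treatment instrument maps an OoS to a treated OoS, and a measurement instrument maps an OoS to an observation. Experimental research applies a treatment before measuring; observational research only measures.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class ObjectOfStudy:
    """Model of a population element with the structure [artifact x context]."""
    artifact: str          # model of the artifact, e.g. a prototype
    context: str           # model of the context, e.g. a student project
    state: str = "initial"


def study(oos: ObjectOfStudy,
          measure: Callable[[ObjectOfStudy], str],
          treat: Optional[Callable[[ObjectOfStudy], ObjectOfStudy]] = None
          ) -> Tuple[str, str]:
    """Experimental research treats the OoS before measuring it;
    observational research only measures it."""
    if treat is None:
        return "observational", measure(oos)
    return "experimental", measure(treat(oos))


def instruct(oos: ObjectOfStudy) -> ObjectOfStudy:
    """Hypothetical treatment instrument: an instruction session."""
    return replace(oos, state="instructed")


def observe(oos: ObjectOfStudy) -> str:
    """Hypothetical measurement instrument."""
    return oos.state


oos = ObjectOfStudy(artifact="RE technique", context="student project")
print(study(oos, observe))            # ('observational', 'initial')
print(study(oos, observe, instruct))  # ('experimental', 'instructed')
```

The sketch deliberately ignores the influence that instruments and the OoS exert on each other; in a real study, measuring an OoS can change it.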

In the diagram of Fig. 4, all interactions are bidirectional: one cannot treat an OoS without the OoS exerting some influence on the treatment instrument, and one cannot measure an OoS without exerting some influence on the OoS.

The concept of treatment needs some explanation. So far we have taken a component-based view of the world, in which the world is modeled as a hierarchy of systems that contain subsystems, that contain sub-subsystems, etc. Thus, the population consists of artifacts interacting with a context, and research has a structure of components as shown in Fig. 4. In this view, a treatment is the insertion of a component in a context. For example, a doctor treats a patient (the context) by giving them a drug (the artifact), and a consultant helps a software engineering organization (the context) by inserting an improved RE technique (the artifact).

Note that the artifact consists not only of a product, for example a drug or an RE tool, but also of a process, for example the protocol for taking the drug or the procedure by which to use the tool. The experimental treatment then consists of making this product available and giving instructions in the process.

We can also describe the same experimental situation in the more traditional view, in which a treatment is setting the level of an independent variable. This is a more abstract, variable-based view of experiments that will be convenient to use in Section 3.4 on statistical experiments. Until then, it is more illuminating to use the component-based view of Fig. 4. Table 2 lists four groups of validation research methods that we will discuss in the following sections.

3.1. Expert opinion

In the conceptual stage of validation, before the artifact is tested on models or in the field, the researcher can elicit the opinion of experts about the possible usability and usefulness of the artifact. This is observational empirical research, because the researcher does not intervene in an object of study; the researcher only elicits opinions. Nor is it a statistical survey with the aim of estimating the distribution of opinions in the entire population of experts. Rather, it is an attempt to get early information about the expected usability and usefulness of the artifact in real-world contexts. Thus, statistically meaningful sample sizes are not needed; useful opinions are needed.

Table 2
Validation research methods.

Researching expert opinion
  • Eliciting expert opinion using interviews, questionnaires, or focus groups
Single-case mechanism experiments
  • Testing an artifact prototype on a simple example in the lab
  • Testing an artifact prototype on a realistic example in the lab
  • Testing an artifact prototype on a realistic example in the field
Technical action research
  • Using an artifact prototype to help a client
  • Teaching the use of an artifact prototype to a client, by which they can solve some of their problems
Statistical difference-making experiments
  • Comparing the effect of prototypes of two or more artifacts on a sample of simulated contexts in the lab
  • Comparing the effect of prototypes of two or more artifacts on a sample of contexts in the field

In terms of Fig. 4, the population is not the set of all possible experts but the set of all possible [artifact × context] elements. So the object of study is not the expert either. Rather, the expert is an instrument to measure an imaginary OoS, namely a mental image that the expert has formed of real-world [artifact × context] elements. This is an unreliable instrument, but one that can nevertheless give useful information.

Positive or uncritical opinions of experts are not very useful, because experts may be motivated by the desire to finish the interview quickly, or to be nice to the researcher. Negative or critical opinions, on the other hand, are very useful, especially if the expert can indicate which element of the artifact design would not be usable or useful in which context, and why.

Example 1. Al-Emran et al. (2010) present an optimization method for product release planning. Input to the optimization method is a set of product release plans, each consisting of a sequence of features to be implemented in subsequent releases of a product. The optimization method then finds the plan that is most robust, in terms of time-to-market, resource assignment, and task schedule, with respect to differences in task workload and developer productivity. That is, it selects the release strategy that is least influenced by differences in workload and productivity.
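The idea of selecting the plan that is least influenced by differences in workload and productivity can be illustrated with a toy Monte Carlo sketch. This is not the method of Al-Emran et al. (2010): the effort model, the hypothetical plans and their uncertainty figures, the uniform perturbation ranges, and the use of the standard deviation of time-to-market as the robustness measure are all assumptions made for illustration.

```python
import random
from statistics import pstdev

# Hypothetical release plans: each feature is (estimated effort in
# person-days, relative uncertainty of that estimate).
PLANS = {
    "plan_a": [(20, 0.5), (10, 0.1), (15, 0.4)],
    "plan_b": [(25, 0.1), (12, 0.1), (10, 0.2)],
}


def sample_time_to_market(plan, rng):
    """One scenario: perturb each feature's workload and the team's productivity."""
    productivity = rng.uniform(0.8, 1.2)
    effort = sum(e * (1 + u * rng.uniform(-1.0, 1.0)) for e, u in plan)
    return effort / productivity


def spread(plan, runs=2000, seed=0):
    """Standard deviation of time-to-market over scenarios; lower means more robust."""
    rng = random.Random(seed)
    return pstdev([sample_time_to_market(plan, rng) for _ in range(runs)])


def most_robust(plans):
    """Name of the plan whose time-to-market varies least across scenarios."""
    return min(plans, key=lambda name: spread(plans[name]))


print(most_robust(PLANS))
```

In this toy setting the plan with the smaller spread is preferred even though its expected effort is slightly higher, which is the trade-off a robustness criterion makes.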

The researchers tested the optimization method, among other ways, by submitting it to experts and asking their opinion about it. The researchers sent a questionnaire about the method to 25 product development experts and received 13 responses. Many responses were uninformative in that the respondents simply thought the method was usable and useful. Some respondents, though, complained that using this optimization decreased their understanding of the release planning process, and others complained that they required more justification of the result before they would adopt the recommendation. These remarks point at potential improvement needs of the method.

Collecting expert opinion combines the two dimensions of scaling up. Experts imagine a sample of cases (informal sample-based reasoning) and imagine what mechanisms would occur in each of those cases (informal case-based reasoning). Because of the informality of their reasoning, their opinions must be treated with caution, but they must nevertheless be treated seriously.

3.2. Single-case mechanism experiments

I use the term single-case mechanism experiment to indicate experiments in which the researcher investigates one OoS in order to test the effect of some mechanism that the researcher believes to be present in the OoS. Software engineers do this when they test a software prototype by feeding it input scenarios that represent possible scenarios in the intended context of use. Aeronautical engineers do it when they test an airfoil in a wind tunnel.

I give some examples from RE research before I discuss the logic of single-case mechanism experiments.

It is not my purpose here to judge the quality of the analogic generalizations or of the explanations given by the authors, but merely to illustrate the role of analogy and of mechanistic explanations in validation experiments.

Example 2. Gacitua et al. (2011) propose a new algorithm for the identification of single- and multi-word abstractions in requirements documents, and describe an experiment in which they compare the performance of this algorithm with human judgment. The algorithm compares the frequency of a term in a document with its frequency in a reference document of the language used in the requirements document, such as a corpus of standard English. Terms that are rare in the reference document but occur frequently in the requirements document are likely to indicate important concepts in the domain.
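The frequency comparison can be sketched for single words as follows. This is a simplified illustration, not the algorithm of Gacitua et al. (2011), which also identifies multi-word abstractions: the add-one smoothing, the ratio score, and the toy texts are assumptions.

```python
from collections import Counter


def domain_terms(document_tokens, reference_tokens, top_k=3):
    """Rank single-word terms by how much more frequent they are in the
    document than in the reference corpus (add-one smoothed ratio)."""
    doc = Counter(document_tokens)
    ref = Counter(reference_tokens)
    vocabulary = set(doc) | set(ref)
    n_doc = len(document_tokens)
    n_ref = len(reference_tokens) + len(vocabulary)  # smoothing mass

    def ratio(term):
        # Add-one smoothing keeps terms that never occur in the reference finite.
        return (doc[term] / n_doc) / ((ref[term] + 1) / n_ref)

    return sorted(doc, key=ratio, reverse=True)[:top_k]


requirements = "the sensor reports the sensor status to the controller".split()
reference = "the cat sat on the mat and the dog sat on the rug to rest".split()
print(domain_terms(requirements, reference))
```

Common words such as “the” occur often in both texts and therefore score low, while a word like “sensor” that is frequent only in the requirements text scores high.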

To test this algorithm, the authors selected a book on a technical domain and used its body as if it were a requirements document to analyze. The index of the book was used as a reference list of domain concepts. Thus, the artifact to be tested is the algorithm, and the book and its index are the context; both make up the OoS. The treatment in this experiment is the request to identify abstractions in a book. The measurement consists of recall and precision with respect to the index terms.
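Recall and precision with respect to the index terms can be computed as below. The index terms and the extracted terms are hypothetical.

```python
def precision_recall(extracted, reference):
    """Precision and recall of extracted terms against a reference list (here: index terms)."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


index_terms = {"sensor", "controller", "status", "actuator"}  # hypothetical book index
extracted = ["sensor", "reports", "status"]                   # hypothetical algorithm output
print(precision_recall(extracted, index_terms))  # (0.6666666666666666, 0.5)
```

Here two of the three extracted terms occur in the index (precision 2/3), and two of the four index terms were found (recall 1/2).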

Concerning external validity, the authors argue that the book’s domain is similar to the domain of RE documents, that the size of the document is similar to documents in RE projects, and that the concept abstraction scenario is similar to that of a requirements engineer who has to familiarize herself with a new domain. Also, they argue that the hierarchical structure of the index is representative of the structure of multi-word terms in the intended population.

Internal validity is the question whether the mechanisms built into the artifact explain the observed effects. In this experiment, the frequency-based mechanism yielded low recall and precision. The authors’ explanation is that the identification of abstractions by people does not take place by a frequency-based method, and that frequency in general is not a sufficiently powerful mechanism to identify abstractions.

Example 3. Seyff et al. (2010) tested a tool for mobile requirements engineering in the field. They gave mobile phones running the tool to nine subjects, who used it for a few days to gather requirements for a system that supports daily commuting, and requirements for a system that supports shopping activities. The requirements were stated in text or audio. After the experiment, subjects were debriefed, and researchers transcribed recorded needs into system requirements.

In this example, the OoS consists of an artifact prototype interacting with a realistic context. The context consists of the mobile phone on which the prototype runs, the users using the prototype, and the environment in which the users move. The treatment consists of the instructions to the users to use the tool for two RE purposes. The treatment instrument is the instruction session in which the users were instructed. The measurements consist of the data (text or audio) entered by the users as well as the answers of the users to the researchers’ questions in the debriefing session.

The similarity between these OoS’s and the envisaged population of future real-world mobile RE processes is threatened by potential differences between future tools and the one used in this experiment, and possibly also by differences in elicitation methods. Remember that the artifact in this example consists of a mobile RE tool plus the process for using it.

The researchers made a mechanism (a mobile RE tool) available to users to test if it produced the expected effects (recorded contextual end-user needs). The mechanism had the expected effect in all nine investigated cases. A threat to the validity of this observation is that the subjects may have wanted to be nice to the researchers, which would be a factor co-producing the expected effect. Other users, without a friendly disposition to the researchers, may have failed to produce the effects when interacting with the tool.

These experiments allow analogic inferences from the OoS to the population, and can be placed along the vertical dimension of our diagram of scaling up (Fig. 3). As we saw, analogic generalizations must be supported by a theory of similarity that explains why an observation on the source of the analogy can lead to a conclusion about the target of the analogy. In the component-based view of the world that we take, the similarity between source and target of an analogy must be architectural, and the theory of similarity must indicate some component-based mechanism that produced a response in the experiment and can produce a similar response in the real-world cases of the population. Hence the name “mechanism-based experiments”. These experiments do not use statistical inference to support claims about the population; instead, they use a theory about mechanisms to support claims about the population.
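An architectural theory of similarity of this kind can be caricatured as a check on components. In the deliberately crude sketch below, with hypothetical component names loosely based on Example 3, an analogy from a source case to a target case counts as supported when the components that the mechanism needs occur in both cases and no interfering mechanism is known in the target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    components: frozenset                  # components in the case's architecture
    interfering: frozenset = frozenset()   # known mechanisms that may interfere


def analogy_supported(required, source, target):
    """Crude 'theory of similarity': the components that the mechanism needs are
    present in both source and target, and no interfering mechanism is known in the target."""
    return (required <= source.components
            and required <= target.components
            and not target.interfering)


required = frozenset({"user", "mobile RE tool", "usage instruction"})
lab = Case(frozenset({"user", "mobile RE tool", "usage instruction", "researcher"}))
field_case = Case(frozenset({"user", "mobile RE tool", "usage instruction"}))
confined = Case(frozenset({"user", "mobile RE tool", "usage instruction"}),
                interfering=frozenset({"brevity in confined spaces"}))

print(analogy_supported(required, lab, field_case))  # True
print(analogy_supported(required, lab, confined))    # False
```

A real theory of similarity would also have to explain why these components produce the effect; set inclusion only captures the requirement that the same architecture be present.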

We distinguish mechanisms in the artifact from mechanisms in the context.

• The artifact is by design a collection of component-based mechanisms that respond to input. Part of software testing consists of validating whether these mechanisms, if implemented correctly, indeed have the desired effects. This may lead to surprises, in the sense that unexpected phenomena may turn up (e.g. bugs) that are the result of unexpected mechanisms in an implementation. In general, in algorithm validation, there may be unexpected mechanisms in the program because our ability to program may exceed our ability to understand what we programmed.
  This is less likely to happen when testing a method. Step-by-step methods such as the Rational Unified Process build up their results in a simple manner, by instructions of the form “bring about result X”, which, if performed correctly, lead to the creation of result X. If these methods are performed by capable software engineers in an ideal context, we usually do not run into unexpected mechanisms in the method itself, because methods are relatively simple step-by-step procedures.
• Once a method has been shown to be usable by the researcher and his or her students, the important research question is whether it still works under conditions of practice, i.e. in the real world. In real-world contexts, there may be components or mechanisms that impact the production of the desired result of a method step in unexpected ways. An example of an unexpected mechanism in mobile RE is the tendency of users to be very brief in the textual specification of their needs when they are in a physically confined space, and to enter explanations by audio later. This may make needs analysis more time-consuming, which in turn may reduce the timeliness and cost-effectiveness of the requirements specification. The researchers may use this information to change the method.

3.3. Technical action research

Technical action research (TAR) is a case-based mechanism experiment too, but I list it separately because it is also something else: it is a real-world consultancy project (Wieringa and Morali, 2012). In a TAR project, the researcher uses an artifact in a real-world project to help a client, or gives the artifact to others to use in a real-world project (Engelsman and Wieringa, 2012), and uses this experience to learn about the robustness of the intended effects and the mechanisms that bring them about, in uncontrolled conditions of practice.

Example 4. Morali and Wieringa (2010) describe a method to assess confidentiality risks when outsourcing the management of IT systems. They then describe how Morali used this method to actually assess the confidentiality risks in the outsourcing relationship between a large manufacturing company and a large outsourcing service provider.

In this example, the artifact is a new risk assessment method, and the context consists of Morali applying this method to a risk assessment problem in a large company. Morali played a dual role: as researcher, she gave instructions on how to use the artifact to an OoS in which she herself was the user of the artifact. The measurements taken consisted of all intermediate working documents of the project, plus the diary of Morali in her role as user of the method.

The similarity of this TAR project to the population of all such risk assessment projects is that a confidentiality risk in an IT management outsourcing situation is assessed. There is also a dissimilarity, which is that in most projects in this population, Morali will not be the one doing the risk assessment. This is a threat to external validity that must be dealt with by repeating TAR projects like this with other researchers.

Internal validity is the question whether the method indeed delivered its expected results, and whether any mechanisms in the context influenced this. The method did deliver its expected results, but only repeated TAR projects can show whether this is due to the method only, or also to the user of the method (Morali), the quality of the documentation available in the company, etc.

TAR is a special kind of mechanism-based experiment, and in the process of scaling up it takes the researcher closer to the real world in the vertical dimension of the process of scaling up (Fig. 3). The inferences in TAR are of the same kind as those in other case-based mechanism experiments. Events in the case are explained in terms of mechanisms, and any generalization to the population is supported by a theory that says that these mechanisms can occur in population elements too. However, generalizations from TAR projects have an additional threat to validity, because the researcher may have contributed positively to the observed events in a way that cannot be replicated.

TAR is useful as a final validation stage before transferring a technology to practice, because it is closer to real-world practice than other case-based mechanism experiments. A single TAR project is not enough to justify the claim that an artifact is applicable in the entire target population of possible projects. But it does justify the claim that the artifact is usable and useful in some real-world projects, and it can provide useful information to the researcher for further improving the artifact.

3.4. Statistical difference-making experiments

A statistical experiment is an experiment with a sample of OoS’s to infer a statistical property of the population. For example, it may estimate the population mean of a variable, with a confidence interval, from observations of the sample mean. Or it may test a statistical hypothesis about the population mean by observations from a sample.
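As an illustration of the first kind of inference, the sketch below estimates a population mean with a 95% confidence interval from a sample. It uses a normal approximation; for a sample as small as this one, a t-distribution quantile would be the more conventional choice. The measurements are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for a population mean.
    For small samples a t-distribution quantile would be more appropriate."""
    m = fmean(sample)
    se = stdev(sample) / math.sqrt(len(sample))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return m - z * se, m + z * se


# Hypothetical measurements, e.g. task completion times in hours for a sample of subjects.
sample = [4.1, 5.3, 4.8, 5.0, 4.6, 5.2, 4.9, 5.5, 4.4, 5.1]
low, high = mean_confidence_interval(sample)
print(f"95% CI for the population mean: [{low:.2f}, {high:.2f}]")
```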

In contrast to case-based mechanism experiments, the sample size is relevant, because, as indicated in Fig. 5, inference is sample-based, not case-based. Statistical experiments support inductive inferences from samples to populations, which is the horizontal dimension of scaling up (Fig. 3). They do not require a theory of mechanisms to generalize inductively to a population, but, as we have seen in Section 2.3, providing such a theory does give additional support to an inductive generalization, because it decreases the likelihood that the inductive generalization is based on a coincidental pattern in the data.

To describe statistical experiments we need to switch from the component-based view that we have taken up till now, in which the world consists of components and interactions, to a variable-based view of the world, in which the world consists of variables and relationships. Any description of the world in terms of components and interactions can be replaced by a more abstract description in terms of variables and relationships. In this variable-based view, a treatment consists of setting the value of an independent variable, and effects are measured by measuring the values of dependent variables. If there are two treatments, one is often called the “treatment” and the other the “control”, dividing the sample into a treatment group and a control group.


Inductive inference from sample to population can take place in a variety of ways, depending on how OoS’s were selected (sampling) and how treatments were allocated to OoS’s. In a randomized controlled trial (RCT), the sample is random and the allocation of treatments to sample elements is random too (Shadish et al., 2002; Sedgwick, 2011). This makes it possible to use the central limit theorem to support the inductive inference from sample to population. There are two ways to do this: by hypothesis testing and by confidence intervals (Hacking, 2001; Wonnacott and Wonnacott, 1990). In statistical hypothesis testing, the experimenter may observe a difference in the sample that would be very unlikely (probability less than 5%) to occur if a difference did not exist in the population. The researcher will then infer that, plausibly, the difference exists in the population. In the estimation of confidence intervals, the experimenter may estimate a population difference by the sample difference, using a 95% confidence interval around the sample difference. This estimate may be right or wrong, but if she always follows this estimation rule, she will in the long run be wrong in only about 5% of the inferences (Hacking, 2001).
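The long-run reading of a 95% confidence interval can be checked by simulation. The sketch below assumes an RCT-like setup with normally distributed outcomes and known unit variance in both groups, which are assumptions for illustration; it repeats the experiment many times and counts how often the interval around the sample difference contains the true population difference.

```python
import math
import random
from statistics import NormalDist, fmean

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96


def coverage(true_difference=0.5, n=40, runs=2000, seed=7):
    """Share of repeated experiments whose 95% interval for the treatment-control
    difference contains the true population difference."""
    rng = random.Random(seed)
    se = math.sqrt(2.0 / n)  # both groups have known unit variance (an assumption)
    hits = 0
    for _ in range(runs):
        treatment = [rng.gauss(true_difference, 1.0) for _ in range(n)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        difference = fmean(treatment) - fmean(control)
        if difference - Z_95 * se <= true_difference <= difference + Z_95 * se:
            hits += 1
    return hits / runs


print(coverage())  # a value near 0.95
```

Any single interval either contains the true difference or not; the 95% describes the long-run success rate of the estimation rule.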

Having inductively (and fallibly) inferred that there is a statistical correlation between independent and dependent variable in the population, the experimenter tries to abduce a causal explanation of this correlation, and does this by trying to exclude any possible cause other than the difference between treatments. Random sampling and random allocation can only introduce chance fluctuations that disappear on average in the long run. But after allocation, treatments must be applied and outcomes measured, and so the experimenter must also check whether application or measurement, or any other event during the experiment, could have contributed to the measured difference. This is all part of the discussion of internal validity summarized in Table 1.

If all these alternative causes are excluded, the difference between treatments is the only remaining possible cause of the observed difference in dependent variables. This is an abductive inference. Note that random sampling and allocation are used both in the inductive inference step, where they facilitate application of the central limit theorem, and in the abductive inference step, where they facilitate the exclusion of causes other than the treatment.

Random sampling is difficult to achieve in practice, so we find many quasi-experiments in software engineering and elsewhere (Kampenes et al., 2009; Shadish et al., 2002; Sjoberg et al., 2005), in which sampling is not random or allocation of treatments to elements is not random. For example, subjects may self-select into treatment or control groups, or the researcher may allocate treatments to elements according to a property of the elements. Quasi-experiments cannot use the mathematical techniques based on the central limit theorem for their statistical inference, but there are other reasoning techniques that can be used for statistical inference in quasi-experiments (Shadish et al., 2002).
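The danger of self-selection can be shown with a simulation in which the treatment has no effect at all. In this hypothetical setup, subject competence alone drives the outcome. When competent subjects opt into the treatment group, a large spurious difference appears; random allocation yields a difference close to zero. The outcome model and parameters are assumptions for illustration.

```python
import random
from statistics import fmean


def observed_difference(self_selection, n=2000, seed=3):
    """Hypothetical study in which the treatment has NO effect at all:
    the outcome depends only on subject competence plus noise."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        competence = rng.gauss(0.0, 1.0)
        if self_selection:
            in_treatment = competence > 0          # competent subjects opt in
        else:
            in_treatment = rng.random() < 0.5      # random allocation
        outcome = competence + rng.gauss(0.0, 1.0)  # no treatment term
        (treated if in_treatment else control).append(outcome)
    return fmean(treated) - fmean(control)


print(round(observed_difference(self_selection=True), 2))   # clearly positive
print(round(observed_difference(self_selection=False), 2))  # close to zero
```

Competence acts here as a common cause of group membership and outcome, which is exactly the kind of alternative explanation that random allocation is meant to rule out.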

RCTs and quasi-experiments both take a so-called difference-making view of causality, which is why I call them difference-making experiments here. In this view, variable X has a causal influence on variable Y if X makes a difference to Y. That is, if X had a different value, with all other things being equal, then the value of Y would be different as well (Holland, 1986; Woodward, 2003).
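The counterfactual test of the difference-making view can be written down for a hypothetical structural equation: vary X, hold the background factors fixed, and compare Y. Both equations below are invented for illustration.

```python
def y_depends_on_x(x, background):
    """Hypothetical structural equation in which X influences Y."""
    return 2 * x + background


def y_ignores_x(x, background):
    """Hypothetical structural equation in which X plays no role."""
    return background


def makes_a_difference(equation, x, x_alternative, background):
    """Difference-making test: change X, hold all other things (the background)
    equal, and check whether Y changes."""
    return equation(x, background) != equation(x_alternative, background)


print(makes_a_difference(y_depends_on_x, 0, 1, background=5))  # True
print(makes_a_difference(y_ignores_x, 0, 1, background=5))     # False
```

In an experiment the structural equation is unknown, and the “all other things equal” clause is approximated by randomization and control rather than by holding a background variable fixed.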

For example, suppose that in an RCT, a sample of projects using programming method A performed better on average than a sample of projects solving the same problems using method B, and suppose that this difference is statistically significant, i.e. it is unlikely to be observed in a sample if it did not exist in the population. Then it is unlikely that the difference is the result of chance alone, so the researcher is justified in looking for a cause. There may be many causes for the difference, including the resources available to the projects, the competence of project personnel, and the difference between methods A and B. If the researcher can rule out all causes other than the difference between A and B, then the statistical difference supports the claim that the difference between methods A and B is the cause of the difference in project performance.
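The statistical step of this argument can be illustrated with a permutation test, used here in place of the central-limit-theorem-based tests because it needs no distributional assumption beyond the exchangeability that random allocation provides. The scores are hypothetical. A small p-value only supports the inductive step; ruling out other causes, such as differences in resources or competence, remains a separate, abductive step.

```python
import random
from statistics import fmean


def permutation_p_value(a, b, permutations=10000, seed=0):
    """One-sided p-value for 'A scores higher than B', under the null hypothesis
    that method labels are exchangeable (as random allocation would make them)."""
    rng = random.Random(seed)
    observed = fmean(a) - fmean(b)
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(permutations):
        rng.shuffle(pooled)
        difference = fmean(pooled[:len(a)]) - fmean(pooled[len(a):])
        if difference >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (permutations + 1)


# Hypothetical project performance scores for methods A and B.
scores_a = [78, 82, 85, 88, 90, 84, 87, 91]
scores_b = [70, 75, 72, 80, 74, 77, 73, 76]
print(permutation_p_value(scores_a, scores_b))
```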

Example 5. Prechelt et al. (2002) describe an experiment that compares maintenance tasks done on programs in which design patterns were described in comment lines with maintenance tasks done on programs in which design patterns were not commented. The programs were identical except for the presence of so-called Pattern Comment Lines (PCLs).

The artifact here is the presence of PCLs, and the context is the program, maintenance task, and maintainer. The subjects self-selected into the samples, which makes it hard to know which hypothetical population they are a random sample of, but we will assume that it consists of computer science students performing maintenance tasks on programs of similar size and complexity as those used in the experiment. The treatment is the instruction to perform maintenance tasks. Treatments were allocated randomly to subjects. The measured variables were task completion time and correctness of result.

The researchers found a slight but statistically significant improvement in task time and result correctness when PCLs were present. This means that there is a low probability that this observation would be made in the sample if the improvement were absent from the population. This supports the inductive generalization that an improvement exists in the population of all programs of the same size and complexity being maintained by students. The authors discuss possible causes for this improvement other than the presence of PCLs, and conclude that there is no evidence that there are other causes (Prechelt et al., 2002, page 599, Threats to internal validity).

They additionally identify a cognitive mechanism that could be responsible for this causal relationship (abductive inference, Fig. 5). This mechanism is postulated by a theory, formulated earlier by several researchers, that program comprehension works by the formation and validation of hypotheses, the efficiency of which is greatly enhanced by beacons, which are hints about familiar kinds of structures (Prechelt et al., 2002, page 596). PCLs are such beacons. This could explain the causal influence of PCLs on maintainability. It increases the support for the claim that the observed improvement is systematic rather than a coincidental event.

The authors are reluctant to generalize, by analogy, from the population of experimental maintenance situations similar to this experiment to the population of real maintenance tasks (Prechelt et al., 2002, page 599). However, they do reason that, if PCLs had an improvement effect for relatively small, well-commented programs, they might have an even better effect on large, ill-commented programs (Prechelt et al., 2002, page 604). This is reasoning by analogy.

In an interesting aside, the authors observe that experiments comparing different syntactic forms to express the same meaning all have the methodological problem that the two forms rarely have exactly the same meaning. The authors give general advice about a methodologically sound setup of such experiments. This is case-based reasoning by analogy (Fig. 5), in which their experiment is an example for other, similar experiments.

Statistical difference-making experiments support reasoning along the horizontal dimension of our diagram of scaling up (Fig. 3). In this example we see first an inductive inference and then two abductions. First, the statistical correlation between two variables is inductively inferred to exist in a population, based on observations in the sample. This is inductive inference. Second, it is argued that this statistical correlation between independent and dependent variable is a causal relationship from independent to dependent variable, by ruling out all possible causes other than the difference in treatments (abduction 1). Third, this causal relationship is explained by a cognitive mechanism postulated by a previously established theory (abduction 2). This increases confidence in the causal conclusions that the authors drew from the experiment.

4. Related work

The reasoning schema “[Artifact × Context] → Effect by Mechanisms” has been proposed in slightly different forms in social science (Pawson and Tilley, 1997) and in management science (Van Aken, 2004). It has some similarity with the satisfaction argument as proposed by Jackson in software engineering (Jackson, 2000). Wieringa (2003) calls it the systems engineering argument, because it shows how a component must interact with other components to produce desired behavior of a composite system. It is simpler than the structure for design theories proposed by Gregor and Jones (2007). More discussion is provided elsewhere (Wieringa et al., 2011).

Douven (2011) gives a convenient introduction to abductive reasoning, also called “reasoning to the best explanation”. Mechanistic abduction is similar to theoretical model abduction as discussed by Schurz (2008).

The concept of mechanism has been proposed by philosophers who analyzed the structure of explanation in the physical, biological, and social sciences (Glennan, 1996; Machamer et al., 2000). It has been adopted as an explanatory construct in biology (Bechtel and Richardson, 2010; Bechtel and Abrahamsen, 2005) and in the social sciences (Bunge, 2004; Hedström and Ylikoski, 2010; Elster, 1989). All of these authors have slightly differing concepts of mechanism. Illari and Williamson present a survey and unification (McKay Illari and Williamson, 2012), which is very similar to the concept that I have used here.

There is a huge literature on causality, and I cannot even begin to cite the relevant literature here. There are two views: one that causality is difference-making, the other that a causal relationship is a mechanism, and within each view there are several points of view. For example, the Bayesian theory of Pearl is an example of a difference-making view, described in a book (Pearl, 2009) and summarized in a paper (Pearl, 2009b). Holland (1986) is an older, exceptionally clear exposition of the difference-making view, staying within the framework of frequency-based statistics. Williamson (2011) surveys some mechanistic theories and compares them with difference-making views.

Generalization by analogy as discussed here is one form of analytic induction, advocated by Yin as the way to generalize from cases (Yin, 2003), but actually originating with the sociologist Znaniecki (1968). The clearest and most accessible description of analytic induction is given by Robinson (1951): the researcher (1) roughly defines a class of phenomena and (2) formulates a hypothesis about a mechanism that is postulated to occur in these phenomena. This is our theory of similarity, and we have only considered the case where the theory describes a structure of interacting components that implement a mechanism. Next, a single case that satisfies the definition is investigated. If observations falsify the hypothesis, then either the definition is refined to exclude the case at hand, or the hypothesis is reformulated to match the observations. After investigating a number of cases, the definition and hypothesis may reach a stable state. The researcher then generalizes by claiming that all similar cases contain similar mechanisms, which will produce similar effects. This kind of case-based reasoning moves us upward in the diagram of generalization (Fig. 3). Znaniecki lists a few historical examples from biology, physics, and sociology where this kind of reasoning was followed (Znaniecki, 1968, pages 236–237).
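Robinson’s procedure can be sketched as a loop over cases. All names and example cases below are hypothetical, loosely modeled on Example 3. The revision strategy is supplied by the researcher as a function; the example uses only the first option Robinson describes, refining the definition so that it excludes a falsifying case.

```python
from typing import Callable, Dict, List, Tuple

Case = Dict[str, bool]
Predicate = Callable[[Case], bool]
Revision = Callable[[Predicate, Predicate, Case], Tuple[Predicate, Predicate]]


def analytic_induction(cases: List[Case], definition: Predicate,
                       hypothesis: Predicate, revise: Revision,
                       max_rounds: int = 10) -> Tuple[Predicate, Predicate]:
    """Investigate cases one at a time; whenever a case inside the defined class
    falsifies the hypothesis, revise the definition or the hypothesis. Stop once
    a full pass over the cases needs no revision (a stable state)."""
    for _ in range(max_rounds):
        revised = False
        for case in cases:
            if definition(case) and not hypothesis(case):
                definition, hypothesis = revise(definition, hypothesis, case)
                revised = True
        if not revised:
            break
    return definition, hypothesis


def uses_tool(case: Case) -> bool:       # hypothetical initial class definition
    return case["tool_used"]


def needs_recorded(case: Case) -> bool:  # hypothetical mechanism hypothesis
    return case["needs_recorded"]


def narrow_to_cooperative_users(definition, hypothesis, case):
    """Hypothetical revision: refine the definition to exclude the falsifying case."""
    def refined(c: Case) -> bool:
        return definition(c) and c["users_cooperative"]
    return refined, hypothesis


cases = [
    {"tool_used": True, "users_cooperative": True, "needs_recorded": True},
    {"tool_used": True, "users_cooperative": False, "needs_recorded": False},
    {"tool_used": False, "users_cooperative": True, "needs_recorded": False},
]
final_definition, final_hypothesis = analytic_induction(
    cases, uses_tool, needs_recorded, narrow_to_cooperative_users)
print([final_definition(c) for c in cases])  # [True, False, False]
```

The generalizing step itself, claiming that all cases satisfying the final definition contain the hypothesized mechanism, lies outside the loop: it is the fallible analogic inference discussed above.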

Zelkowitz and Wallace (1998) presented a survey of empirical validation methods in software engineering, which I compare in Table 3 with the list in Table 2. In the terminology of this paper, their list contains validation methods but also some other kinds of methods.

Table 3
Validation methods identified by Zelkowitz and Wallace (1998) and by Glass et al. (2001).

This paper | Zelkowitz and Wallace (1998) | Glass et al. (2001)

Validation research methods
• Expert opinion | – | –
• Single-case mechanism experiment | Simulation; Dynamic analysis | Simulation; Field experiment; Laboratory experiment – software
• Technical action research | Case study | Action research
• Statistical difference-making experiment | Replicated experiment; Synthetic environment experiment | Field experiment; Laboratory experiment – human subjects

Other research methods
• Observational case study | Case study; Field study | Case study; Field study
• Meta-research method | Literature search | Literature review/analysis

Measurement methods
• Methods to collect data | Project monitoring; Legacy data; Lessons learned | Ethnography

Inference techniques
• Techniques to infer information from data | Static analysis | Data analysis; Grounded theory; Hermeneutics; Protocol analysis

Their assertion method has been omitted because, as they also point out, it is not a research method: it is an experimental use of a new technology by its developer in the laboratory. Simulation is the execution of a product in a simulated environment. This is a single-case mechanism experiment, because the product is an implemented mechanism to be tested. Dynamic analysis is the execution of a product under controlled conditions, similar to simulation but not aimed at simulating real-world environments. It is a single-case mechanism experiment too.

A case study can be either the use of a new technology in an industrial project (Zelkowitz and Wallace, 1998, page 26), in which case we classify it as a technical action research project, or an observational study of a project (Zelkowitz and Wallace, 1998, page 25), in which case we classify it as an observational case study. Case studies therefore appear twice in Table 2.

Replicated experiments and synthetic environment experiments are statistical comparisons of groups of projects, in which a task is performed differently in different groups. These are statistical difference-making experiments, performed in the field or in the lab.
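A statistical difference-making experiment supports an inference from a difference observed between samples to a difference in a population. As a minimal sketch, assuming one numeric outcome per project (for example, a hypothetical defect count) and a two-group design, a permutation test estimates how often a difference in means at least as large as the observed one would arise if the treatment made no difference. The choice of test and the data are assumptions of this sketch, not something prescribed by Zelkowitz and Wallace.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (observed difference of means, estimated p-value).
    """
    rng = random.Random(seed)  # seeded for reproducible estimates
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the group labels are exchangeable,
        # so reshuffle them and recompute the difference in means.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed) - 1e-12:  # tolerance for float rounding
            extreme += 1
    # Add-one correction: the observed labelling counts as one permutation.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical defect counts per project, with and without a new RE technique.
with_technique = [3, 2, 4, 1, 2, 3]
without_technique = [6, 5, 7, 5, 6, 8]
diff, p = permutation_test(with_technique, without_technique)
print(f"difference in means: {diff:.2f}, p = {p:.4f}")
```

A small p-value supports the horizontal inference to a population only to the extent that the projects were randomly sampled from that population and randomly allocated to the groups.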

The difference between observational case studies and field studies, as defined by Zelkowitz and Wallace (1998, page 26), is that a case study is intrusive whereas a field study is not. Both are classified as observational case studies in the terminology of this paper, because in both, the research method is observational and the influence of the researcher on the object of study is to be minimized. Observational studies are, in the terminology of this paper, suitable as methods for evaluation studies of implemented technology, but not as research methods for validating new technology that has not yet been transferred to the market.

Literature search is part of any research, but it may be expanded into a full-blown research method, called a systematic literature review (Kitchenham, 2004).

Project monitoring is the collection, by the researcher, of data produced during a project; legacy data is the collection of documents, such as source code, specifications, and test plans, after the project is finished. For a full-blown research method, we need a design of the way the researcher will interact with the object of study, including measurement methods and any experimental intervention, and an inference design. In the terminology of this paper, project monitoring and legacy data as described by Zelkowitz and Wallace are therefore measurement methods.

Lessons learned is the collection and analysis of lessons-learned documents from projects. In Table 2 this is classified as a measurement method too, but because it also involves analysis, it could also be classified as a form of observational field study.

In static analysis the completed product is investigated, for example to analyze its complexity. It is similar to the study of legacy data, but it is classified here as an inference technique because it refers to a collection of analysis methods.
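As an illustration of static analysis as an inference technique, the following sketch approximates the cyclomatic complexity of a Python source fragment by counting decision points in its syntax tree. It is a simplified variant of McCabe's measure chosen for this example: some constructs, such as comprehensions and match statements, are not counted.

```python
import ast

# Syntax nodes that each add one decision point in this simplified count.
DECISION_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While,
                  ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity of a source fragment: 1 + decision points."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' has two short-circuit branches.
            decisions += len(node.values) - 1
    return 1 + decisions


example = '''
def classify(x):
    if x > 0 and x < 10:
        return "small"
    for _ in range(3):
        pass
    return "other"
'''
print(cyclomatic_complexity(example))  # prints 4
```

The product is never executed: the measurement is inferred from the program text alone, which is what distinguishes static analysis from the dynamic analysis discussed earlier.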

Glass et al. (2001) list the empirical research methods shown in the third column of Table 3. Non-empirical methods such as conceptual analysis and mathematical proof have been omitted, as have design activities, namely concept implementation and instrument development. Since the description by Glass et al. does not make clear whether their experiments are of the single-case mechanism kind or of the statistical difference-making kind, they are classified as both. I discuss the entries in this column that have not been discussed yet.

Ethnography is the detailed collection and description of daily events in a social group, without analysis; it is classified here as a measurement method. Grounded theory is the analysis of textual data produced by people, to extract the theories held by these people (Strauss and Corbin, 1998). I consider this to be a descriptive analysis method. Hermeneutics is based on the observation that to interpret people's behavior, you have to understand their cultural and conceptual framework, but the only way to understand their cultural and conceptual framework is to interpret their behavior. This leads to an inference strategy in which the researcher alternates between updating his or her conceptual framework and interpreting human behavior in that framework. Protocol analysis is the analysis of think-aloud protocols, useful in cognitive psychology. It is a data analysis method.

Table 3 shows that these lists of software engineering research methods are mutually consistent, that they can be integrated into my framework for validation research methods, and that they extend it with other methods. Happily, the overview is not complete, as new research methods and instruments keep being developed.

5. Summary and conclusion

Empirical validation of technology before it is transferred to practice requires investigating the effects of the interaction of the artifact with its context, and explaining these effects by the underlying mechanisms that produce them. Scaling up to practice thus produces a design theory of the form “[Artifact × Context] produces Effects by Mechanisms”.

Producing support for such a theory involves two kinds of inference, along the vertical and horizontal dimensions of the process of scaling up. Analogic inferences along the vertical dimension reason from case-based mechanism experiments to real-world instances of [Artifact × Context], and statistical inferences along the horizontal dimension reason from observed sample behavior to the population of all possible instances of [Artifact × Context]. Both kinds of inference are supported by abductive inferences that postulate mechanism-based explanations of cause-effect influences. Mechanism-based explanations refer to the components of [Artifact × Context] and their interactions.
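The form of a design theory can be made concrete as a small data structure. This is a hypothetical encoding, not a notation used in this paper: it records the components of the artifact and of the context, the effects, and the mechanisms, and it rejects any mechanism that refers to something other than components of [Artifact × Context]. The goal-model example is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mechanism:
    """Interaction among components of [Artifact x Context] (hypothetical encoding)."""
    components: frozenset
    interaction: str


@dataclass(frozen=True)
class DesignTheory:
    """Claim of the form '[Artifact x Context] produces Effects by Mechanisms'."""
    artifact_components: frozenset
    context_components: frozenset
    effects: tuple
    mechanisms: tuple

    def __post_init__(self):
        # A mechanism-based explanation may only refer to components of
        # the artifact or of its context.
        known = self.artifact_components | self.context_components
        for mechanism in self.mechanisms:
            unknown = mechanism.components - known
            if unknown:
                raise ValueError(f"unknown components: {sorted(unknown)}")

    def statement(self) -> str:
        return (f"[Artifact x Context] produces {', '.join(self.effects)} "
                f"by {'; '.join(m.interaction for m in self.mechanisms)}")


# Invented example: a goal model used by an analyst together with stakeholders.
theory = DesignTheory(
    artifact_components=frozenset({"goal model"}),
    context_components=frozenset({"analyst", "stakeholder"}),
    effects=("fewer misunderstood requirements",),
    mechanisms=(Mechanism(frozenset({"goal model", "stakeholder"}),
                          "stakeholders recognizing their goals in the model"),),
)
print(theory.statement())
```

Keeping artifact and context components separate mirrors the point that a validation study investigates the interaction of the two, not the artifact in isolation.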

We discussed the following research methods to validate artifacts:

• Expert opinion, in which experts reason informally about samples (horizontally) and mechanisms (vertically), which provides an initial sanity check of an artifact design;

• Single-case mechanism experiments, in which the researcher reasons vertically about mechanisms and their effects in increasingly realistic artifacts in increasingly realistic contexts;

• Technical action research, in which the researcher reasons vertically about mechanisms and their effects when an artifact is applied in a real-world project to help a client;

• Statistical difference-making experiments, in which the researcher reasons horizontally from effects observed in samples to effects inferred in populations.

These methods can be used with measurement instruments and data analysis methods known from software engineering and elsewhere.

This paper has given some examples of the use of these research methods, but this is just one step toward scaling up these methods to empirical RE research. Wider use of these methods will teach us more about their usability and usefulness in the empirical validation of RE technology.

Acknowledgements

This paper benefitted from comments by Vincenzo Gervasi and Walter Tichy. I would like to thank the anonymous reviewers of this paper for their constructive critique.

References

Al-Emran, A., Pfahl, D., Ruhe, G., 2010. Decision support for product release planning based on robustness analysis. In: Proceedings of the 18th IEEE International Requirements Engineering Conference (RE 2010), IEEE Computer Society, Sydney, Australia, pp. 157–166.

Apostel, L., 1961. Towards a formal study of models in the non-formal sciences. In: Freudenthal, H. (Ed.), The Concept and Role of the Model in the Mathematical and the Natural and Social Sciences. Reidel, Dordrecht, The Netherlands, pp. 1–37.

Babbie, E., 2007. The Practice of Social Research, 11th ed. Thomson Wadsworth, Belmont, USA.

Bechtel, W., Abrahamsen, A., 2005. Explanation: a mechanistic alternative. Studies in the History and Philosophy of Biological and Biomedical Sciences 36, 421–441.

Bechtel, W., Richardson, R., 2010. Discovering Complexity: Decomposition and Localization as Strategies in Scientific Research. MIT Press, Cambridge, Massachusetts (Reissue of the 1993 edition with a new introduction).

Bunge, M., 2004. How does it work? The search for explanatory mechanisms. Philosophy of the Social Sciences 34, 182–210.

Constant, E., 1980. The Origins of the Turbojet Revolution. Johns Hopkins, Baltimore.

Cook, S., Daniels, J., 1994. Designing Object Systems: Object-Oriented Modelling with Syntropy. Prentice-Hall, Upper Saddle River, New Jersey.

Cowan, C., 2002. The process of evaluating and regulating a new drug: phases of a drug study. AANA Journal 70, 385–390.

Damian, D., Chisan, J., 2006. An empirical study of the complex relationships between requirements engineering processes and other processes that lead to payoffs in productivity, quality and risk management. IEEE Transactions on Software Engineering 32, 433–453.

Davis, A., Hickey, A., 2004. A new paradigm for planning and evaluating requirements engineering research. In: 2nd International Workshop on Comparative Evaluation in Requirements Engineering, pp. 7–16.

Douven, I., 2011. Abduction. In: Zalta, A. (Ed.), The Stanford Encyclopedia of Philosophy (Spring 2011 Edition). http://plato.stanford.edu/archives/spr2011/entries/abduction/

Elster, J., 1989. Nuts and Bolts for the Social Sciences. Cambridge University Press, Cambridge, UK.

Engelsman, W., Wieringa, R.J., 2012. Goal-oriented requirements engineering and enterprise architecture: two case studies and some lessons learned. In: Requirements Engineering: Foundation for Software Quality (REFSQ 2012), Essen, Germany, pp. 306–320 (volume 7195 of Lecture Notes in Computer Science, Springer).
