The Journal of Systems and Software
Journal homepage: www.elsevier.com/locate/jss

Empirical research methods for technology validation: Scaling up to practice

Roel Wieringa ∗
Department of Electrical Engineering, Mathematics, and Computer Science, University of Twente, The Netherlands¹
Article info
Article history: Received 30 November 2012; Received in revised form 11 November 2013; Accepted 16 November 2013; Available online xxx
Keywords: Empirical research methodology; Technology validation; Scaling up to practice
Abstract
Before technology is transferred to the market, it must be validated empirically by simulating future practical use of the technology. Technology prototypes are first investigated in simplified contexts, and these simulations are scaled up to conditions of practice step by step as more becomes known about the technology. This paper discusses empirical research methods for scaling up new requirements engineering (RE) technology.
When scaling up to practice, researchers want to generalize from validation studies to future practice. An analysis of scaling up technology in drug research reveals two ways to generalize, namely inductive generalization using statistical inference from samples, and analogic generalization using similarity between cases. Both are supported by abductive inference using mechanistic explanations of phenomena observed in the simulations. Illustrations of these inferences both in drug research and empirical RE research are given. Next, four kinds of methods for empirical RE technology validation are given, namely expert opinion, single-case mechanism experiments, technical action research and statistical difference-making experiments. A series of examples from empirical RE will illustrate the use of these methods, and the role of inductive generalization, analogic generalization, and abductive inference in them. Finally, the four kinds of empirical validation methods are compared with lists of validation methods known from empirical software engineering. The lists are combined to give an overview of some of the methods, instruments and data analysis techniques that may be used in empirical RE.
© 2013 Elsevier Inc. All rights reserved.
1. Introduction
Empirical assessment of technology comes in two flavors, which in this paper will be called technology validation and technology evaluation, respectively. Technology validation is defined here as the assessment of a simulation of the technology in a simulation of its intended context of use, in order to predict what would happen if the technology were actually used by stakeholders in this intended context. We take the term “simulation” in a very wide sense as the representation of the functioning of one system or process by means of the functioning of another.² For example, a new requirements prioritization technique may be tested by experimenting with it in a classroom. This is a validation if the classroom experiment represents some aspects of what would happen if the technique were used in practice.
Validation always involves scaling up to practice, which means that successive tests take place under increasingly realistic conditions. For example, the inventor of a requirements prioritization technique may use this technique in a real-world project. This validation would resemble real-world use of the technique more than a classroom experiment, except that it is still the inventor herself who uses the technique.

∗ Tel.: +31534894189. E-mail address: r.j.wieringa@utwente.nl
¹ http://www.cs.utwente.nl/roelw.
² http://www.merriam-webster.com.
A technology has been transferred to practice if it has been packaged, marketed, distributed, sold or otherwise made available to users, and is now being used independently from the context in which it was invented or tested. After transfer to practice other people than its inventors are using it, and they are using it to achieve their own goals, without help or other kind of intervention from its inventors, and after investment of their own time and/or money to learn to use the technology.
Technology validation is to be contrasted with technology evaluation, defined here as the empirical assessment of a technology as and when used in practice. For example, an RE researcher may study how a prioritization technique is used in real-world projects by means of observational case studies. Where a validation study aims to make predictions, based on simulations, about how a technology would perform if transferred to practice, an evaluation study assesses what has happened in the actual use of the technology after it has been transferred to practice. This follows terminology commonly used in the social sciences, where an evaluation study is an empirical assessment of some social intervention that has been performed, such as a recently implemented teaching method in schools, to investigate its impact in practice (Babbie, 2007).

Fig. 1. The new drug development and review process of the U.S. Food and Drug Administration.
Technology validation is a process of scaling up to practice in all engineering sciences. For example, the inventors of the jet engine validated their designs by building increasingly realistic prototypes and testing them in increasingly realistic environments (Constant, 1980). In this paper I will summarize and analyze the ways in which we can scale requirements engineering (RE) technology up to practice.
The RE technology being validated could be techniques, methods, notations, algorithms, etc. used for various requirements engineering tasks such as requirements elicitation, goal analysis, requirements specification, requirements prioritization, traceability management, requirements maintenance, etc. Requirements in this paper are defined as desired properties of a system. Goals, by contrast, are states of the world desired by stakeholders, and for which the stakeholders have committed a budget (time or money) to achieve them. All requirements are goals because they are desired by stakeholders, and stakeholders have committed a budget to achieve them. But not all stakeholder goals are system requirements. Stakeholders have many goals not stated in terms of desired system properties at all.
Davis and Hickey (2004) proposed using the methodology of New Drug Development for scaling up RE technology. I will pursue this idea further in Section 2 and focus in particular on the inferences used in New Drug Development when generalizing from the object of validation research to instances of real-world use of the technology, and show that these inferences can be used in RE research too. In Section 3, I present four methods for empirical technology validation, and show how the generalization methods identified in Section 2 can be used in them. This is illustrated by a series of examples from empirical requirements engineering. Finally, in Section 4, I review the empirical software validation methods identified by Zelkowitz and Wallace (1997, 1998) and by Glass et al. (2001) and show how they fit into the framework presented in this paper, and add a list of examples of techniques for measurement and data analysis used in empirical software engineering. Section 5 ends the paper with a brief summary and outlook.
2. Scaling up
2.1. Scaling up in drug research
Davis and Hickey (2004) were the first to apply the New Drug Development and Review Process of the U.S. Food and Drug Administration to RE technology transfer. I summarize the process in Fig. 1. The following description is based on information provided by the FDA³ and the explanations given by Davis and Hickey (2004), Cowan (2002) and Molzon and Pharm (2005). My analysis goes beyond that of Davis and Hickey by analyzing the three kinds of inference used in this process. I will indicate the analogy of each stage of the New Drug Development process with a stage in scaling up RE technology.
2.1.1. Pre-clinical research
Pre-clinical research is the exploration and validation of a drug before testing it on people. It consists of a synthesis and purification task, and of testing the drug on animals.
In synthesis and purification, a chemical is identified in the laboratory as a potentially effective drug, based on earlier experience reported in the literature, biochemical knowledge, and knowledge of the human body. A theory about why this could be an effective drug to improve a medical condition is postulated. This is the initial version of a design theory that will be tested and elaborated in the following stages of the New Drug Development process. It corresponds in RE research to the initial idea for a new RE technique and the initial justification that this idea might work to solve some RE problem. The rest of the process aims to validate and elaborate this design theory in increasingly realistic contexts.
Animal tests are done to show that the drug would be safe to use in people and to investigate in more detail the biochemical mechanisms that produce the drug’s effects. If there are no negative effects in the investigated contexts (i.e. in animals), and if the mechanisms found in these contexts are expected to be similar to those in human bodies, then this is evidence that the drug is probably safe for humans, and a request is submitted to institutional review boards for permission to test the drug in humans. Usually two different animal species are taken, because a drug usually has different effects in different species (Cowan, 2002). Short-term testing in animals can take up to three years but on average takes 18 months.
The animals are used for testing as natural models of the intended real-world context of the drug, namely the human body. The analog in RE research would be testing a new RE technique on students in a laboratory, to study the effects of the technique and the mechanisms by which these effects are produced. Although the goal of this research would not be to establish evidence for safety of the technique, the purpose would still be to assess whether the benefit of using the technique in practice would outweigh the cost and risk of doing so.
Long-term animal research investigates the long-term effects of using a drug, and may continue into the post-availability stage. This is an example of validation research that continues after transfer of the new technology into practice. Long-term animal research has the logic of validation research, as it simulates the effects of a drug by using a model, and is used to predict what would happen to human bodies. This can be done in RE research too. For example, the effect of using the UML on the comprehension of programs can be investigated in the laboratory, using students as subjects, long after the UML has been transferred to practice. Insights from such a study could be used to predict the effect of UML on comprehension of programs in practice.
2.1.2. Clinical research
In clinical research, the drug is tested on people. It consists of three phases.
• In phase 1, random samples of 20–80 healthy subjects are used to test the drug. The goal is to investigate side effects and the so-called mechanism of action of the drug, which is the biochemical interaction by which a drug produces pharmacological effects. If possible, the effectiveness (positive effects) of the drug is investigated too. This phase may last several months and ends when researchers are sufficiently certain that the drug is safe to use in patients.
• In phase 2, several hundred patients are used to investigate the effect of the drug in controlled studies by means of randomized controlled trials. The goal is to investigate the effectiveness in patients with a specific disease or condition, and to investigate short-term side-effects and identify possible risks. Phase 2 may last several months to two years and ends when knowledge about effectiveness, side-effects and risks is deemed sufficiently well established to do large-scale tests in phase 3.
• In phase 3, controlled and uncontrolled trials with several hundred to several thousand patients are done to gather additional evidence about effectiveness and safety. The goal is to find out if the effectiveness and safety claims can be generalized to the population of all possible patients. Phase 3 may last one to four years and ends when researchers think the claims about effectiveness and safety of the drug can be generalized to the population.
As we will see below, when validating RE technology, similar research goals exist, and they are pursued just as in drug research by experimental research in the lab and in the field.

³ http://www.fda.gov/Drugs/DevelopmentApprovalProcess/SmallBusinessAssistance/ucm053131.htm.
2.1.3. Post-availability studies
After a drug has been approved and is made available to a market, assessment continues in so-called phase 4 studies, for example by surveys or observational case studies of patients using the drug. In our terminology, these are evaluation studies. Post-availability studies are done in RE research too, for example when researchers investigate the effect of using the UML on coding errors in real-world projects.
2.2. Design theories: artifact in context produces effects by mechanisms
I will now analyze the logic of drug validation in more detail, with a view to drawing conclusions about the logic of RE technology validation. In other words, I treat the drug development process as a model of the RE technology transfer process, that we can investigate to learn something about the RE technology validation process, just as we can use animals as models of people to learn something about how people respond to drugs. To abstract from whether we talk about drugs or RE technology, I will call the technology to be scaled up an artifact.
In what follows I present a number of observations of the process described in Section 2.1. The first observation is that the validation tasks in new drug development are divided into three stages: conceptual validation, modeling, and field tests. In conceptual validation (corresponding to synthesis and purification), an artifact is tested by observing its behavior in a very artificial context such as a test tube. Most of the validation is done on paper and consists of computations, worked examples, mathematical proofs, informal arguments tested out with colleagues, etc. In the modeling stage (corresponding to animal testing and phase-1 clinical research), the artifact is tested out on a model. In drug development, these are animals first and healthy people next. There are important ethical constraints in both kinds of tests and the NDA process recognizes the need for an ethical review board at least when transitioning to tests with people. In the field testing stage (corresponding to phase-2 and phase-3 clinical research), real-world cases are used to test the artifact on. These real-world cases are treated as models of arbitrary population elements.
My second observation of drug validation is that what is validated is not the artifact but the artifact in a context, e.g. a drug in a body. Validation is the attempt to test the following prediction (Wieringa, 2009):

[Artifact × Context] will produce Effects.
Effects may change if the context changes, and so the artifact must be investigated in different contexts until it is clear in what range of contexts what range of effects is produced. For example, a new technique for eliciting requirements may be tested on its effects in small projects, large projects, embedded systems projects, information systems projects etc. and be found to be effective in some but not all of these contexts.
Third, when validating an artifact in context, researchers should not only aim at identifying the regular production of an effect in certain contexts, they should also aim to explain this effect in terms of underlying mechanisms. In drug research these are called the mechanisms of action. This term indicates the interactions by which a drug produces a pharmacological effect, including the binding of the drug to molecular targets, its effect on these targets, and the effect on biochemical pathways in the body. For example, caffeine has several mechanisms of action, two of which are that it antagonizes a biochemical compound (adenosine) that inhibits neurotransmitters, and that it increases the activity of neurotransmitters such as dopamine (Nehlig et al., 1992). These mechanisms explain why caffeine has a psychostimulant effect.
The concept of a mechanism of action is similar to that of a principle of operation used in engineering methodology (Vincenti, 1990), which is the top-level theory of the mechanism by which an artifact produces an effect in a context. For example, the principle of operation of an airplane is that by the shape of its wings, air above the wing flows faster relative to the wing than air below it, which according to Bernoulli’s principle produces upward lift. But where the principle of operation is the highest-level view of how an artifact produces some effect in a context, a mechanism of action is the actual realization of this principle in the interactions between components of the low-level architecture of a realized artifact in a real context. The principle of operation of an airplane explains why airplanes fly. The mechanism of action of a particular type of airplane consists of the detailed, low-level interactions among aircraft components and the surrounding air that actually produce the capability of this type of airplane to fly. The mechanisms of action exploit the Bernoulli principle, but do a lot more.
In RE too, mechanisms of action can be identified that explain observed effects. For example, Damian and Chisan (2006) describe the introduction of RE techniques in an organization and identify cross-disciplinary group meetings, and their interaction with other parts of the software engineering organization, as a mechanism that causes fewer defects, less rework, and improved effort estimates.
Validation research thus aims at making predictions of the form

[Artifact × Context] will produce Effects by Mechanisms.

We will call this a design theory (Wieringa et al., 2011).
When researchers design and validate an artifact, they start from an initial idea about the principle of operation of the artifact, which is a solution idea, but not yet an implemented and working solution. This states the top-level principle of operation. When scaling up an artifact to conditions of practice, this initial theory is tested and elaborated, until finally a street-tested architecture with mechanisms of action is delivered, that implements the top-level principle of operation.
The theory of an implemented artifact may be incomplete about the mechanisms that produce the effects, and in the extreme case be totally silent about them. For example, engineers may have found what the detailed structure and texture of a wing surface is that is most conducive to fuel efficiency, without understanding the precise mechanisms by which this happens. If mechanisms are not understood, a slightly different design or an existing design in a new, previously unencountered context may fail for unknown reasons. For this reason, in the health sciences, evidence of regularity is not good enough to claim regular production of an effect: knowledge of the underlying mechanisms is needed too (Russo and Williamson, 2007). In engineering, in the absence of knowledge of underlying mechanisms, safety risks are managed by testing design changes and sensitivity to context only in small steps (Petroski, 1994).

• Design theory: [Artifact × Context] produces Effects by Mechanisms
• Value theory: [Effects × Stakeholders] produces Valuation.
Fig. 2. The structure of design theories and of value theories.
A fourth observation of the drug development process is that there is a second theory, that is stakeholder-related (Wieringa et al., 2011):

[Effects × Stakeholders] will produce Valuation.

I will call this a value theory. The theory states that various kinds of stakeholders who experience the effects will attach a positive, negative, or indifferent value to them.
The goal of drug research is not only to identify effects and mechanisms of an artifact in context, but also to identify the value of these effects for stakeholders. Stakeholders like some effects and dislike others. Effects that are liked are called “benefits” and effects that are disliked are often called “side-effects”. Context properties that tend to produce effects that are disliked are called “contra-indications” in drug research.
Finally, as all scientific theories, design theories as well as value theories are fallible. The researcher is not totally certain about them and must state the extent of his or her uncertainty. The uncertainties with which effects, benefits, costs and side-effects can be predicted are called “risks”.
All of these concepts are relevant for RE research too. For example, the use of mobile RE technology to elicit requirements has the benefit that user requirements may be more concrete, detailed and complete than is possible by other elicitation techniques. But that is not certain, and this is a risk (of a benefit not materializing). It may also have as a side effect that the user may have the expectation that each and every need she enters will be satisfied in the near future. This too is a risk (of a negative outcome). Also, mobile RE technology may result in huge amounts of textual and multimedia data that must be analyzed manually, which is a cost. In short, when validating RE technology, the RE researcher is not only interested in the effects of an artifact in context and the mechanisms that produce them, but also in the benefit, cost and risk of using this technology in some contexts (Fig. 2). Both kinds of theory are important, but in the rest of this paper, I will focus on the development and validation of design theories.
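The two theory templates can be written down as simple record types, which may help when cataloguing validation claims. The following is a minimal sketch only: the class names, field names, valuation labels and the example instance are illustrative assumptions and are not taken from the cited sources. An empty mechanisms field models a theory that is still silent about mechanisms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignTheory:
    """[Artifact x Context] will produce Effects by Mechanisms."""
    artifact: str
    context: str
    effects: tuple
    mechanisms: tuple = ()  # may be empty: the theory can be silent about mechanisms

    def statement(self) -> str:
        how = ", ".join(self.mechanisms) if self.mechanisms else "mechanisms not yet known"
        return (f"[{self.artifact} x {self.context}] will produce "
                f"{', '.join(self.effects)} by {how}")


@dataclass(frozen=True)
class ValueTheory:
    """[Effects x Stakeholders] will produce Valuation."""
    effect: str
    stakeholder: str
    valuation: str  # hypothetical labels, e.g. "benefit", "cost", "side-effect"


# Illustrative instance loosely based on the mobile RE example in the text.
mobile_re = DesignTheory(
    artifact="mobile RE tool",
    context="requirements elicitation with end users",
    effects=("more concrete user requirements", "large volumes of multimedia data"),
)
valuations = [
    ValueTheory("more concrete user requirements", "requirements engineer", "benefit"),
    ValueTheory("large volumes of multimedia data", "requirements engineer", "cost"),
]

print(mobile_re.statement())
```

Keeping effects and their valuation in separate records mirrors the separation between design theories and value theories: the same effect can be valued differently by different stakeholders.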
2.3. Inferences that support design theories
Looking once more at the drug development and review process in Fig. 1, we see that two kinds of inferences are used to generalize from experiments to the population of potential patients: inductive generalization and analogic generalization. Inductive generalization is the statistical inference from a sample of test subjects to the population of subjects. Analogic generalization is the inference from models (such as animals, and healthy volunteers) to patients. This is represented in Fig. 3, where inductive generalization is the horizontal dimension and analogic generalization is the vertical dimension.

Fig. 3. Two kinds of inferences when scaling up to practice: inductive generalization from samples to population (horizontal) and generalization by analogy from experimental cases to real-world cases (vertical).

We discuss these generalizations in the next two paragraphs. Next, we discuss a third kind of inference, called abductive inference, that can be used to support analogic as well as inductive inference. Finally, we combine these inferences in two kinds of reasoning:
• In case-based reasoning, analogic generalization about cases is supported by abductive inference (vertical dimension of Fig. 3);
• In sample-based reasoning, inductive generalization about samples is supported by abductive inference (horizontal dimension of Fig. 3).
2.3.1. Inductive generalization
Inductive generalization is the generalization from samples to populations using statistical inference, such as statistical hypothesis testing or statistical parameter estimation (Hacking, 2001). Sample sizes in drug research start at about 30 elements, and increase to hundreds or even thousands of elements. The larger the sample, the larger the power of the experiment to discern small effects.
I call this kind of inference inductive. The term “induction” is given different meanings by different people, but here I follow Douven (2011) in calling an inference inductive if it is based purely on statistical data. In the context of this paper, this means that inductive inference is statistical inference, in which sample data are used to estimate a statistical population parameter or to test a statistical hypothesis about a population parameter. Inductive inference is the horizontal dimension in Fig. 3.
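The relation between sample size and power can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions, not a procedure from the cited sources: it assumes a two-sided, two-sample z-test with equal group sizes and known unit variance, uses the normal approximation, and expresses the effect as a standardized mean difference. The function name and parameters are hypothetical.

```python
from statistics import NormalDist


def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference between the two groups
    (an assumption of this sketch; the text does not fix an effect measure).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect is real.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the statistic lands in either rejection region.
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


# Power to detect a small effect at sample sizes typical of successive stages.
for n in (30, 100, 1000):
    print(n, round(approx_power(0.2, n), 3))
```

Under these assumptions the computed power grows with the sample size for any fixed non-zero effect, and it equals the significance level when the effect size is zero.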
2.3.2. Generalization by analogy
The vertical dimension of Fig. 3 is generalization by analogy of the object of study (OoS) to the real-world population units to which the researcher wishes to generalize. The object of study has the structure

(model of the artifact) × (model of the context),

and is the entity studied by the researcher. See also Fig. 4. The model of the artifact is often an artifact prototype, and the model of the context can be a simulated context in the laboratory. In field research, the model of the context is a real-world context that stands as model for other real-world contexts. The treatment and measurement elements of Fig. 4 will be discussed later.
Generalization by analogy reasons about cases. For example, if in one agile development project performed for a small company, we have observed that the company lacked the resources to put a customer on-site, we may infer that in similar cases, a similar thing may happen. Each generalization by analogy reasons from one or more similar source cases to one or more similar target cases.

Fig. 4. The structure of validation research.
This contrasts with inductive generalization, which reasons from a sample of cases to the population of all cases. For example, if we have observed that in a random sample of 100 agile projects performed for small companies, 90% of the companies do not put a customer on-site, we may estimate from this a confidence interval for the proportion of the population in which no customer is put on-site.
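To illustrate the estimation step in the example of 100 sampled projects, the sketch below computes a confidence interval for a population proportion. The text does not say which interval is meant; the Wilson score interval is used here as one common choice, and the function name and the 95% confidence level are assumptions of this sketch.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# 90 of 100 sampled projects had no customer on-site.
low, high = wilson_interval(90, 100)
print(f"95% interval for the population proportion: {low:.3f} to {high:.3f}")
```

With these inputs the interval runs from roughly 0.83 to 0.94, so the sample supports a claim about a range of population proportions rather than the single value 90%. The interval is only meaningful if the sample was drawn at random, as the example assumes.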
To add more illustrations we discuss the role of analogic generalization in the New Drug Development process. In synthesis and purification, researchers build a prototype of the drug, interacting with some biochemical processes, that has sufficient similarity with processes in the human body to be able to draw some preliminary conclusion about the effect of the drug in the human body. In RE research, this would be similar to hand-testing a new technique, or formally proving some properties to show what the technique can do in an idealized context.
Next, drug researchers experiment with the drug in animals, used as natural models of the human body. As pointed out earlier, this would be analogous to the use of new RE techniques in student projects, which are then used as natural models of real-world projects with practitioners. There is detailed research in some fields of drug research to assess which animal species are valid models of humans with respect to which research questions, and for which research questions they are not (Willner, 1991). We find the same kind of “similarity research” in engineering research too. For example, to be able to reason from observations of a model in a wind tunnel to the behavior of airfoils in real flight, there must be a theory of similarity between wind-tunnel models and real-world flight (Vincenti, 1990). To my knowledge there is little research in this area in RE, but there has been some similarity research in software engineering that studies with respect to which research questions student behavior in student projects is similar or dissimilar to the behavior of professional software engineers in software projects (Höst et al., 2000; Runeson, 2003; Sjoberg et al., 2003; Svahnberg et al., 2008).
In the clinical phase, drug researchers start with healthy people, and then continue with ever larger samples of patients. In RE research this would correspond to using new technology first in pilot projects in companies with a mature RE process, continuing with ever larger samples of pilot projects in companies with low levels of RE maturity.
Generalization by analogy also includes reasoning by extreme cases, in which one case is known to be similar to other cases in some relevant aspects, but extremely different in another aspect. For example, from the observations that an RE technique is easy to understand and use by Master’s students in software engineering, one might conclude that it will also be easy to understand and use by experienced software engineers. Master’s students and software professionals are similar in some respects, but they are dissimilar in the extent of experience in software engineering that they have. Students are an extreme case w.r.t. extent of experience. The theory of similarity used to support this analogic inference is that increase in experience of otherwise similar subjects preserves understandability and usability of a technique.
In general, generalization by analogy must be supported by some theory of similarity between the OoS and all population elements, that explains why a phenomenon observed in a model could lead to conclusions about population units. What theory is needed depends on the question asked. Students may be good models of practitioners when validating effort estimation techniques but not when validating multi-stakeholder prioritization techniques (cf. the experiment by Höst et al., 2000).
2.3.3. Abductive inference
There is a third kind of inference used in drug validation research, called abductive, that can be used to support both inductive and analogic generalization. Abductive inference is reasoning from observed phenomena to what is considered the best explanation of the phenomena (Douven, 2011). There are many kinds of abduction, and here I define one kind, that I call mechanistic abduction, in which observed phenomena are explained in terms of component-based mechanisms that produced them. I define a component-based mechanism, in turn, as a repeatable process in which system components interact to respond to a stimulus. This concept of mechanism is known in object-oriented software engineering, where a UML collaboration diagram represents a mechanism consisting of software objects that pass each other messages when responding to a stimulus (Cook and Daniels, 1994). But component-based mechanisms can occur in any kind of system, as we saw when we discussed the concept of mechanism of action of a drug. Component-based mechanisms are used to explain biological phenomena in terms of the interactions between cells and chemical substances, or the interactions between the organs of an organism (Bechtel and Richardson, 2010; Bechtel and Abrahamsen, 2005). In the social sciences, component-based mechanisms are used to explain social phenomena in terms of interactions between people, organizations, institutions and other social systems and their components (Bunge, 2004; Hedström and Ylikoski, 2010).
In RE too, component-based mechanisms can explain the effects of an RE technology in terms of interactions between components of social, technical, physical, and digital systems. I already mentioned the mechanism identified by Damian and Chisan (2006), by which cross-disciplinary group meetings, and their interaction with other parts of the software engineering organization, resulted in fewer defects, less rework, and improved effort estimates. As another example, Seyff et al. (2010) identified two mechanisms that reduce the use of audio recording in mobile RE: participants felt uncomfortable if they voice-recorded their needs in a public place, and public places often contained too much background noise to do the recording.
To sum up, abductive inference is the identification of component-based mechanisms that explain effects. They complete the prediction

[Artifact × Context] will produce Effects

with an explanation of the mechanisms by which the effects are produced. As indicated earlier, researchers will not always be able to explain all mechanisms of interaction between an artifact and a concrete, practical context. To the extent that fewer mechanisms are known, there is less confidence that statistical regularities in behavior are stable under changes in context.
2.3.4. Case-based and sample-based reasoning
Adding abductive inferences to analogic and inductive generalization, respectively, we get case-based reasoning (CBR) and sample-based reasoning (SBR) (Fig. 5).
Case-based reasoning is the vertical dimension of our diagram of scaling up (Fig. 3). It consists of two steps, namely abduction and generalization by analogy. In the first step, a single case is analyzed to identify an architecture of the case in terms of its components and their interactions, so that this architecture may provide an explanation of observed effects in terms of component-based mechanisms. For example, if in a case of agile development performed by an independent developer for a small company, no client representative is on-site, then this can be explained by the limited resources of the small company. This is an abductive inference.

[Figure 5 diagram: Observations, Explanations and Generalizations, connected by the inference steps CBR: (1) Abduction, CBR: (2) Analogy, SBR: (1) Induction, SBR: (2) Abduction, SBR: (3) Analogy.]
Fig. 5. Reasoning from observations to explanations and generalizations in case-based reasoning (CBR) and in sample-based reasoning (SBR).
Next, in CBR we can generalize by analogy by postulating that in cases with the same or a similar architecture (independent developer doing agile development for a small company), the same effects will be produced by the same or similar mechanisms. The theory of similarity that supports the generalization here is that small companies have limited resources and will prefer to trust the developer rather than spend their scarce resources on agile requirements engineering. In CBR, the theory of similarity that supports the analogic generalization is stated in terms of a component-based mechanism.
Sample-based reasoning, the horizontal dimension in our diagram of scaling up (Fig. 3), is more complex. It consists of three steps (Fig. 5). First, from observations of a statistically meaningful sample of the population (e.g. a random sample of at least 30 elements), the researcher infers statistically that the population has some characteristics. For example, from an experiment with a sample of student projects, the researcher may be able to infer statistically that there is a correlation between the use of some requirements notation and the quality of requirements specifications in the population of student projects. This inference is fallible, because the sample may coincidentally show a pattern that is absent from almost all other samples from this population.
Second, the researcher may then list all possible causes of such a correlation, and be able to argue that the best explanation is that the notation caused the difference in quality. This is an inference to the best explanation, i.e. an abduction. If the researcher can explain the postulated impact of notation on quality by some intermediate cognitive mechanism, that is postulated by a previously established theory, then this is a second, mechanistic abduction, that increases the support for the first one.
Third, the researcher may want to generalize the claim about the population further to similar populations, by analogy. For example, from a statistical generalization about student projects, the researcher may want to generalize further to the population of all real-world projects with junior software engineers, and justify this generalization by the similarity of the architecture and mechanisms of the student projects to the architecture and mechanisms of real-world projects with junior software engineers. This analogic generalization too may be supported by a mechanistic explanation, if the mechanism that explains the phenomenon in student projects can also explain that phenomenon in professional software engineering projects.
Double support for causal claims, in statistical evidence provided by statistical difference-making experiments, and in independently verified mechanisms that can explain the causal relationships inferred from the statistical experiments, seems to be common practice in the health sciences (Russo and Williamson, 2007). Thus, sample-based reasoning and case-based reasoning have a useful supplementary relationship. After providing support for an inductive generalization about the effect of an artifact, the researcher may do some case studies, or some single-case mechanism experiments as described later, in an attempt to find and understand the mechanisms that produce this effect. Or, the other way around, after finding that a mechanism has produced an effect in a few cases, the researcher may do a statistical difference-making experiment to support the claim that this effect can be generalized statistically to the population. Thus, the two generalization dimensions in the diagram of scaling up (Fig. 3) must be traveled together.
2.4. Validity of inferences
All three kinds of inferences discussed are fallible, meaning that their conclusions could be false even if their premises are true. The researcher must therefore spell out the reasons that support the conclusion, and also summarize the reasons why the conclusions could be false after all. This is called a discussion of validity of the conclusions. Since “validity” suggests “justifiable” or even “truth”, this term is misleading. A less misleading term would have been “plausibility” or “support”. However, I will stick to the accepted terminology.
In Table 1 we can see that the three kinds of inferences correspond to three well-known kinds of validity. Conclusion validity is the support for the conclusion of a statistical inference. Threats to conclusion validity include low power, small sample, non-random sample, non-random allocation, violation of assumptions of statistical algorithms, etc. Note that even if conclusion validity is sufficiently well argued, it is still possible that the experiment is one of the 5% of experiments that show a statistically significant difference by chance, i.e. without there being a mechanism that produces the difference.
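The remark about significant differences arising by chance can be illustrated with a small simulation. The sketch below is not taken from any of the studies discussed here. It assumes, purely for illustration, two groups drawn from the same normal distribution with known standard deviation 1, compared with a two-sided z-test at the 5% level; the function name `false_positive_rate` and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist, mean


def false_positive_rate(n_experiments=2000, n_per_group=30, alpha=0.05, seed=1):
    """Fraction of simulated experiments that reach significance although
    both groups come from the same population (no real difference)."""
    rng = random.Random(seed)
    # Critical value of the two-sided z-test at level alpha.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of the difference of two means (assumed known sd = 1).
    std_err = (2 / n_per_group) ** 0.5
    significant = 0
    for _ in range(n_experiments):
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (mean(group_a) - mean(group_b)) / std_err
        if abs(z) > z_crit:
            significant += 1
    return significant / n_experiments


if __name__ == "__main__":
    print(f"significant by chance: {false_positive_rate():.3f}")
```

Because no mechanism distinguishes the two groups in this simulation, every significant result is produced by chance alone, and the returned fraction should lie close to 0.05.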
Internal validity is the support for an explanation of a phenomenon by causal mechanisms that produced the phenomenon. A major threat to internal validity is that outcomes of an experiment may not only be explained by a mechanism that leads from treatment to outcome, but by other mechanisms too. For example, if the OoS contains people, then history, maturation, and attrition may influence the outcome, in addition to the influence of instruments, tests, the experimenter, semantic ambiguities, etc. in the experiment (Shadish et al., 2002, page 54 ff.). For the reader of a research report to assess the support for the abductive inference that the observed outcome is produced by some mechanisms, alternative explanations must be listed explicitly.
External validity appears in two flavors in Fig. 5: in case-based reasoning and in sample-based reasoning. In case-based reasoning, external validity is the validity of the analogic inference from a single-case explanation to all similar cases. For example, a mechanism observed in [(artifact prototype) × (simulated context)] in the laboratory is generalized to all [artifact × context] cases in the real world. In sample-based reasoning, external validity is the validity of the analogy of one population to another population. For example, a conclusion about the population of student projects is generalized to a conclusion about the population of real-world projects. In both flavors, external validity is the validity of the inference from the studied OoS to all similar cases in the real world. As observed by Gigerenzer (1984), determining external validity is an empirical question. If conclusions from an experiment in context A are generalized, fallibly, to context B, then one can test this generalization by repeating the experiment in context B. This is in fact what is done when scaling up from the lab to the real world.
Threats to external validity are sensitivity of the effects of an artifact to the context in which it is used, dissimilarity of the treatment used in the lab to treatments used in practice, interference of other mechanisms with the mechanism of interest, absence of a theory of similarity that could justify the generalization, etc. Shadish et al. (2002, pages 86 ff.) and Wohlin et al. (2012, page 110) provide detailed discussions.

Table 1
The three kinds of inference and some validity considerations.

| Inference or validity | Description or questions to ask |
|---|---|
| Inductive inference | Estimation of a population parameter, or decision about a statistical hypothesis about the population, based on observations of a sample. |
| Conclusion validity | Are the assumptions of the statistical algorithms satisfied? Random sample? Homogeneous sample? Random allocation? Statistical power and effect size? Reliable measures? Logical errors in reasoning from sample statistics to population hypotheses? Etc. |
| Abductive inference | Explaining a phenomenon by identifying the causal mechanisms that produced it. |
| Internal validity | Are there alternative explanations? Is there a common cause that could explain the phenomena? Can the context of the experiment, the behavior of the experimenter, or phenomena in the sample of subjects explain the outcome of the experiment? Etc. |
| Analogic inference | Concluding that a target will have the same properties as a source (the experiment) because of some similarity between them. |
| External validity | Is there a theory of similarity? Does the theory of similarity justify the conclusions? Are the mechanisms in the target the same as those in the source? Are there other mechanisms that could interfere with the mechanism of interest? Is the effect context-sensitive? Etc. |
3. Methods for validation research
We will discuss the empirical validation methods using the structure of Fig. 4. We have used this structure earlier to make a checklist for empirical research reports (Wieringa et al., 2012). The researcher uses an object of study (OoS) to represent elements of the population, where in our case the population elements have the structure [artifact × context]. Therefore, the OoS has this structure too, consisting of a model of the artifact and a model of the context. The OoS is a model of an arbitrary population element in the sense that it is similar to population elements, and can be studied by the researcher to learn something about population elements (Apostel, 1961). An example would be an OoS that consists of a prototype of a software product, interacting with a simulation of a problem context; or an RE technique (the artifact) interacting with a student project (the context).
In statistical research, the researcher studies a sample of OoS’s of statistically meaningful size. In case research, the researcher studies a small sample or even a single OoS.
In experimental research, the researcher applies a treatment to an OoS and then measures what happens. In observational research, the researcher measures what happens, but does not apply a treatment. Measurement as well as treatment usually require instruments.
In the diagram of Fig. 4, all interactions are bidirectional: One cannot treat an OoS without the OoS exerting some influence on the treatment instrument, and one cannot measure an OoS without exerting some influence on the OoS.
The concept of treatment needs some explanation. So far we have taken a component-based view of the world, in which the world is modeled as a hierarchy of systems that contain subsystems, that contain sub-subsystems, etc. Thus, the population consists of artifacts interacting with a context, and research has a structure of components as shown in Fig. 4. In this view, a treatment is the insertion of a component in a context. For example, a doctor treats a patient (the context) by giving them a drug (the artifact), and a consultant helps a software engineering organization (the context) by inserting an improved RE technique (the artifact).
Note that the artifact not only consists of a product, for example a drug or an RE tool, but also of a process, for example the protocol for taking the drug or the procedure by which to use the tool. The experimental treatment then consists of making this product available and giving an instruction in the process.
We can describe the same experimental situation also in the more traditional view, in which a treatment is setting the level of an independent variable. This is a more abstract, variable-based view of experiments, that will be convenient to use in Section 3.4 on statistical experiments. Until then, it is more illuminating if we use the component-based view of Fig. 4. Table 2 lists four groups of validation research methods that we will discuss in the following sections.
3.1. Expert opinion
In the conceptual stage of validation, before the artifact is tested on models or in the field, the researcher can elicit the opinion of experts about the possible usability and usefulness of the artifact. This is observational empirical research, because the researcher does not intervene in an object of study. The researcher elicits opinions. It is also not a statistical survey with the aim to estimate the distribution of opinions in the entire population of experts. Rather, it is an attempt to get early information about expected usability and usefulness of the artifact in real-world contexts. Thus, statistically meaningful sample sizes are not needed; useful opinions are needed.

Table 2
Validation research methods.

| Methods | Examples |
|---|---|
| Researching expert opinion | • Eliciting expert opinion using interviews, questionnaires, or focus groups |
| Single-case mechanism experiments | • Testing an artifact prototype on a simple example in the lab • Testing an artifact prototype on a realistic example in the lab • Testing an artifact prototype on a realistic example in the field |
| Technical action research | • Using an artifact prototype to help a client • Teaching the use of an artifact prototype to a client by which they can solve some of their problems |
| Statistical difference-making experiments | • Comparing the effect of prototypes of two or more artifacts on a sample of simulated contexts in the lab • Comparing the effect of prototypes of two or more artifacts on a sample of contexts in the field |
In terms of Fig. 4, the population is not the set of all possible experts but the set of all possible [artifact × context] elements. So the object of study is not the expert either. Rather, the expert is an instrument to measure an imaginary OoS, namely a mental image that the expert has formed of real-world [artifact × context] elements. This is an unreliable instrument, but one that nevertheless can give useful information.
Positive or uncritical opinions of experts are not very useful, because experts may be motivated by the desire to finish the interview quickly, or to be nice to the researcher. Negative or critical opinions on the other hand are very useful, especially if the expert can indicate which element of the artifact design would not be usable or useful in which context, and why.
Example 1. Al-Emran et al. (2010) present an optimization method for product release planning. Input to the optimization method is a set of product release plans, consisting of a sequence of features to be implemented in subsequent releases of a product. The optimization method then finds the plan that is most robust, in terms of time-to-market, resource assignment, and task schedule, with respect to differences in task workload and developer productivity. That is, it selects the release strategy that is least influenced by differences in workload and productivity.
The researchers tested the optimization method, among other ways, by submitting it to experts and asking their opinion about it. The researchers sent a questionnaire about the method to 25 product development experts and received 13 responses. Many responses were uninformative in that the respondents thought the method was usable and useful. Some respondents, though, complained that using this optimization decreased their understanding of the release planning process, and others complained that they required more justification of the result before they would adopt the recommendation. These remarks point at potential improvement needs of the method.
Collecting expert opinion combines the two dimensions of scaling up. Experts imagine a sample of cases (informal sample-based reasoning) and imagine what mechanisms would occur in each of those cases (informal case-based reasoning). Because of the informality of their reasoning, their opinions must be treated with caution, but nevertheless they must be treated seriously.
3.2. Single-case mechanism experiments
I use the term single-case mechanism experiment to indicate experiments in which the researcher investigates one OoS in order to test the effect of some mechanism that the researcher believes to be present in the OoS. Software engineers do this when they test a software prototype by feeding it input scenarios that represent possible scenarios in the intended context of use. Aeronautical engineers do it when they test an airfoil in a wind tunnel.
I give some examples from RE research before I discuss the logic of single-case mechanism experiments.
It is not my purpose here to judge the quality of the analogic generalizations or of explanations given by authors, but merely to illustrate what the role of analogy and of mechanistic explanations in validation experiments is.
Example 2. Gacitua et al. (2011) propose a new algorithm for the identification of single- and multi-word abstractions in requirements documents, and describe an experiment in which they compare the performance of this algorithm with human judgment. The algorithm compares the frequency of a term in a document with its frequency in a reference document of the language used in the requirements document, such as a corpus of standard English. Terms that are rare in the reference document, but occur frequently in the requirements document are likely to indicate important concepts in the domain.
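The idea of comparing term frequencies against a reference corpus can be sketched as follows. This is not the algorithm of Gacitua et al.; it is a minimal illustration that assumes whitespace-tokenized text and scores each term by a smoothed ratio of its relative frequency in the document to its relative frequency in the reference text. The function `rank_terms` and the toy sentences are hypothetical.

```python
from collections import Counter


def rank_terms(doc_tokens, ref_tokens, top_k=3):
    """Rank document terms by how much more frequent they are in the
    document than in a reference corpus (add-one smoothed ratio)."""
    doc, ref = Counter(doc_tokens), Counter(ref_tokens)
    n_doc, n_ref = len(doc_tokens), len(ref_tokens)
    vocab_size = len(set(doc) | set(ref))

    def score(term):
        # Add-one smoothing keeps the ratio finite for terms that
        # never occur in the reference corpus.
        p_doc = (doc[term] + 1) / (n_doc + vocab_size)
        p_ref = (ref[term] + 1) / (n_ref + vocab_size)
        return p_doc / p_ref

    # Highest score first; ties are broken alphabetically.
    return sorted(doc, key=lambda t: (-score(t), t))[:top_k]


if __name__ == "__main__":
    document = "the router forwards the packet to the router".split()
    reference = "the cat sat on the mat and the dog sat too".split()
    print(rank_terms(document, reference))
```

In this toy input, a common word such as "the" occurs often in both texts and therefore scores low, whereas "router" is frequent in the document and absent from the reference text, so it is ranked first.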
To test this algorithm, the authors selected a book on a technical domain, and used its body as if it were a requirements document to analyze. The index of the book was used as a reference list of domain concepts. Thus, the artifact to be tested is the algorithm, and the book and its index is the context; both make up the OoS. The treatment in this experiment is the request to identify abstractions in a book. The measurement is the measurement of recall and precision with respect to the index terms.
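Recall and precision with respect to a reference list can be computed as in the following sketch. It assumes that extracted abstractions and index terms are compared by exact string match, which is a simplification that the original study need not have made; the function name `precision_recall` and the example terms are hypothetical.

```python
def precision_recall(extracted, reference):
    """Precision: share of extracted terms that are in the reference list.
    Recall: share of reference terms that were extracted."""
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    found = ["router", "packet", "the"]
    index_terms = ["router", "packet", "gateway", "firewall"]
    print(precision_recall(found, index_terms))
```

Here two of the three extracted terms appear in the index (precision 2/3), and two of the four index terms were found (recall 1/2).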
Concerning external validity, the authors argue that the book’s domain is similar to the domain of RE documents, that the size of the document is similar to documents in RE projects, and that the concept abstraction scenario is similar to that of a requirements engineer who has to familiarize herself with a new domain. Also, they argue that the hierarchical structure of the index is representative of the structure of multi-word terms in the intended population.
Internal validity is the question whether the mechanisms built into the artifact explain the observed effects. In this experiment, the frequency-based mechanism yielded low recall and precision. The authors’ explanation is that the identification of abstractions by people does not take place by a frequency-based method, and that frequency in general is not a sufficiently powerful mechanism to identify abstractions.
Example 3. Seyff et al. (2010) tested a tool for mobile requirements engineering in the field. They gave mobile phones running the tool to nine subjects, who used it for a few days to gather requirements for a system that supports daily commuting, and requirements for a system that supports shopping activities. The requirements were stated in text or audio. After the experiment, subjects were debriefed, and researchers transcribed recorded needs into system requirements.
In this example, the OoS consists of an artifact prototype, interacting with a realistic context. The context consists of the mobile phone on which the prototype runs, the users using the prototype, and the environment in which the users move. The treatment consists of the instructions to the users to use the tool for two RE purposes. The treatment instrument is the instruction session in which the users were instructed. The measurements consist of the data (text or audio) entered by the user as well as the answers of the users to researchers’ questions in the debriefing session.
The similarity between these OoS’s and the envisaged population of future real-world mobile RE processes is threatened by potential differences between future tools and the one used in this experiment, and possibly also by differences in elicitation methods. Remember that the artifact in this example consists of a mobile RE tool plus the process for using it.
The researchers made a mechanism (a mobile RE tool) available to users to test if it produced the expected effects (recorded contextual end-user needs). The mechanism had the expected effect in all nine investigated cases. A threat to the validity of this observation is that the subjects may have wanted to be nice to the researchers, which would be a factor co-producing the expected effect. Other users, without a friendly disposition to the researchers, may have failed to produce the effects when interacting with the tool.
These experiments allow analogic inferences from the OoS to the population, and can be placed along the vertical dimension of our diagram of scaling up (Fig. 3). As we saw, analogic generalizations must be supported by a theory of similarity that explains why an observation on the source of the analogy can lead to a conclusion about the target of the analogy. In the component-based view of the world that we take, the similarity between source and target of an analogy must be architectural, and the theory of similarity must indicate some component-based mechanism that produced a response in the experiment, and can produce a similar response in the real-world cases of the population. Hence the name “mechanism-based experiments”. These experiments do not use statistical inference to support claims about the population, but they use a theory about mechanisms to support claims about the population.
We distinguish mechanisms in the artifact from mechanisms in the context.
• The artifact is by design a collection of component-based mechanisms that responds to input. Part of software testing consists of validating whether these mechanisms, if implemented correctly, indeed have the desired effects. This may lead to surprises, in the sense that unexpected phenomena may turn up (e.g. bugs) that are the results of unexpected mechanisms in an implementation. In general, in algorithm validation, there may be unexpected mechanisms in the program because our ability to program may exceed our ability to understand what we programmed.
  This is less likely to happen when testing a method. Step-by-step methods such as the Rational Unified Process build up their results in a simple manner, by instructions of the form “bring about result X”, which, if performed correctly, lead to the creation of result X. If these methods are performed by capable software engineers in an ideal context, we usually do not run up against unexpected mechanisms in the method itself, because methods are relatively simple step-by-step procedures.
• Once a method has been shown to be usable by the researcher and his or her students, the important research question is whether it still works under conditions of practice, i.e. in the real world. In real-world contexts, there may be components or mechanisms that impact the production of the desired result of a method step in unexpected ways. An example of an unexpected mechanism in mobile RE is the tendency of users to be very brief in the textual specification of their needs when they are in a physically confined space, and to enter explanations by audio later. This may make needs analysis more time consuming, which in turn may reduce the timeliness and cost-effectiveness of the requirements specification. The researchers may use this information to change the method.
3.3. Technical action research
Technical action research (TAR) is a case-based mechanism experiment too, but I list it separately because it is also something else: It is a real-world consultancy project (Wieringa and Morali, 2012). In a TAR project the researcher uses an artifact in a real-world project to help a client, or gives the artifact to others to use it in a real-world project (Engelsman and Wieringa, 2012), and uses this experience to learn about the robustness of the intended effects and the mechanisms that bring them about, in uncontrolled conditions of practice.
Example 4. Morali and Wieringa (2010) describe a method to assess confidentiality risks when outsourcing the management of IT systems. They then describe how Morali used this method to actually assess the confidentiality risks in the outsourcing relationship between a large manufacturing company and a large outsourcing service provider.
In this example, the artifact is a new risk assessment method, and the context consists of Morali applying this method to a risk assessment problem in a large company. Morali played a dual role as researcher giving an instruction on how to use an artifact to an OoS in which she herself was the user of the artifact. The measurements taken consisted of all intermediate working documents of the project, plus the diary of Morali in her role as user of the method.
The similarity of this TAR project to the population of all such risk assessment projects is that a confidentiality risk in an IT management outsourcing situation is assessed. There is also a dissimilarity, which is that in most projects in this population, Morali will not be the one doing the risk assessment. This is a threat to external validity that must be dealt with by repeating TAR projects like this with other researchers.
Internal validity is the question whether the method indeed delivered its expected results, and whether any mechanisms in the context influenced this. The method did deliver its expected results, but only repeated TAR projects can show whether or not this is due to the method only or also to the user of the method (Morali), the quality of the documentation available in the company, etc.
TAR is a special kind of mechanism-based experiment, and TAR projects take the researcher closer to the real world in the vertical dimension of the process of scaling up (Fig. 3). The inferences in TAR are of the same kind as those in other case-based mechanism experiments. Events in the case are explained in terms of mechanisms, and any generalization to the population is supported by a theory that says that these mechanisms can occur in population elements too. However, generalizations from TAR projects have an additional threat to validity, because the researcher may have contributed positively to the observed events in a way that cannot be replicated.
TAR is useful as a final validation stage before transferring a technology to practice, because it is closer to real-world practice than other case-based mechanism experiments. A single TAR project is not enough to justify the claim that an artifact is applicable in the entire target population of possible projects. But it does justify the claim that the artifact is usable and useful in some real-world projects, and it can provide useful information to the researcher for further improving the artifact.
3.4. Statistical difference-making experiments
A statistical experiment is an experiment with a sample of OoS’s to infer a statistical property of the population. For example, it may estimate the population mean of a variable, with a confidence interval, from observations of the sample mean. Or it may test a statistical hypothesis about the population mean by observations from a sample.
In contrast to case-based mechanism experiments, the sample size is relevant, because as indicated in Fig. 5, inference is sample-based, not case-based. Statistical experiments support inductive inferences from samples, which is the horizontal dimension of scaling up (Fig. 3). They do not require a theory of mechanisms to generalize inductively to a population, but as we have seen in Section 2.3, providing such a theory does give additional support to an inductive generalization, because it would decrease the likelihood that the inductive generalization is based on a coincidental pattern in the data.
To describe statistical experiments we need to switch from the component-based view that we have taken up till now, in which the world consists of components and interactions, to a variable-based view of the world, in which the world consists of variables and relationships. Any description of the world in terms of components and interactions can be replaced by a more abstract description in terms of variables and relationships. In this variable-based view, a treatment consists of setting the value of an independent variable, and effects are measured by measuring the values of dependent variables. If there are two treatments, often one is called the “treatment” and the other the “control”, dividing the sample into a treatment group and a control group.
Inductive inference from sample to population can take place in a variety of ways, depending on how OoS’s were selected (sampling) and how treatments were allocated to OoS’s. In a randomized controlled trial (RCT), the sample is random and the allocation of treatments to sample elements is random too (Shadish et al., 2002; Sedgwick, 2011). This makes it possible to use the central limit theorem to support the inductive inference from sample to population. There are two ways to do this, by hypothesis testing and by confidence intervals (Hacking, 2001; Wonnacott and Wonnacott, 1990). In statistical hypothesis testing the experimenter may observe a difference in the sample that would be very unlikely (probability less than 5%) to occur if a difference did not exist in the population. The researcher will then infer that, plausibly, the difference exists in the population. In the estimation of confidence intervals, the experimenter may estimate a population difference by the sample difference using a 95% confidence interval around the sample mean. This estimation may be right or wrong, but if she follows this estimation rule always, she will be wrong in the long run in only about 5% of the inferences (Hacking, 2001).
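The long-run property of the confidence interval rule can be checked numerically. The following sketch assumes, for illustration only, a normally distributed population with a known standard deviation, so that the 95% interval around the sample mean has half-width z·sd/√n. The function name `ci_miss_rate` and the parameter values are hypothetical.

```python
import random
from statistics import NormalDist, mean


def ci_miss_rate(true_mean=10.0, sd=2.0, n=40, trials=2000, level=0.95, seed=7):
    """Fraction of repeated samples whose confidence interval around the
    sample mean fails to contain the true population mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sd / n ** 0.5  # known-sd interval (an assumption)
    misses = 0
    for _ in range(trials):
        sample_mean = mean(rng.gauss(true_mean, sd) for _ in range(n))
        if not (sample_mean - half_width <= true_mean <= sample_mean + half_width):
            misses += 1
    return misses / trials


if __name__ == "__main__":
    print(f"intervals missing the true mean: {ci_miss_rate():.3f}")
```

Each individual interval either contains the population mean or does not; what the rule guarantees is only that the fraction of misses over many applications is about 5%.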
Having inductively (and fallibly) inferred that there is a statistical correlation between independent and dependent variable in the population, the experimenter tries to abduce a causal explanation of this difference, and does this by trying to exclude any possible cause other than the difference between treatments. Random sampling and random allocation can only introduce chance fluctuations that disappear on the average in the long run. But after allocation, treatments must be applied, and outcomes measured, and so the experimenter must also check whether application or measurement, or any other event during the experiment, could have contributed to the measured difference. This is all part of the discussion of internal validity summarized in Table 1.
If all these alternative causes are excluded, the difference between treatments is the only remaining possible cause of the observed difference in dependent variables. This is an abductive inference. Note that random sampling and allocation is used both in the inductive inference step, where it facilitates application of the central limit theorem, and in the abductive inference step, where it facilitates the exclusion of other causes than the treatment.
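Random allocation itself is simple to mechanize. The sketch below is one possible way to do it, under the assumption that sample elements are identified by labels and that groups should be of (nearly) equal size; the function name `allocate_randomly` and the project labels are hypothetical.

```python
import random


def allocate_randomly(units, treatments=("treatment", "control"), seed=None):
    """Shuffle the sample elements and deal them out to the treatments in
    turn, so that no property of an element can influence its allocation."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {name: [] for name in treatments}
    for position, unit in enumerate(shuffled):
        groups[treatments[position % len(treatments)]].append(unit)
    return groups


if __name__ == "__main__":
    projects = [f"project-{i}" for i in range(10)]
    for name, members in allocate_randomly(projects, seed=42).items():
        print(name, members)
```

Since the allocation depends only on the random shuffle, any systematic difference between the groups after treatment cannot be attributed to the way elements were assigned.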
Random sampling is difficult to achieve in practice, so we find many quasi-experiments in software engineering and elsewhere (Kampenes et al., 2009; Shadish et al., 2002; Sjoberg et al., 2005), in which sampling is not random or allocation of treatments to elements is not random. For example, subjects may self-select into treatment or control groups, or the researcher may allocate treatments to elements according to a property of the elements. Quasi-experiments cannot use the mathematical techniques based on the central limit theorem for their statistical inference, but there are other reasoning techniques that can be used for statistical inference in quasi-experiments (Shadish et al., 2002).
RCTs and quasi-experiments both take a so-called difference-making view on causality, which is why I call them here difference-making experiments. In this view, variable X has a causal influence on variable Y if X makes a difference to Y. That is, if X had a different value, with all other things being equal, then the value of Y would be different as well (Holland, 1986; Woodward, 2003).
For example, suppose in an RCT, a sample of projects using programming method A performed better on the average than a sample of projects solving the same problems using method B, and suppose that this difference is statistically significant, i.e. it is unlikely to be observed in a sample if it did not exist in the population. So it is unlikely that the difference is the result of chance alone, and the researcher is justified to look for a cause. There may be many causes for the difference, including the available resources to the projects, the competence of project personnel, and the difference between methods A and B. If the researcher can rule out all causes other than the difference between A and B, then the statistical difference supports the claim that the difference between methods A and B is the cause of the difference in project performance.
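Whether an observed difference between two groups of projects is likely to be the result of chance alone can also be assessed without the central limit theorem. The sketch below uses a permutation test, a technique swapped in here for illustration and not one attributed to any study discussed in this paper. It assumes that, if the method made no difference, the group labels would be exchangeable. The function name `permutation_p_value` and the scores are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_perm=5000, seed=3):
    """Estimate how often a random relabeling of the pooled observations
    yields a mean difference at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:size_a]) - mean(pooled[size_a:]))
        if diff >= observed:
            at_least_as_large += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_large + 1) / (n_perm + 1)


if __name__ == "__main__":
    method_a = [12, 14, 15, 13, 16, 15, 14, 17]  # hypothetical scores
    method_b = [9, 10, 11, 8, 10, 12, 9, 11]
    print(f"p = {permutation_p_value(method_a, method_b):.4f}")
```

A small p-value only supports the inductive step, namely that the difference is unlikely to be a chance pattern. Ruling out causes other than the difference between methods A and B remains a separate, abductive argument.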
Example 5. Prechelt et al. (2002) describe an experiment to compare the difference between maintenance tasks done on programs where design patterns were described in comment lines, and maintenance tasks done on programs where design patterns were not commented. The programs were identical except for the presence of so-called Pattern Comment Lines (PCLs).
The artifact is here the presence of PCLs and the context is the program, maintenance task, and maintainer. The subjects self-selected into the samples, which makes it hard to know which hypothetical population they are a random sample of, but we will assume that it consists of computer science students performing maintenance tasks on programs of similar size and complexity as those used in the experiment. The treatment is the instruction to perform maintenance tasks. Treatments were allocated randomly to subjects. The measured variables were task completion time and correctness of result.
The researchers found a slight improvement of task time and result correctness when PCLs were present, which was statistically significant. This means that there is a low probability that this observation would be made in the sample if this improvement were absent from the population. This supports the inductive generalization that an improvement exists in the population of all programs of the same size and complexity being maintained by students. The authors discuss possible causes for this improvement other than the presence of PCLs, and conclude that there is no evidence that there are other causes (Prechelt et al., 2002, page 599, Threats to internal validity).
They additionally identify a cognitive mechanism that could be responsible for this causal relationship (abductive inference, Fig. 5). This mechanism is postulated by a theory, formulated by several researchers earlier, that program comprehension works by the formation and validation of hypotheses, of which the efficiency is greatly enhanced by beacons, which are hints about familiar kinds of structures (Prechelt et al., 2002, page 596). PCLs are such beacons. This could explain the causal influence of PCLs on maintainability. It increases the support for the claim that the observed improvement is systematic rather than a coincidental event.
The authors are reluctant to generalize, by analogy, from the population of experimental maintenance situations similar to this experiment, to the population of real maintenance tasks (Prechelt et al., 2002, page 599). However, they do reason that, if PCLs had an improvement effect for relatively small well-commented programs, they might have an even better effect on large ill-commented programs (Prechelt et al., 2002, page 604). This is reasoning by analogy.
In an interesting aside, the authors observe that experiments comparing different syntactic forms to express the same meaning all have the methodological problem that the two forms rarely have the exact same meaning. The authors give general advice about a methodologically sound setup of such experiments. This is a case-based reasoning by analogy (Fig. 5), in which their experiment is an example for other, similar experiments.
Statistical difference-making experiments support reasoning along the horizontal dimension of our diagram of scaling up (Fig. 3). We see in this example first an inductive inference and then two abductions. First, the statistical correlation between two variables is inductively inferred to exist in a population, based on observations in the sample. This is inductive inference. Next, it is argued that this statistical correlation between independent and dependent variable is a causal relationship from independent to dependent variable, by ruling out all other possible causes than the difference in treatments (abduction 1). Third, this causal relationship was explained by a cognitive mechanism postulated by a previously established theory (abduction 2). This increases confidence in the causal conclusions that the authors drew from the experiment.
4. Related work
The reasoning schema “[Artifact × Context] → Effect by Mechanisms” has been proposed in slightly different forms in social science (Pawson and Tilley, 1997) and in management science (Van Aken, 2004). It has some similarity with the satisfaction argument as proposed by Jackson in software engineering (Jackson, 2000). Wieringa (2003) calls it the systems engineering argument, because it shows how a component must interact with other components to produce desired behavior of a composite system. It is simpler than the structure for design theories proposed by Gregor and Jones (2007). More discussion is provided elsewhere (Wieringa et al., 2011).
Douven (2011) gives a convenient introduction to abductive reasoning, also called “reasoning to the best explanation”. Mechanistic abduction is similar to theoretical model abduction as discussed by Schurz (2008).
The concept of mechanism has been proposed by philosophers who analyzed the structure of explanation in the physical, biological, and social sciences (Glennan, 1996; Machamer et al., 2000). It has been adopted as an explanatory construct in biology (Bechtel and Richardson, 2010; Bechtel and Abrahamsen, 2005) and in the social sciences (Bunge, 2004; Hedström and Ylikoski, 2010; Elster, 1989). All of these authors have slightly differing concepts of mechanism. Illari and Williamson present a survey and unification (McKay Illari and Williamson, 2012), which is very similar to the concept that I have used here.
There is a huge literature on causality and I cannot even begin to cite the relevant literature here. There are two views, one that causality is difference-making, the other that a causal relationship is a mechanism, and within each view there are several points of view. For example, the Bayesian theory of Pearl is an example of a difference-making view, described in a book (Pearl, 2009) and summarized in a paper (Pearl, 2009b). Holland (1986) is an older, exceptionally clear exposition of the difference-making view, staying within the framework of frequency-based statistics. Williamson (2011) surveys some mechanistic theories, and compares them with difference-making views.
Generalization by analogy as discussed here is one form of analytic induction, propagated by Yin as the way to generalize from cases (Yin, 2003), but actually originating from the sociologist Znaniecki (1968). The clearest and most accessible description of analytic induction is given by Robinson (1951): The researcher (1) roughly defines a class of phenomena and (2) formulates a hypothesis about a mechanism that is postulated to occur in these phenomena. This is our theory of similarity, and we have only considered the case where the theory describes a structure of interacting components that implement a mechanism. Next, a single case that satisfies the definition is investigated. If observations falsify the hypothesis, then either the definition is refined to exclude the case at hand, or the hypothesis is reformulated to match the observations. After investigating a number of cases, the definition and hypothesis may reach a stable state. The researcher then generalizes by claiming that all similar cases contain similar mechanisms, which will produce similar effects. This kind of case-based reasoning moves us upward in the diagram of generalization (Fig. 3). Znaniecki lists a few historical examples from biology, physics and sociology where this kind of reasoning was followed (Znaniecki, 1968, pages 236–237).
Zelkowitz and Wallace (1998) presented a survey of empirical validation methods in software engineering that I compare in Table 3 with the list in Table 2. In the terminology of this paper, their list contains validation methods but some other kinds of methods too.

Table 3
Validation methods identified by Zelkowitz and Wallace (1998) and by Glass et al. (2001).

| This paper | Zelkowitz and Wallace (1998) | Glass et al. (2001) |
|---|---|---|
| Validation research methods | | |
| • Expert opinion | | |
| • Single-case mechanism experiment | • Simulation • Dynamic analysis | • Field experiment • Laboratory experiment – Software • Simulation |
| • Technical action research | • Case study | • Action research |
| • Statistical difference-making experiment | • Replicated experiment • Synthetic environment experiment | • Field experiment • Laboratory experiment – human subjects |
| Other research methods | | |
| • Observational case study | • Case study • Field study | • Case study • Field study |
| • Meta-research method | • Literature search | • Literature review/analysis |
| Measurement methods | | |
| • Methods to collect data | • Project monitoring • Legacy data • Lessons learned | • Ethnography |
| Inference techniques | | |
| • Techniques to infer information from data | • Static analysis | • Data analysis • Grounded theory • Hermeneutics • Protocol analysis |
Their assertion method has been omitted because, as they also point out, it is not a research method. It is an experimental use of a new technology by the developer in the laboratory. Simulation is executing a product in a simulated environment. This is a single-case mechanism experiment because the product is an implemented mechanism to be tested. Dynamic analysis is the execution of a product under controlled conditions, similar to simulation but not aimed at simulating real-world environments. It is a single-case mechanism experiment too.
A case study could be the use of a new technology in an industrial project (Zelkowitz and Wallace, 1998, page 26), in which case we classify it as a technical action research project, or an observational study of a project (Zelkowitz and Wallace, 1998, page 25), in which case we classify it as an observational case study. Case studies therefore appear twice in Table 2.
Replicated experiments and synthetic environment experiments are statistical comparisons of groups of projects, where a task is performed differently in different groups. These are statistical difference-making experiments, performed in the field or in the lab.
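To make the idea of a statistical comparison of groups concrete, the following Python sketch compares two groups with a permutation test on the difference of means. This is only one of many possible inference techniques and is not prescribed by Zelkowitz and Wallace or by this paper; the function name `permutation_test`, the outcome measure, and all numbers are invented for illustration and do not come from any study.

```python
import random
from typing import Sequence, Tuple


def permutation_test(
    treated: Sequence[float],
    control: Sequence[float],
    n_permutations: int = 10_000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Two-sided permutation test on the difference of group means.

    Returns the observed difference (treated minus control) and an
    estimated p-value under the null hypothesis that group labels are
    exchangeable, i.e. that the treatment makes no difference.
    """
    rng = random.Random(seed)
    observed = sum(treated) / len(treated) - sum(control) / len(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        first, rest = pooled[:n_treated], pooled[n_treated:]
        diff = sum(first) / len(first) - sum(rest) / len(rest)
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical data: some outcome score per project, for projects that
# performed the task with a new technique and projects that did not.
with_technique = [3, 5, 4, 2, 4, 3]
without_technique = [7, 6, 8, 5, 9, 7]

observed, p_value = permutation_test(with_technique, without_technique)
```

A small p-value says only that the observed difference would be unusual if the technique made no difference in the sampled groups. Generalizing to a population still requires that the sample was drawn appropriately, and explaining the difference requires an abductive, mechanism-based argument.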
The difference between observational case studies and field studies defined by Zelkowitz and Wallace (1998, page 26) is that a case study is intrusive whereas a field study is not. They are both classified as observational case studies in the terminology of this paper, because in both, the research method is observational and the influence of the researcher on the object of study is to be minimized. Observational studies are, in the terminology of this paper, suitable as methods for evaluation studies of implemented technology, but not as research methods for validating new technology not yet transferred to the market.
Literature search is part of any research but may be expanded into a full-blown research method, also called a systematic literature review (Kitchenham, 2004).
Project monitoring is the collection, by the researcher, of data produced during a project, and legacy data is the collection of documents such as source code, specifications, and test plans after the project is finished. For a full-blown research method, we need a design of the way the researcher will interact with the object of study, including measurement methods and any experimental intervention, and inference design. In the terminology of this paper, project monitoring and legacy data, as described by Zelkowitz and Wallace, are measurement methods.
Lessons learned is the collection and analysis of lessons learned documents from projects. In Table 2 this is classified as a measurement method too, but because it also contains analysis, we could also classify it as a form of observational field study.
In static analysis the completed product is investigated, for example to analyze its complexity. It is similar to the study of legacy data but it is here classified as an inference technique because it refers to a collection of analysis methods.
Glass et al. (2001) list the empirical research methods shown in the third column of Table 3. Non-empirical methods such as conceptual analysis and mathematical proof have been omitted, and design activities, viz. concept implementation and instrument development, have been omitted too. Since it is not clear from the description by Glass et al. whether experiments are of the single-case mechanism kind or of the statistical difference-making kind, they are classified as both. I discuss the new entries in this column.
Ethnography is the detailed collection and description of daily events in a social group, without analysis, which is classified here as a measurement method. Grounded theory is the analysis of textual data produced by people, to extract the theories held by these people (Strauss and Corbin, 1998). I consider this to be a descriptive analysis method. Hermeneutics is the phenomenon that to interpret people's behavior, you have to understand their cultural and conceptual framework, but the only way to understand their cultural and conceptual framework is to interpret their behavior. This leads to an inference strategy in which the researcher iterates over updating his or her conceptual framework and interpreting human behavior in that framework. Protocol analysis is the analysis of thinking-aloud protocols, useful for cognitive psychology. It is a data analysis method.
Table 3 shows that these lists of software engineering research methods are mutually consistent and can be integrated in my framework for validation research methods, extending it with other methods. The overview is not complete, happily, as new methods and instruments for research keep being developed.
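The classification argued for in this section can also be written down as a simple lookup structure. The Python sketch below merely restates the entries of Table 3; the names `CLASSIFICATION`, `classify` and `methods_in` are hypothetical and serve only to show how the two published lists map onto the categories of this paper.

```python
from typing import Dict, List, Tuple

Z_AND_W = "Zelkowitz and Wallace (1998)"
GLASS = "Glass et al. (2001)"

SINGLE_CASE = "Single-case mechanism experiment"
TAR = "Technical action research"
STATISTICAL = "Statistical difference-making experiment"
OBSERVATIONAL = "Observational case study"
META = "Meta-research method"
COLLECT = "Method to collect data"
INFER = "Technique to infer information from data"

# (source list, method name) -> categories in the terminology of this paper
CLASSIFICATION: Dict[Tuple[str, str], List[str]] = {
    (Z_AND_W, "Simulation"): [SINGLE_CASE],
    (Z_AND_W, "Dynamic analysis"): [SINGLE_CASE],
    (Z_AND_W, "Case study"): [TAR, OBSERVATIONAL],  # appears twice
    (Z_AND_W, "Replicated experiment"): [STATISTICAL],
    (Z_AND_W, "Synthetic environment experiment"): [STATISTICAL],
    (Z_AND_W, "Field study"): [OBSERVATIONAL],
    (Z_AND_W, "Literature search"): [META],
    (Z_AND_W, "Project monitoring"): [COLLECT],
    (Z_AND_W, "Legacy data"): [COLLECT],
    (Z_AND_W, "Lessons learned"): [COLLECT],
    (Z_AND_W, "Static analysis"): [INFER],
    (GLASS, "Field experiment"): [SINGLE_CASE, STATISTICAL],  # classified as both
    (GLASS, "Laboratory experiment – Software"): [SINGLE_CASE],
    (GLASS, "Laboratory experiment – human subjects"): [STATISTICAL],
    (GLASS, "Simulation"): [SINGLE_CASE],
    (GLASS, "Action research"): [TAR],
    (GLASS, "Case study"): [OBSERVATIONAL],
    (GLASS, "Field study"): [OBSERVATIONAL],
    (GLASS, "Literature review/analysis"): [META],
    (GLASS, "Ethnography"): [COLLECT],
    (GLASS, "Data analysis"): [INFER],
    (GLASS, "Grounded theory"): [INFER],
    (GLASS, "Hermeneutics"): [INFER],
    (GLASS, "Protocol analysis"): [INFER],
}


def classify(source: str, method: str) -> List[str]:
    """Return the categories of this paper for a method from one of the lists."""
    return CLASSIFICATION.get((source, method), [])


def methods_in(category: str) -> List[Tuple[str, str]]:
    """Return all (source, method) pairs classified under a category."""
    return sorted(key for key, cats in CLASSIFICATION.items() if category in cats)
```

For example, `classify(Z_AND_W, "Case study")` returns both technical action research and observational case study, reflecting the two readings of a case study discussed above.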
5. Summary and conclusion
Empirical validation of technology before it is transferred to practice requires investigating the effects of the interaction of the artifact with its context, and explaining these effects by means of the underlying mechanisms that produce these effects. Scaling up to practice thus produces a design theory of the form “[Artifact × Context] produces Effects by Mechanisms”.
Producing support for such a theory involves two kinds of inferences, along the vertical and horizontal dimensions of the process of scaling up. Analogic inferences in the vertical dimension reason from case-based mechanism experiments to real-world instances of [Artifact × Context], and statistical inferences along the horizontal dimension reason from observed sample behavior to the population of all possible instances of [Artifact × Context]. Both inferences are supported by abductive inferences, which postulate mechanism-based explanations of cause-effect influences. Mechanism-based explanations refer to the components of [Artifact × Context] and their interactions.
We discussed the following research methods to validate artifacts:
• Expert opinion, in which experts reason informally about samples (horizontally) and mechanisms (vertically), which provides an initial sanity check of an artifact design;
• Single-case mechanism experiments, in which the researcher reasons vertically about mechanisms and their effects in increasingly realistic artifacts in increasingly realistic contexts;
• Technical action research, in which the researcher reasons vertically about mechanisms and their effects when an artifact is applied in a real-world project to help a client;
• Statistical difference-making experiments, in which the researcher reasons horizontally from effects observed in samples to effects inferred in populations.
These methods can be used with measurement instruments and data analysis methods known from software engineering and elsewhere.
This paper has given some examples of the use of these research methods, but this is just one step on the way to scaling up these methods to empirical RE research. Increasing use of these methods will teach us more about their usability and usefulness in empirical validation of RE technology.
Acknowledgements
This paper benefitted from comments by Vincenzo Gervasi and Walter Tichy. I would like to thank the anonymous reviewers of this paper for their constructive critique.
References
Al-Emran, A., Pfahl, D., Ruhe, G., 2010. Decision support for product release planning based on robustness analysis. In: Proceedings of the 18th IEEE International Requirements Engineering Conference (RE 2010), IEEE Computer Society, Sydney, Australia, pp. 157–166.
Apostel, L., 1961. Towards a formal study of models in the non-formal sciences. In: Freudenthal, H. (Ed.), The Concept and Role of the Model in the Mathematical and the Natural and Social Sciences. Reidel, Dordrecht, The Netherlands, pp. 1–37.
Babbie, E., 2007. The Practice of Social Research, 11th ed. Thomson Wadsworth, Belmont, USA.
Bechtel, W., Abrahamsen, A., 2005. Explanation: a mechanistic alternative. Studies in the History and Philosophy of Biological and Biomedical Sciences 36, 421–441.
Bechtel, W., Richardson, R., 2010. Discovering Complexity: Decomposition and Localization as Strategies in Scientific Research. MIT Press, Cambridge, Massachusetts (Reissue of the 1993 edition with a new introduction).
Bunge, M., 2004. How does it work? The search for explanatory mechanisms. Philosophy of the Social Sciences 34, 182–210.
Constant, E., 1980. The Origins of the Turbojet Revolution. Johns Hopkins, Baltimore.
Cook, S., Daniels, J., 1994. Designing Object Systems: Object-Oriented Modelling with Syntropy. Prentice-Hall, Upper Saddle River, New Jersey.
Cowan, C., 2002. The process of evaluating and regulating a new drug: phases of a drug study. AANA Journal 70, 385–390.
Damian, D., Chisan, J., 2006. An empirical study of the complex relationships between requirements engineering processes and other processes that lead to payoffs in productivity, quality and risk management. IEEE Transactions on Software Engineering 32, 433–453.
Davis, A., Hickey, A., 2004. A new paradigm for planning and evaluating requirements engineering research. In: 2nd International Workshop on Comparative Evaluation in Requirements Engineering, pp. 7–16.
Douven, I., 2011. In: Zalta, A. (Ed.), The Stanford Encyclopedia of Philosophy (Spring 2011 Edition). http://plato.stanford.edu/archives/spr2011/entries/abduction/
Elster, J., 1989. Nuts and Bolts for the Social Sciences. Cambridge University Press, Cambridge, UK.
Engelsman, W., Wieringa, R.J., 2012. Goal-oriented requirements engineering and enterprise architecture: two case studies and some lessons learned. In: Requirements Engineering: Foundation for Software Quality (REFSQ 2012), Essen, Germany, pp. 306–320 (volume 7195 of Lecture notes in computer science, Springer).