Six strategies for generalizing software engineering theories


Contents lists available at ScienceDirect

Science of Computer Programming

www.elsevier.com/locate/scico


Roel Wieringa *, Maya Daneva

University of Twente, The Netherlands

Article info

Article history:
Received 11 October 2013
Received in revised form 5 March 2014
Accepted 25 June 2014
Available online xxxx

Keywords:
Generalization
External validity
Scaling up
Architectural mechanisms
Statistical inference

Abstract

General theories of software engineering must balance between providing full understanding of a single case and providing partial understanding of many cases. In this paper we argue that for theories to be useful in practice, they should give sufficient understanding of a sufficiently large class of cases, without having to be universal or complete. We provide six strategies for developing such theories of the middle range. In lab-to-lab strategies, theories of laboratory phenomena are developed and generalized to other laboratory phenomena. This is a characteristic strategy for basic science. In lab-to-field strategies, theories are developed of artifacts that first operate under idealized laboratory conditions, which are then scaled up until they can operate under uncontrolled field conditions. This is the characteristic strategy for the engineering sciences.

In case-based strategies, we generalize about components of real-world cases, that are supposed to exhibit less variation than the cases as a whole. In sample-based strategies, we generalize about the aggregate behavior of samples of cases, which can exhibit patterns not visible at the case level. We discuss three examples of sample-based strategies. Throughout the paper, we use examples of theories and generalization strategies from software engineering to illustrate our analysis. The paper concludes with a discussion of related work and implications for empirical software engineering research.

1. Introduction

This paper aims to show two things. First, we aim to show that it is not worthwhile to develop a general theory of software engineering, but that it is very useful to develop incompletely specified, partial theories that can be applied to practice. Second, we identify four classes of strategies to build theories, namely lab-to-lab and lab-to-field strategies, each of which can be about individual cases or about samples of cases. Each of these four combinations has a different way of dealing with the variability of the real world. We also give examples of such theories and generalization strategies from the field of software engineering (SE).

These two aims are an operationalization, for software engineering, of a view about scientific theories that has been expressed well by the philosopher of science Nancy Cartwright:

The laws that describe this world are a patchwork, not a pyramid. They do not take after the simple, elegant and abstract structure of a system of axioms and theorems. ... The dappled world is what, for the most part, comes naturally; regimented behaviour results from good engineering. [1, p. 1]

* Corresponding author.
E-mail addresses: r.j.wieringa@utwente.nl (R. Wieringa), m.daneva@utwente.nl (M. Daneva).
http://dx.doi.org/10.1016/j.scico.2014.11.013


special features of design theories used in the engineering sciences. We argue that design theories necessarily deal with the variability of the real world, and that this implies that we will never have general design theories. Our discussion is applicable to all engineering sciences, and to show that it is applicable to software engineering as well, we give examples of theories from software engineering that illustrate our points.

To achieve our second aim, in Section 3 we discuss four strategies to develop the generalizations that are needed for theories. Each strategy has to deal with the variability of the real world, but approaches this in a different way.

• In lab-to-lab generalization, we require the target of generalization to be controlled so that the generalization applies to it. This strategy achieves generality at the price of idealization.

• In lab-to-field generalization, we refine the generalization by dropping idealizing assumptions. This achieves realistic generalizations, at the price of a limited, less-than-universal scope.

Each of these strategies can be performed in two ways:

• In case-based generalization, we study individual cases, and generalize about components and mechanisms found in a case, by similarity. The assumption is that components are less varied than the cases they occur in.

• In sample-based generalization, we study samples of cases, and generalize about statistical properties of these samples. The assumption is that individual variety cancels out in sample statistics.
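The assumption behind sample-based generalization can be illustrated with a small simulation. The sketch below is not taken from the paper: the population, its lognormal shape, and all names are hypothetical, chosen only to show that the spread of sample means is much smaller than the spread of the individual cases.

```python
import random
import statistics

# Hypothetical population: effort (person-days) of 10,000 imaginary projects.
# The lognormal shape and its parameters are assumptions made for illustration.
rng = random.Random(42)
population = [rng.lognormvariate(4.0, 0.8) for _ in range(10_000)]


def spread_of_sample_means(sample_size: int, repetitions: int = 500) -> float:
    """Standard deviation of the mean over repeated random samples."""
    means = [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(repetitions)
    ]
    return statistics.stdev(means)


spread_individual = statistics.stdev(population)  # variation between single cases
spread_means_30 = spread_of_sample_means(30)      # variation between sample means

print(f"spread of individual cases:  {spread_individual:.1f}")
print(f"spread of means of 30 cases: {spread_means_30:.1f}")
```

The sample means vary far less than the individual cases, which is the sense in which individual variety cancels out. The simulation draws random samples; it says nothing about samples selected in other ways.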

In Section 4 we will summarize our main contributions, discuss related work and draw some further implications for software engineering research.

2. Theories

There is no agreement among philosophers about what a theory is. One of the briefest definitions is that a theory is a belief that there is a pattern in phenomena [2, p. 55]. This includes all kinds of theories, including conspiracy theories about the causes of the credit crisis, economic theories about the causes of the same crisis, astrology, the theory of classical mechanics, and string theory. What makes a theory scientific?

This question has been analyzed by philosophers in various ways, of which the only conclusion seems to be, again, that there is no criterion agreed on by all philosophers that settles the matter [3]. Here we will be pragmatic and consider a theory as scientific if it has been submitted to, and survived, two kinds of tests [4,5]:

• Empirical tests. The theory has been submitted to, and survived, tests against experience. A theory can be tested against experience in observational research or in experimental research.

• Justification to a critical peer group. The theory has been submitted to, and survived, criticism by competent and critical peers. Part of the justification to critical peers is that empirical tests do not depend on the person of the researcher, and hence are repeatable: critical peers must be able to repeat the empirical tests.

Surviving criticism and empirical testing is never final. Even for a theory that survived testing and criticism for a long time, it is always possible that someone will find a flaw in the argument or that a test will falsify part of the theory. Scientific theories are fallible. We should always consider them to be improvable.

The absence of absolute certainty about a theory does not imply that we should give up searching for theories, let alone abolish the search for theories. To quote Gordon [6, p. 76]:

That these ideals cannot be attained is not a reason for disregarding them. Perfect cleanliness is also impossible, but it does not serve as a warrant for not washing, much less for rolling in a manure pile.

2.1. Structure of theories

There has been an evolution in views on the structure of theories from the classical one that scientific theories are sets of propositions with a deductive inference system [7,8], to current views that scientific theories are abstract models of phenomena [1,2,9–11]. Our purpose here is not to summarize these views nor to defend a viewpoint on what we think the definite structure of scientific theories is, but to point out three elements that are present in most scientific theories, according to many philosophers (Fig. 1).


Fig. 1. Elements of scientific theories. Theories in addition have a scope of applicability.

Conceptual framework. The one element that is part of a scientific theory according to all views is a conceptual framework by which to describe phenomena. A conceptual framework is a set of definitions of concepts, used, for example, to ask research questions, describe and analyze phenomena, state generalizations about phenomena, specify models of mechanisms, etc. For example, a theory of effort estimation may contain a definition of the concepts of effort and size, and a theory of program comprehension may contain concepts like chunking, short-term memory, and long-term memory. These concepts are used, among others, to describe and analyze phenomena.

Some theories only consist of a conceptual framework. For example, Li et al. [12] defined a conceptual model of multiple-component defects, containing definitions of concepts such as defect, multiple-component defect, architectural hotspot, and repair dependency. Using this framework, phenomena could be described that exhibit regularities. For example, in the investigated case a relation was observed between cost of maintenance, number of multiple-component defects, and persistence of defects.

Note that there is a difference between the generality of a conceptual framework and the generality of descriptions made using the conceptual framework. Since a conceptual framework is a set of definitions, it cannot be true or false, but it can be applicable or not. A conceptual framework is general if it can be applied to many phenomena. The more general a framework, the larger the set of phenomena to which it is applicable. The definitions in the framework of Li et al. are generally applicable to large software systems.

By contrast, descriptions can be true or false. A description is general if it is true of many phenomena. In the case studied by Li et al., 20% of the components contained 80% of the multiple-component defects. This is a true description of their case. It may well be false in other cases.

Generalizations. The second element usually found in theories is a collection of generalizations about phenomena. Generalizations may be formal or informal, expressed in words or diagrams, may be known to be true often but false sometimes, and may not all be connected deductively.

For example, a theory that could be proposed based on the research by Li et al. [12] mentioned above, is that in most large software systems, about 20% of the components contain 80% of the multiple-component defects. This theory is probably false in some cases, but it is possible that it is true in many cases. It would acquire support if in a series of case studies of large software systems performed independently, each time about 20% of the components turn out to contain about 80% of the multiple-component defects. It would acquire even stronger support if in a random sample of 30 or more large software systems, in each system the set of components that jointly contain 80% of the multiple-component defects, contains on the average about 20% of the components of the system.
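To make the 20/80 generalization checkable for a single system, one could compute the smallest share of components that jointly contain 80% of the defects. The following is a minimal sketch under our own assumptions: the function name and the defect counts are hypothetical and are not data from Li et al.

```python
from typing import Sequence


def share_of_components_covering(
    defects_per_component: Sequence[int], target_percent: int = 80
) -> float:
    """Smallest fraction of components that jointly hold target_percent of all defects."""
    total = sum(defects_per_component)
    if total == 0:
        raise ValueError("no defects recorded")
    covered = 0
    # Take components in order of decreasing defect count until the target is reached.
    for used, count in enumerate(sorted(defects_per_component, reverse=True), start=1):
        covered += count
        if covered * 100 >= target_percent * total:
            return used / len(defects_per_component)
    return 1.0


# Hypothetical defect counts for a system with 10 components (invented numbers).
hypothetical_counts = [50, 30, 5, 4, 3, 3, 2, 1, 1, 1]
share = share_of_components_covering(hypothetical_counts)
print(f"{share:.0%} of the components contain 80% of the defects")
```

Applied to each system in a series of case studies or in a random sample, the resulting shares could then be compared with the 20% that the generalization claims.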

Models. The more recent notions of a scientific theory agree that many theories provide abstract models of phenomena. Models may be defined in text or diagrams, and the definitions may be formal or informal and are often incomplete. Models represent a phenomenon as a system of interacting components. Craver [2], Machamer et al. [13] and Bechtel and Abrahamsen [14] give examples of model-based theories from biology, specified by means of diagrams that show interacting components, such as the heart, lungs, and tissue that explain some of the phenomena of blood flow and respiration, or the biochemical substances and their interactions that explain part of the metabolic process. Glennan [15] gives some examples of theories about physical technical structures, such as the transistor and resistors that make up a voltage switch.

The interactions among components that produce interesting system-level phenomena are called mechanisms, and often models are primarily described by their mechanisms. An example of a neuropsychological mechanism given by Bunge [16] is the extinction of aversive memories by the action of cannabinoids on neuronal processes in the amygdala, and an example of an economic mechanism is the use of a stabilization fund by a central bank to stabilize government revenue in the face of major commodity price fluctuations. Hedström and Swedberg [17] list a number of social mechanisms, such as the reference group mechanism identified by Merton and Kitt [18]. Thagard [19] gives a range of examples across the basic and applied sciences.

These examples illustrate that the components studied by researchers can be physical, biological, psychological, or social. If we extend our view to software engineering, then we encounter software components, hardware components, components of the cognitive processes of program comprehension, components of software engineering projects, etc. This corresponds well with the diversity of examples of software engineering theories found by Hannay et al. [20]. Concrete examples of models used in software engineering theories will be given later.


Fig. 2. Trade-off between generality and practicality.

2.2. The scope of scientific theories

The scope of a theory is the set of phenomena to which it is applicable. The scope of a generalization is the set of phenomena for which it is true, and the scope of a model is the set of phenomena to which it can be applied. For example, we may propose a 20/80 theory based on the case study of Li et al. discussed above, where we vaguely characterize the scope to be all complex, large-scale, commercial software systems developed over a period of at least a dozen years [12, p. 676]. As another example, Mayrhauser et al. [21] propose a model of the cognitive processes by which programmers understand code. The scope is claimed to be, precisely, all programmers.

We should reiterate here that theories are fallible, and therefore claims about the scope of a theory are fallible as well. Our research should be aimed at improving the accuracy of our scope claims. At any point in time, a scope claim is our best bet, given the arguments and evidence so far.

Idealization and generality. Scope claims in basic science are the opposite of scope claims in engineering science. Basic scientific research makes idealizing assumptions that are known to be false in practice but that make patterns of behavior visible [22,23]. Concepts like that of point mass, frictionless surface, perfectly elastic body, absolute vacuum, rational actor and Turing machine do not exist in the real world, but allow researchers to analyze patterns in phenomena conceptually or even mathematically. This approach to knowledge has been called Galilean idealization [24,25]. As a consequence of Galilean idealization, models proposed in basic research are idealized structures that abstract and simplify structures found in the real world, in the interest of conceptual and computational tractability. Cartwright [1] calls them nomological machines. Basic laws of nature are true for nomological machines but false for the uncontrolled real world [26,9]. In other words, the laws of basic science cannot be applied in practice [27,28,9]. The universality of basic laws of nature is obtained at the price of idealization.

In many sciences, middle-range theories are more useful than universal theories. A theory is middle-range if its generalizations do not have universal scope. The concept of middle-range theory was developed for the social sciences [29], but is also applicable to special sciences such as geology, meteorology and political science, which all have to deal with a variety of uncontrolled conditions of practice (Fig. 2). Engineering sciences produce middle-range theories as well [23]. We can view theories like the COCOMO model, some of the software engineering principles of Davis [30] and some of the theories listed by Endres and Rombach [31] as middle-range theories.

Sciences of the middle range, which are sciences that produce middle-range theories, are shown in the middle of Fig. 2. Researchers in these sciences try to avoid unrealistic assumptions and aim for generalizations that have less-than-universal scope. As a consequence, practitioners who want to apply these middle-range theories to their particular case should assess whether the middle-range theory is true for their case, or perhaps needs to be adapted. In a way, a practitioner should build a theory of his or her particular case, based on more general, middle-range knowledge produced by researchers [32].

2.3. Functions of theories

Scientific theories can be used to explore, frame, describe, analyze, explain, predict, specify, design, control, and organize phenomena [2]. Here we describe only a few of these functions, namely description, analysis, explanation, prediction and design.

Description and analysis. The conceptual framework of a theory can be used to describe and analyze phenomena. For this, no generalizations or models are needed. For example, the conceptual framework of the 20/80 multiple-component defect theory discussed earlier [12] defines concepts like defect, multiple-component defect, and architectural hotspot. These concepts can be used to analyze data about a product development project, to describe some of the phenomena in this project, and to analyze the relations between these phenomena.


We need not restrict our descriptions to observed phenomena. We can also generalize descriptively beyond observed phenomena. For example, Huynh and Miller found in a sample of web applications that roughly 70% of the vulnerabilities were due to insecure coding practice, which they call implementation vulnerabilities [33, p. 565]. In their conclusion they generalize this finding by saying that the majority of vulnerabilities in web applications are implementation vulnerabilities [33, p. 574]. This is a descriptive generalization.

Explanation. The generalizations or models of a theory can sometimes be used to explain phenomena. We will distinguish two kinds of explanations [34,23]. There are other kinds of explanations in the philosophy of science, but for the purpose of this paper we will focus on these two.

• A theory explains a phenomenon causally if it has identified an earlier phenomenon that caused it. We call this a causal explanation. For example, we may explain a project failure by requirements creep.

  Causation is a complex concept that defies non-circular definition yet is central in scientific explanation [35–38]. At least we can say that causal explanations refer to variables, and to causal relationships between variables. Here we follow Woodward [39] in saying that variable X influenced variable Y causally, if Y changed because earlier, X changed in a particular way.

• A theory explains a phenomenon architecturally if it identifies components of a system that by their interaction produced the phenomenon. We call this an architectural explanation. For example, Mayrhauser et al. [21] propose components of the cognitive process of program comprehension, such as short-term and long-term memory, and explain how interactions among these components produce program comprehension phenomena.

  Architectural explanations refer to components of systems, and interactions among these components that produce system-level phenomena. The systems can be physical, social, psychological, digital, etc. Components are characterized by their capability to interact with their environment in certain ways. As indicated earlier, interactions by which components of a system produce system-level phenomena are called mechanisms [15,40,13,41,16,42].

We can illustrate these two definitions with a metaphorical story: Pushing a button on a coffee machine causes the machine to dispense a cup of coffee. After learning about this cause-effect relation, you will explain the phenomenon that a machine has dispensed coffee with the explanation that someone pushed a button. This is your causal theory of coffee-dispensing phenomena by coffee machines.

In contrast, an architectural theory would explain the coffee-dispensing phenomenon by means of the coffee machine components. The machine contains a coffee reservoir and a water supply, connected by mechanisms that ensure that if you push a button, it dispenses coffee. This is an architectural explanation, which is generalizable to other coffee machines with a similar architecture. Coffee machine engineers use this theory. Different machines that satisfy the same causal theory (coffee-dispensing is caused by pushing a button) may satisfy different architectural theories (they have different architectures that realize this cause-effect relation).

Your coffee machine theory would consist of the three elements listed earlier (Fig. 1). It would have a conceptual framework in which concepts like button, dispenser and coffee reservoir are defined, contain generalizations about the effect of events, and contain a model of the architecture of a typical coffee machine. The theory would be nondeterministic, meaning that there are some cases where it is false, e.g. when a physical machine behaves erratically and spontaneously dispenses coffee.
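For readers who think in code, the coffee machine story can be rendered as one interface with two implementations. This is only an illustrative sketch: the class names, components, and quantities are hypothetical, and the analogy between a causal theory and an interface is ours.

```python
from typing import Protocol


class CoffeeMachine(Protocol):
    """Causal theory: pushing the button causes coffee to be dispensed."""

    def push_button(self) -> str: ...


class FilterMachine:
    """Hypothetical architecture 1: water supply and ground-coffee reservoir."""

    def __init__(self) -> None:
        self.water_ml = 1000
        self.grounds_g = 100

    def push_button(self) -> str:
        # Mechanism: hot water is passed through ground coffee.
        self.water_ml -= 150
        self.grounds_g -= 8
        return "coffee"


class InstantMachine:
    """Hypothetical architecture 2: water supply and instant-powder container."""

    def __init__(self) -> None:
        self.water_ml = 1000
        self.powder_g = 50

    def push_button(self) -> str:
        # Mechanism: powder is dissolved in hot water.
        self.water_ml -= 150
        self.powder_g -= 2
        return "coffee"


def observe(machine: CoffeeMachine) -> str:
    """An observer who knows only the causal theory pushes the button."""
    return machine.push_button()


results = [observe(FilterMachine()), observe(InstantMachine())]
print(results)
```

Both machines satisfy the same causal theory, since `observe` cannot tell them apart, yet each would need its own architectural explanation in terms of its components and mechanisms.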

An example of the use of both kinds of explanations in one case is given by the case study of Damian and Chisan [43]. They studied a software development organization in which requirements engineering was introduced. After introduction of requirements engineering, there was less requirements creep. The causal explanation of Damian and Chisan is that this was caused partially by the introduction of project tracking and partially by the introduction of change management [43, p. 436]. Like all causal explanations, these explanations explain a change in some variable (requirements creep) by particular changes in other variables that happened earlier. The explanations are nondeterministic, meaning that the earlier changes usually contribute to, but do not deterministically determine, the later changes.

The paper provides information that we can use to provide an additional, architectural explanation. Part of the introduction of requirements engineering was the creation of a change management board through which all customer change requests had to pass. The previous practice that customers could call developers directly, an almost sure mechanism for requirements creep, was replaced by the mechanism involving the change control board. Another part of the introduction of requirements engineering was the extension of the project management function with project tracking [43, p. 450]. These architectural explanations refer to components (change control board, project manager) with capabilities to interact with other components. Some components (change control board) have been added to the situation, and other components (project managers) changed their capability, as the result of which some mechanisms disappeared and new mechanisms were created, producing different phenomena than before.

Causal explanations require a conceptual framework that defines the relevant variables. Variables are the machine language of science, but the real world contains a lot more structure than just variables and relationships. Architectural explanations assume a richer structure of the world, consisting of systems, components, capabilities and mechanisms. Many philosophers who take a model-based view of theories also allow architectural explanations [2,13,14,44,15]. The model is then a nomological machine that shows how phenomena are produced [1].


Fig. 3. Relations between different levels of aggregation.

Prediction. A major function of theories in the engineering sciences is prediction. For example, we can use the results of Huynh and Miller mentioned earlier [33] to predict that in other web applications too, the majority of vulnerabilities will be implementation vulnerabilities. This example also illustrates that we do not need to be able to explain a phenomenon in order to predict it.

Conversely, we may be able to explain a phenomenon but not to predict it. For example, we may observe that all obvious causes of failure are absent in a project, but still not be able to predict reliably whether the project will succeed. Our knowledge of project failure and success may be too incomplete for that. Knowledge of social phenomena is often too incomplete to allow deterministic prediction [45, p. 348].

Design. Engineers are interested in the interactions between an artifact and its context. Artifacts in software engineering are algorithms, notations, techniques, methods, etc. The context in which they are used is a software engineering project, project personnel, customers, software, hardware, organizations, etc. A typical architecture for software engineering projects is that an actor applies technologies to perform activities on a software system [46]. Software engineering researchers are interested in the interactions among these elements.

Theories produced by software engineering researchers may help practicing software engineers in the design and improvement of artifacts. For example, if the majority of vulnerabilities in web applications are implementation vulnerabilities, a company that develops web applications may decide to improve the competence of its programmers to avoid insecure coding practice.

If a theory provides explanations about why the interaction between an artifact and its context produces certain phenomena, then we may be able to use it to improve the artifact or to choose the best context for it. A classical example is the steam machine, which had been operational for over 100 years before Sadi Carnot explained how it worked [47]. This in turn provided knowledge that could be used to make the design of steam machines more efficient.

If a theory allows the prediction of phenomena, even without explanation, then it can still be used to choose a design. For example, a software engineer may do performance measurements of the execution time of some algorithm in different contexts. If the performance measurements have been shown to be repeatable, then this can be used to predict what the performance of new implementations of this algorithm will be, even if the exact performance numbers cannot be explained to the last digit. Practical engineering contains many of these empirically developed practically usable predictions [48].
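A repeatability check of this kind might look as follows. This is a sketch under our own assumptions: `algorithm_under_test` is a hypothetical stand-in for the algorithm being measured, and using the median with a relative spread is one simple choice among many, not a method prescribed here.

```python
import statistics
import timeit


def algorithm_under_test(n: int = 2_000) -> int:
    """Hypothetical stand-in for the algorithm whose execution time is measured."""
    return sum(i * i for i in range(n))


def measure(runs: int = 5, calls_per_run: int = 200) -> list:
    """Total time in seconds of `calls_per_run` calls, repeated `runs` times."""
    return timeit.repeat(algorithm_under_test, repeat=runs, number=calls_per_run)


timings = measure()
predicted = statistics.median(timings)  # empirical prediction for a further run
relative_spread = (max(timings) - min(timings)) / predicted

print(f"median: {predicted:.4f} s, relative spread: {relative_spread:.1%}")
```

A small relative spread across runs suggests that the measurement is repeatable in this context, so the median can serve as a prediction without any explanation of why the number is what it is. Whether the prediction carries over to another context (different hardware, load, or input size) is exactly the generalization question discussed in the next section.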

2.4. Summary

Scientific theories consist of a conceptual framework, and usually contain generalizations and/or models of phenomena. They can be used, among others, to describe, analyze, explain, and predict phenomena. Explanations may be causal or architectural. A causal explanation explains a change in a variable by an earlier change in another variable. An architectural explanation explains a phenomenon in terms of the interactions among components that produced it.

Theories are useful for engineers because they can allow them to describe, analyze, explain and predict the behavior of artifacts in a context. Not every theory may be usable in all of these ways at the same time.

Software engineering researchers must balance between the extremes of idealization and practice. By making too many idealizations their theories would lose relevance for practice; by including too many conditions of practice they would lose the ability to generalize [49]. In the next section we look at different strategies to choose a balance between the two extremes.

3. Strategies for generalization

To discuss the strategies for generalization, we need to distinguish objects of study, samples, and populations, as indicated in Fig. 3. The object of study is the object from which measurements are taken, such as for example a software engineering project, a software engineer, a software program, etc. We will distinguish case-based research, in which single objects of study are investigated, from sample-based research, in which samples of objects of study are investigated. For example, we may investigate a single software engineering project in depth, or we may survey a sample of projects statistically.

In sample-based research, sample data are used to generalize statistically to a well-defined population, called the study population. The sample is a subset of the study population. We may extend the generalization further from the study population to a so-called theoretical population, of which the study population is a subset. The theoretical population may be less well-defined than the study population.

Fig. 4. Steps in strategy 1: lab-to-lab generalization.

For example, we may survey a sample of projects selected from a list of projects in a large company. The projects on this list form the study population. After generalizing statistically from the surveyed sample to the study population, we may generalize to the larger set of all projects in the company, which is ill-defined because we do not have a list of them. We may even generalize to the set of all software engineering projects in all similar companies, which is even more ill-defined if the required similarity has not been specified completely.
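The first, statistical step in this example can be sketched with a confidence interval for a proportion. The Wilson score interval used below is our choice for illustration, not a technique named in this paper, and the survey numbers are invented; the sketch also assumes that the sample was drawn at random from the project list.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical survey: 40 projects drawn at random from the company's project list,
# 28 of which show the property of interest. These numbers are invented.
low, high = wilson_interval(successes=28, n=40)
print(f"estimated proportion in the study population: {low:.2f} to {high:.2f}")
```

The interval supports a claim about the listed projects only. Extending that claim to all projects in the company, or to similar companies, is not a statistical step and has to be argued from similarity.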

In case-based research, we may attempt generalization from the object of study to the theoretical population immediately. For example, from the investigation of a single project, we may tentatively hypothesize a generalization about all similar software engineering projects in similar companies.

We will see that all generalizations to a theoretical population are based on similarity. An important research question is then what kind of similarity is sufficient to warrant generalization to a theoretical population. For different generalizations, we may need different concepts of similarity.

The distinction between sample-based and case-based research is idealized, because in practice there are mixed forms of research, in which for example we investigate a sample of projects one by one, in a case-based way, or in which we investigate a sample first statistically, and follow this up with a case study of one of them. The theoretical population may not be a superset of the study population, but a population similar to it in some respects, etc. But the picture suffices as a guide for the discussion of generalization strategies.

Validity. The inference steps by which we produce explanations and generalizations are fallible, which means that they can lead to incorrect conclusions from correct premises. The degree of support for a conclusion of a fallible inference is called its validity [50, p. 513]. The validity of a statistical inference from a sample to a study population is called its conclusion validity.

Internal validity is defined by Shadish et al. as the degree of support for the claim that a relation between two variables is causal [50, pp. 53, 508]. Because we recognize causal as well as architectural explanations, we generalize the definition by Shadish et al. and define internal validity here as the degree of support for a causal or architectural explanation of a phenomenon.

External validity is defined by Shadish et al. [50, pp. 83, 507] as the extent to which a causal relationship also holds over variations in Units, Treatments, Outcomes and Settings. In our more general interpretation of internal validity, and in terms of Fig. 3, we here define external validity as the degree of support for the generalization of a causal or architectural explanation to a theoretical population. The source of this generalization can be an explanation of phenomena in a single object of study or in a study population. The target is always a theoretical population. The problem of external validity is the problem of the variability of the real world. Each generalization strategy has a way to deal with this problem without solving it completely.

3.1. Lab-to-lab generalization

The first strategy to generalize to a theoretical population is case-based, and deals with the problem of external validity by requiring uniformity in the theoretical population. The research goal is to achieve theoretical understanding of a phenomenon, and the strategy is to achieve this by doing laboratory experiments under controlled conditions (Fig. 4).

This is the classical approach in basic science, where the major challenge is to create the conditions in the laboratory that are ideal enough for the phenomenon to be produced [51, p. 92]. For example, in 1820 Oersted showed by experiment that an electric current deflected a nearby magnetic needle. In terms of Fig. 4, step one was to observe the deflection of the needle in the experimental setup. Step two was the identification of the presence of the electric current as the cause. Step three was to confirm that this was repeatable by Oersted and by all other researchers who tried replication.

Interesting in this example is that the experimental manipulation showed that there was a repeatable causal relation by which the electric current influenced the needle, but that there was as yet no mathematical description or architectural explanation of it. So it was just a phenomenon that experimenters knew how to produce. A few years later the French mathematician Ampère built a mathematical theory of this phenomenon that described the phenomenon exactly. Architectural understanding was achieved much later, when Maxwell developed his field theory relating magnetism and electricity. Generalization, causal explanation, description, and architectural explanation are not always found in a logical order.

Characteristic of laboratory research is that external validity is claimed only for the theoretical population of laboratory experiments. Experimental researchers in basic science spend their research budget on creating the ideal conditions required for the generalization to hold. This seems irrelevant for engineering researchers, who want to create effects in the real world. In the real world, effects created in the laboratory may be swamped by a multitude of uncontrolled causes and be interfered with by other mechanisms.

Fig. 5. Steps in strategy 2: lab-to-field generalization.

However,thisdoesnotmakelaboratoryresearchirrelevantforengineeringscience.First,creatingtheoretical understand-ingisusefulinsoftwareengineeringresearchaswellasitisinanyotherscience[52].

Second, laboratory theories can be used to predict andexplain other laboratory phenomena beyondthose they were originallydevelopedfor.Forexample,thetheoryofcognitiveprocessingdevelopedfromlaboratoryexperimentsby Gellen-beck andCook[53]in1991wasused10yearslaterbyPrecheltetal.[54] tojustifythedesignofalaboratoryexperiment withpatterncommentlines(PCLs).APCLisacommentlinethatdescribestheuseofsoftwarepatternswhereapplicable. Precheltetal.wantedtotesttheeffectofPCLsonmaintainabilityofprograms,andusedthebeacontheoryofprogram com-prehensionproposedbyGellenbeckandCooktopredictandexplainwhatcanbeobservedinthelab.Thisisanexampleof lab-to-labgeneralization.

Third,ongoingtestingofatheorydevelopedforlabphenomenamayverywellshowthatthetheoreticalpredictionand explanationstill holdintheﬁeld,eventhoughthisisnotagoaloflab-to-lab generalization.Andtheremaybereasonsto expectthisinadvance.Forexample,Precheltetal.useextremecasereasoningtospeculatethatifPCLsarealreadyeffective fortherelativelysmallandwell-commentedprogramsinthelaboratory,theymaybeeffectiveinan environmentoflarge ill-documentedprograms[54,p. 604].Thisprovidesareasonforfurthertestingthispredictionintheﬁeld.

Fourth, intheabsenceoffurtherinvestigationsintheﬁeld,andassumingthatthelaboratorytheoryisinternallyvalid, a laboratory theory can be used to suggest what regimentation we should impose on the context, or what “protective covering” weshouldputaroundtheartifact,ifwewanttoreproducethephenomenaintherealworld astheyhavebeen produced in the lab[1, p. 86]. Forexample, the theory ofPrechelt etal. maybe generalizable to real-world projects in whichprogramsarekeptsmallandwell-documented.

Lab-to-lab generalization is one of the core generalization strategies in basic science, where the goal is to understand and create phenomena in isolation. In engineering science we find another core strategy, which we call lab-to-field generalization.

3.2. Lab-to-field generalization

The object of study in engineering sciences is an artifact in a context of use [34,23]. Engineering researchers iterate between (re)designing artifacts for use in a class of contexts, and investigating artifacts that interact with contexts of this class. For example, we may design a new software engineering notation for use in a particular kind of software engineering project, investigate its properties in the classroom or in the field, redesign it, investigate some more, etc. In this strategy, researchers start their investigations under ideal conditions in the lab and finish them under realistic conditions in the field. At the start of this process, when only lab research is done, the goal is to support lab-to-lab generalizations. At the end of this process, when field research has been done, the goal is to support field-to-field generalizations. During the process, artifacts are scaled up to practice and generalizations are increasingly targeted at field conditions. We will abbreviate this process as “lab-to-field generalization”. It is a characteristic generalization strategy in the engineering sciences [55,48,49].

In this process, artifacts are scaled up to practice [56]. For example, the turbojet was designed and built first as a prototype in the lab, and investigated under ideal circumstances that did not yet resemble practical conditions of a flying aircraft [55]. During its development, it was scaled up by making it more robust to conditions of practice. At each step, an increasingly robust prototype was investigated under more realistic conditions of practice, until a realistic prototype was investigated when it was used to propel an aircraft.

The knowledge that is built up during this process has a continuously changing subject: a relatively simple prototype operating under ideal conditions at first, and a sophisticated prototype operating under realistic conditions of practice at last. At each stage, the technology is investigated, until satisfactory support for a generalization about the artifact in similar conditions can be given. When sufficient support for such a generalization has been acquired, the next step in scaling up takes place. Ideally at each stage, the behavior of the artifact prototype can be explained architecturally, i.e. in terms of components of which the interactions produce the behavior of the prototype artifact in context (Fig. 5).

Lab-to-field generalization is a form of technology validation [57,58]. Elsewhere we discuss research methods that can be used in technology validation, such as simulation, technical action research, and statistical difference-making experiments in the lab or in the field [56].

In software engineering research, we can see quite a few examples in the past twenty-five years where software engineering researchers have scaled up technology from laboratory to practice, or generalized their knowledge of new technology. For example, in the 1980s the CleanRoom methodology was tested in a series of experiments that scaled up from the laboratory (using students as subjects) to small real-world projects to large real-world projects, very similar to the way new drugs are tested [59,60]. These tests started out as simulations in the laboratory and ended up as pilot projects in practice.

In the early 1990s Lubars et al. [61,62] and Potts [63] investigated object-oriented analysis methods in three experiments that started from simulation, in which the authors built an object-oriented requirements model of a relatively simple ATM in the lab, and ended with an action research project in which they developed a specification of a cellular telephone protocol in practice. The methods investigated were already on the market, but generalizable knowledge about them was still lacking. Scaling up technology from the laboratory to the field is common in industrial research, while developing new technology for a market. However, scaling up must be distinguished from technology transfer. Transferring new technology to the field can happen without any systematic process of scaling up and before scientific investigation of its performance in practice has been done; the UML is a case in point. Conversely, scaling up technology to field conditions can happen without it ever being actually transferred to practical use in the market; 4K resolution screens could be an example. Technology transfer includes activities like mass production, marketing and distribution, which are absent from the activity of scaling up technology.

Lab-to-field generalization is a process in which lab-to-lab generalization evolves into field-to-field generalization. We may also distinguish field-to-lab generalization, in which a phenomenon found in the field is reproduced and investigated in the lab. For example, the dependence of productivity of pair programmers on personality characteristics of the programmers, found in the field, may be reproduced and investigated in the lab [64,65]. We do not discuss these strategies further and now turn to another classification of generalization strategies, based not on the source and target of generalization, but on the object that is generalized about: a case or a sample. In the following two sections we discuss sample-based and case-based strategies. In the discussion and in the examples it will become clear that these strategies can be combined with any of the strategies discussed so far.

3.3. Sample-based generalization strategies

Sample-based generalizations reduce the variability of the real world by aggregating individual phenomena over samples, so that some individual variation cancels out. This idea arose in the early 19th century in connection with the governance of states [66–68]. Mathematicians and politicians discovered that even where at the individual level no repeatable pattern could be discerned, at the population level there were nearly stable patterns. The number of births in a city like Paris was nearly the same every year, as were the number of marriages, the number of deaths, and even the number of letters that got lost in the Parisian mail system every year. In today’s language we would say that the variance of these sample statistics was surprisingly small from year to year, and we understand it as a consequence of the law of large numbers.

The early development of statistics as a discipline was closely related to the development of sociology as a science, with its own domain of study: mass phenomena in populations. Later in the 19th century, statistics as a discipline was applied to other topics, such as mass phenomena in gases [66].

We will discuss three strategies for sample-based generalization:

• Randomized difference-making experiments: Generalize a causal theory about statistical population phenomena.

• Quasi-experiments: Generalize a causal theory about sample phenomena.

• Statistical learning: Generalize a statistical description of sample phenomena.

Randomized difference-making experiments. The first kind of statistical inference from samples to populations was developed early in the 20th century by Gosset and Fisher [69,70]. The goal is to infer a property of the probability distribution of one or more variables over a study population, from observations of a random sample selected from the population. There are two forms of statistical inference. In statistical hypothesis testing, sample observations are compared to a hypothesis about the population distribution of the variable(s) of interest. The result of the comparison is then interpreted as support for or against the hypothesis, or as inconclusive. Which of these options is available depends on the kind of inference the researcher wants to use, the one proposed by Fisher or the one proposed by Neyman and Pearson, or a mix of the two. In confidence interval estimation, sample data are used to estimate an interval in which a distribution parameter can be confidently assumed to fall, that is, with a relatively low rate of making the wrong estimation. Both classes of techniques use the Central-Limit Theorem, which assumes random sampling. They are discussed in any textbook on statistical inference [71,72]. In terms of Fig. 3, statistical inference takes one from sample observations to properties of distributions of the study population. Details about the methodological differences between them are given by Hacking [73] and Wieringa [23].
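To make confidence interval estimation concrete, the following is a minimal sketch in Python, assuming a random sample and a sample large enough for the normal approximation to be reasonable. The function name `mean_confidence_interval` and the task-time data are hypothetical and serve only as illustration; for small samples a t-based interval would normally be preferred, which the standard library does not provide.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for a population mean.

    Assumes the sample was drawn at random from the study population.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(sample)
    # Estimated standard error of the sample mean.
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


# Hypothetical task completion times (minutes) of a random sample of subjects.
times = [31.0, 28.5, 35.2, 30.1, 29.8, 33.4, 27.9, 32.6]
low, high = mean_confidence_interval(times)
print(f"95% interval for the mean: [{low:.2f}, {high:.2f}]")
```

Raising the confidence level widens the interval, which reflects the trade-off between the precision of the estimate and the rate of wrong estimations.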

Statistical inference is used, for example, in randomized controlled experiments to compare the effect of a randomly allocated treatment with non-treatment, or treatment by a placebo. Here we discuss the logic of statistical and causal inference for a slightly generalized kind of experiment, which we will call a randomized difference-making experiment (Fig. 6), in which we compare arbitrary treatments. The reason for this name is that in this kind of experiment we take a difference-making view of causality, as we also do in this paper. Rephrasing our earlier definition of causality, we can say that if a difference in X makes a difference to Y, then X causally influences Y. If taking an aspirin makes a difference to my headache, then taking an aspirin causally influences my headache. This justifies the structure of difference-making experiments, where we compare the difference in outcome variables of two treatment groups [74].


Fig. 6. Steps in strategy 3: randomized difference-making experiments.

The idea is to provide evidence for the claim that two treatments, A and B, have a different effect in a population. We define a dummy variable X with two levels A and B, select a random sample from the study population, and randomly allocate A or B to the sample elements. Suppose for simplicity that we do this so that we get two subsamples of equal size, one treated by A and one treated by B. These samples can be viewed as being selected randomly from two virtual populations, the population treated by A and the population treated by B. We are interested in providing evidence that there is a statistically discernable difference between the averages of a variable Y in the population treated by A and in the population treated by B. In addition, we want to provide evidence that this difference is caused by the difference between A and B.

In a randomized difference-making experiment, this evidence is provided as follows (Fig. 6). Step one is to apply the treatments, and to compute the averages $y_A$ and $y_B$ of the values of Y on the two samples. Based on this we use a statistical inference technique to infer whether there is a statistically discernable non-zero difference between the averages of Y in the two populations, i.e. whether $\mu_{Y,A}$ and $\mu_{Y,B}$ are different. The data can provide strong, weak, or inconclusive evidence for a non-zero difference. If we conclude that there is a difference, then the degree of support provided by the evidence for this conclusion is the statistical conclusion validity of the inference. Random sampling and allocation play a crucial role in providing this support [71,75].
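Step one can be sketched in code. The sketch below is an illustration under stated assumptions, not the procedure of any cited study: it allocates sample elements at random to two equally sized groups and then uses a two-sided permutation test on the difference of the group averages of Y. The permutation test is our choice for the example because it needs only the standard library; the function names and the outcome values are hypothetical.

```python
import random
import statistics


def randomly_allocate(units, rng):
    """Shuffle the sample and split it into two equally sized treatment groups.

    If the number of units is odd, one unit is left unallocated.
    """
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:2 * half]


def permutation_p_value(y_a, y_b, rng, rounds=5000):
    """Two-sided permutation test for a difference between two group means."""
    observed = abs(statistics.fmean(y_a) - statistics.fmean(y_b))
    pooled = list(y_a) + list(y_b)
    n_a = len(y_a)
    extreme = 0
    for _ in range(rounds):
        # Re-label the outcomes at random, as if treatment made no difference.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so that the estimated p-value is never exactly zero.
    return (extreme + 1) / (rounds + 1)


rng = random.Random(42)  # fixed seed so that the sketch is repeatable
group_a, group_b = randomly_allocate(range(16), rng)

# Hypothetical outcome values of Y measured after applying treatments A and B.
y_a = [12.1, 10.4, 11.8, 13.0, 12.5, 11.1, 10.9, 12.7]
y_b = [14.2, 13.8, 15.1, 13.5, 14.9, 15.4, 13.9, 14.6]
print("estimated p-value:", permutation_p_value(y_a, y_b, rng))
```

A small p-value counts as evidence for a non-zero difference between the population averages; the causal reading of that difference belongs to step two and rests on the random allocation, not on the test itself.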

If there is satisfactory evidence of a non-zero difference, then step two is to conclude that because sampling and allocation have been random, in the long run, the only possible causal explanation of the difference between $\mu_{Y,A}$ and $\mu_{Y,B}$ is the difference between A and B. This is a long-run explanation: we assume that the statistical difference will occur again in most replications, and therefore is a stable phenomenon. In the long run, this can only be caused by the difference between A and B. The degree of support provided for this causal inference is the internal validity of the inference.

Step three is to generalize this to similar populations, e.g. the theoretical population. What is a “similar” population? This depends on the mechanism that produces the causal influence. A theoretical population of which the elements all contain the mechanism that is responsible for the effect of X on Y is likely to exhibit the same causal relationship. In pharmacology this is called a mechanism of action. For example, caffeine has several mechanisms of action, two of which are that it antagonizes a biochemical compound (adenosine) that inhibits neurotransmitters, and that it increases the activity of neurotransmitters such as dopamine [76]. These mechanisms create a causal influence of, for example, drinking coffee on staying awake. We can generalize this explanation to all organisms with an architecture in which caffeine can trigger this mechanism of action.

In engineering the mechanism of action is called a principle of operation [48]. For example, the principle of operation of an airplane is that by the shape of its wings, air above the wing flows faster relative to the wing than air below it, which according to Bernoulli’s principle produces upward lift. This is the mechanism by which forward speed causes upward lift. All bodies with a similar shape will experience this. The theory “forward speed causes upward lift” is externally valid in all bodies with a shape that justifies the application of Bernoulli’s principle.

What the relevant similarity in step three is depends on the architectural explanation of the causal relationship postulated in step two. If no architectural explanation has been found, then we should be cautious and generalize to objects of study on which the variables used in the causal explanation have “similar” values. For example, we may tentatively generalize a causal relation between variables to projects of a similar size, cost and lead time. Without architectural understanding of the mechanism that produces this relation, this generalization has weaker support than with such an understanding.

The above strategy, which is strategy 3, contains generalization in two steps. In step one, there is a statistical generalization from a sample to the study population. In step three, there is generalization by similarity between the theoretical population and the study population. Our sketch of this strategy is idealized. Regarding step one, when sampling a study population, this is usually done without replacement, so that after each selection of a population element, the remaining population to select from is smaller and the probability of population elements to be selected is not equal. If the population is large compared to the sample, the difference with sampling with replacement is not noticeable, but if the population is relatively small, this requires a correction factor to be applied to sample-based inferences [71]. In addition, sampling is often not random but is based on self-selection, which may create an unknown systematic bias by which it is not known from which population you are sampling randomly. According to Freedman [71], the few domains where statistics has satisfactory applications are measurement (e.g. precision measurements in astronomy) and genetics.
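The correction for sampling without replacement can be illustrated as follows. This is a minimal sketch, assuming the standard finite population correction sqrt((N - n) / (N - 1)) applied to the standard error of a sample mean; the function name and the data are hypothetical.

```python
import math
import statistics


def standard_error_of_mean(sample, population_size=None):
    """Standard error of the sample mean.

    If population_size (N) is given, the finite population correction
    sqrt((N - n) / (N - 1)) for sampling without replacement is applied.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    std_err = statistics.stdev(sample) / math.sqrt(n)
    if population_size is not None:
        if population_size < n:
            raise ValueError("population cannot be smaller than the sample")
        std_err *= math.sqrt((population_size - n) / (population_size - 1))
    return std_err


# Hypothetical defect counts of 10 modules sampled without replacement.
defects = [3, 7, 4, 6, 5, 8, 2, 5, 6, 4]
print("no correction:     ", standard_error_of_mean(defects))
print("population of 40:  ", standard_error_of_mean(defects, population_size=40))
print("population of 4000:", standard_error_of_mean(defects, population_size=4000))
```

With a population of 4000 the corrected value is practically the same as the uncorrected one, whereas with a population of 40 the correction is clearly visible, in line with the remark that the difference only matters when the population is relatively small.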

Random allocation can often be achieved, but because subjects know which task they have been asked to perform, in step two this knowledge must be included in the factors that may have influenced the outcome [77].

Regarding step three, the specification of relevant similarity between populations is currently mainly based on similarity in the values of variables and less on similarity of architectures or repeatability of mechanisms [78]. As pointed out above, architectural similarity gives a stronger basis for generalization by similarity.


Fig. 7. Steps in strategy 4: quasi-experiments.

To give an example of strategy 3 in software engineering, we use the study by Hannay et al. [65]. They tested a number of hypotheses about the effect of personality, expertise, task complexity and country of residence on pair programming in a large-scale field experiment in which subjects were professionals and settings were natural, but treatments were artificial. Step one showed statistically discernable differences in performance among samples of pairs with different expertise, that are unlikely to be accidental. Step two identified some independent variables as plausible causes for these differences. Expertise, extraversion, and task complexity were among the variables that could cause the observed differences. Step three considered the similarity and dissimilarity between the studied samples and real-world programming pairs. The use of professional software engineers is the major similarity between the experimental setting and real-world settings in the field. However, low task complexity and controlled group dynamics reduce the generalizability of this field experiment to real-world pair programming [65, p. 75].

Quasi-experiments. Random sampling and allocation are often not achievable, and are rarely done in experimental software engineering [79]. Since the early 1960s, Campbell and colleagues developed quasi-experiments, which do not assume random sampling or allocation and by which causal inferences could be supported [80]. Generalization to the theoretical population is still by analogy. The strategy is now to collect sufficient data in step one to provide sufficient evidence for a causal theory in step two (Fig. 7). This is harder than in the randomized approach of the previous strategy, where randomization allowed us to have a simple theory of the experiment: in the long run, all causes other than the difference in treatments are excluded. Importantly, in randomized experiments we do not have to know what these causes are. In quasi-experiments, developing a causal explanation of sample phenomena is considerably harder, because we must identify all relevant causes [81,82,50]. In step three, we generalize the theory of the sample to the theoretical population by similarity as we did before.
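Why all relevant causes must be identified when allocation is not random can be shown with a small constructed example. The sketch below is an illustration only, with invented records in which experts mostly select treatment B themselves; stratification by a known alternative cause is one conventional way to account for it and is not claimed to be the method of any cited study. All names and numbers are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def naive_difference(records):
    """Difference in mean outcome between treatments B and A, ignoring strata."""
    a = [y for treatment, _, y in records if treatment == "A"]
    b = [y for treatment, _, y in records if treatment == "B"]
    return fmean(b) - fmean(a)


def stratified_difference(records):
    """Difference B minus A computed within each stratum, weighted by stratum size."""
    strata = defaultdict(list)
    for treatment, stratum, y in records:
        strata[stratum].append((treatment, y))
    weighted_sum = 0.0
    total = 0
    for rows in strata.values():
        a = [y for treatment, y in rows if treatment == "A"]
        b = [y for treatment, y in rows if treatment == "B"]
        if not a or not b:
            continue  # a stratum without both treatments allows no comparison
        weighted_sum += len(rows) * (fmean(b) - fmean(a))
        total += len(rows)
    if total == 0:
        raise ValueError("no stratum contains both treatments")
    return weighted_sum / total


# Invented records: (treatment, expertise, outcome). Experts mostly chose B.
records = [
    ("A", "expert", 10), ("B", "expert", 12), ("B", "expert", 12), ("B", "expert", 12),
    ("A", "novice", 0), ("A", "novice", 0), ("A", "novice", 0), ("B", "novice", 2),
]
print("naive difference:     ", naive_difference(records))       # 7.0
print("stratified difference:", stratified_difference(records))  # 2.0
```

In these invented data the naive comparison attributes to the treatment a difference that is largely due to expertise. The stratified comparison is only as good as the list of alternative causes the researcher has been able to identify, which is exactly the difficulty of step two in a quasi-experiment.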

It is hard to find examples of strategy 4 in software engineering quasi-experiments, because many publications discuss mostly step one, showing a difference, and pay little or no attention to steps two and three, explanation and generalization [83]. To illustrate strategy 4, we use the article by Laitenberger et al. [84] on three replicated quasi-experiments to compare perspective-based reading with checklist-based reading. The three experiments happened in three editions of a course for professional software engineers, so this was a set of three quasi-experiments. The authors use descriptive and inferential statistics to argue that checklist-based reading is more costly and less effective than perspective-based reading. This is a description of a statistical phenomenon that corresponds to step one of strategy 4 (Fig. 7).

Some explanations for these results are provided [84, p. 388]: Checklists are often based on past information, contain too many questions, do not require inspectors to document their analysis, and require inspectors to check all information in the document for possible defects. These factors could reduce the effectiveness of checklist-based inspections compared to other inspection techniques. On the other hand, part of the effectiveness of an inspection may not be attributable to a reading technique at all, but to the competence of the inspector. By reading the specification and code, the inspector may find defects regardless of the reading techniques used [84, p. 408]. All of this contributes to a theory of the experiment that explains the observations. This corresponds to step two of strategy 4.

As an illustration of step three, we point out that the theory of the experiment only refers to those elements of an inspection that are present in other checklist-based inspections too, both in the classroom and in the field. This similarity supports generalization to other classrooms and to the field. However, in real-world inspections, other mechanisms may work against this generalization: Inspector competence makes inspections more effective and less costly, even checklist-based inspections. Furthermore, the experimental setting used individual inspections, whereas in industrial practice, team inspections are common [84, p. 408]. There are thus some similarity-based arguments in favor of generalizability, and other dissimilarity-based arguments that limit generalizability.

Statistical learning. In the past decades, computer-intensive methods have been developed in machine learning, pattern recognition, data mining and process mining by which statistical patterns in large samples can be discovered [85,86]. Descriptions of these statistical sample phenomena can be used to predict similar phenomena in new samples, without necessarily being able to explain why these mass phenomena occur, and without necessarily being able to state in advance for which class of phenomena these regularities occur. The goal is not to generalize to a population, but to generalize to the next few cases.

For example, researchers may develop an effort estimation formula that describes the relation between effort and complexity that has been observed in a large sample of past projects. A practitioner may choose to use this formula if she assesses her own case as similar to the cases that the researchers sampled from. See Fig. 8 for the steps in this strategy.


Fig. 9. Steps in strategy 6: building architectural theory of case phenomena in the field.

We note that similarity between the training sample, in which the phenomenon has been identified, and the application sample, where the description is to be applied, is judged by the user of the description, and does not have to be specified by the producer of the description. The producer of the description is a researcher, and the user may be another researcher or a practitioner. Of course, it helps if the producer gives a clear description of features of the training sample. Over time, the application samples may become less similar to the training sample. To remain usable, the description should then be recalibrated on a new training sample.

An example from software engineering is the COCOMO effort estimation model. Boehm et al. [87] suggest that an estimating specialist takes historical data of at least ten projects in an organization in order to analyze organization-specific patterns and use this information to calibrate COCOMO equations in an organization-specific way. This model is claimed to hold in the organization as long as it does not change its practices drastically. As the specialist keeps collecting new information from new projects in the organization, the accuracy of the model can usually be improved over time, again assuming there is no drastic change in company practices.
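The idea of organization-specific calibration can be sketched as follows. This is a deliberately simplified illustration, assuming a single power law effort = a * size^b fitted by least squares on logarithms; it is not the full set of COCOMO equations, which contain further factors, and the function names and project data are hypothetical. The minimum of ten projects mirrors the suggestion quoted above.

```python
import math


def calibrate_power_law(sizes, efforts, min_projects=10):
    """Fit effort = a * size ** b to historical projects by log-log least squares."""
    if len(sizes) != len(efforts):
        raise ValueError("sizes and efforts must have the same length")
    if len(sizes) < min_projects:
        raise ValueError(f"need at least {min_projects} historical projects")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def estimate_effort(size, a, b):
    """Predict effort for a new project from the calibrated coefficients."""
    return a * size ** b


# Hypothetical history of one organization: size in KLOC, effort in person-months.
sizes = [12, 25, 8, 40, 60, 15, 33, 90, 20, 50]
efforts = [35, 80, 22, 140, 230, 44, 110, 360, 62, 180]
a, b = calibrate_power_law(sizes, efforts)
print(f"calibrated model: effort = {a:.2f} * size^{b:.2f}")
print("estimate for a 45 KLOC project:", round(estimate_effort(45, a, b), 1))
```

Recalibration then amounts to calling `calibrate_power_law` again once new projects have been added to the history. Whether a new project is similar enough to the training sample for the estimate to be trusted remains a judgment of the user of the model.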

Cost estimation researchers also use cross-company data made available by public databases (e.g. through the PROMISE conference [88]) or proprietary databases (such as that of the International Software Benchmarking Standards Group) as data sets to analyze regularities and propose improvements of cost estimation models. Statistical regularities have also been used to compare the performance of models developed by using cross-company data and within-company data [89,90].

3.4. Case-based generalization

In sample-based studies, the variability of the real world is reduced by taking sample statistics as the object of study, such as the sum, mean, or standard deviation of a variable in a sample. This may reveal large-scale stable patterns of behavior. In case-based research, variability is reduced by decomposing a single case into components with interactions, such as for example people and roles in a project. These components and mechanisms may be recurrent across a large set of different cases, and are hence interesting subjects of generalization. Fig. 9 shows the steps in this strategy.

A case study cannot give support for causal explanations, because evidence for causality requires observations of differences [74]. To show that a difference in X makes a difference to Y, we should be able to observe two values of X. But a single case study will show us only one value of X. This is the reason we have two samples in a difference-making experiment.

However, it is possible to take a causal theory from another source and use it to explain what we observed in a case. For example, Sabherwal [91] uses agency theory to explain coordination in outsourcing as the effect of opportunistic behavior by the vendor. In addition, he explains by which mechanisms the client in an outsourcing contract is more vulnerable to opportunistic behavior of the vendor than the other way around. This illustrates that case phenomena can be explained causally and architecturally.

In step three of strategy 6, we generalize to other cases architecturally. As stated before, similarity of the values of variables is a weak basis for generalization by analogy. For example, in the case studied by Damian and Chisan [43], we may argue that similar improvements will be observed in other cases where similar components (change control board and a cross-functional team) with similar capabilities are introduced.

As in strategy 5, users of this generalization must assess if their own case is relevantly similar to the case studied by Damian and Chisan to apply this generalization to their own case. Other cases may contain additional mechanisms, such as organizational dynamics caused by budget cuts, organizational mergers, political tensions, etc. that may prevent the effects of the introduction of requirements engineering from occurring.

Architectural generalizations can be made more robust against real-world variation and interference by a process of analytical induction. This was introduced as a case-based generalization strategy in sociology in the 1930s by the sociologist Znaniecki [92]. In the social sciences, analytical induction consists of a series of case studies, where all cases have a similar architecture, but also differ from each other [93–95]. In each case study, architectural explanations are sought that explain phenomena in all cases studied so far. So, for a researcher to confirm that an explanation constructed for one case is valid in another case too, he or she may choose the next case to be as similar as possible. On the other hand, in order to test the robustness of the explanation under different circumstances, a researcher may choose the next case similar enough to make the occurrence of the phenomenon possible, but dissimilar enough to make it plausible that it may occur by different mechanisms, or not at all.

The case studies by Mockus et al. [96] are an example of analytical induction across two cases. They analyzed development and maintenance of the Apache and Mozilla open source projects. In the Apache case, they observed that the project has a core of about 10–15 developers who controlled the code base and created approximately 80% or more of new functionality. This is step one of strategy 6. They explained this architecturally by the following mechanism [96, p. 9]:

• (Apache mechanism): “The core developers must work closely together, each with fairly detailed knowledge of what other core members are doing. Without such knowledge they would frequently make incompatible changes to the code. Since they form essentially a single team, they can be overwhelmed by communication and coordination overhead issues that typically limit the size of effective teams to 10–15 people.”

The Mozilla project is architecturally similar, yet different. This project had a core of 22 to 36 developers who coordinated their work according to a concretely defined process and used a strict inspection policy, and who each had control of a module and created approximately 80% or more of new functionality. The authors therefore refined their explanation [97, p. 340]:

• (Apache and Mozilla mechanism): Open source developments have a core of developers who control the code base, and will create approximately 80% or more of the new functionality. If this core group uses only informal ad hoc means of coordinating their work, the group will be no larger than 10 to 15 people.

Note that this refined explanation addresses both cases.

3.5. Summary

To summarize, generalization strategies can be classified according to their source and target, and according to whether they are about sample phenomena or about case phenomena. Field-to-lab generalizations are possible too, but we have not discussed them. Basic science usually generalizes from lab to lab, requiring the researcher to eliminate in the laboratory much of the variability of the real world, to approximate the idealizations required by fundamental laws of nature. The engineering sciences usually generalize from lab to field. They sacrifice universality by incorporating conditions of practice in their generalizations.

In each of these strategies, we can study samples or cases. Sample-based research deals with the variability of the real world by studying aggregate phenomena of samples, in which individual random variation cancels out. There are various ways to do that, using randomization, quasi-experimentation, or computer-intensive methods. In case-based research, the variability of the real world is reduced by decomposing a case into components that can produce case phenomena by their interactions. Generalization should be based on architectural similarity, and generalizations can be made more robust by analytical induction.

4. Conclusions, related work and implications

4.1. Conclusions

This paper has three contributions. Our first contribution is to emphasize the utility of middle-range theories that balance generality with practicality (Fig. 2).

Our second contribution is that we recognize models as theories. This opens the road to accepting architectural models as theories. Architectural models supplement causal explanations of phenomena by showing how a causal influence is produced by an underlying mechanism. Architectural models are also useful to define the relevant similarity relationship when generalizing from a case to a theoretical population, or from a study population to a theoretical population.

Third, we have identified six generalization strategies useful for theories of the middle range. The characteristic strategy of basic science is lab-to-lab generalization, and the characteristic strategy for engineering sciences is lab-to-field generalization. Each of these strategies can be done in a case-based and a sample-based way. For case-based generalization we have argued that architectural models allow better support for judging similarity between cases than variable-based theories do. Finally, we have reviewed three sample-based strategies, namely randomized difference-making experiments, quasi-experiments, and statistical learning strategies.

4.2. Related work

Our view of theories is similar to that of Sjoberg et al. [46], who present a theory of UML-based development consisting of (1) a conceptual framework, (2) propositions, (3) explanations, and (4) an indication of scope. Their propositions and explanations seem to be similar to our causal and architectural explanations, respectively.


Fig. 10. Comparison with Gregor’s [98] analysis of scientific theories.

Fig. 11. Comparison with Gregor and Jones’ [99] analysis of design theories.

Comparison of our analysis with Gregor’s [98] analysis of theories in information systems research (Fig. 10) reveals that we agree on the importance of constructs, generalizations and scope. However, Gregor ignores the role of models in theories and does not recognize architectural explanations. We do not regard notations to be part of a theory, and we do not require a theory to give causal explanations. Testable propositions may follow from a theory in particular cases, but we do not regard them as part of a theory. And we do not think that prescriptive theories can exist. Theories can be used to justify design decisions, but do not prescribe them. We can use a scientific theory to predict the effect of a decision, but not to tell us what we must do.

Gregor and Jones [99] outline a structure for design theories, in which testable propositions seem to have taken the place of generalizations (Fig. 11). The elements in the lower part of the table are not required to be part of a theory in our approach. In the table they are briefly explained.

Seddon and Scheepers [100] provide a roadmap for generalizations in information systems research that is relevant for software engineering research too. They identify analytical induction (called “analytical generalization” by them, just as Yin [95] does) as one of the important routes to theoretical generalization from samples or from cases, as we do too. They emphasize combination of research findings with prior knowledge, with which we agree.

Seddon and Scheepers provide a more elaborate analysis of different kinds of statistical generalization than we do. They emphasize that statistical inference based on the Central-Limit Theorem, using e.g. p-values or confidence intervals, should satisfy the assumption of that theorem, namely random samples. They provide a useful analysis of strategies to follow some kind of statistical inference for non-random sampling, which complements ours. In particular they discuss the possibility to use Bayesian statistics, which we do not discuss. Seddon and Scheepers do not consider the possibility of architectural explanations and do not emphasize the need for middle-range theories as much as we do.

Lee and Baskerville [101] identify four kinds of generalization, which they call EE, TT, ET and TE. Their type EE is, in our terms, generalizing from case to case. Their type TT is, in our terms, the extension of a conceptual framework with a causal explanation. Their type ET is, in our terms, the causal explanation of a phenomenon, and their type TE is the application of a prior causal theory to a phenomenon.

4.3. Implications

The implications of this paper for empirical software engineering are simple: Since software engineering is an engineering science, all observations and conclusions of this paper apply to software engineering too. The examples that we have given of all the points made in this paper illustrate this.

The generalization strategies discussed in this paper are strategies to acquire generalizable knowledge about software engineering artifacts in context, where these artifacts may be novel or may have been used in practice for a long time. As we stated before, scaling up to practice is not a model for technology transfer but a way to establish knowledge in the lab and then generalize it to the field. The sample-based strategies allow researchers to establish knowledge about statistical regularities and, using causal reasoning, to establish causal explanations of these regularities. The case-based strategies allow researchers to establish architectural knowledge about the mechanisms that produce phenomena. In particular, they can establish knowledge about the mechanisms that can produce causal relationships in the lab or in the field.

To establish theoretical knowledge, we need to generalize to a theoretical population, and to do that we need adequate knowledge of the relevant architectural similarity relation that defines the theoretical population. Different generalizations will probably need different architectural similarities. Architectural similarity is a worthy research topic that can only be pursued in cooperation with practitioners [102,103]. There are some published results about the (dis)similarity between software engineering students and software engineering professionals [104–106], but more needs to be done. There are interesting new developments such as the International Workshop on Replication in Empirical Software Engineering Research (RESER) [107]. The implication of this paper for replication is that the definition of replication requires the definition of an architectural similarity relationship. The real subject of replications should therefore be the architectural explanations of phenomena. Replications should be theory-based.

The books by Davis [30] and Endres and Rombach [31] present a patchwork of principles, observations, laws and theories that all call for more empirical research to validate and elaborate them. Empirical research into these claims is more than replication of earlier results. It is the investigation of the mechanisms that could explain the effects described by the theories in this patchwork, and the investigation of the limits of the scope of these theories, using the generalization strategies discussed in this paper.

References

[1] N. Cartwright, The Dappled World. A Study of the Boundaries of Science, Cambridge University Press, 1999.

[2] C. Craver, Structure of scientific theories, in: P. Machamer, M. Silberstein (Eds.), The Blackwell Guide to the Philosophy of Science, Blackwell, 2002, pp. 55–79.

[3] R. Nola, H. Sankey, Theories of Scientific Method, Acumen, 2007.

[4] R. Merton, The normative structure of science, in: Social Theory and Social Structure, enlarged edition, The Free Press, 1968, pp. 267–278.

[5] A. Cournand, M. Meyer, The scientist’s code, Minerva 14 (1) (1976) 79–96.

[6] S. Gordon, History and Philosophy of Social Science, Routledge, 1992.

[7] C. Hempel, Aspects of Scientific Explanation and Other Essays in the Philosophy of Science, Free Press, 1965.

[8] E. Nagel, The Structure of Science, Routledge and Kegan Paul, 1961.

[9] R. Giere, Theories, in: W. Newton-Smith (Ed.), A Companion to the Philosophy of Science, Blackwell, 2000, pp. 515–524.

[10] P. Machamer, A brief historical introduction to the philosophy of science, in: P. Machamer, M. Silberstein (Eds.), The Blackwell Guide to the Philosophy of Science, Blackwell, 2002, pp. 1–17.

[11] P. Godfrey-Smith, Theory and Reality. An Introduction to the Philosophy of Science, The University of Chicago Press, 2003.

[12] Z. Li, N. Madhavji, S. Murtaza, M. Gittens, A. Miranskyy, D. Godwin, E. Cialini, Characteristics of multiple-component defects and architectural hotspots: a large system case study, Empir. Softw. Eng. 16 (2011) 667–702.

[13] P. Machamer, L. Darden, C. Craver, Thinking about mechanisms, Philos. Sci. 67 (2000) 1–25.

[14] W. Bechtel, A. Abrahamsen, Explanation: a mechanistic alternative, Stud. Hist. Philos. Biol. Biomed. Sci. 36 (2005) 421–441.

[15] S. Glennan, Mechanisms and the nature of causation, Erkenntnis 44 (1996) 49–71.

[16] M. Bunge, How does it work? The search for explanatory mechanisms, Philos. Soc. Sci. 34 (2) (2004) 182–210.

[17] P. Hedström, R. Swedberg (Eds.), Social Mechanisms. An Analytical Approach to Social Theory, Cambridge University Press, 1998.

[18] R. Merton, A. Kitt, Contributions to the theory of reference group behavior, in: R. Merton, P. Lazarsfeld (Eds.), Continuities in Social Research: Studies in the Scope and Method of “The American Soldier”, Free Press, 1950, pp. 40–105.

[19] P. Thagard, Explaining disease: correlations, causes, and mechanisms, in: F. Keil, R. Wilson (Eds.), Explanation and Cognition, MIT Press, 2000, pp. 255–276.

[20] J. Hannay, D. Sjøberg, T. Dybå, A systematic review of theory use in software engineering experiments, IEEE Trans. Softw. Eng. 30 (2) (2007) 87–107.

[21] A. v. Mayrhauser, A. Vans, A. Howe, Program understanding behavior during enhancement of large-scale software, J. Softw. Maint. Evol. 9 (1997) 299–327.

[22] R. Laymon, Experimentation and the legitimacy of idealization, Philos. Stud. 77 (1995) 353–375.

[23] R. Wieringa, Design Science Methodology for Information Systems and Software Engineering, Springer, 2014.

[24] E. McMullin, A case for scientific realism, in: J. Leplin (Ed.), Scientific Realism, University of California Press, 1984, pp. 8–40.

[25] E. McMullin, Galilean idealization, Stud. Hist. Philos. Sci. 16 (3) (1985) 247–273.

[26] N. Cartwright, How the Laws of Physics Lie, Oxford University Press, 1983.

[27] R. Laymon, Applying idealized scientific theories to engineering, Synthese 81 (1989) 353–371.

[28] M. Boon, How science is applied in technology, Int. Stud. Philos. Sci. 20 (1) (2006) 27–47.

[29] R. Merton, On sociological theories of the middle range, in: Social Theory and Social Structure, enlarged edition, The Free Press, 1968, pp. 39–72.

[30] A. Davis, 201 Principles of Software Development, McGraw-Hill, 1995.

[31] A. Endres, D. Rombach, A Handbook of Software and Systems Engineering: Empirical Observations, Laws and Theories, Pearson Addison Wesley, 2003.

[32] P. Van Strien, Towards a methodology of psychological practice: the regulative cycle, Theory Psychol. 7 (5) (1997) 683–700.

[33] T. Huynh, J. Miller, An empirical investigation into open source web applications’ implementation vulnerabilities, Empir. Softw. Eng. 15 (5) (2010) 556–576.

[34] R. Wieringa, M. Daneva, N. Condori-Fernandez, The structure of design theories, and an analysis of their use in software engineering experiments, in: International Symposium on Empirical Software Engineering and Measurement (ESEM), IEEE Computer Society, 2011, pp. 295–304.

[35] M. Parascandola, D. Weed, Causation in epidemiology, J. Epidemiol. Community Health 55 (2001) 905–912.

[36] J. Goldthorpe, Causation, statistics, and sociology, Eur. Sociol. Rev. 17 (1) (2001) 1–20.

[37] M.M. Marini, B. Singer, Causality in the social sciences, Sociol. Method. 18 (1988) 347–409.

[38] J. Woodward, Making Things Happen. A Theory of Causal Explanation, Oxford University Press, 2003.

[39] J. Woodward, Causation and manipulability, in: E.N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy, summer 2013 edition, 2013.

[40] P. Hedström, R. Swedberg, Social mechanisms: an introductory essay, in: P. Hedström, R. Swedberg (Eds.), Social Mechanisms. An Analytical Approach to Social Theory, Cambridge University Press, 1998, pp. 1–31.

[41] S. Glennan, Rethinking mechanistic explanation, Philos. Sci. 69 (2002) S342–S353.

[42] P. McKay Illari, J. Williamson, What is a mechanism? Thinking about mechanisms across the sciences, Eur. J. Philos. Sci. 2 (2012) 119–135.

[43] D. Damian, J. Chisan, An empirical study of the complex relationships between requirements engineering processes and other processes that lead to payoffs in productivity, quality and risk management, IEEE Trans. Softw. Eng. 32 (7) (2006) 433–453.

[44] W. Bechtel, R. Richardson, Discovering Complexity: Decomposition and Localization as Strategies in Scientific Research, MIT Press, 2010, reissue of the 1993 edition with a new introduction.

[45] A. Kaplan, The Conduct of Inquiry. Methodology for Behavioral Science, Transaction Publishers, 1998, first edition 1964 by Chandler Publishers.

[46] D. Sjøberg, T. Dybå, B. Anda, J. Hannay, Building theories in software engineering, in: F. Shull, J. Singer, D. Sjøberg (Eds.), Guide to Advanced Empirical