
Speaking Truth to Power

An exploration of the evaluation department of the Dutch Ministry of Foreign Affairs (2019)

By Lotte Levelt University of Amsterdam

Supervisor: Dr. N (Nicky) R. M. Pouw

May 28, 2020 Word count: 29982

Research MSc International Development Studies Faculty of Social and Behavioural Sciences


Contents

List of tables and figures

Abbreviations and acronyms

Acknowledgements

Abstract

1. Introduction

1.1 Background: Problem statement and relevance

1.2 Research gaps and contributions

1.3 Research questions

1.4 Structure of the thesis

2. A review of approaches to poverty and Development Cooperation

2.1 Introduction

2.2 Theoretical approaches to poverty and development

2.3 Practical approaches to Development Cooperation

2.4 Effectiveness of Development Cooperation practice

2.5 Conclusion

3. Evaluating, learning and policymaking

3.1 Introduction

3.2 Schools of thought: Quantitative, qualitative and mixed analysis

3.3 Critical review of evaluation techniques

3.4 Evaluation use and learning in a policy-setting

3.5 Conclusion

4. Research design

4.1 Ontological and epistemological background

4.2 Research methodology

4.3 Positionality and ethical considerations

4.4 Limitations of the study: What is not researched?

4.5 Conceptualisation and operationalisation

4.6 Conceptual model

4.7 Operationalisation table

5. Research setting

5.1 Introduction

5.2 The Netherlands and its Ministry of Foreign Affairs

5.3 Background of the evaluation unit (IOB)

5.4 IOB’s setting in relation to research questions

5.5 Conclusion

6. How are IOB’s evaluations of Dutch foreign policies and programmes designed?

6.1 Introduction

6.2 Chronological overview of an evaluation trajectory

6.3 Factors arising in an evaluation trajectory

6.4 Conclusion

7. Which actors are involved in the design of these evaluations and what are their assumptions?

7.1 Introduction

7.2 Stakeholders involved in the evaluation trajectory

7.4 Views of and roles of IOB

7.5 Conclusion

8. What lessons are drawn from the case study evaluation and what, if any, adjustments follow?

8.1 Introduction

8.2 Background to the Reconstruction evaluation

8.3 Lessons of the Reconstruction evaluation and subsequent learning

8.4 Conclusion

9. Conclusion

9.1 Answering the research question: a synthesis of findings

9.2 Discussion: linking findings to theory and contributions to academic debate

9.3 Methodological reflections and limitations

9.4 Back to the future: advice for M&E practitioners and avenues for research

References

Appendix I. Data sources

Appendix II. Interview guide (example)


List of tables and figures

List of tables

Table 1. Types of evaluation use and learning

Table 2. Chart for linking research questions to data sources and methods

Table 3. Operationalisation

Table 4. Main stakeholders of IOB department

Table 5. Results: Fields of study of IOB department

Table 6. Results: Typology of IOB’s roles

Table 7. Results case study: Types of evaluation use and learning

List of figures

Figure 1. Financial flows to developing countries

Figure 2. Distribution of strict outcomes analysed by Vivalt (2019)

Figure 3. Example of a Theory of Change

Figure 4. Schematic overview of the triangulation design

Figure 5. Conceptual model


Abbreviations and acronyms

Alphabetical list of abbreviations and acronyms appearing in the thesis, followed by their meaning and, where necessary, translations (English, Dutch).

Organisations and departments

3IE International Initiative for Impact Evaluation

CDA Dutch Christian centre-right party (Christen Democratisch Appèl)

D66 Dutch Progressive centre party

DAC Development Assistance Committee (part of OECD)

DGBEB Directorate-General for Foreign Economic Relations (part of Min. FA)

DGPZ Directorate-General for Political Affairs (part of Min. FA)

DGIS Directorate-General for International Cooperation (part of Min. FA)

DSH Department for Stabilisation and Humanitarian aid (part of Min. FA)

DSO Department for Social Development (part of Min. FA)

ESA Strategic Policy Unit (Eenheid Strategische Advisering)

FEZ Quality Control and Supervision department (part of Min. FA)

IMF International Monetary Fund

IOB International Research & Policy Evaluation (Intl. Onderzoek & Beleidsevaluatie)

IOV Inspectie Ontwikkelingssamenwerking te Velde (former name of IOB)

IPA Innovations for Poverty Action

Min. FA (BZ) Ministry of Foreign Affairs (Ministerie van Buitenlandse Zaken)

NGO Non-Governmental Organisation

OECD Organisation for Economic Co-operation and Development

PvdA Dutch Labour Party (Partij van de Arbeid)

RVO Netherlands Enterprise Agency (Rijksdienst Voor Ondernemen)

SER Social and Economic Council (Sociaal-Economische Raad)

UNDP United Nations Development Programme

USAID United States Agency for International Development

VNO-NCW Dutch Employers’ Federation


M&E Terminology

AfT Aid for Trade

CT Cash Transfer

CCT Conditional Cash Transfer

CSR Corporate Social Responsibility

DC (OS) Development Cooperation (Ontwikkelingssamenwerking)

FGT-indices Foster-Greer-Thorbecke indices

GDP Gross Domestic Product

GNI Gross National Income

GNP Gross National Product

HDI Human Development Index

IE Impact Evaluation

MDG Millennium Development Goal

M&E Monitoring and Evaluation

MPI Multidimensional Poverty Index

ODA Official Development Assistance

PADev Participatory Assessment of Poverty

PPP Purchasing Power Parity

RCT Randomised Controlled Trial

SDG Sustainable Development Goal

SRHR Sexual and Reproductive Health and Rights

ToC Theory of Change

ToR Terms of Reference

UCT Unconditional Cash Transfer

WASH Water, Sanitation and Hygiene


Acknowledgements

First and foremost, I would like to express gratitude to my supervisor, dr. Nicky Pouw, whose insights and expertise encouraged me to go the extra mile. Not only did she provide academic guidance, she also offered me opportunities to give guest lectures and connected me to inspiring individuals (at the Ministry of Foreign Affairs and the University). I experienced her as an ambitious supervisor, all the while maintaining an eye for personal wellbeing. As such, she has helped me grow, not only as a researcher, but as an individual, for which I am very thankful: It is this balance that has paved the way for this thesis.

Furthermore, I am very grateful for having spent my field research with IOB, at the Ministry of Foreign Affairs. Their curiosity, warm welcome and eager participation in my thesis project was very motivating. Thank you to Rob van Poelje and Wendy Asbeek-Brusse, for supporting me and giving me the opportunity to spend time with IOB. Thank you to all IOB researchers who were willing to be interviewed by me (everyone accepted my invitation!). My experience at IOB has been a truly inspiring one; I have learned a lot about the development sector, monitoring and evaluation and the challenges of organising moments of reflection and learning in a political organisation like a Ministry. I am particularly grateful for the advice and guidance I received from Caspar Lobbrecht: Thank you for challenging my assumptions, for providing me with constructive feedback, and, above all, for the fun and humour you have brought to the table. I am sure we will meet again, because the last word has not been said about development cooperation…

Moreover, I am thankful for the policymakers and directors at DSH, whom I interviewed about a somewhat sensitive evaluation, and whose cooperation illustrated their great integrity, skill and conviction. I also owe gratitude to my second reader, dr. Hebe Verrest, whom I have come to know through my work as a teaching assistant. I admire her passion for education and look forward to her academic insights and critical questions during the defence.

Finally, I’d like to thank my running shoes and choir for ‘taking me out of my head’, effective altruism for giving me a place to combine my head and heart, and above all: my dear friends, my sister Lisa and Birgitte and Simon for having my back, always.


Abstract

The rise of ‘evidence-based’ Development Cooperation has provoked two concerns in academia and practice: Firstly, the existing body of evaluation research is predominantly quantitative, which risks portraying false objectivity. Secondly, such studies overemphasise accountability at the cost of learning. Hence, this exploratory study aims, firstly, to explore the underlying assumptions of evaluators and, secondly, to refocus attention on learning by analysing an evaluation’s follow-up. Further, policy learning scholarship is dominated by survey-based research and cases in which learning did happen. Hence, this study addresses a methodological research gap by using a mixed methodology and selecting a case in which learning outcomes were as yet unknown. To this end, the study is set within the Evaluation Department of the Dutch Ministry of Foreign Affairs. This is a quintessential setting, as the Netherlands was historically one of the first countries to evaluate government spending on Development Cooperation. The study employs a triangulation design, using semi-structured interviews with evaluators and policymakers, and participant observations. A subsequent content analysis of these interviews reveals the backgrounds and preferences of evaluators to be principally qualitative, and finds the assumptions of evaluators to be influenced by study background and work experiences in developing countries. These findings result in a typology of the evaluation department’s five roles within the Ministry, ranging from knowledge broker to advisor and critical voice. Further, the case study finds three main ways in which the evaluation is used: symbolic, instrumental and empowerment use. Hence, the study responds to ongoing academic debate by nuancing the image of evaluators as predominantly quantitative, and shows how an evaluation could stimulate learning by analysing the learning efforts policymakers take in its aftermath.
This study recommends the use of the typology of evaluation department roles, of which the knowledge broker role is especially feasible. In doing so, the study contributes to improved learning, and hence to improved performance of the Ministry’s Development Cooperation endeavours.

Keywords:

Evaluation – development cooperation – The Netherlands – institutional learning – Ministry of Foreign Affairs – policy adjustment


1. Introduction

1.1 Background: Problem statement and relevance

A major buzzword in current International Development practice and academia is ‘evidence-based’ policy and intervention (White & Raitzer, 2017). Following discussions of aid effectiveness of the 1990s and 2000s, a shared recognition has emerged amongst scientists and professionals that learning and accountability should be central concerns (Easterly, 2007; Doucouliagos & Paldam 2008). Banerjee and Duflo (2011) famously pioneered these concerns in their experimental poverty research. One straightforward way in which actors and institutions in the International Development sector attempt to be (more) evidence-based, is through evaluation of policies and programmes.

Evaluations, however, can be viewed in light of ‘political arithmetic’, a school of thought that is concerned with deciphering the assumptions underpinning statistics and associated methods, most notably macroeconomic indicators (Mügge, 2019). The majority of evaluations are Impact Evaluations (White & Raitzer, 2017). Due to their overwhelmingly quantitative nature, they run the risk of portraying false objectivity. Evaluators’ training in particular schools of thought may influence their methodological approaches, as Ravallion shows in his fictional article about evaluators (Ravallion, 2001). Further, Notten (2016) finds, based on a sample of European countries, that the choice of poverty indicator influences the effect sizes (‘impacts’) of programmes. Besides various discursive trends surrounding ‘what works’, her study shows that choices by evaluators do influence the reported outcomes (Notten, 2016). It is therefore important to understand which schools of thought evaluators belong to, in order to understand the choices they make when evaluating.

Furthermore, evaluations generally serve to improve accountability and learning by development actors and institutions themselves (Clements, Chianca & Sasaki, 2008). Since the early 2000s, evaluations have been predominantly quantitative, a trend which, in The Netherlands, is tied to the shift to the Aid and Trade agenda (Min. van Buitenlandse Zaken, 2013). Quantitative evaluations tend to focus on accountability between donors, implementing organisations and beneficiaries, and pay insufficient attention to the learning purpose that evaluations generally also intend to serve. Rarely is the eventual uptake of lessons drawn from evaluations analysed. Further, most studies in the policy and learning realm are survey-based and focus on cases where learning did happen, which means that current scholarship lacks detailed description of learning processes between individuals and skews our perception of policy adjustment (Moyson et al., 2017). Hence, this study addresses those gaps and concerns by focusing on learning (rather than accountability), using a mix of qualitative methods and analysing a case study evaluation of which the outcomes were unknown.

Not only are the abovementioned issues academic puzzles, they also, more importantly, bear societal significance for two key reasons. First and foremost is the humanitarian (or even ethical) concern: researching if and where development efforts can be improved, either quantitatively (e.g. reaching more people) or qualitatively (e.g. improving the types of development endeavours), is a worthy undertaking. Secondly, the Dutch Ministry of Foreign Affairs is a primary donor of Official Development Assistance, constituting approximately 0.7% of the GNI of the Netherlands (OECD, 2016). Gaining insight into how these funds are spent, and whether they can be allocated better, constitutes an economic case for improved learning.

In short, this thesis will use a mixed-method approach to study the two abovementioned research problems: the underlying assumptions and backgrounds of evaluators and the learning capacity of their evaluations.

1.2 Research gaps and contributions

This study will address the above issues by exploring the evaluation process of the Policy and Operations Evaluation Department, ‘Internationaal Onderzoek en Beleidsevaluatie’ (hereafter: IOB) of the Dutch Ministry of Foreign Affairs. The thesis contributes to academic as well as professional debates in international development studies, by addressing the following knowledge gaps:

§ Deciphering the backgrounds and underlying assumptions of actors involved in evaluations.

§ Refocusing attention from accountability to learning by analysing the follow-up of an evaluation.

To this end, the study will attempt to answer the following research question: “How does IOB evaluate Dutch Development cooperation policy and to what, if any, adjustments does this lead?”

1.3 Research questions

In order to answer the above-mentioned research question, I will answer the following sub-questions:

Q1. How are IOB’s evaluations of Dutch foreign policies and programmes designed?

Q2. Which actors are involved in the design of these evaluations and what are their assumptions?

Q3. What lessons are drawn from the case study evaluation and what, if any, adjustments follow?

In order to describe how IOB evaluates and to what potential changes they may lead, the study’s sub questions follow a typical chronological trajectory of evaluation. Firstly, evaluations will be ‘deciphered’, by tracing their construction. The goal of this first section is to level the playing field of IOB evaluation. IOB’s scope of evaluation covers the entirety of Dutch foreign policy, which is why sub question 1 considers a variety of example evaluations, which is broader than Development Cooperation alone. Secondly, the thesis moves to more analytical questions of which actors are involved and what their main background and assumptions are. Though this study cannot infer causality, it is explored how and through which mechanisms, according to evaluators and policymakers themselves, their backgrounds and points of view potentially shape evaluations. Third and finally, the thesis uses a case study evaluation to explore the final stages of an evaluation. By following one evaluation up close, and focusing on the response and actions taken by the Ministry of Foreign Affairs, rather than the concrete results of the evaluation, this study sheds light on the learning capacity of an evaluation, rather than its often-served purpose of accountability.

In short, the research questions are chronologically ordered to mimic the evaluation trajectory, moving from a descriptive question on the evaluation process, to actors and their views and finally the follow-up of an evaluation in the Ministry.

1.4 Structure of the thesis

The remainder of the thesis will be structured as follows: Chapter 2 sketches the academic landscape of approaches to development cooperation, as well as common poverty reduction strategies and their effectiveness. Moving from theory to practice, Chapter 3 describes the recent and ongoing rise of M&E in Development Cooperation. Chapter 4 outlines the research design of the study, including its methodology, epistemological background and operationalisation, followed by a description of the study’s setting in Chapter 5. The following chapters present the results of the thesis: Chapter 6 outlines the IOB evaluation trajectory, Chapter 7 reveals the evaluators’ backgrounds and assumptions, and Chapter 8 describes the lessons learned and follow-up of an exemplary evaluation. Finally, Chapter 9 contains the conclusion, consisting of: i) the answer to the main question, ii) findings in relation to theory and contributions to academic debate, iii) methodological reflections and limitations and finally, iv) advice for M&E professionals.


2. A review of approaches to poverty and Development Cooperation

2.1 Introduction

The purpose of Chapters 2 and 3 is to set the scholarly scene of this study by critically reviewing existing studies. Chapter 2 specifically, first discusses academic literature following approaches to poverty and international development throughout history (2.2). Secondly, it compares and contrasts a variety of current archetypical development cooperation strategies (2.3). Third and finally, it reviews state-of-the-art research into effectiveness, i.e. ‘what works’ and ‘what does not work’ (2.4). Section 2.5 then concludes by summarising the chapter and highlighting the research gaps this study aims to fill. The chapter bridges to Chapter 3, which elaborates on evaluation, learning and policy change.

2.2 Theoretical approaches to poverty and development

Poverty throughout history: from economic growth (1940s) to the capability approach (1990s)

This section starts from the work of Polanyi (in 1944), as this period marks the end of WW2 and the establishment of key multilateral organisations like the United Nations, IMF and the World Bank. In his magnum opus “The Great Transformation”, Polanyi argues that the economy is historically and socially embedded (Polanyi, 1944). In doing so, he is one of the first opponents of the neoliberal economics of Hayek, who perceives economic growth as key to human progress (Hayek, 1946).

During the 1950s and 1960s modernisation theory (e.g. Rostow’s “Stages of Growth”) reigns: the prevailing discourse becomes that countries would transform progressively through linear stages, from a traditional to a modern society (Rostow, 1959). Poverty is treated as a residual problem, typically implying people failing in the market economy. Thereby, the onus is put on the poor themselves. This notion is illustrated by the UN’s first Development Decade in 1961, which aimed at 5% annual economic growth in developing countries, instead of addressing poverty reduction explicitly (Hulme, 2014). Besides economic growth, the 1960s also marked the Green Revolution: technology transfer initiatives to developing countries aimed at expanding food production to reduce malnutrition. Though these were somewhat successful in a number of Latin American and Asian countries, they failed to prevent the occurrence of famines in the African continent caused by entitlement failures (Sen, 1982; Toennissen, Adesina & de Vries, 2008). Although life expectancy and literacy improved, mass poverty remains the norm in many Asian and African countries in the 1960s. A new conceptualisation, introduced by Latin American economists such as Prebisch and Singer, called ‘dependency theory’, suggests that many ‘peripheral’ countries are poor because of their relations (e.g. trade barriers) with ‘core’ countries like the U.S. (Prebisch, 1962; Singer, 1950). In short, they define poverty as a structural process, resulting from geopolitical power differences embedded in international relations (Hulme, 2014).

In the 1970s, international agencies begin to focus on poverty directly, e.g. when the International Labour Organisation introduces a ‘basic needs’ approach, which prioritises meeting the basic needs of all people. Those needs are variously defined, but they commonly entail universal provision of nutrition, health and education services (Stewart, 1985). The 1970s also marks the introduction of the term Development Cooperation, in an effort to increase ownership of developing countries. However, this broadening proves short-term: in the 1980s, the concept of poverty is side-lined to prioritise economic growth, in conjunction with the infamous Structural Adjustment Programmes of the World Bank. By the end of the 1980s, UNICEF becomes concerned with setbacks in public health and education arising in developing countries, especially among women and children. In their 1988 publication, “Adjustment with a Human Face”, they encourage the World Bank to refocus on poverty in their adjustment policies (Jolly, 1991).

In the 1990s, alternative ways of defining poverty move away from growth to theories of human development and gender equality (Nussbaum, 2001). A notable work of this time is the capability approach by Sen, who states: ‘poverty is not just a lack of money, it is not having the capability to realise one’s full potential as a human being’ (Sen, 1999: p. 11). The work of Sen and Nussbaum, among others, has inspired measures of poverty such as the Human Development Index, which will be discussed in section 3.1.

Current theories: from multidimensional poverty to inclusive development, 1990s-present

Over the past decades, the broadening of the definition of poverty has stuck in international development scholarship (Kanbur & Squire, 1999; Spicker, 2007). This shows in current definitions of poverty by the UN, which uses a capability approach: “…poverty can also mean the denial of opportunities and choices most basic to human development. To lead a long, healthy, creative life. To have a decent standard of living. To enjoy dignity, self-esteem, the respect of others and the things that people value in life. Human poverty thus looks at more than a lack of income.” (UNDP, 1998: p. 25). Building on Sen’s capability approach, Alkire conceptualises poverty as multidimensional, consisting of a variety of aspects ranging from access to medical care to school attendance and access to sanitation (Alkire et al., 2015). She criticises unidimensional indicators, which measure solely income, for neglecting the multifaceted nature of poverty. Her multidimensional poverty index (MPI), which measures three dimensions of poverty (health, education and living standard) with a total of 10 indicators, is based on the capability approach, and will be further discussed in section 3.2 (Alkire et al., 2015).
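To make the structure of such an index concrete, the following is a minimal sketch of how a multidimensional poverty score could be computed. It is an illustration only, not the procedure used in this thesis. It assumes the weighting commonly used for the global MPI, which is not stated in the text above: each of the three dimensions counts for one third, split equally over its indicators, and a household counts as poor when its weighted deprivation score reaches one third. The indicator names and the four example households are hypothetical, and households are not weighted by size, which a real application would do.

```python
from fractions import Fraction

# Assumed weights: three dimensions of 1/3 each, split equally over indicators.
# Indicator names are illustrative labels, not taken from the thesis.
SIXTH, EIGHTEENTH = Fraction(1, 6), Fraction(1, 18)
WEIGHTS = {
    # Health (2 indicators)
    "nutrition": SIXTH, "child_mortality": SIXTH,
    # Education (2 indicators)
    "years_of_schooling": SIXTH, "school_attendance": SIXTH,
    # Living standard (6 indicators)
    "cooking_fuel": EIGHTEENTH, "sanitation": EIGHTEENTH,
    "drinking_water": EIGHTEENTH, "electricity": EIGHTEENTH,
    "housing": EIGHTEENTH, "assets": EIGHTEENTH,
}
POVERTY_CUTOFF = Fraction(1, 3)  # assumed cutoff for counting as poor


def deprivation_score(deprived):
    """Sum the weights of the indicators in which a household is deprived."""
    return sum((WEIGHTS[name] for name in deprived), Fraction(0))


def mpi(households):
    """Return (headcount ratio H, average intensity A, index H * A)."""
    scores = [deprivation_score(h) for h in households]
    poor = [s for s in scores if s >= POVERTY_CUTOFF]
    if not poor:
        return Fraction(0), Fraction(0), Fraction(0)
    headcount = Fraction(len(poor), len(scores))
    intensity = sum(poor, Fraction(0)) / len(poor)
    return headcount, intensity, headcount * intensity


# Hypothetical households, each described by the indicators it is deprived in.
sample = [
    {"nutrition", "years_of_schooling"},             # score 1/3: poor
    {"sanitation", "electricity"},                   # score 1/9: not poor
    set(),                                           # score 0: not poor
    {"nutrition", "child_mortality", "school_attendance",
     "cooking_fuel", "sanitation", "assets"},        # score 2/3: poor
]
headcount, intensity, index = mpi(sample)
print(headcount, intensity, index)  # 1/2 1/2 1/4
```

In this sketch the second household is deprived in two indicators yet is not counted as poor, which shows how the index distinguishes between being deprived in some respect and being multidimensionally poor.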

Following increased multidimensionality in conceptual discussions of poverty, which helped to observe the heterogeneity below the poverty line, the subjective experience of poverty started to gain attention (Krishna, 2007). Previous poverty research was predominantly objective, measured and conceptualised by an outsider (e.g. a researcher) based on observable measures, instead of by the poor person him or herself (Hulme, 2014). In an effort to capture the dynamics of poverty from the viewpoint of the poor themselves, and to move beyond income, the World Bank starts collecting their experiences in country studies in the 1990s and 2000s. This effort resulted in a three-volume study called ‘Voices of the Poor’, presenting Participatory Poverty Assessments carried out in various countries (Narayan, Chambers, Shah & Petesch, 1999). Shifting the focus to the poor themselves, Krishna studies subjective poverty, i.e. poverty as described and viewed by those experiencing it (Krishna, 2007). Subjective definitions and measuring tools of poverty are developed, such as the Ladder of Life, a focus group method which explores participants’ understanding of different groups and their wellbeing in a community (Narayan & Petesch, 2005; Petesch, 2018).

Furthermore, the 2000s mark the adoption of the Millennium Development Goals (MDGs), submerging poverty into the broad goal of sustainable development, focused on the three pillars of economic, environmental and social development. After financial and environmental crises in the 2010s, the UN adopts the Sustainable Development Goals (SDGs), addressing global inequalities and inclusiveness. In response to recurring trade-offs in favour of economic growth over social wellbeing and the environment, scholars introduce the concept of ‘inclusive development’. The latter is defined by Gupta, Pouw and Ros-Tonen as ‘development that includes marginalised people, sectors and countries in social, political and economic processes for increased human well-being, social and environmental sustainability, and empowerment.’ (Gupta, Pouw & Ros-Tonen, 2015: p. 546).

Sub-conclusion

In short, the historic trajectory of poverty definitions and measurements has shifted alongside debates of what is considered ‘development’, varying from economic growth approaches to a broadened focus on human wellbeing. This chapter has briefly outlined conceptual discussions, such as unidimensional (e.g. income-focused) and multidimensional poverty definitions (including non-monetary metrics like access to education), famously inspired by the capability approach of Sen in 1999 and Nussbaum in 2001. Another dichotomy refers to who is defining poverty: whether it be an outsider (objective) or the poor themselves (subjective) (Hulme, 2014). Recent years have seen the submerging of poverty into concepts of sustainable development, consisting of an economic, social and environmental pillar, and inclusive development, which explicitly addresses the marginalised. Nonetheless, as will be discussed in Chapter 3 on poverty measurement, income-focused definitions of poverty currently prevail, and though multidimensional conceptualisations have become popularised, they are rarely measured in practice.

2.3 Practical approaches to Development Cooperation

Before discussing effectiveness research of interventions, it is important to discuss what these entail, and who executes them. Hence, this section sheds light on development policies and programmes, i.e. the objects of evaluations. Development cooperation approaches have shifted alongside theories of poverty as outlined in the above section (2.2). Moving from theory to practice, this section roughly describes the current Development Cooperation arena. Firstly, the section touches upon missions and visions of development cooperation, followed by an overview of various channels of development cooperation, thereby mapping the involved stakeholders. It must be noted that it focuses on donating actors, thereby not describing developing countries’ governments or beneficiaries. It goes without saying that they are also stakeholders, but the focus of this thesis is on programmes and evaluations carried out by a donor country. Secondly, since this study centres around government, this section ends by discussing government strategies in particular.

Rationales of Development Cooperation

Development Cooperation practices have typically been guided by a variety of visions, congruent with changing discourses in academia and across international institutions like the UN. This paragraph touches upon exemplary visions employed by development cooperation actors to guide their initiatives, e.g. human rights based, target-based, bottom-up and capacity-building approaches.

The human rights-based approach is built on international human rights law (Hamm, 2001). The rationale behind this approach is that human rights principles should guide development programming, which translates into measures of non-discrimination, empowerment and accountability. Further, the development process is highlighted, emphasising that the ways in which results are achieved must be in accordance with human rights principles (Sarelin, 2007). Finally, the human rights approach aims to emancipate the vulnerable: “…by acknowledging that the poor have human rights, beggars are transformed into claimants.” (Sarelin, 2007: p. 476).

Whereas the human rights-based approach includes not only outcomes but also the development process, the target-based approach focuses primarily on the former. Since the 2010s, the development agenda has seen an increasing interest in objective and quantifiable goals (Langford & Winkler, 2014). One motivation for target-based approaches is that setting clear goals stimulates action. This idea has, at times, proved successful (e.g., WASH targets set in 1976 appear to have accelerated access to sanitation), though only when targets are embedded in a political process and are part of a coordinated effort (Langford & Winkler, 2014, p. 247). However, target-based approaches also run the risk of excluding vital elements like accessibility, equality and sustainability, because such concepts are hard to capture (Langford & Winkler, 2014).

Whilst target-based approaches tend to overlook elements of access and equality, bottom-up approaches in fact focus on such dynamics by emphasizing local conditions and values in development processes (Gore, 2013). Following notable mismatches between global goals and local needs, supporters of bottom-up or grassroots approaches argue that people should be at the centre of development initiatives, particularly the marginalised or excluded (Fors & Moreno, 2002). Bottom-up approaches consist of a variety of visions, including basic needs (fulfilment of health, education and participation goals), empowerment (enabling marginalized citizens to (re)gain power over their lives, e.g., through political participation) and rural-based development (supporting marginalised rural regions vis-à-vis urban regions) (Fors & Moreno, 2002: p. 200-202).

Finally, another trend is the capacity-building approach. Capacity-building, when understood as strengthening institutional development, is a worthwhile goal, though the term has become somewhat of a buzzword (in some cases meaning little more than ‘training’) (Eade, 2007). The original rationale of capacity-building traces back to the belief that the role of the outsider, or donor, is to support the capacity of beneficiaries to determine, organise and sustain their own priorities and ideals. These may include (a combination of) intellectual, financial, cultural or political capacities (Eade, 2007). While target-based approaches focus on output, like pure delivery of goods, capacity-building approaches are more in line with bottom-up approaches, as both focus on autonomy of beneficiaries (Ika & Donnelly, 2017).

Channels and stakeholders in Development Cooperation

When reviewing approaches to development cooperation, it is important to highlight that development cooperation, in the form of Official Development Assistance (ODA), constitutes a minority (approximately 15%) of financial flows to developing countries (see Figure 1). This thesis considers only ODA, which is transferred to developing countries via three channels: multilaterally (i.e. through organisations like the UN), through NGOs, or bilaterally (from one government to another).

Figure 1

Financial flows to developing countries: ODA, remittances and other flows (e.g. Foreign Direct Investment, private grants), in USD million, 2016

Note. Reprinted from ‘Big picture of total resource receipts, 2002-2017’ by OECD, 2019, oecd.org/dac/stats/beyond-oda.htm.

Multilateral channel

Major multilateral institutions involved in poverty reduction include the World Bank, the UN (delivering humanitarian aid and promoting sustainable development, mostly through the UN Development Programme (UNDP)) and the Development Assistance Committee (DAC) of the Organisation for Economic Co-operation and Development (OECD). The principal aim of the World Bank was originally to provide temporary loans to developing countries, often in conjunction with policy reforms. Currently, its explicit goal is to fight poverty (Clemens & Kremer, 2016). The DAC consists of 34 of the largest donor countries and considers itself a forum for these donors concerning aid and poverty reduction in developing countries.

NGO channel

Unsurprisingly, NGOs are a heterogeneous category. They can be divided into grassroots organisations located in recipient countries, and international organisations or funding agencies (Padrón, 1987). Whilst the number of international NGOs involved in Development Cooperation is vast, well-known organisations include Oxfam, the Overseas Development Institute, Innovations for Poverty Action (IPA) and the cash transfer organisation GiveDirectly.

Bilateral channel

Bilateral aid programmes were founded in the 1960s, often to support productive sectors and infrastructure in recipient countries (Hjertholm & White, 2000). The share of bilateral aid has decreased over the decades. Nonetheless, it has persisted, despite the prevalence of arguments in favour of multilateral institutions (which tend to target more effectively) (Hjertholm & White, 2000).

Current strategies pursued by donor governments

The majority of ODA comes from government sources (regardless of which channel is used), whilst the remainder is funded by private charities and NGOs (OECD, 2016). This makes government-funded development practice an important concern. At the country level specifically, the largest donors relative to the size of their economies (as a percentage of GNI) include Sweden, Norway, Luxembourg, Denmark and the Netherlands. In terms of total foreign aid budgets, the largest donors are the US, the UK, Germany, Japan and France (OECD, 2016). This section deals with programmes that directly address poverty reduction, thereby excluding initiatives a government may take that result in poverty reduction but have other principal aims (e.g. progressive taxation may indirectly reduce poverty but is principally aimed at redistribution).

Increasingly, and in conjunction with aid-sceptical works published by authors like Moyo, ‘Trade not Aid’ thinking has swept across governments, directing attention to trade liberalisation as benefiting developing countries more than ODA does (de Lombaerde & Puri, 2009; Moyo, 2009). An example of a practical initiative that arose from such discourses is the Aid for Trade (AfT) initiative, which aims to assist developing countries in participating in international trade, for instance through financing trade-related infrastructure and stimulating productive capacity. This initiative currently makes up roughly a third of ODA financial flows (Jakupec, 2016). Furthermore, AfT discourse is reflected in organisational changes, like the merging of international development departments with foreign trade departments in governments, as is the case in the Netherlands.

Roughly speaking, a government has two approaches to combat poverty: one is to raise the incomes of the poor, another is to reduce the negative effects of having a low income on education, health, housing and safety. Essentially, the first approach includes cash transfer programmes which directly reduce income poverty. The second concerns social policy, for instance by improving education systems and offering universal healthcare services. When successful, this approach manages to reduce structural barriers faced by the poor (Reeves, 2015). Traditionally, poverty reduction programmes have focused on the second method, for instance by delivering products or services, building infrastructure and providing training to developing countries.

Though the responsibility for poverty reduction has traditionally been with governments and NGOs, the private sector has become increasingly involved with poverty reduction too, e.g. by creating employment opportunities for the poor. This is linked with the Aid versus Trade debates described above. An issue that arises, however, is that in many developing countries the conditions for businesses to enter markets are riskier, which means western companies are less likely to settle there (Callander, 2017). Further, corporate social responsibility (CSR) strategies have been on the rise, through which companies try to divert some of their attention and profit to social and environmental initiatives (Karnani, 2017). CSR strategies have, however, also been accused of ‘greenwashing’: a discrepancy between the symbolic and the substantive actions businesses take to reduce their environmental impact (Walker & Wan, 2012).

Sub-conclusion

In sum, development cooperation strategies are often based on a vision, examples of which are capacity-building and a human rights-based approach. Such rationales vary in their degree of local ownership, as well as in whether they emphasise the results or the process. Further, three predominant channels are used for development cooperation: bilateral, multilateral and through NGOs. Most finance derives from governments, regardless of the channel used, which makes government strategies a relevant focus of this thesis. Governments employ a variety of strategies, which roughly speaking pertain either to improving the structural circumstances of the poor or to directly increasing their income through cash transfers. The latter method has been gaining ground in recent years, owing to a growing body of research revealing its effectiveness, to be explored in the next section.

2.4 Effectiveness of Development Cooperation practice

This section delves into historic and recent effectiveness research, as well as debates surrounding international aid. Further, it discusses methodological issues that arise in effectiveness research of development strategies, most notably its external validity. The further details of evaluation research literature will be discussed in Chapter 3.

Historical overview

The effectiveness of aid has received much academic attention and the popularity of certain poverty strategies has waxed and waned in the past decades. In a longitudinal study over the 1970s, 1980s and 1990s, Feeny and White (2003) note two trends. Firstly, the share of aid to Sub-Saharan Africa had fallen during the 1990s, in favour of a more selective approach of aid delivery, a notion called ‘aid selectivity’. Secondly, reductions in aid amounts had been accompanied by improved quality of aid, such as improved financial terms, exemplified by debt relief (Feeny & White, 2003). By the end of the 1990s, the good governance debate arose, following the Worldwide Governance Indicators project, which the World Bank launched in 1996. It aimed to capture so-called good governance indicators across 200 countries, measuring concepts like ‘voice and accountability’, ‘political stability’ and ‘control of corruption’, to name a few (Kaufmann, Kraay & Mastruzzi, 2010). It was alongside this effort that the World Bank published ‘Assessing Aid’, arguing that aid has positive effects when good policies are in place (World Bank, 1998). However, Lensink and White criticise this report, scrutinising the normative nature of what ‘good’ policies are (Lensink & White, 2001). They find that many policies might be ‘good’, especially when the goal of aid shifts from economic growth to poverty reduction (Lensink & White, 2001).

The following decade was marked by a heated debate between proponents and opponents of aid, with notable publications on either side. Easterly (2007) and Moyo (2010) pointed to the dependence of developing countries’ governments on aid, which discourages them from constructing effective institutions. Sachs, on the other hand, famously advocated in favour of international aid as the solution to global poverty (Sachs, 2005). In any case, attention was increasingly paid to the quality rather than the mere quantity of aid, in which Monitoring and Evaluation (M&E) started playing an important role.

Roche (1999) describes this for M&E as follows: following debates on aid effectiveness, governments put pressure on NGOs to demonstrate results, in a context of few feedback mechanisms. Impacts of poverty reduction strategies are distant, both in time (effects tend to show years after a programme) and in space (often in locations far away from donor countries) (Clements, Chianca & Sasaki, 2008). Nonetheless, support for poverty reduction depends, at least in part, on perceived effectiveness, and admitting that effectiveness is unpredictable and difficult to assess makes development projects especially vulnerable to scrutiny (Roche, 1999). In the long run, Roche claims, the case for poverty reduction programmes can only be sustained through careful evaluation of their impact, acknowledging mistakes and the inherent uncertainties of the sector (Roche, 1999). This need for impact measurement as described by Roche shows the accountability function that evaluation serves. The other function of evaluation, learning, has generally been watered down, or used interchangeably with accountability (Kogen, 2018). Increasingly, there is a need to balance accountability-serving, target-based evaluation with more emergent and complexity-based approaches suitable for learning purposes, which will be the point of entry for this study (Lennie & Tacchi, 2014).

Current effectiveness research: What works, and what does not?

As the above paragraph illustrates, the question of ‘what works?’ has been the centre of lively academic debate in development economics. Meta-analysis finds a positive average effect of aid on economic growth, but one that is small, statistically insignificant and decreasing over time, which implies the positive effects of aid have a limited duration (Doucouliagos & Paldam, 2008). However, economic growth does not necessarily equate to development: the latter encompasses individual wellbeing, which requires material possessions, the ability to use them and the satisfaction this brings, while the former primarily gives an idea of the material economy as a whole (McGregor & Pouw, 2016). Moving from macroeconomic research to evaluations of single interventions, Vivalt (2019) analyses and compares the results of a number of impact evaluations. Though effectiveness has been at the centre of academic debate, it must be noted that in reality very few robust, replicable studies exist (which is unsurprising, given limited research budgets and unlimited locations, timespans and interventions to evaluate). For an overview of the interventions currently evaluated and the corresponding number of reports, see Figure 2.

Figure 2

Distribution of strict outcomes analysed by Vivalt (2019)

Note. Reprinted from “How much can we generalize from impact evaluations?” by Vivalt, 2019, evavivalt.com/wp-content/uploads/How-Much-Can-We-Generalize.pdf

Zooming in on a particular strategy, the Cash Transfer (CT), García and Saavedra (2017) find that the use of CTs (they focus on conditional ones) is expanding: more than 50 countries employ such programmes, twice as many as in 2008. A meta-analysis of 35 evaluations that measure CT impact on educational outcomes finds that both conditional cash transfers (CCTs) and unconditional cash transfers (UCTs) improve the odds of being enrolled compared to no CT programme (Baird, Ferreira, Özler & Woolcock, 2014). Interestingly, the effect sizes for attendance are larger for CCTs than for UCTs, but this difference is not statistically significant (Baird et al., 2014).

The question of ‘what works?’ necessarily raises the, albeit painful, question of ‘what does not work?’. It is difficult to find academic literature surrounding failures in development cooperation, which is likely linked to positive publication bias. This bias implies overreporting of positive, or significant, results in academic journals and underreporting of negative or null results (Bamberger, 2009). This is especially the case for NGO-commissioned research, whose evaluation results are often tied to future development funding. Hence, researchers are incentivised to paint rosy pictures (Vivalt, 2019; Camfield, Duvendack & Palmer-Jones, 2014).

Nonetheless, interventions may fail for various reasons, pertaining to poor implementation, false assumptions underpinning programmes or a mismatch with local conditions. A well-known example is the buy-one-give-one campaign of TOMS shoes, which delivers a pair of shoes to poor communities for every pair bought in the developed world. It was found that the arrival of TOMS shoes drove local shoemakers out of business. The overall impact of the shoe donation programme (on outcomes like foot health, self-esteem and school attendance), measured in a randomised trial, was found to be negligible (Wydick, Katz, Calvo, Gutierrez & Janet, 2016). Currently, and in response to severe criticisms, the company works together with local businesses for shoe production. This is an example of the increasing role of private sector involvement in development cooperation, as described in earlier sections. However, government-led programmes have not been exempt from failure. An example is the PlayPump, a merry-go-round water pump heavily funded by USAID, the US Agency for International Development, which turned out to break quickly and to require child labour to function (UNICEF, 2007).

In short, the questions of what works and what does not are heavily contested and lie at the heart of the development cooperation realm. Though effectiveness research is increasing, as shown by the steadily rising number of impact evaluations, it appears there is little basis for hard claims of effectiveness. And if positive results are found, the question remains whether an intervention will work similarly in other contexts.

External validity of effectiveness research: Where does it work?

Alongside the growing body of literature on effectiveness, the issue of the external validity of such studies arises. External validity refers to the generalisability of findings, i.e. the extent to which findings of one evaluation translate to other contexts (Pritchett & Sandefur, 2013). Research into limited external validity has revealed the risk of site selection bias, where an intervention’s result is caused by the characteristics of a particular environment and therefore does not carry over to other settings (Allcott, 2015). In a recent publication, Vivalt utilises a dataset of 635 impact evaluations in the development realm to analyse by how much results actually vary, and whether certain characteristics predict external validity (Vivalt, 2019). She finds a large degree of heterogeneity (i.e. varying effect sizes in varying contexts), some of which can be explained by study characteristics. Notably, smaller studies tend to report larger impacts, as do studies by NGOs or academics as opposed to government-issued research (Vivalt, 2019).

Sub-conclusion

In short, the effectiveness of international development interventions has been a major source of debate, and recent years have seen a rise of monitoring and evaluation (Roche, 1999). However, impact studies are affected by problems of external validity and publication bias (Vivalt, 2019). Further, evaluations are often focused on their accountability purpose, especially in response to debates surrounding development impacts. This means their second goal, to enhance learning, is often forgone. Few academic studies focus on how impact evaluations may stimulate learning, a gap which this study fills.

2.5 Conclusion

Chapter 2 described both theoretical conceptualisations and practical approaches to development cooperation. The pathway of poverty definitions and measurements has shifted from economic growth approaches to theories of human wellbeing, like Sen’s capability approach. Recently, concepts like sustainable development and inclusive development, emphasising the marginalised, have emerged. Parallel to academia, practical approaches have evolved, which shows in varying rationales behind development programmes. Examples are human rights-based and bottom-up approaches. Most finance is provided by governments, which makes their policies a relevant focus of this study. Simultaneously, poverty reduction initiatives have become increasingly tied to international trade.

Further, measurement has become important as a source of accountability towards donor governments. Principally, evaluation aims to improve accountability and learning. However, in many evaluations, questions focused on learning, which are emergent and complex, are attenuated by accountability-serving questions. This study counters that tendency by forgoing accountability and focusing on learning within a Ministry of Foreign Affairs. Furthermore, it is set in a government, because the majority of development cooperation spending comes from government sources, meaning their imperative to learn is crucial. Finally, this study approaches learning from the perspective of policymakers and evaluators. The former because they are at the forefront of learning within a government, the latter because their backgrounds and assumptions are expected to influence the evaluations they carry out.

3. Evaluating, learning and policymaking

3.1 Introduction

The purpose of this chapter is to move from the development sector itself to its evaluative endeavours. As such, it discusses academic literature surrounding learning of governments and policymakers, and the ways in which evaluations are used to facilitate the learning. Furthermore, following discussions of effectiveness in the previous chapter, it is imperative to illustrate how evaluative research is carried out. Moreover, this chapter illustrates how evaluators’ choices may affect evaluations. In doing so, it highlights a critical knowledge gap, since little empirical research about evaluators and their backgrounds exists.

This chapter first discusses how development cooperation efforts may be assessed by discussing different schools of thought in assessment (3.2). Secondly, it offers a critical review of common evaluation techniques (3.3). Thirdly, it discusses scholarly literature on evaluation use and learning in a policy-setting (3.4). Finally, the chapter summarises the research gaps and the position of this study within the overall body of literature (3.5).

3.2 Schools of thought: Quantitative, qualitative and mixed analysis

Alongside theoretical discussions around poverty and development, operationalisations of poverty vary too. Operationalisation refers to the methodology used to measure poverty, which is often tied to a school of thought. Evaluators typically have either quantitative, qualitative or mixed backgrounds, the underlying assumptions of which influence methodological choices. Hence, section 3.2 reviews the pros and cons of quantitative, qualitative and mixed analyses, and shows how evaluators, depending on their respective school of thought, may make divergent choices in their evaluations.

Quantitative analysis

Quantitative assessments commonly establish criteria prior to measurement, which may be unidimensional or multidimensional. In doing so, they result in objective measures of poverty, as opposed to subjective ones. They originate in positivist, economic schools of poverty analysis (McGee & Brock, 2001). For example, one of the first quantitative poverty measures is a family of metrics called the Foster-Greer-Thorbecke (FGT) indices (Foster, Greer & Thorbecke, 1984).

A frequently used index of this group is FGT2, which weights the poverty of the poorest individuals more heavily, and in doing so is a combined measure of poverty and income inequality among the poor. The World Bank poverty line is another example of a quantitative unidimensional measure, and is, as of 2015, $1.90 a day (2011 Purchasing Power Parity (PPP)) (World Bank, 2015).
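To make the FGT family concrete, the following minimal sketch computes the index of order alpha as the population average of each poor person's normalised poverty gap raised to the power alpha. The function name, the poverty line of 2.0 and the income values are illustrative assumptions and do not come from the studies cited above.

```python
from typing import Sequence


def fgt_index(incomes: Sequence[float], poverty_line: float, alpha: int) -> float:
    """Foster-Greer-Thorbecke index of order alpha (hypothetical helper).

    alpha = 0 gives the headcount ratio, alpha = 1 the average poverty gap,
    and alpha = 2 the squared poverty gap, which weights the poorest more.
    """
    if poverty_line <= 0:
        raise ValueError("poverty_line must be positive")
    if not incomes:
        raise ValueError("incomes must not be empty")
    total = 0.0
    for income in incomes:
        if income < poverty_line:  # only people below the line contribute
            gap = (poverty_line - income) / poverty_line  # normalised gap in (0, 1]
            total += gap ** alpha
    return total / len(incomes)  # average over the whole population


# Illustrative incomes per day; values are assumptions, not survey data.
incomes = [1.0, 1.5, 2.0, 3.0]
line = 2.0
print(fgt_index(incomes, line, 0))  # 0.5       (half the population is poor)
print(fgt_index(incomes, line, 1))  # 0.1875    (average normalised gap)
print(fgt_index(incomes, line, 2))  # 0.078125  (squared gap)
```

Because the gap is squared when alpha equals 2, the person living on 1.0 contributes four times as much to the index as the person living on 1.5, which is how FGT2 becomes sensitive to inequality among the poor.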

Recently, the multidimensionality and political embeddedness of poverty have received more attention. Features of Sen’s “Development as Freedom” and Nussbaum’s “Women and human development: The capabilities approach” have been translated into alternative quantitative and multidimensional measures of poverty (Nussbaum, 2001; Sen, 1999). Examples are the Human Development Index (HDI) and the Multidimensional Poverty Index (MPI) (Alkire & Santos, 2011; UNDP, 1990). The former is a composite index of life expectancy, education and per capita income indicators. The MPI comprises ten indicators across three dimensions: education, health and living standards. A person is considered multidimensionally poor when deprived in at least one third of the weighted indicators (Alkire & Santos, 2011).

An advantage of quantitative assessments is their capacity for comparison. Since they are often based on large-scale data collection, using surveys measuring pre-established indicators, their data can be used for intra- and inter-country comparison. Further, they give a snapshot of poverty within a given context, which may serve targeting of interventions. Spatial analyses, which are based on quantitative data, can be used to map disadvantaged areas in need of resource allocation. However, quantitative assessment bears a number of critical disadvantages too. For instance, quantitative research methods like household surveys rarely reach the extremely poor, who are often hidden or immobile (Altaf, 2019). Furthermore, they tend to reduce poverty to measurable indicators like income, thus overlooking structural elements of poverty, which potentially depoliticises poverty. Including only measurable indicators also neglects the subjective experience of poverty, thereby assuming homogeneity across the poor. Further, methodological issues like changing price levels across the world can affect poverty statistics (Jolliffe & Prydz, 2015). Finally, a disadvantage lies with poor explanation: Since underlying assumptions of figures are usually not captured numerically, these may remain hidden.

Qualitative analysis

Qualitative assessments, on the other hand, draw on constructivist disciplines, most notably anthropology. They tend to be participatory in nature, which implies that criteria are established during analysis and in conjunction with the involved person. Typical qualitative methods include focus group discussions and interviews. Although quantitative methods remain dominant in poverty assessment, the use of qualitative approaches has become more common, with three notable examples. Firstly, the Participatory Poverty Assessment includes the perspectives of the poor in the analysis and design of anti-poverty strategies, e.g., through stakeholder analyses (McGee & Brock, 2001). An example is the Voices of the Poor project by the World Bank, which was discussed in Chapter 2 (Narayan et al., 1999). A second illustration is the PADev method, which evaluates all changes in a region during a specific period, and then analyses which development policy or programme was responsible for which change through the eyes of the intended beneficiaries (Pouw et al., 2017; Rijneveld, Belemvire, Zaal & Dietz, 2015). Third and finally, Altaf uses a multitude of qualitative methods (life histories, interviews and aspects of the PADev method) to research experiences of the extreme poor (Altaf, 2019).

A strength of such assessments is their emphasis on the perceptions of the local population, which results in better explanations of why people may experience poverty. Furthermore, qualitative assessments are more apt to explain the structures and dynamics of poverty. For instance, qualitative research on poverty traps in South Africa has provided important insights into the structural mechanisms and social relations surrounding poverty (Adato, Carter & May, 2006). A downside is that qualitative methods tend to be based on selective samples, which are often highly context-specific. Therefore, it is difficult to compare the results of such studies across countries and communities, to establish causality, or to generalise to larger populations (Carvalho & White, 1997). However, 3ie (an impact evaluation research centre) has devoted attention to small-n studies, seeking ways in which such studies could better address causality (White & Phillips, 2012).

Mixed analysis

Recently, the popularity of mixed method assessments has risen, combining the strengths of quantitative and qualitative assessment (Fahmy, Sutton & Pemberton, 2015). Several studies have shown how different methods may reach contrasting conclusions, which indicates the need for careful consideration of methodological choice (Davis & Baulch, 2009; Notten, 2016). Davis and Baulch (2009), who carried out longitudinal mixed-methods research on poverty dynamics in Bangladesh, found that neither qualitative nor quantitative approaches alone could describe the situation of the poor they studied. Using mixed methods as a way of triangulation, they end up with better-quality data from various sources. They conclude that integrating the two strands will contribute to policymakers’ demands for generalisability (ensured by the quantitative strand) in conjunction with greater validity (provided by the qualitative strand), which ultimately results in better informed poverty reduction strategies (Davis & Baulch, 2009). In spite of these apparent advantages, the majority of quantitative and qualitative research still tends to be conducted separately, often due to budgetary constraints or the academic background of the assessor (Food and Agricultural Organisation, 2002).

Backgrounds of evaluators

Unsurprisingly, the backgrounds of evaluators influence evaluation studies. This is important for two reasons. Firstly, evaluators’ philosophical underpinnings influence the methodological decisions they make in their evaluations (Mertens, 2016). Secondly, the choice of methods and the inclusion or exclusion of certain indicators shape the outcomes of poverty evaluation (Notten, 2016). The backgrounds and assumptions of evaluators have so far remained largely overlooked in academic literature. This study aims to contribute to existing scholarship by analysing evaluators’ backgrounds and assumptions, in the form of their thematic study backgrounds, methodological preferences and goals of evaluation.

Sub-conclusion

In short, quantitative measures dominate poverty analyses used in policymaking because their interpretation is straightforward and easy to compare across time and location. Though qualitative and mixed method assessments have become more popular, their limited generalisability and higher implementation costs hinder their usage in policymaking. The design of efficient multidimensional and mixed method evaluations remains a challenge. Generally, the choice of a particular type of assessment is tied to the definition of poverty, the scale of analysis (e.g. country or individual level) and the intended use of the information, but it is also based on the assumptions and backgrounds of evaluators. The latter point is crucial, because evaluators, who come from a variety of study disciplines, make contrasting choices depending on their study discipline and preferred methodology. This, in turn, influences which indicators are measured and which are not, a choice that greatly shapes the eventual evaluation product. In spite of this link between evaluation outcomes and researcher assumptions, little research has been conducted in this area. This study addresses that gap by conducting empirical research into the backgrounds and preferences of evaluators.

3.3 Critical review of evaluation techniques

This section reviews typical evaluation strategies and methods, comparing their strengths and weaknesses. The section focuses on ex-post evaluation techniques in particular, which are carried out after an intervention has been completed, to assess whether it was effective in achieving its goals and why (European Commission, 2019). The evaluation techniques are methods of establishing the effectiveness of an intervention, and do not address the efficiency question (at what cost was the impact realised?). These evaluations may take place at varying levels, from a specific project to an instrument, strategy or policy. Firstly, the overview starts off by introducing the Theory of Change, and goes on to describe three common quantitative impact evaluation (IE) techniques: randomised controlled trial, propensity score matching and difference-in-difference. Secondly, it describes three qualitative methods: process tracing, realist evaluation and participatory methods.

Theories of Change

Before diving into specific evaluation methods, it must be noted that most evaluations employ various techniques and data sources. Commonly, evaluations start with Theories of Change (ToCs), which describe the change processes that lead to a desired result. A ToC first analyses the main problem, the overall objective and contextual features, including culture and power relations, before a specific intervention is fitted in. Assumptions about behaviour and context underpinning the intervention are made explicit and substantiated by evidence where possible (IOB-BIS, 2015). ToCs are often visualised as diagrams and can be found in project reports (Harries, Hodgson & Noble, 2014). An example of a Theory of Change can be found in Figure 3.

Figure 3

Note. Adapted from “A theory of change for SRHR and HIV linkages” by the World Health Organisation, 2020, https://www.who.int/reproductivehealth/topics/linkages/theory/en/.

Quantitative impact evaluation

According to the recently published “Impact Evaluation of Development Interventions: A Practical Guide”, impact evaluations are “empirical studies that quantify the causal effects of interventions on outcomes of interest” (White & Raitzer, 2017: p. 2). The number of impact evaluations (IEs) has been steadily rising since the 2000s (Cameron, Mishra & Brown, 2016). Several types of impact evaluation techniques exist, among others: randomised controlled trials, propensity score matching and difference-in-difference.

Quantitative impact evaluation: 1. Randomised controlled trial

Banerjee, Duflo and Kremer pioneered poverty research using randomised controlled trials (RCTs), sometimes referred to as randomised evaluations or experimental designs. An RCT randomly assigns participants from the eligible population to one or more “treatment groups”, which receive the intervention (e.g. a microfinance scheme), and to a “control group”, which receives no intervention or a comparator reference intervention (Banerjee & Duflo, 2011; Kremer, 2003). The computed effect size is the difference in outcome between the treatment and control groups; since random assignment gives the two groups similar baseline characteristics, the difference in outcome is likely due to the intervention (White & Raitzer, 2017). The RCT is the subject of heavy scholarly debate, mostly between micro- and macroeconomists. Given their strong case for causation, RCTs have been labelled the ‘gold standard’ by some development economists, notwithstanding a number of criticisms the technique faces (Ravallion, 2018; White, 2013).

One such criticism revolves around the ethical considerations that randomisation raises. Reddy suggests that because the poor are often weakly organised, benefits can be randomised amongst them without facing resistance, which is problematic given the unequal distribution of benefits a trial entails (Reddy, 2012). Rebutting this criticism, Glennerster argues that in many instances choices about the allocation of limited resources need to be made anyway, in which case randomisation may be more ethical than favouring a subgroup on the basis of mutual acquaintances or an official’s liking of a certain region (Glennerster, 2017). Another issue is methodological and pertains to the experimental setting itself: participants who are aware that they are part of an experiment may alter their behaviour, leading to incorrect inferences (Thirlwall, 2012). Moreover, Thirlwall continues, it is difficult to generalise interventions that are highly contextual; if an intervention has been declared successful in one country, there is no guarantee it will work similarly in another (Thirlwall, 2012). This refers to the limited external validity of RCTs, an issue that was discussed previously in section 2.4. Finally, Rodrik (2008) argues that programmes at the micro level often treat symptoms rather than structural origins, which require a change of institutional settings (such as government policies) at the macro level. He encourages macroeconomists to recognise the advantages of randomised evaluations, and microeconomists to acknowledge that such evaluations are restricted by the limited scope of their application (Rodrik, 2008).
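
The effect-size computation of an RCT, as described above, can be illustrated with a minimal sketch. The data are simulated and all names and numbers (`simulate_rct`, a baseline outcome of 50, a true effect of 5) are hypothetical assumptions made for illustration; they are not taken from any of the studies cited.

```python
import random
import statistics


def simulate_rct(n=2000, true_effect=5.0, seed=0):
    """Randomly assign n simulated participants to treatment or control.

    All values are hypothetical: the baseline outcome is drawn from a
    normal distribution and the treatment adds a fixed true_effect.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        baseline = rng.gauss(50.0, 10.0)
        if rng.random() < 0.5:  # random assignment with probability 0.5
            treated.append(baseline + true_effect)
        else:
            control.append(baseline)
    return treated, control


def difference_in_means(treated, control):
    """Effect size: mean outcome of the treated minus mean outcome of controls."""
    return statistics.mean(treated) - statistics.mean(control)


treated, control = simulate_rct()
estimate = difference_in_means(treated, control)
print(f"Estimated effect: {estimate:.2f}")
```

Because assignment is random, the two groups have similar baseline outcomes on average, so the difference in means should lie close to the simulated true effect; the remaining gap is sampling noise.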

Quantitative impact evaluation: 2. Propensity score matching

In impact evaluations, propensity score matching may be used, especially when setting up a long-term trial is costly and laborious. The propensity score is the probability of being assigned to treatment, given observable characteristics. Members of the treatment group are matched to members of the control group with similar propensity scores. After matching, the impact size is computed as the difference between the indicator for each treated individual and the average for the matched control individuals. The advantage is that the technique can be applied ex-post and without baseline data, thereby serving as a sort of ‘last resort’ technique. However, the model relies only on observable characteristics, thus excluding unobservable characteristics which may confound results (Austin, 2011; White & Raitzer, 2017).
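
The matching step described above can be sketched as follows. This is only an illustration under stated assumptions: the propensity scores are taken as already estimated (for instance with a logistic regression on observable characteristics, which is not shown), the nearest-neighbour rule with `k` matches is one of several possible matching rules, and the function name `att_by_matching` and all data are hypothetical.

```python
from statistics import mean


def att_by_matching(treated, controls, k=2):
    """Average effect on the treated via nearest-neighbour matching.

    treated and controls are lists of (propensity_score, outcome) pairs.
    The scores are assumed to have been estimated beforehand.
    """
    gaps = []
    for score, outcome in treated:
        # Pick the k controls whose propensity scores are closest.
        nearest = sorted(controls, key=lambda c: abs(c[0] - score))[:k]
        matched_mean = mean(o for _, o in nearest)
        gaps.append(outcome - matched_mean)
    return mean(gaps)


# Hypothetical data: (propensity score, outcome indicator)
treated = [(0.80, 12.0), (0.60, 10.0)]
controls = [(0.79, 9.0), (0.75, 8.0), (0.58, 7.0), (0.62, 8.0), (0.20, 3.0)]

att = att_by_matching(treated, controls, k=2)
print(f"Estimated impact on the treated: {att:.2f}")
```

Note that the control with a score of 0.20 is never used as a match, which illustrates why matching discards control individuals who are unlike any treated individual on observable characteristics.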

Quantitative impact evaluation: 3. Difference-in-Differences estimates

This method takes the development of the control group as the counterfactual. The impact is the difference between the change in outcome in the treatment group and the change in outcome in the control group. The method is relatively easy to implement; however, data are often not available to test the validity of the model. It may therefore be more rigorous to combine it with a matching technique, or to use a fixed effects model to better control for confounding variables. Furthermore, like the RCT, difference-in-difference estimates require baseline data, which are lacking in many poverty studies (White & Raitzer, 2017). The model is useful for understanding the impact on those participating in the intervention, but its results do not extend to the overall population (White & Raitzer, 2017).
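
The arithmetic behind the estimate can be shown with a small worked example. The function name and the outcome values below are hypothetical and chosen only to make the calculation easy to follow.

```python
def difference_in_differences(treat_before, treat_after,
                              control_before, control_after):
    """Change in the treatment group minus change in the control group."""
    def mean(values):
        return sum(values) / len(values)

    treatment_change = mean(treat_after) - mean(treat_before)
    control_change = mean(control_after) - mean(control_before)
    return treatment_change - control_change


# Hypothetical outcome indicators measured at baseline and at follow-up
did = difference_in_differences(
    treat_before=[10.0, 12.0], treat_after=[15.0, 17.0],
    control_before=[9.0, 11.0], control_after=[11.0, 13.0],
)
print(f"Difference-in-differences estimate: {did:.1f}")
```

In this example the treatment group improves by 5 and the control group by 2, so the change attributed to the intervention is 3. The estimate is only credible if the two groups would have followed the same trend in the absence of the intervention.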

Moving from quantitative to mixed and qualitative evaluation

Overall, impact evaluation has become a dominant school of evaluation, but it has also been criticised (Rutkowski & Sparks, 2014). A number of authors have cautioned against overestimating IEs: “the stereotype of good evaluation being impact evaluation had captured the imaginations of some stakeholders […] words like “impact” . . . have emerged as powerful mantras” (Rutkowski & Sparks, 2014: p. 493). Besides this overestimation, the dominance of IEs is problematic because they are mainly quantitative, and quantitative analysis often lacks explanatory power, as was discussed in section 3.2.
