Academic year: 2021


Evaluating the impact of livelihoods programmes

in conflict-affected situations

Master’s Thesis

Author: Eero Janson
Supervisor: Dr. Talita Cetinoglu

NOHA Master’s Degree in International Humanitarian Action
University of Groningen


Abstract

Over the past decades, evaluations of humanitarian programmes have come center stage due to the push for improved accountability and learning. Carrying out good-quality impact evaluations in particular is necessary to generate evidence about what works or has worked in order to make informed decisions in designing and managing humanitarian interventions. While the need and push for evidence is common to all sectors of humanitarian action, this thesis focused on the livelihoods support sector in conflict-affected contexts to review and analyze the current practice in evaluating the impact of livelihoods programmes and the principles and challenges which should be taken into account when generating high-quality evidence. This was done, first, through reviewing the current practice of impact evaluations in general, including its methods and challenges. Second, the livelihoods support sector was reviewed to establish the principles and intervention modalities in use. Third, eight indicator registries and lists of livelihoods impact indicators currently used in the sector were reviewed, and eleven selected indicators were systematically assessed against the RACER quality criteria. Finally, based on the analysis of livelihoods impact indicators, a generic impact evaluation framework for the livelihoods sector, consisting of seven triangulated and quality-controlled indicators, was proposed and assessed using the BOND evidential quality criteria. The thesis highlighted the continued need for high-quality evidence, notably by establishing common frameworks for indicator definition and analysis, and by developing standardized ways of assessing the quality of gathered evidence. Such efforts can work to ensure that evidence-based practice is fully adhered to in humanitarian action, thereby improving learning and accountability towards the people who are assisted.

This thesis is submitted for obtaining the Master’s Degree in International Humanitarian Action. By submitting the thesis, the author certifies that the text is his own work, that it does not include the work of anyone else unless clearly indicated, and that the thesis has been produced in accordance with proper academic practices.


Table of Contents

1. INTRODUCTION
1.1. RESEARCH QUESTIONS AND METHODOLOGY
1.2. THESIS STRUCTURE
2. THEORETICAL FRAMEWORK: EVIDENCE-BASED PRACTICE
2.1. DEFINING EVIDENCE
2.2. TYPES OF EVIDENCE
2.3. QUALITY OF EVIDENCE
2.4. EVIDENCE-BASED PRACTICE IN HUMANITARIAN ACTION
3. ASSESSING THE IMPACT OF HUMANITARIAN INTERVENTIONS
3.1. HUMANITARIAN CRISES AND HUMANITARIAN ACTION
3.2. EVALUATING HUMANITARIAN ACTION
3.2.1. Results chain: from inputs to impact
3.2.2. Attribution and contribution
3.2.3. Indicators
3.2.4. Units of analysis
3.3. CHALLENGES TO EVALUATION IN HUMANITARIAN CONTEXT
3.4. METHODS AND STUDY DESIGNS OF EVALUATION
3.4.1. Ex-post evaluations
3.4.2. Randomized controlled trials (experimental studies)
3.4.3. Quasi-experimental studies
3.4.4. Observational studies
3.4.5. Systematic reviews
4. LIVELIHOODS SUPPORT IN HUMANITARIAN ACTION
4.1. DEFINING LIVELIHOODS
4.2. TYPES OF LIVELIHOODS SUPPORT
4.3. PROTECTION MAINSTREAMING IN LIVELIHOODS SUPPORT
5.1. ANALYTICAL FRAMEWORK FOR ASSESSING THE STRENGTH OF INDICATORS
5.2. SELECTION OF INDICATORS
5.3. REVIEW OF INDICATORS IN USE
5.3.1. Coping Strategies Index (CSI)
5.3.2. Food Consumption Score (FCS)
5.3.3. Ability to cover basic needs
5.3.4. Perceived self-reliance
5.3.5. Change in household income
5.3.6. Employment status
5.3.7. Access to productive assets
5.3.8. Diversification of income-generating activities
5.3.9. Beneficiary satisfaction
5.3.10. Perceived safety and dignity
5.3.11. Increase in tensions/violence
6. FRAMEWORK FOR IMPACT ASSESSMENT OF LIVELIHOODS PROGRAMMES
6.1. GUIDING PRINCIPLES FOR IMPACT ASSESSMENT FRAMEWORK
6.2. PROPOSAL FOR EVALUATION FRAMEWORK FOR LIVELIHOODS PROGRAMMES
6.3. ASSESSING THE QUALITY OF THE PROPOSED EVALUATION FRAMEWORK
6.4. LIMITATIONS OF THE EVALUATION FRAMEWORK
7. DISCUSSION
7.1. EVALUATING IMPACT OF HUMANITARIAN ACTION
7.2. SELECTION AND QUALITY ASSESSMENT OF LIVELIHOODS INDICATORS
7.3. PROPOSED IMPACT EVALUATION FRAMEWORK FOR LIVELIHOODS PROGRAMMES
7.4. FURTHER RESEARCH NEEDS
8. CONCLUSION
REFERENCES
ANNEXES
ANNEX 1: BOND EVIDENTIAL QUALITY ASSESSMENT CRITERIA


1. Introduction

Over the past decades, humanitarian agencies have put a great deal of effort into improving the design and management of humanitarian responses in disaster- and conflict-affected countries. Ever since the seminal Joint Evaluation of Emergency Assistance to Rwanda, which was published in 1996 and offered unprecedented insight into the effects of humanitarian aid (Borton et al., 1996), significant steps have been made towards improving the accountability and performance of humanitarian action (Borton, 2004). Donors and implementing agencies have, among other things, agreed on and produced various international standards, technical guidelines, and training programmes for better humanitarian aid provision.

The push to improve the quality and accountability of humanitarian action, as well as to demonstrate value for money and the proper use of limited resources (Knox Clarke & Darcy, 2014; Baker et al., 2013), has made good-quality impact evaluations of humanitarian action much more important. Well-planned, well-designed and well-executed evaluations can assist both learning and accountability in a number of ways. Evaluations offer a valuable opportunity to see what is working or has worked and what needs to change, and provide structured insight into the overall performance of any intervention or programme. They contribute to the body of evidence on what works and what does not in high-risk contexts, and on how different approaches compare to each other in terms of results. Evaluations can answer difficult questions about the state of humanitarian action and assist decision-makers at different levels in making necessary course corrections or tough choices (ALNAP, 2016).

The livelihoods sector – with its emphasis on restoring sources of livelihoods for affected populations – is no different in this regard. The past decade has seen growing standardization efforts through the adoption of a number of minimum standards, as well as growing efforts to measure the impact of livelihoods support programming in emergencies. Nevertheless, measuring the impact of livelihoods interventions is challenging for several reasons, such as choosing proper indicators for tracking the change brought about by an intervention and convincingly attributing these changes to any particular livelihoods intervention or approach.


This thesis sets out to unpack the difficulties related to conducting impact assessments of livelihoods programming in conflict contexts and the different methods used for doing so. It goes on to propose a coherent evaluation framework for livelihoods interventions in conflict-affected situations, including relevant indicators, methods, and ways of establishing attribution. The proposed framework aims to assist humanitarian organizations in their practice, as well as to help set the organizational groundwork for effectively measuring the impact of livelihoods programmes in general.

There are several reasons why this research is relevant and important. First, there is an increasing demand and need for evidence-based decision-making in the humanitarian sector. Donors, humanitarian organizations, and the people benefiting from humanitarian interventions themselves want to be sure that the resources provided and the work done are having the results and impact that they expect and that were promised. Evaluating interventions and programmes generates evidence which can then be used for accountability purposes towards donors and beneficiaries, as well as for organizational learning, that is, to improve the outcomes of future interventions. Learning and accountability are highlighted, among other places, in the Sphere Standards (Sphere, 2018) and the Core Humanitarian Standard on Quality and Accountability (CHS Alliance, 2018), both of which are regarded as authoritative in the humanitarian sector. Although many steps have already been taken to standardize and develop common approaches to evaluation, such as by establishing indicator registries and lists, the sector is still characterized by competing approaches and a lack of common frameworks.

Second, with its focus on livelihoods interventions in particular, this thesis concentrates on a sector which is currently of high importance for both donors and humanitarian organizations. Livelihoods-focused programming has received increasing attention due to its place at the intersection of the humanitarian and development sectors. Livelihoods support programmes – aimed at restoring the sources of livelihoods of affected populations – and the connections they create between the humanitarian and development fields are particularly important in conflict-affected settings, where the livelihoods of local communities are affected for longer periods of time and under continually unsettled conditions. The commitment to bridge the gap between the humanitarian and development sectors was recently highlighted


1.1. Research questions and methodology

This thesis is framed by the following research questions:

• What is the current practice in evaluating the impact of livelihoods programmes in conflict-affected situations?

• What principles and challenges should be taken into account when generating high-quality evidence for livelihoods programming in conflict-affected situations?

In order to answer these two research questions, this thesis sets out to review indicator registries and lists for livelihoods indicators, to review and assess a selection of indicators against a set of quality criteria, and to lay out additional quality criteria for assessing the strength of evidence of evaluation frameworks. In addition, a generic evaluation framework for livelihoods programmes is proposed to exemplify the way in which evaluations should be grounded and developed.

This is done in two stages. First, in chapter 5, a total of eight indicator registries and lists are reviewed to identify the main indicators currently in use for evaluating the impact of livelihoods programmes. A total of eleven indicators are then selected for review from these registries and lists, cross-referenced for their use in practice, and assessed against the RACER criteria, which are used to evaluate the strength and quality of indicators. No new indicators for use in the livelihoods sector are proposed in this thesis.
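As an illustration only, the screening step described above can be thought of as a checklist filter. The sketch below is not tooling used in the thesis – the RACER assessment there is a qualitative exercise – and the `Indicator` class, the strict pass/fail rule, and the example ratings are hypothetical assumptions made here; RACER is commonly expanded as Relevant, Accepted, Credible, Easy to monitor, and Robust:

```python
# Illustrative sketch: screening candidate indicators against the RACER
# criteria, treating each criterion as a reviewer's True/False judgement.
from dataclasses import dataclass, field

RACER = ("relevant", "accepted", "credible", "easy", "robust")

@dataclass
class Indicator:
    name: str
    scores: dict = field(default_factory=dict)  # criterion -> True/False

    def meets_racer(self) -> bool:
        # Retain an indicator only if every RACER criterion is judged met.
        return all(self.scores.get(c, False) for c in RACER)

# Hypothetical example ratings, not the thesis's actual assessment:
fcs = Indicator("Food Consumption Score", {c: True for c in RACER})
income = Indicator("Change in household income",
                   {"relevant": True, "accepted": True, "credible": True,
                    "easy": False, "robust": False})

retained = [i.name for i in (fcs, income) if i.meets_racer()]
# retained == ["Food Consumption Score"]
```

The all-criteria-must-hold rule mirrors the intuition that an indicator which is relevant but not robustly measurable is still unsuitable for impact evaluation; a real assessment would of course record the reasoning behind each judgement, not just a boolean.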

Second, relying on a set of quality criteria proposed by BOND (2013), five guiding principles for establishing evaluation frameworks are defined in chapter 6. Then, based on the assessment of livelihoods indicators and the guiding principles, a generic evaluation framework for assessing the impact of livelihoods programmes is proposed. The proposed framework, an observational study by design, consists of seven triangulated indicators which capture the impact and quality of livelihoods interventions, accompanied by relevant exemplary survey questions. There are several limitations to this framework, such as the possible need to contextualize the generic framework depending on the particular context or type of intervention, which are also discussed at length in chapter 6.


The main contributions of this thesis to the field of evaluation of humanitarian action are the review and quality assessment of livelihoods indicators in chapter 5, and the proposed evaluation framework for livelihoods interventions set out in chapter 6.

1.2. Thesis structure

This introduction has laid out the need for and importance of evaluating humanitarian action to generate evidence for accountability and learning purposes, and has defined the research questions and methodology of this thesis. The rest of the thesis is structured into seven chapters.

Following the introduction, the second chapter introduces the concept of evidence-based practice, which serves as the theoretical-analytical framework of this thesis, and discusses its importance in the humanitarian sector. The need for and actual use of good-quality evidence in humanitarian action are discussed. The third chapter looks at the issue of evaluating the impact of humanitarian interventions from different angles: how evaluations are built up and what their main elements are, what the main contextual challenges to evaluating humanitarian action are, and what methods are used in doing so. The fourth chapter takes a closer look at the livelihoods sector and the intervention modalities commonly in use there. Issues relating to protection mainstreaming in the context of livelihoods interventions are also discussed.

The fifth chapter maps and reviews different indicator registries and lists for indicators currently in use in the livelihoods sector. Indicators selected from the registries are then evaluated against the RACER criteria. Based on that mapping and review, the sixth chapter goes on to propose a generic evaluation framework for livelihoods programmes and assesses its quality and limitations. The BOND criteria for evidential quality are used both to guide the process of developing the proposed framework and to assess its eventual quality. The seventh chapter discusses and summarizes the main findings of this thesis and highlights further research needs, and the final chapter reviews and answers the research questions defined in this introduction.


2. Theoretical framework: evidence-based practice

The broader theoretical-analytical framework of this thesis is built around the concept of evidence-based practice, answering the question of ‘why we evaluate’. Although evidence is always used in managerial decisions, in humanitarian action as elsewhere, all too often little attention is paid to the quality of the evidence used. This, in turn, may result in ineffective management, poor outcomes, and limited understanding of why the activities carried out have not brought about the desired impact.

Evidence-based practice is an approach to decision-making and day-to-day management which helps practitioners critically evaluate the extent to which they can trust the evidence they have at hand (Barends, Rousseau & Briner, 2014). It entails critically judging the trustworthiness and relevance of current evidence, evaluating the outcomes and impacts of decisions and actions taken, and incorporating the gathered evidence into future decision-making processes. Evaluations which generate evidence therefore have a critical role to play in the broader process of decision-making. At the same time, since good-quality evidence stems from well-designed evaluations, it is necessary to trace the evolution of evidence down to the level of the indicators and data collection methods on the basis of which conclusions are drawn and evidence thereby gathered.

This chapter reviews the basic concepts used in evidence-based practice, the different types of evidence and ways of assessing their quality, and the use of evidence in decision-making in humanitarian action. While this review serves to provide a broader grounding for the whole thesis by emphasizing the importance of good-quality evidence gathering for the practice of humanitarian action, it also sets the groundwork for guiding the formulation of the evaluation framework for livelihoods interventions in chapter 6, as well as for assessing the quality of the eventual proposed framework.

2.1. Defining evidence

The focus on evidence was first explicitly introduced in the medical sciences. Evidence-based medicine was first defined in 1992 by medical practitioners at McMaster University in Canada (Evidence-Based Medicine Working Group, 1992). The concept was based on advances in clinical research – clinical trials, clinical epidemiology and meta-analysis – which demonstrated the limits of individual expertise. Dr David Sackett, the person most associated with the emergence of evidence-based medicine, has defined it as “the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients” (Sackett et al., 1996, p. 71). Evidence-based medicine stimulated the rethinking of a host of professional activities: research studies and the submission requirements for research articles, new journals, reviews of evidence in existing textbooks, new practice guidelines, and the education of health professionals (Bradt, 2009, p. 5). Eventually, evidence-based practice found fertile ground in other fields as well, such as management, psychology, education, and humanitarian action.

The dividing lines between concepts such as ‘knowledge’, ‘data’, ‘information’, ‘theory’, and ‘evidence’ are not always clear. While the meaning of ‘knowledge’ has been much debated by philosophers over the centuries, in the framework of this thesis, it is taken to mean a ‘justified true belief’ – in other words, a belief that is grounded in some form of fact (Knox Clarke & Darcy, 2014, p. 10). Knowledge differs from information, which is taken to mean any data which may inform understanding or belief, presented or analyzed in a context that gives it meaning. Data, on the other hand, refers to pieces of factual information (such as measurements or statistics) which may be used to generate information or knowledge through analysis.

According to the empirical/positivist school of thought, knowledge comes only or primarily from sensory experience (Sober, 2014). This school of thought underlies the traditional scientific method: forming hypotheses, and subsequently testing and, as required, modifying these hypotheses on the basis of experiment and observation in order to generate knowledge. A number of non-positivist social scientific approaches have problematized this approach and criticized the idea of objectivity and the understanding of knowledge as separate from the observer (Yanow & Schwartz-Shea, 2006). Emphasis is placed, instead, on the relative and the subjective, on people’s perceptions and experiences, and on more complex, socially and politically informed explanations of behaviors and outcomes.


The concept of ‘evidence’ used in this thesis draws on both the empirical school of thought and its social scientific critique. Evidence can be defined as information that helps to substantiate, prove, or falsify the truth of a specific proposition, thereby creating knowledge (Knox Clarke & Darcy, 2014, p. 11). These propositions may relate to, for example, the existence or absence of a condition (examples: state of malnutrition in a population, level of school attendance), which can be verified through repeated and systematic observation conducted according to proven methodologies. The propositions may also relate to behaviors (examples: coping strategies in a household during a crisis, actual food consumption patterns) or beliefs (examples: self-efficacy in a given field, self-perception of well-being) of groups of people, therefore referring not to one common ‘objective’ reality but to subjective realities. Evidence may come, therefore, in different shapes and sizes, but central to its definition is its aim to substantiate, prove, or falsify a proposition being made.

2.2. Types of evidence

These two schools of thought – positivist and non-positivist – have typically embraced different research designs and data collection methods. The positivist tradition largely utilizes quantitative methods: gathering numerical data about the phenomena under scrutiny. This approach assumes that the phenomenon being studied is measurable, quantifiable and observable, and independent of the researcher.

In general, quantitative methods can easily cope with large numbers of cases and are, therefore, better able to provide a broad overview of the situation under scrutiny and to make generalizations across populations (Simister & James, 2020). At the same time, it may be difficult to draw conclusions on why and how things happen based on quantitative data alone, and quantitative methods may obscure or leave out data which are difficult to quantify.

The non-positivist school of thought, at the same time, tends to privilege qualitative methods, such as case studies, focus groups, and interviews. This approach allows people’s own words, interpretations and feelings to come to the fore, together with their nuances and possible internal inconsistencies. Qualitative methods may help shed light on the processes that led to the change being observed, making it possible to explain how and why the change happened (ibid.). Qualitative methods are typically not so much concerned with increasing the number of units covered by the research as with going deeper into each individual case. At the same time, data gathered by qualitative methods may be difficult to compare and aggregate, and it may also be difficult to make generalizations based on a relatively small number of cases.

The strong distinction between qualitative and quantitative research designs may not be so clear-cut, however. As Mahoney and Goertz (2006) note, “the labels quantitative and qualitative do a poor job capturing the real differences between the traditions. Quantitative analysis inherently involve the use of numbers, but all statistical analyses also rely heavily on words for interpretation. Qualitative studies quite frequently employ numerical data; many qualitative techniques in fact require quantitative information” (p. 245).

Nevertheless, data collected using these research designs tend to produce different types of evidence. Therefore, there is no hierarchy of research designs and methods, but rather “different research designs and methods are more or less appropriate for answering different research questions” (DFID, 2014, p. 3).

For that reason, many researchers and evaluators choose to use a mixed-method approach, that is, using a wide spectrum of evidence which makes use of several research designs and methods. This may mean collecting some data in a quantitative fashion but asking follow-up questions that are more open-ended and qualitative in nature (Bamberger, 2012, pp. 3-7). A mixed-method approach can allow a researcher to document straightforward statistical phenomena while also giving participants the space to describe their subjective experience in their own words (ibid.). While it can be more time-consuming to collect and analyze both sets of data, it allows a researcher to harness the advantages of both approaches (Hussein, 2015).

The mixed-methods approach is also actively used and advocated for in the humanitarian sector. IFRC’s monitoring and evaluation guidelines, for example, suggest that qualitative data allow for only limited generalization and can be perceived as having low credibility, while quantitative methods can be costly and “exclude explanations and human voices about why something has occurred” (IFRC, 2011, p. 35). As a result, “a mixed methods approach is often recommended that can utilize the advantages of both” (ibid.). Similarly, guidelines from the World Food Programme suggest that “as qualitative and quantitative data complement each other, both should be used” (World Food Programme, n.d, p. 23).

Different methods and study designs used for evaluating humanitarian action, some relying more on quantitative and some more on qualitative methodologies, will come under closer scrutiny below, in chapter 3.4.

2.3. Quality of evidence

As evidence differs in type, it also differs in quality. Data gathered using a valid method of assessment is usually better evidence for the existence of a condition than anecdotal reports from the ground. Differences in evidential quality, however, are not simply based on the nature of the evidence (qualitative or quantitative): a structured questionnaire administered to a probabilistic sample of the population may tell us less than a series of well-conducted semi-structured interviews, for example.

In order to credibly assess the quality of evidence gathered, certain quality criteria need to be defined, and a number of criteria and checklists for assessing the strength and quality of evidence have been proposed. For one, Louise Shaxson (2005) proposes five components that, taken together, define the ‘robustness’ or strength of evidence in policy terms: credibility (internal validity); generalizability (transferability); reliability; objectivity (lack of bias); and rootedness (understanding the nuance). John Gerring (2011), on the other hand, proposes a unified framework for social scientific methodologies which includes qualities such as truth, precision, generality, coherence, commensurability, and relevance. ACAPS, a humanitarian organization specializing in assessments, distinguishes between the quality indicators of quantitative and qualitative research: internal validity/accuracy, external validity/generalizability, reliability/consistency/precision, and objectivity for quantitative research; and credibility, transferability, dependability, and confirmability for qualitative research (ACAPS, 2013).


The Active Learning Network for Accountability and Performance (ALNAP), focusing on the humanitarian sector in particular, has proposed the following six quality criteria for evidence:

• Accuracy: whether the evidence is a good reflection of the real situation and is a ‘true’ record of the phenomenon being measured;

• Representativeness: the degree to which the evidence accurately represents the condition of the larger population group of interest;

• Relevance: the degree to which a piece of information relates to the proposition that it is intended to prove or disprove (i.e. whether there is a strong relationship between the indicator and the condition under scrutiny);

• Generalizability: the degree to which evidence from a specific situation can be generalized beyond that response to other situations;

• Attribution: whether analysis demonstrates a clear and unambiguous causal linkage between the two conditions or events;

• Clarity regarding context and methods: the degree to which it is clear why, how, and for whom evidence has been collected. (Knox Clarke & Darcy, 2014, pp. 15-16)

BOND, the network of British Overseas NGOs for Development, has proposed five quality criteria for evaluating the strength of evidence produced by development and humanitarian NGOs, relating to five principles:

• Voice and inclusion: whether the perspectives of beneficiaries on the effects of the intervention are included in the evidence;

• Appropriateness: whether the evidence is generated through methods that are justifiable and reliable given the nature of the assessment;

• Triangulation: whether data on an intervention’s effects are gathered using a mix of methods, data sources, and perspectives;

• Contribution: whether the evidence explores how change happens and what has been the contribution of the intervention under scrutiny to this change;

• Transparency: whether the evidence discloses details about data sources and methods used, the results achieved, and any limitations in the data or conclusions.
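To make the checklist above concrete, the five principles can be read as a simple scoring rubric. The sketch below is illustrative only: the 1–4 scale, the averaging rule, and the function and principle names are assumptions made here for illustration, not BOND’s actual assessment matrix.

```python
# Illustrative sketch: summarizing per-principle ratings of a body of
# evidence against the five BOND principles (1 = weak ... 4 = strong).
BOND_PRINCIPLES = ("voice_and_inclusion", "appropriateness",
                   "triangulation", "contribution", "transparency")

def assess_evidence(ratings):
    """Return the average rating and the weakest-rated principle."""
    missing = [p for p in BOND_PRINCIPLES if p not in ratings]
    if missing:
        raise ValueError(f"unrated principles: {missing}")
    overall = sum(ratings[p] for p in BOND_PRINCIPLES) / len(BOND_PRINCIPLES)
    weakest = min(BOND_PRINCIPLES, key=lambda p: ratings[p])
    return {"overall": overall, "weakest_principle": weakest}

# Hypothetical ratings for one body of evidence:
example = assess_evidence({"voice_and_inclusion": 3, "appropriateness": 4,
                           "triangulation": 2, "contribution": 3,
                           "transparency": 4})
# example == {"overall": 3.2, "weakest_principle": "triangulation"}
```

Reporting the lowest-scoring principle alongside the average reflects the intuition that strength on one principle cannot compensate for weakness on another: evidence that is well triangulated but excludes beneficiary voice remains weak evidence.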


All of the criteria proposed in these checklists are important when assessing the quality of evidence. There is considerable overlap between them: BOND’s contribution and ALNAP’s attribution, BOND’s transparency and ALNAP’s clarity regarding context and methods, and ALNAP’s relevance and BOND’s appropriateness, to name a few points of conflation, essentially target the same qualities. All too often, however, the criteria listed in these checklists are not easily evaluable. For its practical utility and ease of operationalization, as well as its explicit emphasis on triangulation, this thesis relies on the quality criteria put forward by BOND. In addition to proposing the quality criteria for evidence, BOND (2013) has also offered an evaluation matrix for assessing quality, which will come under further scrutiny below.

BOND’s evidential quality assessment framework can be used to assess the quality of a body of evidence, or of a framework for gathering such evidence. Taking BOND’s quality criteria as the basis, good evidence about the effects of a humanitarian intervention may be defined as evidence that is gathered from multiple sources and using different methods, that is clear and transparent about the origins and analysis of the data, that includes beneficiary voices, and that is able to draw a logical link between the intervention and the change on the ground.

In the framework of this thesis, the quality criteria put forward by BOND are used to guide the process of formulating an evaluation framework for livelihoods interventions in chapter 6, and in assessing the quality of the resulting framework.

2.4. Evidence-based practice in humanitarian action

In humanitarian action, there are three sets of information and evidence which are of critical importance for decision-making processes: information about the pre-crisis situation; information about the existence of humanitarian needs resulting from a crisis; and information about ‘what works’ in addressing these needs (Knox Clarke & Darcy, 2014, pp. 12-14; Darcy et al., 2013, p. 19). Effective humanitarian interventions depend as much on knowledge – about what is happening and needed on the ground, and about the best approaches to alleviating these needs – as they do on funding and logistics (Knox Clarke & Darcy, 2014, p. 5). When this knowledge is lacking, interventions are at high risk of not succeeding. As emphasized by the United Nations Population Fund, “the international humanitarian community’s ability to collect, analyze, disseminate and act on key information is fundamental to effective response. Better information leading to improved response directly benefits affected populations” (UNFPA, 2010, p. 9).

The need for evidence-based practice in humanitarian action has been emphasized both by donors and by humanitarian practitioners. On the donor side, the Good Humanitarian Donorship initiative, a network of large institutional humanitarian donors which convened for the first time in 2003, is a telling example. The network has endorsed a set of principles which relate, among other things, to “accountability, efficiency and effectiveness in implementing humanitarian action” (Good Humanitarian Donorship, n.d). These principles have been further elaborated in good practice guidelines which emphasize the importance of evidence-based decision-making, recognizing “the need to promote the use of existing knowledge and limit the parallel and redundant production of evidence” (Good Humanitarian Donorship, 2016).

Although the need for good-quality evidence and the importance of its use in decision-making are widely recognized, the reality may well be quite different. A paper by DFID (2012) recognized that the “right systems and incentives are not in place to ensure that evidence is available and used to inform decision-making” (p. 5). This may be for several reasons: data may not be available to those in decision-making positions; data may not be available in the right formats; data may arrive too late to influence decision-making processes; or data may not be valued by actors who are more focused on immediate action (ibid., p. 32).

Studies which have reviewed the ways in which decisions are made by managers in the humanitarian sector are also telling. In general, such studies (Darcy et al., 2013; Darcy, Anderson & Majid, 2007) have found that external information often had limited relevance for decision-makers. Rather, the range of options was “limited by previously decided questions about strategic priorities, available resources, and so on” (Darcy et al., 2013, p. 7) and influenced by “the institutional framework for decisions, the implicit values and assumptions that they applied in making decisions, and the mental models by which they processed available information” (ibid.). Actual operational decisions, in turn, appeared to be shaped more by factors such as the positions of host governments and strategic considerations of donors (Darcy, Anderson & Majid, 2007, pp. 18-19).

Using best available evidence for decision-making may, therefore, look easy on paper, but may prove difficult to implement in real-life situations. Nevertheless, given that it is considered “unethical to deliver interventions that are at best not proven, are ineffective or, worse still, do actual harm” (DFID, 2012, p. 11), gathering and using good quality evidence in decision-making is and remains of high importance in humanitarian action.

Through highlighting the importance and elements of good-quality evidence in the humanitarian sector, this chapter laid the groundwork for the following chapter, which explores how evidence is generated through evaluations and the role of indicators and evaluation frameworks in that process. The BOND quality criteria introduced in this chapter will be used in chapter 6 to guide the formulation, and assess the quality, of the proposed evaluation framework for livelihoods interventions.


3. Assessing the impact of humanitarian interventions

With impact assessments gaining more and more attention in humanitarian assistance, it is important to delve deeper into the ways in which the impact of humanitarian action can best be captured and comprehended. This chapter lays the methodological groundwork for the rest of the thesis by defining the key concepts being used, including ‘evaluation’, ‘impact’, ‘indicator’, ‘intervention’, ‘programme’, ‘contribution’, and ‘attribution’. These and other concepts defined in this chapter are used in the same way throughout this thesis.

Further, this chapter lays out the RACER criteria for assessing the strength of indicators, which will be used in chapter 5 for reviewing indicators in use in the livelihoods sector. This chapter also discusses the difficulties related to carrying out impact evaluations in conflict contexts, and looks at the methods by which impact is typically measured in the humanitarian sector, such as ex-post evaluations, randomized controlled trials, quasi-experimental studies, and observational studies. These discussions will be drawn upon in chapters 6 and 7 when the evaluation framework for livelihoods interventions is proposed, assessed, and discussed.

3.1. Humanitarian crises and humanitarian action

Every humanitarian crisis is different, varying from sudden onset disasters like earthquakes and floods to complex emergencies like armed conflicts which can last for decades.

Humanitarian programming is always highly context-dependent: political, economic, social, technological, legal, and environmental factors all shape the context in which programmes need to be designed, implemented, and eventually evaluated.

The objectives of humanitarian action are to save lives, alleviate suffering and maintain human dignity during and in the aftermath of crises and natural disasters, as well as to prevent and strengthen preparedness for the occurrence of such situations. Humanitarian action is guided by the principles of humanity, impartiality, neutrality and independence (ICRC, 1996). These principles distinguish humanitarian action from other activities, including those undertaken by political and military actors, and are important for humanitarian actors to secure access to those affected by the crisis (OCHA, 2012). Most humanitarian organizations have adhered to these principles, often through expressing their commitment to the Code of Conduct for the International Red Cross and Red Crescent Movement and NGOs in Disaster Relief (IFRC & ICRC, n.d).

Another principle which has emerged later is that of ‘do no harm’. Reflecting the Hippocratic oath in the field of medicine, the ‘do no harm’ principle refers to the obligation to avoid exposing people to additional risks through humanitarian action. The principle recognizes the fact that humanitarian interventions may also have negative effects on the people they are meant to assist, and recognizes the need to take “a step back from an intervention to look at the broader context and mitigate potential negative effects on the social fabric, the economy and the environment” (Bonis Charancle & Lucchi, 2018, p. 9).

Humanitarian principles are directly related to evaluations in two ways. First, evaluations can and should be used to assess whether humanitarian principles were adhered to by the agencies which provided aid to affected populations. Nevertheless, a recent review of evaluations carried out by the United Nations Evaluation Group concluded that humanitarian agencies are “currently not prioritising (indeed rarely addressing) evaluation against Humanitarian Principles, nor providing adequate guidance to evaluation managers and evaluators” (UNEG HEIG, 2016).

Second, humanitarian principles are also a key reference point in designing and carrying out evaluations of humanitarian action, that is, they should guide how evaluations themselves are carried out. This may relate to, for example, what kinds of questions are asked and in what settings they are asked of those being evaluated. Humanitarian principles, most notably that of ‘do no harm’, will come under closer scrutiny again in chapters 6 and 7 when formulating the generic evaluation framework for livelihoods interventions.

3.2. Evaluating humanitarian action

The Development Assistance Committee of the Organization for Economic Co-operation and Development (OECD-DAC) defines evaluation as the “systematic and objective assessment of an on-going or completed project, programme or policy, its design, implementation and results [...] to determine the relevance and fulfilment of objectives, development efficiency, effectiveness, impact and sustainability. An evaluation should provide information that is credible and useful, enabling the incorporation of lessons learned into the decision-making process of both recipients and donors.” (OECD-DAC, 2010, pp. 21-22)

When unpacking the OECD-DAC definition, several aspects of evaluation are worth highlighting. Evaluation, by definition, needs to:

1. be systematic, that is, a planned and consistent activity, based on credible methods;
2. keep objectivity in mind, that is, step back from the immediacy of the humanitarian action and maintain perspective, basing findings on credible evidence;
3. enable examination and analysis to determine the worth or significance of the activities being evaluated;
4. enable drawing lessons to improve policy and practice and enhance accountability. (ALNAP, 2016, p. 27)

Evaluations of humanitarian action are carried out for two central purposes: learning and accountability. Learning is the process through which experience and reflection lead to changes in behavior or the acquisition of new abilities (ALNAP, 2016, p. 27). Evaluation at its core should, therefore, enable us to gain information about what works, what does not, and why, in relation to the problems or situations we are trying to solve, and about how to improve performance and design better interventions in the future. In the humanitarian sector in general, evaluations can be very useful in generating knowledge and initiating processes of organizational learning.

In addition to learning, accountability is the second central goal of evaluation. Accountability, in broad terms, is a process of taking into account the views of, and being held accountable by, different stakeholders, primarily the people affected by the undertaken activities. It refers to the obligation to “demonstrate that work has been conducted in compliance with agreed rules and standards” and to “report fairly and accurately on performance results vis-a-vis mandated roles and/or plans” (OECD-DAC, 2010, p. 14). While learning is forward-looking, examining past activities in order to improve future ones, accountability serves backward-looking purposes, that is, demonstrating that the activities had the results and created the impact they aimed and claimed to achieve.

As such, evaluation is an integral part of the humanitarian programme cycle, a widely-accepted model which helps to conceptualize the interconnectedness and cyclical nature of different elements of humanitarian programmes (IASC, 2015): assessing needs, planning, mobilizing resources, implementing and monitoring, and evaluating the impacts (see Figure 1). The cyclical nature of this model helps to highlight the two key goals of evaluation: the backward-looking accountability which aims to see what has already taken place, and the forward-looking learning which helps to start the new cycle with fresh evidence.

Figure 1. Humanitarian programme cycle, with the place of evaluation highlighted (redrawn by the author from IASC, 2015)

The basic unit of such evaluations is an intervention or a programme. Although in the framework of this thesis these terms are often used interchangeably, the term ‘intervention’ (or a ‘project’) is taken to mean a set of activities designed to achieve specific objectives within specified resources and implementation schedules (OECD-DAC, 2010, pp. 30-31). A ‘programme’, at the same time, denotes a set of interventions which are implemented to attain the same general goal. A humanitarian organization may, for example, run several specific interventions in the same country or region with the same general objective, which together form a humanitarian programme. Although the term ‘humanitarian intervention’ has also been used to denote coercive action taken by one state against another to prevent or halt human rights or humanitarian law violations (ReliefWeb Project, 2008, p. 29), in the framework of this thesis the term ‘intervention’ is used only in the context of humanitarian activities which are based on humanitarian principles, as the term is commonly used in the field of humanitarian action. The term ‘intervention’ is used extensively, for example, in the Sphere handbook (Sphere, 2018) to refer to individual humanitarian projects, although the handbook never explicitly defines it.

3.2.1. Results chain: from inputs to impact

Evaluating humanitarian action can best be conceptualized as taking place along the results chain as used in the logical framework approach (NORAD, 1999; Dearden & Kowalski, 2003). The logical framework helps to follow the inter-related key elements of interventions and to highlight the linkages between them, in order to facilitate a common understanding of the internal logic of interventions. In the logical framework, a results chain is developed which sets out a linear cause-and-effect sequence: from resources, through activities, towards achieving the desired change or result (NORAD, 1999).

There are a number of strengths and weaknesses associated with the logical framework approach. While it does provide a simple and understandable overview of the internal logic of a project or programme, it does so in a rather mechanistic way, whereas in reality change can be a much more complex process (Simister & James, 2015, pp. 2-3). Also, while the approach puts the focus on monitoring and evaluation by ensuring there are clear benchmarks for success and failure, it may encourage reviews and evaluations to focus on expected consequences (that is, performance against predefined indicators) to the exclusion of unexpected changes, whether positive or negative (ibid.). Nevertheless, due to its widespread use in the sector, the results chain embedded in the logical framework is adopted in the framework of this thesis. The main elements of the results chain of the logical framework are depicted in Figure 2.


Figure 2. Results chain (adapted by the author from NORAD, 1999):

• Inputs: the financial, human, and material resources used (for example: funds, staff time).
• Activities: actions through which inputs are mobilized to produce specific outputs (for example: providing cash grants, distributing livelihoods assets).
• Outputs: goods or services which directly result from the actions undertaken (for example: the number of beneficiaries who received cash grants, the number of assets distributed).
• Outcomes: short-term and medium-term effects of the intervention’s outputs (for example: the number of households whose livelihoods have objectively improved as a result of the intervention, e.g. after 3 months or after 1 year).
• Impact: long-term effects produced by the intervention, directly or indirectly, positive or negative, intended or unintended (for example: the impact of the intervention on the food security of affected communities).

As seen from the results chain, measuring the results of humanitarian programmes can be done on different levels. The easiest way of looking at the results of humanitarian action would be to look at outputs, that is, the goods or services which have directly resulted from the intervention activities. While outputs are directly attributable to the programme and therefore easily interpretable (such as the number of beneficiaries of the programme, or the number of kits distributed to the affected populations), they do not tell much about the actual change created by these activities. Therefore, in order to account for the actual changes taking place, we need to look at the outcome and impact levels in the logical framework.

Outcomes constitute short-term or medium-term, intended or unintended changes or shifts in conditions due to an intervention (ALNAP, 2016, p. 28). Outcomes may be desired (positive) or unwanted (negative); they can be direct or indirect consequences of the intervention; they can encompass behavior change (actions, relations, policies, practices) of individuals, groups, communities, organizations, institutions or other actors. Outcomes are usually not directly attributable to the intervention, since behavior change can happen for different reasons and pinpointing a single cause may be difficult.

Impact, at the same time, looks at the wider effects of the programme – social, economic, technical and environmental – on individuals, communities, and institutions. Impacts can be intended or unintended, positive or negative, macro-level (whole community or sector) or micro-level (household, individual), short-term or long-term (ALNAP, 2016, p. 29). Impacts distinguishable at a population-level are rarely attributable to a single programme or intervention, but a specific programme or intervention may, together with other similar ones, contribute to impacts on a population.

Evaluating humanitarian programmes can be conceptualized to take place along the results chain, following a set of evaluation criteria. OECD-DAC (2000) has set out evaluation criteria for development programmes which were revised in 2006 for use in humanitarian action (ALNAP, 2006). These are:

1) Relevance: the extent to which the aid activity is suited to the priorities and policies of the target group, recipient and donor;

2) Effectiveness: the extent to which an assistance activity attains its objectives (whether humanitarian assistance is reaching the right people at the right time and whether the intended improvements for beneficiaries are achieved or not);

3) Efficiency: the extent to which the aid uses the least costly resources possible in order to achieve the desired results (it measures the outputs in relation to the inputs);

4) Impact: the positive and negative changes produced by a development intervention, directly or indirectly, intended or unintended;

5) Connectedness: the need to ensure that activities of a short-term emergency nature are carried out in a context that takes longer-term and interconnected problems into account;


6) Coverage: the need to reach major population groups facing life-threatening suffering wherever they are;

7) Coherence: the need to assess security, developmental, trade and military policies as well as humanitarian policies, to ensure that there is consistency and, in particular, that all policies take into account humanitarian and human-rights considerations. (ALNAP, 2006)

Figure 3. Connection between elements of evaluations in the framework of the results chain (adapted and expanded by the author from Volden, 2018)

When looking at the different aspects of evaluation in the framework of the results chain, multiple axes of analysis emerge (see Figure 3). While a full evaluation of humanitarian interventions requires looking at all of these different aspects, this thesis is focused on the end section of the results chain: on measuring effectiveness (whether the short- and medium-term outcomes were achieved) and impact (what were the longer-term changes which took place within the target population as a result of the intervention). Taken together, in the framework of this thesis, these will be referred to as ‘impact’.

3.2.2. Attribution and contribution

What makes evaluating outcomes and impact difficult, however, is the question of attribution. In complex humanitarian interventions, it is rarely possible to attribute a result (increased food security among a population, for example) to one specific intervention, because there are other factors (such as other interventions, seasonality, changes in state policies) which influence the final results as well. Attribution becomes more difficult as you move along the results chain, and it is thus harder to attribute impacts to a specific intervention than to attribute outcomes (see Figure 4).

Figure 4. Change in the degree of influence in the context of the results chain (adapted by the author from Smutylo, 2001, p. 5; ALNAP, 2006, p. 357)

While direct attribution may be difficult to prove in a convincing manner, more indirect ways of displaying contribution are more feasible. Analyzing contribution in evaluation refers to finding credible ways of showing that an intervention played some part in bringing about the observed results (ALNAP, 2006, p. 29). Contribution analysis recognizes that several causes might contribute to a result, even if individually they may not be necessary or sufficient to create impact.

In order to attribute change to a particular intervention or programme, it is necessary to demonstrate a causal chain between the action and the effect. There are two main methodological approaches used to establish causality and thereby overcome the attribution question: comparative approaches and theory-based approaches (Proudlock, Ramalingam & Sandison, 2009, p. 25-31). Comparative approaches which aim to set up a comparison group in order to establish causality, most notably exemplified by randomized controlled trials, come under closer scrutiny in chapter 3.4.
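To make the logic of comparative approaches concrete, the sketch below computes a basic difference-in-differences estimate, in which the change observed in a comparison group is used to net out background trends (such as seasonality or other interventions) from the change observed in the treatment group. This is a minimal illustration under stated assumptions, not a method prescribed by the sources cited here: all figures are invented, and real quasi-experimental designs involve sampling, matching, and significance testing that are omitted.

```python
# Minimal difference-in-differences sketch. All figures are invented
# for illustration; real designs require careful sampling and testing.

def diff_in_diff(treat_before, treat_after, comp_before, comp_after):
    """Impact estimate = (change in treatment group) - (change in
    comparison group), netting out shared background trends."""
    return (treat_after - treat_before) - (comp_after - comp_before)

# Example: mean monthly household income in hypothetical currency units.
# The comparison group improved by 15 due to background factors, so only
# 25 of the treatment group's 40-unit gain is attributed to the programme.
impact = diff_in_diff(treat_before=100, treat_after=140,
                      comp_before=100, comp_after=115)
print(impact)  # 25
```

The same arithmetic underlies more elaborate regression-based estimators; the sketch only shows why a credible comparison group matters for attribution.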

Theory-based approaches avoid the need to set up comparison groups by relying on ‘logical’ assumptions. This may be done, for example, by using the logical framework approach, which may be regarded as a hypothesis: if certain outputs are produced and certain assumptions hold true, then the ‘logical’ assumption is that there will be certain positive outcomes (Knox Clarke and Darcy, 2014, p. 40). If an evaluation can demonstrate that the outputs were produced and that the outcomes subsequently occurred, and if beneficiaries or key informants create a narrative link between the inputs and activities on one hand and the outcomes on the other, while also discounting alternative explanations, then the intervention is generally held to have caused (or contributed to) the outcome. Although widely used, this approach may be less reliable for use in impact evaluations, due to the complex and ambiguous causal chain between outputs and impacts, and due to its inability to account for any unintended or unforeseen impacts.

3.2.3. Indicators

In order to effectively highlight the contribution of a single intervention to the achieved outcomes and impact, a proper evaluation framework needs to be set out. The main elements of an evaluation framework are indicators. An indicator is a quantitative or qualitative variable that provides valid and reliable evidence that something has happened – whether an output was delivered, an immediate effect occurred, or a long-term change was observed (OECD-DAC, 2010, p. 25). Indicators are usually not direct measurements of the condition under scrutiny; rather, they measure something which points to, or indicates, that condition. Indicators function, therefore, as proxies for the actual change which is foreseen to be achieved with the intervention; evidence from a number of indicators will provide a convincing case for the impact being achieved.

There has been a good deal of research on what makes a good indicator. Humanitarian organizations and donors generally use a checklist of criteria such as SMART (Specific, Measurable, Achievable, Relevant and Time-bound) or RACER (Relevant, Accepted, Credible, Easy, Robust) which help to select and formulate good indicators.

Indicators are SMART when they are (DG ECHO, 2007, pp. 54-55):

1) Specific – indicator should be clearly articulated, well-defined and focused.

2) Measurable – indicator should have the capacity to be counted, observed, analyzed, or challenged. Indicators should be able to determine the degree of completion or attainment. When using the same methodology, findings should be repeatable and comparable.

3) Achievable – targets attached to the indicators should be attainable in the scope of the intervention.

4) Relevant – indicator should be able to detect change and should be related to the specific situation it is ‘indicating’ information about.

5) Time-bound – targets attached to the indicators should include a specific time scale.

The RACER acronym, on the other hand, holds that good indicators are (European Commission, 2005, p. 46):

1) Relevant – indicator should have a strong correlation/relationship with the specific condition it is used to measure/indicate.

2) Accepted – indicator must be easily understood and should be accepted by all the stakeholders (refers, in part, to indicators being shared).

3) Credible – indicator should be accessible to non-experts, unambiguous and easy to interpret.

4) Easy – data related to the indicator should be collectable with available resources (refers to both ease and cost-effectiveness of data collection).

(29)

5) Robust – indicators should be sensitive enough to monitor changes and take into account the time-lag between the action and the expected change.

While there are overlaps between these two sets of quality criteria (e.g. relevance appears in both sets, and robustness in RACER and specificity in SMART refer to the same quality), the two sets are inherently different. While the well-known SMART criteria provide easily understandable guidance for defining objectives in a specific project or programme context (especially since they include context-specific criteria like time-boundedness and achievability), the RACER criteria are better suited to evaluating the quality of indicators in a more general manner. In addition, through the inclusion of acceptance, the RACER criteria are in accordance with the growing trend of standardization and coherence in the humanitarian sector, whereby good indicators are thought to be shared ones, allowing easier aggregation, comparability, and analysis. This trend is exemplified, among others, by the development and use of the Sphere standards and their companions, which formulate common standards and indicators for the humanitarian sector (Sphere, 2018). The RACER criteria are also used and promoted by the European Commission (2005, p. 46) specifically for indicator definition. For these reasons, this thesis relies on the RACER criteria to assess the indicators in use in evaluating livelihoods interventions, with the corresponding analytical framework developed in chapter 5.1. The RACER criteria for assessing indicators, together with BOND’s evidential quality criteria for assessing evaluation frameworks in general, form the methodological backbone of this thesis.
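An assessment against the RACER criteria can be sketched as a simple scoring rubric. The snippet below is a hypothetical illustration only: the criteria names follow the European Commission (2005), but the 0–2 scoring scale, the pass threshold, and the example indicator and scores are assumptions made here for demonstration, not values drawn from the sector or from the analysis in chapter 5.

```python
# Hypothetical RACER scoring rubric. The criteria names follow the
# European Commission (2005); the 0-2 scale, the pass threshold, and
# the example indicator/scores below are illustrative assumptions.

RACER_CRITERIA = ["relevant", "accepted", "credible", "easy", "robust"]

def assess_indicator(name, scores, threshold=7):
    """Sum per-criterion scores (0 = not met, 1 = partly met,
    2 = fully met) and flag whether the indicator clears an
    assumed overall quality threshold."""
    missing = [c for c in RACER_CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    total = sum(scores[c] for c in RACER_CRITERIA)
    return {
        "indicator": name,
        "total": total,
        "max": 2 * len(RACER_CRITERIA),
        "passes": total >= threshold,
    }

# Example with invented scores for a common livelihoods indicator.
result = assess_indicator(
    "Household income relative to baseline",
    {"relevant": 2, "accepted": 2, "credible": 1, "easy": 1, "robust": 1},
)
print(result["total"], result["passes"])  # 7 True
```

A rubric like this only makes an assessment transparent and repeatable; the substantive judgement still lies in assigning each per-criterion score.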

One indicator on its own is, however, of limited utility for understanding the effects of an intervention. It is not sufficiently reliable or conclusive; therefore, several indicators need to be used together in order to account reliably for the change as well as for the contribution that any one intervention has made to that change. This is one way in which findings can be triangulated, that is, internally verified to increase their accuracy and reliability.

Other ways of triangulating involve using different sources of data and methods of data collection. Using several sources and methods helps to verify and corroborate findings, and the weakness or bias of any one method or data source can be compensated for by the strengths of another, thereby increasing the validity and reliability of the results (DFID, 2005). Indeed, as emphasized by Canteli, Morris and Steen (2013), “the main guarantor of the validity of the findings is the very broad range of sources and methods used to assemble evidence and its triangulation” (p. 3; see also Knox Clarke and Darcy, 2014, p. 38). The importance of triangulation is also highlighted in BOND’s evidential quality criteria.
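As a rough sketch of how triangulation across indicators might be operationalized, the snippet below treats a change as corroborated only when a sufficient share of independent indicators or sources agree on its direction. Both the majority rule (two-thirds agreement) and the indicator names are assumptions made here for illustration, not an established standard from the sources cited above.

```python
# Hypothetical triangulation check: a change is treated as corroborated
# only when a sufficient share of independent indicators/sources agree
# on its direction. The 2/3 rule and indicator names are invented.

def corroborated(changes, min_agree_share=2/3):
    """changes: dict mapping indicator -> observed direction
    (+1 improved, -1 worsened, 0 no change). Returns the agreed
    direction if enough indicators point the same way, else None."""
    for direction in (+1, -1):
        agreeing = sum(1 for v in changes.values() if v == direction)
        if agreeing / len(changes) >= min_agree_share:
            return direction
    return None

signals = {"household income": +1, "food consumption score": +1,
           "key informant interviews": +1, "asset ownership": 0}
print(corroborated(signals))  # 1 (three of four sources agree)
```

In practice such a check would sit alongside, not replace, qualitative judgement about why the sources agree or disagree.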

As the RACER criteria point out, good indicators are usually also shared indicators, that is, accepted and actively used by different stakeholders. To facilitate the sharing of common indicators in the humanitarian field, a number of donors and agencies have developed their indicator registries, databases, and lists. Such registries come in different shapes and sizes, and will be reviewed below (see chapter 5.2).

3.2.4. Units of analysis

Besides indicators, another key element of the evaluation framework which needs to be defined is the unit of measurement and analysis. How counting or estimation is conducted, including the very choice of the unit of enumeration, can reflect basic assumptions upon which an assistance operation has been constructed (Telford, 1997, p. 14). Different units of enumeration can be used to quantify a population: the individual beneficiary, head of household, household, family, dwelling, or targeted group (e.g. the vulnerable, a political or ethnic group). Such an apparently innocuous step as the choice of statistical unit can in fact be a clear statement of priority within an assistance programme (ibid.). In the framework of this thesis, three main units of measurement are used: beneficiary, household, and community.

The term ‘beneficiary’ is used to describe those who are affected by a crisis and receive assistance because of that. They may be direct recipients of goods and services, or indirect beneficiaries of activities such as the rehabilitation of water supplies or health facilities. The term has been criticized for implying passivity on the part of those benefiting from the assistance provided (Vowles, 2018). Alternative terms, such as ‘constituents’ or ‘clients’, have been proposed but have not taken hold in the sector (ibid.). Due to its common use in the sector, however, the term ‘beneficiary’ is also used in this thesis. Nevertheless, the so-called beneficiaries are and should be regarded as active participants in determining their own fates.

A ‘household’ is commonly defined as a group of people, with or without blood relation, who live together in the same lodging and who share the same food. In addition, household members usually recognize the authority of the same household head (man or woman). While humanitarian actors have no standard, agreed definition of the household, it is usually presumed to represent the most appropriate social unit for aid delivery (Flintan et al., 2019, p. 1). The concept of household may be criticized, however, for failing to represent the complex structures of family and living arrangements in all cases and regions, and the temporal and spatial dynamics and variabilities within households, including the place of households within a wider social landscape (ibid.). In addition, humanitarian actors often make assumptions about average household size and gender composition in the absence of serious qualitative and quantitative evidence to support them, thereby risking misrepresenting the actual complexity on the ground (Telford, 1997, p. 14). This, in turn, may have potentially significant implications for measuring household welfare and production (Beaman & Dillon, 2012). Nevertheless, due to its common use in the sector and in programme evaluation, the term ‘household’ is used as a unit of measurement in the framework of this thesis, keeping in mind that the exact definition of this term may vary between regions.

While the ‘community’ is a widely invoked concept in the humanitarian sector, it is used in a large variety of ways, “very often without reflection on its meaning, or even a definition” (Titz, Cannon & Krüger, 2018, p. 1). The ‘community’ is often not a given, holistic, and harmonious social body or institution; if imposed by outside actors, it may serve to “disguise and neglect internal hierarchies, frictions and conflicts and is rather for the ‘outsiders’ convenience than serving the interests of the group members” (ibid., p. 14). In the framework of this thesis, however, the term ‘community’ is defined primarily as the small-scale geographical area where the humanitarian intervention takes place. Depending on the context, it may range from a village to a city neighborhood, from an informal rural refugee settlement with a widely recognized spokesperson to a religious group scattered around an urban area but feeling a belonging to a common social structure. In the context of this thesis and in order to avoid the common problems associated with the use of this term, the ‘community’ is invoked primarily for the purpose of identifying key informants for assessing the broader impact of a particular humanitarian intervention, such as locally recognized community leaders (elected or otherwise), social workers, business owners, and religious leaders.

3.3. Challenges to evaluation in humanitarian context

Evaluation of humanitarian action faces two broad sets of challenges: those that are common to all evaluations, often accentuated in humanitarian contexts, and those that relate specifically to evaluating humanitarian action, often in difficult environments. The very existence of a disaster makes humanitarian impact evaluation considerably more difficult when compared to similar assessments carried out in regular settings, e.g. of social or development programmes. When evaluating the impact of humanitarian programming, activities may need to be carried out in situations of “mismatch between resources and needs, disruptions to everyday life, security concerns, the typical absence of baseline data, logistic, medical and other hurdles to sampling and data collection, with finding a valid counterfactual, and with ethical implications” (Puri et al., 2017, p. 520).

This thesis has set its focus on conflict-affected situations, which create a particular set of challenges for evaluations. The conflict-affected status of a country, as used in this thesis, is defined by the World Bank Group’s list of fragile and conflict-affected situations (FCS), identified based on a threshold number of conflict-related deaths relative to the population (World Bank, n.d.).

There are, therefore, numerous challenges which need to be taken into consideration when evaluations are carried out in conflict settings, from the safety and security of staff to actual access to the people concerned. These challenges may render evaluations very costly, incomplete, or even nearly impossible to carry out. This has, in turn, led evaluators of humanitarian action to devise creative means of gathering relevant data and to set up evaluation mechanisms that can effectively measure the impact of programmes on their beneficiaries. In what follows, the main challenges to the evaluation of humanitarian action are highlighted and possible mitigation measures are suggested.


First, insecurity or damaged infrastructure may mean lack of access to beneficiaries or other stakeholders in conflict environments. Limited ability to generate and validate evidence when access is lacking may restrict the overall feasibility of evaluating programmes in crisis settings. There are, however, creative ways being proposed to carry out evaluations remotely, such as by using local evaluators, by carrying out surveys online or via mobile phones or SMS, or by crowd-sourcing data via social media platforms (Norman, 2012). Each of these methods poses additional hurdles to gathering valid and reliable data, but in insecure and volatile contexts, using them may prove a necessary trade-off to make.

Second, lack of reliable baseline data may make it difficult to evaluate the effects of interventions when they end. Data may have been destroyed as a result of the crisis, or may have become irrelevant when, for example, a large proportion of the population has been displaced. In these circumstances, it may be worthwhile to conduct interviews to ask crisis-affected people and local key informants about the extent to which conditions have changed and the reasons for the changes, that is, to use recall methods for evaluation purposes (ALNAP, 2016, p. 33).

Third, conflicts often intensify differences in perspective and thus events, even the crisis itself, may be subject to widely differing interpretations. Providing assistance to different population groups may have political implications, particularly in complex emergencies where there are already political and factional rifts between different groups within an area (Knox Clarke & Darcy, 2014, p. 19). Under these circumstances, humanitarians cannot assume that information – and particularly information from key informants – is objective or accurate; nor can they automatically assume that they themselves are free of bias (ibid.). For evaluation purposes, it is important to familiarize oneself with the fault-lines in the conflict and gather as many different points of view as possible, particularly from groups on different sides in the conflict, and ensure these different viewpoints are taken into account when analyzing the evaluation data (ALNAP, 2016, p. 34).

All in all, evaluations of humanitarian action often take place in data-poor, politicized and complex environments, where physical access is limited, populations are mobile, and where there are many different actors, all of whom wish to confirm their view of what has happened. As Knox Clarke and Darcy observe, the very conditions “that make information collection so important are precisely those that make it extremely difficult to do” (2014, p. 19).

3.4. Methods and study designs of evaluation

Evaluations of humanitarian programmes have been carried out, or proposed, in several ways, which will now come under closer scrutiny. Building on the discussion above about the challenges of attributing observed changes to particular interventions, this chapter discusses the benefits and challenges of different types of evaluation study designs in order to inform the formulation of the generic evaluation framework for livelihoods interventions in chapter 6.

In what follows, four general approaches to evaluation are discussed: ex-post evaluations, randomized controlled trials, quasi-experimental studies, and observational studies. The evaluation framework proposed later in this thesis falls under the last of these. In addition, evidence-aggregating methods such as systematic reviews and meta-analyses are briefly introduced.

3.4.1. Ex-post evaluations

The typical type of impact assessment carried out in the humanitarian sector today is the ex-post evaluation: an evaluation carried out after the implementation of a humanitarian intervention, a programme, or a set of programmes, relying mostly on primary data gathered at one point in time and on historical secondary data. This approach aims to capture the overall impact of the intervention on different groups of people in the affected society, typically based on macro-level data, and thereby produces results which are rather broad and indirect. It is the method typically used to evaluate the humanitarian community’s overall response to emergencies.

One of the best-known and most influential ex-post evaluations is the Tsunami Evaluation Coalition’s (TEC) report on the humanitarian assistance carried out in response to the Indian Ocean tsunami of 2004 (Tsunami Evaluation Coalition, 2006). The report contained several important conclusions; for example, it was found that humanitarian assistance was
disseminated disproportionately to areas that were easily served by transportation, rather than according to the needs of different groups of the affected population, and that older and disabled people were often excluded from assistance because they were poorly informed about its availability.

Such an approach, while highly relevant for assessing the overall response to a disaster, has obvious limitations. For one, macro-level response evaluations cannot effectively pinpoint which parts of the response fared better than others, or to what degree. For example, the TEC evaluation could not determine how much worse off the aforementioned vulnerable groups were in comparison to the population groups which did receive assistance, because the report relied on macro-level secondary data which did not allow for closer scrutiny.

This has led many critics to argue that while the number of evaluations has proliferated, there is a general lack of “theory-based, reliable evidence causally linking interventions to relevant outcomes” (Bozzoli, Brück & Wald, 2013, p. 519; authors’ emphasis). Reviewing impact evaluations carried out in the humanitarian sector, Puri et al. (2017, pp. 526-527) found more than 900 evaluations of humanitarian interventions published between 2001 and 2014, of which only 31 could be classified as ‘true’ impact evaluations (as opposed to general-scope ex-post evaluations).

3.4.2. Randomized controlled trials (experimental studies)

In response, many researchers have advocated an empirical approach to determining the effectiveness of development and humanitarian programming. In their paper, Esther Duflo and Michael Kremer (2003) make the case for testing programme effectiveness in the field by utilizing randomized controlled trials (RCTs), best known from the medical sciences. (It should be noted that Duflo and Kremer, together with Abhijit Banerjee, were awarded the 2019 Nobel Memorial Prize in Economics for their work on RCTs in the development sector.) This approach effectively means dividing ‘subjects’ randomly into two groups (‘treated’ and ‘un-treated’, or ‘aided’ and ‘un-aided’ in this case) and administering the intervention only to the first. Effectiveness can then be inferred from a comparison of the outcomes of the two groups in the chosen issue area: because assignment is random, the groups are comparable in expectation, and differences in outcomes can be attributed to the intervention without fear of selection bias.
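The core logic of random assignment can be illustrated with a short simulation. The following Python sketch is purely illustrative: the household incomes, the ‘true’ programme effect of 150, and the noise levels are all hypothetical values chosen for the example, not figures drawn from any real livelihoods programme.

```python
import random
import statistics

random.seed(42)  # fixed seed so the simulated example is reproducible

# Simulate baseline monthly incomes for 1,000 households (hypothetical units)
baseline = [random.gauss(500, 100) for _ in range(1000)]

# Random assignment: each household has a 50% chance of receiving the intervention
assignment = [random.random() < 0.5 for _ in baseline]

TRUE_EFFECT = 150  # the effect built into the simulation (unknowable in a real study)

# Endline outcomes: treated households gain the true effect; all households
# are subject to additional random variation
endline = [
    income + (TRUE_EFFECT if is_treated else 0) + random.gauss(0, 50)
    for income, is_treated in zip(baseline, assignment)
]

treated = [y for y, t in zip(endline, assignment) if t]
control = [y for y, t in zip(endline, assignment) if not t]

# Because assignment was random, the two groups are comparable in expectation,
# so the simple difference in group means estimates the average treatment effect
estimated_effect = statistics.mean(treated) - statistics.mean(control)
print(f"Estimated effect: {estimated_effect:.1f} (true effect: {TRUE_EFFECT})")
```

With roughly 500 households per group, the difference in means lands close to the built-in effect; in a real evaluation the ‘true effect’ is of course unobservable, which is precisely why the randomized design is needed to recover it.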
