
Impact assessment of capacity building: A comparative case study on the ability of organisations to assess the impact of their programmes

by

Myrthe den Ouden

Supervisor: Dr Clara Egger – University of Groningen
August 2019

This thesis is submitted for obtaining the master’s degree in International Humanitarian Action. By submitting the thesis, the author certifies that the text is from his/her hand, does not include the work of someone else unless clearly indicated, and that the thesis has been produced in accordance with proper academic practices.


Abstract

Capacity building activities are important to humanitarian practice because they increase the level of knowledge and skills among first responders, which improves the recovery process and increases resilience to future emergencies. This thesis uses the definition of capacity building formulated by Brown, LaFond and Macintyre (2001), who define it as “a process that improves the ability of a person, group, organisation or system to meet its objectives”. The impact of capacity building activities is not always thoroughly evaluated. If the impact of capacity building programmes is unknown, it is harder to improve future programmes and to provide an evidence base to justify future funding of similar programmes. This study aims to contribute knowledge on the factors that enable or impede impact evaluations of capacity building projects. Building on previous literature in other fields, evaluation in the humanitarian sector and organisational change theory, this thesis answers the question: what factors influence the ability of an organisation to assess their capacity building projects?

Based on a review of the literature, a list of possible factors was identified. One of these factors, the relation with the client organisation, was chosen as the explanatory variable in a comparative case study. The cases were chosen to be as similar as possible on all the other factors. The interviews with employees of the selected organisations were analysed using a thematic analysis. The analysis showed that the relation with the client organisation was indeed a deciding factor in an organisation’s ability to assess the impact of their programmes. However, the results also showed that the subfactors of this relation were different from those expected. The pre-training process was more important than expected based on the literature. Further research is needed to find out what other factors influence impact evaluation in the humanitarian sector as a whole.


Contents

Abstract ... 2
Contents ... 3
List of abbreviations ... 5
List of tables ... 5
List of figures ... 5
Preface ... 6
1. Introduction ... 7
2. Theoretical framework ... 13
2.1 Conceptual framework ... 13

2.1.1 Capacity and capacity building ... 13

2.1.2 Impact and sustainability ... 15

2.1.3 Sustainable impact in the humanitarian sector ... 16

2.3 Impact evaluation in practice ... 17

2.3.1 The Kirkpatrick model ... 17

2.3.2 Weaknesses of the Kirkpatrick model ... 19

2.3.3 The amount of evaluation being done at each level ... 19

2.4 Factors influencing an organisation’s ability to assess their impact ... 23

2.4.1 The evaluation of capacity building in the private sector ... 23

2.4.2 Organisational change and impact assessment ... 26

2.4.3 Literature on evaluation in the humanitarian sector ... 27

2.6 Theoretical assumptions and argument ... 28

3. Methodology ... 31

3.1 Research approach ... 31

3.2 Operationalisation of the dependent and independent variables ... 32

3.4 Case selection ... 32

3.4.1 Description of selected organisations ... 34

3.6 Data collection ... 35

3.7 Data analysis ... 36


3.7.2 The parameters of the thematic analysis ... 37

3.7.3 Data analysis process ... 38

4. Results from the survey and interviews ... 40

4.1 Survey results ... 40

4.2 Interview results ... 44

4.2.1 Initial codes ... 45

4.2.2 Codes concerning the relation with the client organisation ... 46

4.2.3 Codes concerning the control variables ... 48

4.2.4 Relation between codes ... 52

4.3 Discussion of the results in the context of the theoretical framework ... 53

4.3.1 Relation with client organisation... 54

4.3.2 The role of the control variables ... 55

4.4 The limitations of the findings ... 56

4.5 Suggestions for future research ... 57

4.6 Implications for evaluation practice ... 58

5. Conclusion ... 60

References ... 64

Peer reviewed articles ... 64

Books and chapters of edited books ... 68

Reports ... 69

Other ... 71

Annexes ... 72

Annex 1: the survey ... 72

Annex 2: the interview guide ... 77

Annex 3: the interview transcripts ... 78

Organisation 5 ... 78

Organisation 6 ... 85


List of abbreviations

ALNAP Active Learning Network for Accountability and Performance
ASTD American Society for Training and Development
MEL Monitoring, Evaluation and Learning
M&E Monitoring and Evaluation
OECD Organisation for Economic Co-operation and Development
Q&E Quality and Effectiveness

RUG Rijksuniversiteit Groningen

UN United Nations

US United States

List of tables

Table 1: Evaluation of programmes ... 20

Table 2: Evaluation in organisations ... 21

Table 3: Percent not evaluating at each level for selected reasons ... 26

Table 4: The six steps of a thematic analysis ... 38

Table 5: Phases of the thematic analysis in this research ... 39

Table 6: Survey results - monitoring systems ... 42

Table 7: Survey results - programme design ... 43

Table 8: Initial codes ... 45

Table 9: Reviewed codes ... 51

List of figures

Figure 1: Relation with client organisation, its related and connected codes ... 53


Preface

Throughout my master’s I became increasingly interested in the monitoring and evaluation of humanitarian action. My internship at RedR UK in the quality and effectiveness team was therefore a great opportunity to learn more about this field in humanitarian action. Conversations with colleagues led me to the topic of this thesis, though it took a while for the topic to become specific enough to formulate a question. Writing this thesis has been interesting and challenging at the same time, but I am thankful to have been able to write it about a topic I believe is important.

Writing this thesis would not have been possible without the support of several people. First, I would like to thank my supervisor, Dr Clara Egger, whose expertise and knowledge were of invaluable help to me throughout the entire process. Without her, I would probably still be writing possible options for a research question. I would also like to acknowledge the help of my colleagues during my internship at RedR, who, probably unknowingly, helped me find possible topics for this thesis. I would particularly like to thank my internship supervisor, Selma Scheewe, for helping me think about possible topics and giving me possible starting points.

I would like to thank the participants in this research, who shall remain anonymous, but who spent their valuable time filling out a survey or doing an interview, which was of great help to me.

I want to thank my family and Ulbe for talking some confidence into me and being a sounding board and a proof-reader. Lastly, I want to thank my friends for dragging me to the library on days with little motivation and for making me take breaks on days with a lot of motivation.


1. Introduction

During the 2016 World Humanitarian Summit, a new way of working was introduced. One of its goals was that “efforts should reinforce and strengthen the capacities that already exist at national and local levels” (Agenda for Humanity, 2017). This statement is part of a larger movement within the humanitarian sector to reduce the gap between relief and recovery. As part of this movement towards not only relief but also recovery, capacity building has become an integral part of the humanitarian sector. During the 31st International Conference of the Red Cross and Red Crescent, a panel on ‘Bringing relief to development’ emphasised the need for ‘long-term capacity building strategies’ in the humanitarian sector (Tamminga, 2011). Capacity building is part of a constructive attitude towards partnerships in the humanitarian sector, which recognises that local organisations have intrinsic capabilities on which humanitarian cooperation should be based (Audet, 2011). With capacity building becoming a bigger part of the humanitarian sector, it is important to evaluate the impact of capacity building programmes. Capacity building can be aimed at different levels of impact, such as individuals, organisations, communities, or age or gender groups. Most studies included in this thesis deal with capacity building activities aimed at organisations (Popescu & Dewan, 2009; Klein et al., 2009; Minzner, Klerman, Markovitz, & Fink, 2013; Markovitz & Magged, 2008; Campbell, 2011). Impact evaluations enable organisations to improve current and future programmes. They also extend the evidence base of capacity building programmes, which is essential for the development of the humanitarian sector and the gathering of funds.

Unfortunately, the current state of impact evaluation in the capacity building field leaves room for improvement, both in academic evaluations and in evaluations done by organisations themselves. Research about impact evaluation in the humanitarian sector notes a lack of quality evidence (Roberts & Hofmann, 2004; Smith, Roberts, Knight, Gosselin & Blanchet, 2015; Blanchet et al., 2017). Furthermore, the few studies on the impact of capacity building often struggle with internal validity issues, such as the absence of baseline data or the lack of a control group, or investigate only the direct outcomes instead of the impact of a programme (Minzner, Klerman, Markovitz & Fink, 2013).

Evaluations done by organisations are also not always up to the mark. Most evaluation frameworks are built on the Kirkpatrick model of evaluation, which has four levels: reaction, learning, behaviour and results (Twitchell, Holton & Trott, 2008; Pearson, 2011; Kirkpatrick partners, 2019). However, few include all levels of the model (Twitchell, Holton & Trott, 2008; Moller & Mallin, 1996). Though little research has been done on the evaluation of capacity building in the humanitarian sector, research in other sectors shows that evaluations of training and capacity building programmes mostly pay attention to the direct reactions to the event, not its longer-term impact. The majority of trainings are evaluated at the level of reactions, which includes the participants’ opinion on the training and its relevance to their job (Moller & Mallin, 1996; Twitchell, Holton & Trott, 2008). However, less than half of all programmes are evaluated at the level of results, which is the degree to which targeted outcomes occur as a result of the training (Moller & Mallin, 1996; Twitchell, Holton & Trott, 2008).

Some of the previous literature also tries to investigate why organisations do or do not evaluate a project at all levels and names several possible factors, including a lack of resources, underqualified staff, a lack of time, a lack of managerial support and methodological issues (Moller & Mallin, 1996; Twitchell, Holton & Trott, 2008; Attia, Honeycutt & Attia, 2002). Even though the studies on the amount of evaluation at different levels were done in various sectors of industry, it is likely their findings can be applied to the humanitarian sector.

When capacity building is done in the context of the humanitarian sector, it adds a set of difficulties for evaluation. Wing (2004) lists seven issues that the humanitarian field encounters when assessing the effectiveness of capacity building. They include the issue of whose goals should be used to measure impact, the unrealistic timetables which are set for capacity building and its evaluation, and the high turnover in the humanitarian sector. The issue of high turnover creates the need to measure not only the impact a capacity building project has had on the individuals in the organisation, but also the impact the training had on the systems and processes of the organisation (Wing, 2004). This final factor is related to research in the field of organisational change theory, which states that the success of a training is partly determined by events that take place after the training has finished (Haddock, 2015; Colquitt et al., 2000; Arthur, Bennett, Edens & Bell, 2003). This, along with the final factor mentioned by Wing (2004), means that the relation the training organisation has with the client organisation is also a possible factor determining an organisation’s ability to assess the impact of their capacity building activities.

The factors mentioned in the previous paragraph have been studied in various fields of literature, such as organisational change literature, literature on training evaluation in the private sector and literature on the evaluation of humanitarian projects. All these fields propose different factors that might influence an organisation’s ability to assess their impact, but they have never been applied to capacity building activities in one study. This thesis aims to investigate these factors in more detail and assess their relevance to the evaluation of capacity building projects. The main question that forms the basis of this thesis is: what factors influence the ability of an organisation to assess their capacity building projects? This question will be answered using the following sub-questions. What factors have been identified by past literature on the topic of impact assessment of capacity building? What methodologies and target populations were used to identify these factors? Why do people who are responsible for the evaluation of capacity building activities find these factors to be relevant to their work, if at all? Does a difference between the interviewed organisations in one of the factors coincide with a difference in that organisation’s ability to assess the impact of their activities? By answering these questions, this thesis aims to contribute to the knowledge on the factors that enable or impede impact evaluations of capacity building projects. This will enable humanitarian organisations to see which factors are impeding evaluations and what they can do about them. It is essential that the impact of capacity building programmes is monitored in order to implement and improve capacity building projects. Because most research on the quality of evaluation has been done in the health cluster of the humanitarian field, this thesis will also broaden the knowledge base of humanitarian evaluations in other sectors.

This thesis uses a qualitative approach, with semi-structured interviews as the main data source. It employs, at the empirical level, a most similar systems design, also called the method of difference. This design compares cases that are similar in most variables but different in one (Levy, 2008). The dependent variable that was compared between the organisations is their ability to assess the impact of their capacity building programmes. The organisations were chosen to be as similar as possible in terms of evaluation methodologies, time frames and types of questions. The explanatory, or independent, variable that differed between the organisations was the relation the organisation had with their client organisation, measured by their pre-training and post-training programme design. Organisations that provide capacity building services were contacted in order to get a clear picture of their monitoring system and their capacity building projects. This was done via a survey, which was sent out to a list of thirteen preselected organisations. These organisations were selected on the requirements that they offered capacity building programmes intended to improve humanitarian action and that they had a website in English which was featured on the Reliefweb.int training page. Sampling cases like this could have led to a sampling bias. By looking up organisations on the Reliefweb.int training page and only shortlisting English-speaking organisations, some organisations may have been excluded. All organisations that were on Reliefweb.int also had English websites, so it is their presence on Reliefweb.int, not language, that is the limiting factor. Unfortunately, Reliefweb.int did not publish any information about the global coverage of their users, which makes it hard to determine whether there might be a selection bias.

Based on the responses to the survey, the most suitable organisations for a most similar case design were selected. Four organisations were selected to take part in the research based on the criteria described above. However, two of the four organisations did not wish to participate in the research. This forms a second limitation of this thesis. Even though the loss of two cases reduced the amount of data in the thesis, the data that was collected still formed a good comparative case study design, because the remaining organisations had similar monitoring systems and used the same evaluation methodologies. The Monitoring, Evaluation and Learning (MEL) coordinators or managers of these organisations were interviewed about their monitoring system and the factors they thought would influence their organisation’s ability to assess the impact of their programmes. The interviews took place over Skype.

During the entire research process, this thesis complied with the RUG (Rijksuniversiteit Groningen) International Relations ethics guide. The following measures have been taken to ensure compliance with this guide. First, before both the survey and the interview, participants were informed of the scope and subject of the research, the voluntariness of participation and the right to withdraw consent. In the survey, this was done in the first question (which was mandatory). In the interview, this was done by explaining consent on record before starting the interview. The personal information gathered during the survey consisted of the name of the employee’s organisation. The personal information gathered during the interview consisted of the name, email address and job title of the interviewee. The names of interviewees and organisations and the email addresses will only be accessible to the researcher and supervisor and will be deleted after a maximum of two years. The job titles of the interviewees will be published in the research in order to be able to justify the choice of interviewee. A possible negative consequence for participants is that they might find it uncomfortable to talk about programmes that had little or no impact. This discomfort was decreased by assuring participants of the anonymity of the study.

As discussed earlier in this introduction, the literature points to several factors that are relevant to an organisation’s ability to assess the impact of their capacity building activities, namely ‘the amount of resources’, ‘managerial support’, ‘underqualified staff’, ‘lack of time’, ‘attitude of donors’, ‘methodological issues’, ‘unclear definition of impact’ and ‘relation with client organisation’. The main result of the research process was that the factor ‘relation with client organisation’ was one of the most prominent themes in the interviews, along with ‘the amount of resources’, ‘methodological issues’, and ‘a clear definition of impact’. However, the codes linked to the factor ‘relation with client organisation’ were different than anticipated based on the literature. One of the organisations did mention that having more contact with the client organisation after the training would benefit the organisation’s ability to assess the impact of their activities. However, the code that dominated this factor was the different stance the organisations took on evaluation. One organisation saw evaluation as one of the given tasks of the organisation and always included it in their programmes, whereas the other organisation presented evaluation as an option that the client organisation could choose, which meant that they had to rely on incidental evidence of impact. Based on the results, it can be concluded that the relation with the client organisation has a deciding effect on an organisation’s ability to assess the impact of their capacity building activities. One of the implications for humanitarian organisations would be that they need to take a critical look not only at their evaluation methods but also at their pre- and post-training design in order to improve their impact assessment capabilities.

This thesis will consist of the following chapters: chapter 2, which will discuss the concepts used in this thesis, past literature on the topic and the theories that the thesis is based on, and chapter 3, which will outline the methods of data collection and analysis. This will be followed by a description of the results in chapter 4 and a conclusion in chapter 5.


2. Theoretical framework

This chapter discusses the key concepts used in the research, namely the concepts of capacity, capacity building and sustainable impact. After the conceptual analysis, this chapter discusses the theories behind the practice of capacity building evaluation, including the Kirkpatrick model and its weaknesses. Following that there will be an analysis of the literature on training evaluation and organisational change. This chapter will also take a closer look at the literature on evaluation in the humanitarian sector. Finally, this chapter will discuss what factors could influence an organisation’s ability to assess the impact of capacity building programmes.

2.1 Conceptual framework

Three concepts form the conceptual backbone of the thesis: ’capacity’, ‘capacity building’ and ‘sustainable impact’. They will be discussed in the following section in that order.

2.1.1 Capacity and capacity building

The word ‘capacity’ can have multiple meanings. It can mean the amount that can be held or produced by something, the ability to do something, or a particular role (Cambridge Dictionary, 2019). Potter and Brough (2004) recognise that each of these meanings can be associated with capacity building in the humanitarian or development field. A hospital might not have enough capacity in the sense that it does not have enough beds; it might not have enough knowledge or skills to carry out the work; or it might lack decision-making capacity, because the key decision makers are not in managerial positions (Potter & Brough, 2004). Though all these problems may be described as a hospital not having enough capacity, they have very different solutions. Most definitions in the field of capacity building lean towards the second meaning, the ability to do a certain thing. The Organisation for Economic Co-operation and Development (OECD) has defined capacity as “the ability of people, organisations and society as a whole to manage their affairs successfully” (OECD, 2006). Though it covers the meaning of capacity, this definition is regarded as quite broad (Aragón, 2010). In the book Measuring capacity building, Brown, LaFond and Macintyre (2001) define capacity as “the ability to carry out stated objectives”. This definition is shared by many in the academic field (Baillie et al., 2008; Goodman et al., 1998; Shrimpton et al., 2013) and will be used in this thesis.

While there is some consensus about the meaning of ‘capacity’, the meaning of ‘capacity building’ is more disputed. It is sometimes equated with training (Bower, 2000; Fricchione et al., 2012); however, this is not widely agreed upon. The difference between training and capacity building is discussed by Potter and Brough (2004) in their study on systematic capacity building. Potter and Brough (2004) emphasise that capacity building is not a synonym of training, because capacity building to them implies creating sustainable and robust systems, not just more knowledge. This has implications for the evaluation of capacity building. An increase in knowledge can be evaluated directly after an event by assessing the knowledge of a participant. However, an improvement in sustainable systems takes longer to develop, and therefore the evaluation of a capacity building event will have to take place over a longer period of time.

There is a widespread consensus that capacity building is a process (OECD, 2006; Brown, LaFond & Macintyre, 2001; Potter & Brough, 2004; Shrimpton et al., 2013; LaFond, Brown & Macintyre, 2002; Lusthaus, Anderson & Murphy, 1995). More contested are the goal and target of such processes. The OECD describes capacity building as a process intended to “unleash, strengthen, create, adapt and maintain capacity”. It is based on their broad definition of capacity mentioned in the previous paragraph, which makes this definition quite broad as well (OECD, 2006). According to Lusthaus, Anderson and Murphy (1995), the goal of capacity building processes is to “learn to develop and implement strategies in pursuit of objectives for increased performance in a sustainable way”. This definition is more specific, limiting capacity building to only the development and implementation of strategies and not including the development of skills based on an existing strategy. The conceptualisation of capacity building shared by some researchers is that capacity building is a process aimed at improving an actor’s ability to meet certain objectives (Wing, 2004; LaFond, Brown & Macintyre, 2002; Brown, LaFond & Macintyre, 2001). Still, these studies are not in agreement as to what those objectives can be. Wing (2004) describes these objectives as “its mission”; Brown et al. (2001) use the words “its objectives”; and LaFond et al. (2002) write “stated objectives”. Though they use different wordings, these three definitions share the notion of capacity building as a process that increases the ability to achieve some kind of objectives.

What distinguishes these definitions is the actors they list as the possible recipients of capacity building. Wing (2004) has a quite limited definition, since it names only ‘an organisation’ as the recipient of capacity building. Though this research will mainly focus on the effect of capacity building on organisations, it still acknowledges that the recipients of capacity building can include other entities as well. Therefore, the definition by Wing (2004) was not chosen for this thesis. Brown et al. and LaFond et al. have relatively similar definitions, because Brown et al. (2001) mention ‘a person, group, organisation or system’ and LaFond et al. (2002) state ‘a person or entity’. The word ‘entity’ in the definition by LaFond et al. (2002) includes almost every group imaginable, whereas ‘a person, group, organisation or system’ in Brown et al. (2001) is more specific. This led to the use of the definition by Brown, LaFond and Macintyre (2001), which defines capacity building as “a process that improves the ability of a person, group, organisation or system to meet its objectives.”

2.1.2 Impact and sustainability

The words ‘sustainability’ and ‘impact’ are commonly used, both in combination and separately. The following section explores the meaning of the phrase ‘a sustainable impact’ for this thesis. The word ‘impact’ has many different meanings. One example is the definition used by Roche (2002): “a significant or lasting change in people’s lives, brought about by a specific action or series of actions”. This definition was not chosen, because it is not specific enough. Instead, the definition by ALNAP (Active Learning Network for Accountability and Performance) was chosen for this thesis, because it is more specific and acknowledges that programmes may influence not only individuals, but also communities or institutions. The ALNAP guide on the evaluation of humanitarian action describes impact as: “The social, economic, technical, or environmental effect of a project on individuals, gender and age groups, communities, and institutions.” (ALNAP, 2013, p. 16). The guide then adds that impact can be both positive and negative as well as intentional and unintentional. Impact is usually considered to be the highest level of effect or result a project has, preceded by outputs and outcomes (ALNAP, 2013).

Sustainability also has multiple meanings, but in this thesis it is used to define a certain kind of impact. A sustainable impact is an impact that is able to continue at a particular level (Cambridge Dictionary, 2019). Putting the definitions of impact and sustainability together, a sustainable impact is the effect of a project on individuals, gender and age groups, communities, and institutions that is able to continue at a particular level. For example, if the impact of a certain project is a lower rate of childbirth mortality, a sustainable impact would be a lower rate of childbirth mortality that continues for a certain period of time.

2.1.3 Sustainable impact in the humanitarian sector

In the past decades there has been a debate about whether humanitarian action should aim to have a sustainable impact. Some argued that humanitarian action should only be aimed at alleviating suffering in emergencies (Barnett, 2005). Since capacity building is usually part of a longer-term plan, it used to be mainly a part of the development sector (Kopinak, 2013). It is still seen as a key component of development aid, because it has the potential to increase the sustainability of the impact of development aid (Kopinak, 2013). However, in the last two decades the need to build local capacities has become apparent to the humanitarian sector as well. In 2008, Riddell stated that one of the weaknesses of the humanitarian sector has been the failure to effectively contribute to building local capacities. This is in line with the new way of working introduced at the humanitarian summit in 2016. The new way of working includes the goal that wherever possible “efforts should reinforce and strengthen the capacities that already exist at national and local levels” (Agenda for Humanity, 2017). It also includes cooperation between the humanitarian sector, the development sector, the private sector, governments and United Nations (UN) agencies, blurring the line between these sectors and their goals. This, together with the reality of protracted crises, means that the humanitarian sector is slowly accepting that it should not only respond to emergency needs, but also to the need for “long-term capacity-building strategies” (Tamminga, 2011). And with these “long-term strategies” comes the need for impact evaluations, to evaluate whether or not the long-term strategies were successful.

This section has discussed the concepts most important to this research. It has defined capacity building as “a process that improves the ability of a person, group, organisation or system to meet its objectives” (Brown, LaFond & Macintyre, 2001, p. 5). It has discussed the concept of impact and has defended the choice of the definition of impact by ALNAP: “The social, economic, technical, or environmental effect of a project on individuals, gender and age groups, communities, and institutions.” (ALNAP, 2013, p. 16). This section has also discussed the concept of sustainability within the humanitarian sector and concluded that the movement towards a more sustainable humanitarian field and longer-term planning has resulted in a higher need for impact assessment. The phrase ‘a sustainable impact’ was defined by merging the definition of impact and the meaning of sustainable, and reads: the effect of a project on individuals, gender and age groups, communities, and institutions that is able to continue at a particular level.

2.3 Impact evaluation in practice

As will be discussed in the next sections of this chapter, some projects are evaluated by researchers as part of their studies. However, the majority of capacity building projects are evaluated by the organisation organising the activities or by an external party paid by this organisation. The following section discusses the models on which evaluation systems are built, both in the humanitarian sector and in other sectors.

2.3.1 The Kirkpatrick model

Over time, several models for the evaluation of training have been created. In 1987, Stufflebeam introduced the CIPP model, in 1996 Holton’s model was presented, and in 2002 Kraiger presented a model based on learning, individual performance and organisational results (Stufflebeam & Zhang, 2017; Holton, 1996; Kraiger, 2002 in Alvarez, Salas & Garofano, 2014). One of the oldest and arguably most influential models is the model presented by Kirkpatrick in 1979. In order for this model to function, one must assume that there is a causal relationship between the educational programme and its outcome (Frye & Hemmer, 2012). This reflects the reductionist background of the model. Reductionism assumes that the whole of anything can be understood by investigating and understanding the contribution of the constituent parts (Frye & Hemmer, 2012). Underlying this assumption is an assumed linear relation between the elements, meaning that if one of the elements changes, the outcome will also change. Reductionism differs from systems theory, which states that the whole is bigger than the sum of its parts, and that the relation between the elements is not linear (Frye & Hemmer, 2012). The fact that the Kirkpatrick model is based on reductionism has implications for the model’s weaknesses. This will be explained in the next section of this chapter; to begin with, the model will be explained in more detail.

The Kirkpatrick model was designed to help in the evaluation of training, and it measures the result of the training on four levels of evaluation: reaction (1), learning (2), behaviour (3) and results (4) (Kirkpatrick partners, 2019). An effect on the reaction level is the degree to which participants find the training favourable, engaging and relevant to their jobs (Kirkpatrick partners, 2019). It includes, for example, materials, instructors, facilities, methodology and content (Moller & Mallin, 2008). Change on the level of learning is measured by the degree to which participants acquire the intended knowledge, skills, attitude, confidence and commitment based on their participation in the training (Kirkpatrick partners, 2019). This level is evaluated to find out which learners have learned what and how various factors like materials and course outline influenced the learning process (Kirkpatrick, 1994 in Moller & Mallin, 2008). The third level (behaviour) is the degree to which participants apply what they learned during training when they are back on the job (Kirkpatrick partners, 2019). Level four is results: the degree to which targeted outcomes occur as a result of the training (Kirkpatrick partners, 2019). Several people have added to, or changed a part of, the model. For example, Tannenbaum, Cannon-Bowers and Mathieu (1993) added post-training attitudes to the Kirkpatrick model and divided the level 3 ‘behaviour’ outcome into training performance and transfer performance (Tannenbaum, Cannon-Bowers & Mathieu, 1993). However, organisations have largely stuck with the original model (Pearson, 2011; Twitchell, Holton & Trott, 1998).

Twitchell, Holton and Trott (1998) focused their research on the Kirkpatrick model because it “has dominated training evaluation discussion since it was first published forty years ago”. In 2011, Jenny Pearson called the model “arguably the industry standard” in a paper on evaluation in the humanitarian sector (Pearson, 2011, p. 37). This model has been the basis of many monitoring frameworks for capacity building activities. Its weaknesses therefore also influence the monitoring of capacity building, and this is what the next section will discuss.

2.3.2 Weaknesses of the Kirkpatrick model

As stated before, the assumptions on which the Kirkpatrick model is based have an influence on its weaknesses. The model assumes a linear relationship between the different elements of a training and its outcome (Frye & Hemmer, 2012). However, the Kirkpatrick model has been criticised for the lack of any relation between the different levels of evaluation (Haddock, 2015; Sitzmann, 2008; Alliger & Janak, 1989; Alliger, Tannenbaum, Bennett, Traver & Shotland, 1997). For example, an effect at levels one and two cannot be used to infer an effect at levels three and four (Haddock, 2015). Haddock (2015) cites the following studies to highlight this. Sitzmann (2008) carried out a meta-analysis of 354 research reports and found that self-assessment of course satisfaction (level 1) is only moderately related to learning (level 2). The research on the relations between levels three and four also shows little sign of a causal relation between these levels of the model. Alliger and Janak (1989) did a study on the relation between the evaluation levels and did not find a clear relation between the different levels. A study by Alliger, Tannenbaum, Bennett, Traver and Shotland (1997) confirmed that “at most, there are modest correlations between the various types of training criteria”. Knowing this, one would expect that organisations doing capacity building activities and using the Kirkpatrick model as the basis of their evaluations would measure every level of the model. Only in that way would organisations have a complete picture of the outcomes and impact of their activities. The next section will discuss literature about the amount of evaluation done at each level of the model.

2.3.3 The amount of evaluation being done at each level

The end of the 1980s and the beginning of the 1990s marked an increase in interest in evaluating trainings, which led to many publications on how to do evaluations, for example chapters from Goldstein’s book Training and Development in Organizations (Arvey & Cole, 1991), the Handbook of Training Evaluation and Measurement Methods by Phillips (1991) and Evaluation: Relating Training to Business Performance by T. Jackson (1989). Another book that falls within this category is Training for Impact by Robinson and Robinson (1989). This book is unique among the others because it includes data on the frequency of use of each evaluation level.

The study by Robinson and Robinson (1989) was done by means of a questionnaire administered to the participants at the Training Directors’ Forum in 1987. They concluded that level one evaluations were routine among the respondents. The frequency of evaluation at level two varied greatly. One third of respondents reported not evaluating at level three, and over half of the respondents reported not evaluating at level four (Robinson & Robinson, 1989). Carnevale (1990) wrote a study based on a survey from the American Society for Training and Development (ASTD). Unfortunately, the paper by Carnevale is no longer available; however, the abstract and citations in other papers do give some of the results of the ASTD 1990 survey. The survey was sent to 12 corporations and found that the majority of the respondents evaluated programs at the participant reaction level, 25% evaluated at the learning level, and 10% at the behavioural level (Carnevale, 1990; Moller & Mallin, 1996). The ASTD has conducted more surveys on this issue. Unfortunately, their archive is not digitised, but they are cited in Twitchell, Holton and Trott (2008) and the results can be seen in table 1 and table 2 below.

Table 1: Evaluation of programmes

Percent of programmes using each level

Study                               Level 1   Level 2   Level 3   Level 4
Twitchell, Holton & Trott (2008)      73%       47%       31%       21%
Training Magazine (1996)              83%       51%       51%       44%
ASTD Benchmarking Service (1999)      72%       32%       12%        7%
ASTD "Leading Edge" (1999)            81%       40%       11%        6%
Hill (1999)                           81%       53%       31%       17%
Robinson & Robinson (1989)            72%       46%       25%       14%


Table 2: Evaluation in organisations

Percent of organisations using each level

Study                               Level 1   Level 2   Level 3   Level 4
Twitchell, Holton & Trott (2008)      92%       84%       65%       53%
ASTD Benchmarking Forum (1995)       100%       90%       83%       40%
Training Magazine (1996)              86%       71%       65%       49%
Moller & Mallin (1996)*               90%       71%       43%       21%
Carnevale (1990)                       –        25%       10%       25%
Robinson & Robinson (1989)            97%       90%       69%       41%
Catalanello & Kirkpatrick (1968)      77%       51%       54%       45%

* The question asked whether organisations “routinely” used a certain level; the level 4 measurement is adjusted accordingly.

(Twitchell, Holton & Trott, 2008; Robinson & Robinson, 1989; Carnevale, 1990; Moller & Mallin, 1996)

The other studies shown in tables 1 and 2 will be discussed below, starting with Moller and Mallin (1996). They sent out 400 questionnaires to members of an organisation for performance improvers and instructional designers, 191 of which were answered. They asked the respondents if they routinely used each level of the Kirkpatrick model in their evaluations. 90% routinely evaluated at level one, 71% at level two, 43% at level three, and 65% of respondents indicated that they routinely evaluated trainings at level four (Moller & Mallin, 1996). However, when they looked at the methods respondents used to evaluate at level four of the Kirkpatrick model, they found that some of these methods could not result in knowledge about the results of their training. “Using the most liberal interpretation, only 40 of 191 [21%] respondents identified methods sufficient to qualify as level Four evaluation” (Moller & Mallin, 1996). Moller and Mallin also investigated some explanations for the low percentage of use for each level, which will be discussed in the next section of this chapter.

In the same year as Moller and Mallin (1996), Training Magazine, a professional development magazine on training, human resources and management, held a survey among its readers, measuring the percentage of organisations evaluating any programmes at a certain level. The original data is not available, so this thesis has to rely on the citation in Twitchell, Holton and Trott (2008). The data they found can be seen in table 2. Most of their percentages of use of certain levels are higher than in other studies. Especially their finding that 44% of programmes are evaluated at level four is a lot higher than the corresponding percentage in other studies. This might be explained in the same way as the higher number in Moller and Mallin (1996), who found that respondents indicated they evaluated programmes at level four but used methods that could not result in knowledge about the impact of the training. However, since the original data for the study by Training Magazine is unavailable, this cannot be checked. Twitchell, Holton and Trott (2008) state that “because it is totally inconsistent with all other data, one has to question the validity of their finding”.

The studies discussed above were done in the field of training, by sending surveys to members of professional organisations or subscribers of a training magazine. Later studies were done in other sectors of industry, for instance healthcare and technical training (Twitchell, Holton and Trott, 2008). In his doctoral dissertation from the University of Texas at Austin, Hill (1999) studied the percentage of programmes using each level of evaluation in the United States (US) healthcare industry. Unfortunately, only dissertations from 2001 and later are stored electronically. Therefore, the current study has to rely on the numbers cited in Twitchell, Holton and Trott (2008). Hill found that 81% of programmes are evaluated at level 1, 53% at level 2, 31% at level 3 and 17% at level 4. Hill used almost the same survey as Twitchell, Holton and Trott would do in 2008, who sent their survey out to organisations providing technical training. 146 of 322 surveys were returned. They found data similar to that of Hill (as cited in Twitchell, Holton & Trott, 2008), the ASTD (as cited in Carnevale, 1990) and Robinson and Robinson (1989), and used this comparison to state that the level of evaluation had not changed much in the last 30 years (Twitchell, Holton & Trott, 2008). They made tables to compare the past thirty years of studies on the matter, which have been recreated in table 1 and table 2. Even though these percentages differ between studies and none of them were done in the humanitarian sector, the general trend is that impact is rarely evaluated. Though there has been no large-scale study on this in the humanitarian sector, Pearson (2011) observes that most of the training monitoring takes place at levels one and two.

The lack of evaluation at the level of impact is problematic because, as stated in the previous section, the reactions of participants and the change in their knowledge cannot be used to measure the impact of a capacity building activity (Sitzmann, 2008; Haddock, 2015; Bates, 2004; Alliger & Janak, 1989; Alliger, Tannenbaum, Bennett, Traver, & Shotland, 1997). If a programme’s impact is not measured, that programme cannot be properly evaluated. Evaluations are not only important for organisational learning, but also for providing an evidence base for future programmes. If there is no proof of the impact of capacity building, donors will be less likely to fund similar programmes in the future. Therefore, it is important to know why organisations do not measure the impact of their programmes, which is what the next section will discuss.

2.4 Factors influencing an organisation’s ability to assess their impact

The research done on the factors influencing evaluation at the different levels of the Kirkpatrick model has included studies in various sectors. First, factors found in studies in the private sector will be discussed and compared with the limitations in studies evaluating humanitarian programmes. Second, possible factors from organisational change theory will be explored. Third and last, factors from research about the quality and quantity of humanitarian evaluations will be discussed.

2.4.1 The evaluation of capacity building in the private sector

This section will mostly discuss studies on the evaluation of capacity building from the private sector. Even though the humanitarian sector was not studied in these works, their findings are still relevant to this thesis, because they give an insight into the factors involved in evaluating capacity building at the outcome and impact levels. The relevance of some of the factors is confirmed by studies evaluating a capacity building project; the limitations of these studies often coincide with factors found in more thorough research outside of the humanitarian sector.

Among the first to investigate motivations for evaluation were Moller and Mallin (1996). In the same questionnaire that asked questions about the use of the evaluation levels, they also asked about the reasons for not conducting an evaluation at all levels. 40% of the 191 respondents in their study gave the reason that evaluation was not part of their job description. 88% of respondents said that there were barriers to evaluation in their organisation. The most mentioned barrier was a lack of time or resources. Other often mentioned barriers were a culture resistant to evaluation and a lack of access to data. A few also mentioned a lack of training in evaluation methodologies (Moller & Mallin, 1996).

The finding of Moller and Mallin that time is the most prominent factor in the eyes of the respondents is also reflected in the many studies that cite the lack of time as the main limitation of their research. It was mentioned as a limitation in a study by Popescu and Dewan (2009), which evaluated a capacity building project for faith-based or community-based organisations in the US. Unrealistic time frames are also one of the seven issues stated by Wing (2004) in his paper on assessing the effectiveness of capacity-building initiatives. Several other researchers have noted that their research fails to measure the long-term effects (Minzner, Klerman, Markovitz & Fink, 2013; Klein et al., 2009; Sobeck, 2008). Campbell (2011) is the only study that investigated the effect that capacity building for non-profits had on the clients of those non-profits, thereby measuring the impact of the capacity-building programme on society. This study is also one of the few to include a follow-up measurement, thereby investigating the sustainability of that impact. Using a longitudinal setup, Campbell (2011) studied a capacity-building programme for community-based non-profits in California and found that the programme had a positive effect in the short term, but the effect was no longer there at the follow-up measurement (Campbell, 2011).

The factors found by Moller and Mallin were later explored in more detail by Attia, Honeycutt and Attia (2002) in a paper solely dedicated to reasons for not conducting evaluations in the area of sales training. In their meta-analysis of quantitative studies on the issue, they found a number of reasons, which they categorised as managerial perceptions, evaluation restrictions, methodological shortcomings and a lack of empirical evidence (Attia, Honeycutt, & Attia, 2002). They cite the following managerial perceptions that are a barrier to evaluations: the belief that a successful company implies an effective training programme, a general belief that all training is good, a fear of negative results, the feeling that it is not a manager’s responsibility to evaluate, and the question of whether evaluations are worth the time and money. On top of these managerial perceptions there are evaluation restrictions such as a lack of time, money and effective evaluation tools. Under the third category, Attia, Honeycutt and Attia (2002) state a number of methodological issues that can restrict evaluation. These include the effect of variables outside of the trainer’s control and data collection problems, as well as the problem that some trainers work in only one branch of industry and thus have little means of comparing their findings. Furthermore, they mention the methodological issue that there is a need for confidential responses while making sure that everyone fills in evaluations for the right training. Another often cited problem is the fact that a control group is often lacking, and if there is one, it is usually not randomly assigned. Lastly, they name the lack of empirical evidence for evaluation models as a reason that evaluations are not always conducted (Attia, Honeycutt, & Attia, 2002).

One of the new factors found by Attia et al. (2002), the lack of a control group, is also mentioned as a limitation in studies evaluating capacity building programmes. Sobeck (2008) and Leake et al. (2007) both noted the lack of a control group as a limitation, which was not possible in the programme designs they were evaluating. Though this does take away a part of the internal validity of the studies, the nature of the humanitarian sector is such that control groups, especially randomly assigned control groups, are rare.

Twitchell, Holton and Trott (2008) were the first to do research on the reasons for a low evaluation rate at each level of the Kirkpatrick model. The reason ranked highest for all levels was that evaluation was not required by management. Other reasons for not conducting level one evaluations were the cost and the perceived lack of value. The top two reasons for level two were the lack of training in evaluation methods and a lack of time. For levels three and four the top reasons were the same, namely a lack of training and the costs of evaluating at these levels (Twitchell, Holton, & Trott, 2008). See table 3 below for an overview.


Table 3: Percent not evaluating at each level for selected reasons

Reason                                   Level 1   Level 2   Level 3   Level 4
Not required by management                28.8%     36.9%     44.1%     42.3%
Perceived little value                    18.9%     19.8%     13.5%     15.3%
High cost                                 10.8%     18.0%     36.9%     36.9%
Not legally required                       9.9%     14.4%      7.2%      8.1%
Lack of training in evaluation methods     9.0%     23.4%     34.2%     39.6%
Lack of time                               8.1%     21.6%      3.6%      5.4%
Union                                      1.8%      4.5%      2.7%      1.8%
Prohibited                                 0.9%      3.6%      1.8%      1.8%

(Twitchell, Holton & Trott, 2008)

2.4.2 Organisational change and impact assessment

Another possible factor that could influence whether an organisation conducts an impact evaluation is the relation between the capacity building organisation and the client organisation. Programme failure or success is largely due to what happens after the programme has finished. In a study by the ASTD in 2006, 70% of organisations indicated that programme failure was due to something that happened after the programme, 20% indicated that it was due to something that happened before the programme, and only 10% indicated that the reason for programme failure was the learning intervention itself (Haddock, 2015). They were not the first to make this link between programme success and post-training actions. Using a meta-analysis, Colquitt et al. (2000) found that job involvement and organisational commitment were positively related to knowledge transfer and job performance. Even earlier, Tracey, Tannenbaum and Kavanagh (1995) found that “both transfer of training climate and continuous-learning culture had direct effects on post-training behaviours”. In addition, Arthur, Bennett, Edens and Bell (2003) argued that the post-training environment plays an important role in transferring trained skills to the job and affects the impact of the training. Therefore, studies looking for effects on a behavioural or results level should look into the post-training environment as well (Arthur, Bennett, Edens & Bell, 2003). This means that organisations have to remain in contact with their client organisations after the programme has ended in order to fully understand the impact of their programme and the reasons that impact came about.

This has been noted by other studies as well. Kirkpatrick and Kirkpatrick (2016) write that “because Level four results are critically important to the organization, someone is likely already measuring them.” They continue by stating that the best way to obtain this data is therefore not to measure it yourself, but to build a relationship with the people who can provide you with the information you need (Kirkpatrick & Kirkpatrick, 2016). This is supported by Markovitz and Magged (2008). In their evaluation of the Hope II project, one of the limitations was the decline in cases with every data collection round: “Although the response rate for the comparison group was 100% at baseline, it was only 52% at first follow-up and 42% at second follow-up.” (Markovitz & Magged, 2008, p. 13). It is also in line with Wing (2004), who states that having access to the client organisation after the training has ended is crucial for evaluation. Wing (2004) adds that one should have access to both managers and participants to truly understand both individual and organisational impact. This is supported by Minzner, Klerman and Markovitz (2013), who argue that an evaluation should measure both behavioural changes and external [organisational] changes in order to have a complete picture of the impact of a programme. Both access to the organisation in follow-up and access to different groups can be seen as subfactors of the ‘relationship with client organisation’ factor.

2.4.3 Literature on evaluation in the humanitarian sector

The following section discusses the literature on evaluation in the humanitarian sector and in the field of capacity building more specifically. The literature on evaluation in humanitarian action developed relatively late in comparison to evaluation literature in other sectors. A reason for this might be that demanding evaluations of humanitarian programmes is a relatively new development. Before the 1990s, humanitarian aid was presumed to be good, because the motivation behind it was good (Harrell-Bond, 1986 as cited in Proudlock, Ramalingam & Sandison, 2009). It was only in the second half of the 1990s that donors began to demand results-based evaluations (Barnett, 2005). Currently, the need for evaluation in the humanitarian sector has become evident. However, despite this change in culture, high-quality evidence on the impact of humanitarian programmes is still scarce (Puri, Aladysheva, Iversen, Ghorpade & Brück, 2015). In their IZA discussion paper, Puri et al. (2015) state that when looking through several databases, they found more than 900 evaluations, only 38 of which were impact evaluations (Puri et al., 2015). The research on evaluation in the humanitarian sector mainly builds on the health sector and concerns the quality of the evaluations. Overall, the quality of evaluation in the health sector is found to be low (Roberts & Hofmann, 2004; Smith, Roberts, Knight, Gosselin, & Blanchet, 2015; Blanchet et al., 2017).

Though the focus is on assessing the quality of evidence, some researchers also investigate possible reasons for the low quality of evaluations. In their paper about the quality of impact evaluations in the health cluster, Roberts and Hofmann (2004) cite unqualified staff and the attitudes of donors as the main reasons for the low quality of impact evaluations in the health sector. The factor ‘unqualified staff’ is comparable with the factor ‘lack of training in evaluation methods’ in the study by Twitchell, Holton and Trott (2008). The attitude of donors is an interesting factor that was not found in the literature from other sectors. It can be compared to the managerial incentive factor found by Attia, Honeycutt and Attia (2002), although there are slight differences: managerial incentive relates to support for evaluations within the organisation, whereas the attitude of donors relates to support for evaluations from an outside donor. Another factor unique to the humanitarian field is raised by Wing (2004), who states that differing definitions of impact impede evaluations. This is supported by Popescu and Dewan (2009), who mention the largely quantitative definitions of success set by grantees as a limitation of their study.

2.6 Theoretical assumptions and argument

The theoretical assumptions underlying this thesis are discussed in this section. The first assumption is that the factors found in other sectors are to some degree transferable to the humanitarian sector. The factors put forward by research in various sectors of industry are a lack of time, the amount of resources, unqualified staff, a lack of managerial support and methodological issues (Attia, Honeycutt, & Attia, 2002; Moller & Mallin, 1996; Twitchell, Holton, & Trott, 2008). Though the research on these factors was done in other sectors, this thesis assumes they will be applicable to the humanitarian sector as well. Three arguments underpin this assumption. First, the research has been done in a multitude of sectors and has produced comparable answers, which makes it plausible that yet another sector will yield comparable answers as well. Second, regardless of the sector, organisations share certain characteristics, such as a hierarchy and a common goal, that make them comparable. Third, for several of the factors found in previous literature, there is anecdotal evidence that they are applicable to the humanitarian sector. For instance, several researchers evaluating capacity building projects have noted that the amount of resources impeded the evaluation (Popescu & Dewan, 2009; Klein et al., 2009; Minzner, Klerman, Markovitz & Fink, 2013). A lack of time is also noted by Wing (2004), Popescu and Dewan (2009) and Sobeck (2008).

The second assumption is that the anecdotal evidence on factors impeding evaluations, found in individual humanitarian studies, is applicable throughout the humanitarian sector. An example of such a factor is an unclear definition of impact, mentioned by Wing (2004) and supported by Popescu and Dewan (2009). Other studies have suggested the attitude of donors towards evaluation as a possible factor (Roberts & Hofmann, 2004). However, none of these studies were of the scale of the studies done outside the humanitarian sector.

Wing (2004) also mentions the need to measure impact at both the organisational and the individual level, which relates to the final assumption: that the conclusions from the organisational change literature are also applicable to organisations in the humanitarian field. This assumption applies mainly to the findings from applied psychology studies of organisational change, which state that the post-training environment is of crucial importance to the success of a training (Haddock, 2015; Colquitt et al., 2000; Tracey, Tannenbaum & Kavanagh, 1995; Arthur, Bennett, Edens & Bell, 2003). Assuming this is also the case in the humanitarian sector, organisations need to be in contact with client organisations after the event has finished in order to conduct a good impact assessment. Markovitz and Magged (2008) cited a failure to contact participating organisations in the follow-up phase as a limitation of their evaluation. Beyond contact with the client organisation, Wing (2004) argues that organisations also need to have contact with both managers and individual participants in order to be able to assess the impact at all levels.

All the other factors were identified by asking organisations about their evaluation process and what is impeding their evaluations. The factor of the relation with the client organisation has not been discussed with organisations in previous research but is the product of research on how and when organisational change is created (Haddock, 2015; Colquitt, LePine & Noe, 2000; Tracey, Tannenbaum & Kavanagh, 1995; Arthur, Bennett, Edens & Bell, 2003; Kirkpatrick & Kirkpatrick, 2016; Markovitz & Magged, 2008; Minzner, Klerman, Markovitz & Fink, 2013). This is why this factor is the main focus of this thesis, which will argue that the relation with the client organisation influences an organisation’s ability to assess the impact of its capacity building programmes. The other factors put forward by the research in this chapter are treated as control variables. These factors are 1) a lack of time (Moller & Mallin, 1996; Twitchell, Holton & Trott, 2008; Popescu & Dewan, 2009; Minzner, Klerman, Markovitz & Fink, 2013; Klein et al., 2009; Sobeck & Agius, 2007; Wing, 2004), 2) a lack of resources or staff (Twitchell, Holton & Trott, 2008; Moller & Mallin, 1996), 3) a lack of managerial support (Sobeck & Agius, 2007; Attia, Honeycutt & Attia, 2002), 4) underqualified staff (Twitchell, Holton & Trott, 2008), 5) attitudes of donors (Roberts & Hofmann, 2004), 6) methodological issues (Attia, Honeycutt & Attia, 2002; Popescu & Dewan, 2009), and 7) unclear definitions of impact (Popescu & Dewan, 2009; Wing, 2004). The next chapter will discuss the methodology and methods of this research.


3. Methodology

This chapter presents the methodological set-up of the research. First, the research approach and design are discussed. Then the process of case selection and data collection is explained, along with its possible limitations. After that, the different options for data analysis are considered and the choice for a thematic analysis is defended. Finally, the data analysis process and its possible limitations are discussed in more detail.

3.1 Research approach

This thesis uses a qualitative approach and is based on a small number of cases. It uses a comparative design centred on the factor of the relation with the client organisation. The reason for this design is twofold. First, there is a limited number of available cases, thereby ruling out approaches that need a large number of cases. Second, there is no previous research on this factor in the humanitarian sector, nor has this factor been studied in the evaluation literature in the private sector. The comparative design was set up as follows: I) the dependent variable was the ability of the organisation to assess the impact of its programmes; II) the explanatory, or independent, variable was the relation with the client organisation; III) the control variables were all the other factors that might influence an organisation’s ability to assess the impact of their programmes, namely 1) a lack of time (Moller & Mallin, 1996; Twitchell, Holton & Trott, 2008; Popescu & Dewan, 2009; Minzner, Klerman, Markovitz & Fink, 2013; Klein et al., 2009; Sobeck & Agius, 2007; Wing, 2004), 2) a lack of resources or staff (Twitchell, Holton & Trott, 2008; Moller & Mallin, 1996), 3) a lack of managerial support (Sobeck & Agius, 2007; Attia, Honeycutt & Attia, 2002), 4) underqualified staff (Twitchell, Holton & Trott, 2008), 5) attitudes of donors (Roberts & Hofmann, 2004), 6) methodological issues (Attia, Honeycutt & Attia, 2002; Popescu & Dewan, 2009), and 7) unclear definitions of impact (Popescu & Dewan, 2009; Wing, 2004). The expected outcome, or hypothesis, is that the factor ‘relation with the client organisation’ influences an organisation’s ability to assess the impact of its capacity building activities.
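To make the logic of this most-similar design concrete, the following sketch (purely illustrative and not part of the actual analysis) checks whether two cases form a valid comparison: they should match on the control variables while differing on the explanatory variable. All variable names and example values are hypothetical.

# Illustrative sketch of the most-similar comparison logic (not the actual analysis).

CONTROLS = [
    "time", "resources", "managerial_support", "staff_qualification",
    "donor_attitude", "methodology", "definition_of_impact",
]

def valid_most_similar_pair(case_a: dict, case_b: dict) -> bool:
    """Two cases are comparable if they match on every control variable
    but take different values on the explanatory variable."""
    same_controls = all(case_a[c] == case_b[c] for c in CONTROLS)
    different_explanatory = case_a["relation_with_client"] != case_b["relation_with_client"]
    return same_controls and different_explanatory

# Hypothetical example: two organisations with similar monitoring set-ups.
org_5 = {"relation_with_client": "strong", **{c: "similar" for c in CONTROLS}}
org_6 = {"relation_with_client": "weak", **{c: "similar" for c in CONTROLS}}

print(valid_most_similar_pair(org_5, org_6))  # True -> a meaningful comparison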


3.2 Operationalisation of the dependent and independent variables

The independent variable was initially operationalised using post-training programme design: organisations that had follow-up opportunities were assumed to have a better relation with the client organisation than organisations that did not have any follow-up opportunities. Building on this operationalisation, a survey was made, which is discussed in more detail below. Based on the survey results, the organisations chosen for the interviews had indicated varying degrees of follow-up activities after the training. However, during the interviews, the post-training programme design of the interviewed organisations turned out to be too similar to make a good comparison. The organisations did differ greatly in terms of their pre-training programme design. This led to a slight change in the operationalisation of the factor ‘relation with client organisation’. The final operationalisation was the degree to which the client organisation is involved in the training design and the number of follow-up opportunities after the training.

The dependent variable, the ability of an organisation to assess the impact of its programmes, was measured by asking the interviewees about their perception of the impact of the capacity building programmes and their confidence in the way that this perception is formed. If the employee responsible for MEL activities in the organisation indicated that they had a clear and supported vision of the impact of their programmes, that organisation was assumed to have a good ability to assess the impact of its programmes. Alternatively, if the employee responsible for MEL activities indicated that they did not have a clear and supported vision of the impact, that organisation was assumed to have a low ability to assess the impact of its programmes.
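As a rough illustration of this operationalisation, the sketch below turns the two dimensions of the independent variable (client involvement in training design and follow-up opportunities) and the MEL respondent’s self-assessment into simple ordinal ratings. The thresholds and field names are hypothetical and only meant to show how the qualitative judgements were structured, not how they were actually computed.

# Hypothetical scoring of the two variables; the actual assessment was qualitative.

def relation_with_client(involved_in_design: bool, follow_up_moments: int) -> str:
    """Independent variable: pre-training involvement plus post-training follow-up."""
    score = (1 if involved_in_design else 0) + min(follow_up_moments, 2)
    return "strong" if score >= 2 else "weak"

def ability_to_assess_impact(clear_vision: bool, vision_supported_by_data: bool) -> str:
    """Dependent variable: based on the MEL employee's stated vision of impact."""
    return "good" if clear_vision and vision_supported_by_data else "low"

# Example: an organisation that co-designs trainings and holds one follow-up session.
print(relation_with_client(involved_in_design=True, follow_up_moments=1))           # strong
print(ability_to_assess_impact(clear_vision=True, vision_supported_by_data=False))  # low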

3.4 Case selection

Since the research topic is the impact of capacity building, the people measuring and reporting that impact are a primary source of information. Other sources would include the clients of the capacity-building organisations or the donors of the capacity-building organisations. However, capacity-building organisations seemed the most accessible group for this research, since they usually advertise their activities online. The first stage of case selection therefore took place online. A list was made of organisations that offer capacity-building activities to other organisations and individuals in the humanitarian sector. This was done through online searching and through trainings advertised on the Reliefweb website. Reliefweb.int is a website for humanitarian organisations and professionals that provides information as well as a platform for organisations to advertise trainings and job openings. By looking up organisations on the Reliefweb.int training page and only shortlisting English-speaking organisations, some organisations have been excluded. Measures taken to reduce this possible bias included searching online for additional capacity building organisations and asking contacts whether they knew of organisations relevant to the topic. This search did not lead to any new organisations, so it can be assumed that the initial data collection via Reliefweb.int provided adequate coverage. The following selection criteria were used: (1) the organisation is a not-for-profit, non-governmental organisation that is not an educational institution or think tank; (2) the organisation does capacity-building activities for staff of humanitarian organisations; (3) the organisation aims to improve the functioning of the humanitarian sector. Of the 155 organisations on the initial list, 40 met the first criterion, 15 met criteria one and two, and 13 met all three criteria. This way of selecting cases is similar to other research on the evaluation of capacity building in other sectors, which selected cases using lists of subscribers to specialist literature or members of professional organisations (Robinson & Robinson, 1989; Carnevale, 1990; Moller & Mallin, 1996). However, the number of available cases for those studies was much higher than the 13 cases that met the criteria for this thesis.
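The sequential application of the three criteria can be sketched as follows; the flag names and sample data are hypothetical and serve only to illustrate how the funnel was applied in order, while the real list narrowed from 155 organisations to 40, 15 and finally 13.

# Illustrative sketch of the three-step selection funnel (flag names are hypothetical).

def apply_funnel(organisations: list[dict]) -> list[dict]:
    step1 = [o for o in organisations if o["non_profit_ngo"]]          # criterion 1
    step2 = [o for o in step1 if o["trains_humanitarian_staff"]]       # criterion 2
    step3 = [o for o in step2 if o["aims_to_improve_sector"]]          # criterion 3
    print(f"listed: {len(organisations)}, criterion 1: {len(step1)}, "
          f"criteria 1-2: {len(step2)}, all criteria: {len(step3)}")
    return step3

# Tiny synthetic example with three made-up organisations.
sample = [
    {"non_profit_ngo": True,  "trains_humanitarian_staff": True,  "aims_to_improve_sector": True},
    {"non_profit_ngo": True,  "trains_humanitarian_staff": False, "aims_to_improve_sector": False},
    {"non_profit_ngo": False, "trains_humanitarian_staff": True,  "aims_to_improve_sector": True},
]
shortlist = apply_funnel(sample)  # keeps only the first organisation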

The next step was a survey sent to all 13 organisations. This survey served two purposes: to collect initial data on the organisations and to help in choosing which organisations to include in the interviews. The survey was made using Google Forms and is copied in Annex 1. It included questions about the organisation’s programme design, monitoring system and evaluation tools. Questions three and four, about programme design, were used to determine the relation the organisation had with its clients. The questions about monitoring systems and evaluation tools were used to determine the control variables. Every organisation was emailed the survey, followed by one round of reminder calls or emails, depending on the availability of a phone number. Of the 13 organisations that were sent a survey, eight responded, one did not wish to take part in the research and four did not respond. This results in a response rate of 62%, which is good compared to other studies. Baruch and Holtom (2008) conducted a meta-analysis of 1,607 studies on the response rates of surveys sent to organisations and found that surveys collecting data at the organisational level had an average response rate of 35.7% with a standard deviation of 18.8%, placing the 62% response rate of this survey among the top 8% of response rates (Baruch & Holtom, 2008).
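The ‘top 8%’ figure can be checked with a quick back-of-the-envelope calculation, treating the distribution of response rates reported by Baruch and Holtom (2008) as approximately normal; this normality assumption is mine, since the meta-analysis only reports a mean and a standard deviation.

import math

# Response to the survey in this study
responded, contacted = 8, 13
rate = 100 * responded / contacted                     # ~61.5%, rounded to 62% in the text

# Benchmark for organisational-level surveys (Baruch & Holtom, 2008)
benchmark_mean, benchmark_sd = 35.7, 18.8

# Standard score and upper-tail share under a normal approximation (an assumption)
z = (rate - benchmark_mean) / benchmark_sd
share_above = 0.5 * math.erfc(z / math.sqrt(2))

print(f"response rate: {rate:.1f}%, z = {z:.2f}, top {100 * share_above:.1f}% of surveys")
# -> roughly the top 8-9%, consistent with the figure reported above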

Based on the survey results, organisations were chosen that had similar monitoring systems and evaluation methodologies. Organisations with similar monitoring systems and methodologies probably devote similar amounts of resources, time and staff to evaluating. In this way, the interviews could be set up as a most similar case study. The factors that could not be controlled for were managerial support, donor attitude and unclear definitions of impact. Four organisations had sufficiently similar monitoring systems to be interviewed and, according to the survey results, they varied on the post-training design. One of these organisations had indicated that it did not wish to take part in an interview. Three interviews were scheduled; however, one organisation withdrew before the interview could take place, leaving two organisations to be interviewed. The fact that two of the four selected organisations were ultimately not interviewed may have introduced a selection bias. Neither organisation gave a reason for declining, but a possible reason might be that they were not confident in their knowledge of their organisational impact or their evaluation strategies.

3.4.1 Description of selected organisations

The interviewed organisations are named ‘organisation 5’ and ‘organisation 6’ in this research. Though the organisations will remain anonymous, this paragraph provides a general description of their characteristics. Both organisations provide capacity-building activities for individuals and organisations in the humanitarian sector. Both are English-speaking organisations and work internationally. Organisation 6 aims
