Evaluation for Transformation 3: What do you count as “success”?


I have long thought that evaluators don’t define ‘success’ in meaningful ways. Now highly respected economist Matt Andrews has released a very useful working paper that confirms how much more carefully we need to think about this important concept. Most of our definitions and measurements of so-called ‘success’ have significant limitations, and this has serious implications for our practice.

‘Success’ as we tend to define it has little meaning

As I noted earlier this year in a presentation at CECAN in London, and again at the Systems Perspectives on Policy Development and Evaluation Conference organised by LSHTM, what we consider as “success” is a question that should keep us, as an international evaluation community, awake at night.

For example –

Meeting project or programme objectives or targets does not mean the intervention has been successful in solving a particular problem. The objectives might have been inappropriate, the targets insufficient from the start, or negative consequences left unaccounted for.

Compliance with a work plan is meaningless if the plan was flawed in the first place.

Meeting stakeholder expectations raises the question of how ‘stakeholders’ were defined and how their expectations were identified – and whether the ‘solution’ will last.

Getting a range of positive outcomes does not actually count unless potential or actual negative consequences of actions have been carefully studied and their inhibiting potential understood.

Very often the small percentage improvements touted by RCT hangers-on as an indication of success cannot really be considered meaningful in the ‘bigger picture’, yet this is seldom pointed out in publications, which usually focus more on the technical merit of such improvements than on their real meaning within the bigger picture of development.

Satisfactory ratings for one or a few DAC development evaluation criteria – being relevant, effective, efficient, having some impact and demonstrating aspects of sustainability – do not count for much unless aggregated in a manner that makes a reasonable argument for contribution to development success. This has been one of my major arguments against the development evaluation community’s obsession with, and blind application of, the DAC criteria.


And how meaningful is ‘success’ in achieving good results in multiple small projects when they do not add up to contribute meaningfully to the positive development trajectory of a country or society?

Even more challenging, given the growing importance of the 2030 Agenda with its SDGs and its emphasis on the need for transformation in the era of the Anthropocene: how do we know that efforts at ‘systems change’ or ‘transformation’ have been, or have good potential to be successful?

‘Project or programme success’ vs ‘contributions to development success’

In his paper, Matt points out that public policy initiatives accounted for around US$13 trillion in 2017, yet we seldom know the extent of policy success or failure. Evaluative data and information that span policy portfolios tend to be very limited and are usually not comparable or accessible. And even citizen surveys are not reliable; some studies show that they might reflect “media-influenced frustration with economic and political conditions” rather than experience or evidence-informed views.

In order to get to grips with this question, Matt studied 400 papers that evaluated or examined World Bank public policy interventions in multiple countries between 2016 and 2018. He found that failure happens somewhere between 25% and 50% of the time – depending on how one defines ‘success’ and ‘failure’. In the World Bank the failure rate is 24% if the definition focuses on ‘project or product success’. But it is 51% if the definition refers to whether the intervention solved the problem as intended, or is likely to produce more ambitious development outcomes or impacts.

So, if the first definition is used, it seems to suggest that public policy failures are not that common; that success is more common than failure, with ‘qualified progress’ being the most common outcome. But it does not assess whether the intervention made a difference to the government’s performance or to the development of the country.

The problem with ‘plan and control’ organisations

Matt argues – and the analysis in this study confirms – that ‘plan and control’ organisations, and nearly all of us work with these, have a significant bias in their evaluations. They are more likely to focus on ‘project and product success’ than on ‘problems solved with development impact’. This is very widespread in the development evaluation community.

I am very pleased that this study highlights the serious fallacy of considering evaluation meaningful even when it looks only at project or programme success, in isolation from the surrounding circumstances and from how the intervention is embedded in the notion of ‘development’. As I have said on many occasions, this is an especially serious problem in the Global South. Such a definition of success might work in societies that are reasonably prosperous, with fairly good societal and economic indicators, and stable institutions. But in countries that have to work hard to improve a variety of societal indicators from a low base – like most of the countries in the Global South – this is not sufficient, and often not even appropriate.

And our ‘theories of change’ obsession contributes to this problem

Conventional theories of change seldom detail pathways to the ‘higher level’ or ‘longer-term’ outcomes. Evaluative assessments are made based on promised early deliverables or relatively easy-to-determine ‘short-term’ or ‘medium-term’ outcomes that are already defined at the planning stage, very often without any clear notion of whether or how they will contribute to development outcomes.

I have often pointed this out – although I have also used theories of change without sufficient attention to how they connect the intervention to development outcomes. It is convenient to ignore this technically challenging issue. And theories of change certainly have some value, especially in drawing attention to what can or should be monitored and evaluated on the way to achieving the desirable end result – including contributions to development.

In order to address this problem, the World Bank uses a ‘Risk to Development Outcomes’ (RDO) rating to indicate whether an intervention is likely to succeed or fail in achieving or contributing to the desirable end (‘higher level’) outcomes. But Matt notes that, like most organisations, when reflecting and reporting on success and failure the Bank tends to quote the lower figure without taking the risk of (eventual) failure to achieve essential outcomes into account. So we end up with ‘project success’ or at best ‘programme (portfolio) success’, quite oblivious to the big picture – the need for ‘development success’.

I agree with Matt that it is unlikely that citizens – especially in the Global South – will be pleased with such a superficial treatment of ‘success’ when governments expend public funds. But in my experience we seldom think about this important issue.

So what is ‘success’ when dealing with transformative change?

I will write about this important question in a subsequent post. The key point is: if we do not deal sufficiently well with ‘success’ when evaluating project or programme interventions, how well will we deal with it when helping to design for, or evaluating systems change or transformation?

Perhaps it will actually be easier. But most certainly we have to keep on grappling with both the technical and political challenges around this issue – one that is very critical for the credibility, utility and value of evaluation as practice and as profession, especially in the Global South.


Zenda Ofir

Zenda Ofir is an independent South African evaluator. Based near Geneva, she works across Africa and Asia. A former AfrEA President, IOCE Vice-President and AEA Board member, she is at present IDEAS Vice-President, Lead Steward of the SDG Transformations Forum A&E Working Group and Honorary Professor at Stellenbosch University.


  1. Adopting the Logical Framework method will help to evaluate the success of a project or programme for its beneficiaries. The Logical Framework will help to monitor the realisation of the benefits of the project or programme to the beneficiaries or host community.

  2. Syl, thank you for reminding us that we need to use the tools at our disposal. I must confess I am not a big fan of the LFA/logframe, which has been in use since the 70s. It has many disadvantages, one of which is that it misleads one about what can be seen as “success”. For detail see https://www.fasid.or.jp/_files/publication/oda_21/h21-3.pdf which is a very good article about the negatives of the LFA approach. Since around 2000 the LFA idea has been expanded into ‘theories of change’, which provide for more thoughtful planning and evaluation. But if you have a systems perspective on how things work – which we all should have – you will recognise that both these methods have very significant shortcomings that can be very misleading. Among others: we cannot predict the sequence and pace at which outcomes will emerge, so it is unwise to hold people accountable for logframe-determined outcomes without understanding why and how things are working (or not), for whom, when, etc.; and we have to be clear about negative consequences of actions that might actually neutralise or eliminate the benefits, yet the LFA does not account for that. The superficial use, and also misuse, of the LFA has led to tremendous problems in efforts to achieve successful development interventions. Perhaps you can look at Michael Quinn Patton’s “Developmental Evaluation” for more information on methods and aspects that we need to consider beyond logframes or theories of change.

  3. I fully agree that development success is hard to pin down and that project-level ratings (meeting major relevant goals efficiently) only tell part of the development story, but Matt Andrews’ working paper does not do justice to the complex issues involved in assessing development impact. He proposes to measure project success by combining two ratings: the extent to which projects meet their relevant objectives efficiently, and the extent to which they involve low risks to development. This is a dubious proposition: development is a risky business and more often than not high risks should be incurred to generate high rewards (as Silicon Valley start-ups know only too well). Nor does Matt Andrews recognize that project-level ratings are useful building blocks for higher-level evaluations (country level, thematic level, blue marble level) that should be carried out to guide development policy. As I see it, evaluation should be carried out from different perspectives, at various levels and at various times, to get a balanced picture of what success looks like. Rather than Matt Andrews, I recommend Development Projects Observed by Albert O. Hirschman as a development evaluation guide book.

    • Bob, I fully agree with your very good point. I was more interested in Matt’s critique and pointing out of flaws in how success is defined, than in his proposed solution. As you rightly point out, there is much more to say about this important topic. For me the critical issue is that we should be more circumspect about the whole matter of defining success, and more mature in how we apply this in practice. But as always, technically more difficult than current practice in many organisations. And thank you for the Hirschman reference. His work remains very relevant today.

  4. Your plea for caution in interpreting performance ratings is well taken but, as I see it, Matt set up a straw man and demolished it. No IEG evaluator believes that project completion ratings measure development impact. The working paper is highly selective in its use of IEG indicators, e.g. it ignores sustainability ratings and it fails to note that IEG produces a wide range of evaluation reports addressing World Bank performance at the higher plane of country and thematic policy. Worse than this, in search of a justification for its central and dubious message (that a majority of World Bank development interventions fail) the working paper concludes by putting forward a composite rating of intervention failure that is even more misleading than the ratings the working paper criticizes. Ironically, if this measure gained traction, it would likely have a chilling effect on creativity and innovation, the very opposite of what the working paper intended to achieve. I only wish you had not lavished praise on Matt’s working paper to make your entirely valid point.
