Evaluation for Transformation 3: What do you count as “success”?


I have long thought that evaluators don’t define ‘success’ in meaningful ways. Now highly respected economist Matt Andrews has released a very useful working paper that confirms how much more carefully we need to think about this important concept. Most of our definitions and measurements of so-called ‘success’ have significant limitations, and this has serious implications for our practice.

‘Success’ as we tend to define it has little meaning

As I noted earlier this year in a presentation at CECAN in London, and again at the Systems Perspectives on Policy Development and Evaluation Conference organised by LSHTM, what we consider as “success” is a question that should keep us, as an international evaluation community, awake at night.

For example –

Meeting project or programme objectives or targets does not mean the intervention has been successful in solving a particular problem. The objectives might have been inappropriate, the targets insufficient from the start, or negative consequences left unaccounted for.

Compliance with a work plan is meaningless if the plan was flawed in the first place.

Meeting stakeholder expectations raises the question of how ‘stakeholders’ were defined and how their expectations were identified – and whether the ‘solution’ will last.

Getting a range of positive outcomes does not actually count unless potential or actual negative consequences of actions have been carefully studied and their inhibiting potential understood.

Very often the small percentage improvements touted by RCT hangers-on as an indication of success cannot really be considered meaningful in the ‘bigger picture’, yet this is seldom pointed out in publications, which usually focus more on the technical merit of such improvements than on their real meaning within the bigger picture of development.

Satisfactory ratings for one or a few DAC development evaluation criteria – being relevant, effective, efficient, having some impact and demonstrating aspects of sustainability – do not count for much unless aggregated in a manner that makes a reasonable argument for contribution to development success. This has been one of my major arguments against the development evaluation community’s obsession with, and blind application of, the DAC criteria.


And how meaningful is ‘success’ in achieving good results in multiple small projects when they do not add up to contribute meaningfully to the positive development trajectory of a country or society?

Even more challenging, given the growing importance of the 2030 Agenda with its SDGs and its emphasis on the need for transformation in the era of the Anthropocene: how do we know that efforts at ‘systems change’ or ‘transformation’ have been, or have good potential to be successful?

‘Project or programme success’ vs ‘contributions to development success’

In his paper, Matt points out that public policy initiatives accounted for around US$13 trillion in 2017, yet we seldom know the extent of policy success or failure. Evaluative data and information that span policy portfolios tend to be very limited and are usually not comparable or accessible. And even citizen surveys are not reliable; some studies show that they might reflect “media-influenced frustration with economic and political conditions” rather than experience or evidence-informed views.

In order to get to grips with this question, Matt studied 400 papers that evaluated or examined World Bank public policy interventions in multiple countries between 2016 and 2018. He found that failure happens somewhere between 25% and 50% of the time – depending on how one defines ‘success’ and ‘failure’. In the World Bank the failure rate is 24% if the definition focuses on ‘project or product success’. But it is 51% if the definition refers to whether the intervention solved the problem as intended, or is likely to produce more ambitious development outcomes or impacts.

So, if the first definition is used, it seems to suggest that public policy failures are not that common; that success is more common than failure, with ‘qualified progress’ being the most common outcome. But it does not assess whether the intervention made a difference to the government’s performance or to the development of the country.

The problem with ‘plan and control’ organisations

Matt argues – and the analysis in this study confirms – that ‘plan and control’ organisations, and nearly all of us work with these, have a significant bias in their evaluations. They are more likely to focus on ‘project and product success’ than on ‘problems solved with development impact’. This is very widespread in the development evaluation community.

I am very pleased that this study highlights the serious fallacy of considering evaluation meaningful even when it looks only at project or programme success, in isolation from the surrounding circumstances and from how the intervention is embedded in the notion of ‘development’. As I have said on many occasions, this is an especially serious problem in the Global South. Such a definition of success might work in societies that are reasonably prosperous, with fairly good societal and economic indicators, and stable institutions. But in countries that have to work hard to improve a variety of societal indicators from a low base – like most of the countries in the Global South – this is not sufficient, and often not even appropriate.

And our ‘theories of change’ obsession contributes to this problem

Conventional theories of change seldom detail pathways to the ‘higher level’ or ‘longer-term’ outcomes. Evaluative assessments are made based on promised early deliverables or relatively easy-to-determine ‘short-term’ or ‘medium-term’ outcomes that are already defined at the planning stage, very often without any clear notion of whether or how they will contribute to development outcomes.

I have often pointed this out – although I have also used theories of change without sufficient attention to how they connect the intervention to development outcomes. It is convenient to ignore this technically challenging issue. And theories of change certainly have some value, especially in drawing attention to what can or should be monitored and evaluated on the way to achieving the desirable end result – including contributions to development.

In order to address this problem, the World Bank uses a ‘Risk to Development Outcomes’ (RDO) rating to indicate whether an intervention is likely to succeed or fail in achieving or contributing to the desirable end (‘higher level’) outcomes. But Matt notes that, like most organisations, when reflecting and reporting on success and failure the Bank tends to quote the lower figure without taking the risk of (eventual) failure to achieve essential outcomes into account. So we end up with ‘project success’ or at best ‘programme (portfolio) success’, quite oblivious to the big picture – the need for ‘development success’.

I agree with Matt that it is unlikely that citizens – especially in the Global South – will be pleased with such a superficial treatment of ‘success’ when governments expend public funds. But in my experience we seldom think about this important issue.

So what is ‘success’ when dealing with transformative change?

I will write about this important question in a subsequent post. The key point is: if we do not deal sufficiently well with ‘success’ when evaluating project or programme interventions, how well will we deal with it when helping to design for, or evaluating systems change or transformation?

Perhaps it will actually be easier. But most certainly we have to keep on grappling with both the technical and political challenges around this issue – one that is very critical for the credibility, utility and value of evaluation as practice and as profession, especially in the Global South.


Zenda Ofir

Zenda Ofir is an independent South African evaluator. Based near Geneva, she works across Africa and Asia. A former AfrEA President, IOCE Vice-President and AEA Board member, she is at present IDEAS Vice-President, Lead Steward of the SDG Transformations Forum A&E Working Group and Honorary Professor at Stellenbosch University.


  1. Adopting the Logical Framework method will help to evaluate the success of a project or programme for its beneficiaries. The Logical Framework will help to monitor the realisation of the benefits of the project or programme to the beneficiaries or host community.

  2. Syl, thank you for reminding us that we need to use the tools at our disposal. I must confess I am not a big fan of the LFA/logframe, which has been in use since the 70s. It has many disadvantages, one of which is that it misleads one about what can be seen as “success”. For detail see https://www.fasid.or.jp/_files/publication/oda_21/h21-3.pdf which is a very good article about the negatives of the LFA approach. Since around 2000 the LFA idea has been expanded into ‘theories of change’, which provide for more thoughtful planning and evaluation. But if you have a systems perspective on how things work – which we all should have – you will recognise that both these methods have very significant shortcomings that can be very misleading. Among others: we cannot predict the sequence and pace at which outcomes will emerge, so it is unwise to hold people accountable for logframe-determined outcomes without understanding why and how things are working (or not), for whom, when, etc.; and we have to be clear about negative consequences of actions that might actually neutralise or eliminate the benefits, yet the LFA does not account for that. The superficial use, and also misuse, of the LFA has led to tremendous problems in efforts to achieve successful development interventions. Perhaps you can look at Michael Quinn Patton’s “Developmental Evaluation” for more information on methods and aspects that we need to consider beyond logframes or theories of change.

  3. I fully agree that development success is hard to pin down and that project-level ratings (meeting major relevant goals efficiently) only tell part of the development story, but Matt Andrews’ working paper does not do justice to the complex issues involved in assessing development impact. He proposes to measure project success by combining two ratings: the extent to which projects meet their relevant objectives efficiently, and the extent to which they involve low risks to development. This is a dubious proposition: development is a risky business and more often than not high risks should be incurred to generate high rewards (as Silicon Valley start-ups know only too well). Nor does Matt Andrews recognize that project-level ratings are useful building blocks for higher-level evaluations (country level, thematic level, blue marble level) that should be carried out to guide development policy. As I see it, evaluation should be carried out from different perspectives, at various levels and at various times, to get a balanced picture of what success looks like. Rather than Matt Andrews, I recommend Development Projects Observed by Albert O. Hirschman as a development evaluation guide book.

    • Bob, I fully agree with your very good point. I was more interested in Matt’s critique and pointing out of flaws in how success is defined, than in his proposed solution. As you rightly point out, there is much more to say about this important topic. For me the critical issue is that we should be more circumspect about the whole matter of defining success, and more mature in how we apply this in practice. But as always, technically more difficult than current practice in many organisations. And thank you for the Hirschman reference. His work remains very relevant today.

  4. Your plea for caution in interpreting performance ratings is well taken but, as I see it, Matt set up a straw man and demolished it. No IEG evaluator believes that project completion ratings measure development impact. The working paper is highly selective in its use of IEG indicators, e.g. it ignores sustainability ratings and it fails to note that IEG produces a wide range of evaluation reports addressing World Bank performance at the higher plane of country and thematic policy. Worse than this, in search of a justification for its central and dubious message (that a majority of World Bank development interventions fail) the working paper concludes by putting forward a composite rating of intervention failure that is even more misleading than the ratings the working paper criticizes. Ironically, if this measure gained traction, it would likely have a chilling effect on creativity and innovation, the very opposite of what the working paper intended to achieve. I only wish you had not lavished praise on Matt’s working paper to make your entirely valid point.
