Definitions of terms are crucial for common understanding in any field of human endeavour. The diversity of actors entering and playing on the evaluation field is both an advantage and a disadvantage, not least because we do not share a common language. It is both frustrating and enriching to deal with so many different understandings and definitions of simple concepts – even of something like evaluation itself.
The best example of this is probably the battles that started around 2005 or so about the nature of impact evaluation. I was one of those in the midst of an effort from 2007 to 2009 that aimed to reduce the negative consequences of randomised control trials (RCTs), and in particular the “impact assessment with RCTs as gold standard” tsunami that hit us through the excellent marketing efforts of, among others, organisations such as 3ie, J-PAL and IPA.
How definitions shaped an evaluation wave
The effort to counter RCTs as gold standard was called NONIE, the Network of Networks on Impact Evaluation. The NONIE Guidance on Impact Evaluation explains NONIE as a movement aimed at highlighting different options for impact evaluation methodology. It was the first concerted effort to bring less militant perspectives to bear on impact evaluation methodology, even though at the time it did not go far enough in recognising all the credible methodologies that we advocated for in what was called “Subgroup 2”. NONIE was also, to some extent at least, eventually captured by specific interests – a good example of ongoing (and understandable) power asymmetries in the global evaluation system.
Organisations such as 3ie, J-PAL and IPA hijacked the definition of impact evaluation by linking it to a specific notion of what constitutes credible evidence and “rigour” in evaluation (statistical rigour) and to a definition that privileged certain methodologies over others by capturing notions of ‘causal inference’ (and counterfactuals) and ‘attribution’, defined in very specific ways. A good example is the definition of impact evaluation by 3ie, which fortunately from the beginning had a somewhat broader definition than J-PAL and IPA. Although the latter grouping has since toned down their initial rhetoric about the hopelessness of any impact evaluation that did not use randomised control trials, this notion still lingers among many.
Sense has been slowly returning to the evaluation funding and commissioning community – encouraged by the recognition in the SDGs that things are interconnected, that changes cannot be isolated from one another, and that pathways to change are seldom linear or simple. Yet some of these ideas remain firmly embedded in the psyche of large swathes of evaluators. Just consider the Wikipedia entry about impact evaluation: “Non-experimental designs are the weakest evaluation design, because to show a causal relationship between intervention and outcomes convincingly, the evaluation must demonstrate that any likely alternate explanations for the outcomes are irrelevant.” (Why don’t we correct such biased statements?).
Many of us accept the important role that RCTs play in the field, but do not want all other designs to be seen as inferior – especially when the applicability of RCTs in development is very limited for both practical and conceptual reasons. The significant weaknesses of RCTs should also be acknowledged (people usually do not sniff at the views of a Nobel prize-winning economist).
As noted, the credibility of RCT designs was to a great extent driven by capture of a definition. So definitions are important. They can move a whole field – and huge amounts of funding – in a different, not always desirable direction.
When I started to delve into definitions of “impact assessment” or “impact evaluation” in 2007, I found 34 quite different definitions within 1.5 hours of a Google search. This situation has improved somewhat. In development evaluation the DAC Glossary has been widely used. The 3ie glossary for impact evaluation is also useful, although the definitions should not always be taken at face value, given their rather narrow perspective on what constitutes credible evidence.
So this glossary, compiled by the Impact Management Project primarily for use by the private sector, is very welcome indeed, and very useful for all evaluators. It is the first effort to bring coherence to the field, and is seen as a living document that will continue to evolve. Here you can read more about how, by whom and why it was put together. There are many definitions that can be improved, such as the one on Impact Evaluation, but that is exactly what makes it an effort that should be supported. This is one example where private sector driven engagement with evaluative practice is overtaking conventional evaluation efforts (there are more – to be discussed in a separate post).
It will be especially important for evaluation communities in the Global South to make sure that the definitions capture the subtleties and nuances of their contexts. A concerted effort in this regard will be highly desirable.
Of course, such glossaries do not at all capture the interesting issues and challenges when evaluation terms are translated into, or reflected in indigenous local languages. That requires a blog post all on its own.
Zenda Ofir is an independent South African evaluator at present based near Geneva. She works primarily in Africa and Asia, and advises organisations around the world. She is a former AfrEA President, IOCE and IDEAS Vice-President, AEA Board member, Honorary Professor at Stellenbosch University, Richard von Weizsäcker Fellow, and at present Interim Council Chair of the new International Evaluation Academy.