Updating the DAC Evaluation Criteria, Part 5. Non-negotiable criteria

It is ironic that evaluation has been unable to prevent – or even just sufficiently critique – some of the most damaging policies and strategies that have originated in both economically rich and poor countries. What should evaluators have done to prevent at least some of the destruction wreaked upon countries in the 1980s and 1990s by the World Bank and IMF-driven Structural Adjustment Programmes (SAPs)? Would we today have done better than the largely ineffective monitoring systems that failed to provide analyses and syntheses that reflected reality? Would we have been able to foresee the dire negative consequences of the Green Revolution (see here, here, here and here) in Asia? Are we about to repeat them? If evaluators are unable to prevent such disasters from happening, what good is the work that we do? How can similar situations be prevented in future?

These are some of the questions that should keep us awake at night.

Much depends on the evaluands. Robert Picciotto and others have been arguing for years that we should be in a position to evaluate the effects on development of macro issues such as trade policies, financial architectures and flows, foreign direct investment, global value chains, and so on – the “transmission belts of globalisation”. We have little power in these domains. There is too often a lack of resources or political will to attend to those issues that have a major influence on the development trajectory of a country.

But much also depends on our evaluation questions and criteria, and on the flexibility and nuance with which we are allowed to apply them. So, what should change if we want our criteria to support development more effectively?

Our criteria depend on what we consider to be ‘Development’ in the SDG era

The basic tenet of my argument around a revised set of DAC criteria is that they must ENSURE that we evaluate FOR development that can sustain, rather than just help us to ‘evaluate development’. See my previous post in this regard, as well as my article in Evaluation Matters of the African Development Bank, where I first wrote about the DAC criteria. So our evaluation criteria should compel us to attend to those critical aspects that are supposed to enable or support positive development trajectories at national or regional level.

This means that we should at the very least consider the following:

The values, ideologies and/or models within which we evaluate contributions to ‘development’. Development is a highly contested concept, with vastly different ideas about how it can be achieved. Understandably, evaluators usually do not get entangled in evaluating a specific development model or ideology (examples can be found in a 2013 book chapter by AK Shiva Kumar and myself where we refer to the human development, human rights and human security approaches to development). But when we evaluate (portfolios of) projects, programmes or single sector policies, we almost always fail to make explicit the values, ideologies and models about development that frame what we evaluate, how we assess trade-offs, and how we arrive at summative judgements.

‘Development effectiveness’, not only ‘intervention effectiveness’. Development effectiveness is usually assessed at country or regional level. The proliferation of global development-related indices (and now also the SDG reporting modalities) compels comparison and competition between countries and regions. But not every investment labelled a ‘development intervention’ automatically contributes to development from a national perspective, even if successfully executed – especially when unrealistic assumptions are made at the ‘higher levels’ of theories of change, with vague statements about the linkages between the intervention objectives and societal or ecosystem impacts. As a result, we seldom focus on the development trajectory of a country or region.

The risk of ‘ersatz’ development. Eminent economist Ha-Joon Chang refers to ‘ersatz’ development as development based upon uncoordinated interventions that do not build on synergies or enable the system coherence that facilitates change at a macro level. Such interventions tend to operate at the level of ‘community’ or ‘pilots’, without scaling, and strengthen the illusion that “every bit helps”. This very common problem can be propagated by aid or philanthropic agencies, the private sector or a government. It is rife where governments are dysfunctional, fragile or lacking in confidence or expertise, or where a funder has little interest in engaging with integrated development policies and plans. A ‘bottom-up’, micro perspective of development tends to prevail, and so do debates about the much-lamented micro-macro disconnect (see here and here, for example).

The risk of ‘development without development’. ‘Development without development’ means that interventions focus on enabling conditions such as poverty reduction, individual betterment or meeting basic needs, without a vision of how the country can and will sustain positive economic, socio-cultural and environmental development trajectories in the long term. A positive development trajectory at national or regional level demands economic advancement, where financial flows from within and outside a country are large and sustained enough to fund development activities across sectors in an integrated manner over a prolonged period. In the SDG era it also demands the integration of economic, socio-cultural, environmental and political aspects.

The most successful countries in the Global South over the past few decades have tended to follow top-down, integrated strategies complemented by freedoms and incentives that allowed for bottom-up innovation and improvisation. Such efforts have been led by effective government policies and strategy execution in ways that (i) release the energy in society to do more, enabling positive impacts to ripple in unexpected and/or sustained ways, and (ii) generate the resources necessary for the country or region to continue to perform and succeed in a sustained manner.

The implications for our criteria of how we define ‘Development’

Cognisant of the four issues noted above, our evaluation questions and criteria should force us to:

  • focus on summative judgments about contributions to development effectiveness, not only on intervention effectiveness;
  • ensure that we assess the significance of the evaluand at a particular time with reference to the development trajectory of a country or region;
  • consider issues of synergy, complementarity and coherence (or alignment and harmonisation), and evaluate for scaling, where appropriate, in order to help ensure that any micro-macro disconnect is minimised;
  • have a strong focus on the sustainability of positive development impacts, and the role of dominant resource flows in enabling this (or not).

Our criteria also depend on Development as a Complex Adaptive System (CAS)

When we view and evaluate development from a complex adaptive systems perspective, we need to consider the implications for our profession and practice of the following defining concepts of complexity:

Interconnectedness leads to interdependence between the elements and dimensions of a system, and gives rise to dynamic, complex behaviours. Change is discontinuous (periods of stability alternating with periods of change) – influenced by the agents in the system, their ‘rules of behaviour’ and the types of relationships between them.

Emergence: The dynamic networks of interactions between components, many of which cannot be predicted, can lead to a whole (system) that emerges at a ‘higher order’ that is greater than the sum of the parts. The order is emergent, not predetermined or controlled.

Self-organisation, adaptation: The system is never static. Individual and collective behaviours self-organise as things change, adapting to the changing environment, influenced by history and feedback, in order to increase survivability and find the ‘best fit’ with the environment. Some overall order arises from local interactions between parts of an initially disordered system; macro-scale patterns of behaviour can therefore arise spontaneously, amplified by positive feedback.

Non-linearity: The relationship between the system and its environment is non-linear, with multiple feedback loops. The system is highly sensitive to initial conditions: small changes in inputs, physical interactions or stimuli can cause very significant changes; conversely, a huge upset to the system may not change it in significant ways. Reinforcing feedback loops can lead to tipping points (when thresholds are overcome) and transformational change (when the system moves from one stable state to another).

Path-dependence: A set of decisions for any given circumstances is limited by the decisions made in the past, even though past circumstances might no longer be relevant. Initial conditions can also leave a persistent mark (imprint) on a society or organisation, shaping behaviours and outcomes in the long run even when contexts change.

Co-evolution: Dynamic systems adapt and evolve with a changing environment. They co-evolve with other related systems in an ecosystem (and so should not be seen as merely adapting to a ‘changing environment’), and across multiple levels or scales.

The implications for our criteria of viewing Development as CAS

First, an insufficiently integrated perspective will inevitably lead to oversimplification of situations and changes, and to incorrect assumptions about how transformational change and development are likely to happen, or how development trajectories will evolve. Development interventions cannot be considered in isolation from one another or from the ecosystem around them. Economic, environmental, socio-cultural and political aspects also cannot be treated as separate entities, as the philosophy underlying the SDGs clearly confirms (although the 2030 Agenda underplays the importance of culture).

Second, development dynamics and outcomes are often, but not always, unpredictable. Historical contexts and experiences, and societal cultures that have co-evolved over time, can and should provide insights into what might happen, as such co-evolution shapes the psyche of a society. But demanding predictability through rigidly kept logframes and results-based management (RBM) as currently practised will stymie development.

Third, understanding at least to some extent what might be essential and/or sufficient initial conditions or preconditions towards desired changes will assist with development planning and implementation, but cannot be taken as a certainty. Adaptive management and agile responses to co-evolving systems are essential, demanding the freedom and incentives to innovate and improvise, and embeddedness in local understandings.

Fourth, desired changes in a certain direction will not happen automatically through interventions, and possibly not at all, unless appropriate, culture-sensitive new ideas and incentives are in place, and obstacles to progress at local level are removed. Furthermore, if development is to be effective and accelerated, concepts such as feedback loops, synergistic effects, harmonisation, catalytic change, tipping points, sustainability and transformational change should be understood from both theoretical and practical perspectives, and designs and implementation strategies, as well as their evaluations, done with these in mind.

A non-negotiable set of evaluation criteria for development

All these implications direct us to a set of criteria that should be seen as non-negotiable during the SDG era – including the nuances in their descriptions – if we are serious about evaluating for development that will sustain. Even though assessment based on some of these criteria might be challenging, they have to be considered.

Alternatively, if we find that within the normal resources and operations of evaluators it is impossible to attend to one or more of these, we have to consider the implications for our profession and practice – in particular, for those of us who purport to fund, commission and do evaluations in support of ‘development’.

COHERENCE (& INTEGRATION). Its description and rubrics will be nuanced to focus assessments on the extent of (i) integration during planning and execution of development efforts (of whatever is deemed important, such as disciplinary approaches and models, and, in particular given the need for sustainable development, also relevant environmental aspects); (ii) complementarity and synergy – whether synergistic effects have been planned for or achieved – this is where, for example, policies, interventions and/or cross-cutting principles work together to ensure acceleration or amplification of development impacts; (iii) harmonisation (or alignment) – whether relevant policies, interventions, principles, etc. have been aligned, or at least do not oppose or obstruct one another, towards development objectives at national or regional level; and (iv) coordination between efforts, wherever relevant. This will demand intensive engagement with influencing factors. Two excellent recent ICSU reports here and here highlight ways to work with these concepts, with interesting examples relating to the SDGs. This approach can be expanded to cover other types of work.

SIGNIFICANCE. Its description and rubrics will be nuanced to focus assessments beyond ‘Relevance’, on what is actually ‘Significant’ from a development perspective. Rubrics will make explicit the values and reasoning around the definition of ‘Significance’, but will include issues such as (i) the scope (breadth, depth, coverage) of the effort; (ii) its relevance and timeliness at a particular point, or over a specified period, of the evolution of the effort given the development trajectory of the country or region; and (iii) the extent to which deeper layers of causes (what we used to call “root causes”), catalytic action, bottlenecks, tipping points and/or transformational change have been considered and/or addressed during planning, or have been achieved – and contributions made to that – during or after execution.

RESPONSIVENESS & IMPROVISATION (can also be defined as ADAPTABILITY & ADAPTATION). Its description and rubrics will be nuanced to focus attention on the need to be flexible and agile, as well as innovative and able to improvise in (i) responding to context and changing circumstances, including the co-evolution of context and culture; and in (ii) anticipation of risk, including dealing with destructive power asymmetries and other important negative influences on success.

IMPACT & SUSTAINABILITY. Its description and rubrics will be nuanced to focus attention on the need to (i) connect impact and sustainability, whether during the planning or execution phases, or afterwards – and hence include ‘Impact sustainability’; (ii) examine with vigour any (potentially) neutralising effects that can emerge from foreseen or unforeseen negative consequences or impact ripples (the ICSU approach described here and here can again be very helpful); and (iii) include ‘Ecological sustainability’, to reinforce the importance of environmental considerations for sustainable development.

DEVELOPMENT EFFECTIVENESS. Its description and rubrics will be nuanced to ensure attention to summative judgments that (i) pull together the assessments based on individual criteria, and (ii) consider, as relevant, intervention as well as development models and trajectories at national and/or regional levels. We should be challenged to engage – at least to some extent – with the sum of what we have assessed, and what it means for a country or region at different stages of its development within its chosen development path (or lack thereof). This will challenge us to clarify and make explicit the values and beliefs about ‘development’ that underpin our evaluative judgments.

Finally ….

Each description and use of the set of criteria has to ensure that some attention is shifted back to the merit of the design and execution approaches, and to preconditions that are essential and sufficient for development success. An obsession with ‘Impact’ in isolation of everything else can lead to very wrong conclusions about the extent to which certain efforts have supported development that is likely to sustain.

Thinking about these matters, and about this ambitious framing of potential criteria for our use, should also challenge us to consider the limits of the value we are in a position to add as evaluators to countries’ and regions’ development efforts. And whether we should do more to ensure that we add more value than we have done to date.


Zenda Ofir

Zenda Ofir is an independent South African evaluator. Based near Geneva, she works across Africa and Asia. A former AfrEA President, IOCE Vice-President and AEA Board member, she is at present IDEAS Vice-President, Lead Steward of the SDG Transformations Forum A&E Working Group and Honorary Professor at Stellenbosch University.

28 Comments

  1. Zenda
I stand by the view that the DAC development effectiveness criteria have had a largely positive effect on evaluation practice. They did help development evaluation reach out beyond input and output thinking. Nothing in the criteria discourages fulsome stakeholder engagement in the evaluative process. This said, I agree that coherence (already acknowledged by humanitarian evaluators as critically important) should at long last be included in the DAC criteria, and that a summative assessment that integrates judgments across criteria (as recommended by Michael Scriven) should be actively promoted. But to define or re-define development as part of the proposed revision of the DAC criteria seems a bridge too far. Nor should it be necessary, given the legitimacy and universality of the SDGs. And while I fully agree that addressing complexity, emergence, etc. is critical for high-quality evaluation in the contemporary social world, systems thinking has more to do with how to carry out evaluations (i.e. apply agreed criteria) than with the criteria themselves. Thus significance, responsiveness and adaptation are implicit in the existing criteria. As I see it, adding too many bells and whistles to the DAC criteria may do more harm than good.

    • Bob
I agree with you about the value of the DAC criteria, as I pointed out in Part 1 of the series. It is always challenging to add some complexity to any existing practice. But here I believe we should move out of our comfort zone. As you note, the DAC criteria helped us to move beyond input/output thinking. Now, two decades later, it is time for a next stage in our evolution – unless we are at too fragile a stage to cope with renewal. You indicate that significance, responsiveness and adaptation are implicit in the current set. I am not quite sure of this, and we definitely do not attend sufficiently to these important issues. Is it then only a matter of making them explicit in the current set? Not sufficient, I believe. We also need to understand why we use any criterion. In a development environment we cannot just go with what stakeholders want, and using development – as confirmed through the SDGs, and especially the non-ideological elements – as a basis is essential in my view. But the most important thing is to have this type of conversation now. So thank you for thoughtful counter-arguments. I am interested in whether others will agree, and look forward to alternative perspectives.

  2. Zenda
I applaud your initiative. A global conversation about evaluation criteria may indeed be timely, but I would urge restraint. As with all prescriptive rules, diminishing returns (and eventually declining returns) inevitably result from robust efforts to make quality criteria ever more comprehensive. To quote or perhaps misquote Einstein, criteria “should be as simple as possible but not simpler”.
    Bob

This is a valuable blog. The DAC evaluation criteria have been a significant innovation and it is worthwhile to engage in a discussion about them. Several important points have been made in Zenda’s blog, and also by Bob Picciotto in his comment as well as by Caroline Heider and Hans Lundgren in the 2017 IEG blog. What may be added is that in times in which populism and the critique of globalization have gained momentum, evaluation criteria can play a role both in assessing the actual effects of development interventions and in directing attention to important aspects that should be taken into account in the design of interventions. The signalling function of the criteria is not always considered, but it can make a contribution to the design of better policies, programs and projects. Sustainability (in its different dimensions, as elaborated by IFAD) is an evaluation criterion that may help to partially mitigate the risk of designing populist, unsustainable development interventions and to assess them, while “equity” is an evaluation criterion that directs attention to the distribution of benefits, which is a crucial issue in the context of the critique of globalization. Furthermore, rather than arguing about “development without development” it may be more fruitful to refer to “growth without development” (it can be argued that growth is necessary but not sufficient for development). Finally, the “core criteria” terminology used by the Evaluation Cooperation Group may be more appropriate than “non-negotiable” criteria.

    • Osvaldo
I agree with the valuable points you raise. I like your use of the term the “signalling function” of the criteria. This is exactly why I believe our criteria should be solidly grounded in both the technical and ideological aspects of development – and why it is not enough just to depend on stakeholder views of what should be evaluated. I also like your references to “growth without development” and “core criteria”. I emphasised “non-negotiable” as a counterpoint to the “flexible” sets in Part 4 of the series.

A common observation applies to this discussion: without a good set of evaluation criteria, chances are that evaluators will not look for the right things when conducting assessments of development interventions. Therefore, getting those criteria right is the first key step to help push the field forward. The five evaluation criteria are the most prominent and widely adopted criteria used for aid evaluation by most bilateral and multilateral donor agencies, as well as international nongovernmental organizations (INGOs). However, critiques of the quality of development aid evaluation are still abundant. Thus, it is reasonable to question how those criteria can be improved. Given the importance and level of influence of the DAC criteria in the development world, it is appropriate to submit them to independent scrutiny. Three sensible questions to orient a reflection on the five criteria include:
    a. Are they sufficient to provide a sound assessment of the quality, value, and significance of an aid intervention?
    b. Are they necessary? and
    c. Are they equally important?

    The five OECD/DAC evaluation criteria have been an important step forward to make the evaluation of aid interventions more comprehensive. However, there are some key issues related to focus (the need to refocus relevance and effectiveness on needs of potential beneficiaries and not on funders’ and/or governments’ priorities), omissions (need to include quality of process and exportability as part of the criteria) and important determination (need to establish bars for some key criteria) that should be addressed so the DAC criteria can, once again, lead the aid evaluation field to a more advanced position.
Finally, the current definition of the five criteria implies that they all have the same level of importance. A reasonable question to ask is whether the criteria should have different weights in determining the overall assessment of an intervention. For instance, should the impact produced by a project receive a higher weight in comparison to the other criteria in the overall summative assessment of that project?

    • Bhabatosh
      Thank you for confirming that it is now important to review the DAC criteria. As you indicate, we acknowledge their good contributions – as I also have done in Part 1 of this series of posts. But they have been too mechanistically applied, and we have become lazy in our thinking about evaluation criteria. It is time for a shake-up. Your important point about allocating different weights to the criteria refers to their implementation rather than their conceptualisation, as this will be context-dependent. Many organisations do not ensure summative judgments with consideration of the importance of each criterion under specific circumstances. This is a major weakness in our practice.

Compliments for all the above views about the commonly used DAC criteria in evaluations. While all the components in the said criteria are important, the cultural aspects of the stakeholders should also be highly considered. This is because some ingredients of the criteria used in evaluation may not necessarily be compatible with all the cultures within which the evaluation is conducted. This element greatly affects the relevance of the evaluation commissioned/conducted.

    • Mayie
      A few agencies have included cultural responsiveness under the Relevance criterion. In my opinion this has not been effective – this important aspect in any case tends to be neglected. It should be much more explicit to get evaluation commissioners and evaluators to attend to it.

  6. I’ve lots of things to say about this excellent initiative and related discussions. And I’ll return to this discussion from time to time to add them. But initially I want to make an observation that nobody has so far hinted at. One of the negative consequences of the DAC criteria is that they have, in many cases, removed the responsibility and – in my observation – ability of many evaluators in the ‘development’ space to be able to develop, understand and work within ‘criteria’. The irony of this is breathtaking, for what is left for evaluators to do (rather than researchers, development practitioners, funders ….) if we are not helping stakeholders to reach some form of judgment or appreciation of an intervention or situation? And every judgment is based on criteria – implicit or explicit. Without a fundamental understanding of criteria, expertise in their establishment and a full appreciation of the ethical responsibility of working with them, evaluation will largely remain a craft rather than the profession it aspires to be and often mistakenly believes it is. We have outsourced the very thing that defines and distinguishes us.

    • Bob
      You raise a very fundamental issue. This is a key reason for my insistence that we figure out exactly why we select the criteria that we do use. Even if we again establish a “core” set we should continue questioning its value and merit in every context. We need to have some reasoning behind our choice of criteria instead of the mechanistic application of a standard set that then directs the evaluation questions. And many evaluations do not even have judgments per criterion; summative judgments across criteria are even less frequent, and we end up with descriptions rather than assessments. It is no wonder that many “evaluators” think what we do is no more than research – and in many cases that is indeed all that is done. On the other hand, as Bob P has said in a comment, much of what is wrong is a matter of implementation rather than conceptualisation.

Dear Zenda, thank you so much for the five valuable blog posts on rethinking the DAC criteria. I agree with you that, two decades later, it is a good time to update the DAC criteria to reflect the current development agenda. Revising the evaluation criteria is likely to involve a lot of difficult work, but it is worth it. And I am pleased to hear that a DAC criteria revision process will be launched at some point. In my view, for updating the DAC criteria, a good process to bring different perspectives together may be more important than anything else at the early stage. Such a process would be more inclusive in opening up to wide-ranging consultation; in particular, the voices of non-DAC members should be taken into consideration. The DAC criteria have been set by the DAC members but are also used to assess initiatives in non-DAC countries, which differ from the traditional donor-recipient modality. Hopefully, in the DAC criteria revision process, the stakeholder group could include a certain number of non-DAC representatives, instead of only those within the DAC Evaluation Network.
I like your idea of making a distinction between a “non-negotiable” set and a “flexible” set of evaluation criteria. But I agree with Osvaldo that “core criteria” may be more appropriate than “non-negotiable” criteria.
Additionally, I would like to discuss with you the relationship between criteria, norms and standards, and evaluation principles. From my observation, for national evaluation capacity development (ECD) there are many things we need to work on, and the criteria are only one of them; ECD goes beyond the criteria. Taking China as an example: for the national M&E system, it has been essential to build consensus around evaluation principles (independence, credibility and utility), evaluation policy, and norms and standards. Efforts on evaluation criteria come at the second level.

    • Zhaoying
      Thank you for raising the very important issue of a credible review process that will ensure that non-DAC voices are equally solicited and heard. This will be crucial – much has changed since 1991. Such a process will also require responsibility and commitment to good process and useful input by both DAC and non-DAC actors.
      You also raise a second very important issue. I find that there is a lot of confusion around criteria, norms, standards and principles – and as you note, all of these are very important parts of our global evaluation system and movement towards becoming a full-fledged profession. We also have to be clearer about the extent to which some or all of these can be – or have to be – generic rather than context-specific as determined by each organisation or grouping. So much is driven by UNEG, ECG and OECD DAC, which is to their credit and necessary, but I would like to see more South-South and Triangular Cooperation to make sure we strike the right balance. As you rightly note, there is still so much to do on these very important matters.

A very good discussion — and important. Thanks for initiating it, Zenda. As I’ve suggested earlier, I believe it’s worth taking a critical look at the criteria, although the DAC criteria have served a very helpful purpose in focusing evaluation and making sure certain dimensions are covered. As others have noted, the problem has often been a mechanistic way of interpreting and using these criteria, which has led to a reductionist view and to looking at interventions in isolation. Therefore, I think it is very important that we think about ‘relevance’, ‘impact’ and ‘sustainability’ as a package: dimensions that are interrelated and where one cannot really exist without the two others.
Another problem that needs to be addressed head-on relates to unexpected consequences, which evaluations using logic models too often miss. Especially in a CAS context, there are bound to be unexpected consequences, both negative and positive, and evaluation must be very alert to them.
Finally, and very close to my heart, I think one important non-negotiable pertains to the environmental pillar of sustainable development. Strangely to me, the environmental dimension often gets left behind (perhaps because so many development evaluators are social scientists?). But every evaluation should consider the (often unintended or un-thought-of) environmental impacts that every development intervention will have.

    • Juha
      Thanks for these very valid and useful points. In my posts I connected Impact and Sustainability – which I believe is essential to do – but not Relevance. I think the latter should be replaced by Significance, unless we manage to define and forcefully apply Relevance in much more nuanced ways. I completely agree with your point about unintended consequences; its importance demands that assessing them systematically should be a requirement in all evaluations – at any stage, and of any type. At the very least it should be made much more explicit than it has been to date under Impact and also under Sustainability, as it is very relevant for both. I too care deeply about the environmental dimension. I accommodated it in the suggested approach as Ecological Sustainability, explicitly noted as a sub-criterion of Sustainability, but it can also stand separately to give it more prominence. It could be part of what I suggest as a flexible “Norms” set, but I preferred to include it in the “Core” set. It should also be made explicit under Coherence, where integration is a sub-criterion. The critical point is that we need appropriately nuanced definitions (in terms of sub-criteria) in which key issues are explicit, and these nuances must be applied, not neglected as is so often the case today.

  9. I have been reading the posts on the DAC criteria, as well as the discussions, with great interest – this is indeed a very crucial development at this point in time. Thank you, Zenda.
    Being immersed in this field in a practical way, I have been working in the South, dealing with stakeholders from the North, and then being able to touch base with evaluation thinking from Asia and Latin America – especially experiencing the “follow the dots” approach as opposed to looking at a situation more broadly, in a creative or innovative manner based on its unique merit. So yes, renewal and/or change would be excellent.
    When it comes to the policies and frameworks of the IMF and World Bank, as well as the Green Revolution, I am of the opinion that they have more than just “development” as a core agenda, and therefore any major changes to those will perhaps be a battle too far. However, putting forward new and improved suggestions for their frameworks should remain important – especially providing clear evidence of negative impacts based on rigorous evaluations, and empowering countries to make improved policy decisions. Even more important is to empower the communities and other organisations within or across countries, regional organisations such as SADC, ECOWAS, EAC etc., and ALL stakeholders with the findings, and especially the implications of the negative effects. IMF and World Bank feedback on negative findings should be highlighted much more in the press, and more broadly communicated and debated.
    I agree that the DAC criteria were indeed a great benchmark in their time for those in the North and South. From my perspective in the South, they became one of the must-haves if an organisation was to obtain international funding and even recognition. They played a huge role in that “era”; however, it is perhaps now time to bring about more robust changes around the global recognition of the true value, understanding and worth of the South, especially in the light of innovation, changes and developments – more specifically by influencing evaluators to be more innovative in their application of evaluation, rather than just following the criteria without seeing the broader view. I am particularly looking at part 5.
    Perhaps in a way I am thinking in line with these developments and tracing the footsteps of the BRICS. Do we see a need for such a radical transformation, for the South or for the globe, in terms of evaluation criteria?
    My question is the following: Is there space for the development of a basic set of policies and/or frameworks, inclusive of criteria, developed broadly for the South by the South with input from the North (incorporating the DAC criteria), and given equal standing with the DAC criteria? I do not propose discarding the great work of the DAC and its criteria, but rather offering a choice – an evolution, perhaps a revolution – in development. This would take cognisance of issues highlighted such as “core or non-negotiable” criteria (as reflected in your part 3 – CAS, Org/SC/Global norms/mandates, Other stakeholders). I also think it is very good to look at flexibility, wider application and especially enabling meaningful synthesis, as reflected in part 4. The crucial part here is recognising that there are indeed variances in mandates, norms, values etc. across the globe, and that these should be taken into consideration in evaluations on a more equal basis. If these proposed “New South” criteria, norms and values were implemented with good results and given the same standing as the DAC criteria, the possibility exists that they could be incorporated into a more “global” set, should the DAC so wish. I base this on the premise that, because the North has invested much in the DAC system as it stands, it might be less likely to accept major changes, and thus there might be more push-back. In future, however, it might be willing to incorporate some changes if there are sufficient positive findings from the “new evolutionary evaluation criteria utilisation”. The process of deliberating on and putting forward a working set of improved, truly global evaluation criteria should be applauded and supported, and hopefully taken forward for testing sooner rather than later.

    • Elma
      Thank you for highlighting the importance of assessing negative impacts (at times, some are actually intended). This remains an extremely important issue that our evaluation questions and criteria should ensure we address in all evaluations of all kinds.
      Your other key comment is also very important. Is there enough difference between the Global South (GS) and Global North (GN) – in terms of problems, demands, needs, sensitivity to cultural differences, etc. – to justify a set of evaluation policies and frameworks that goes beyond what is on offer at present? Of course, there have been efforts to tailor policies, guidelines, standards, criteria and so on to GS contexts, but too little thought is going into this issue, and we do not come up with very visible, well-reasoned alternatives or additions. So it is no wonder that donor demands and approaches generally dictate. Of course, there are also power asymmetries. But all of this makes it necessary to contemplate seriously the questions you ask about the process of advancing our conceptualisation and use of evaluation criteria globally.

  10. Many thanks to Zenda for this very valuable series of posts and to all for the vigorous discussion on the role and meaning of criteria!

    Allow me to offer a few observations:

    (1) Evaluation is the professional practice of making judgments of value. Judgments are made on the basis of some criteria—aspects, qualities, or dimensions that make it possible to determine that a program or policy is “good,” “poor,” “successful,” “unsuccessful,” “better (or worse) than some alternative,” “effective,” “ineffective,” “worth its costs,” “morally defensible,” and so on. These criteria rest (often rather implicitly) on values held by stakeholders and evaluators—normative beliefs about how things should be, strong preferences for particular outcomes, or principles that individuals and groups hold to be desirable, good, or worthy. These beliefs, preferences, and principles include social-political values such as accountability, equity, effectiveness, and security, as well as moral values such as respect for persons, dignity, and freedom of self-determination. It is on the basis of such values that individuals and groups (those with various ‘stakes’ in what is being evaluated) judge success or failure. In other words, what we value surfaces in the kinds of criteria we think are important. In the field of evaluation, inquiry into values addresses the question, “What are the criteria by which the success of a policy or program (or anything that is being evaluated) should be judged?” These criteria must be made explicit if an evaluator is to offer a reasoned, defensible judgment of the merit or worth of a program, policy, strategy, etc.

    (2) It is a wonderful idea to debate the meaning of any criterion, its value, and its means of determination–for example, what do effectiveness and efficiency mean and to whom; what would a criterion for evaluating the success of innovation look like, etc.

    (3a) However, ANY effort to stipulate, authorize, legitimate (whether directly or indirectly) a particular set of criteria as stable or permanent or semi-permanent, etc. is a mistake. It is a mistake (as Bob Williams has pointed out in his replies) because criteria (and the values on which they rest) are something to be debated and negotiated in every evaluation. We cannot transcend the limitations, uncertainty, and contingency of our knowing and valuing by positing some set of criteria as THE way we are to evaluate.

    (3b) The institutionalization of criteria such as the OECD/DAC criteria (or any replacement set for that matter) is precisely the problem–let’s not confuse an important debate about the meaning, utility, and determination of various criteria with the bid to establish a new and “better” set of criteria. The former is a good idea, the latter simply further fosters the “criteriological project” in evaluation.

    (4) As I argued 20 years ago in “Farewell to Criteriology” [Qualitative Inquiry, 1996, Vol 2(1)], lists of criteria should be regarded as nothing more than that, just lists–perhaps heuristic devices that might aid our thinking.

    • Tom
      Thanks for your very clear line of argument, which is extremely important in this discussion. Identifying criteria per evaluation, based on the values that stakeholders and evaluators hold, is of course one of the basic concepts on which our evaluation practice has been built. Can we consider changing this – especially when we consider global experiences, demands and challenges? I used to share your perspective, but have come to the opposing/alternative insights I have put forward in this series of posts. I am particularly concerned from the perspective of development and humanitarian work, where the complexity of what needs to be done and the asymmetric power relations far exceed anything that can be found in the Global North, where and when evaluation originated. My fundamental argument is therefore that political imperatives, ignorance or lazy evaluation design will often prevent stakeholders and evaluators from identifying what really matters when dealing with interventions and issues in these domains. We cannot hope that evaluators will have the power or inclination to ensure that negotiations around criteria are also cognisant of what really matters, not only of stakeholder interests during that snapshot in time. The DAC criteria have therefore been invaluable in drawing attention to the importance for development of emphases such as relevance, impact and sustainability. I believe that unless we have a mechanism that forces us to recognise, in every evaluation aimed at so-called “development” (or, equally, at humanitarian intervention), the fundamentals of what development is supposed to be – and especially its very nature, including as CAS – we will do development a great disservice. And this is where I find that a set similar to what the DAC has done and achieved will continue to have great value – IF we can also avoid, as several people have pointed out, their mechanistic use and poor implementation.

  11. Give a kid a hammer, and he will decide that everything within reach needs pounding. Complexity is my hammer so as I read these blog posts, a few notions popped into my head with respect to the place of complexity in the evaluation of development initiatives. Here they are, in no particular order. I have not elaborated much here. If anyone cares enough, I’m happy to expound at length. You know where to find me. Also, my blog and YouTube channel have much to say about my inclinations toward complexity and evaluation.

    EVOLUTIONARY BIOLOGY
    This is something I have thought about a lot. I think it’s worthwhile to think of programs as organisms adapting on a fitness landscape. That leads to a whole lot of program theory, logic modeling and evaluation methodology that would not be obvious through other lenses.

    COMPLEX ADAPTIVE SYSTEMS
    See below for why I’m not a big fan of complex systems as a way of understanding programs or doing evaluation. But I will say this. If I did have to define a complex system, I’d say it is a system that exhibits sensitive dependence, which means that: 1) it’s hard to predict the arc of its change, and 2) commonly accepted analytical methods are not totally satisfactory, because those methods bow before the altar of the general linear model, where what matters are means and variances, i.e. group characteristics. (Which is most certainly NOT to say that we should abandon those methods. That would be truly foolish and shortsighted.)

    COMPLEX BEHAVIOR
    My big problem with complex systems is that I don’t know how to define one other than to say that it’s a system that exhibits complex behavior. That’s too tautological for my tastes. In any case, as an evaluator I don’t know what to do with a complex system. But I do know how complex systems act. I know their behaviors. I can also make some decent decisions about which complex behaviors matter for the evaluation that I am doing. So I can pick particular complex behaviors and do something about them. I can devise a research design. I can figure out what data to collect and how to analyze it. I know the program theory I need to use when interpreting the data.

    IMPLEMENTATION ACROSS SETTINGS
    The problem of external validity, or scale-up, or transfer across settings, or whatever name you want to give it, bedevils any kind of social intervention. I’d recommend reading the recent articles by Laura Leviton on this topic. They are some of the most well-thought-out and comprehensive pieces I have seen on the topic.

    One idea that Laura came up with is crowdsourcing wisdom on applicability across settings. This involves conversation among people who have tried to implement similar programs in different settings. The idea is that it may not be possible to have a formula or recipe that will “predict” success, but enough opinion from different contexts will help explain success across settings, and by so doing lead to decisions that are more likely to succeed.

    Please do not misunderstand the above paragraph as my endorsing the abandonment of “fidelity” as critical to the success of programs. But I do think diverse opinion will lead to insight that will make a difference.

    PROGRAM THEORY
    To my way of thinking, this is where complexity is most relevant. In terms of methodology all our familiar methodologies will work just fine. (There are a few exceptions, but nothing that stands in the way of the point I am trying to make.) The challenge is to figure out what evaluation questions we should deploy those familiar methodologies to answer, and that is a question of program theory.

    Here is one simple example I have been thinking about lately. Imagine some program scenario where there are many different service providers and levels of service. Maybe a bunch of primary care health clinics, supported by a few secondary facilities, which are in turn supported by fewer tertiary facilities. Relationships among them include dimensions of geography, referral patterns, informal cooperation among the staff, and so on and so forth. One evaluation question has to deal with robustness, i.e. the ability of the system to function if relationships among them are disrupted. A second question is about efficiency. If I were developing a program theory to evaluate this program, I’d posit that a fractal pattern of relationships would be the best compromise between efficiency and robustness. That’s program theory. It drives the methodology needed, i.e. we need data to look at patterns of relationships. The methodology itself is no big deal.

    • Interesting, Jonny. Zenda’s blog prompted me to have a long conversation with a NZ colleague who is exploring the idea of complexity in evaluation. Like me, you and others, she strongly questions the current way in which complexity ideas are being discussed in evaluation (and by funders of interventions) – and the potential cul-de-sac that many of these conversations are leading us into. Zenda’s contribution is far better informed than most (I’ve read some terrible stuff sent to me by evaluation journals for review), but it still begs the question of what complexity’s role actually is.

      Although the evidence is improving, there is still relatively little ‘scientific’ evidence that the real world of the kinds of interventions we evaluate actually behaves in the way that complexity theorists believe. And while the ontological claims of complexity theories remain to be proven, I’m also frequently disappointed in the epistemological ‘insights’ that many evaluators using complexity ideas bring to the table. Based on my experience, and reading those of others, while complexity notions have good explanatory powers, they often offer little more than anyone with a good knowledge of the situation could. Nothing wrong with that, but I think it is – as yet – not something that is going to bring the kind of fresh insights that people claim and hope for. As you say, many of our current methods and methodologies are actually pretty good – it’s the way they are instigated and managed that’s the major problem.

      Anyway, this isn’t a blog about complexity, but I’d caution against Zenda’s hope that complexity theories are going to lead us to genuinely fresh insights into the serious issue of setting appropriate criteria for evaluation. Certainly they will help bolster certain ideas – but it’s currently a stretch to say that they are based on a deeper understanding of how the world works.

  12. I have been involved in attempts to improve research, capacity-building, and the design and delivery of good policy management and governance processes and outcomes (especially in so-called ‘lesser developed’ societies) for at least the last three decades. On reflection, resulting from these activities, I have come to the following conclusions:

    Criteria: ‘Criteria’ provide us with measuring instruments (indicators) to assess progress towards (programme or evaluation) goal achievement. These criteria always have, to a lesser or greater extent, inherent normative elements. Whether one should regard them as non-negotiable, core or flexible also indicates value preferences. These value preferences should be transparent and not hidden. I applaud Zenda for being open and transparent about her value preferences, which I mostly tend to agree with. One of the values relevant here is ‘development’.

    Development: One dominant trend in northern discourses is to undervalue the importance of sustainable development as a primary strategic goal of good governance in public programmes in so-called ‘more developed’ societies, probably inter alia because of the relatively high levels of development already achieved in more affluent countries. This is an important reason why the SDGs are so exclusively focused on poverty alleviation and virtually ignore the sustainability of the higher levels of development already achieved in more developed (northern) societies. Development is therefore frequently defined in terms of southern contexts and not in terms of northern contexts. This is a fatal error, because development as a strategic goal for all governments can be unpacked into different consecutive stages. These stages start 1) with attempts by governments in poor, underdeveloped societies to provide individuals with effective choices about how to satisfy their fundamental individual needs. 2) They then inevitably progress to attempts by governments in middle-class societies to ensure the availability of choices in the establishment, provision and maintenance of individual and collective middle-class services and facilities. 3) Strategic development goals in relatively affluent societies eventually comprise attempts by those governments to enable and facilitate higher-level, customised and enriched individual and collective lifestyle choices that are not always available in middle-class societies. Developmental goals are therefore as strategically important in northern societies as in the south. Societies in the south experience a perpetual struggle to catch up with what most northern societies regard as even the most basic or minimum levels of development.

    This is why it is imperative for the DAC (which attempts to promote development) to refine and continuously improve its criteria to measure progress through the different stages of development in all societies, and not only in poor or underdeveloped ones – although, admittedly, the desperate conditions prevailing in poor societies justify a special focus on promoting stage 1 and 2 development rather than stage 3. The criteria for measuring development, however, need to be inclusive enough to accommodate the different stages of development summarised above.

    Process and outcome: Programme outputs, outcomes and impacts cannot be totally isolated from the design and implementation processes through which they come about. Policy process deficiencies – different manifestations of inefficiency, non-consultation with significant stakeholders, bad planning or bad implementation practices – have a direct bearing on the eventual product and its consequences (e.g. equitable, fair, accountable, transparent and sustainable developmental outcomes and impacts). This is inter alia what we have learned from complexity thinking. Against this background, it therefore seems to me only logical to try to ensure that policy processes resulting in desired policy outcomes and impacts comply with good policy design and implementation practices like those referred to above. This includes, as a starting point, the accurate identification of the simple or complex nature of the problem to be addressed, as well as of the attempted solutions for improving these undesirable conditions in the society concerned. The ‘complex’ nature of complexity – how to approach, understand and apply it to enrich and improve the accuracy of our understanding of the phenomenon we are dealing with – therefore again seems to me an inevitable focus for the development of agreed-upon ‘criteria’ to deal with a very fuzzy and controversial concept. After all, one of the basic functions of science is to try to simplify reality, as far as is possible, into standardised analytical categories that enable comparisons and assessments of similarities, differences, and the descriptive (factual) or normative (good or bad) implications of these similarities and differences.

    I have also learnt over time, from my exposure to good management theory and practice, that one should continuously reflect on one’s performance and search for possibilities of improvement, especially when conditions change. Isn’t this exactly the point where we are now with the DAC criteria, which were developed in a different era? Therefore, congratulations on your excellent initiative, Zenda. I trust that our new knowledge of and insights into the dynamics of assessing developmental outcomes, impacts and processes will assist in refining and further improving the measuring instruments we can use to determine more accurately whether we are getting closer to achieving our different developmental goals.

  13. Dear Zenda,

    Thanks for pioneering the DAC criteria evaluation discourse. In my view, your articles reflect an in-depth analysis premised on two principles: evaluation for development, and the currency of context. Overall, colleagues seem to appreciate the timeliness of the initiative and its relevance.

    In due course, your insightful inputs will be incorporated and applied by both practitioners and commissioners of evaluations. Once again, thank you for your thought-provoking contributions.

    Regards

    Mokgophana

  14. Thank you for pointing me towards your blog. It is interesting to see your thoughts about development in the SDG era and the implications of complexity. But I am not convinced by your argument that we need to revisit the DAC evaluation criteria. As a practitioner since the 1970s, I believe the criteria have brought a stability and consensus among evaluators that has underpinned progress towards professionalisation. With the DAC criteria we speak a common language and share a conceptual framework. Are they perfect? No. Can they be improved? Certainly. The move towards inclusion of coherence is a good example of evolution, and I suspect we will see more like that.
    I don’t support your argument that the criteria in some way discourage or limit the perspectives taken or engagement with stakeholders. The criteria are neutral about methods and tools. I undertake rather more QA assignments nowadays than evaluations, and have been interested to see a trend among some commissioners of asking evaluators to explain how the criteria will be applied to the topic in hand. That opens the door to more flexible application and is to be welcomed. It encourages careful thinking about how to interpret the criteria across complex settings, and I would like to see that supported by greater use of summative judgment for assessments as well. Plenty to build on without turning to something different.

  15. One thing that frustrates me about this discussion, and indeed virtually all the discussions I read on the topic of the DAC criteria, is the claims about how significant they have been in achieving X or Y in the development field. Yet I have still to see a single piece of evidence to back up those claims. The irony of this, I hope, will not be missed by evaluation colleagues.

  16. Zenda, thank you for sharing this thought-triggering post! I might take a slightly different route here and express myself a little differently. As a development practitioner and an evaluator myself, I feel resistance to the idea of “criteria”. Somehow, what feels enjoyable for me in this profession is the whole concept of starting off with a certain intent – be it a programme, a policy or a road map – where the purpose (for me) of setting criteria, principles and/or guidance is to create some sort of “safety” around the exercise; only to come back later and realise that these criteria were not really part of the “intent” and need to be revisited or abandoned altogether. And we start all over again! From my own experience, criteria as we set them are rarely (if ever) fully fulfilled, and this is their purpose: to be contested CONSTANTLY. It is during the process – the operation, the making of the exercise, the evaluation – that we should be able to break free from the criteria and, more often than not, discover new venues where these criteria become obsolete. So how is this reflected in the DAC criteria, or in any other evaluation guidelines? To me, they are safety nets to mediate a politically charged topic, that of “evaluating human development”, especially within “BIG” institutions where decision-making can be deflated and spread among different agents. Also, the DAC criteria emanate from a very specific world view of what development is and should be, rooted in post-war aid and assistance. Would these same criteria be applied to nation-led development? To middle-income countries experimenting with their “own” recipe for development through political and economic reforms? I don’t think so. UNEG guidance should also be mindful of inheriting criteria (whether DAC or other) – and this is where things become interesting.
I by no means advocate the dismantling or letting go of criteria, especially in the context of the DAC, where they have proven useful in harmonising donor approaches to how evaluations are conducted (and have shown themselves fit for purpose). I would, however, be very mindful of the “safety” aspect while revisiting/reformulating the criteria and guidelines: are we coming from a place of “control” and “predictability”, or a place that is “facilitative”, “exploratory”, “adaptive” and forward-looking?
