24 July 2024

Address to Asian Development Bank Institute Research Conference, Manila, Philippines


Evidence‑based development

Thank you to the Asian Development Bank Institute, the Asian Development Bank and Australian National University for hosting this conference on inclusive economic growth.

It’s an honour to be here representing the Australian Government and to have the opportunity to pay tribute to a great Australian and international citizen, Dr Peter McCawley.

It’s especially pleasing to be invited to speak at an event jointly organised by my friend Professor Hal Hill, one of Australia’s greatest‑ever development economists, and Dr Daniel Suryadarma, one of my former PhD students, who has gone on to make a huge contribution to research and policymaking, including in his role at the Asian Development Bank Institute.

Like many of us here today, Dr McCawley had several professional lives, bound by a deep desire to understand economic inequality (Hill 2024).

In a 2019 essay on why he was interested in Southeast Asia, Dr McCawley said: ‘I was worried by the huge gaps between rich and poor countries. They seemed to me then – as they still seem to me now – a key global issue.’ (Hill 2024)

Dr McCawley was active in the media and regularly shared his expertise with the press on the global issues that fascinated him, particularly his major academic output on Indonesian economic development (Hill 2024).

Dr McCawley understood the value of speaking to the media as an educative tool – especially when complex global issues are misunderstood or misinterpreted in the broader community (Hill 2024).

His desire to educate extended to the online world. It may interest you to know that in his last decade Dr McCawley was a prolific contributor to Wikipedia, particularly on his specialised subject area of Indonesia (Wikipedia 2023). There’s still some snobbery about Wikipedia. Dr McCawley’s contributions remind us that it really is the encyclopaedia of record, and is likely to still be around when each of us has logged off for the last time.

Before I entered politics I spent 6 years as an economist at the Australian National University, from 2004 to 2010. Peter McCawley was an emeritus professor in the latter part of this period, but we didn’t overlap. It’s a pity, since I would’ve loved to get his insights on the limited research that I did on Indonesia (Leigh and van der Eng 2009), the 3 years that I lived in Jakarta and Banda Aceh as a child, and what he learned as he moved between the worlds of power and ideas. He knew my father, Michael Leigh AM, a South‑East Asian researcher, but Peter and I didn’t connect directly.

Dr McCawley understood the value of bringing academic insights into the policymaking process. In common with many modern Australian development economists – Lisa Cameron, Stephen Howes, Lata Gangadharan, Pushkar Maitra and many others – Dr McCawley’s career helped to change lives through research findings and policy advocacy.

Today, I want to focus on an area of policy where this approach is particularly valuable. In July 2023, our government established the Australian Centre for Evaluation, with a mandate to conduct high‑quality randomised trials and other impact evaluations across government, including Australia’s contributions to international organisations.

The Australian Government understands its obligation to ensure that the aid we deliver has the maximum positive impact. That is why the Australian Centre for Evaluation isn’t ideological; it’s practical.

The more we can figure out what works, the better we can make government work for everyone – especially for the most disadvantaged.

Because it’s the people who rely on aid services who suffer most when those services do not work.

For the most affluent, it doesn’t matter much whether government works. They can rely on private healthcare, private education and private security. They are less likely to be unemployed and have family resources to draw upon in challenging times. For the elite, dysfunctional government is annoying, but not life‑threatening.

However, for the most vulnerable among us, government can mean the difference between getting a good education or struggling through life unable to read and write.

Those who depend on government depend on knowing that the programs government is delivering actually work.

So in that sense, rigorous evaluation isn’t just about improving the efficiency of government; it’s also vital for reducing inequality.

Those of us who advocate for its use are informally known as ‘randomistas’. I used this name as the title of my book on the history, development and evolution of randomised trials (Leigh 2018).

Dr McCawley may not have called himself a randomista, but I believe he would have approved of the mission of the Australian Centre for Evaluation: to identify good policy, and to save money by ending ineffective policy.

One indication is the 1983 ‘Jackson Committee’ Review of Australia’s overseas aid program, of which Dr McCawley was a member. The review recommended sweeping changes, including injecting greater analytical content into the program (Hill 2024).

The case for evidence‑based evaluation

The focus of my speech today is to make the case in favour of randomised trials in development.

In particular, I want to discuss the benefits and refute some of the criticisms.

Let’s start with the most tangible marker of global poverty – ill health.

One disease that receives significant attention is malaria, which claims the life of a young child almost every minute (UNICEF 2024).

Because mosquitos are most active at night, a simple solution is to sleep under a bed net.

The challenge for aid workers was discovering how to increase bed net use.

In 2006, development economist William Easterly was one of many who argued that if bed nets were handed out free of charge, people would be less likely to use them (Easterly 2006:12).

As a result, the World Health Organization used a co‑payment system (Leigh 2018).

However, economic theory did not provide a decisive answer to the question of how to get as many people as possible to use a bed net.

The answer was eventually settled by randomised experiments, which were conducted in a range of developing nations.

And the results were clear.

People who received a free bed net were just as likely to use it as those who paid a co‑payment. And because the nets were free, many more people took them up (Cohen and Dupas 2010).

That translated into practice and has helped save thousands of lives across Africa and the rest of the developing world.

This is just one example of how randomised trials can make a difference.

Perhaps this is why there has been such a rapid growth in their use in development economics.

In the 1990s there were fewer than 25 randomised experiments from developing countries published globally each year. In 2012 – the last year for which we have good data – there were 274 published (Banerjee and others 2016).

Today, the Abdul Latif Jameel Poverty Action Lab at MIT, known as J‑PAL, just one organisation, averages 140 randomised evaluations a year (J‑PAL 2024).

Evaluations aim to address various questions.

For instance: did a spending program meet its deadlines and budget?

And was it implemented in a way consistent with its original design?

For new programs and small‑scale pilots, gathering feedback from participants or service providers can help overcome flaws in design and implementation.

After all, poor implementation can hinder the performance of good new policies.

These questions are important, but they don’t reveal whether a program is effective, for whom, why, or under what conditions.

The goal of best practice evaluation is to do one basic thing: determine the counterfactual.

What would have happened if you hadn’t participated in the program?

In real life, we only get to see one version of reality, so we need to construct the alternative.

Randomised trials do this by tossing a coin.

Heads, you’re in the treatment group.

Tails, you’re in the control group.

The 2 groups are equivalent at the outset because luck determines whether or not you get the treatment. So any difference we see between them must be due to the intervention.
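For those who like to see the mechanics, here is a minimal sketch in Python of the logic just described – coin‑toss assignment followed by a difference in means – using invented numbers rather than data from any real trial.

```python
# A minimal sketch of a randomised trial: coin-toss assignment,
# then a difference in means between treatment and control.
# All numbers here are invented for illustration.
import numpy as np

rng = np.random.default_rng(seed=42)
n = 10_000  # hypothetical number of participants

# Toss a fair coin for each participant: 1 = treatment, 0 = control.
treated = rng.integers(0, 2, size=n)

# Simulate outcomes: everyone has a baseline; the treatment adds 2 points.
baseline = rng.normal(50, 10, size=n)
true_effect = 2.0
outcome = baseline + true_effect * treated

# Because assignment was random, the two groups are equivalent at the
# outset, so the difference in mean outcomes is an unbiased estimate
# of the average treatment effect.
estimate = outcome[treated == 1].mean() - outcome[treated == 0].mean()
print(f"Estimated average treatment effect: {estimate:.2f}")  # close to 2.0
```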

Low‑quality evaluations sometimes construct the counterfactual by assuming a person’s outcomes would remain the same without the intervention.

This can end up giving too much credit to the program.

Most sick patients eventually get better.

Many jobless people find jobs.

Many poor regions eventually grow, even if fitfully.

Therefore, an evaluation that assumes the world would otherwise have remained static is likely to produce a flawed result.
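A toy simulation makes the danger concrete. In the sketch below – again with invented numbers – everyone improves over time whether treated or not, so a naive before‑and‑after comparison flatters the program, while a randomised comparison recovers the true effect.

```python
# A toy illustration of why a static counterfactual flatters a program:
# here everyone improves over time, with or without the treatment.
import numpy as np

rng = np.random.default_rng(seed=0)
n = 10_000

before = rng.normal(40, 10, size=n)   # outcomes before the program
natural_recovery = 5.0                # improvement that happens anyway
true_effect = 1.0                     # what the program really adds

treated = rng.integers(0, 2, size=n)
after = before + natural_recovery + true_effect * treated + rng.normal(0, 5, n)

# Naive before-after comparison credits natural recovery to the program.
naive = after[treated == 1].mean() - before[treated == 1].mean()

# Comparing against a randomised control group strips that recovery out.
randomised = after[treated == 1].mean() - after[treated == 0].mean()

print(f"Before-after estimate: {naive:.1f}")       # about 6: overstated
print(f"Randomised estimate:   {randomised:.1f}")  # about 1: the true effect
```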

High‑quality evaluations can provide policymakers with certainty.

Here in the Philippines, a randomised trial confirmed that a mobile phone‑based tutoring program during the COVID‑19 pandemic led to a 40 per cent increase in students’ achievement in mathematics (Angrist and others 2023).

In Niger, a randomised trial demonstrated that scholarships provided to middle‑school students to cover the cost of schooling halved the rates of child marriage for girls (Giacobino and others 2024).

In Liberia, a randomised trial demonstrated that an 8‑week cognitive behavioural therapy program reduced criminality among at‑risk men in the short term. The effect stuck: 10 years later, the intervention had halved criminal offending (Blattman and others 2022).

In Kenya, a study of the economic impacts of deworming clearly illustrated why policymakers should weigh the long‑term costs and benefits of health interventions (Walker and others 2023).

This evaluation looked at adult Kenyans who had participated in a randomised trial of deworming treatments 20 years earlier.

They had participated in the trial when they were school children.

Now they were adults, and many had children of their own.

The study found that children of participants who had received the deworming treatment were 24 per cent less likely to die before the age of 5.

This was due to the many ways in which deworming had boosted participants’ standard of living and improved the lives of their children.

This is powerful evidence that can be used to support effective investments in public health programs targeting parasitic infections, particularly in regions where such conditions are prevalent and can impede economic development.

Addressing criticisms of randomised trials

There are several criticisms of randomised trials in policy, some of which I’ll examine now.

Ethical concerns

First is the concern about ethics and fairness, often the quickest objection to surface.

Certainly, if we know for sure that an intervention works, then I agree it is unethical to put people in the control group.

But there’s a flipside to that.

If we don’t know whether an intervention works, then it is unethical not to find out. We cannot countenance programs being rolled out without robust evidence to back them up.

Another aspect of the ethical discussion is that conducting randomised trials can help strengthen our democracy (Tanasoca and Leigh 2024).

When governments use solid evidence to design programs, citizens can see that policy is based on what works, not on ideology or partisanship.

It is no coincidence that authoritarian regimes have been the most resistant to science and evidence.

Creating a more effective feedback loop shows the public that the government is focused on finding practical solutions to problems. And because the results of randomised trials are intuitively easy to understand, everyone can see what works.

Distraction

There’s also a criticism that focusing on policies that can be evaluated using randomised controlled trials may be a distraction from evaluating more important policy programs, where such trials are not feasible (Pritchett 2020).

Now I’m happy to acknowledge that there are major structural reforms where randomised trials are not feasible – and may not be necessary.

As Hal Hill noted when we were discussing this topic a few weeks ago, there is plentiful non‑randomised trial evidence for trade liberalisation, central bank independence and clear fiscal policy rules.

But there is a plethora of examples where randomised trials can be helpful. As development economist David McKenzie observes, ‘there are many policy issues where … even after having experienced the policy, countries or individuals may not know if it has worked. …[this] type of policy decision is abundant, and randomized experiments help us to learn … what cannot be simply learnt by doing’ (McKenzie 2020).

It is true that different evidence and methods are suited to different types of policies.

However, we have learnt that randomised trials are feasible far more often than critics suggest. And when they are feasible, they often provide compelling evidence that other methods cannot.

Having said that, we must also be willing to draw on other empirical tools and methods where randomised trials are not feasible.

Average versus individual treatment effects

Another criticism is that while randomised trials show the average treatment effect, they cannot observe the individual treatment effects.

This is a problem when the impacts of a policy or program are highly varied (see, for example, Westhorp 2009 and Rogers 2023). Sometimes the effect can even be negative for one sub‑group and positive for another.

In other words, ‘what works’ on average can be ineffective or even harmful for certain groups.

And ‘what doesn’t work’ – on average – might still be effective in certain circumstances.

As an example, consider the 7‑year randomised trial evaluation of the ‘Early Head Start’ program in the United States. This program sought to promote children’s learning, and the parenting that supports it, within the first 3 years of life (Love and others 2004).

Researchers found that the program worked – on average. But with a sufficient sample size, this randomised trial was able to go deeper. The researchers looked at the impact of the Early Head Start program overall and on 27 different sub‑groups, defined by characteristics such as maternal age, birth order, race and ethnicity.

By studying sub‑groups the researchers were able to uncover different effects. They showed that, for some sub‑groups, the program was more effective than the average effect. However, they also showed that the Early Head Start program was ineffective for families that had multiple risk factors (Love and others 2004:356–57).

Armed with this knowledge, policymakers could proceed with confidence that the program was helpful for many of the target population, and then redouble their efforts to provide assistance to those for whom the program remained insufficient.

Evaluators are also starting to employ causal machine learning tools, which can estimate treatment effects at the individual level (Athey and Wager 2019).

This helps evaluators find sub‑groups with effects they would not have known to test in advance.

Causal learning tools can provide evaluators with confidence that they have not missed a sub‑group that is not benefitting from – or even being hurt by – a program.
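To illustrate the idea, here is a hedged sketch in Python. It is not the causal forest method that Athey and Wager describe; instead it uses a simpler stand‑in, sometimes called a ‘T‑learner’, which fits one predictive model per trial arm and takes the difference in predictions as each participant’s estimated effect. All data here are invented for illustration.

```python
# A simplified stand-in for causal machine learning: a "T-learner"
# that fits one outcome model per trial arm, then takes the difference
# in predictions as each participant's estimated treatment effect.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(seed=1)
n = 5_000
X = rng.normal(size=(n, 3))          # participant characteristics
treated = rng.integers(0, 2, size=n)

# Simulated ground truth: the effect is positive only when X[:, 0] > 0,
# so the average effect masks a sub-group that does not benefit.
effect = np.where(X[:, 0] > 0, 3.0, 0.0)
y = X @ np.array([1.0, 0.5, -0.5]) + effect * treated + rng.normal(0, 1, n)

# Fit separate outcome models for the treatment and control arms.
model_t = RandomForestRegressor(n_estimators=200, random_state=0)
model_c = RandomForestRegressor(n_estimators=200, random_state=0)
model_t.fit(X[treated == 1], y[treated == 1])
model_c.fit(X[treated == 0], y[treated == 0])

# Each person's estimated effect: predicted outcome if treated minus
# predicted outcome if not. Averaging within sub-groups reveals who gains.
individual_effects = model_t.predict(X) - model_c.predict(X)
print(f"Average effect:       {individual_effects.mean():.2f}")
print(f"Effect where X0 > 0:  {individual_effects[X[:, 0] > 0].mean():.2f}")
print(f"Effect where X0 <= 0: {individual_effects[X[:, 0] <= 0].mean():.2f}")
```

Dedicated causal forest implementations go further than this simple two‑model approach, adding honest sample‑splitting and valid confidence intervals for the estimated effects.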

Admittedly, such comprehensive analysis isn’t always possible.

So there is a role for complementary quantitative or qualitative research, as part of a mixed‑methods randomised trial, to explore the possibility of differential impacts for different groups.

But critics sometimes overlook the potential that randomised trials themselves can offer.

For those we serve – through national governments and multilateral institutions – it is vital that we improve the rigour of evaluations and build more randomised trials into policy development.

Institutionalising evaluation

Australia is a staunch supporter of the work of J‑PAL in Indonesia.

And we understand that multilateral institutions do their best work when they are driven by evidence rather than ideology.

Nobel Laureate Esther Duflo, a co‑founder of J‑PAL, is responsible for hundreds of randomised trials, including the bed net example I mentioned earlier. Professor Duflo emphasises the importance of neutrality when assessing the effectiveness of antipoverty programs.

She once said: ‘One of my great assets is I don’t have many opinions to start with. I have one opinion – one should evaluate things – which is strongly held. I’m never unhappy with the results. I haven’t yet seen a result I didn’t like.’ (Leigh 2018)

Professor Duflo starts her randomised trials with a range of strategies that she puts to the test. She says: ‘When someone of good will comes and wants to do something to affect education or the role of women or local governments, I want them to have a menu of things they can experiment with.’ (Leigh 2018)

Professor Duflo admits that policies sometimes fail. But it is because people are complex, not because there is a grand conspiracy against the poor (Leigh 2018).

Randomistas like Professor Duflo are providing answers that help to reduce poverty in developing nations. These results are usually messier than the grand theories that preceded them, but that’s the reality of the world in which we live.

But it’s not all chaos. Just as biologists and physicists build up from the results of individual experiments to construct a model of how larger systems operate, randomistas combine the results of multiple trials to inform policymakers.

At Yale University, Innovations for Poverty Action (IPA) plays a similar role to J‑PAL, conducting randomised trials and summarising their results for decision‑makers.

IPA is a non‑profit organisation that has conducted over 900 evaluations in 52 countries (Karlan 2022). IPA is responsible for the mobile phone‑based tutoring program trial in the Philippines, which I highlighted earlier.

IPA’s founder, Professor Dean Karlan, was appointed to the role of Chief Economist of USAID in 2022, where he is helping the agency incorporate iterative testing, experimental design, and behavioural insights into programming and decision‑making (USAID 2022).

Professor Karlan is a pioneer of using randomised trials to test the impact of interventions on poverty reduction.

Karlan and co‑author Nathanael Goldberg argue in one defence of randomised evaluation that ‘… the bottom line is that most programs are not evaluated using randomized techniques, and more (but not all) should be.

‘Well‑implemented randomized trials provide particularly powerful ways to measure impact or improve product design because they guarantee the best unbiased estimates of program or product design impact. It is time to stop speculating and start collecting rigorous evidence about what works and why.’ (Karlan and others 2009)

Closing remarks

More rigorous evaluation means we pay more attention to the facts.

Randomistas are less dogmatic, more honest, more open to criticism, less defensive. We are more willing to change our theories when the data prove them wrong.

Ethically done, randomised experiments can change our world for the better.

Randomised trials may not be perfect, but the alternative is making policy based on what one pair of experts describe as ‘opinions, prejudices, anecdotes and weak data’ (Leigh 2018).

As the poet WH Auden once put it, ‘We may not know very much, but we do know something, and while we must always be prepared to change our minds, we must act as best we can in the light of what we do know’.

References

Angrist N, Ainomugisha M, Bathena SP, Bergman P, Crossley C, Cullen C, Letsomo T, Matsheng M, Panti RM, Sabarwal S and Sullivan T (2023) Building Resilient Education Systems: Cost‑effective Mobile Tutoring in the Philippines and Beyond, Innovations for Poverty Action, accessed 4 July 2024.

Athey S and Wager S (2019) ‘Estimating Treatment Effects with Causal Forests: An Application’, Observational Studies, 5(2):36–51.

Blattman C, Chaskel S, Jamison J and Sheridan M (2022) Cognitive Behavior Therapy Reduces Crime and Violence over 10 Years: Experimental Evidence, National Bureau of Economic Research Working Paper 30049, Cambridge MA, NBER.

Cohen J and Dupas P (2010) ‘Free distribution or cost‑sharing? Evidence from a randomized malaria prevention experiment’, Quarterly Journal of Economics, 125(1):1–45.

Deaton A and Cartwright N (2018) ‘Understanding and misunderstanding randomized controlled trials’, Social Science & Medicine, August 2018, 210:2–21.

Easterly W (2006) The White Man’s Burden: Why the West’s Efforts to Aid the Rest Have Done So Much Ill and So Little Good, Penguin Press.

Giacobino H, Huillery E, Michel BP and Sage M (2024) Schoolgirls Not Brides: Secondary Education as a Shield Against Child Marriage (English), World Bank Group, accessed 4 July 2024.

Hill H (2024) ‘Peter McCawley: Development Economist with a Mission’ [conference presentation], Asian Development Bank Institute conference in memory of Dr Peter McCawley, Manila.

J‑PAL (2024) About Us, J‑PAL website, accessed 21 June 2024.

Karlan D (15 November 2022) ‘Administrator Samantha Power At The Swearing‑In Ceremony For Dr. Dean Karlan As USAID Chief Economist’ [speech], USAID, accessed 13 June 2024.

Karlan D, Goldberg N and Copestake J (2009) ‘Randomized control trials are the best way to measure impact of microfinance programmes and improve microfinance product designs’, Enterprise Development & Microfinance, 20(3):167–176.

Leigh A and van der Eng P (2009) ‘Inequality in Indonesia: What Can We Learn from Top Incomes?’, Journal of Public Economics, 93:209–212.

Leigh A (2018) Randomistas: How Radical Researchers Changed Our World, Black Inc, Melbourne.

Leigh A (2023) ‘Evaluating Policy Impact: Working Out What Works’, Australian Economic Review, 56(4):431–441.

Love JM, Kisker EE, Ross C, Schochet PZ, Brooks‑Gunn J, Paulsell D, Boller K, Constantine J, Vogel C, Fuligni AS and Brady‑Smith C (2004) ‘Making a Difference in the Lives of Infants and Toddlers and Their Families: The Impacts of Early Head Start, Volume I: Final Technical Report’, Mathematica Policy Research, accessed 21 June 2024.

McKenzie D (2020) ‘If it needs a power calculation, does it matter for poverty reduction?’, World Development, 127:104815.

Pritchett L (2020) ‘Randomizing Development: Method or Madness?’, in Bédécarrats F, Guérin I and Roubaud F (eds), Randomized Control Trials in the Field of Development: A Critical Perspective, Oxford University Press.

Rogers P (28 August 2023) ‘Risky behaviour – 3 predictable problems with the Australian Centre for Evaluation’, Patricia Rogers Better evidence use, better world blog, accessed 12 June 2024.

Tanasoca A and Leigh A (2024) ‘The democratic virtues of randomized trials’, Moral Philosophy and Politics, 11(1):113–140.

USAID (15 November 2022) USAID Announces Appointment of Chief Economist [media release], US government, accessed 13 June 2024.

Walker M, Huang A, Asman S, Baird S, Fernald L, Hicks JH, de la Guardia FH, Koiso S, Kremer M, Krupoff M, Layvant M, Ochieng E, Suri P and Miguel E (2023) Intergenerational Child Mortality Impacts of Deworming: Experimental Evidence from Two Decades of the Kenya Life Panel Survey, National Bureau of Economic Research.

Westhorp G (14 June 2009) ‘Using Indicators to Advocate for Policy or Programs’ [conference presentation], Communities in Control Conference, Melbourne, accessed 21 June 2024.

Wikipedia (2023) Wikipedia: Deceased Wikipedians/2023 [website], Peter McCawley, accessed 13 June 2024.