Address to the UK Evaluation Task Force, 9 Downing Street, London

Randomised trials, living evidence reviews and global collaboration: ‘What Works’ for the next generation

In mid‑1998, a year after the election of the Blair government, I decided that I’d hop on a plane and see it up close. I’d just finished up an intense year as a judge’s associate, working for Justice Michael Kirby on the High Court of Australia, and wanted a break. In Australia, John Howard had just begun what would become the second‑longest Prime Ministership in Australian history. Tony Blair’s election was pretty exciting for a young Labo(u)r supporter.

Before leaving Australia, I sent about 50 faxes off to different Labour MPs, asking if they had any work for someone whose enthusiasm greatly exceeded his knowledge of British political institutions. Half a dozen MPs politely agreed to have a cup of tea with me, and I picked up some part‑time work with 2: Fiona Mactaggart and Ross Cranston.

I attempted to fill in the gaps in my knowledge of British politics, reading the hard‑bitten works of Philip Gould and Peter Mandelson, the political philosophy of Anthony Giddens and the tales of John O’Farrell, summed up in Things Can Only Get Better: Eighteen Miserable Years in the Life of a Labour Supporter. For much of the time, I lived in a share house in Kennington, and often walked to work, crossing the Thames at Westminster Bridge, photobombing a tourist photo in front of Oliver Cromwell’s statue, and arriving at work at the ironically named Palace of Westminster. I only spent 4 months here, but went home impressed.

If there was one lasting impression that I took away from the New Labour project, it was a strong sense that the government was determined not to be bound by old ideological divides, but to be open to the evidence on what works. This was a government driven by values, but less inclined to be dogmatic about particular programs. Good ideas could come from unexpected places. What mattered were results, not program labels.

At that stage, the first ‘What Works’ centre was about to be created, David Halpern was yet to join the Prime Minister’s Strategy Unit, and the Behavioural Insights Team was over a decade away. But the focus on practical reform was palpable. There was a keen focus on measuring outcomes – an essential precondition to good evaluation – and a sense that much of what the previous government had tried had been ineffective. There was a mood of idealism about what could be achieved, but scepticism about many of the programs in place to achieve it.

The development of the UK What Works Centres under both Labour and Conservative Governments has been a world‑leading achievement. Across health and housing, education and employment, hundreds of randomised trials have been conducted. For a practitioner, policymaker or curious member of the public, it is now easier than ever to see what we know, and what we do not. The culture of policy experimentation is as strong in Britain as it is anywhere else in the world.

Under Prime Minister Anthony Albanese, we’re committed to taking a stronger approach towards evidence‑based policymaking in Australia. Last July, we established the Australian Centre for Evaluation in the Treasury, with around a dozen staff and a budget of approximately A$2 million per year. The main role of the centre is to collaborate with other government departments to conduct rigorous evaluations, including randomised trials. Already, several trials are underway, including quality improvement trials aimed at improving the employment services system. Finding out the impact of career coaching, or of in‑person versus online services, will be crucial in guiding future reforms of the system.

The Australian Centre for Evaluation also provides evaluation training to other government agencies, and engages with the broader evaluation community. Most evaluators who work for the Australian Government are outside the Centre, so we have set up the ‘Evaluation Profession’, a cross‑agency network that builds a stronger community of practice among public servants whose job is to evaluate programs. We were pleased to see an editorial in Nature last month acknowledge the Australian Centre for Evaluation as one of the initiatives worldwide that are seeking to raise the quality and quantity of evidence.

The push towards good evidence goes beyond government. Last week, I spoke at a workshop at the University of Melbourne, organised by Philip Clarke, Robyn Mildon and Peter Choong. The workshop brought together academics with a background in randomised trials (principally economists and health researchers) and charities and charitable foundations who are interested in using randomisation to determine what works.

Australia’s largest charitable foundation, the Paul Ramsay Foundation, is presently running a A$2.1 million grant round on experimental evaluation of social programs, similar to a successful model that the Laura and John Arnold Foundation has deployed in the United States over the past decade. My hope is that over the coming decade, partnerships with Australian charities will see hundreds of randomised trials conducted, building the evidence base for what works, and creating an impetus for more philanthropic giving as donors see the power of randomised trials to change lives for the better.

But it’s not enough to produce good evidence; we also need to ensure that it is accessible to policymakers. Evidence synthesis began with single‑country meta‑analyses, and then expanded to cover multiple countries. Now, there is a recognition that the best practice involves living evidence reviews, continuously updated as new studies are published.

Following last month’s Global Evidence Summit in Prague, there have been several exciting developments:

The Wellcome Trust has announced that it will provide £45 million over 5 years for the development of new data and tools for accelerating living evidence synthesis.
The Economic and Social Research Council has announced funding of £11.5 million over 5 years to develop and administer a global evidence synthesis infrastructure to transform the evidence ecosystem, including through the use of artificial intelligence.
The heads of JBI, Cochrane and Campbell have announced a collaboration to build a truly global evidence ecosystem.
The Global Commission on Evidence has produced a ‘SHOW ME the evidence’ statement, outlining the key features of an approach to reliably getting research evidence to those who need it.

Alongside this, David Halpern and Deelan Maru have released their Global Evidence Report: A Blueprint for Better International Collaboration on Evidence, with an enthusiastic foreword from the Australian Chief Statistician, David Gruen. The report’s recommendations include standardised reporting and publication protocols to facilitate inter‑governmental sharing of evaluated interventions; evidence gap maps across priority policy areas; living evidence reviews; and international public service professional networks to accelerate the transfer and adoption of best practices across countries.

Randomised trials have provided rigorous evidence supporting a number of interventions. The Coalition for Evidence‑Based Policy, a US nonprofit that was recently re‑launched under the leadership of Jon Baron, gives examples of social programs that have been rigorously shown to produce large gains:

Job training programs Year Up and Per Scholas are targeted at low‑income adults. They focus on fast‑growing industries with well‑paying jobs, and provide paid internships with local employers. Randomised trials find that these programs boost long‑term earnings by 20 to 40 per cent.
For students from disadvantaged backgrounds, ASAP (Advancing Success in Associate Pathways) and ACE (Advancing Completion through Engagement) provide comprehensive academic, personal, and financial supports for low‑income students at community colleges and 4‑year colleges respectively. Randomised trials show that the programs raise college graduation rates by 11 to 15 percentage points.
Saga Tutoring, an intensive schoolwide maths tutoring program for 9th and 10th graders, was shown in randomised trials to raise mathematics achievement in high‑poverty high schools by over two‑thirds of a grade level.

Conversely, rigorous evaluations have shown up many ineffective programs:

Scared Straight, a program that aimed to expose juvenile delinquents to prison, was found in randomised trials to increase offending rates – the opposite of what the program intended.
After‑school programs in the United States were found in a randomised trial to have no positive impact on academic outcomes, but significant negative impacts on behavioural outcomes.
Randomised evaluations of abstinence‑only programs found no evidence that they raised the age at which young people first had sex, or reduced their number of sexual partners.

Randomised trials have also helped temper the claims of advocates:

Despite claims that police body cameras would transform officers’ interactions with the public, randomised trials of body cameras suggest that the cameras produced only small and statistically insignificant effects on police use of force and civilian complaints (though they may provide better evidence when cases go to court).
Contrary to the claims that microcredit would lead to a substantial increase in entrepreneurship rates, its impact appears to be driven largely by existing business owners starting more businesses, rather than new households becoming entrepreneurs. Randomised trials of microcredit showed little evidence that microcredit affected spending on education or health, or made women feel more empowered.
Despite bold claims for the impact of universal basic income on recipients, a recent randomised trial of universal basic income in 2 US states found that a payment of US$1000 per month over 3 years had fairly modest impacts on job quality and health.

I hope that at least some of these findings surprised you. People are complex, and it would be strange if we could predict from pure theory what impact programs will have on the people they are intended to help.

The same occurs in medicine, where a recent study concluded that the overall success rate of clinical trials is 8 per cent. Nonetheless, we spend over US$1 billion a year on clinical trials, because such trials identify which treatments work, and which do not. Clinical trials have saved lives by weeding out ineffective treatments. Clinical trials have extended lifespans by proving effective treatments.

The same can be true of policy, where a more rigorous evidence ecosystem can save money and improve lives. Donald Campbell, after whom the Campbell Collaboration is named, argued in 1991 for what he called an ‘experimenting society’. The experimenting society, Campbell said, ‘will be one which will vigorously try out proposed solutions to recurrent problems, which will make hard‑headed and multidimensional evaluations of the outcomes, and which will move on to try other alternatives when evaluation shows one reform to have been ineffective or harmful.’ Such a society, he said, would be honest and non‑dogmatic, accountable and scientific. It would encourage active citizenship, and prize due process. These are what philosopher Ana Tanasoca and I called the ‘democratic virtues’ of randomised trials.

In the period since Campbell wrote his article on the experimenting society, evaluation science and evidence science have advanced substantially. We know much more about the ways that low‑quality evaluation can lead us astray, and techniques for synthesising research for policymakers have advanced markedly.

Today, no public servant or politician should fool themselves that there is a range of equally good ways of devising policy. Evaluation and evidence science are not fields in their infancy. We have decades of experience in how to identify evidence gaps, put policies to the test, and implement the most effective programs. Policymaking by focus groups and gut feel alone is the modern‑day equivalent of bloodletting and lobotomies in medicine.

The initial phase of the What Works movement has delivered extraordinary gains in our understanding of policymaking in health, policing, education and other areas of social policy. As we move into the next phase of the What Works movement, there is an opportunity to improve the quality of evidence, to create living evidence reviews, and to share evidence across countries.

Thank you to all those who have helped to build this evidence infrastructure, and who are looking ahead to how it can be improved. Just as the energy of a reforming British Government impressed a 26‑year‑old antipodean visitor in 1998, the work that the British Government is doing on evaluation and evidence today helps inspire our policy reforms.

Australia and Britain enjoy a strong relationship, rooted in deep historical and cultural connections. The Australian Government looks forward to working together to develop better policies that shape a better world.