The Counter-Revolution of Development Economics: Hayek vs. Duflo

This post is by Adam Martin, a post-doctoral fellow at DRI. F.A. Hayek, well known as a critic of central planning, also criticized what he called “scientism,” a blind commitment to the methods of the physical sciences beyond their realm of applicability. In The Counter-Revolution of Science, Hayek contrasted “scientism” with the genuine spirit of scientific inquiry.

Esther Duflo’s emphasis on small-scale experimentation has affinity with Hayek’s critique of grand schemes of central planning. As Duflo said in an interview with Philanthropy Action: “I think another untested and potentially wrong idea is that you have to do everything at the same time or else. This is a pretty convenient untested belief because if you live in that world, it is almost impossible to evaluate what you do.”

But Hayek’s concerns about “scientism” might yet apply to Duflo. She continues in the same interview:

Whereas if you say, I am going to press on this button and see whether it provides this result, you might find there are many things that do work surprisingly well with surprising consistency. So it is not that the world is so incredibly complex that every place needs a unique combination of five factors just to produce anything. I don’t know that we would have been able to say the same thing five years ago, but now we are starting to be in the position to say that a number of things, if well designed, just work pretty well in a lot of contexts.

Hayek, in contrast, argues that sheer, context-independent experimentation is not a viable path to development:

An experiment can tell us only whether any innovation does or does not fit into a given framework. But to hope that we can build a coherent order by random experimentation with particular solutions of individual problems and without following guiding principles is an illusion. Experience tells us much about the effectiveness of different social and economic systems as a whole. But an order of the complexity of modern society can be designed neither as a whole, nor by shaping each part separately without regard to the rest, but only by consistently adhering to certain principles throughout a process of evolution. (Law, Legislation, and Liberty Vol. I, p. 60)

Experimentation, for Hayek as well as Duflo, is the chief instrument of social change. Making experimentation work for development requires institutional feedback mechanisms that can fit newly discovered ways of doing things together in mutually reinforcing ways. What Hayek defends as "liberal" principles are means of coordinating individual experiments so that they enhance human welfare. Ad hoc, "pragmatic" approaches might solve some local problem, but without coordination with other projects the progress will not be reinforcing and self-sustaining.

Promoting progress is like playing leap-frog in the dark. Big leaps into the unknown can easily end in disaster. Experiments are small leaps. Only when we combine those small leaps together according to some rules do we leap-frog in a definite direction and reinforce each other's progress, rather than ambling about and running into each other. Without systematic feedback mechanisms that are effective at coordinating different projects, randomized trials are like those small leaps. They might be able to solve particular problems--especially in mitigating the ill effects of poverty--but they would not lead to the self-reinforcing process of wealth generation necessary to eliminate poverty.


Michael Clemens won’t let up on the Millennium Villages + bonus links

It’s nice to see scholars bringing attention to the critical need for evaluation and informed public dialogue (not just “success stories” or short-term impact evaluation) around the Millennium Villages Project, which we have also covered on this blog. Michael Clemens of the Center for Global Development is currently carrying on a very revealing dialogue with the Millennium Villages Project. In his first blog post, which we covered here, Michael makes three central points:

  1. The hundreds of thousands of people living in the Millennium Villages, present and future, deserve to know whether the project’s combination of interventions is backed up by good science.
  2. Randomized evaluation is the best way to do this. While it may be too late to properly evaluate the first wave of villages, there is still time to conduct such a study for the next wave of villages.
  3. The MVP evaluation should demonstrate long-term impact before it is scaled up.

In a subsequent post, Michael parses the curious non-answer he receives from the director of monitoring and evaluation for the MVP, Dr. Paul Pronyk. He breaks down—for those of us not intimately involved in the finer details of impact evaluation—the difference between true scientific evaluation and what the MVP says it is doing, namely “matched randomly selected comparison villages.”

What the MVP has done is something very different from…a rigorous evaluation.  First, village cluster A1 was chosen for treatment, for a range of reasons that may include its potential for responding positively to the project.  Then, long after treatment began, three other clusters that appear similar to A1 were identified — call these “candidate” comparison clusters A2, A3, and A4.  The fact that all three candidates were chosen after treatment in A1 began creates an enormous incentive to pick those candidates, consciously or unconsciously, whose performance will make the intervention in A1 look good.  Then the comparison village was chosen at random from among A2, A3, and A4.

Differences between the treated cluster and the comparison cluster might be due to the MVP. But those differences might also be due to how the original Millennium Village was chosen, and how the three candidate comparison villages were chosen.  This is not a hypothetical concern…

So, either the MVP director of evaluation does not understand evaluation...or he thinks we won't know the difference.
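Clemens’s concern about after-the-fact comparison selection is easy to see in a toy simulation (hypothetical numbers, not MVP data): even when the true project effect is zero, screening candidate comparison clusters for ones that make the treated cluster look good produces a large spurious “impact.”

```python
import random

random.seed(0)

def biased_comparison_estimate():
    """One simulated evaluation with post-hoc comparison selection."""
    treated = random.gauss(0, 1)  # treated cluster outcome; TRUE effect is zero
    # Ten candidate comparison clusters drawn from the same outcome distribution.
    candidates = [random.gauss(0, 1) for _ in range(10)]
    # Biased screen: shortlist the three worst performers (consciously or not),
    # then pick the comparison "at random" from that shortlist.
    shortlist = sorted(candidates)[:3]
    comparison = random.choice(shortlist)
    return treated - comparison

# Averaged over many simulated evaluations, the estimated "effect" is far
# above zero even though the intervention does nothing at all.
estimates = [biased_comparison_estimate() for _ in range(10_000)]
print(round(sum(estimates) / len(estimates), 2))
```

The final random draw from the shortlist does nothing to undo the bias introduced by how the shortlist was built, which is exactly Clemens’s point.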

Dr. Pronyk promises the release of the MVP’s midpoint evaluation at some unspecified time later this year, and said they “look forward to an active discussion about the initial findings regarding poverty, hunger, and disease in the Millennium Villages.” We hope the scholarly community and the wider reading public concerned with development issues will give Dr. Pronyk precisely what he’s asking for.

Bonus Links

* Sounds a bit like a parody we wish we’d written… but it’s true. Yesterday’s NYT features this quote from a story on China’s bid to supply California with technology, equipment and engineers to build a high-speed railway, and to help finance its construction:

“We are the most advanced in many fields, and we are willing to share with the United States,” Zheng Jian, the chief planner and director of high-speed rail at China’s railway ministry, said.

* We’d be remiss not to mention this helpful timeline of celebrity aid to Africa featuring an interactive map from Mother Jones (and some additional commentary from Wronging Rights and Texas in Africa).


The coming age of accountability

There was such a great audience yesterday at the Brookings event on What Works in Development. (If you are a glutton for punishment, the full-length audio of the event is available on the Brookings web site.) In the end, what struck me was the passion for just having SOME way to KNOW that aid is benefiting the poor, which dwarfed the smaller issue of Randomized Experiment methods vs. other methods.

And extreme dissatisfaction with aid agencies that ignore even the most obvious signs that some aid effort is not working. (Example cited in the Brookings book: a World Bank computer kiosk program in India celebrated as a "success" in its "Empowerment" sourcebook. Except that the computers sat in places without functioning electricity or Internet connections. Critics pointed that out, and yet officials still defended the program as contributing to "Empowerment." Abhijit Banerjee asked, "Empowered how? Through non-working computers?")

It is awfully hard to get an accountability movement going that would have enough political power to force changes on aid agencies, say, away from forever mumbling "empowerment," towards actually making computers light up.

Accountability is not something that anyone accepts voluntarily. It is forced on political actors by sheer political power from below. That's what democratic revolutions are all about. Can we hear a lot more from people in the poor countries protesting bad aid (thank you, Dambisa Moyo) and praising good aid (thank you, Muhammad Yunus)? Can we hear A LOT more from the intended beneficiaries themselves? Can their allies in rich countries help them gain more political leverage with rich-country aid agencies?

I don't know yet. But there is a lot more awareness of the accountability problem than there was a decade ago. The blog dialogues on making Haiti disaster aid work are one example. The size, savvy, and enthusiasm of the audience yesterday was one more small hopeful sign.

Watch out, aid agencies, accountability is coming.


"What works in development" is available again

After my whining in a previous post that the great publication "What Works in Development: Thinking Big and Thinking Small" was not easily available, it is once again available on Amazon. Let me repeat the previous sales pitch: although I am one of the editors, this is not about self-promotion. The main contribution of the book is to feature a galaxy of academic superstars heatedly debating how or whether to apply randomized evaluation to development projects (see one of our most popular posts of all time: "The Civil War in Development Economics"). This is the only book anywhere that summarizes the arguments on both sides.

Any interest? Can we bump up its current rating on Amazon from #81,373?


A balanced reaction to "The Civil War in Development Economics"

Aid Watch received the following very thoughtful comment. The author wishes to remain anonymous:

The debate in the academic world sounds fascinating! And it mirrors in some ways the ongoing debates I have within the international development practitioner community, where I work. Due to my background and current job, I'm the resident RCT "expert" of sorts in my organization and get to have lots of fascinating discussions with program and M&E staff. I see the following pros and cons for randomized evaluation (or RCTs - randomized controlled trials - as they are often called in the NGO world):


PROS:

- As always, the key idea is that you can't attribute causality of impact without a randomly-assigned control group. Selection bias and other problems affect every other method to varying degrees.

CONS (or rather, arguments for having additional approaches in your evaluator's toolbox):

- RCTs are harder to do for long-run impacts. You either have to leave the control group without the program for 10-20 years, which is an ethical and logistical challenge, or you have to rely on some assumptions to add together effects from repeated follow-up surveys. For example, if you delayed the start of the program in the "control group" for three years and then did a follow-up survey every three years, you could add the difference between 3 and 0 years, plus the difference between 6 and 3 years, plus the difference between 9 and 6 years, etc. But you'd have to assume some things, like linearity in the effect over time or specific types of interactions with global on-off events. (I'm still thinking through this whole idea.)

- With a complex or system-wide program, you often can't have a control group, such as if you are working on a national scale. For example, working to change gender injustices in a country's laws.

- Context is important and you can't always get that with good background research or a good pilot before an RCT, though you should try. My organization talks a lot about "mixed methods" - mixed quantitative and qualitative research being a good way to combine the strengths of each. In fact the RCT that I'm overseeing includes a team of anthropologists.

- Qualitative research can also be more responsive if you get unanticipated results that are hard to explain.

So, being a good two-handed economist, I do see both sides now, though I'm still pro-RCT.  It helps that I was at that bastion of qualitative methodology, the American Evaluation Association conference (another AEA!) and heard some good indoctrination on the anti-RCT side.

It's particularly interesting to be at my INGO since much of the organization's work is focused on areas that are tough to evaluate with RCTs, including lobbying the U.S. government; humanitarian relief work (though we have a few staff who want baselines for refugee camps); and many small-scale, long-term, idiosyncratic projects in communities facing severe challenges.

The closest I've come to agreement with people who are anti-RCT is to have all of us agree that it's a great tool in the right circumstances but that it's one of many good tools. What we always disagree on is whether RCTs are overused (them) or underused (me). And many people hate the words "gold standard"; it's a red flag. I use it anyway, as in "RCTs are the gold standard for short-run impact evaluations that you want to be free from selection bias."

I think that the "right circumstances" for RCTs would include important development approaches, such as clean water or microcredit, that haven't been evaluated yet with RCTs; or big programs that are finally stable in their implementation after an initial period of experimentation and adaptation. Pilots are OK, too, though that is a harder sell; program staff want to be able to get in there and experiment away with what works and what doesn't without worrying about rigorous evaluation.

It'll be interesting to see where these discussions are in 5 or 10 years.
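A quick aside on the commenter's long-run point: if you assume gains depend only on years of program exposure, the wave-to-wave differences telescope exactly to the long-run effect, but a calendar-time shock in one survey period breaks the equality. A sketch with made-up numbers:

```python
# Hypothetical effect: 2 units per year of program exposure, with an
# optional calendar-period multiplier standing in for a global on-off event.
def gain(exposure_years, calendar_multiplier=1.0):
    return 2.0 * exposure_years * calendar_multiplier

waves = [3, 6, 9]  # surveys every three years; control starts 3 years late

# No shock: each survey compares t years of exposure with t-3 years, and
# the stacked differences telescope exactly to the full 9-year effect.
stacked = sum(gain(t) - gain(t - 3) for t in waves)
print(stacked, gain(9))  # 18.0 18.0

# A shock halving measured gains in the final period distorts that wave's
# difference, so the stacked total no longer equals the long-run effect.
shocked = sum(
    gain(t, 0.5 if t == 9 else 1.0) - gain(t - 3, 0.5 if t == 9 else 1.0)
    for t in waves
)
print(shocked)  # 15.0, not 18.0
```

The linearity assumption is doing all the work: with it, stacking is exact; without it, each wave's difference reflects whatever happened to hit that period.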


The Civil War in Development Economics

Few people outside academia realize how badly Randomized Evaluation has polarized academic development economists into camps for and against. My little debate with Sachs seems like gentle whispers by comparison. Want to understand what’s got some so upset and others true believers? A conference volume has just come out from Brookings. At first glance, this is your typical sleepy conference volume, currently ranked on Amazon at #201,635.

But attendees at that conference realized that it was a major showdown between the two sides, and now the volume lays out in plain view the case for the prosecution and the case for the defense of Randomized Evaluation.

OK, self-promotion confession: I am one of the editors of the volume, and was one of the organizers of the conference (both with Jessica Cohen). But the stars of the volume are the speakers and commentators: Nava Ashraf (Harvard Business School), Abhijit Banerjee (MIT), Nancy Birdsall (Center for Global Development), Anne Case (Princeton University), Alaka Holla (Innovations for Poverty Action), Ricardo Hausmann (Harvard University), Simon Johnson (MIT), Peter Klenow (Stanford University), Michael Kremer (Harvard), Ross Levine (Brown University), Sendhil Mullainathan (Harvard), Ben Olken (MIT), Lant Pritchett (Harvard), Martin Ravallion (World Bank), Dani Rodrik (Harvard), Paul Romer (Stanford University), and David Weil (Brown). Angus Deaton also gave a major luncheon talk at the conference, which was already committed for publication elsewhere. A previous blog post discussed his paper.

Here’s an imagined dialogue between the two sides on Randomized Evaluation (RE) based on this book:

FOR: Amazing RE power lets us identify causal effect of project treatment on the treated.

AGAINST: Congrats on finding the effect on a few hundred people under particular circumstances; too bad it doesn’t apply anywhere else.

FOR: No problem, we can replicate RE to make sure effect applies elsewhere.

AGAINST: Like that’s going to happen. Since when is there any academic incentive to replicate already published results? And how do you ever know when you have enough replications of the right kind? You can’t EVER make a generic “X works” statement for any development intervention X. Why don’t you try some theory about why things work?

FOR: We are now moving in the direction of using RE to test theory about why people behave the way they do.

AGAINST: I think we might be converging on that one. But your advertising has not yet gotten the message, like the JPAL ad on “best buys on the Millennium Development Goals.”

FOR: Well, at least it’s better than your crappy macro regressions that never resolve what causes what, and where even the correlations are suspect because of data mining.

AGAINST: OK, you drew some blood with that one. But you are not so holy on data mining either, because you can pick and choose, after the research is finished, whatever sub-samples give you results, and there is also publication bias that favors positive results over null results.

FOR: OK we admit we shouldn’t do that, and we should enter all REs into a registry including those with no results.

AGAINST: Good luck with that. By the way, even if you do show something “works,” is that enough to get it adopted by politicians and implemented by bureaucrats?

FOR: But voters will want to support politicians who do things that work based on rigorous evidence.

AGAINST: Now you seem naïve about voters as well as politicians. Please be clear: do RE-guided economists know something the local people do not know, or do they have different values on what is good for them? What about tacit knowledge that cannot be tested by RE? Why has RE hardly ever been used for policymaking in developed countries?

FOR: You can take as many potshots as you want, at the end we are producing solid evidence that convinces many people involved in aid.

AGAINST: Well, at least we agree on the much larger question of what is not respectable evidence, namely, most of what is currently relied on in development policy discussions. Compared to the evidence-free majority, what unites us is larger than what divides us.


Development Experiments: Ethical? Feasible? Useful?

A new kind of development research in recent years involves experiments: there is a “treatment group” that gets an aid intervention (such as a de-worming drug for school children), and a “control group” that does not. People are assigned randomly to the two groups, so there is no systematic difference between the two groups except the treatment. The difference in outcomes (such as school attendance by those who get deworming vs. those who do not) is a rigorous estimate of the effect of treatment. These Randomized Controlled Trials (RCTs) have been advocated by leading development economists like Esther Duflo and Abhijit Banerjee at MIT and Michael Kremer at Harvard.

Others have criticized RCTs. The most prominent critic is the widely respected dean of development research and current President of the American Economic Association, Angus Deaton of Princeton, who released his Keynes lecture on this topic earlier this year, “Instruments of development: Randomization in the tropics, and the search for the elusive keys to economic development.” Dani Rodrik and Lant Pritchett have also criticized RCTs.
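The mechanics are simple enough to sketch in a few lines. This is a hypothetical deworming-and-attendance example with invented numbers, not real study data: because assignment is random, the two groups differ only by the treatment, so the difference in mean outcomes estimates the treatment effect.

```python
import random

random.seed(42)

TRUE_EFFECT = 0.05  # assumed gain in attendance rate from treatment

def attendance(treated):
    # Baseline 70% attendance, plus the effect if treated, plus noise.
    return 0.70 + (TRUE_EFFECT if treated else 0.0) + random.gauss(0, 0.05)

# Randomly assign 2,000 children to treatment or control by coin flip.
assignment = [random.random() < 0.5 for _ in range(2000)]
outcomes = [attendance(t) for t in assignment]

treat = [y for y, t in zip(outcomes, assignment) if t]
ctrl = [y for y, t in zip(outcomes, assignment) if not t]

# Difference in means: an unbiased estimate of the treatment effect,
# because randomization balanced everything else between the groups.
estimate = sum(treat) / len(treat) - sum(ctrl) / len(ctrl)
print(round(estimate, 3))  # close to the true effect of 0.05
```

No regression machinery is needed; randomization itself is what licenses the causal interpretation of the simple difference in means.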

To drastically oversimplify and boil down the debate, here are some criticisms and responses:

1. What are you doing experimenting on humans?

Denying something beneficial to some people for research purposes seems wildly unethical at first. RCT defenders point out that there is never enough money to treat everyone, and drawing lots is a widely recognized method for fairly allocating scarce resources. And when you find that something works, it will then be adopted and benefit everyone who participated in the experiment. The same issues arise in medical drug testing, where RCTs are broadly accepted. Still, RCTs can cause hard feelings between treatment and control groups within a community or across communities. Given the concerns of this blog with the human dignity of the poor, researchers should at least be careful to communicate to the individuals involved what they are up to and always get their permission.

2. Can you really generalize from one small experiment to conclude that something “works”?

This is the single biggest concern about what RCTs teach us. If you find that using flipcharts in classrooms raises test scores in one experiment, does that mean that aid agencies should buy flipcharts for every school in the world? Context matters – the effect of flipcharts depends on the existing educational level of students and teachers, availability of other educational methods, and about a thousand other things. Plus, implementing something in an experimental setting is a lot easier than having it implemented well on a large scale. Defenders of RCTs say you can run many experiments in many different settings to validate that something “works.” Critics worry about the feeble incentives for academics to do replications, and say we have little idea how many or what kind of replications would be sufficient to establish that something “works.”

3. Can you find out “what works” without a theory to guide you?

Critics say this is the real problem with issue #2. The dream of getting pure evidence without theory is usually unattainable. For example, you need a theory to guide you as to what determines the effect of flipcharts to have any hope of narrowing down the testing and replications to something manageable. The most useful RCT results are those that confirm or reject a theory of human behavior. For example, a general finding across many RCTs in Africa is that demand for free life-saving products collapses once you charge a price for them (even a low subsidized price). This refutes the theory that fully informed people are rationally purchasing low cost medical inputs to improve their health and working capacity. This would usefully lead to further testing of whether the problem is lack of information or the assumption of perfect rationality (the latter is increasingly questioned for rich as well as poor people).

4. Can RCTs be manipulated to get the “right” results?

Yes. One could search among many outcome variables and many slices of the sample for results. One could investigate in advance which settings were more likely to give good results. Of course, scientific ethics prohibit these practices, but they are difficult to enforce. These problems become more severe when the implementing agency has a stake in the outcome of the evaluation, as could happen with an agency whose program will receive more funding when the results are positive.
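The multiple-outcomes worry in point 4 is easy to demonstrate with a simulation (illustrative only): with twenty outcome variables and no true effect on any of them, searching for the most impressive difference routinely turns up a "finding."

```python
import random

random.seed(1)

n, n_outcomes = 200, 20  # sample size per group, outcomes searched over

def best_null_difference():
    """Largest treatment-control gap across outcomes with NO true effect."""
    best = 0.0
    for _ in range(n_outcomes):
        treat = [random.gauss(0, 1) for _ in range(n)]
        ctrl = [random.gauss(0, 1) for _ in range(n)]
        best = max(best, abs(sum(treat) / n - sum(ctrl) / n))
    return best

# The standard error of any single difference here is about 0.1, so a gap
# above 0.2 would look like a two-sigma "significant" result. Cherry-picking
# the best of 20 null outcomes produces gaps that size fairly often.
print(round(best_null_difference(), 3))
```

This is why pre-specified outcomes and trial registries matter: the reported estimate is only as honest as the search that produced it.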

5. Are RCTs limited to small questions?

Yes. Even if problems 1 through 4 are resolved, RCTs are infeasible for many of the big questions in development, like the economy-wide effects of good institutions or good macroeconomic policies. Some RCT proponents have (rather naively) claimed RCTs could revolutionize social policy, making it dramatically more effective; ironically, this claim itself cannot be tested with RCTs. Otherwise, embracing RCTs has led development researchers to lower their ambitions. This is probably a GOOD thing in foreign aid, where outsiders cannot hope to induce social transformation anyway and just finding some things that work for poor people is a reasonable outcome. But RCTs are usually less relevant for understanding overall economic development.

Overall, RCTs have contributed positive things to development research, but RCT proponents tend to oversell them and to be overly dogmatic about this kind of evidence being superior to all other kinds.
