If an evaluation is released on the internet and no one comments, does it make a sound?

The release of the Millennium Villages Project mid-point evaluation has so far been met with no discernible public response. Strange, since the release is billed as the “first major scientific report on progress after three years of MVP activity.” Doubly strange, since the MVP is an ambitious project that reaches into nearly all areas of its 500,000 recipients’ lives and proposes, in scaled-up form, to completely change the architecture and delivery of aid to Africa.

So why the silence? Two possible reasons come to mind. Perhaps:

  1. The evaluation doesn’t contain much that is unexpected or useful, and/or
  2. No one really cares about evaluation.

We knew that the report would give the mid-point results of a longitudinal study comparing data from 300 Millennium Village families, collected when the project began and again three and five years later. (Although this is no longer the midpoint of anything, since the project has expanded from 5 to 10 years.)

The new data give a picture of encouraging results across all sectors compared to the baseline. In Mwandama, Malawi, for example, bednet use for children under five increased from 14 percent to 60 percent and malaria prevalence for all age groups fell from 19 percent to 15 percent. Maize yields increased dramatically from 0.8 tons per hectare to 4.5 tons per hectare.

Such short-term results are positive in the sense that they describe real, immediate changes in the lives of thousands of very poor people. But they are not surprising given what we know about the level of resources and intensive technical expertise invested in these villages: the project doubles the size of the local economy—it is roughly equivalent to a 100 percent increase in per capita income per year (see here for calculations from Michael Clemens).
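To see why the gains are unsurprising, here is a back-of-the-envelope sketch of the kind of comparison Clemens makes. The figures below are placeholders chosen only for illustration, not his actual inputs; see his post for the real calculation.

```python
# A hypothetical back-of-the-envelope comparison of annual aid per person
# to baseline income per person. All inputs are placeholder values.
annual_project_cost_per_village_usd = 300_000     # hypothetical
people_per_village = 5_000                        # hypothetical
baseline_income_per_capita_usd = 60               # hypothetical

aid_per_capita = annual_project_cost_per_village_usd / people_per_village
print(f"aid per person per year: ${aid_per_capita:.0f}")
print(f"as a share of baseline income: {aid_per_capita / baseline_income_per_capita_usd:.0%}")
# when that share approaches 100 percent, large short-run gains in consumption
# and farm inputs are close to mechanical, whatever the project's long-run merits
```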

Unfortunately, the results are also not that useful. Three years is too short a period to know how to interpret the dramatic increase in maize yields, for example. Is it consistent with normal year-to-year variation in crop yields? Was 2006 an unusually good or bad year for maize? We don’t know.

The results also don’t help us determine whether current and future resources should be shifted away from other existing or even yet-to-be invented approaches, towards the MVP template. Will those short-term gains last beyond the timeline of the project? Can the project become self-sustaining?

Again, we don’t know, in part because not enough time has passed. Consider this anecdote from a New York Times blog series by Jeff Marlow on the Millennium Village of Koraro, Ethiopia:

In 2005, all fertilizer was given away, leading to a significant increase in food production. Fertilizer subsidies were then progressively rolled back; by last year, only 50% of the cost was covered. For the 2009 growing season, the project tried something new: farmers were given loans for fertilizer, but they are expected to pay back the full cost plus interest when the harvest comes.

For many Koraro farmers, this is a daunting challenge. “The project used to help us with fertilizer,” says Brhana Syum…“But now it’s very expensive, and there’s no way to pay for it all.” Many farmers facing similar constraints have chosen to scale back their farms, thereby requiring less fertilizer, rather than face enormous debts…

So this particular push towards sustainability has come up against some obstacles. It may yet succeed, or it may fail. We don’t know the end of the story.

Supporters of the project argue that the individual interventions have already been proven: for example, we know that using better seeds and adding fertilizer will increase crop yield. But what the MVP says it is proving with this evaluation is the “value and feasibility of integrated community-based investments”—that is, the whole package of interventions, as well as the management systems used to deliver them. And this is precisely what the MVP does not have the data to demonstrate.

This evaluation repeats the call to scale up the project within existing project countries and expand to new ones, as quickly as possible. But the MVP as a whole remains an untested and unproven intervention, while the lives of Millennium Villagers—their habits, beliefs, livelihoods, and sources of authority—are inevitably being changed in profound ways. This evaluation does nothing to change the argument of my previous post that the MVP should live up to its promise to be a ‘proof of concept’: to be seriously and independently evaluated, and proven to work—beyond immediate short-term effects—before it is scaled up.

If you were sick and someone offered you a drug that hadn’t been tested, would you take it? And even if you would, would you want hundreds of millions of people whose lives depended on it to forego other types of treatment and take that drug too?

Read More & Discuss

The Counter-Revolution of Development Economics: Hayek vs. Duflo

This post is by Adam Martin, a post-doctoral fellow at DRI.

F.A. Hayek, well known as a critic of central planning, also criticized what he called “scientism,” a blind commitment to the methods of the physical sciences beyond their realm of applicability. In The Counter-Revolution of Science, Hayek contrasted “scientism” with the genuine spirit of scientific inquiry.

Esther Duflo’s emphasis on small-scale experimentation has affinity with Hayek’s critique of grand schemes of central planning. As Duflo said in an interview with Philanthropy Action: “I think another untested and potentially wrong idea is that you have to do everything at the same time or else. This is a pretty convenient untested belief because if you live in that world, it is almost impossible to evaluate what you do.”

But Hayek’s concerns about “scientism” might yet apply to Duflo. She continues in the same interview:

Whereas if you say, I am going to press on this button and see whether it provides this result, you might find there are many things that do work surprisingly well with surprising consistency. So it is not that the world is so incredibly complex that every place needs a unique combination of five factors just to produce anything. I don’t know that we would have been able to say the same thing five years ago, but now we are starting to be in the position to say that a number of things, if well designed, just work pretty well in a lot of contexts.

Hayek, in contrast, argues that sheer, context-independent experimentation is not a viable path to development:

An experiment can tell us only whether any innovation does or does not fit into a given framework. But to hope that we can build a coherent order by random experimentation with particular solutions of individual problems and without following guiding principles is an illusion. Experience tells us much about the effectiveness of different social and economic systems as a whole. But an order of the complexity of modern society can be designed neither as a whole, nor by shaping each part separately without regard to the rest, but only by consistently adhering to certain principles throughout a process of evolution. (Law, Legislation, and Liberty Vol. I, p. 60)

Experimentation, for Hayek as well as Duflo, is the chief instrument of social change. Making experimentation work for development requires institutional feedback mechanisms that can fit newly discovered ways of doing things together in mutually reinforcing ways. What Hayek defends as "liberal" principles are ways of coordinating individual experiments so that they enhance human welfare. Ad hoc, "pragmatic" approaches might solve some local problem, but without coordination with other projects the progress will not be self-reinforcing and self-sustaining.

Promoting progress is like playing leap-frog in the dark. Big leaps into the unknown can easily end in disaster. Experiments are small leaps. Only when we combine those small leaps together according to some rules do we leap-frog in a definite direction and reinforce each other's progress, rather than ambling about and running into each other. Without systematic feedback mechanisms that are effective at coordinating different projects, randomized trials are like those small leaps. They might be able to solve particular problems--especially in mitigating the ill effects of poverty--but they would not lead to the self-reinforcing process of wealth generation necessary to eliminate poverty.

Read More & Discuss

Esther-mania!

Esther Duflo is having a good month, first the John Bates Clark medal for best economist under 40, and now a new profile in the New Yorker. It’s great to see development economists appearing in the New Yorker (link to abstract, full article alas requires subscription).

Esther is very deserving of this recognition. Anyone who gets hundreds of other academics and researchers to approach things in a new way (“randomized controlled trials” to measure the impact of development projects by comparing treatment and control groups) deserves tremendous credit.

A couple of things I liked about the New Yorker article:

Her childhood view of the poor, Duflo said, was shaped by “Protestant left-wing Sunday School.”

In Duflo’s view, both sides of the Sachs-Easterly argument reflect an unrealistic public desire “for an expert discourse, which is going to be able to tell you: This is going to be the end of poverty.” Duflo…argues that “there is not going to be le grand soir – one day, the big revolution, and the whole world is suddenly not corrupt. But maybe you create a small little virtuous group here and something else there. All these things are incremental.”

What I really did NOT like about the New Yorker article:

It gives short shrift to criticisms of the limits of Randomized Trials, quoting Lant Pritchett and Angus Deaton so briefly that I doubt the general reader will even get the criticism. The article spends more time on a completely inappropriate attack on the tone and alleged “elitism” of these critics than on explaining why MANY in economics today feel discomfort with the claims made for Randomized Trials.

Read More & Discuss

Michael Clemens won’t let up on the Millennium Villages + bonus links

It’s nice to see scholars bringing attention to the critical need for evaluation and informed public dialogue (not just “success stories” or short-term impact evaluation) for the Millennium Villages Project, which we have also covered on this blog. Michael Clemens of the Center for Global Development is currently carrying on a very revealing dialogue with the Millennium Villages. In Michael’s first blog post, which we covered here, he makes three central points:

  1. The hundreds of thousands of people living in the Millennium Villages, present and future, deserve to know whether the project’s combination of interventions is backed up by good science.
  2. Randomized evaluation is the best way to do this. While it may be too late to properly evaluate the first wave of villages, there is still time to conduct such a study for the next wave of villages.
  3. The MVP evaluation should demonstrate long-term impact before it is scaled up.

In a subsequent post, Michael parses the curious non-answer he receives from the director of monitoring and evaluation for the MVP, Dr. Paul Pronyk. He breaks down—for those of us not intimately involved in the finer details of impact evaluation—the difference between true scientific evaluation and what the MVP says it is doing, namely “matched randomly selected comparison villages.”

What the MVP has done is something very different from…a rigorous evaluation.  First, village cluster A1 was chosen for treatment, for a range of reasons that may include its potential for responding positively to the project.  Then, long after treatment began, three other clusters that appear similar to A1 were identified — call these “candidate” comparison clusters A2, A3, and A4.  The fact that all three candidates were chosen after treatment in A1 began creates an enormous incentive to pick those candidates, consciously or unconsciously, whose performance will make the intervention in A1 look good.  Then the comparison village was chosen at random from among A2, A3, and A4.

Differences between the treated cluster and the comparison cluster might be due to the MVP. But those differences might also be due to how the original Millennium Village was chosen, and how the three candidate comparison villages were chosen.  This is not a hypothetical concern…

So, either the MVP director of evaluation does not understand evaluation...or he thinks we won't know the difference.
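To make Michael’s point concrete, here is a minimal simulation sketch of our own, with entirely made-up numbers: when the “candidate” comparison clusters are picked after outcomes are already visible, even an intervention with zero true effect can look impressive.

```python
# A minimal simulation sketch of the selection problem Clemens describes.
# All numbers are hypothetical; the point is only that choosing "comparison"
# villages after treatment, with an eye on how they performed, can manufacture
# an apparent treatment effect even when the true effect is zero.
import numpy as np

rng = np.random.default_rng(0)
n_villages, n_sims = 20, 5000
true_effect = 0.0          # assume the intervention does nothing at all
biased_gaps, random_gaps = [], []

for _ in range(n_sims):
    # each village's change in some outcome (e.g. crop yield) is pure noise
    outcome_change = rng.normal(loc=0.0, scale=1.0, size=n_villages)
    treated_change = outcome_change[0] + true_effect   # cluster A1, chosen for treatment
    others = outcome_change[1:]

    # biased procedure: pick the three "candidate" comparison villages that
    # happened to do worst, then draw one of them at random
    candidates = np.sort(others)[:3]
    biased_gaps.append(treated_change - rng.choice(candidates))

    # proper procedure: comparison village drawn at random, before outcomes are seen
    random_gaps.append(treated_change - rng.choice(others))

print(f"apparent effect, post-hoc selection: {np.mean(biased_gaps):+.2f}")
print(f"apparent effect, random comparison:  {np.mean(random_gaps):+.2f}")
```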

Dr. Pronyk promises the release of the MVP’s midpoint evaluation at some unspecified time later this year, and says they “look forward to an active discussion about the initial findings regarding poverty, hunger, and disease in the Millennium Villages.” We hope the scholarly community and the wider reading public concerned with development issues will give Dr. Pronyk precisely what he’s asking for.

Bonus Links

* Sounds a bit like a parody we wish we’d written… but it’s true. Yesterday’s NYT features this quote from a story on China’s bid to supply California with technology, equipment and engineers to build a high-speed railway, and to help finance its construction:

“We are the most advanced in many fields, and we are willing to share with the United States,” Zheng Jian, the chief planner and director of high-speed rail at China’s railway ministry, said.

* We’d be remiss not to mention this helpful timeline of celebrity aid to Africa featuring an interactive map from Mother Jones (and some additional commentary from Wronging Rights and Texas in Africa.)

Read More & Discuss

Aid agencies announce they will be accountable to independent evaluators; This blog to permanently close

IRINA News, April 1, 2010 Geneva, Switzerland—A coalition of aid agencies meeting in Geneva today announced a historic agreement to reform the international aid system. In signing the agreement, heads of aid agencies formally committed to accept the verdicts of independent evaluators of the programs and projects in their portfolios.

The new measures require the 39 multilateral and bilateral aid agencies to scale up only those programs with a proven track record of success. Programs shown by independent evaluation to have no impact—or a negative impact—on their intended beneficiaries will not be funded.

“An international agreement of this type is long past due,” said Mr. Poshtoff Van der Peet, the spokesperson for the coalition. “We believe this is a major step towards making sure our aid monies are spent in such a way that they actually reach the poor. Some of us may even have to go out of business, but this is a price well worth paying to make sure aid reaches the poor.”

We were very glad to see this story on the wires today. Because of this breakthrough, Aid Watch blog will discontinue operations itself effective immediately. Laura Freschi and William Easterly will shift to doing research on the fundamental determinants of long run prosperity, leaving the commentary on aid in the safe hands of independent evaluators.

Read More & Discuss

Best in Aid: The Grand Prize

As long as there are disasters, there will always be people who want to help by whatever means first strikes their fancy. There will be those who insist on giving shoes (including such high-profile experts as Jessica Simpson and Kim Kardashian). Still others offer used yoga mats or baby formula. Ports and roads clogged up with shoes and yoga mats cannot deliver essential medicines, food and supplies. Then there are those who swoop in to adopt children before their extended families have had time to locate them, or who just show up to ‘help’ as unskilled volunteers, adding to the confusion and occupying jobs that could go to locals. And there will always be organizations around to capitalize on those uninformed good intentions.

But now there is a small but growing chorus of voices dedicated to equipping individual donors with information on how to help effectively in a crisis. This movement has the power to harness the generosity of individuals, change ingrained giving practices, and create positive pressure on NGOs and aid agencies to demonstrate the impact of their work.

That’s why the award for Best in Aid goes to…the Smart Giving movement, nominated by Saundra Schimmelpfennig of the blog Good Intentions are Not Enough.

This year, a week after the Haiti quake, Stephanie Strom of the New York Times wrote a story on the “unprecedented effort” to teach Americans to resist the impulse to send the wrong goods to Haiti. Many advocated sending the one thing that is badly needed and has a low transport-cost-to-value ratio: cash. The advice to send cash “appears to be reaching a tipping point,” wrote Strom. Some Americans saw first-hand the piles of unneeded clothing donations in the aftermath of Katrina, or heard about aid distribution problems after the Asian tsunami. Now, people are hearing the message from politicians and policy makers spreading the word on Smart Giving to Haiti in real time, in time to prevent mistakes that cause unnecessary suffering and tragedy.

Contrast Strom’s story with the high profile stories that have appeared consistently since the current surge in interest in global poverty started earlier this decade, like this NYT headline:

Coverage of both global poverty and disasters always stressed the same thing: how much was needed in TOTAL donations. It was never about the danger of the WRONG donations. Today it is.

Saundra Schimmelpfennig herself appeared in the NYT article, and many other news sources (among them CNN, NPR, USA Today, Canada’s CBC radio, WNYC, The Daily Beast, The San Francisco Chronicle, and the Christian news magazine World) sought her advice on everything from the dangers of adoption in the immediate aftermath of a disaster, to how to evaluate disaster relief volunteer opportunities. Here on Aid Watch, guest blogger Alanna Shaikh’s post on how not to help in Haiti, called Nobody wants your old shoes, became the blog’s second most popular and most-widely circulated piece ever (the first was a satire, which we’re no longer allowed to talk about).

The campaign against relying on the overhead ratio as a measure of charity effectiveness is also part of the good giving message. In collaboration with six other nonprofits, Tim Ogden of Philanthropy Action launched a campaign last December to convince donors to dump the overhead ratio (the measure of how much money goes to programs versus administrative costs) as a primary means of evaluating the effectiveness of a charity. “We’re finally at a point where people do have an alternative,” said Ogden. In the last few years, organizations like GiveWell, Philanthropedia and Great Nonprofits have emerged to give people more useful information about charities, and to pressure charities to devote the resources to collecting that information and making it public.

Finally, the intensity of the debate on evaluation with randomized controlled trials in the academic world, and new organizations like 3IE (the International Initiative for Impact Evaluation) and DIME (the Development Impact Evaluation initiative at the World Bank), are other facets of the same movement. Behind the heated debate on what methods of evaluation to use, we see a much larger point – many more donors now insist on serious EVALUATION and ACCOUNTABILITY than used to do so.

As we’ve said on this blog before, accountability is not something that anyone accepts voluntarily. It is forced on political actors, aid agencies, and NGOs by sheer political power from below: from well-informed advocates for the poor, and from listening to poor people themselves. All of this may still be in its early stages, but since aid really CANNOT work without serious accountability, the Smart Giving movement is the best news to come along in aid in quite a while.

UPDATE: (3/20, 8:21am) the Center for Global Development reacts to our inclusion of 3IE, which was their brainchild.

Read More & Discuss

Who gets the Last Seat on the Plane? Why Aid Hates Economics

Not long ago, I was returning home from a trip when the airline bumped me from my flight due to overbooking. The airline rep was very sympathetic, but I didn’t want her sympathy; I wanted A Seat On the Plane. She had traded off my wishes against those of other passengers, and I lost. Economists are unpopular because we say there is always SOME resource that is overbooked in aid, and aid is Forced to Choose: who is going to get the Last Seat on the Plane?

Politicians and advocates try to argue their way out of the Scarcity and Tradeoffs, using one or another of these proven strategies:

(1)   There really is no scarcity

This is Sachs’ central argument for more money in aid – you should never be forced to choose who should live and who should die, so you should always ask for more aid money. This has been effective as advocacy, but still doesn’t make aid money an infinite resource – there is still a limit on how much rich people will give. And the scarce resource is not only money – it is also political capital, rich people’s attention, or effective and accountable aid workers in the field. So using AIDS as an example, sure you should do some of both treatment and prevention – but how much of each? In the end, they are still competing for limited Seats on the Plane.

(2)   Our project doesn’t use any scarce resources

This argument is usually made by omission. The Millennium Villages don’t advertise that they are dependent on one extremely scarce resource -- Western experts -- perhaps because it would then become obvious that they are neither scalable nor sustainable. And of course there is a big tradeoff between the Millennium Villages and better projects you could do with this scarce Western expertise. A better project replaces the scarce foreign expertise very soon with more abundant local expertise and labor – such as training programs that transmit foreign technical skills to locals, who will in turn pass them on to other locals.

(3)   My cause actually is the same as your cause

Advocates of one cause often argue many other causes NEED their cause. If the necessity is absolute, then indeed the tradeoff disappears. If it is less than 100 percent absolute, there is still a tradeoff. Hey, Other Passenger who took my seat: don’t claim that You are so Important that it’s pointless for Me to get on a plane without You! Unless You are the Pilot.

In summary, there really is scarcity and aid really is forced to make intelligent choices. Be sure to give a seat to the pilot.

Read More & Discuss

The coming age of accountability

There was such a great audience yesterday at the Brookings event on What Works in Development. (If you are a glutton for punishment, the full length audio of the event is available on the Brookings web site.) In the end, what struck me was the passion for just having SOME way to KNOW that aid is benefiting the poor, which dwarfed the smaller issue of Randomized Experiment methods vs. other methods.

And extreme dissatisfaction with aid agencies that ignore even the most obvious signs that some aid effort is not working. (Example cited in the Brookings book: a World Bank computer kiosk program in India celebrated as a "success" in its "Empowerment" sourcebook. Except that the computers sat in places without functioning electricity or Internet connections. Critics pointed that out, and yet officials still defended the program as contributing to "Empowerment." Abhijit Banerjee asked: "Empowered how? Through non-working computers?")

It is awfully hard to get an accountability movement going that would have enough political power to force changes on aid agencies, say, away from forever mumbling "empowerment," towards actually making computers light up.

Accountability is not something that anyone accepts voluntarily. It is forced on political actors by sheer political power from below. That's what democratic revolutions are all about. Can we hear a lot more from people in the poor countries protesting bad aid (thank you, Dambisa Moyo) and praising good aid (thank you, Muhammad Yunus)? Can we hear A LOT more from the intended beneficiaries themselves? Can their allies in rich countries help them gain more political leverage with rich country aid agencies?

I don't know yet. But there is a lot more awareness of the accountability problem than there was a decade ago. The dialogue among blogs on making Haiti disaster aid work is one example. The size, savvy, and enthusiasm of the audience yesterday was one more small hopeful sign.

Watch out, aid agencies, accountability is coming.

Read More & Discuss

If Martin Luther King had been an aid official -- the Powerpoint version of I Have a Dream

If only Martin Luther King Jr. had been an aid agency official, he would have been able to use Powerpoint and aid terminology to get his main points across more effectively. Using advanced econometric methods, we were able to project the Powerpoint (via PDF) slides that would have resulted (open or save here).

Read More & Discuss

"What works in development" is available again

After my whining in a previous post that the great publication "What works in development: Thinking Big and Thinking Small" was not easily available, it is once again available on Amazon. Let me repeat the previous sales pitch: although I am one of the editors, this is not about self-promotion. The main contribution of the book is to feature a galaxy of academic superstars heatedly debating how or whether to apply randomized evaluation to development projects (see one of our most popular posts of all time: "The Civil War in Development Economics"). This is the only book anywhere that summarizes the arguments on both sides.

Any interest? Can we bump up its current rating on Amazon from #81,373?

Read More & Discuss

A balanced reaction to "The Civil War in Development Economics"

Aid Watch received the following very thoughtful comment. The author wishes to remain anonymous:

The debate in the academic world sounds fascinating!  And it mirrors in some ways the ongoing debates I have within the international development practitioner community, where I work.  Due to my background and current job, I'm the resident RCT "expert" of sorts in my organization and get to have lots of fascinating discussions with program and M&E staff. I see the following pros and cons for randomized evaluation (or RCT's - randomized control trials - as they are often called in the NGO world):

PROS:

- As always, the key idea that you can't attribute causality of impact without a randomly-assigned control group.  Selection bias and other problems affect any other method to varying degrees.

CONS (or rather, arguments for having additional approaches in your evaluator's toolbox):

- RCT's are harder to do for long-run impacts. You either have to leave the control group without the program for 10-20 years, which is an ethical and logistical challenge.  Or you have to rely on some assumptions to add effects together from repeated follow-up surveys. For example, if you delayed the start of a program in the "control group" for three years and then did a follow-up survey every three years, then you could add the difference between 3 and 0 years plus the difference between 6 and 3 years plus the difference between 9 and 6 years, etc., but you'd have to assume some stuff like linearity in the effect over time or specific types of interactions with global on-off events.  (I'm still thinking about this whole idea.)

- With a complex or system-wide program, you often can't have a control group, such as if you are working on a national scale. For example, working to change gender injustices in a country's laws.

- Context is important and you can't always get that with good background research or a good pilot before an RCT, though you should try.  My organization talks a lot about "mixed methods" - mixed quantitative and qualitative research being a good way to combine the strengths of each.  In fact the RCT that I'm overseeing includes a team of anthropologists.

- Qualitative research can also be more responsive if you get unanticipated results that are hard to explain.
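A compact way to write the adding-up described in the first item above (our notation, purely illustrative; it relies on exactly the linearity-type assumption the commenter flags):

\[
\hat{\tau}_{0 \to 9} \;\approx\; \hat{\Delta}_{0 \to 3} + \hat{\Delta}_{3 \to 6} + \hat{\Delta}_{6 \to 9}
\]

where each \(\hat{\Delta}_{a \to b}\) is the treatment-versus-delayed-control difference measured over years a to b, estimated before the control group for that window itself receives the program.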

So, being a good two-handed economist, I do see both sides now, though I'm still pro-RCT.  It helps that I was at that bastion of qualitative methodology, the American Evaluation Association conference (another AEA!) and heard some good indoctrination on the anti-RCT side.

It's particularly interesting to be at my INGO since much of the organization's work is focused on areas that are tough to evaluate with RCT's including lobbying the U.S. govt; humanitarian relief work (though we have a few staff who want baselines for refugee camps); and many small-scale, long-term, idiosyncratic projects in communities facing severe challenges.

The closest I've come to agreement with people who are anti-RCT is to have all of us agree that it's a great tool in the right circumstances but that it's one of many good tools.  What we always disagree on is whether RCT's are overused (them) or underused (me).  And many people hate the words "gold standard".  It's a red flag.  I use it anyway, as in "RCT's are the gold standard for short-run impact evaluations that you want to be free from selection bias."

I think that the "right circumstances" for RCT's would include important development approaches such as clean water or microcredit that haven't been evaluated yet with RCT's; or big programs that are finally stable in their implementation after an initial period of experimentation and adaptation. Pilots are OK, too, though that is a harder sell; program staff want to be able to get in there and experiment away with what works and what doesn't without worrying about rigorous evaluation.

It'll be interesting to see where these discussions are in 5 or 10 years.

Read More & Discuss

The Civil War in Development Economics

Few people outside academia realize how badly Randomized Evaluation has polarized academic development economists for and against. My little debate with Sachs seems like gentle whispers by comparison. Want to understand what’s got some so upset and others true believers? A conference volume has just come out from Brookings. At first glance, this is your typical sleepy conference volume, currently ranked on Amazon at #201,635.

But attendees at that conference realized that it was a major showdown between the two sides, and now the volume lays out in plain view the case for the prosecution and the case for the defense of Randomized Evaluation.

OK, self-promotion confession, I am one of the editors of the volume, and was one of the organizers of the conference (both with Jessica Cohen). But the stars of the volume are the speakers and commentators: Nava Ashraf (Harvard Business School), Abhijit Banerjee (MIT), Nancy Birdsall (Center for Global Development), Anne Case (Princeton University), Alaka Halla (Innovations for Poverty Action), Ricardo Hausman (Harvard University), Simon Johnson (MIT), Peter Klenow (Stanford University), Michael Kremer (Harvard), Ross Levine (Brown University), Sendhil Mullainathan (Harvard), Ben Olken (MIT), Lant Pritchett (Harvard), Martin Ravallion (World Bank), Dani Rodrik (Harvard), Paul Romer (Stanford University), and David Weil (Brown). Angus Deaton also gave a major luncheon talk at the conference, which was already committed for publication elsewhere. A previous blog discussed his paper.

Here’s an imagined dialogue between the two sides on Randomized Evaluation (RE) based on this book:

FOR: Amazing RE power lets us identify causal effect of project treatment on the treated.

AGAINST: Congrats on finding the effect on a few hundred people under particular circumstances, too bad it doesn’t apply anywhere else.

FOR: No problem, we can replicate RE to make sure effect applies elsewhere.

AGAINST: Like that’s going to happen. Since when is there any academic incentive to replicate already published results? And how do you ever know when you have enough replications of the right kind? You can’t EVER make a generic “X works” statement for any development intervention X. Why don’t you try some theory about why things work?

FOR: We are now moving in the direction of using RE to test theory about why people behave the way they do.

AGAINST: I think we might be converging on that one. But your advertising has not yet got the message, like the JPAL ad on “best buys on the Millennium Development Goals.”

FOR: Well, at least it’s better than your crappy macro regressions that never resolve what causes what, and where even the correlations are suspect because of data mining.

AGAINST: OK, you drew some blood with that one. But you are not so holy on data mining either, because you can pick and choose after the research is finished whatever sub-samples give you results, and there is also publication bias, which favors positive results over null results.

FOR: OK we admit we shouldn’t do that, and we should enter all REs into a registry including those with no results.

AGAINST: Good luck with that. By the way, even if you do show something “works,” is that enough to get it adopted by politicians and implemented by bureaucrats?

FOR: But voters will want to support politicians who do things that work based on rigorous evidence.

AGAINST: Now you seem naïve about voters as well as politicians. Please be clear: do RE-guided economists know something the local people do not know, or do they have different values on what is good for them? What about tacit knowledge that cannot be tested by RE? Why has RE hardly ever been used for policymaking in developed countries?

FOR: You can take as many potshots as you want, at the end we are producing solid evidence that convinces many people involved in aid.

AGAINST: Well, at least we agree on the much larger question of what is not respectable evidence, namely, most of what is currently relied on in development policy discussions. Compared to the evidence-free majority, what unites us is larger than what divides us.

Read More & Discuss

Millennium Villages Comments, We Respond

We received the following comment this morning from the Director of Communications at the Earth Institute, regarding the Aid Watch post published yesterday, Do Millennium Villages Work? We May Never Know. My response is below.

-----

It’s unfortunate that the author of this post chose to publish such an uninformed blog on the Millennium Village Project’s monitoring and evaluation activities. She and William Easterly at Aid Watch were invited to meet with our scientists and discuss the science and research behind the Villages and the details of the MVP monitoring and evaluation process before publishing any commentaries. Instead the author hastily chose to publish without talking with MVP researchers. The inaccuracy of the blogpost is a reflection of the lack of rigor and objectivity with which the Aid Watch authors approach this subject time and again.

For readers interested in reading factually accurate information about the Millennium Villages project and its monitoring and evaluation strategy, please see: http://www.millenniumvillages.org/progress/monitoring_evaluation.htm

Erin Trowbridge
Director of Communications
The Earth Institute

-----

Dear Erin,

I had hoped for a different kind of response, one that addressed the specific points made by the piece. Your only comment on content is to say the piece was "uninformed." It would be helpful if you would clarify exactly what you think the piece got wrong, and offer what you view as the correct information to replace it. I would be happy to post such a response on Aid Watch.

Your comment is in any case an inaccurate characterization of our interaction over the past two months. I sent seven separate emails to you and one to CEO John McArthur beginning in mid-August, asking for information on the overall MV evaluation strategy, and eventually asking specifically for an explanation of how the thinking of the team had evolved from 2006 (when Jeff Sachs said there were no controls) to 2009 (when we were informed that there are comparison villages for 10 MV sites). Your responses were represented fairly in the blog post that Aid Watch published yesterday. We expressed willingness to meet with the research scientists after you offered this; it is unfortunate that we were unable to find a mutually convenient time to meet before our publication deadline, which we had already postponed several times.

Thank you for sharing further details of the MVP evaluation process in the information that has now appeared at the link you provided. Interested readers can now judge for themselves the merits and demerits of the ongoing MVP evaluation.

Frequent readers here may tire of hearing it, but it is our belief that greater transparency and a greater willingness on the part of donors and aid practitioners to share information with supporters and skeptics alike will make aid better.

Laura

Laura Freschi
Associate Director
Development Research Institute

Read More & Discuss

Can We Push for Higher Expectations in Evaluation? The Millennium Villages Project, continued

There's been some good discussion—here in the comments of yesterday’s post and on other blogs—on the Millennium Villages and what sort of evaluation standard they can (realistically) and should (ideally) be held to. Yesterday on Aid Thoughts, Matt was distressed that over 70 percent of the student body at Carleton University voted in a tuition hike—$6 per student, per year, every year—to fund the Millennium Villages. The students apparently concluded that the MV program "offers the most effective approach for providing people with the tools they need to lift themselves out of extreme poverty. It is also the only approach that, if scaled, would see countless lives saved, an African Green Revolution and, amongst other things, every child in primary school."

How is it that students are coming away with that glowing impression from a project that—as Matt points out—has so far provided little evidence that its benefits are scalable, sustainable, persistent or transferable?

Focusing on results published for one specific MVP intervention, blogger Naman Shah pointed us to his analysis of the MVP scientific team’s early malaria results. The project's claims to have reduced malaria prevalence were “disproportionate to the evidence (especially given the bias of self-evaluation)” and suffered from some “bad science,” like a failure to discuss pre-intervention malaria trends.

Chris Blattman stepped into the wider debate to offer an evaluator's reality check, questioning whether rigorous evaluation of the MVP is feasible.  Chris said:

[T]here are other paths to learning what works and why. I’m willing to bet there is a lot of trial-and-error learning in the MVs that could be shared. If they’re writing up these findings, I haven’t seen them. I suspect they could do a much better job, and I suspect they agree. But we shouldn’t hold them to evaluation goals that, from the outset, are bound to fail.

But if the industry standard of best practice is moving towards funding interventions that are measurable and proven to work, why is the MVP encouraging the international community to shift limited aid resources towards a highly-visible project apparently designed so that it can't be rigorously evaluated?

Fact is, none of us know exactly what kind of evaluation the Millennium Villages Project is doing, or the reasoning behind why they’re doing what they’re doing, since they haven't yet shared it in any detail.  Perhaps someone at the MVP will respond to our request to weigh in.

Read More & Discuss

Do Millennium Villages work? We may never know

Jeffrey Sachs’ Millennium Villages Project has to date unleashed an array of life-saving interventions in health, education, agriculture, and infrastructure in 80 villages throughout ten African countries. The goal of this project is nothing less than to “show what success looks like.” With a five-year budget of $120 million, the MVP is billed as a development experiment on a grand scale, a giant pilot project that could revolutionize the way development aid is done.

But are they a success? To address that question, we need to know: What kind of data is being collected? What kinds of questions are being asked? Three years after the start of one of the highest-profile development experiments ever, who’s watching the MVPs?

The most comprehensive evaluation of the project published so far is a review by the Overseas Development Institute, a large UK-based think tank. The review covered two out of four sectors, in four out of ten countries, with data collected in the MVs only, not in control villages. The report’s authors cautioned that “the review team was not tasked and not well placed to assess rigorously the effectiveness and efficiency of individual interventions as it was premature and beyond the means of the review.”

Despite this, a Millennium Villages blog entry on Mali says, “With existing villages showing ‘remarkable results,’ several countries have developed bold plans to scale up the successful interventions to the national level.” Millennium Promise CEO John McArthur described Sachs’ recent testimony to the Senate Foreign Relations Committee: “Sachs noted the success of the Millennium Villages throughout Africa and the tremendous development gains seen in the project over the past three years.”

The Evaluation that Isn’t?

In contrast, evaluation experts have expressed disappointment in the results they’ve seen from the Millennium Villages Project to date. This isn’t because the MVPs fail to produce impressive outcomes, like a 350 percent increase in maize production in one year (in Mwandama, Malawi), or a 51 percent reduction in malaria cases (in Koraro, Ethiopia). Rather, it has to do with what is—and is not—being measured.

“Given that they’re getting aid on the order of 100 percent of village-level income per capita,” said the Center for Global Development’s Michael Clemens in an email, “we should not be surprised to see a big effect on them right away. I am sure that any analysis would reveal short-term effects of various kinds, on various development indicators in the Millennium Village.” The more important test would be to see if those effects are still there—compared with non-Millennium Villages—a few years after the project is over.

Ted Miguel, head of the Center of Evaluation for Global Action at Berkeley, also said he would “hope to see a randomized impact evaluation, as the obvious, most scientifically rigorous approach, and one that is by now a standard part of the toolkit of most development economists. At a minimum I would have liked to see some sort of comparison group of nearby villages not directly affected by MVP but still subject to any relevant local economic/political ‘shocks,’ or use in a difference-in-differences analysis.” Miguel said: “It is particularly disappointing because such strong claims have been made in the press about the ’success’ of the MVP model even though they haven't generated the rigorous evidence needed to really assess if this is in fact the case.”
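For readers unfamiliar with the jargon, here is a minimal sketch of the difference-in-differences comparison Miguel describes. The before/after maize-yield figures borrow from the numbers quoted earlier in this series; the comparison-village numbers are invented placeholders, since that is precisely the data the MVP has not collected.

```python
# A minimal difference-in-differences sketch. The comparison-village figures
# are hypothetical; only the MV before/after yields echo numbers cited above.
mv_before, mv_after = 0.8, 4.5                   # tons/ha in a Millennium Village
comparison_before, comparison_after = 0.9, 2.1   # hypothetical nearby villages
                                                 # (e.g. good rains help everyone)

naive_change = mv_after - mv_before
did_estimate = (mv_after - mv_before) - (comparison_after - comparison_before)

print(f"naive before/after change in the MV:  {naive_change:.1f} t/ha")
print(f"difference-in-differences estimate:   {did_estimate:.1f} t/ha")
# the gap between the two numbers is what regional shocks (weather, prices,
# national policy) would otherwise be credited to the project
```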

An MVP spokesperson told me that they are running a multi-stage household study building on detailed baseline data, the first results from which will be published in 2010. The sample size is 300 households from each of the 14 MV “clusters” of villages (which comprise about 30,000-60,000 people each). She also said that their evaluation “uses a pair-matched community intervention trial design” and “comparison villages for 10 MV sites.”

But Jeff Sachs noted in a 2006 speech that they were not doing detailed surveying in non-MV sites because—he said— “it’s almost impossible—and ethically not possible—to do an intensive intervention of measurement without interventions of actual process.” A paper the following year went on to explain that not only is there no selection of control villages (randomized or otherwise), there is also no attempt to select interventions for each village randomly in order to isolate the effects of specific interventions, or of certain sequences or combinations of interventions.

CEO John McArthur declined to comment on this apparent contradiction. The MVP spokesperson could say only that the evaluation strategy has evolved, and promised a thorough review of their monitoring and evaluation practices in 2010.

Comparison villages could be selected retroactively, but the MVP has failed to satisfactorily explain how they chose the MVs, saying in documents and in response to our questions only that they were “impoverished hunger hotspots” chosen “in consultation with the national and local governments.” If there was no consistent method used in selecting the original villages (if politics played a role, or if villages were chosen because they were considered more likely to succeed), it would be difficult to choose meaningful comparison villages.

Living in a Resource-Limited World

Imagine that you are a policymaker in a developing country, with limited resources at your disposal. What can you learn from the Millennium Villages? So far, not very much. Evaluations from the MVP give us a picture of how life has changed for the people living in the Millennium Villages, and information about how to best manage and implement the MVP.

Sandra Sequeira, an evaluation expert at London School of Economics, sums up the quandary neatly. “Their premise is that more is always better, i.e. more schools, more clinics, more immunizations, more bed nets. But we don't live in a world of unlimited resources. So the questions we really need to answer are: How much more? Given that we have to make choices, more of what?”

These are tough questions that the Millennium Villages Project will leave unanswered. For a huge pilot project with so much money and support behind it, and one that specifically aims to be exemplary (to “show what success looks like”), this is a disappointment, and a wasted opportunity.

Read More & Discuss

Development Experiments: Ethical? Feasible? Useful?

A new kind of development research in recent years involves experiments: there is a “treatment group” that gets an aid intervention (such as a de-worming drug for school children), and a “control group” that does not. People are assigned randomly to the two groups, so there is no systematic difference between the two groups except the treatment. The difference in outcomes (such as school attendance by those who get deworming vs. those who do not) is a rigorous estimate of the effect of treatment. These Randomized Controlled Trials (RCTs) have been advocated by leading development economists like Esther Duflo and Abhijit Banerjee at MIT and Michael Kremer at Harvard.

Others have criticized RCTs. The most prominent critic is the widely respected dean of development research and current President of the American Economic Association, Angus Deaton of Princeton, who released his Keynes lecture on this topic earlier this year, “Instruments of development: Randomization in the tropics, and the search for the elusive keys to economic development.” Dani Rodrik and Lant Pritchett have also criticized RCTs.
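For readers new to the method, here is a minimal sketch of what an RCT estimate looks like, using the deworming example and entirely made-up numbers.

```python
# A minimal RCT sketch with invented data: random assignment, then a simple
# difference in mean school attendance between treatment and control.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
assigned_treatment = rng.random(n) < 0.5            # coin-flip assignment

# hypothetical attendance rates: 75% in control, 82% with deworming
p_control, p_treated = 0.75, 0.82
attended = rng.random(n) < np.where(assigned_treatment, p_treated, p_control)

t, c = attended[assigned_treatment], attended[~assigned_treatment]
effect = t.mean() - c.mean()
se = np.sqrt(t.var(ddof=1) / len(t) + c.var(ddof=1) / len(c))

print(f"estimated effect on attendance: {effect:+.3f} (std. error {se:.3f})")
# because assignment was random, this difference is an unbiased estimate of the
# average treatment effect; note that it applies only to this particular sample
```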

To drastically oversimplify and boil down the debate, here are some criticisms and responses:

1. What are you doing experimenting on humans?

Denying something beneficial to some people for research purposes seems wildly unethical at first. RCT defenders point out that there is never enough money to treat everyone and drawing lots is a widely recognized method for fairly allocating scarce resources. And when you find something works, it will then be adopted and be beneficial to everyone who participated in the experiment. The same issues arise in medical drug testing, where RCTs are generally accepted. Still, RCTs can cause hard feelings between treatment and control groups within a community or across communities. Given the concerns of this blog with the human dignity of the poor, the researchers should at least be careful to communicate to the individuals involved what they are up to and always get their permission.

2. Can you really generalize from one small experiment to conclude that something “works”?

This is the single biggest concern about what RCTs teach us. If you find that using flipcharts in classrooms raises test scores in one experiment, does that mean that aid agencies should buy flipcharts for every school in the world? Context matters – the effect of flipcharts depends on the existing educational level of students and teachers, availability of other educational methods, and about a thousand other things. Plus, implementing something in an experimental setting is a lot easier than having it implemented well on a large scale. Defenders of RCTs say you can run many experiments in many different settings to validate that something “works.” Critics worry about the feeble incentives for academics to do replications, and say we have little idea how many or what kind of replications would be sufficient to establish that something “works.”

3. Can you find out “what works” without a theory to guide you?

Critics say this is the real problem with issue #2. The dream of getting pure evidence without theory is usually unattainable. For example, you need a theory to guide you as to what determines the effect of flipcharts to have any hope of narrowing down the testing and replications to something manageable. The most useful RCT results are those that confirm or reject a theory of human behavior. For example, a general finding across many RCTs in Africa is that demand for free life-saving products collapses once you charge a price for them (even a low subsidized price). This refutes the theory that fully informed people are rationally purchasing low cost medical inputs to improve their health and working capacity. This would usefully lead to further testing of whether the problem is lack of information or the assumption of perfect rationality (the latter is increasingly questioned for rich as well as poor people).

4. Can RCTs be manipulated to get the “right” results?

Yes. One could search among many outcome variables and many slices of the sample for results. One could investigate in advance which settings were more likely to give good results. Of course, scientific ethics prohibit these practices, but they are difficult to enforce. These problems become more severe when the implementing agency has a stake in the outcome of the evaluation, as could happen with an agency whose program will receive more funding when the results are positive.
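A small simulation sketch of our own, purely illustrative, of the first kind of manipulation: test enough outcome variables and some will cross the conventional significance threshold even when the program truly does nothing.

```python
# Illustrative multiple-testing simulation: 20 outcomes truly unaffected by
# treatment, yet some will look "significant" at the usual 5% threshold.
import numpy as np

rng = np.random.default_rng(2)
n, n_outcomes = 500, 20
treated = rng.random(n) < 0.5

false_positives = 0
for _ in range(n_outcomes):
    y = rng.normal(size=n)                      # outcome with zero true effect
    diff = y[treated].mean() - y[~treated].mean()
    se = np.sqrt(y[treated].var(ddof=1) / treated.sum()
                 + y[~treated].var(ddof=1) / (~treated).sum())
    if abs(diff / se) > 1.96:                   # conventional 5% threshold
        false_positives += 1

print(f"{false_positives} of {n_outcomes} null outcomes came out 'significant'")
```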

5. Are RCTs limited to small questions?

Yes. Even if problems 1 through 4 are resolved, RCTs are infeasible for many of the big questions in development, like the economy-wide effects of good institutions or good macroeconomic policies. Some RCT proponents have (rather naively) claimed RCTs could revolutionize social policy, making it dramatically more effective – a claim that itself, ironically, cannot be tested with RCTs. Otherwise, embracing RCTs has led development researchers to lower their ambitions. This is probably a GOOD thing in foreign aid, where outsiders cannot hope to induce social transformation anyway and just finding some things that work for poor people is a reasonable outcome. But RCTs are usually less relevant for understanding overall economic development.

Overall, RCTs have contributed positive things to development research, but RCT proponents seem to oversell them and to be overly dogmatic about this kind of evidence being superior to all other kinds.

Read More & Discuss