More Tales of Two Tails

The following post is by Dennis Whittle, co-founder of GlobalGiving. Dennis blogs at Pulling for the Underdog.

An eloquent 3 year-old would have been better asking "What the dickens are you talking about?  Who is defining success?  Who says failure is bad, anyway?" - Joe

Earlier I blogged about aid cheerleaders and critics. Each camp argues about the mean outcome of aid rather than the distribution of impact among projects. Both camps agree that some projects have positive results and others negative. So why not try to figure out which projects work and focus our resources on them?

I got some great and insightful comments and a few nice aid distribution graphs from readers.  Here are some key themes:

  1. The mean *does* matter if the distribution is random. In other words, if we can't predict in advance what types of projects will succeed, we should only spend more resources if the mean outcome is positive.
  2. Many people believe that on average the biggest positive returns come from investment in health projects.
  3. We should also look at the distribution of impact even within successful projects, because even projects that are successful on average can have negative impacts on poorer or more vulnerable people.
  4. Given the difficulty in predicting ex-ante what will work, a lot of experimentation is necessary.  But do we believe that existing evaluation systems provide the feedback loops necessary to shift aid resources toward successful initiatives?
  5. "Joe," the commenter above, argues that in any case traditional evaluators (aid experts) are not in the best position to decide what works and what doesn't.

Petr Jansky sent a paper he is working on with colleagues at Oxford about cocoa farmers in Ghana.  The local trade association was upset that they could not get pervasive adoption of a new package of fertilizer and other inputs designed to increase yields.  According to their models, the benefits to farmers should be very high.  The study found that - on average - that was true, but that the package of inputs has negative returns to farmers with certain types of soil or other constraints.  Farmers with zero or negative returns were simply opting out.

At first glance, these findings seem obvious and trivial.  But they are profound, in at least two ways.  First, retention rates are an implicit and easily observable proxy for net returns to farmers.  We don't need expensive outside evaluations to tell us whether the overall project is working or not.  And second, permitting farmers to decide acknowledges differential impacts on different people even within a single project.

What other ways could we design aid projects to allow the beneficiaries themselves to evaluate the impact and opt in or out depending on the impact for them personally?  And how would it change the life of aid workers if their projects were evaluated not by outside experts and formal analyses but by beneficiaries themselves speaking through the proxy of adoption?

Read More & Discuss

Congressional Muslim Terrorism Hearings: the Mathematical Witness Transcript

UPDATE 11am response to commentator: is there an association between inability to understand Bayes' theorem and ethnic prejudice?

UPDATE 3:30PM explaining risk of false positives to congressmen and commentators

Congressman Chairman: Muslims! Terrorists! Muslims! Terrorists!

Witness: Let A be the event of terrorism, and B be the event of Muslimism. Then P(A|B)≠P(B|A)

Congressman: What are you talking about?

Witness: You seem to be confusing the probability that a Muslim person will be a terrorist with the probability that a terrorist person will be a Muslim

Congressman: And you seem to be confusing everyone in this hearing, smartass.

Witness: [image: Bayes' rule, P(A|B) = P(B|A) P(A) / P(B)]

Congressman:  What did you just call me?

Witness: it’s simple, the probability that a Muslim will be a terrorist will be 13,000 times lower than the probability that a terrorist will be a Muslim. That is, the ratio of the probability of being a terrorist to the probability of being a Muslim is about 1 over 13,000 (P(A)/P(B)).

Congressman:  so even the math department has been taken over by politically correct academic radicals who hate America?

Witness: even if you think that the Probability of a Terrorist being a Muslim is 95.3%, the probability of a Muslim being a Terrorist is only 0.007%. That is less than the probability of a left-handed octogenarian Olympic discus-thrower being struck by lightning.

Congressman: or maybe even less than the probability that anyone is listening to you?

Witness: maybe this picture will help.

[image: terrorist muslims]

Congressman: I’m calling your state legislature right now to fire your radical butt.

POSTSCRIPT: response to commentator:

Mr. McKinney, perhaps your prejudices led you to mis-read the piece. 13,000 was how much larger one conditional probability was than another, which is helpful for understanding Bayes' Theorem but not for policy. The policy-relevant probability is that of a Muslim being a terrorist, which based on a Rand report was calculated here as 0.007 percent.

If you still don't get this, then why don't you also start targeting white males, since 80% of serial killers fit that description, and these serial killers kill about 100 people a year.

Regards, Bill Easterly
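For readers who want to check the witness's arithmetic, here is a minimal Python sketch of Bayes' rule using only the two figures quoted in the transcript (the 95.3% and the 1-in-13,000 ratio). The function and variable names are made up for illustration, and the inputs are the post's own numbers, not independently verified data.

```python
# Inputs are the figures quoted in the post; they are assumptions taken
# from the text, not independently checked data.
p_muslim_given_terrorist = 0.953   # P(B|A), the witness's "95.3%"
prior_ratio = 1 / 13_000           # P(A) / P(B), the witness's "1 over 13,000"


def bayes_posterior(likelihood, ratio_of_priors):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * ratio_of_priors


p_terrorist_given_muslim = bayes_posterior(p_muslim_given_terrorist, prior_ratio)

# Print as a percentage so it can be compared with the figures in the post.
print(f"P(terrorist | Muslim) = {p_terrorist_given_muslim * 100:.4f} percent")
```

The result comes out at roughly 0.007 percent, which is the figure cited in the postscript above.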

POSTSCRIPT 2 3:30PM

To the Congressman and Mr. McKinney (again):

One other probability you may want to consider is that Al-Qaeda's recruiting will become more successful by a δ >= 0.007 percent after you have persecuted the 99.993 percent of Muslims who are innocent.

Solving the education puzzle? Test scores and growth

There has long been a mystery in why the rapid growth of education in poor countries did not pay off in growth of production per worker, above all in Africa (best captured by a classic paper by Lant Pritchett, Where has all the education gone?, ungated here). Eric Hanushek at Stanford has been working for the past several years on test scores as a possible resolution of the puzzle. If education doesn't translate into higher test scores, then there is something else wrong along the way, which likely includes well-known problems like absent teachers and missing textbooks. He showed this picture in a 2008 paper, and he has a stream of papers since, all with coauthor Ludger Woessmann.

Growth is growth of income per person 1960-2000. Both growth and test scores are measured "conditionally," that is, how well a country does relative to its initial educational enrollment and income in 1960.

Of course, test scores are a potentially sensitive subject, as some will think they are tests of intrinsic intelligence. Is this whole area of research racist?

Not necessarily, of course. Let's take racist stories of differing intelligence between nations off the table, and consider all the other factors that could be reflected in such widely varying test scores relative to educational enrollment and income.

Hey, fellow committee member, are you the weakest link?

UPDATE: 12:18 PM SEE END OF POST

I was just on a committee that selected a small number of papers from a large number of submissions for a conference.  We each graded each paper and then we had to come up with a rule to go from our individual grades to a ranking of the papers to decide which ones got into the conference. So here are some possible rules:

(1) one veto kills the paper

So the overall grade for the paper equals the minimum of all of our grades, so if even just one of us flunks the paper, the paper flunks. You need to satisfy all of us. In econ lingo, you can't SUBSTITUTE one of us with a positive opinion for another one of us with a negative opinion.

ANALOGY: the "weakest link" production function, in which whatever input the economy has least constrains the whole output. Note that zero substitution means that all inputs/committee members are perfect complements. This is the world view of those who like Big Pushes to increase all the development  inputs at once.

(2) simple average

Averaging our grades goes to the other extreme of perfect SUBSTITUTION between us. One of us with a positive opinion cancels out (i.e. substitutes for) another one of us with a negative opinion. We committee members are not complements at all: the value of my grade is not influenced by your grade.

ANALOGY: the old Human Development Index.

Also in production functions relating Development to inputs, this rule  implies extreme flexibility. Rich economies feature this selectively to compensate for weakest links -- if the whole system is going to fail because of one input, then have a backup input that is a perfect substitute.

(3) geometric averages

This exotic animal (cube root of the 3 grades multiplied together) is in between (1) and (2). You can partially but not completely substitute for one of us with another one of us. So for example if we were just grading A,B,C (numerically 3,2,1), then a paper with the grades (2,2,2) has a higher geometric average (2) than a paper with the grades (3,1,2) (about 1.82), although both have the same simple average of 2 under rule (2). We are also partial complements -- the higher is your grade, the stronger is the effect of my grade.
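The three rules are easy to compare side by side. Here is a minimal Python sketch, assuming three committee members and the A, B, C = 3, 2, 1 scale from the example above; the function names and the paper labels are just illustrative.

```python
from math import prod
from statistics import mean


def weakest_link(grades):
    """Rule (1): one veto kills the paper, so the score is the lowest grade."""
    return min(grades)


def simple_average(grades):
    """Rule (2): perfect substitution, so the score is the arithmetic mean."""
    return mean(grades)


def geometric_average(grades):
    """Rule (3): n-th root of the product of the n grades."""
    return prod(grades) ** (1 / len(grades))


# The two papers from the example: same simple average, different spread.
papers = {"even": (2, 2, 2), "split": (3, 1, 2)}

scores = {
    name: {
        "weakest_link": weakest_link(grades),
        "simple": simple_average(grades),
        "geometric": geometric_average(grades),
    }
    for name, grades in papers.items()
}

for name, result in scores.items():
    print(name, {rule: round(value, 2) for rule, value in result.items()})
```

The two papers tie under the simple average, the "split" paper loses badly under the weakest-link rule, and the geometric average lands in between, penalizing the low grade without letting it act as a veto.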

ANALOGY: the new Human Development Index, which an Aid Watch post criticized for TOO MUCH complementarity. The higher was committee member Per Capita Income, the stronger was the effect of another committee member Life Expectancy, which has the unappealing property that we value lives of rich people far more than those of poor people. Makes more sense for production functions than for HDI.

The ending of the actual committee story-- qualitative discussions were necessary for choosing the final papers in the end after constructing the mechanical indexes. Let me see what is the analogy here...

UPDATE: thanks to both of you for reading this wonky post all the way to the end. Do you think I have atoned for that Swimsuit Edition post now? and even the followup Swimsuit Edition post also?

Dear UK Government, why won't you let me retire as Official Sachs Critic?

UPDATE 3: FEB 8 4:50PM: Twitter War reveals that Millennium Village Blog accused Clemens and Demombynes of hard hearts towards suffering (search the blog for "suffering").

UPDATE 2: FEB 8 4:30PM: concluding coverage by @PSIHealthyLives @viewfromthecave of the Great Twitter War prompted by this post between @aidwatch and @earthinstitute, with collateral attacks on @m_clem, ending in a non-acceptance of debate by head of @earthinstitute.

UPDATE Feb 8 11:30am: sent comment with link to this post to DFID Independent Commission for Aid Impact. Spokesperson promptly responded, rejected my comment for consideration on technical grounds, but did warmly invite me to complete the anonymous online mass Survey Monkey. It probably doesn't mean much, unless the Independent Commission already learned the brilliant strategy of bureaucratizing the critics? I do feel a wee bit sorry for one of your Commissioners, the great John Githongo, who presumably did not risk his life so he could be reading anonymous results from Survey Monkey.

Nobody is more tired of the interminable Sachs-Easterly debate than one guy named Easterly...alas, I seem to be stuck in a kind of Critic Trap, in which the ideas criticized keep reappearing unchanged, requiring equally unchanged criticisms, keeping me in chronic peril of taking myself way too seriously.

So it was with great weariness I heard the news that the British aid agency DFID (otherwise probably the best bilateral aid agency) is close to financing a brand new Millennium Village in northern Ghana, near Bolgatanga. I had hoped for something better from the new UK government, which had seemed like an improvement over the Blair and Brown ("We know the answers, just double aid") team.

As it happened, I passed by the proposed MVP site last summer. The proposed villages are right on the main road in one of the most NGO-intensive places anywhere (see the sign below, in which NGOs apparently own the region).

The usual critique that selection bias of the Millennium Villages makes evaluation  impossible may be somewhat relevant given the political realities that (1) the current government chose the villages for the MVP, (2) the incumbents have frequently promised to do more for the North, (3) the MVP came along and may be a high visibility way to keep that promise, and (4) ergo, the government will likely do everything possible to make the project succeed, showing nothing about scalability for thousands of villages elsewhere. In short, this new MV may be about as informative as my feeding my own children is informative on whether child nutrition programs work.

And how good is the track record of the MVP taking evaluation seriously? Michael Clemens and Gabriel Demombynes posted the following on the World Bank Africa blog last Friday:

In a June 2010 report called Harvests of Development, the Project claimed that the impacts of the project included expanded cell phone ownership.  For example, the MVP claimed that increases in cell phone ownership at the Ghana site were caused by the project, in this extract from page 91 of the MVP report:

This claim has little basis, because cell phone ownership has been expanding at about the same rate all around the MVP site in areas untouched by the project. ....

But on Tuesday, months after multiple discussions we’ve had with MVP leaders on our research, a post on the MVP’s blog restated the claim that the increase in mobile phone ownership at the intervention sites was caused by the Project...

{The Clemens and Demombynes paper does the same cell phone analysis with the same results in the MV of Sauri, Kenya.}

They were responding to a blog post on the MVP web site on February 2, 2011 as follows:

Sauri looks back on five years of success

Infrastructure: ... The proportion of households owning a mobile phone has increased four-fold....

In short, independent observers made an irrefutable argument that a claim was invalid, the MVP heard the argument, seemed to accept it, and then repeated the previous claim unchanged.

Or in other words, if nobody is listening to any evaluations anyway, if I am bored and I am boring everyone else, why should I want to be Official Sachs Critic any longer?

Messrs. Clemens and Demombynes, you may want to check out a new job opening...

Development in 3 Sentences

I liked this formulation from the blog The Coming Prosperity, posted today as a link on Twitter:

If solutions are known, need $$. If solutions are knowable, need evaluations. If solutions are evolving, need entrepreneurs.

Consumer Warnings: This comes at the end of a long diatribe against You-Know-Who (associated with $$). I'm not sure the author is a reliable guide to other people's work, since Yours Truly is incorrectly associated with "evaluations." But I still like the 3 sentences above.

Please help us praise Millennium Villages...

UPDATE 4: 3rd nomination for positive. Day 3 of silence from MVP

UPDATE 3: another nomination for positive evaluation (Michael Clemens paper), another energetic disavowal by the author (see comments below).

UPDATE 2: oops, author of only nomination so far says it's not so positive-- see comments

UPDATE: received first nomination of positive review

On Twitter, @bill_easterly noted yesterday's Aid Watch post :

On Millennium Villages: this is not my own predictable response, this is independent guest post

Which immediately got the reply on Twitter:

intentional irony? your guest posts are as "independent" as any MV self-assessment

Aid Watch will let its guest posters defend their own independence, but in the meantime let's find another guest poster that will pass our critic's most stringent independence test. In short...

...could somebody please send us a strongly positive evaluation of the Millennium Villages.

Our critic rightly notes that self-assessment is not what anybody is looking for, so  the only restriction is that the evaluators of course must not be part of the MV program themselves, i.e. must be independent.

This is not satire. Aid Watch would be very happy to hear from those evaluators of the MVs who have the strongest possible positive portrayal of the results of the MV intervention. We will post summaries of these evaluations without comment on Aid Watch.

UPDATE: received first nomination of a positive review of MVs: an article in Vanity Fair.

What’s it like to live in a Millennium Village?

In Mayange, a cluster of villages about an hour’s drive south of Kigali, Rwanda, interventions by the Millennium Village Project across all sectors (in seeds, fertilizer, malaria nets, health clinics, vaccines, ambulances, water sources, classrooms, computers, cell towers, microloans and lots more) aim to lift villagers out of poverty within five to ten years. What do we know about the effects of such ambitious projects on the lives of the people living in these impoverished, rural communities? A qualitative study by Elisabeth King, a post-doctoral fellow with Columbia’s Earth Institute, produces a fascinating if limited* “snapshot” of social impacts of the Millennium Village Project in Mayange. A few observations:

The villagers King talked to were reluctant to bare all to a foreigner asking questions about delicate social topics. Her questions about quality of life, trust, and social exclusion elicited some contradictory and evasive answers: “Life in Mayange…In general it is not bad, it is not good, it’s in between.” “I know people well. But then, people are private and one only knows one’s own problems.” “There are no problems. But there are always some small problems between people though.”  King explained that in her previous research she found that Rwandans “downplayed negative aspects of social life and tended to embed negative reflections within positive pro-government ‘bookends.’”

MVP has good brand recognition and outreach; cooperatives sometimes increase cooperation. King found that the project was well-known among villagers, and almost all could name a change that had resulted from the project. Most were members of some kind of cooperative (farming, basket-weaving, bee-keeping) created by the project, and some described these groups as strengthening social bonds in the community or increasing women’s confidence by helping them provide income for their families.

Villagers thought that benefits from the project were unevenly distributed. In response to “Who gained the most from the project?” villagers answered most frequently “MV staff,” followed by “local leaders,” and villagers most willing to adopt new practices suggested by the project.

MVP may not be doing so well on the most basic thing – letting people say what THEY want. The most common suggestion was that the project should consult more with people in the community about what they want. One woman told King: “The MV has to meet with local community to learn more about what people really want because sometimes the MV brings things that the community doesn’t need or want. People may have good ideas.”

--

*King’s study is limited in several ways beyond lack of statistical significance (she spoke with 35 individuals and 8 focus groups in a population of 25,000 people). One, as a visiting Westerner asking questions about MVP, she can’t avoid being seen as tied with the project. Whether this makes her interviewees more timid in voicing complaints (for fear of losing some project benefit or subsidy), or more bold (in the hopes of gaining resources to address their troubles) is hard to say. Two, the Rwandan ban on talking about ethnic divisions prevents people from speaking candidly about this obvious issue in a place where resettled genocide survivors and released prisoners now live side by side. Three, King has no baseline data, so she can’t talk about changes in quality of life or social cohesion based on statements from before the project vs. during/after the project (see also: Clemens and Demombynes).

--

Thanks to Michael Clemens for the tweet that sent this study our way.

Read previous posts on the Millennium Villages here.

Census 2010: Voters more Republican, more Texan, Fatter

The exciting Census headlines:  Texas is the big winner in gaining Congressional seats, Texans vote Republican, Republicans win! Except -- the additional Texans are Hispanics, Hispanics vote Democratic, Democrats win! What a nice illustration of a serious problem in development empirics, known by the lusty, sensuous name of "heterogeneous effects."  If  you find handing out free bed nets lowers malaria, that still only applies ON AVERAGE to the group covered by the study. Within this group, the effects are likely heterogeneous behind the average positive effect, and there could be some sub-group for which the effect is zero.  This is analogous to the Texas effect on voting-- on average, being Texan makes you vote Republican, but this is an average of heterogeneous groups, some of whom -- like the burgeoning Hispanics -- vote Democratic.
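A toy calculation makes the point concrete. The sketch below is only an illustration: the two sub-groups, their population shares, and the effect sizes are hypothetical numbers invented for this example, not results from any bed net study.

```python
# Hypothetical sub-groups: (share of the study population,
# change in malaria cases per 100 people when nets are handed out).
# All numbers are invented for illustration.
subgroups = {
    "subgroup_a": (0.7, -10.0),  # nets reduce malaria in this group
    "subgroup_b": (0.3, 0.0),    # nets have no effect in this group
}


def average_effect(groups):
    """Population-weighted average of the sub-group effects."""
    return sum(share * effect for share, effect in groups.values())


overall = average_effect(subgroups)
print(f"average effect: {overall:.1f} cases per 100 people")

for name, (share, effect) in subgroups.items():
    print(f"{name}: share {share:.0%}, effect {effect:.1f}")
```

A study that reports only the average would show a healthy reduction in malaria, even though nearly a third of this made-up population gets no benefit at all.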

You could solve this problem by analyzing all the possible sub-groups. Unfortunately, both in politics and in development, this is unlimited, while research budgets and data are limited.

To illustrate imaginative sub-group possibilities, my own pathbreaking insight is that one reliable group of Republican voters is, well, how to be polite about this(!?), persons with somewhat larger belt sizes. Notice how many of the brownest, reddest states are Red States, while the Blue State strongholds are in the relatively thinner Northeast.

Also some sub-group effects could be spurious correlations. During my own struggles against middle-aged spread, I have not noticed any more inclination to vote Republican when my jeans size increases.

If this is all too methodological and obscure for you, then, congratulations, you are normal.   On the off chance that you are willing to work hard on this stuff, you can get many unexpected lessons. For example, if you want a roly-poly Santa for the office party, ask a Republican.

Human Development Index debate…you still want more?

I suspect that we long ago exhausted the patience of our readers with our multiple rounds of debate on the Human Development Report's new methodology for its Human Development Index. At the same time, I feel an obligation to let the other side of the debate have their say as much as they want. So here is UNDP's new response to Martin Ravallion's response to UNDP's previous response to our original blog criticizing the new Human Development Index as, well, crazy.

Millennium Villages: Moving the goalposts

Here on the blog, we’ve been following the progress of the Millennium Villages Project, a joint effort from the UN and Columbia’s Earth Institute that has introduced a package of development interventions in health, education, agriculture and infrastructure into 14 “clusters” of villages throughout 10 African countries. In response to a critical paper by Michael Clemens and Gabriel Demombynes, the MVP architects published a statement last week that they said would “clarify” some “basic misunderstandings” about the project. This statement caught our attention because—I would argue—what it is actually doing is seeking to reframe the debate about the project, and redefining project success in different, less ambitious terms.

“The primary aim” of the project, the MVP architects write, “is to achieve the Millennium Development Goals in the Project sites, as a contribution to the broader fulfillment of the MDGs” (Evaluating the Millennium Villages: A response to Clemens and Demombynes, October 2010, emphasis in the original). Also important, they say, is to clarify what the MVP is not: “The MVP is not testing a rigid protocol for implementing MDG-based outcomes…The MVP is not claiming or aiming to provide a unique or “optimal” model for achieving the MDGs.”

This sounds fine unless you’ve read the many other MVP project reports and documents that clearly outline other, different, major goals and indicators of success.

For example:

So, in this context, what’s even more revealing about this new statement is what it does NOT say. It does not mention that the improvements to the villages will be self-sustaining, or even moving towards self-sustainability by 2015, although that notion was at one point advertised as a “central proposition underpinning the Millennium Villages concept” (MVP FAQ, late 2006). In this case, the clarification seems more like a retrenchment, a moving away from the ambitious claims made at the project’s optimistic outset.

The new MVP definition also backs away from talking about interventions “undertaken as a single integrated project” that will serve as “proof of concept that the poverty trap can be overcome” (as stated in the PNAS paper cited above). In fact the impact of the project as an integrated whole can’t be demonstrated, the MVP authors argue, because some of the same improvements at work in the Millennium Villages (insecticide-treated bednets, subsidized fertilizer and seeds, for example) are also present in many of the surrounding villages.

Before, the project was defined in its own materials as a research experiment (a “proof of concept” carried out first in “research villages”) to prove that a package of development interventions delivered in a particular way can help lift the very poorest people living in rural Africa out of poverty forever. In today’s new formulation, the MVP is a means to show that by spending an amount roughly equal to 100 percent of the village’s per capita income on already “proven” interventions, for a period of 10 years, it can allow that village, for at least one moment in time in 2015, to step across the finish line demarcated by the Millennium Development Goals.

If the project continues to define success in these narrower terms, it will effectively shift the focus away from any obligation to show that the positive things achieved in the Millennium Villages are self-sustaining beyond the 10-year life of the project, or to prove that they are actually a result of the project itself.

POSTSCRIPT:

Screen shot of the top of World Bank's Africa Can...End Poverty Blog last Friday:

That notice was removed; here’s what the same blog tells us, at the bottom of the post, today:

UPDATE: Another view from Chris Blattman.

World according to Blattman

Honoring (or rather, stealing from) Chris Blattman's great blog, I am reproducing some of his recent posts because they have been unusually fun & good and because I'm just too lazy to write my own blog today. Favorite distorted maps of Africa:

Favorite wordle on which countries are mentioned in Journal of Development Economics shown below.

I'm fascinated by this. One idea that I am investigating in my own research is that success stories are over-sampled, a brilliant thesis for which there is a spectacular lack of confirmation in this wordle (3 out of the Gang of 4 are MIA, what's going on?). One paper already written shows a very strong association between per capita income and being studied by economists, which confirms another favorite personal hypothesis -- the poor get the worst of everything, including the worst economics.

Millennium Villages: don't work, don't know or don't care?

UPDATE 10/16 12:25PM:  Tim Harford in FT also covers Clemens and Demombynes paper and gets response from Sachs.

In a new paper, Michael Clemens and Gabriel Demombynes ask:

When is the rigorous impact evaluation of development projects a luxury, and when a necessity?

The authors study the case of the Millennium Villages, a large, high-profile, project originally meant to demonstrate that a package of technology-based interventions in education, health and agriculture could lastingly propel people living in the poorest African villages out of poverty within five (now ten) years.

One way Clemens and Demombynes get at their central question is to examine how the Millennium Villages are (so far) being evaluated, and ask whether a more rigorous method of evaluation would be 1) feasible and 2) likely to yield very different results. They answer 1) yes and 2) yes.

They start by looking at the findings of a Millennium Villages midpoint report released last summer, which shows movement on indicators (higher crop yield, more cell phone usage, fewer cases of malaria, etc.) against a baseline of data collected in those same villages three years prior. In the graph of cell phone ownership in Kenya below, this progress is charted by the black line.

Clemens and Demombynes then put this data in the context of how non-Millennium villages in the same country and region are faring on these same indicators, using publicly available data from national surveys. These are the red, blue, and green lines in the figure below.

What is going on in non-Millennium Villages in Kenya to drive up the number of cell phone users? Conventional wisdom is that it’s driven in very small part by outside aid and in large part by entrepreneurs, small and large. (There are a whole series of these graphs included in the paper; many show more improvement in MVs than in comparators, while a few show worse performance in MVs.)

The paper goes on to describe the weaknesses in the MVP’s published plans for future evaluations, which do involve comparison villages, and suggests how future waves of MVP interventions could be more rigorously evaluated without spending a lot more. (Summary here.)

The MVP team responded to the critique, saying that it “misunderstands the MVP’s aims and evaluation methods.” They shift away from portraying MVP as a demonstration project: the “primary aim” is “to achieve the Millennium Development Goals in the Project sites.”

Top 25 rankings of all time

Today's topic was spurred by some rather unusual college rankings by the Wall Street Journal, in which Texas Tech has a higher rank than Harvard. This has been among the most popular articles on the Online Journal for 3 straight days now. Of course, also very popular are the US News and World Report College Rankings (which give the rather opposite results in which Harvard does slightly better relative to Texas Tech.) We all love rankings.

Talking about ranking methodology, not so much.

(Turns out WSJ rankings were biased towards larger colleges. Texas Tech had undergraduate enrollment of over 22 thousand, compared to less than 7 thousand at Harvard.)

In development, we have all kinds of ranked beauty contests: the Human Development Index (HDI), Doing Business, Governance Indicators, Corruption, etc. etc. I've even been participating in one ranking exercise myself, on, of course, aid agencies.

Just like wacky college rankings, many development rankings are based on methodologies that could use a lot more scrutiny than they usually get. Take the much-loved uber-publicized Human Development Index (HDI). The HDI has a hidden implication pointed out by the World Bank's Martin Ravallion as long ago as 1997. Imagine the reaction to this hidden implication if they publicly announced it:

The UNDP announced today that they consider a human life in the USA to be 70 times more valuable than human life in the Democratic Republic of the Congo.

The other consequence of this hidden implication is that rich countries will get A LOT of credit for higher life expectancy (and not much for income). Scandinavia does very well on the HDI because its life expectancy is higher by 1 or 2 years than the US, even though US income is higher. Bryan Caplan suggests:

Scandinavia comes out on top according to the HDI because the HDI is basically a measure of how Scandinavian your country is.

There's an alternative to constructing murky indexes with dubious assumptions: use people's actual choices to infer which places are better. If a college recruiter (the basis of the WSJ rankings) had to choose between otherwise equivalent Texas Tech and Harvard graduates, which one would they pick? Instead of the convoluted US News rankings, why not just ask a student admitted to both Texas Tech and Harvard which one they would pick (or DID pick)? These are called "revealed preference rankings" and have actually been done for colleges.
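To make the idea concrete, here is a minimal sketch of a revealed-preference ranking. It is only an illustration, not the method used in any published college ranking: the function name, the college names, and the choice data are all hypothetical, and the score is a plain share of head-to-head decisions won. A real study would need a statistical model that accounts for who actually faced which choice.

```python
from collections import defaultdict


def revealed_preference_ranking(choices):
    """Rank options by the share of head-to-head decisions they win.

    `choices` is a list of (chosen, rejected) pairs, one per person who
    could have picked either option and picked `chosen`.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for chosen, rejected in choices:
        wins[chosen] += 1
        appearances[chosen] += 1
        appearances[rejected] += 1
    # Share of decisions won, among the decisions each option took part in.
    win_share = {name: wins[name] / appearances[name] for name in appearances}
    # Highest share first; ties are broken alphabetically.
    return sorted(win_share.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical, made-up decisions by students admitted to both colleges.
toy_choices = [
    ("College A", "College B"),
    ("College A", "College B"),
    ("College B", "College A"),
    ("College A", "College C"),
    ("College B", "College C"),
]

ranking = revealed_preference_ranking(toy_choices)
for name, share in ranking:
    print(f"{name}: won {share:.0%} of its head-to-head choices")
```

Note that nothing here depends on enrollment size or on an index designer's weights: the ranking comes only from what people chose when they had both options.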

Instead of Human Development Indexes, why not just ask migrants where they would choose to live (or do actually choose to live): US or Iceland? Iceland has a much higher HDI, but actually has out-migration, whereas I think there are a few people trying pretty hard to get into the US.

Instead of Doing Business index-based rankings, why not use businesses' actual choices of where to invest? Rankings based on real choices also have the added attraction of not being subject to artificial manipulation (which happens with both college index rankings and Doing Business).

Indexes should not be discarded altogether. Sometimes there are no choices to guide rankings (like aid agencies, sadly). They can also summarize information for those making the choices.

But please double check your methodology.


Beautiful fractals and ugly inequality

UPDATE 4pm: is there any point to this post? see end of text

UPDATE II: 4:30pm Critic cuts me some slack. see end of text

UPDATE III 11am, 9/10/10: Paul Krugman says he had the idea first (see end of text)

In our ceaseless search for trendy themes, let's consider today the beauty of fractals. The picture below shows one fascinating kind of fractal called a "Koch snowflake." Fractals have the same amount of "jaggedness" or "unevenness" at every scale. Income inequality behaves like a fractal: income is very uneven at large scales and at small scales.

Here's a mapping exercise that illustrates this, with a tastefully chosen color scheme that is consistent across all maps (rich is red or brown-red, poor is pale yellow, in between is orange). We are going to go from global to the US to the New York City metro area to the neighborhood of NYU in Manhattan. At each scale, there is a remarkably high level of inequality across space. The rich coastal cities in the US and the poor rural South. Rich lower and midtown Manhattan and poor South Bronx. Rich West Village and Soho and poor Lower East Side.

Inequality is one of the hardest policy problems, so more later. A simpler insight of economics is that the most obvious answer to inequality is exactly wrong -- complete redistribution (i.e. a 100 percent tax on everyone above average to go to everyone below average) would destroy incentives for wealth creation and make everyone worse off.

UPDATE: a commenter on Facebook asked me what the implications are. Reminds me of the World Bank research managers who, if you told them it was raining, would ask "but what are the policy implications?"  Finally a chance for revenge through my favorite Mark Twain quote (from the preface to Huckleberry Finn):

Persons attempting to find a motive in this narrative will be prosecuted; persons attempting to find a moral in it will be banished; persons attempting to find a plot in it will be shot. By Order of the Author.

Please be content every now and then to just contemplate how the world is, which is kinda necessary before you immediately try to fix it.

UPDATE II: my Facebook critic and also @viewfromthecave grant permission for me to meditate on this for a while, and just try to convince you IT'S INTERESTING; note that fractal genius Mandelbrot thought cotton prices were interesting.  (If you do want my Comprehensive Solution, see here.)

UPDATE III: Paul Krugman noted this post and self-deprecatingly pointed out that he had the idea of the fractal nature of inequality first. No problem, Professor Krugman, you can have it. As a long-time fan of your theoretical research, could I request that you take some time out from NYT to come up with a nice theory of why inequality behaves fractally?


Is Impact Measurement a Dead End?

This post was written by Alanna Shaikh. Alanna is a global health professional who blogs at UN Dispatch and Blood and Milk.

We’ve spent the last few years watching the best donors and NGOs get more and more committed to the idea of measurable impacts. At first, the trend seemed unimpeachable. International donors have spent far too much money with far too few results. Focusing more on impact seemed like the way out of that trap.

But is it? The last couple of weeks have seen a spate of arguments from development thinkers rethinking this premise.

Steve Lawry at the Hauser Center makes two main points against excessive focus on impact evaluation. The first is that it stifles innovation by keeping NGOs from trying risky new things. But I think the real problem there is an institutional culture that doesn’t allow for failure. Allowing NGOs to fail, and to learn from failure, encourages innovation.

His second point is more interesting: “Many real-world problems are not easily described with the kind of precision that professional mathematicians insist upon. This is due to the limitations of data, the costs of collecting and analyzing data, and the inherent difficulties of giving mathematical expression to the complexity of human behavior.” This strikes me as very true. At what point are we expecting too much from our impact assessments?

In the same vein, the fascinating Wanderlust blog just ran a post about Cynefin. Cynefin is a framework for understanding systems. It categorizes systems into four subsets: Simple, Complicated, Complex and Chaotic. Chaotic systems, the author argues, can’t be evaluated for impact using standard measures. He states that “In a Chaotic paradigm, there is relatively little difference likely to occur in quality between a response that is based on three weeks’ worth of detailed analysis and one that is based on the gut reaction of a team leader…”

The Center for Global Development just published a paper by former USAID administrator Andrew Natsios. Natsios points out that USAID has begun to favor health programs over democracy strengthening or governance programs because health programs can be more easily measured for impact. Rule of law efforts, on the other hand, are vital to development but hard to measure and therefore get less funding.

Now we come to the hard questions:

If we limit all of our development projects to those that have easy metrics for success, we lose a lot of programs, many of which support important things like rule of law. Of course, if they don't have useful metrics, how do we know those programs are supporting the important goals?

And how meaningful is impact evaluation anyway when you consider the short time frames we’re working with? Most development programs take ten years or more to show real impact. How are we supposed to bring that in line with government funding cycles?

On the other hand, we don't have a lot of alternatives to impact evaluation. Impact is not unimportant just because it’s hard to quantify at times. We can’t wish that away. Plenty of beautifully designed and carefully implemented projects turned out not to have any effect at all. For example, consider what we’ve learned from microfinance impact evaluations. Microloans have a positive effect but not the one we expected.

It’s a standard trope of this blog to point out that there’s no panacea in global development. That’s true of impact evaluation, too. It’s a tool for identifying worthwhile development efforts, but it is not the only tool.  We can’t go back to assuming that good intentions lead to good results, but there must be room for judgment and experience alongside the quantifiable data.

--

UPDATE: This post was edited to correct an attribution error in the third paragraph - Eds.


The World Bank’s “horizontal” approach to health falls horizontal?

The history of foreign aid for global health has seen a cycling back and forth between two alternative approaches. The “vertical” approach focuses on fighting one disease at a time, and in Africa has been very effective in targeting smallpox, Guinea worm, measles, and river blindness, to name a few examples. After large initial successes though, diminishing returns to vertical programs set in. The “horizontal” approach instead invests sector-wide to make health systems work to administer prevention and treatment for all diseases. (For more on the history and pros and cons of these approaches, see Can the West Save Africa, pp 57-60). Since the late 1990s, the Bank and other donors have shifted resources to back the idea that “it’s the health system, stupid.” (According to the Institute for Health Metrics and Evaluation, health sector support shot up from $2 million in 1998 to $937 million in 2007, and surpassed specific funding for TB and malaria for the first time in 2006.)

Strangely enough, whether this resource shift has actually improved health has never really been tested.

[Image: Aid Without Impact ACTION report]

A new report funded by the Bill and Melinda Gates Foundation found that sector-wide approaches (aka SWAps; the development industry never misses the chance to make a silly acronym) “are not yet being implemented in a way that has led to improvements in health outcomes in effective, efficient, measurable, or sustainable ways.” In other words… SWAps don’t work.

Written by Richard Skolnik, Paul Jensen and Robert Johnson of ACTION (Advocacy to Control TB Internationally), the report looks especially at whether the Bank’s sector-wide programs are associated with success in TB detection and treatment, and concludes with a number of alarming or surprising findings. (We don’t know if the authors have a predisposition towards the vertical approach given their affiliation with advocacy on one disease, but they do seem to ask the right questions.)

First, the authors find little evidence of the impact of SWAps on health outcomes, and what little there is, is mixed at best. The World Bank’s own evaluation picks up on a “general lack of attention to results,” “insufficient attention to ensuring that SWAps are technically sound,” “a general failure to monitor country expenditures,” and “very weak monitoring and evaluation of the health programs that SWAps are supporting.” In the history of SWAps, there has been only one rigorous, independent evaluation, in Tanzania.

Second, only three of the 15 Bank SWAp projects in sub-Saharan Africa from 2001-2008 even included indicators for detection of TB cases and successful treatment of TB. And in only one country (Tanzania) “might” a SWAp be linked to an actual health outcome: higher rates of TB treatment success.

Third, the aid workers and health experts interviewed for the evaluation said that SWAps focus on the process of coordinating aid delivery, which has become an end in itself, obscuring the need to actually increase successful treatment and decrease deaths. NONE of them questioned the need to work through SWAps BUT they almost all agreed there is “little evidence” that SWAps are associated with improved health outcomes.

This suggests to us that the question is not only how to choose the right mix of horizontal and vertical, but whether ANY approach will work unless it has feedback and accountability. Is this why SWAps were a good idea in theory but a disaster in practice?

What to do? The authors have some suggestions, which are a little hard to believe aren’t already being done as a matter of course: create incentives to focus on results rather than process, drastically increase transparency of project information and evaluation, and do independent program evaluation.

Come to think of it, the donors’ behavior reminds us of Aid Watch’s analogy from Monday. Here, the Bank sends truckloads of money down the same SWAps road, ignoring increasingly obvious and urgent signs that the Bank should change course. But still it hurtles along, unfazed by even its own evaluators shouting from the side of the road that what it’s doing isn’t working.
