Following up on a previous post on data mining, let’s examine one recent book as a possible candidate for testing whether data mining could be a problem. Here are the top 10 reasons I chose this book:

10. Oodles of regressions were run

Author each morning is “wondering whether, during the previous evening, Pedro, or Anke, or Dominic, or Lisa, or Benedikt, or Marguerite has cracked whatever problem we had crashed into by the time I left for home.”

9. Oodles of control variables were tried

...range of possible causes drawn from across the social sciences. In addition to various characteristics of the economy, these include aspects of the country’s history, its geography, its social composition, and its polity.

8. Weird conclusions about war

mountains are dangerous...

7. Sample was sliced up to get results

Globally we find no effect of ethnic polarization. But in Africa ethnic polarization sharply increases risk.

6. Very flexible specifications to get results

If coup risk is high, military spending reduces risk…if coup risk is low, military spending increases risk...

5. Previous results with same methodology didn’t pass the new data test

Our previous results got overturned by the new data.

4. Reverse causality makes every interpretation questionable

I’ll let Nathan Fiala handle this one.

3. Overconfidence in such statistical research as definitive

The ideas in this book are all founded on statistical research.

2. Author previously announced he was data mining

Table 1 presents the preferred reference model of conflict duration with eight variations. The reference model is reached after a series of iterations in which insignificant variables are deleted and variants of economic, social, geographic and historical explanatory variables are then tested in turn.

1. A lot depends on the results

The book often won’t let Africans vote, but it will let them experience Western military intervention.
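Why do reasons 10, 9, and 2 matter so much? A minimal sketch, under assumptions that are mine and not the book's: the outcome and every candidate variable are pure noise, each candidate is tried in its own one-variable regression (a simplification of the delete-and-retest procedure quoted in reason 2), and "significant" means an absolute t-statistic above roughly 2.01, the approximate 5% two-sided cutoff with 48 degrees of freedom. The function names, sample size, and number of candidates are all hypothetical.

```python
import math
import random


def slope_t_stat(x, y):
    """t-statistic for the slope in a one-variable OLS regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    # For a single regressor, the slope t-stat is a function of the correlation r.
    return r * math.sqrt((n - 2) / (1 - r * r))


def count_spurious_hits(n_obs, n_candidates, rng, t_crit=2.01):
    """Try each noise candidate in turn against a noise outcome; count 'significant' ones."""
    y = [rng.gauss(0, 1) for _ in range(n_obs)]  # outcome: pure noise by assumption
    hits = 0
    for _ in range(n_candidates):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]  # candidate: unrelated noise
        if abs(slope_t_stat(x, y)) > t_crit:
            hits += 1
    return hits


def share_of_searches_with_a_hit(n_trials=200, n_obs=50, n_candidates=40, seed=0):
    """Fraction of simulated specification searches that find at least one 'result'."""
    rng = random.Random(seed)
    with_hit = sum(
        1 for _ in range(n_trials)
        if count_spurious_hits(n_obs, n_candidates, rng) > 0
    )
    return with_hit / n_trials


if __name__ == "__main__":
    print(share_of_searches_with_a_hit())
```

With 40 unrelated candidates and a 5% cutoff, arithmetic alone suggests that about 1 - 0.95^40, or roughly 87%, of such searches would turn up at least one "significant" variable. That is why the number of regressions run matters when judging the results that survive.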

Top 10 reasons to test “War, Guns, and Votes” for data mining