Following up on a previous post on data mining, let’s examine one recent book as a candidate for testing whether data mining could be a problem. Here are the top 10 reasons I chose this book:
10. Oodles of regressions were run
The author, each morning:
wondering whether, during the previous evening, Pedro, or Anke, or Dominic, or Lisa, or Benedikt, or Marguerite has cracked whatever problem we had crashed into by the time I left for home.
9. Oodles of control variables were tried
...range of possible causes drawn from across the social sciences. In addition to various characteristics of the economy, these include aspects of the country’s history, its geography, its social composition, and its polity.
8. Weird conclusions about war
mountains are dangerous...
7. Sample was sliced up to get results
Globally we find no effect of ethnic polarization. But in Africa ethnic polarization sharply increases risk.
6. Very flexible specifications to get results
If coup risk is high, military spending reduces risk…if coup risk is low, military spending increases risk...
5. Previous results with same methodology didn’t pass the new data test
Our previous results got overturned by the new data.
4. Reverse causality makes every interpretation questionable
I’ll let Nathan Fiala handle this one.
3. Overconfidence in such statistical research as definitive
The ideas in this book are all founded on statistical research.
2. Author previously announced he was data mining:
Table 1 presents the preferred reference model of conflict duration with eight variations. The reference model is reached after a series of iterations in which insignificant variables are deleted and variants of economic, social, geographic and historical explanatory variables are then tested in turn.
1. A lot depends on the results
The book often won’t let Africans vote, but it will let them experience Western military intervention.
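For readers who want to see why reasons 10 and 2 matter, here is a rough sketch of what can happen when oodles of candidate variables are screened for significance. It is not the book's procedure or data: every variable below is random noise, the one-variable regressions stand in for a much richer specification search, and the names `t_statistic` and `count_spurious_hits`, the sample sizes, and the 1.96 cutoff are all my own illustrative assumptions.

```python
import math
import random


def t_statistic(x, y):
    """t-statistic for the slope in a one-variable regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)  # sample correlation
    return r * math.sqrt((n - 2) / (1 - r * r))


def count_spurious_hits(n_obs=100, n_candidates=200, threshold=1.96, seed=0):
    """Screen pure-noise regressors against a pure-noise outcome.

    Returns how many candidates clear the (hypothetical) significance
    threshold even though none of them has any real relationship to y.
    """
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_candidates):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        if abs(t_statistic(x, y)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_spurious_hits()
    print(f"{hits} of 200 unrelated variables look 'significant'")
```

With a cutoff near the conventional 5 percent level, roughly one unrelated variable in twenty should clear the bar by chance alone, so a search that keeps only the survivors can end up with a "preferred model" built from noise.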