Last week I highlighted a new article by Arceneaux, Gerber, and Green suggesting that matching methods have difficulty replicating the experimentally estimated causal effect of a phone-based voter mobilization effort, even given a relatively rich set of covariates and a large control pool from which to draw matches. Matching methods have been touted as producing experiment-like estimates from observational data, so this result is kind of disheartening. How might advocates of matching methods respond to this claim?
Let's assume that the results in the paper hold up to further scrutiny (someone should - and I have no doubt will - put this data through the wringer, although hopefully it won't suffer the fate of the NSW dataset). Why should turnout be problematic? Explaining voter turnout has presented quandaries and paradoxes in other branches of political science, so it is hardly surprising that it mucks up the works here. Turnout has been called "the paradox that ate rational choice," due to the great difficulty in finding a plausible model that can justify turnout on instrumental terms. To my mind, the most reasonable (and least interesting) rational choice models of turnout resort to the psychic benefits of voting or "civic duty" - the infamous "D" term - to account for the fairly solid empirical generalization that some people do, in fact, vote. What, exactly, the "D" term represents is something of a mystery, but it seems reasonable that people who feel a duty to go to the polls are also more likely to listen to a phone call urging them to vote, even conditional on things like age, gender, and voting behavior in the previous two elections.
The authors are somewhat pessimistic about the possibility of detecting such problems when researchers do not have an experimental estimate against which to benchmark their results (and, hence, when matching or some other technique is actually needed). They ask, "How does one know whether matched observations are balanced in terms of the unobserved causes of the dependent variable?" That is indeed the question, but I think that they may be a little too skeptical about the ability to ferret out such problems, especially in this particular context. If the matched data is truly balanced on both the observed and unobserved covariates, then there should be no difference in the expected value of some auxiliary variable (excluded from the matching process) that was observed before the treatment was applied, unless we want to start thinking in terms of reverse temporal causation. The authors could have dropped, say, turnout in 2000 from their matching procedure, matched on the other covariates, and then checked for a difference in 2000 turnout between the 2002 treatment and control groups. My guess is that they would find a pretty big difference. Of course, since these matches are not the same as those used in the analysis, any problems that result could be "fixed" by the inclusion of 2000 voter turnout in the matching procedure, but that is putting a lot of weight on one variable.
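To make the check concrete, here is a rough sketch on simulated data (not the Arceneaux, Gerber, and Green data, which I have not touched here). Everything in it is an assumption for illustration: the hypothetical functions `simulate` and `placebo_difference`, the logistic forms, the coefficients, and the use of exact matching on coarse strata rather than whatever matching estimator the authors used. An unobserved "duty" variable drives both past turnout and receipt of the treatment; we match on age and 1998 turnout, hold out 2000 turnout, and compare it across the matched groups.

```python
import numpy as np


def simulate(n, duty_effect, seed=0):
    """Simulated voters. `duty` is never given to the matching step."""
    rng = np.random.default_rng(seed)
    duty = rng.normal(size=n)                     # unobserved sense of civic duty
    age_bin = rng.integers(0, 4, size=n)          # observed, coarse age group
    # Past turnout depends on age and on duty (assumed logistic form).
    p_vote = 1 / (1 + np.exp(-(0.3 * age_bin - 0.5 + duty)))
    vote_1998 = (rng.random(n) < p_vote).astype(int)
    vote_2000 = (rng.random(n) < p_vote).astype(int)
    # Listening to the 2002 phone call depends on duty when duty_effect > 0.
    p_treat = 1 / (1 + np.exp(-(-1.0 + duty_effect * duty)))
    treated = rng.random(n) < p_treat
    return age_bin, vote_1998, vote_2000, treated


def placebo_difference(strata, placebo, treated):
    """Exact-match on `strata`, then return the treated-weighted average
    difference in the held-out pre-treatment variable `placebo`."""
    total, weight = 0.0, 0
    for s in np.unique(strata):
        in_s = strata == s
        t = placebo[in_s & treated]
        c = placebo[in_s & ~treated]
        if len(t) == 0 or len(c) == 0:
            continue  # no match available in this stratum
        total += len(t) * (t.mean() - c.mean())
        weight += len(t)
    if weight == 0:
        raise ValueError("no stratum contains both treated and control units")
    return total / weight


def run_check(duty_effect, n=20000, seed=0):
    age_bin, vote_1998, vote_2000, treated = simulate(n, duty_effect, seed)
    strata = age_bin * 2 + vote_1998   # match on age and 1998 turnout only
    return placebo_difference(strata, vote_2000, treated)


confounded = run_check(duty_effect=1.0)   # treatment receipt depends on duty
randomized = run_check(duty_effect=0.0)   # treatment receipt is pure chance
print(f"placebo gap, confounded: {confounded:.3f}")
print(f"placebo gap, randomized: {randomized:.3f}")
```

In this toy setup the gap in 2000 turnout should sit near zero when treatment is unrelated to duty and be clearly positive when it is not, which is the signature the held-out variable is meant to pick up. A real application would also need a standard error for the gap, which this sketch leaves out.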
Even if the prospects for identifying bias due to unobserved covariates are better than Arceneaux, Gerber, and Green suggest, it is not at all apparent that we can do anything about it. In this case, if we knew what "duty" was, we might be able to find covariates that would allow us to satisfy the unconfoundedness assumption. On the other hand, it is not obvious how we would identify those variables from observational studies, since we would likely run into similar problems with confounding. No one said this was supposed to be easy.
Posted by Mike Kellermann at February 6, 2006 6:00 AM