Nick Beauchamp (Northeastern University) - Predicting, Extrapolating and Interpolating State-level Polls using Twitter

Presentation Date: Wednesday, April 23, 2014

Abstract: Presidential, gubernatorial, and senatorial elections all require state-level polling, but even during presidential campaigns, state-level surveys remain sparse, erratically timed, and entirely neglected in uncompetitive states. Partly in response to these unmet needs in political and other domains, there have been numerous efforts to approximate various survey measures using social media data, but most of these approaches remain distinctly flawed, both methodologically and for lack of sufficient training data. To remedy these flaws, this paper combines 1,200 state-level polls from the 2012 presidential campaign with over 100 million state-located political Tweets; models the former as a function of the latter using a new linear regularization feature-selection method; and shows via forward-in-time, rolling-window, out-of-sample testing that, properly modeled, the Twitter textual data tracks polling variation both across states and within states over time, predicting short-term changes in polls with greater accuracy than is possible using past polling data alone. Thus validated, these measures can be extended to unpolled states and, given the density of the Twitter data, potentially to sub-state regions and sub-day timescales. In addition, an examination of the textual features most strongly associated with changes in surveyed vote intention reveals the topics, events, and concerns that characterize the rapidly shifting national debate, making this not just a measurement tool but also a potential aid to real-time campaign strategy.
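The evaluation design described in the abstract (fit only on past data, predict the next period, and compare against a forecast built from past polls alone) can be illustrated with a small sketch. This is not the paper's method: the abstract does not specify its "new linear regularization feature-selection method", so an ordinary lasso fitted by coordinate descent stands in for it here. The data are synthetic, and all names (`lasso_fit`, `rolling_window_eval`, `make_synthetic`), the 30-day window, and the penalty values are hypothetical choices made for illustration.

```python
"""Forward-in-time rolling-window evaluation of a sparse linear poll model.

Hypothetical sketch: an ordinary lasso (coordinate descent) stands in for the
talk's unspecified regularization feature-selection method, and the data are
synthetic stand-ins for daily Tweet-term rates and state poll margins.
"""
import random


def soft_threshold(z, gamma):
    # Shrink z toward zero by gamma; returns exactly 0.0 when |z| <= gamma.
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_fit(X, y, lam, n_iter=100):
    """Minimize (1/2n)*||y - b0 - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    # Center features and target; the intercept is recovered at the end.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual of the centered fit (w = 0 at start)
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant feature in this window: keep its weight at 0
            # Covariance of feature j with the partial residual (j's own term added back).
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                w[j] = new_wj
    b0 = y_mean - sum(m * wj for m, wj in zip(x_mean, w))
    return b0, w


def predict(b0, w, x):
    return b0 + sum(wj * xj for wj, xj in zip(w, x))


def rolling_window_eval(X, y, window, lam):
    """For each day t >= window, fit on days [t - window, t) only, then predict day t.

    Assumes day t's Tweet features are observed before day t's poll. Returns
    (model MAE, baseline MAE); the baseline carries the previous day's poll
    value forward, i.e. it uses past polling data alone.
    """
    model_err, base_err = [], []
    for t in range(window, len(y)):
        b0, w = lasso_fit(X[t - window:t], y[t - window:t], lam)
        model_err.append(abs(predict(b0, w, X[t]) - y[t]))
        base_err.append(abs(y[t - 1] - y[t]))
    return sum(model_err) / len(model_err), sum(base_err) / len(base_err)


def make_synthetic(days=120, n_features=10, seed=0):
    """Hypothetical data: the margin depends on features 0 and 3 only, plus noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(days):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        X.append(x)
        y.append(1.5 * x[0] - 2.0 * x[3] + rng.gauss(0.0, 0.3))
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    model_mae, base_mae = rolling_window_eval(X, y, window=30, lam=0.05)
    print(f"lasso MAE {model_mae:.3f} vs carry-forward MAE {base_mae:.3f}")
```

The key property is that every prediction for day t is made from a model trained strictly on earlier days, which is what makes the comparison with a past-polls-only baseline fair. On real data the window length and penalty would need tuning, and a stronger poll-based baseline than carrying the last value forward would likely be appropriate.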
