Political campaigning has become a multimillion-dollar business. A substantial proportion of a campaign's budget is spent on voter mobilization, i.e., on identifying and influencing as many people as possible to vote. Campaigns apply statistical tools to the available data to decide whom to target. While the available data are usually rich, campaigns have traditionally relied on a rather limited selection of information, often including only previous voting behavior and one or two demographic variables. Statistical procedures currently in use include logistic regression and standard classification tree methods such as CHAID, but there is growing interest in employing modern data mining approaches. In line with this development, we propose a modern framework for voter targeting called LORET (for logistic regression trees) that employs trees (with possibly just a single root node) containing logistic regressions (with possibly just an intercept) in every leaf. LORET models thus contain logistic regression and classification trees as special cases and allow for a synthesis of both techniques under one umbrella. We explore various flavors of LORET models that (a) compare the effect of using the full set of available variables against using only limited information and (b) investigate how the variables' effects differ when they enter either as regressors in the logistic model components or as partitioning variables in the tree components. To assess model performance and illustrate targeting, we apply LORET to a data set of 19,634 eligible voters from the 2004 US presidential election. We find that augmenting the standard set of variables (such as age and voting history) with additional predictor variables (such as the household composition in terms of party affiliation and each individual's rank in the household) clearly improves predictive accuracy.
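To make the special-case structure described above concrete, the following is a minimal sketch of how a fitted LORET-style model could be represented and evaluated. It is an illustration only, not the authors' implementation: the class names `Leaf` and `Split`, the variable names `age` and `voted_last`, and all coefficients are hypothetical, and the sketch covers prediction from an already-fitted model, not the fitting procedure itself.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    """Terminal node holding a logistic model: an intercept plus optional slopes."""
    intercept: float
    slopes: Dict[str, float] = field(default_factory=dict)

    def predict(self, voter: Dict[str, float]) -> float:
        # Linear predictor from the regressors, mapped through the logistic function.
        eta = self.intercept + sum(b * voter[name] for name, b in self.slopes.items())
        return 1.0 / (1.0 + math.exp(-eta))


@dataclass
class Split:
    """Inner node that partitions voters on one variable at a threshold."""
    variable: str
    threshold: float
    left: Union["Split", Leaf]
    right: Union["Split", Leaf]

    def predict(self, voter: Dict[str, float]) -> float:
        # Route the voter to the left or right subtree, then delegate.
        child = self.left if voter[self.variable] <= self.threshold else self.right
        return child.predict(voter)


# All numbers below are made up for illustration (hypothetical coefficients).

# Special case 1: a single root node with a logistic regression
# is an ordinary logistic regression.
logistic_only = Leaf(intercept=-1.0, slopes={"age": 0.02})

# Special case 2: a tree whose leaves hold only an intercept
# behaves like a classification tree with one probability per leaf.
tree_only = Split("voted_last", 0.5, Leaf(-1.0), Leaf(1.0))

# General case: a tree with a separate logistic regression in every leaf.
full_model = Split(
    "voted_last", 0.5,
    Leaf(-1.5, {"age": 0.02}),
    Leaf(0.5, {"age": 0.01}),
)

voter = {"age": 50, "voted_last": 1}
p_logistic = logistic_only.predict(voter)
p_tree = tree_only.predict(voter)
p_full = full_model.predict(voter)
```

In this representation the same variable can appear either as a regressor (a key in `slopes`) or as a partitioning variable (the `variable` of a `Split`), which mirrors the distinction drawn in point (b) above.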
Series: Research Report Series / Department of Statistics and Mathematics