What lurks beneath: Hazards in using data-driven non-parametric approaches in voting studies

Activity: Talk: Scientific talk (Science-to-Science)


Conventional political science, where it is quantitatively inclined at all, focuses on formulating models and then looking for their traces in data. This paper follows a radically different trajectory: by setting aside these traditional methodological constraints, the data themselves are allowed to shape the outcome of the research process. The tool used in this endeavor is the Regression or Decision Tree (RDT); which of the two names applies depends only on the intended outcome. RDTs are a very powerful analytical tool because they are not constrained by a priori assumptions or by the need for parametric estimation. We assess the power of RDTs, and their suitability for political science questions, on a standard machine learning benchmark data set, the Congressional Roll Call Database of 1984. Like all techniques, RDTs have pitfalls that researchers need to steer well clear of. In this paper we explore common problems social scientists face when analyzing large amounts of quantitative data: deciding on a correct coding scheme and level of measurement for the values of the variables, and on a suitable algorithm for processing their data. These questions are often regarded as hair-splitting and of a merely academic nature. However, wrong choices here can severely distort the outcome of any analysis. We demonstrate the severity of the distortion by applying different coding schemes and classification algorithms to the roll call data set. Our results demonstrate both the power and utility of exploratory methods like RDTs and the importance of these technical questions, which have been ignored for far too long.
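
To make the point about coding schemes concrete, the following is a minimal sketch and not the analysis from the talk. It assumes a hypothetical toy data set of twelve legislators voting "y", "n" or "?" on a single bill, with invented party labels, and a hand-written one-split tree (`best_stump`, an illustrative helper) scored by Gini impurity. It compares an ordinal coding of the vote (n=0, ?=1, y=2) with a nominal one-hot coding.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_stump(rows, labels):
    """Return (weighted_gini, feature_index, threshold) of the best single split.

    Illustrative helper: tries every midpoint between adjacent observed
    values of every feature and keeps the split with the lowest impurity.
    """
    n = len(rows)
    best = (gini(labels), None, None)  # baseline: no split at all
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[0]:
                best = (score, j, t)
    return best


# Hypothetical toy data (invented for illustration): in this example,
# only the legislators who did not cast a clear vote belong to party "B".
votes = ["y"] * 4 + ["n"] * 4 + ["?"] * 4
party = ["A"] * 8 + ["B"] * 4

# Coding scheme 1: treat the vote as ordinal (n < ? < y).
ordinal = {"n": 0, "?": 1, "y": 2}
ordinal_rows = [[ordinal[v]] for v in votes]

# Coding scheme 2: treat the vote as nominal (one indicator per category).
categories = ["n", "?", "y"]
onehot_rows = [[1 if v == c else 0 for c in categories] for v in votes]

ordinal_score, _, _ = best_stump(ordinal_rows, party)
onehot_score, onehot_feature, _ = best_stump(onehot_rows, party)

print("ordinal coding, best split impurity:", round(ordinal_score, 3))
print("one-hot coding, best split impurity:", round(onehot_score, 3))
print("one-hot feature used:", categories[onehot_feature])
```

Under the ordinal coding no single threshold can isolate the middle category, so the best split stays impure; under the one-hot coding the "?" indicator separates the two invented parties in one step. A deeper tree could recover the same partition from the ordinal coding with two splits, but the resulting tree, and any substantive reading of it, would look different.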
Period: 25 Aug. 2011 to 28 Aug. 2011
Event title: ECPR General Conference
Event type: Not specified

Austrian Classification of Fields of Science (ÖFOS)

  • 101018 Statistics
  • 102022 Software development