The interpretation of topic models for scholarly analysis: An evaluation and critique of current practice

Mathew Gillings, Andrew Hardie

Publication: Original article in a scientific journal (peer-reviewed)

Abstract

Topic modelling is a method of statistical data mining of a corpus of documents, popular in the digital humanities and, increasingly, in the social sciences. A critical methodological issue is how ‘topics’ (groups of co-selected word types) can be interpreted in analytically meaningful terms. In the current literature, this is typically done by ‘eyeballing’; that is, cursory and largely unsystematic examination of the ‘top’ words in each algorithmically identified word group. We critically evaluate this approach in a dual analysis, comparing the ‘eyeballing’ approach with an alternative using sample close reading across the corpus. We used MALLET to extract two topic models from a test corpus: one with stopwords included, another with stopwords excluded. We then used the aforementioned methods to assign labels to these topics. The results suggest that a close-reading approach is more effective not only in terms of level of detail but even in terms of accuracy. In particular, we found that: assigning labels via eyeballing yields incomplete or incorrect topic labels; removing stopwords drastically affects the analysis outcome; topic labelling and interpretation depend considerably on the analysts’ specialist knowledge; and differences of perspective or construal are unlikely to be captured through a topic model. We conclude that an interpretive paradigm founded in close reading may make topic modelling more appealing to humanities researchers.
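
For readers unfamiliar with the workflow the abstract describes, the sketch below illustrates the general idea of fitting two topic models to the same material, one with stopwords retained and one with them removed, and then inspecting the ‘top’ words that an analyst would label. It is a minimal illustration only: it uses gensim's LdaModel and a handful of invented toy documents, whereas the study itself used MALLET on a real test corpus, so every document, stopword list, and parameter here is an assumption rather than the authors' pipeline.

from gensim import corpora, models

# Toy documents standing in for the test corpus (hypothetical content).
docs = [
    "the defendant denied the charges in the court hearing",
    "the court heard further evidence from the witnesses",
    "profits rose as the company expanded into new markets",
    "investors welcomed the results announced by the company",
]

# Illustrative stopword list; the real analysis used MALLET's stopword handling.
stopwords = {"the", "in", "as", "into", "from", "by", "a", "an", "of"}

def tokenise(text, remove_stopwords):
    tokens = text.lower().split()
    return [t for t in tokens if not (remove_stopwords and t in stopwords)]

for remove in (False, True):
    texts = [tokenise(d, remove) for d in docs]
    dictionary = corpora.Dictionary(texts)
    bow = [dictionary.doc2bow(t) for t in texts]
    # Fit a small LDA model; num_topics and passes are arbitrary choices.
    lda = models.LdaModel(bow, num_topics=2, id2word=dictionary,
                          random_state=1, passes=10)
    print(f"remove_stopwords={remove}")
    for topic_id, top_words in lda.show_topics(num_topics=2, num_words=5,
                                               formatted=True):
        print(f"  topic {topic_id}: {top_words}")

Comparing the two printed word lists makes the abstract's point concrete: with stopwords retained, high-frequency function words dominate the ‘top’ words an analyst must interpret, whereas removing them changes which word groups emerge at all.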
Original language: English
Pages (from–to): 530–543
Number of pages: 14
Journal: Digital Scholarship in the Humanities
Volume: 38
Issue number: 2
Early online date: 22 Dec 2022
DOIs
Publication status: Published - June 2023
