Abstract
Topic modelling is a method of statistical data mining of a corpus of documents, popular in the digital humanities and, increasingly, in the social sciences. A critical methodological issue is how ‘topics’ (groups of co-selected word types) can be interpreted in analytically meaningful terms. In the current literature, this is typically done by ‘eyeballing’; that is, cursory and largely unsystematic examination of the ‘top’ words in each algorithmically identified word group. We critically evaluate this approach in a dual analysis, comparing it with an alternative that uses sample close reading across the corpus. We used MALLET to extract two topic models from a test corpus: one with stopwords included, another with stopwords excluded. We then used the aforementioned methods to assign labels to these topics. The results suggest that a close-reading approach is more effective not only in level of detail but also in accuracy. In particular, we found that: assigning labels via eyeballing yields incomplete or incorrect topic labels; removing stopwords drastically affects the analysis outcome; topic labelling and interpretation depend considerably on the analysts’ specialist knowledge; and differences of perspective or construal are unlikely to be captured through a topic model. We conclude that an interpretive paradigm founded in close reading may make topic modelling more appealing to humanities researchers.
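The design described in the abstract (two topic models trained on the same corpus, one with and one without stopwords, whose top words are then inspected for labelling) can be illustrated with a minimal sketch. Note that this is not the paper's actual pipeline: the study used MALLET, whereas the sketch below uses Python with gensim, a toy three-document corpus, and a hypothetical stopword list purely to show the stopwords-in versus stopwords-out contrast and what 'eyeballing' the top words looks like in practice.

```python
# Illustrative sketch only: the paper used MALLET; gensim and the toy data
# here are assumptions made for demonstration, not the authors' setup.
from gensim import corpora, models

# Toy documents standing in for the test corpus (hypothetical content).
documents = [
    "the committee discussed the new housing policy at length",
    "housing prices rose sharply in the city last year",
    "the policy debate focused on rents and affordability",
]

# Hypothetical stopword list.
stopwords = {"the", "at", "in", "on", "and", "of", "a", "new", "last"}

def tokenise(text, remove_stopwords):
    tokens = text.lower().split()
    return [t for t in tokens if not (remove_stopwords and t in stopwords)]

def train_lda(remove_stopwords, num_topics=2):
    # Build a bag-of-words corpus and train a small LDA model.
    texts = [tokenise(doc, remove_stopwords) for doc in documents]
    dictionary = corpora.Dictionary(texts)
    bow = [dictionary.doc2bow(text) for text in texts]
    return models.LdaModel(corpus=bow, id2word=dictionary,
                           num_topics=num_topics, passes=10, random_state=1)

# Two models, mirroring the paper's design: stopwords included vs excluded.
lda_with_stopwords = train_lda(remove_stopwords=False)
lda_without_stopwords = train_lda(remove_stopwords=True)

# 'Eyeballing' corresponds to inspecting the top words of each topic.
for label, model in [("stopwords kept", lda_with_stopwords),
                     ("stopwords removed", lda_without_stopwords)]:
    print(label)
    for topic_id, words in model.print_topics(num_words=5):
        print(f"  topic {topic_id}: {words}")
```

As the abstract notes, the stopwords-kept model's top-word lists tend to be dominated by function words, so whether stopwords are removed before training can substantially change how a topic is read and labelled.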
| Original language | English |
| --- | --- |
| Pages (from - to) | 530–543 |
| Number of pages | 14 |
| Journal | Digital Scholarship in the Humanities |
| Volume | 38 |
| Issue number | 2 |
| Early online date | 22 Dec 2022 |
| DOIs | |
| Publication status | Published - June 2023 |