Unsupervised Topic Modeling with BERTopic for Coarse and Fine-Grained News Classification

Mohamad Al Sayed, Adrian M.P. Braşoveanu, Lyndon J.B. Nixon, Arno Scharl

Publication: Contribution to book/conference proceedings › Contribution to conference proceedings

Abstract

Transformer models have achieved state-of-the-art results for news classification tasks, but remain difficult to modify to yield the desired class probabilities in a multi-class setting. Using a neural topic model to create dense topic clusters helps with generating these class probabilities. The presented work uses the BERTopic clustered embeddings model as a preprocessor to eliminate documents that do not belong to any distinct cluster or topic. By combining the resulting embeddings with a Sentence Transformer fine-tuned with SetFit, we obtain a prompt-free framework that demonstrates competitive performance even with few-shot labeled data. Our findings show that incorporating BERTopic in the preprocessing stage leads to a notable improvement in the classification accuracy of news documents. Furthermore, our method outperforms hybrid approaches that combine text and images for news document classification.
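The preprocessing step described in the abstract can be illustrated with a minimal sketch. It assumes that topic assignments have already been computed, for example with BERTopic's `fit_transform`, which labels documents outside every cluster with topic `-1`. The function name `drop_outliers` and the toy inputs are hypothetical and are not taken from the paper; the subsequent SetFit fine-tuning is not shown.

```python
from typing import List, Sequence, Tuple

# BERTopic assigns topic -1 to documents that fall outside every cluster.
OUTLIER_TOPIC = -1


def drop_outliers(
    docs: Sequence[str],
    labels: Sequence[str],
    topics: Sequence[int],
) -> Tuple[List[str], List[str]]:
    """Keep only documents that were assigned to a real topic cluster.

    `topics` is assumed to hold one topic id per document, in the same
    order as `docs` and `labels`.
    """
    if not (len(docs) == len(labels) == len(topics)):
        raise ValueError("docs, labels and topics must have the same length")

    kept_docs: List[str] = []
    kept_labels: List[str] = []
    for doc, label, topic in zip(docs, labels, topics):
        if topic != OUTLIER_TOPIC:
            kept_docs.append(doc)
            kept_labels.append(label)
    return kept_docs, kept_labels


if __name__ == "__main__":
    # Illustrative toy inputs only, not data from the paper.
    docs = ["election results", "cup final", "misc notice"]
    labels = ["politics", "sports", "politics"]
    topics = [0, 1, -1]
    print(drop_outliers(docs, labels, topics))
```

The filtered documents and labels would then serve as the training set for the few-shot classifier, so that documents without a distinct topic do not influence it.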
Original language: English
Title of host publication: Advances in Computational Intelligence
Subtitle of host publication: 17th International Work-Conference on Artificial Neural Networks, IWANN 2023, Ponta Delgada, Portugal, June 19–21, 2023, Proceedings, Part I
Editors: Ignacio Rojas, Gonzalo Joya, Andreu Catala
Place of publication: Cham
Publisher: Springer
Pages: 162-174
Number of pages: 13
Volume: 1
Edition: 1
ISBN (electronic): 978-3-031-43085-5
ISBN (print): 978-3-031-43084-8
Publication status: Published - 2023
Externally published: Yes

Publication series

Series: Lecture Notes in Computer Science (LNCS)
Volume: 14134
ISSN: 0302-9743
  • Gentio

    Hornik, K. (Project lead), Seiler, A. (Administrative contact), Polleres, A. (Researcher) & Disselbacher-Kollmann, K. (Administrative contact)

    1/01/20 → 30/06/23

    Project: Research
