Abstract
Transformer models have achieved state-of-the-art results for news classification tasks, but remain difficult to modify to yield the desired class probabilities in a multi-class setting. Using a neural topic model to create dense topic clusters helps with generating these class probabilities. The presented work uses the BERTopic clustered embeddings model as a preprocessor to eliminate documents that do not belong to any distinct cluster or topic. By combining the resulting embeddings with a Sentence Transformer fine-tuned with SetFit, we obtain a prompt-free framework that demonstrates competitive performance even with few-shot labeled data. Our findings show that incorporating BERTopic in the preprocessing stage leads to a notable improvement in the classification accuracy of news documents. Furthermore, our method outperforms hybrid approaches that combine text and images for news document classification.
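The preprocessing step described above can be illustrated with a minimal sketch. It assumes topic assignments are already available as one integer per document, with -1 marking documents that belong to no cluster (the convention BERTopic uses for outliers). The function name `drop_outliers` and the toy data are hypothetical and not taken from the paper; the filtered documents and labels would then be handed to a few-shot classifier such as one trained with SetFit.

```python
from typing import List, Sequence, Tuple

# BERTopic assigns topic -1 to documents that fall outside every cluster.
OUTLIER_TOPIC = -1


def drop_outliers(
    docs: Sequence[str],
    labels: Sequence[str],
    topics: Sequence[int],
) -> Tuple[List[str], List[str]]:
    """Keep only documents that were assigned to a real topic cluster."""
    if not (len(docs) == len(labels) == len(topics)):
        raise ValueError("docs, labels and topics must have the same length")
    kept = [
        (doc, label)
        for doc, label, topic in zip(docs, labels, topics)
        if topic != OUTLIER_TOPIC
    ]
    kept_docs = [doc for doc, _ in kept]
    kept_labels = [label for _, label in kept]
    return kept_docs, kept_labels


# Hypothetical toy data; in practice `topics` would come from a fitted topic model.
docs = ["Central bank raises rates", "asdf qwerty", "Team wins the cup final"]
labels = ["business", "business", "sports"]
topics = [0, -1, 1]

clean_docs, clean_labels = drop_outliers(docs, labels, topics)
print(clean_docs)
```

Filtering before training keeps the few labeled examples concentrated on documents with a clear topical signal, which is the effect the abstract attributes to the BERTopic stage.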
Original language | English |
---|---|
Title of host publication | Advances in Computational Intelligence |
Subtitle of host publication | 17th International Work-Conference on Artificial Neural Networks, IWANN 2023, Ponta Delgada, Portugal, June 19–21, 2023, Proceedings, Part I |
Editors | Ignacio Rojas, Gonzalo Joya, Andreu Catala |
Place of publication | Cham |
Publisher | Springer |
Pages | 162-174 |
Number of pages | 13 |
Volume | 1 |
Edition | 1 |
ISBN (electronic) | 978-3-031-43085-5 |
ISBN (print) | 978-3-031-43084-8 |
Publication status | Published - 2023 |
Externally published | Yes |
Publication series
Series | Lecture Notes in Computer Science (LNCS) |
---|---|
Volume | 14134 |
ISSN | 0302-9743 |
Projects
- 1 Finished
- Gentio
  Hornik, K. (Project lead), Seiler, A. (Administrative contact), Polleres, A. (Researcher) & Disselbacher-Kollmann, K. (Administrative contact)
  1/01/20 → 30/06/23
  Project: Research