Abstract
Transformer models have achieved state-of-the-art results for news classification tasks, but remain difficult to modify to yield the desired class probabilities in a multi-class setting. Using a neural topic model to create dense topic clusters helps with generating these class probabilities. The presented work uses the BERTopic clustered embeddings model as a preprocessor to eliminate documents that do not belong to any distinct cluster or topic. By combining the resulting embeddings with a Sentence Transformer fine-tuned with SetFit, we obtain a prompt-free framework that demonstrates competitive performance even with few-shot labeled data. Our findings show that incorporating BERTopic in the preprocessing stage leads to a notable improvement in the classification accuracy of news documents. Furthermore, our method outperforms hybrid approaches that combine text and images for news document classification.
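The preprocessing step described above, discarding documents that fall into no distinct topic cluster, can be illustrated with a minimal sketch. It assumes that topic assignments are already available as one integer per document and that `-1` marks an outlier, which is the convention BERTopic uses. The function name `drop_outliers` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Assumption: -1 marks documents assigned to no cluster (BERTopic's convention).
OUTLIER_TOPIC = -1


def drop_outliers(
    docs: Sequence[str],
    labels: Sequence[str],
    topics: Sequence[int],
) -> Tuple[List[str], List[str]]:
    """Keep only the documents (and their labels) that belong to a topic cluster."""
    if not (len(docs) == len(labels) == len(topics)):
        raise ValueError("docs, labels, and topics must have the same length")
    kept = [
        (doc, label)
        for doc, label, topic in zip(docs, labels, topics)
        if topic != OUTLIER_TOPIC
    ]
    return [doc for doc, _ in kept], [label for _, label in kept]


# Hypothetical toy data; in practice `topics` would come from a fitted topic model.
docs = [
    "Central bank raises interest rates",
    "Home team wins the cup final",
    "asdf qwerty lorem",
    "Parliament passes the annual budget",
]
labels = ["economy", "sports", "economy", "politics"]
topics = [0, 1, -1, 2]

kept_docs, kept_labels = drop_outliers(docs, labels, topics)
```

The filtered documents and labels would then serve as input to the few-shot fine-tuning stage, which is not shown here.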
| Original language | English |
|---|---|
| Title of host publication | Advances in Computational Intelligence |
| Subtitle of host publication | 17th International Work-Conference on Artificial Neural Networks, IWANN 2023, Ponta Delgada, Portugal, June 19–21, 2023, Proceedings, Part I |
| Editors | Ignacio Rojas, Gonzalo Joya, Andreu Catala |
| Place of publication | Cham |
| Publisher | Springer |
| Pages | 162-174 |
| Number of pages | 13 |
| Volume | 1 |
| Edition | 1 |
| ISBN (electronic) | 978-3-031-43085-5 |
| ISBN (print) | 978-3-031-43084-8 |
| Publication status | Published - 2023 |
| Externally published | Yes |
Publication series
| Series | Lecture Notes in Computer Science (LNCS) |
|---|---|
| Volume | 14134 |
| ISSN | 0302-9743 |
Projects
- 1 Finished
- Gentio
Hornik, K. (Project lead), Seiler, A. (Contact person for administrative matters), Polleres, A. (Researcher) & Disselbacher-Kollmann, K. (Contact person for administrative matters)
1/01/20 → 30/06/23
Project: Research