Abstract
Transformer models have achieved state-of-the-art results for news classification tasks, but remain difficult to modify to yield the desired class probabilities in a multi-class setting. Using a neural topic model to create dense topic clusters helps with generating these class probabilities. The presented work uses the BERTopic clustered embeddings model as a preprocessor to eliminate documents that do not belong to any distinct cluster or topic. By combining the resulting embeddings with a Sentence Transformer fine-tuned with SetFit, we obtain a prompt-free framework that demonstrates competitive performance even with few-shot labeled data. Our findings show that incorporating BERTopic in the preprocessing stage leads to a notable improvement in the classification accuracy of news documents. Furthermore, our method outperforms hybrid approaches that combine text and images for news document classification.
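The preprocessing step described in the abstract can be illustrated with a minimal sketch. It assumes the filtering relies on BERTopic's default convention of assigning topic id `-1` to documents that fall outside every cluster; the record does not confirm the authors' exact procedure. The helper name `drop_outliers` and the sample data are hypothetical, and the topic ids are written by hand so the example runs without installing BERTopic.

```python
from typing import List, Sequence, Tuple

# BERTopic labels documents that belong to no cluster with topic id -1.
OUTLIER_TOPIC = -1


def drop_outliers(
    docs: Sequence[str],
    labels: Sequence[str],
    topics: Sequence[int],
) -> Tuple[List[str], List[str]]:
    """Keep only documents that were assigned to a real topic.

    Hypothetical helper: `topics` is expected to hold one topic id per
    document, e.g. the first value returned by
    `BERTopic().fit_transform(docs)`.
    """
    if not (len(docs) == len(labels) == len(topics)):
        raise ValueError("docs, labels and topics must have the same length")

    kept = [
        (doc, label)
        for doc, label, topic in zip(docs, labels, topics)
        if topic != OUTLIER_TOPIC
    ]
    kept_docs = [doc for doc, _ in kept]
    kept_labels = [label for _, label in kept]
    return kept_docs, kept_labels


# Made-up sample data; topic ids are hand-written stand-ins for model output.
sample_docs = ["Election results announced", "asdf qwerty", "Team wins final"]
sample_labels = ["politics", "politics", "sports"]
sample_topics = [0, -1, 1]

clean_docs, clean_labels = drop_outliers(sample_docs, sample_labels, sample_topics)
print(clean_docs)
print(clean_labels)
```

The filtered documents and labels would then serve as the few-shot training set for the SetFit fine-tuning stage, which is omitted here.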
Original language | English |
---|---|
Title of host publication | Advances in Computational Intelligence |
Subtitle of host publication | 17th International Work-Conference on Artificial Neural Networks, IWANN 2023, Ponta Delgada, Portugal, June 19–21, 2023, Proceedings, Part I |
Editors | Ignacio Rojas, Gonzalo Joya, Andreu Catala |
Place of Publication | Cham |
Publisher | Springer |
Pages | 162-174 |
Number of pages | 13 |
Volume | 1 |
Edition | 1 |
ISBN (Electronic) | 978-3-031-43085-5 |
ISBN (Print) | 978-3-031-43084-8 |
Publication status | Published - 2023 |
Externally published | Yes |
Publication series
Series | Lecture Notes in Computer Science (LNCS) |
---|---|
Volume | 14134 |
ISSN | 0302-9743 |
Projects
- Gentio (finished)
Hornik, K. (PI - Project head), Seiler, A. (Contact person for administrative matters), Polleres, A. (Researcher) & Disselbacher-Kollmann, K. (Contact person for administrative matters)
1/01/20 → 30/06/23
Project: Research