Abstract
Several initiatives have been undertaken to conceptually model the domain of scholarly data using ontologies and to create respective Knowledge Graphs. Yet, their full potential seems untapped, as means for automatically populating said ontologies are lacking, and respective initiatives from the Semantic Web community are not necessarily connected. We propose to make scholarly data more sustainably accessible by leveraging Wikidata’s infrastructure and automating its population through LLMs, tapping into unstructured sources like conference Web sites and proceedings texts as well as already existing structured conference datasets. While an initial analysis shows that Semantic Web conferences are only minimally represented in Wikidata, we argue that our methodology can help to populate, evolve and maintain scholarly data as a community within Wikidata.
Our main contributions include (a) an analysis of ontologies for representing scholarly data to identify gaps and relevant entities/properties in Wikidata, (b) semi-automated extraction – requiring (minimal) manual validation – of conference metadata (e.g., acceptance rates, organizer roles, programme committee members, best paper awards, keynotes, and sponsors) from websites and proceedings texts using LLMs. Finally, we discuss (c) extensions to visualization tools in the Wikidata context for data exploration of the generated scholarly data. While our study focuses on data from 105 Semantic Web-related conferences, we expect our method to be more generally applicable for enhancing Wikidata’s utility as a comprehensive scholarly resource.
Original language | English |
---|---|
Title of host publication | Knowledge Engineering and Knowledge Management |
Subtitle of host publication | 24th International Conference, EKAW 2024, Amsterdam, The Netherlands, November 26–28, 2024, Proceedings |
Place of publication | Cham |
Publisher | Springer |
Pages | 243–259 |
ISBN (electronic) | 978-3-031-77792-9 |
ISBN (print) | 978-3-031-77791-2 |
DOIs | |
Publication status | Published - 12 Sept. 2024 |
Event | 24th International Conference on Knowledge Engineering and Knowledge Management - Amsterdam, Netherlands Duration: 26 Nov. 2024 → 28 Nov. 2024 Conference number: 2024 https://event.cwi.nl/ekaw2024/ |
Publication series
Series | Lecture Notes in Computer Science (LNCS) |
---|---|
Volume | 15370 |
ISSN | 0302-9743 |
Conference
Conference | 24th International Conference on Knowledge Engineering and Knowledge Management |
---|---|
Short title | EKAW |
Country/Territory | Netherlands |
City | Amsterdam |
Period | 26/11/24 → 28/11/24 |
Internet address |
Austrian Fields of Science (ÖFOS)
- 102001 Artificial Intelligence
- 102030 Semantic Technologies
- 102028 Knowledge Engineering