TY - GEN
T1 - From Experts to LLMs: Evaluating the Quality of Automatically Generated Ontologies
AU - Llugiqi Rexha, Majlinda
AU - Ekaputra, Fajar J.
AU - Sabou, Marta
PY - 2025
Y1 - 2025
N2 - Ontologies play a crucial role in knowledge representation, yet their manual construction requires domain expertise and effort. While previous work has focused on using large language models (LLMs) for assessing ontology creation, fully automated ontology generation remains underexplored. As a consequence, most research relies on a limited set of well-known ontologies or knowledge graphs, which constrains the evaluation of various tasks such as link prediction and knowledge graph completion. This highlights the need for diverse ontology benchmarks with varying characteristics, such as number of concepts, hierarchy depth, and so on, to effectively evaluate tasks such as link prediction and knowledge graph completion. In this work, we investigate the feasibility of generating ontologies using LLMs and evaluate whether they can produce ontologies of comparable quality to human-built ones. Given a seed set of concepts, a target number of concepts, relations, and maximum hierarchy depth, we employ three different LLMs to generate ontologies within the heart disease domain. Defining a seed set of concepts is particularly important for modeling the features of tabular datasets, enabling structured knowledge representation for downstream tasks. We systematically evaluate the generated ontologies by analyzing their structural integrity, semantic coherence, and suitability for downstream tasks. Our results show that while LLM-generated ontologies differ structurally from human-built ones, they remain comparable in semantic similarity and downstream ML performance, with LLaMA-generated ontologies proving to be the most effective. These findings highlight the potential of LLM-generated ontologies not only to support automated knowledge representation but also to enhance ontology benchmarks by introducing diverse structural characteristics, enabling more comprehensive evaluations of machine learning tasks.
UR - https://ceur-ws.org/Vol-3977/elmke-1.pdf
M3 - Contribution to conference proceedings
VL - 3977
T3 - CEUR Workshop Proceedings
BT - Proceedings of the 2nd International Workshop on Evaluation of Language Models in Knowledge Engineering Co-located with the Extended Semantic Web Conference (ESWC 2025), Portoroz, Slovenia
PB - CEUR Workshop Proceedings
ER -