TY - JOUR
T1 - ODArchive – Creating an Archive for Structured Data from Open Data Portals
AU - Weber, Thomas
AU - Mitlöhner, Johann
AU - Neumaier, Sebastian
AU - Polleres, Axel
PY - 2020
Y1 - 2020
N2 - We present ODArchive, a large corpus of structured data collected from over 260 Open Data portals worldwide, along with curated, integrated metadata. Furthermore, we enrich the harvested datasets with heuristic annotations using the type hierarchies in existing Knowledge Graphs. We both (i) present the underlying distributed architecture to scale up regular harvesting and monitoring of changes on these portals, and (ii) make the corpus available via different APIs. Moreover, we (iii) analyse the characteristics of tabular data within the corpus. Our APIs can be used to regularly run such analyses or to reproduce experiments from the literature that have worked on static, not publicly available corpora.
AB - We present ODArchive, a large corpus of structured data collected from over 260 Open Data portals worldwide, along with curated, integrated metadata. Furthermore, we enrich the harvested datasets with heuristic annotations using the type hierarchies in existing Knowledge Graphs. We both (i) present the underlying distributed architecture to scale up regular harvesting and monitoring of changes on these portals, and (ii) make the corpus available via different APIs. Moreover, we (iii) analyse the characteristics of tabular data within the corpus. Our APIs can be used to regularly run such analyses or to reproduce experiments from the literature that have worked on static, not publicly available corpora.
U2 - 10.1007/978-3-030-62466-8_20
DO - 10.1007/978-3-030-62466-8_20
M3 - Journal article
SN - 0302-9743
SP - 311
EP - 327
JO - Lecture Notes in Computer Science (LNCS)
JF - Lecture Notes in Computer Science (LNCS)
ER -