LOD-a-lot: A single-file enabler for data science

Wouter Beek, Javier David Fernandez Garcia, Ruben Verborgh

Publication: Chapter in book/Conference proceeding; Contribution to conference proceedings

Abstract

Many data scientists make use of Linked Open Data (LOD) as a huge interconnected knowledge base represented in RDF. However, the distributed nature of the information and the lack of a scalable approach to manage and consume such Big Semantic Data make it difficult and expensive to conduct large-scale studies. As a consequence, most scientists restrict their analyses to one or two datasets (often DBpedia) that contain at most hundreds of millions of triples.
LOD-a-lot is a dataset that integrates a large portion (over 28 billion triples) of the LOD Cloud into a single ready-to-consume file that can be easily downloaded, shared and queried with a small memory footprint. This paper shows that a wide range of Data Science use cases can be performed over such a LOD-a-lot file. For these use cases, LOD-a-lot significantly reduces the cost and complexity of conducting Data Science.
Original language: English
Title of host publication: International Conference on Semantic Systems
Editors: Rinke Hoekstra, Catherine Faron-Zucker, Tassilo Pellegrini, Victor de Boer
Place of Publication: Amsterdam
Publisher: ACM Press
Pages: 181-184
Publication status: Published - 2017
