HDT-MR: A Scalable Solution for RDF Compression with HDT and MapReduce

José M. Giménez-García, Javier David Fernandez Garcia, Miguel A. Martínez-Prieto

HDT is a binary RDF serialization aiming at minimizing the
space overheads of traditional RDF formats, while providing retrieval
features in compressed space. Several HDT-based applications, such as
the recent Linked Data Fragments proposal, leverage these features for
diverse publication, interchange and consumption purposes. However,
scalability issues emerge in HDT construction because the whole RDF
dataset must be processed in a memory-consuming task. This is hindering the evolution of novel applications and techniques at Web scale. This paper introduces HDT-MR, a MapReduce-based technique to process huge RDF datasets and build the HDT serialization. HDT-MR runs in time linear in the dataset size and has proven able to serialize datasets of up to several billion triples, preserving HDT compression and retrieval features.
Title of host publication: Extended Semantic Web Conference (ESWC)
Publication status: Published - 1 Apr 2015

Austrian Classification of Fields of Science and Technology (ÖFOS)

  • 102015 Information systems
  • 102
  • 502050 Business informatics
  • 102001 Artificial Intelligence