HDT-MR: A Scalable Solution for RDF Compression with HDT and MapReduce

José M. Giménez-García, Javier David Fernandez Garcia, Miguel A. Martínez-Prieto

Publication: Chapter in book/Conference proceeding - Contribution to conference proceedings

Abstract

HDT is a binary RDF serialization aiming at minimizing the
space overheads of traditional RDF formats, while providing retrieval
features in compressed space. Several HDT-based applications, such as
the recent Linked Data Fragments proposal, leverage these features for
diverse publication, interchange and consumption purposes. However,
scalability issues emerge in HDT construction because the whole RDF
dataset must be processed in a memory-consuming task. This hinders the evolution of novel applications and techniques at Web scale. This paper introduces HDT-MR, a MapReduce-based technique to process huge RDF datasets and build their HDT serialization. HDT-MR runs in time linear in the dataset size and has proven able to serialize datasets of up to several billion triples, preserving HDT compression and retrieval features.

Original language: English
Title of host publication: Extended Semantic Web Conference (ESWC)
Publication status: Published - 1 Apr 2015

Austrian Classification of Fields of Science and Technology (ÖFOS)

  • 102015 Information systems
  • 102
  • 502050 Business informatics
  • 102001 Artificial intelligence
