Abstract
HDT is a binary RDF serialization aiming at minimizing the
space overheads of traditional RDF formats, while providing retrieval
features in compressed space. Several HDT-based applications, such as
the recent Linked Data Fragments proposal, leverage these features for
diverse publication, interchange, and consumption purposes. However,
scalability issues emerge in HDT construction because the whole RDF
dataset must be processed in a memory-consuming task, hindering the evolution of novel applications and techniques at Web scale. This paper introduces HDT-MR, a MapReduce-based technique to process huge RDF datasets and build the HDT serialization. HDT-MR runs in linear time with respect to dataset size and has proven able to serialize datasets of up to several billion triples, preserving HDT compression and retrieval features.
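The paper itself details the actual construction jobs; as a rough illustration of the MapReduce approach, the sketch below shows how the distinct terms of an N-Triples file (a prerequisite for building HDT's dictionary) could be extracted with a single Hadoop job, so that no single machine ever needs to hold the full dataset in memory. All class names (`TermExtraction`, `TermMapper`, `DistinctReducer`) are hypothetical and not taken from HDT-MR, and the tokenizer is deliberately naive.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical first phase of an HDT-MR-style pipeline: extract the
// distinct RDF terms of an N-Triples dataset so a dictionary can be
// built without loading the whole dataset into one machine's memory.
public class TermExtraction {

  public static class TermMapper extends Mapper<Object, Text, Text, NullWritable> {
    private final Text term = new Text();

    @Override
    protected void map(Object key, Text value, Context ctx)
        throws IOException, InterruptedException {
      // Naive N-Triples split; a real parser must handle quoted
      // literals that contain whitespace.
      String line = value.toString().trim();
      if (line.isEmpty() || line.startsWith("#")) return;
      String[] parts = line.split("\\s+", 3);
      if (parts.length < 3) return;
      String object = parts[2].replaceAll("\\s*\\.\\s*$", ""); // drop trailing '.'
      for (String t : new String[] {parts[0], parts[1], object}) {
        term.set(t);
        ctx.write(term, NullWritable.get()); // shuffle groups duplicates
      }
    }
  }

  // Each distinct term arrives as one group; emitting only the key
  // yields the sorted, duplicate-free term list a dictionary needs.
  public static class DistinctReducer
      extends Reducer<Text, NullWritable, Text, NullWritable> {
    @Override
    protected void reduce(Text key, Iterable<NullWritable> values, Context ctx)
        throws IOException, InterruptedException {
      ctx.write(key, NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "rdf-term-extraction");
    job.setJarByClass(TermExtraction.class);
    job.setMapperClass(TermMapper.class);
    job.setCombinerClass(DistinctReducer.class); // deduplicate map-side too
    job.setReducerClass(DistinctReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Because Hadoop sorts keys during the shuffle, the reducer output is already lexicographically ordered, which is the property a dictionary-encoding phase relies on.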
| Original language | English |
|---|---|
| Title of host publication | Extended Semantic Web Conference (ESWC) |
| Publication status | Published - 1 Apr 2015 |
Austrian Classification of Fields of Science and Technology (ÖFOS)
- 102015 Information systems
- 102 Computer Sciences
- 502050 Business informatics
- 102001 Artificial intelligence