Everything you always wanted to know about blank nodes

Aidan Hogan, Marcelo Arenas, Alejandro Mallea, Axel Polleres

Publication: Journal article (scientific journal, peer-reviewed)



In this paper we thoroughly cover the issue of blank nodes, which have been defined in RDF as "existential variables". We
first introduce the theoretical precedent for existential blank nodes from first-order logic and incomplete information
in database theory. We then cover the different (and sometimes incompatible) treatment of blank nodes across the
W3C stack of RDF-related standards. We present an empirical survey of the blank nodes present in a large sample of
RDF data published on the Web (the BTC-2012 dataset), where we find that 25.7% of unique RDF terms are blank
nodes, that 44.9% of documents and 66.2% of domains use at least one blank node, and that, aside from
one Linked Data domain whose RDF data contains many "blank node cycles", the vast majority of blank nodes form
tree structures over which simple entailment is efficient to compute. With respect to the RDF merge of the full data,
we show that 6.1% of blank nodes are redundant under simple entailment. The vast majority of non-lean cases are
isomorphisms arising either from multiple blank nodes given no discriminating information within an RDF
document, or from documents duplicated at multiple Web locations. Although simple entailment is NP-complete and
leanness checking is coNP-complete, in computing this latter result we demonstrate that, in practice, real-world RDF
graphs are sufficiently "rich" in ground information for non-naive algorithms to avoid problematic cases.
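As a minimal illustration of simple entailment and leanness (a sketch, not the authors' implementation), the Python below treats an RDF graph as a set of (subject, predicate, object) string triples and, by an assumed convention, marks blank nodes with a `_:` prefix. G simple-entails H if some mapping of H's blank nodes to RDF terms sends every triple of H into G; G is non-lean if some such mapping sends G into a proper subgraph of itself. The function names and the `ex:` terms in the usage are hypothetical. The naive backtracking search is exponential in the worst case, in line with the NP-completeness stated in the abstract.

```python
def is_blank(term):
    # Assumed convention for this sketch: blank node labels start with "_:".
    return term.startswith("_:")


def match(triple, candidate, mu):
    """Try to extend blank-node mapping mu so that mu(triple) == candidate."""
    new_mu = dict(mu)
    for term, image in zip(triple, candidate):
        if is_blank(term):
            # A blank node may map to any term, but consistently.
            if new_mu.setdefault(term, image) != image:
                return None
        elif term != image:
            # IRIs and literals must match exactly.
            return None
    return new_mu


def find_mapping(source, target):
    """Backtracking search for a mapping mu with mu(source) a subset of target.

    Returns the mapping as a dict, or None if none exists.
    """
    triples = sorted(source)  # fixed order, only for reproducibility
    target = list(target)

    def extend(i, mu):
        if i == len(triples):
            return mu
        for candidate in target:
            new_mu = match(triples[i], candidate, mu)
            if new_mu is not None:
                result = extend(i + 1, new_mu)
                if result is not None:
                    return result
        return None

    return extend(0, {})


def simple_entails(g, h):
    """G simple-entails H iff some blank-node mapping sends H into G."""
    return find_mapping(h, g) is not None


def is_lean(graph):
    """G is lean iff no blank-node mapping sends G into a proper subgraph."""
    graph = set(graph)
    for t in graph:
        # Ground triples always map to themselves, so only a triple with a
        # blank node can be missing from a proper image mu(G).
        if any(is_blank(x) for x in t):
            if find_mapping(graph, graph - {t}) is not None:
                return False
    return True
```

For example, the graph `{("ex:alice", "ex:knows", "_:b1"), ("_:b1", "ex:name", '"Bob"')}` is lean and entails `{("ex:alice", "ex:knows", "_:x")}`; adding `("ex:alice", "ex:knows", "_:b2")` makes it non-lean, since `_:b2` carries no discriminating information and can be mapped to `_:b1`.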
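The contrast between tree-shaped blank nodes and "blank node cycles" can be sketched under a simplified reading that is an assumption here: build an undirected graph whose vertices are blank nodes, with an edge wherever two distinct blank nodes appear as the subject and object of the same triple, and ask whether that graph contains a cycle. Triples are (subject, predicate, object) strings with `_:`-prefixed blank nodes (an assumed convention), and `has_blank_node_cycle` is a hypothetical name. A union-find pass reports a cycle as soon as an edge joins two blank nodes that are already connected.

```python
def is_blank(term):
    # Assumed convention for this sketch: blank node labels start with "_:".
    return term.startswith("_:")


def has_blank_node_cycle(graph):
    """Return True if blank nodes linked subject-to-object form a cycle.

    Simplified reading: edges are undirected, parallel links between the
    same pair count once, and self-links (_:b p _:b) are ignored.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = {
        frozenset((s, o))
        for s, _, o in graph
        if is_blank(s) and is_blank(o) and s != o
    }
    for edge in edges:
        a, b = tuple(edge)
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # this edge closes a cycle
        parent[root_a] = root_b  # merge the two components
    return False
```

A chain `_:a → _:b → _:c` is a tree and returns `False`; adding a triple from `_:c` back to `_:a` closes a cycle and returns `True`. A plain union-find is enough here because acyclicity only depends on connectivity, not on edge direction or predicate.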
Original language: English
Pages (from-to): 42-69
Journal: Journal of Web Semantics
Issue number: 1
Publication status: Published, 2014
