Open (Science) Data Usage - A linkage infrastructure for reuse and data citation analytics





How is open data actually used? We know surprisingly little about this important question. This gap makes it difficult to evaluate which datasets are successful in terms of reuse, which in turn makes it difficult to define best practices for the curation and dissemination of open data. Information about the actual use of datasets would be valuable to researchers, funding agencies, libraries, statistical agencies - indeed anyone providing or using open data. Given this need, why don't we know more about how open data is used?

The most significant obstacle to consistent measurement of dataset usage is the inconsistency in how datasets are referenced. Only in recent years have datasets been given persistent identifiers (e.g. Digital Object Identifiers, DOIs), and dataset citation practices are still heterogeneous across domains. Despite these challenges, the potential value of reliable measurements of dataset usage makes the problem worth investigating.

We therefore propose to develop an infrastructure that links datasets with the projects and research that reference them. By focusing on a specific case (research preprints from arXiv and datasets from a collection of open data repositories), we plan to deliver a proof of concept by the end of this project, which can then support applications for additional funding from national or EU sources. The design of the infrastructure will be modular, so that it can be extended in a straightforward way. Finally, the initial infrastructure can be used to test first hypotheses about best practices for data sharing.

Actual start/end date: 1/10/20 - 1/07/21