Effective Crowd-Annotation of Participants, Interventions, and Outcomes in the Text of Clinical Trial Reports. Empirical Methods in Natural Language Processing

Markus Zlabinger, Reka Marta Sabou, Sebastian Hofstätter, Allan Hanbury

Publication: Chapter in book/Conference proceeding, Contribution to conference proceedings


The search for Participants, Interventions, and Outcomes (PIO) in clinical trial reports is a critical task in Evidence Based Medicine. Automatic PIO extraction requires high-quality corpora. Obtaining such a corpus from crowdworkers, however, has been shown to be ineffective since (i) workers usually lack the domain-specific expertise to conduct the task with sufficient quality, and (ii) the standard approach of annotating entire abstracts of trial reports as one task-instance (i.e. HIT) leads to an uneven distribution in task effort. In this paper, we switch from annotating entire abstracts to annotating sentences, referred to as the SENBASE approach. We build upon SENBASE in SENSUPPORT, where we compensate for the crowdworkers' lack of domain-specific expertise by showing, for each task-instance, similar sentences that have already been annotated by experts. Such tailored task-instance examples are retrieved via an unsupervised semantic short-text similarity (SSTS) method, and we evaluate nine methods to find an effective solution for SENSUPPORT. We compute the Cohen’s Kappa agreement between crowd-annotations and gold standard annotations and show that (i) both sentence-based approaches outperform a BASELINE approach where entire abstracts are annotated; (ii) supporting annotators with tailored task-instance examples is the best performing approach with Kappa agreements of 0.78/0.75/0.69 for P, I, and O respectively.
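The agreement measure named in the abstract can be illustrated with a small sketch. The following is not the authors' evaluation code; it is a minimal plain-Python computation of Cohen's Kappa, kappa = (p_o - p_e) / (1 - p_e), under the assumption that agreement is measured over per-token binary labels. The function name `cohens_kappa` and the label sequences `crowd` and `gold` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used the same single label everywhere.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical token-level labels (assumption): 1 = token belongs to a
# Participant span, 0 = it does not.
crowd = [1, 1, 0, 0, 1, 0, 0, 0]
gold = [1, 1, 0, 0, 0, 0, 0, 0]
kappa = cohens_kappa(crowd, gold)
print(round(kappa, 3))
```

In this toy example the two sequences agree on 7 of 8 tokens, but because most tokens are labelled 0 by both annotators, chance agreement is already high, and Kappa is correspondingly lower than the raw agreement rate.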
Original language: English
Title of host publication: Findings of ACL: EMNLP 2020
Editors: EMNLP 2020
Place of Publication: online
Pages: 3064-3074
Publication status: Published - 2020

Austrian Classification of Fields of Science and Technology (ÖFOS)

  • 102
  • 102001 Artificial intelligence
  • 102015 Information systems
  • 102022 Software development
