A Novel Label Aggregation with Attenuated Scores for Ground-Truth Identification of Dataset Annotation with Crowdsourcing

Ratchainant THAMMASUDJARIT; Anon PLANGPRASOPCHOK; Charnyote PLUEMPITIWIRIYAWEJ

doi:10.1587/transinf.2016DAP0024

Summary:

Ground-truth identification, the process of inferring the most probable labels for a dataset from crowdsourced annotations, is a crucial step in making the dataset usable, e.g., for supervised learning. Nevertheless, the process is challenging because annotations from multiple annotators are inconsistent and noisy. Existing methods require a set of data samples with corresponding ground-truth labels to estimate annotator performance precisely, but such samples are difficult to obtain in practice. Moreover, the process requires a post-editing step to validate indefinite labels, which generally cannot be identified without thoroughly inspecting the entire annotated dataset. To address these challenges, this paper introduces 1) the attenuated score (A-score), an indicator that locally measures annotator performance over segments of annotation sequences, and 2) a label aggregation method that applies the A-score to ground-truth identification. The experimental results demonstrate that A-score label aggregation outperforms majority vote on all datasets by accurately recovering more labels. It also achieves higher F1 scores than strong baselines on all multi-class datasets. Additionally, the results suggest that the A-score is a promising indicator for identifying indefinite labels in the post-editing procedure.
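Majority vote, the baseline named in the summary, assigns each item the label chosen by the most annotators. The following is a minimal Python sketch of that baseline only; it does not implement the A-score. It assumes each item's annotations arrive as a plain list of labels, and the names `majority_vote` and `aggregate` are hypothetical. Returning `None` on a tie is an illustrative assumption, a simple stand-in for an unresolved label that a post-editor would review, and is not the paper's definition of an indefinite label.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def majority_vote(labels: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the most frequent label for one item, or None if the item is unresolved.

    None is returned for an empty input or for a tie between the top labels;
    treating a tie as unresolved is an illustrative assumption.
    """
    if not labels:
        return None
    ranked = Counter(labels).most_common()
    # If the two most frequent labels have equal counts, there is no majority winner.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def aggregate(annotations: Sequence[Sequence[Hashable]]) -> List[Optional[Hashable]]:
    """Aggregate each item independently; annotations[i] holds every annotator's label for item i."""
    return [majority_vote(item_labels) for item_labels in annotations]


# Hypothetical example: three annotators label four items.
example = [
    ["A", "A", "B"],
    ["B", "B", "B"],
    ["A", "B", "C"],  # three-way tie, left unresolved
    ["C", "C", "A"],
]
print(aggregate(example))  # ['A', 'B', None, 'C']
```

Because majority vote gives every annotator equal weight, it cannot account for differences in annotator performance, which indicators such as the A-score are designed to measure.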

Publication: IEICE TRANSACTIONS on Information Vol.E100-D No.4 pp.750-757

Publication Date: 2017/04/01

Publicized: 2017/01/17

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2016DAP0024

Type of Manuscript: Special Section PAPER (Special Section on Data Engineering and Information Management)

Authors

Ratchainant THAMMASUDJARIT
  Mahidol University
Anon PLANGPRASOPCHOK
  National Electronics and Computer Technology Center
Charnyote PLUEMPITIWIRIYAWEJ
  Mahidol University

Keywords

ground-truth identification, crowdsourcing, label aggregation, attenuation scoring

Cite this

Ratchainant THAMMASUDJARIT, Anon PLANGPRASOPCHOK, Charnyote PLUEMPITIWIRIYAWEJ, "A Novel Label Aggregation with Attenuated Scores for Ground-Truth Identification of Dataset Annotation with Crowdsourcing" in IEICE TRANSACTIONS on Information, vol. E100-D, no. 4, pp. 750-757, April 2017, doi: 10.1587/transinf.2016DAP0024.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2016DAP0024/_p

@ARTICLE{e100-d_4_750,
author={Ratchainant THAMMASUDJARIT and Anon PLANGPRASOPCHOK and Charnyote PLUEMPITIWIRIYAWEJ},
journal={IEICE TRANSACTIONS on Information},
title={A Novel Label Aggregation with Attenuated Scores for Ground-Truth Identification of Dataset Annotation with Crowdsourcing},
year={2017},
volume={E100-D},
number={4},
pages={750--757},
abstract={Ground-truth identification, the process of inferring the most probable labels for a dataset from crowdsourced annotations, is a crucial step in making the dataset usable, e.g., for supervised learning. Nevertheless, the process is challenging because annotations from multiple annotators are inconsistent and noisy. Existing methods require a set of data samples with corresponding ground-truth labels to estimate annotator performance precisely, but such samples are difficult to obtain in practice. Moreover, the process requires a post-editing step to validate indefinite labels, which generally cannot be identified without thoroughly inspecting the entire annotated dataset. To address these challenges, this paper introduces 1) the attenuated score (A-score), an indicator that locally measures annotator performance over segments of annotation sequences, and 2) a label aggregation method that applies the A-score to ground-truth identification. The experimental results demonstrate that A-score label aggregation outperforms majority vote on all datasets by accurately recovering more labels. It also achieves higher F1 scores than strong baselines on all multi-class datasets. Additionally, the results suggest that the A-score is a promising indicator for identifying indefinite labels in the post-editing procedure.},
keywords={ground-truth identification, crowdsourcing, label aggregation, attenuation scoring},
doi={10.1587/transinf.2016DAP0024},
ISSN={1745-1361},
month={April},}

TY  - JOUR
TI  - A Novel Label Aggregation with Attenuated Scores for Ground-Truth Identification of Dataset Annotation with Crowdsourcing
T2  - IEICE TRANSACTIONS on Information
SP  - 750
EP  - 757
AU  - Ratchainant THAMMASUDJARIT
AU  - Anon PLANGPRASOPCHOK
AU  - Charnyote PLUEMPITIWIRIYAWEJ
PY  - 2017
DO  - 10.1587/transinf.2016DAP0024
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E100-D
IS  - 4
JA  - IEICE TRANSACTIONS on Information
Y1  - 2017/04/01/
AB  - Ground-truth identification, the process of inferring the most probable labels for a dataset from crowdsourced annotations, is a crucial step in making the dataset usable, e.g., for supervised learning. Nevertheless, the process is challenging because annotations from multiple annotators are inconsistent and noisy. Existing methods require a set of data samples with corresponding ground-truth labels to estimate annotator performance precisely, but such samples are difficult to obtain in practice. Moreover, the process requires a post-editing step to validate indefinite labels, which generally cannot be identified without thoroughly inspecting the entire annotated dataset. To address these challenges, this paper introduces 1) the attenuated score (A-score), an indicator that locally measures annotator performance over segments of annotation sequences, and 2) a label aggregation method that applies the A-score to ground-truth identification. The experimental results demonstrate that A-score label aggregation outperforms majority vote on all datasets by accurately recovering more labels. It also achieves higher F1 scores than strong baselines on all multi-class datasets. Additionally, the results suggest that the A-score is a promising indicator for identifying indefinite labels in the post-editing procedure.
ER  - 