In most supervised learning problems, the labeling quality of a dataset plays a paramount role in learning high-performance classifiers. The performance of a classifier can be significantly degraded if it is trained on mislabeled data; identifying such examples in the dataset is therefore of critical importance. In this study, we propose an improved majority filtering algorithm that exploits the ability of a support vector machine (SVM) to capture potentially mislabeled examples as support vectors (SVs). The key technical contribution of our work is that the base (or component) classifiers forming the ensemble are trained on non-SV examples, while at test time they are applied to the examples captured as SVs. An example is tagged as mislabeled if the majority of the base classifiers classify it incorrectly. Experimental results confirm that our algorithm not only achieves higher accuracy and F1 scores in identifying mislabeled examples, but is also significantly faster than previous methods.
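For illustration, the following Python sketch follows the procedure outlined in the abstract: fit an SVM and treat its support vectors as candidate noisy examples, train base classifiers only on the non-SV examples, test them on the SV examples, and flag the ones that a majority misclassifies. The use of scikit-learn's SVC, decision trees on bootstrap resamples as the base classifiers, the number of base classifiers, and all hyperparameters are illustrative assumptions, not the authors' exact settings.

```python
# Minimal sketch of SV-based majority filtering (assumptions noted above).
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def filter_label_noise(X, y, n_base=5, random_state=0):
    """Return indices of examples flagged as potentially mislabeled.

    X, y are numpy arrays (features, class labels).
    """
    # Step 1: fit an SVM; its support vectors are the candidate noisy examples.
    svm = SVC(kernel="rbf", C=1.0, gamma="scale", random_state=random_state)
    svm.fit(X, y)
    sv_idx = svm.support_                                   # candidate examples
    non_sv_idx = np.setdiff1d(np.arange(len(y)), sv_idx)    # clean training pool

    # Step 2: train base classifiers only on non-SV examples
    # (decision trees on bootstrap resamples -- an assumed choice).
    rng = np.random.RandomState(random_state)
    votes_wrong = np.zeros(len(sv_idx), dtype=int)
    for _ in range(n_base):
        boot = rng.choice(non_sv_idx, size=len(non_sv_idx), replace=True)
        base = DecisionTreeClassifier(random_state=rng.randint(1 << 30))
        base.fit(X[boot], y[boot])
        # Step 3: test each base classifier on the SV examples only.
        votes_wrong += (base.predict(X[sv_idx]) != y[sv_idx]).astype(int)

    # Step 4: majority filtering -- flag an SV example as mislabeled if more
    # than half of the base classifiers misclassify it.
    return sv_idx[votes_wrong > n_base // 2]
```

Under these assumptions, the returned indices would then be removed (or relabeled) before training the final classifier on the cleaned dataset.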
Muhammad Ammar MALIK
Chosun University
Jae Young CHOI
Hankuk University of Foreign Studies
Moonsoo KANG
Chosun University
Bumshik LEE
Chosun University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Muhammad Ammar MALIK, Jae Young CHOI, Moonsoo KANG, Bumshik LEE, "Improved Majority Filtering Algorithm for Cleaning Class Label Noise in Supervised Learning" in IEICE TRANSACTIONS on Fundamentals,
vol. E102-A, no. 11, pp. 1556-1559, November 2019, doi: 10.1587/transfun.E102.A.1556.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.E102.A.1556/_p
@ARTICLE{e102-a_11_1556,
author={Muhammad Ammar MALIK and Jae Young CHOI and Moonsoo KANG and Bumshik LEE},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Improved Majority Filtering Algorithm for Cleaning Class Label Noise in Supervised Learning},
year={2019},
volume={E102-A},
number={11},
pages={1556-1559},
keywords={},
doi={10.1587/transfun.E102.A.1556},
ISSN={1745-1337},
month={November},
}
TY - JOUR
TI - Improved Majority Filtering Algorithm for Cleaning Class Label Noise in Supervised Learning
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 1556
EP - 1559
AU - Muhammad Ammar MALIK
AU - Jae Young CHOI
AU - Moonsoo KANG
AU - Bumshik LEE
PY - 2019
DO - 10.1587/transfun.E102.A.1556
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E102-A
IS - 11
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - November 2019
ER -