In many real-world classification problems, the class balance often changes between training and test datasets, due to sample selection bias or the non-stationarity of the environment. Naive classifier training under such changes of class balance systematically yields a biased solution. It is known that such a systematic bias can be corrected by weighted training according to the test class balance. However, the test class balance is often unknown in practice. In this paper, we consider a semi-supervised learning setup where labeled training samples and unlabeled test samples are available and propose a class balance estimator based on the energy distance. Through experiments, we demonstrate that the proposed method is computationally much more efficient than existing approaches, with comparable accuracy.
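The abstract names the ingredients (labeled training samples, unlabeled test samples, the energy distance) but does not spell out the estimator. The Python sketch below is one way to realize it. It assumes the binary case and models the test input density as the mixture theta * p(x|y=+1) + (1 - theta) * p(x|y=-1) of the training class-conditionals. It then picks theta to minimize the energy distance D(P, Q) = 2E||X - Y|| - E||X - X'|| - E||Y - Y'|| between that mixture and the test sample. Under this model D is a convex quadratic in theta, so the minimizer has a closed form that is clipped to [0, 1]. That derivation and every function name (`mean_pairwise_distance`, `estimate_test_prior`, `class_weights`, `sample_gaussian`) are illustrative assumptions, not code from the paper. The estimated prior then gives per-class weights p_test(y) / p_train(y) for the weighted training the abstract mentions.

```python
import math
import random
from typing import Dict, Sequence

Point = Sequence[float]


def mean_pairwise_distance(xs: Sequence[Point], ys: Sequence[Point]) -> float:
    """Average Euclidean distance over all pairs (x, y), x from xs and y from ys.

    When xs and ys are the same sample the zero self-pairs are included,
    i.e. this is a V-statistic estimate of E||X - X'||.
    """
    total = sum(math.dist(x, y) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def estimate_test_prior(x_pos: Sequence[Point], x_neg: Sequence[Point],
                        x_test: Sequence[Point]) -> float:
    """Estimate theta = p_test(y=+1) by energy-distance matching (binary sketch).

    Models the test density as theta * p(x|+1) + (1 - theta) * p(x|-1) and
    minimizes its empirical energy distance to x_test. The objective is a
    convex quadratic in theta, so the stationary point is clipped to [0, 1].
    """
    a_pos = mean_pairwise_distance(x_pos, x_test)  # E||x_pos - x_test||
    a_neg = mean_pairwise_distance(x_neg, x_test)  # E||x_neg - x_test||
    b_pp = mean_pairwise_distance(x_pos, x_pos)
    b_nn = mean_pairwise_distance(x_neg, x_neg)
    b_pn = mean_pairwise_distance(x_pos, x_neg)
    # Half the curvature of the objective; equals the empirical energy
    # distance between the positive and negative training samples.
    denom = 2.0 * b_pn - b_pp - b_nn
    if denom <= 0.0:
        raise ValueError("positive and negative training samples are indistinguishable")
    theta = (a_neg - a_pos + b_pn - b_nn) / denom
    return min(1.0, max(0.0, theta))


def class_weights(train_prior_pos: float, test_prior_pos: float) -> Dict[int, float]:
    """Per-class weights p_test(y) / p_train(y) for importance-weighted training."""
    return {
        +1: test_prior_pos / train_prior_pos,
        -1: (1.0 - test_prior_pos) / (1.0 - train_prior_pos),
    }


def sample_gaussian(center: Sequence[float], n: int, rng: random.Random) -> list:
    """Draw n points from an isotropic unit-variance Gaussian around center."""
    return [tuple(c + rng.gauss(0.0, 1.0) for c in center) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Balanced labeled training data (train prior p(y=+1) = 0.5).
    x_pos = sample_gaussian((2.0, 2.0), 200, rng)
    x_neg = sample_gaussian((-2.0, -2.0), 200, rng)
    # Unlabeled test data with a shifted balance: 90 positives, 210 negatives.
    x_test = sample_gaussian((2.0, 2.0), 90, rng) + sample_gaussian((-2.0, -2.0), 210, rng)
    theta = estimate_test_prior(x_pos, x_neg, x_test)
    print(f"estimated p_test(y=+1) = {theta:.3f} (true mixing proportion 0.3)")
    print("class weights:", class_weights(0.5, theta))
```

This version computes all pairwise distances in O(n^2) time. A vectorized implementation would be the natural next step for larger samples. With more than two classes, the same objective becomes a quadratic program over the probability simplex rather than a single clipped ratio.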
Hideko KAWAKUBO
Tokyo Institute of Technology
Marthinus Christoffel DU PLESSIS
The University of Tokyo
Masashi SUGIYAMA
The University of Tokyo
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Hideko KAWAKUBO, Marthinus Christoffel DU PLESSIS, Masashi SUGIYAMA, "Computationally Efficient Class-Prior Estimation under Class Balance Change Using Energy Distance," in IEICE TRANSACTIONS on Information, vol. E99-D, no. 1, pp. 176-186, January 2016, doi: 10.1587/transinf.2015EDP7212.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2015EDP7212/_p
@ARTICLE{e99-d_1_176,
author={Hideko KAWAKUBO and Marthinus Christoffel {DU PLESSIS} and Masashi SUGIYAMA},
journal={IEICE TRANSACTIONS on Information},
title={Computationally Efficient Class-Prior Estimation under Class Balance Change Using Energy Distance},
year={2016},
volume={E99-D},
number={1},
pages={176-186},
abstract={In many real-world classification problems, the class balance often changes between training and test datasets, due to sample selection bias or the non-stationarity of the environment. Naive classifier training under such changes of class balance systematically yields a biased solution. It is known that such a systematic bias can be corrected by weighted training according to the test class balance. However, the test class balance is often unknown in practice. In this paper, we consider a semi-supervised learning setup where labeled training samples and unlabeled test samples are available and propose a class balance estimator based on the energy distance. Through experiments, we demonstrate that the proposed method is computationally much more efficient than existing approaches, with comparable accuracy.},
keywords={},
doi={10.1587/transinf.2015EDP7212},
ISSN={1745-1361},
month=jan,
}
TY  - JOUR
TI  - Computationally Efficient Class-Prior Estimation under Class Balance Change Using Energy Distance
T2  - IEICE TRANSACTIONS on Information
SP  - 176
EP  - 186
AU  - Hideko KAWAKUBO
AU  - Marthinus Christoffel DU PLESSIS
AU  - Masashi SUGIYAMA
PY  - 2016
DO  - 10.1587/transinf.2015EDP7212
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E99-D
IS  - 1
JA  - IEICE TRANSACTIONS on Information
Y1  - 2016/01//
AB  - In many real-world classification problems, the class balance often changes between training and test datasets, due to sample selection bias or the non-stationarity of the environment. Naive classifier training under such changes of class balance systematically yields a biased solution. It is known that such a systematic bias can be corrected by weighted training according to the test class balance. However, the test class balance is often unknown in practice. In this paper, we consider a semi-supervised learning setup where labeled training samples and unlabeled test samples are available and propose a class balance estimator based on the energy distance. Through experiments, we demonstrate that the proposed method is computationally much more efficient than existing approaches, with comparable accuracy.
ER  -