Class Prior Estimation from Positive and Unlabeled Data

Marthinus Christoffel DU PLESSIS; Masashi SUGIYAMA

doi:10.1587/transinf.E97.D.1358

Summary:

We consider the problem of learning a classifier using only positive and unlabeled samples. In this setting, it is known that a classifier can be successfully learned if the class prior is available. However, in practice, the class prior is unknown and thus must be estimated from data. In this paper, we propose a new method to estimate the class prior by partially matching the class-conditional density of the positive class to the input density. By performing this partial matching in terms of the Pearson divergence, which we estimate directly without density estimation via lower-bound maximization, we can obtain an analytical estimator of the class prior. We further show that an existing class prior estimation method can also be interpreted as performing partial matching under the Pearson divergence, but in an indirect manner. The superiority of our direct class prior estimation method is illustrated on several benchmark datasets.
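The partial-matching idea in the summary can be illustrated with a short numerical sketch. It assumes a Gaussian-kernel linear model for the lower-bound maximization and the closed-form estimator that results from maximizing a ridge-regularized quadratic lower bound of the Pearson divergence; this form is derived from the summary's description, not quoted from the paper. The function names, the fixed bandwidth `sigma`, the regularizer `lam`, the choice of kernel centers, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def gaussian_basis(x, centers, sigma):
    """Gaussian kernel features; x is (n, d), centers is (b, d)."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def estimate_class_prior(x_pos, x_unl, centers, sigma=1.0, lam=1e-3):
    """Sketch of a Pearson-divergence partial-matching prior estimate.

    Maximizes a regularized lower bound of the divergence between
    theta * p(x | y=+1) and p(x) over a linear model, then minimizes
    the resulting quadratic in theta analytically. sigma and lam are
    fixed here as an assumption; they would normally be tuned.
    """
    phi_pos = gaussian_basis(x_pos, centers, sigma)
    phi_unl = gaussian_basis(x_unl, centers, sigma)
    h = phi_pos.mean(axis=0)                   # mean feature over positive samples
    H = phi_unl.T @ phi_unl / len(x_unl)       # second moment over unlabeled samples
    G = H + lam * np.eye(len(centers))         # ridge-regularized matrix
    g_inv_h = np.linalg.solve(G, h)
    denom = 2.0 * (h @ g_inv_h) - g_inv_h @ H @ g_inv_h
    return 1.0 / denom

# Synthetic, well-separated 1-D example (hypothetical data).
rng = np.random.default_rng(0)
true_prior = 0.3
n_unl = 3000
n_pos_in_unl = int(true_prior * n_unl)
x_pos = rng.normal(0.0, 1.0, size=(500, 1))
x_unl = np.vstack([
    rng.normal(0.0, 1.0, size=(n_pos_in_unl, 1)),          # hidden positives
    rng.normal(8.0, 1.0, size=(n_unl - n_pos_in_unl, 1)),  # hidden negatives
])
centers = x_pos[:100]  # kernel centers taken from positive samples (an assumption)

theta_hat = estimate_class_prior(x_pos, x_unl, centers)
print(f"true prior = {true_prior}, estimated prior = {theta_hat:.3f}")
```

With well-separated classes the estimate should land near the true prior; when the class-conditional densities overlap, this kind of partial matching tends to overestimate it. The fixed `sigma` and `lam` are placeholders and would need to be selected from data, for example by cross-validation.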

Publication: IEICE TRANSACTIONS on Information Vol.E97-D No.5 pp.1358-1362

Publication Date: 2014/05/01

Online ISSN: 1745-1361

DOI: 10.1587/transinf.E97.D.1358

Type of Manuscript: LETTER

Category: Artificial Intelligence, Data Mining

Authors

Marthinus Christoffel DU PLESSIS
Tokyo Institute of Technology
Masashi SUGIYAMA
Tokyo Institute of Technology

Keywords

class-prior change, outlier detection, positive and unlabeled learning, divergence estimation, Pearson divergence

Cite this

Marthinus Christoffel DU PLESSIS, Masashi SUGIYAMA, "Class Prior Estimation from Positive and Unlabeled Data" in IEICE TRANSACTIONS on Information, vol. E97-D, no. 5, pp. 1358-1362, May 2014, doi: 10.1587/transinf.E97.D.1358.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E97.D.1358/_p

@ARTICLE{e97-d_5_1358,
author={Marthinus Christoffel DU PLESSIS and Masashi SUGIYAMA},
journal={IEICE TRANSACTIONS on Information},
title={Class Prior Estimation from Positive and Unlabeled Data},
year={2014},
volume={E97-D},
number={5},
pages={1358--1362},
abstract={We consider the problem of learning a classifier using only positive and unlabeled samples. In this setting, it is known that a classifier can be successfully learned if the class prior is available. However, in practice, the class prior is unknown and thus must be estimated from data. In this paper, we propose a new method to estimate the class prior by partially matching the class-conditional density of the positive class to the input density. By performing this partial matching in terms of the Pearson divergence, which we estimate directly without density estimation via lower-bound maximization, we can obtain an analytical estimator of the class prior. We further show that an existing class prior estimation method can also be interpreted as performing partial matching under the Pearson divergence, but in an indirect manner. The superiority of our direct class prior estimation method is illustrated on several benchmark datasets.},
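keywords={class-prior change, outlier detection, positive and unlabeled learning, divergence estimation, Pearson divergence},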
doi={10.1587/transinf.E97.D.1358},
ISSN={1745-1361},
month={May},
}

TY - JOUR
TI - Class Prior Estimation from Positive and Unlabeled Data
T2 - IEICE TRANSACTIONS on Information
SP - 1358
EP - 1362
AU - Marthinus Christoffel DU PLESSIS
AU - Masashi SUGIYAMA
PY - 2014
DO - 10.1587/transinf.E97.D.1358
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E97-D
IS - 5
JA - IEICE TRANSACTIONS on Information
Y1 - May 2014
AB - We consider the problem of learning a classifier using only positive and unlabeled samples. In this setting, it is known that a classifier can be successfully learned if the class prior is available. However, in practice, the class prior is unknown and thus must be estimated from data. In this paper, we propose a new method to estimate the class prior by partially matching the class-conditional density of the positive class to the input density. By performing this partial matching in terms of the Pearson divergence, which we estimate directly without density estimation via lower-bound maximization, we can obtain an analytical estimator of the class prior. We further show that an existing class prior estimation method can also be interpreted as performing partial matching under the Pearson divergence, but in an indirect manner. The superiority of our direct class prior estimation method is illustrated on several benchmark datasets.
ER -