Extended Relief-F Algorithm for Nominal Attribute Estimation in Small-Document Classification

Heum PARK; Hyuk-Chul KWON

doi:10.1587/transinf.E92.D.2360

IEICE TRANSACTIONS on Information

Summary:

This paper presents an extended Relief-F algorithm for nominal attribute estimation, for application to small-document classification. Relief algorithms are general and successful instance-based feature-filtering algorithms for data classification and regression. Many improved Relief algorithms have been introduced to address redundant and irrelevant noisy features and the algorithms' limitations on multiclass datasets. However, these algorithms have rarely been applied to text classification, because the large number of features in multiclass datasets leads to high time complexity. With text feature filtering and classification in mind, we therefore presented an extended Relief-F algorithm for numerical attribute estimation (E-Relief-F) in 2007, but we later found that it has limitations and some problems. In this paper, we describe additional problems that Relief algorithms face in text feature filtering: the negative influence on similarity and weight computations when an instance contains only a small number of features, the absence of nearest hits and misses for some instances, and high time complexity. We then propose a new extended Relief-F algorithm for nominal attribute estimation (E-Relief-Fd) that addresses these problems, and we apply it to small text-document classification. In experiments, we used the algorithm to estimate feature quality on various datasets, applied it to classification, and compared its performance with that of existing Relief algorithms. The experimental results show that E-Relief-Fd outperforms previous Relief algorithms, including E-Relief-F.
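The summary does not spell out E-Relief-Fd itself, so for readers unfamiliar with the baseline it extends, here is a minimal Python sketch of the classic Relief-F weight update for nominal attributes (Kononenko's formulation). It is not the authors' E-Relief-Fd. Assumptions made for illustration: documents are binary term-presence vectors, `diff` is 0/1 equality, distance is Hamming distance, and every instance is used once as the sampled instance instead of drawing samples at random. The sketch also shows where one of the problems named above arises: a class with a single document has no nearest hit, so only the miss term updates the weights for that document.

```python
from collections import Counter
from typing import List, Sequence


def diff(a: int, x: Sequence[int], y: Sequence[int]) -> float:
    """Nominal diff: 0.0 if attribute a has the same value in x and y, else 1.0."""
    return 0.0 if x[a] == y[a] else 1.0


def distance(x: Sequence[int], y: Sequence[int]) -> float:
    """Hamming distance: sum of the per-attribute nominal diffs."""
    return sum(diff(a, x, y) for a in range(len(x)))


def relief_f(X: List[Sequence[int]], y: List[int], k: int = 1) -> List[float]:
    """Classic Relief-F attribute weights for nominal attributes (sketch).

    Assumption: every instance is used once as R_i (m = n) instead of
    sampling m instances at random, which keeps the result deterministic.
    """
    n, n_attrs = len(X), len(X[0])
    priors = {c: cnt / n for c, cnt in Counter(y).items()}
    W = [0.0] * n_attrs
    for i in range(n):
        r, r_cls = X[i], y[i]
        # All other instances, nearest first (ties keep index order).
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: distance(r, X[j]))
        # k nearest hits (same class); empty if r is its class's only instance.
        hits = [j for j in others if y[j] == r_cls][:k]
        for a in range(n_attrs):
            W[a] -= sum(diff(a, r, X[j]) for j in hits) / (n * k)
        # k nearest misses from each other class, weighted by that class's prior.
        for c, p_c in priors.items():
            if c == r_cls:
                continue
            misses = [j for j in others if y[j] == c][:k]
            scale = p_c / (1.0 - priors[r_cls])
            for a in range(n_attrs):
                W[a] += scale * sum(diff(a, r, X[j]) for j in misses) / (n * k)
    return W


if __name__ == "__main__":
    # Toy term-presence vectors: attribute 0 tracks the class, attribute 1 is noise.
    X = [[1, 0], [1, 1], [0, 0], [0, 1]]
    y = [1, 1, 0, 0]
    print(relief_f(X, y, k=1))  # [1.0, -1.0]
```

In the toy run, the class-tracking attribute receives the maximum weight and the noise attribute the minimum. Note that each iteration compares the sampled instance against every other instance over every attribute, which is the source of the time-complexity concern the summary raises for large text vocabularies.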

Publication: IEICE TRANSACTIONS on Information Vol.E92-D No.12 pp.2360-2368

Publication Date: 2009/12/01

Online ISSN: 1745-1361

DOI: 10.1587/transinf.E92.D.2360

Type of Manuscript: Special Section PAPER (Special Section on Natural Language Processing and its Applications)

Category: Document Analysis

Heum PARK, Hyuk-Chul KWON, "Extended Relief-F Algorithm for Nominal Attribute Estimation in Small-Document Classification" in IEICE TRANSACTIONS on Information, vol. E92-D, no. 12, pp. 2360-2368, December 2009, doi: 10.1587/transinf.E92.D.2360.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E92.D.2360/_p

@ARTICLE{e92-d_12_2360,
author={Heum PARK and Hyuk-Chul KWON},
journal={IEICE TRANSACTIONS on Information},
title={Extended Relief-F Algorithm for Nominal Attribute Estimation in Small-Document Classification},
year={2009},
volume={E92-D},
number={12},
pages={2360-2368},
keywords={},
doi={10.1587/transinf.E92.D.2360},
ISSN={1745-1361},
month={December},}

TY - JOUR
TI - Extended Relief-F Algorithm for Nominal Attribute Estimation in Small-Document Classification
T2 - IEICE TRANSACTIONS on Information
SP - 2360
EP - 2368
AU - Heum PARK
AU - Hyuk-Chul KWON
PY - 2009
DO - 10.1587/transinf.E92.D.2360
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E92-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2009
ER -
