Generating Category Hierarchy for Classifying Large Corpora

Fumiyo FUKUMOTO; Yoshimi SUZUKI

doi:10.1093/ietisy/e89-d.4.1543

IEICE TRANSACTIONS on Information

Summary:

We address the problem of dealing with large collections of data and investigate whether automatically constructed, domain-specific category hierarchies improve text classification. We use two well-known techniques to create the category hierarchy: k-means, a partitioning clustering method, and a loss function. The k-means method involves iterating through the data that the system is permitted to classify during each iteration, and constructing a hierarchical structure. In general, the number of clusters k is not given beforehand. We therefore use a loss function, which measures the degree of disappointment in any difference between the true distribution over inputs and the learner's prediction, to select an appropriate number of clusters k. Once k is selected, the procedure is repeated for each cluster. Our evaluation on the 1996 Reuters corpus, which consists of 806,791 documents, showed that automatically constructed hierarchies improve classification accuracy.
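
The procedure described in the summary (cluster with k-means, pick k by a loss function, then repeat inside each cluster) can be illustrated with a minimal sketch. This is not the authors' implementation. The names `kmeans`, `placeholder_loss`, and `build_hierarchy` are hypothetical, the farthest-first initialization is a choice made here for determinism, and the loss is a stand-in (within-cluster squared distance plus a per-cluster penalty) because the summary does not give the paper's loss in closed form. Allowing k = 1 as a stopping rule is likewise an assumption.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def kmeans(points: Sequence[Point], k: int, iters: int = 50):
    """Plain k-means with deterministic farthest-first initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    assign = [0] * len(points)
    for _ in range(iters):
        new_assign = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        for j in range(k):
            members = [p for p, a in zip(points, new_assign) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)
        if new_assign == assign:
            break
        assign = new_assign
    return assign, centroids


def placeholder_loss(points, assign, centroids, penalty: float = 1.0) -> float:
    """ASSUMPTION: stand-in for the paper's loss function.
    Within-cluster squared distance plus a penalty per non-empty cluster."""
    sse = sum(sq_dist(p, centroids[a]) for p, a in zip(points, assign))
    return sse + penalty * len(set(assign))


def build_hierarchy(
    labels: Sequence[str],
    points: Sequence[Point],
    k_max: int = 4,
    loss: Callable = placeholder_loss,
) -> Dict:
    """Choose k by minimum loss, split, and repeat inside each cluster."""
    node = {"items": list(labels), "children": []}
    if len(points) < 2:
        return node
    best_loss, best_assign = None, None
    for k in range(1, min(k_max, len(points)) + 1):
        assign, centroids = kmeans(points, k)
        value = loss(points, assign, centroids)
        if best_loss is None or value < best_loss:
            best_loss, best_assign = value, assign
    groups = sorted(set(best_assign))
    if len(groups) < 2:  # k = 1 won, or only one cluster is non-empty: leaf
        return node
    for g in groups:
        idx = [i for i, a in enumerate(best_assign) if a == g]
        node["children"].append(
            build_hierarchy(
                [labels[i] for i in idx], [points[i] for i in idx], k_max, loss
            )
        )
    return node


if __name__ == "__main__":
    # Toy vectors standing in for category representations (hypothetical data).
    tree = build_hierarchy(
        ["a", "b", "c", "d"],
        [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)],
    )
    print(tree)
```

The loss is passed in as a parameter so that a classification-based loss, as the summary describes, could replace the placeholder without changing the recursion.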

Publication: IEICE TRANSACTIONS on Information Vol.E89-D No.4 pp.1543-1554

Publication Date: 2006/04/01

Online ISSN: 1745-1361

DOI: 10.1093/ietisy/e89-d.4.1543

Type of Manuscript: PAPER

Category: Natural Language Processing

Fumiyo FUKUMOTO, Yoshimi SUZUKI, "Generating Category Hierarchy for Classifying Large Corpora" in IEICE TRANSACTIONS on Information, vol. E89-D, no. 4, pp. 1543-1554, April 2006, doi: 10.1093/ietisy/e89-d.4.1543.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e89-d.4.1543/_p

@ARTICLE{e89-d_4_1543,
author={Fumiyo FUKUMOTO and Yoshimi SUZUKI},
journal={IEICE TRANSACTIONS on Information},
title={Generating Category Hierarchy for Classifying Large Corpora},
year={2006},
volume={E89-D},
number={4},
pages={1543--1554},
doi={10.1093/ietisy/e89-d.4.1543},
ISSN={1745-1361},
month={April},
}

TY  - JOUR
TI  - Generating Category Hierarchy for Classifying Large Corpora
T2  - IEICE TRANSACTIONS on Information
SP  - 1543
EP  - 1554
AU  - Fumiyo FUKUMOTO
AU  - Yoshimi SUZUKI
PY  - 2006
DO  - 10.1093/ietisy/e89-d.4.1543
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E89-D
IS  - 4
JA  - IEICE TRANSACTIONS on Information
Y1  - 2006/04/01
ER  - 
