Short Text Classification Based on Distributional Representations of Words

Chenglong MA; Qingwei ZHAO; Jielin PAN; Yonghong YAN

doi:10.1587/transinf.2016SLL0006

Summary:

Short texts usually encounter the problem of data sparseness, as they do not provide sufficient term co-occurrence information. In this paper, we show how to mitigate the problem in short text classification through word embeddings. We assume that a short text document is a specific sample of one distribution in a Gaussian-Bayesian framework. Furthermore, a fast clustering algorithm is utilized to expand and enrich the context of short text in embedding space. This approach is compared with those based on the classical bag-of-words approaches and neural network based methods. Experimental results validate the effectiveness of the proposed method.
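
The summary does not give the model's details, so the following is only a minimal sketch of one possible reading of "a short text document is a sample of one distribution": each class is modeled as a diagonal Gaussian over word embeddings, and a document is assigned to the class whose Gaussian gives its word vectors the highest total log-likelihood. The function names, the diagonal covariance, the variance floor `reg`, and the optional log prior are all assumptions made for illustration, not the authors' implementation, and the clustering-based context expansion is not shown.

```python
import numpy as np

def fit_class_gaussians(docs, labels, reg=1e-3):
    """Fit one diagonal Gaussian per class (hypothetical helper).

    docs   : list of arrays, each of shape (n_words, dim), holding the
             embeddings of the words in one short text
    labels : list of class labels, one per document
    reg    : variance floor (an assumption, added to avoid division by zero)
    """
    params = {}
    for c in sorted(set(labels)):
        # Pool the word vectors of every document in class c
        X = np.vstack([d for d, y in zip(docs, labels) if y == c])
        params[c] = (X.mean(axis=0), X.var(axis=0) + reg)
    return params

def log_likelihood(doc, mean, var):
    """Sum of diagonal-Gaussian log densities over the words of one document."""
    diff = doc - mean
    return float(np.sum(-0.5 * (np.log(2.0 * np.pi * var) + diff ** 2 / var)))

def classify(doc, params, log_prior=None):
    """Return the class with the highest log prior plus log-likelihood."""
    scores = {}
    for c, (mean, var) in params.items():
        prior = log_prior[c] if log_prior is not None else 0.0
        scores[c] = prior + log_likelihood(doc, mean, var)
    return max(scores, key=scores.get)
```

In practice the rows of each document array would come from pretrained word embeddings; the sketch only assumes they are real-valued vectors of equal dimension.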

Publication: IEICE TRANSACTIONS on Information Vol.E99-D No.10 pp.2562-2565

Publication Date: 2016/10/01

Publicized: 2016/07/19

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2016SLL0006

Type of Manuscript: Special Section LETTER (Special Section on Recent Advances in Machine Learning for Spoken Language Processing)

Category: Text classification

Authors

Chenglong MA
  Chinese Academy of Sciences
Qingwei ZHAO
  Chinese Academy of Sciences
Jielin PAN
  Chinese Academy of Sciences
Yonghong YAN
  Chinese Academy of Sciences

Keyword

short text classification, word embedding, Gaussian model

Cite this

Chenglong MA, Qingwei ZHAO, Jielin PAN, Yonghong YAN, "Short Text Classification Based on Distributional Representations of Words" in IEICE TRANSACTIONS on Information, vol. E99-D, no. 10, pp. 2562-2565, October 2016, doi: 10.1587/transinf.2016SLL0006.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2016SLL0006/_p

@ARTICLE{e99-d_10_2562,
author={Chenglong MA and Qingwei ZHAO and Jielin PAN and Yonghong YAN},
journal={IEICE TRANSACTIONS on Information},
title={Short Text Classification Based on Distributional Representations of Words},
year={2016},
volume={E99-D},
number={10},
pages={2562--2565},
abstract={Short texts usually encounter the problem of data sparseness, as they do not provide sufficient term co-occurrence information. In this paper, we show how to mitigate the problem in short text classification through word embeddings. We assume that a short text document is a specific sample of one distribution in a Gaussian-Bayesian framework. Furthermore, a fast clustering algorithm is utilized to expand and enrich the context of short text in embedding space. This approach is compared with those based on the classical bag-of-words approaches and neural network based methods. Experimental results validate the effectiveness of the proposed method.},
keywords={short text classification, word embedding, Gaussian model},
doi={10.1587/transinf.2016SLL0006},
ISSN={1745-1361},
month={October}
}

TY - JOUR
TI - Short Text Classification Based on Distributional Representations of Words
T2 - IEICE TRANSACTIONS on Information
SP - 2562
EP - 2565
AU - Chenglong MA
AU - Qingwei ZHAO
AU - Jielin PAN
AU - Yonghong YAN
PY - 2016
DO - 10.1587/transinf.2016SLL0006
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E99-D
IS - 10
JA - IEICE TRANSACTIONS on Information
Y1 - October 2016
AB - Short texts usually encounter the problem of data sparseness, as they do not provide sufficient term co-occurrence information. In this paper, we show how to mitigate the problem in short text classification through word embeddings. We assume that a short text document is a specific sample of one distribution in a Gaussian-Bayesian framework. Furthermore, a fast clustering algorithm is utilized to expand and enrich the context of short text in embedding space. This approach is compared with those based on the classical bag-of-words approaches and neural network based methods. Experimental results validate the effectiveness of the proposed method.
ER -