Short texts usually suffer from data sparseness because they do not provide sufficient term co-occurrence information. In this paper, we show how word embeddings can mitigate this problem in short text classification. We assume that a short text document is a specific sample of one distribution in a Gaussian-Bayesian framework. Furthermore, a fast clustering algorithm is used to expand and enrich the context of short texts in the embedding space. The approach is compared with classical bag-of-words approaches and neural-network-based methods. Experimental results validate the effectiveness of the proposed method.
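The abstract gives no model equations, so the following is only a minimal sketch under stated assumptions and not the authors' implementation. It assumes each class is a diagonal Gaussian over word embeddings. A short text's word vectors are treated as independent samples and scored by log prior plus summed log-likelihood. Plain k-means stands in for the unspecified "fast clustering algorithm", and each text is expanded with the centroids nearest its words. The function names (`fit_class_gaussians`, `log_likelihood`, `classify`, `kmeans`, `expand`), the `var_floor` parameter, and the toy 2-D "embeddings" are all hypothetical.

```python
import numpy as np

# ASSUMPTION: a hypothetical Gaussian-Bayesian classifier over word embeddings,
# not the method from the paper. Each document is an (n_words, dim) array.

def fit_class_gaussians(docs, labels, var_floor=1e-3):
    """Fit one diagonal Gaussian per class over all word vectors of that class."""
    params = {}
    for c in sorted(set(labels)):
        vecs = np.vstack([d for d, y in zip(docs, labels) if y == c])
        mean = vecs.mean(axis=0)
        var = vecs.var(axis=0) + var_floor  # floor keeps every variance positive
        prior = sum(1 for y in labels if y == c) / len(labels)
        params[c] = (mean, var, np.log(prior))
    return params

def log_likelihood(vecs, mean, var):
    """Sum of per-word diagonal-Gaussian log densities (words treated as i.i.d.)."""
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var) + (vecs - mean) ** 2 / var)))

def classify(doc, params):
    """Return the class maximizing log prior + log-likelihood of the doc's words."""
    scores = {c: log_prior + log_likelihood(doc, mean, var)
              for c, (mean, var, log_prior) in params.items()}
    return max(scores, key=scores.get)

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means, a stand-in for the paper's fast clustering step."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]  # a copy
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = points[assign == j]
            if len(members):  # leave a centroid in place if its cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def expand(doc, centroids):
    """ASSUMPTION: enrich a short text by appending the centroids nearest its words."""
    dists = ((doc[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = np.unique(dists.argmin(axis=1))
    return np.vstack([doc, centroids[nearest]])

# Toy usage: class 0 words lie near (0, 0), class 1 words near (5, 5).
rng = np.random.default_rng(1)
docs = ([rng.normal(0.0, 0.5, size=(3, 2)) for _ in range(5)]
        + [rng.normal(5.0, 0.5, size=(3, 2)) for _ in range(5)])
labels = [0] * 5 + [1] * 5
params = fit_class_gaussians(docs, labels)
centroids = kmeans(np.vstack(docs), k=2)
query = np.array([[4.8, 5.1]])  # a one-word "short text"
print(classify(expand(query, centroids), params))
```

A diagonal covariance keeps fitting and scoring linear in the embedding dimension. `var_floor` guards against zero variance when a class has very few training words.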
Chenglong MA
Chinese Academy of Sciences
Qingwei ZHAO
Chinese Academy of Sciences
Jielin PAN
Chinese Academy of Sciences
Yonghong YAN
Chinese Academy of Sciences
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Chenglong MA, Qingwei ZHAO, Jielin PAN, Yonghong YAN, "Short Text Classification Based on Distributional Representations of Words," in IEICE TRANSACTIONS on Information and Systems, vol. E99-D, no. 10, pp. 2562-2565, October 2016, doi: 10.1587/transinf.2016SLL0006.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2016SLL0006/_p
@ARTICLE{e99-d_10_2562,
author={Chenglong MA and Qingwei ZHAO and Jielin PAN and Yonghong YAN},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Short Text Classification Based on Distributional Representations of Words},
year={2016},
volume={E99-D},
number={10},
pages={2562--2565},
abstract={Short texts usually encounter the problem of data sparseness, as they do not provide sufficient term co-occurrence information. In this paper, we show how to mitigate the problem in short text classification through word embeddings. We assume that a short text document is a specific sample of one distribution in a Gaussian-Bayesian framework. Furthermore, a fast clustering algorithm is utilized to expand and enrich the context of short text in embedding space. This approach is compared with those based on the classical bag-of-words approaches and neural network based methods. Experimental results validate the effectiveness of the proposed method.},
keywords={},
doi={10.1587/transinf.2016SLL0006},
ISSN={1745-1361},
month=oct,
}
TY  - JOUR
TI  - Short Text Classification Based on Distributional Representations of Words
T2  - IEICE TRANSACTIONS on Information and Systems
SP  - 2562
EP  - 2565
AU  - MA, Chenglong
AU  - ZHAO, Qingwei
AU  - PAN, Jielin
AU  - YAN, Yonghong
PY  - 2016
DO  - 10.1587/transinf.2016SLL0006
JO  - IEICE TRANSACTIONS on Information and Systems
SN  - 1745-1361
VL  - E99-D
IS  - 10
JA  - IEICE Trans. Inf. & Syst.
Y1  - 2016/10//
AB  - Short texts usually encounter the problem of data sparseness, as they do not provide sufficient term co-occurrence information. In this paper, we show how to mitigate the problem in short text classification through word embeddings. We assume that a short text document is a specific sample of one distribution in a Gaussian-Bayesian framework. Furthermore, a fast clustering algorithm is utilized to expand and enrich the context of short text in embedding space. This approach is compared with those based on the classical bag-of-words approaches and neural network based methods. Experimental results validate the effectiveness of the proposed method.
ER  - 