New Word Detection Using BiLSTM+CRF Model with Features

Jianyong DUAN; Zheng TAN; Mei ZHANG; Hao WANG

doi:10.1587/transinf.2019EDP7330

IEICE TRANSACTIONS on Information

Summary:

With the widespread popularity of social platforms, an increasing number of new words are appearing. Such new words make NLP tasks like word segmentation more challenging, so new word detection remains an important and difficult task in NLP. This paper aims to extract new words using a BiLSTM+CRF model augmented with hand-selected features: word length, part of speech (POS), contextual entropy, and degree of word coagulation. Compared with traditional new word detection methods, our method can use both the features extracted by the model and the features we select to find new words. Experimental results demonstrate that our model performs better than the benchmark models.
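
The summary names contextual entropy and degree of word coagulation without defining them. As a rough illustration only, the Python sketch below uses definitions that are common in the new word detection literature: the Shannon entropy of the characters adjacent to a candidate, and the smallest ratio P(word) / (P(a) P(b)) over all two-way splits of the candidate. These formulas, the function names `branch_entropy` and `coagulation`, and the character-level counting are assumptions made for this example; the paper's own definitions may differ.

```python
import math
from collections import Counter


def branch_entropy(text, word):
    """Return (left, right) Shannon entropy of the characters adjacent to `word`.

    Assumed definition: entropy of the neighbouring-character distribution,
    counted over all (possibly overlapping) occurrences of `word` in `text`.
    """
    left, right = Counter(), Counter()
    start = text.find(word)
    while start != -1:
        if start > 0:
            left[text[start - 1]] += 1
        end = start + len(word)
        if end < len(text):
            right[text[end]] += 1
        start = text.find(word, start + 1)

    def entropy(counts):
        total = sum(counts.values())
        if total == 0:
            return 0.0
        return -sum(c / total * math.log2(c / total) for c in counts.values())

    return entropy(left), entropy(right)


def coagulation(text, word):
    """Smallest P(word) / (P(a) * P(b)) over every two-way split a + b of `word`.

    Assumed definition: probabilities are relative frequencies of substrings
    among all windows of the same length in `text`.
    """
    if len(word) < 2:
        raise ValueError("word must have at least two characters")

    def prob(s):
        windows = len(text) - len(s) + 1
        if windows <= 0:
            return 0.0
        return sum(text.startswith(s, i) for i in range(windows)) / windows

    p_word = prob(word)
    if p_word == 0.0:
        return 0.0  # candidate never occurs, so there is nothing to score
    return min(
        p_word / (prob(word[:k]) * prob(word[k:]))
        for k in range(1, len(word))
    )
```

A candidate that appears next to many different characters (high entropy on both sides) and whose parts rarely occur apart (high coagulation) is a plausible word boundary. In a feature-augmented tagger, scores like these would typically be discretized and concatenated with the character embeddings, though how the paper feeds them to the BiLSTM+CRF is not stated in this summary.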

Publication: IEICE TRANSACTIONS on Information Vol.E103-D No.10 pp.2228-2236

Publication Date: 2020/10/01

Publicized: 2020/07/14

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2019EDP7330

Type of Manuscript: PAPER

Category: Natural Language Processing

Authors

Jianyong DUAN
  North China University of Technology, CNONIX National Standard Application and Promotion Lab
Zheng TAN
  North China University of Technology, CNONIX National Standard Application and Promotion Lab
Mei ZHANG
  North China University of Technology
Hao WANG
  North China University of Technology, CNONIX National Standard Application and Promotion Lab

Keywords

new word detection, BiLSTM

Cite this

Jianyong DUAN, Zheng TAN, Mei ZHANG, Hao WANG, "New Word Detection Using BiLSTM+CRF Model with Features" in IEICE TRANSACTIONS on Information, vol. E103-D, no. 10, pp. 2228-2236, October 2020, doi: 10.1587/transinf.2019EDP7330.
Abstract: With the widespread popularity of a large number of social platforms, an increasing number of new words gradually appear. However, such new words have made some NLP tasks like word segmentation more challenging. Therefore, new word detection is always an important and tough task in NLP. This paper aims to extract new words using the BiLSTM+CRF model which added some features selected by us. These features include word length, part of speech (POS), contextual entropy and degree of word coagulation. Comparing to the traditional new word detection methods, our method can use both the features extracted by the model and the features we select to find new words. Experimental results demonstrate that our model can perform better compared to the benchmark models.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2019EDP7330/_p

@ARTICLE{e103-d_10_2228,
author={Jianyong DUAN and Zheng TAN and Mei ZHANG and Hao WANG},
journal={IEICE TRANSACTIONS on Information},
title={New Word Detection Using {BiLSTM+CRF} Model with Features},
year={2020},
volume={E103-D},
number={10},
pages={2228-2236},
abstract={With the widespread popularity of a large number of social platforms, an increasing number of new words gradually appear. However, such new words have made some NLP tasks like word segmentation more challenging. Therefore, new word detection is always an important and tough task in NLP. This paper aims to extract new words using the BiLSTM+CRF model which added some features selected by us. These features include word length, part of speech (POS), contextual entropy and degree of word coagulation. Comparing to the traditional new word detection methods, our method can use both the features extracted by the model and the features we select to find new words. Experimental results demonstrate that our model can perform better compared to the benchmark models.},
keywords={new word detection, BiLSTM},
doi={10.1587/transinf.2019EDP7330},
ISSN={1745-1361},
month={October}
}

TY - JOUR
TI - New Word Detection Using BiLSTM+CRF Model with Features
T2 - IEICE TRANSACTIONS on Information
SP - 2228
EP - 2236
AU - Jianyong DUAN
AU - Zheng TAN
AU - Mei ZHANG
AU - Hao WANG
PY - 2020
DO - 10.1587/transinf.2019EDP7330
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E103-D
IS - 10
JA - IEICE TRANSACTIONS on Information
Y1 - October 2020
AB - With the widespread popularity of a large number of social platforms, an increasing number of new words gradually appear. However, such new words have made some NLP tasks like word segmentation more challenging. Therefore, new word detection is always an important and tough task in NLP. This paper aims to extract new words using the BiLSTM+CRF model which added some features selected by us. These features include word length, part of speech (POS), contextual entropy and degree of word coagulation. Comparing to the traditional new word detection methods, our method can use both the features extracted by the model and the features we select to find new words. Experimental results demonstrate that our model can perform better compared to the benchmark models.
ER -