IEICE global.ieice.org Site

Author Search Result

[Author] Hyunyoung LEE (1 hit)

Contextualized Character Embedding with Multi-Sequence LSTM for Automatic Word Segmentation
Hyunyoung LEE, Seungshik KANG

PAPER-Natural Language Processing

Publicized: 2020/08/19
Vol: E103-D No: 11
Page(s): 2371-2378
Contextual information is a crucial factor in natural language processing tasks such as sequence labeling. Previous studies on contextualized embedding and word embedding have explored the context of word-level tokens in order to obtain useful features of languages. However, unlike in English, the fundamental task in East Asian languages involves character-level tokens. In this paper, we propose a contextualized character embedding method that uses n-gram multi-sequence information with long short-term memory (LSTM). We hypothesize that contextualized embeddings on multi-sequences help each other deal with long-term contextual information, such as the notion of spans and boundaries of segmentation. The analysis shows that the contextualized embedding of bigram character sequences encodes the notion of spans and boundaries for word segmentation better than that of unigram character sequences. We find that combining the contextualized embeddings from both unigram and bigram character sequences at the output layer of the LSTMs, rather than at the input layer, improves word segmentation performance. The comparison shows that our proposed method outperforms the previous models.
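
To make the idea of multi-sequence input concrete, here is a minimal sketch in Python, not the authors' implementation. It builds a unigram sequence and a bigram sequence with one token per character position, then joins two per-position vector sequences by concatenation, as a stand-in for combining at the output layer. The padding symbol, the function names `char_ngrams` and `combine_at_output`, the choice of concatenation, and the example sentence are all assumptions made for illustration; the LSTM encoders themselves are omitted.

```python
from typing import List

PAD = "</s>"  # hypothetical padding symbol, not taken from the paper


def char_ngrams(text: str, n: int) -> List[str]:
    """Return one character n-gram per character position, padded on the right."""
    chars = list(text) + [PAD] * (n - 1)
    return ["".join(chars[i:i + n]) for i in range(len(text))]


def combine_at_output(uni_states: List[List[float]],
                      bi_states: List[List[float]]) -> List[List[float]]:
    """Concatenate per-position vectors produced by two separate encoders.

    Concatenation is an assumed combination rule; the abstract only says
    the embeddings are combined at the output layer.
    """
    if len(uni_states) != len(bi_states):
        raise ValueError("both sequences need one vector per character")
    return [u + b for u, b in zip(uni_states, bi_states)]


# Illustrative unsegmented input (an assumption, not data from the paper).
sentence = "나는학교에간다"
unigrams = char_ngrams(sentence, 1)  # one token per character
bigrams = char_ngrams(sentence, 2)   # same length, each token spans two characters
```

Because both sequences have the same length, each could be fed to its own encoder and the resulting vectors aligned position by position before the segmentation decision is made.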
