IEICE TRANSACTIONS on Information

Effective Language Representations for Danmaku Comment Classification in Nicovideo

Hiroyoshi NAGAO, Koshiro TAMURA, Marie KATSURAI

Summary:

Danmaku commenting has become popular for co-viewing on video-sharing platforms such as Nicovideo. However, irrelevant comments often degrade the quality of the information that videos provide. This information pollution problem can be addressed by a comment classifier trained with an abstention option, which detects comments whose video categories are unclear. To improve performance on this classification task, this paper presents Nicovideo-specific language representations. Specifically, we used sentences from Nicopedia, a Japanese online encyclopedia of entities that can appear in Nicovideo content, to pre-train a Bidirectional Encoder Representations from Transformers (BERT) model. The resulting model, named Nicopedia BERT, is then fine-tuned to determine whether a given comment falls into any of the predefined categories. Experiments conducted on Nicovideo comment data demonstrated the effectiveness of Nicopedia BERT compared with existing BERT models pre-trained on Wikipedia or tweets. We also evaluated each model on an additional sentiment classification task, and the results implied the applicability of Nicopedia BERT as a feature extractor for other social media text.
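The abstract describes a classifier with an abstention option that declines to predict when a comment's category is unclear. The paper's exact mechanism is not given here; a common way to realize abstention is confidence thresholding on the softmax output, sketched below (the labels, threshold value, and function names are illustrative assumptions, not from the paper).

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D array of class logits.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def classify_with_abstention(logits, labels, threshold=0.7):
    """Return the predicted label, or None (abstain) when the
    classifier's top-class confidence falls below the threshold.
    The threshold of 0.7 is an illustrative choice, not the paper's."""
    probs = softmax(np.asarray(logits, dtype=float))
    best = int(np.argmax(probs))
    if probs[best] < threshold:
        return None  # category unclear -> abstain
    return labels[best]

# Hypothetical video categories for illustration.
labels = ["music", "game", "anime"]
print(classify_with_abstention([4.0, 0.5, 0.2], labels))  # confident -> "music"
print(classify_with_abstention([1.0, 0.9, 0.8], labels))  # low confidence -> None
```

In practice the logits would come from the fine-tuned Nicopedia BERT's classification head; only the thresholding step is shown here.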

Publication
IEICE TRANSACTIONS on Information Vol.E106-D No.5 pp.838-846
Publication Date
2023/05/01
Publicized
2023/01/16
Online ISSN
1745-1361
DOI
10.1587/transinf.2022DAP0010
Type of Manuscript
Special Section PAPER (Special Section on Data Engineering and Information Management)
Category

Authors

Hiroyoshi NAGAO
  Doshisha University
Koshiro TAMURA
  Doshisha University
Marie KATSURAI
  Doshisha University

Keyword