Effective Language Representations for Danmaku Comment Classification in Nicovideo

Hiroyoshi NAGAO; Koshiro TAMURA; Marie KATSURAI

doi:10.1587/transinf.2022DAP0010

IEICE TRANSACTIONS on Information

Summary:

Danmaku commenting has become popular for co-viewing on video-sharing platforms such as Nicovideo. However, many irrelevant comments degrade the quality of the information provided by videos. This information pollution problem can be addressed by a comment classifier trained with an abstention option, which detects comments whose video categories are unclear. To improve performance on this classification task, this paper presents Nicovideo-specific language representations. Specifically, we used sentences from Nicopedia, a Japanese online encyclopedia of entities that may appear in Nicovideo content, to pre-train a Bidirectional Encoder Representations from Transformers (BERT) model. The resulting model, named Nicopedia BERT, is then fine-tuned to determine whether a given comment falls into any of the predefined categories. Experiments conducted on Nicovideo comment data demonstrated the effectiveness of Nicopedia BERT compared with existing BERT models pre-trained on Wikipedia or tweets. We also evaluated each model on an additional sentiment classification task, and the results suggested that Nicopedia BERT is applicable as a feature extractor for other social media text.

Publication: IEICE TRANSACTIONS on Information Vol.E106-D No.5 pp.838-846

Publication Date: 2023/05/01

Publicized: 2023/01/16

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2022DAP0010

Type of Manuscript: Special Section PAPER (Special Section on Data Engineering and Information Management)

Authors

Hiroyoshi NAGAO
  Doshisha University
Koshiro TAMURA
  Doshisha University
Marie KATSURAI
  Doshisha University

Keywords

comment classification, Danmaku, BERT, Nicovideo

Cite this

Hiroyoshi NAGAO, Koshiro TAMURA, Marie KATSURAI, "Effective Language Representations for Danmaku Comment Classification in Nicovideo" in IEICE TRANSACTIONS on Information, vol. E106-D, no. 5, pp. 838-846, May 2023, doi: 10.1587/transinf.2022DAP0010.
Abstract: Danmaku commenting has become popular for co-viewing on video-sharing platforms, such as Nicovideo. However, many irrelevant comments usually contaminate the quality of the information provided by videos. Such an information pollutant problem can be solved by a comment classifier trained with an abstention option, which detects comments whose video categories are unclear. To improve the performance of this classification task, this paper presents Nicovideo-specific language representations. Specifically, we used sentences from Nicopedia, a Japanese online encyclopedia of entities that possibly appear in Nicovideo contents, to pre-train a bidirectional encoder representations from Transformers (BERT) model. The resulting model named Nicopedia BERT is then fine-tuned such that it could determine whether a given comment falls into any of predefined categories. The experiments conducted on Nicovideo comment data demonstrated the effectiveness of Nicopedia BERT compared with existing BERT models pre-trained using Wikipedia or tweets. We also evaluated the performance of each model in an additional sentiment classification task, and the obtained results implied the applicability of Nicopedia BERT as a feature extractor of other social media text.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2022DAP0010/_p

@ARTICLE{e106-d_5_838,
author={Hiroyoshi NAGAO and Koshiro TAMURA and Marie KATSURAI},
journal={IEICE TRANSACTIONS on Information},
title={Effective Language Representations for Danmaku Comment Classification in Nicovideo},
year={2023},
volume={E106-D},
number={5},
pages={838-846},
abstract={Danmaku commenting has become popular for co-viewing on video-sharing platforms, such as Nicovideo. However, many irrelevant comments usually contaminate the quality of the information provided by videos. Such an information pollutant problem can be solved by a comment classifier trained with an abstention option, which detects comments whose video categories are unclear. To improve the performance of this classification task, this paper presents Nicovideo-specific language representations. Specifically, we used sentences from Nicopedia, a Japanese online encyclopedia of entities that possibly appear in Nicovideo contents, to pre-train a bidirectional encoder representations from Transformers (BERT) model. The resulting model named Nicopedia BERT is then fine-tuned such that it could determine whether a given comment falls into any of predefined categories. The experiments conducted on Nicovideo comment data demonstrated the effectiveness of Nicopedia BERT compared with existing BERT models pre-trained using Wikipedia or tweets. We also evaluated the performance of each model in an additional sentiment classification task, and the obtained results implied the applicability of Nicopedia BERT as a feature extractor of other social media text.},
keywords={comment classification, Danmaku, BERT, Nicovideo},
doi={10.1587/transinf.2022DAP0010},
ISSN={1745-1361},
month={May},}

TY - JOUR
TI - Effective Language Representations for Danmaku Comment Classification in Nicovideo
T2 - IEICE TRANSACTIONS on Information
SP - 838
EP - 846
AU - Hiroyoshi NAGAO
AU - Koshiro TAMURA
AU - Marie KATSURAI
PY - 2023
DO - 10.1587/transinf.2022DAP0010
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E106-D
IS - 5
JA - IEICE TRANSACTIONS on Information
Y1 - May 2023
AB - Danmaku commenting has become popular for co-viewing on video-sharing platforms, such as Nicovideo. However, many irrelevant comments usually contaminate the quality of the information provided by videos. Such an information pollutant problem can be solved by a comment classifier trained with an abstention option, which detects comments whose video categories are unclear. To improve the performance of this classification task, this paper presents Nicovideo-specific language representations. Specifically, we used sentences from Nicopedia, a Japanese online encyclopedia of entities that possibly appear in Nicovideo contents, to pre-train a bidirectional encoder representations from Transformers (BERT) model. The resulting model named Nicopedia BERT is then fine-tuned such that it could determine whether a given comment falls into any of predefined categories. The experiments conducted on Nicovideo comment data demonstrated the effectiveness of Nicopedia BERT compared with existing BERT models pre-trained using Wikipedia or tweets. We also evaluated the performance of each model in an additional sentiment classification task, and the obtained results implied the applicability of Nicopedia BERT as a feature extractor of other social media text.
ER -
