Paragraph segmentation is a text segmentation task. Iikura et al. achieved excellent results on paragraph segmentation by introducing focal loss to Bidirectional Encoder Representations from Transformers (BERT). In this study, we investigated paragraph segmentation on the Daily News and Novel datasets. Building on the approach proposed by Iikura et al., we trained the model with an auxiliary loss to improve paragraph segmentation performance. On the Daily News dataset, the average F1-score obtained by the approach of Iikura et al. was 0.6704, whereas that of our approach was 0.6801, an improvement of approximately 0.01 (about one percentage point). The performance improvement was also confirmed on the Novel dataset. Furthermore, two-tailed paired t-tests indicated that the difference in performance between the two approaches was statistically significant.
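As a rough illustration of the two loss terms named in the abstract, the sketch below shows the standard binary focal loss and one common way an auxiliary loss is combined with a main loss (a weighted sum). The abstract does not specify the auxiliary task or how the terms are weighted, so `combined_loss`, `aux_weight`, and the default `gamma = 2.0` are assumptions for illustration only, not the authors' implementation. The class-balancing factor often used with focal loss is omitted for brevity.

```python
import math


def focal_loss(p_boundary: float, is_boundary: bool, gamma: float = 2.0) -> float:
    """Binary focal loss for one candidate position: -(1 - p_t)**gamma * log(p_t).

    p_boundary is the predicted probability that a paragraph boundary occurs here.
    With gamma = 0 this reduces to ordinary cross-entropy.
    """
    # p_t is the probability the model assigned to the true class.
    p_t = p_boundary if is_boundary else 1.0 - p_boundary
    # Clamp to avoid log(0) on extreme predictions.
    p_t = min(max(p_t, 1e-12), 1.0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def combined_loss(main_loss: float, aux_loss: float, aux_weight: float = 0.5) -> float:
    """Hypothetical weighted sum of a main loss and an auxiliary loss.

    The weighting scheme and the value of aux_weight are assumptions.
    """
    return main_loss + aux_weight * aux_loss
```

The `(1 - p_t)**gamma` factor shrinks the loss for positions the model already classifies confidently, so training concentrates on the harder, rarer boundary positions.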
Binggang ZHUO
Tottori University
Masaki MURATA
Tottori University
Qing MA
Ryukoku University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Binggang ZHUO, Masaki MURATA, Qing MA, "Auxiliary Loss for BERT-Based Paragraph Segmentation" in IEICE TRANSACTIONS on Information, vol. E106-D, no. 1, pp. 58-67, January 2023, doi: 10.1587/transinf.2022EDP7083.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2022EDP7083/_p
@ARTICLE{e106-d_1_58,
author={Binggang ZHUO and Masaki MURATA and Qing MA},
journal={IEICE TRANSACTIONS on Information},
title={Auxiliary Loss for BERT-Based Paragraph Segmentation},
year={2023},
volume={E106-D},
number={1},
pages={58-67},
keywords={},
doi={10.1587/transinf.2022EDP7083},
ISSN={1745-1361},
month={January},
}
TY - JOUR
TI - Auxiliary Loss for BERT-Based Paragraph Segmentation
T2 - IEICE TRANSACTIONS on Information
SP - 58
EP - 67
AU - Binggang ZHUO
AU - Masaki MURATA
AU - Qing MA
PY - 2023
DO - 10.1587/transinf.2022EDP7083
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E106-D
IS - 1
JA - IEICE TRANSACTIONS on Information
Y1 - January 2023
ER -