Speech recognition is a technique that recognizes words and sentences in audio form and converts them into text. With recent advances in deep learning, speech recognition has achieved very satisfactory results, close to human ability. However, the recognition output still has limitations, such as missing punctuation and capitalization and unnormalized numerical data. Vietnamese also contains local words, homonyms, etc., which make the recognition results difficult for users to read and understand and complicate downstream Natural Language Processing (NLP) tasks. In this paper, we propose combining a transformer decoder with a conditional random field (CRF) to restore punctuation and capitalization in Vietnamese automatic speech recognition (ASR) output. Chunking the input sentences and merging the output sequences makes it possible to handle longer strings with greater accuracy. Experiments show that the proposed method delivers the best results on the Vietnamese post-speech-recognition dataset.
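The chunk-and-merge idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the window size, the overlap, the rule of splitting each overlap in half, and the names `chunk_tokens`, `merge_labels`, `restore`, and `toy_tagger` are all assumptions made for illustration, and the toy tagger merely stands in for the transformer decoder with CRF.

```python
from typing import Callable, List, Sequence

# Hypothetical tagger type: maps a chunk of tokens to one label per token.
Tagger = Callable[[List[str]], List[str]]


def chunk_tokens(tokens: Sequence[str], size: int, overlap: int) -> List[List[str]]:
    """Split tokens into windows of `size` tokens that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not tokens:
        return []
    step = size - overlap
    chunks: List[List[str]] = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # this window already reaches the end of the input
        start += step
    return chunks


def merge_labels(label_chunks: List[List[str]], overlap: int) -> List[str]:
    """Stitch per-chunk labels back into one sequence.

    Assumed rule: within each overlap, the first half is taken from the
    earlier chunk and the second half from the later chunk.
    """
    if not label_chunks:
        return []
    keep_prev = overlap // 2          # overlap labels kept from the earlier chunk
    drop_prev = overlap - keep_prev   # overlap labels replaced by the later chunk
    merged = list(label_chunks[0])
    for labels in label_chunks[1:]:
        merged = merged[:len(merged) - drop_prev]
        merged.extend(labels[keep_prev:])
    return merged


def restore(tokens: Sequence[str], tagger: Tagger, size: int = 4, overlap: int = 2) -> List[str]:
    """Tag each chunk independently, then merge the label sequences."""
    label_chunks = [tagger(chunk) for chunk in chunk_tokens(tokens, size, overlap)]
    return merge_labels(label_chunks, overlap)


def toy_tagger(chunk: List[str]) -> List[str]:
    """Stand-in for a real model: marks two known tokens for capitalization."""
    return ["CAP" if tok in {"hà", "nội"} else "O" for tok in chunk]


tokens = "hôm nay hà nội có mưa rất to".split()
labels = restore(tokens, toy_tagger)
```

With a real model, labels near a chunk boundary are predicted with less context, which is the motivation for letting neighboring chunks overlap and trusting each chunk mainly for its interior tokens.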
Thi Thu HIEN NGUYEN
Thai Nguyen University of Education
Thai BINH NGUYEN
Vietnam Artificial Intelligence System
Ngoc PHUONG PHAM
Vietnam Artificial Intelligence System
Quoc TRUONG DO
Vietnam Artificial Intelligence System
Tu LUC LE
Office of Hanoi People's Committee
Chi MAI LUONG
University of Science and Technology of Hanoi
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Thi Thu HIEN NGUYEN, Thai BINH NGUYEN, Ngoc PHUONG PHAM, Quoc TRUONG DO, Tu LUC LE, Chi MAI LUONG, "Toward Human-Friendly ASR Systems: Recovering Capitalization and Punctuation for Vietnamese Text," in IEICE TRANSACTIONS on Information, vol. E104-D, no. 8, pp. 1195-1203, August 2021, doi: 10.1587/transinf.2020BDP0005.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2020BDP0005/_p
@ARTICLE{e104-d_8_1195,
author={Thi Thu HIEN NGUYEN and Thai BINH NGUYEN and Ngoc PHUONG PHAM and Quoc TRUONG DO and Tu LUC LE and Chi MAI LUONG},
journal={IEICE TRANSACTIONS on Information},
title={Toward Human-Friendly ASR Systems: Recovering Capitalization and Punctuation for Vietnamese Text},
year={2021},
volume={E104-D},
number={8},
pages={1195-1203},
abstract={Speech recognition is a technique that recognizes words and sentences in audio form and converts them into text. With recent advances in deep learning, speech recognition has achieved very satisfactory results, close to human ability. However, the recognition output still has limitations, such as missing punctuation and capitalization and unnormalized numerical data. Vietnamese also contains local words, homonyms, etc., which make the recognition results difficult for users to read and understand and complicate downstream Natural Language Processing (NLP) tasks. In this paper, we propose combining a transformer decoder with a conditional random field (CRF) to restore punctuation and capitalization in Vietnamese automatic speech recognition (ASR) output. Chunking the input sentences and merging the output sequences makes it possible to handle longer strings with greater accuracy. Experiments show that the proposed method delivers the best results on the Vietnamese post-speech-recognition dataset.},
keywords={},
doi={10.1587/transinf.2020BDP0005},
ISSN={1745-1361},
month={August},
}
TY - JOUR
TI - Toward Human-Friendly ASR Systems: Recovering Capitalization and Punctuation for Vietnamese Text
T2 - IEICE TRANSACTIONS on Information
SP - 1195
EP - 1203
AU - Thi Thu HIEN NGUYEN
AU - Thai BINH NGUYEN
AU - Ngoc PHUONG PHAM
AU - Quoc TRUONG DO
AU - Tu LUC LE
AU - Chi MAI LUONG
PY - 2021
DO - 10.1587/transinf.2020BDP0005
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E104-D
IS - 8
JA - IEICE TRANSACTIONS on Information
Y1 - August 2021
AB - Speech recognition is a technique that recognizes words and sentences in audio form and converts them into text. With recent advances in deep learning, speech recognition has achieved very satisfactory results, close to human ability. However, the recognition output still has limitations, such as missing punctuation and capitalization and unnormalized numerical data. Vietnamese also contains local words, homonyms, etc., which make the recognition results difficult for users to read and understand and complicate downstream Natural Language Processing (NLP) tasks. In this paper, we propose combining a transformer decoder with a conditional random field (CRF) to restore punctuation and capitalization in Vietnamese automatic speech recognition (ASR) output. Chunking the input sentences and merging the output sequences makes it possible to handle longer strings with greater accuracy. Experiments show that the proposed method delivers the best results on the Vietnamese post-speech-recognition dataset.
ER -