Fast Gated Recurrent Network for Speech Synthesis

Bima PRIHASTO; Tzu-Chiang TAI; Pao-Chi CHANG; Jia-Ching WANG

doi:10.1587/transinf.2021EDL8032

Summary:

The recurrent neural network (RNN) has been applied to sequence tasks in speech and language processing, such as language translation and speech recognition. Although RNN-based architectures can be applied to speech synthesis, their long computing time remains the primary concern. This research proposes a fast gated recurrent neural network, a fast RNN-based architecture for speech synthesis built on the minimal gated unit (MGU). Our architecture removes the unit state history from some of the MGU equations. Our MGU-based architecture is about twice as fast as the other MGU-based architectures while achieving equally good sound quality.
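The summary refers to the minimal gated unit (MGU) without stating its equations. The Python sketch below follows the standard MGU formulation, in which a single forget gate f_t both masks the previous state inside the candidate state and interpolates between the old state and the candidate. The `gate_uses_history` flag is a hypothetical illustration of removing the state history h_{t-1} from one equation (here, the gate); it is an assumption, not the letter's published equations, and all function and parameter names are illustrative.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Params = Tuple[Matrix, Vector, Matrix, Vector]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(w: Matrix, b: Vector, v: Vector) -> Vector:
    """Compute W v + b for a row-major matrix W."""
    return [sum(wij * vj for wij, vj in zip(row, v)) + bi for row, bi in zip(w, b)]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def mgu_step(x: Vector, h_prev: Vector, params: Params,
             gate_uses_history: bool = True) -> Vector:
    """One minimal gated unit (MGU) step.

    f_t  = sigmoid(W_f [h_{t-1}, x_t] + b_f)
    h~_t = tanh(W_h [f_t * h_{t-1}, x_t] + b_h)
    h_t  = (1 - f_t) * h_{t-1} + f_t * h~_t

    gate_uses_history=False is a HYPOTHETICAL variant for illustration only:
    h_{t-1} is replaced by zeros in the gate input, so f_t depends on x_t alone.
    It is not claimed to be the letter's exact design.
    """
    w_f, b_f, w_h, b_h = params
    gate_hist = h_prev if gate_uses_history else [0.0] * len(h_prev)
    f = [sigmoid(z) for z in affine(w_f, b_f, gate_hist + x)]
    # The forget gate masks the previous state before the candidate is computed.
    masked_h = [fi * hi for fi, hi in zip(f, h_prev)]
    h_tilde = [math.tanh(z) for z in affine(w_h, b_h, masked_h + x)]
    # Interpolate between the previous state and the candidate state.
    return [(1.0 - fi) * hi + fi * hti for fi, hi, hti in zip(f, h_prev, h_tilde)]


def run_mgu(xs: List[Vector], hidden: int, seed: int = 0,
            gate_uses_history: bool = True) -> List[Vector]:
    """Run an MGU layer with random weights over a sequence of input frames."""
    rng = random.Random(seed)
    n_in = len(xs[0])
    params = (
        random_matrix(hidden, hidden + n_in, rng), [0.0] * hidden,
        random_matrix(hidden, hidden + n_in, rng), [0.0] * hidden,
    )
    h = [0.0] * hidden
    outputs = []
    for x in xs:
        h = mgu_step(x, h, params, gate_uses_history)
        outputs.append(h)
    return outputs


if __name__ == "__main__":
    # Toy input frames (hypothetical stand-ins for acoustic-model input features).
    frames = [[0.1 * t, -0.2, 0.3] for t in range(5)]
    outs = run_mgu(frames, hidden=4)
    print(len(outs), len(outs[0]))  # 5 4
```

With the hypothetical flag set to False, the gate depends only on the input frames, so in principle its values could be computed for every frame before the recurrent loop. This sketch does not measure speed and makes no claim about the reported twofold speedup.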

Publication: IEICE TRANSACTIONS on Information Vol.E105-D No.9 pp.1634-1638

Publication Date: 2022/09/01

Publicized: 2022/06/10

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2021EDL8032

Type of Manuscript: LETTER

Category: Speech and Hearing

Authors

Bima PRIHASTO
  National Central University
Tzu-Chiang TAI
  Providence University
Pao-Chi CHANG
  National Central University
Jia-Ching WANG
  National Central University

Keywords

speech synthesis, acoustic modelling, gated recurrent neural network, long short-term memory

Cite this

Bima PRIHASTO, Tzu-Chiang TAI, Pao-Chi CHANG, Jia-Ching WANG, "Fast Gated Recurrent Network for Speech Synthesis" in IEICE TRANSACTIONS on Information, vol. E105-D, no. 9, pp. 1634-1638, September 2022, doi: 10.1587/transinf.2021EDL8032.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2021EDL8032/_p

@ARTICLE{e105-d_9_1634,
author={Bima PRIHASTO and Tzu-Chiang TAI and Pao-Chi CHANG and Jia-Ching WANG},
journal={IEICE TRANSACTIONS on Information},
title={Fast Gated Recurrent Network for Speech Synthesis},
year={2022},
volume={E105-D},
number={9},
pages={1634--1638},
abstract={The recurrent neural network (RNN) has been applied to sequence tasks in speech and language processing, such as language translation and speech recognition. Although RNN-based architectures can be applied to speech synthesis, their long computing time remains the primary concern. This research proposes a fast gated recurrent neural network, a fast RNN-based architecture for speech synthesis built on the minimal gated unit (MGU). Our architecture removes the unit state history from some of the MGU equations. Our MGU-based architecture is about twice as fast as the other MGU-based architectures while achieving equally good sound quality.},
keywords={speech synthesis, acoustic modelling, gated recurrent neural network, long short-term memory},
doi={10.1587/transinf.2021EDL8032},
ISSN={1745-1361},
month=sep,
}

TY  - JOUR
TI  - Fast Gated Recurrent Network for Speech Synthesis
T2  - IEICE TRANSACTIONS on Information
SP  - 1634
EP  - 1638
AU  - Bima PRIHASTO
AU  - Tzu-Chiang TAI
AU  - Pao-Chi CHANG
AU  - Jia-Ching WANG
PY  - 2022
DO  - 10.1587/transinf.2021EDL8032
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E105-D
IS  - 9
JA  - IEICE TRANSACTIONS on Information
Y1  - 2022/09/01/
AB  - The recurrent neural network (RNN) has been applied to sequence tasks in speech and language processing, such as language translation and speech recognition. Although RNN-based architectures can be applied to speech synthesis, their long computing time remains the primary concern. This research proposes a fast gated recurrent neural network, a fast RNN-based architecture for speech synthesis built on the minimal gated unit (MGU). Our architecture removes the unit state history from some of the MGU equations. Our MGU-based architecture is about twice as fast as the other MGU-based architectures while achieving equally good sound quality.
ER  - 