

Latent Words Recurrent Neural Network Language Models for Automatic Speech Recognition

Ryo MASUMURA, Taichi ASAMI, Takanobu OBA, Sumitaka SAKAUCHI, Akinori ITO


Summary

This paper demonstrates latent words recurrent neural network language models (LW-RNN-LMs) for enhancing automatic speech recognition (ASR). LW-RNN-LMs are constructed so as to combine the advantages of both recurrent neural network language models (RNN-LMs) and latent words language models (LW-LMs). RNN-LMs can capture long-range context information and offer strong performance, while LW-LMs are robust to out-of-domain tasks thanks to their latent word space modeling. However, RNN-LMs cannot explicitly capture hidden relationships behind observed words since they have no concept of a latent variable space, and LW-LMs cannot take long-range relationships between latent words into account. Our idea is to combine the RNN-LM and the LW-LM so as to compensate for their individual disadvantages. LW-RNN-LMs simultaneously support both latent variable space modeling, as in LW-LMs, and long-range relationship modeling, as in RNN-LMs. From the viewpoint of RNN-LMs, an LW-RNN-LM can be regarded as a soft-class RNN-LM with a vast latent variable space. Conversely, from the viewpoint of LW-LMs, an LW-RNN-LM can be regarded as an LW-LM that uses an RNN structure instead of an n-gram structure for latent variable modeling. This paper also details a parameter inference method and two implementation methods, an n-gram approximation and a Viterbi approximation, for introducing the LW-RNN-LM to ASR. Our experiments show the effectiveness of LW-RNN-LMs in a perplexity evaluation on the Penn Treebank corpus and an ASR evaluation on Japanese spontaneous speech tasks.
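
To make the two viewpoints above concrete, the following is a minimal sketch of the generative structure the summary describes, written in PyTorch. All names (LatentWordRNNLM, latent_embed, trans, emit_logits) are illustrative assumptions, not the authors' implementation: an LSTM models the transition distribution over latent words (replacing the n-gram transition of an LW-LM), each latent word emits an observed word, and the next observed word's distribution is the marginal over latent words. In LW-LMs the latent variables are themselves words, so latent_vocab_size would typically equal vocab_size.

    import torch
    import torch.nn as nn

    class LatentWordRNNLM(nn.Module):
        # Sketch: an LSTM predicts the next *latent* word; each latent
        # word emits an observed word via a per-latent-word distribution.
        def __init__(self, vocab_size, latent_vocab_size, embed_dim, hidden_dim):
            super().__init__()
            self.latent_embed = nn.Embedding(latent_vocab_size, embed_dim)
            self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.trans = nn.Linear(hidden_dim, latent_vocab_size)
            # Emission scores: row h holds logits of P(w | latent word h).
            self.emit_logits = nn.Parameter(
                torch.zeros(latent_vocab_size, vocab_size))

        def forward(self, latent_ids):
            # latent_ids: (batch, seq) indices of latent words h_1 .. h_T
            out, _ = self.rnn(self.latent_embed(latent_ids))
            p_latent = torch.softmax(self.trans(out), dim=-1)  # P(h_{t+1} | h_{<=t})
            p_emit = torch.softmax(self.emit_logits, dim=-1)   # P(w | h)
            # Marginal next-word distribution:
            #   P(w_{t+1} | h_{<=t}) = sum_h P(w_{t+1} | h) * P(h | h_{<=t})
            return p_latent @ p_emit                           # (batch, seq, vocab)

    # Hypothetical usage with toy sizes:
    model = LatentWordRNNLM(vocab_size=10000, latent_vocab_size=10000,
                            embed_dim=256, hidden_dim=512)
    probs = model(torch.randint(0, 10000, (2, 5)))  # shape (2, 5, 10000)

Note that at decoding time the latent word sequence is unobserved, so computing P(w_t | w_{<t}) exactly would require summing over all latent histories; this intractability is what motivates the n-gram and Viterbi approximations the paper introduces for ASR.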

Publication
IEICE TRANSACTIONS on Information Vol.E102-D No.12 pp.2557-2567
Publication Date
2019/12/01
Publicized
2019/09/25
Online ISSN
1745-1361
DOI
10.1587/transinf.2018EDP7242
Type of Manuscript
PAPER
Category
Speech and Hearing

Authors

Ryo MASUMURA
  NTT Corporation
Taichi ASAMI
  NTT Corporation
Takanobu OBA
  NTT Corporation
Sumitaka SAKAUCHI
  NTT Corporation
Akinori ITO
  Tohoku University
