Vector Quantization of Speech Spectrum Based on the VQ-VAE Embedding Space Learning by GAN Technique

Tanasan SRIKOTR, Kazunori MANO

Summary:

The spectral envelope is a speech parameter that strongly affects vocoder quality. The Vector Quantized Variational AutoEncoder (VQ-VAE) has recently emerged as a state-of-the-art end-to-end quantization method based on deep learning. This paper proposes a new technique, called VQ-VAE-EMGAN, that improves the embedding space learning of the VQ-VAE with a Generative Adversarial Network (GAN) for quantizing the spectral envelope parameter. In the experiments, we designed a quantizer for the spectral envelope parameters of the WORLD vocoder extracted from 16 kHz speech waveforms. The results show that, compared with the conventional VQ-VAE, the proposed technique reduced the Log Spectral Distortion (LSD) by around 0.5 dB and increased the PESQ by around 0.17 on average over four target bit operations.
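For readers unfamiliar with the LSD metric cited above, the following is a minimal sketch of one common definition: the per-frame root-mean-square difference between reference and quantized log power spectra in dB, averaged over frames. The summary does not state the exact formula used in the paper, so the function name `log_spectral_distortion`, the `eps` floor, and the framing of the inputs as lists of power spectra are assumptions made for illustration only.

```python
import math


def log_spectral_distortion(ref_frames, est_frames, eps=1e-12):
    """Average per-frame RMS difference of log power spectra, in dB.

    ref_frames, est_frames: sequences of frames, each frame a sequence of
    non-negative power-spectrum values (hypothetical input layout).
    eps: small floor to avoid log10(0); an assumption, not from the paper.
    """
    if not ref_frames or len(ref_frames) != len(est_frames):
        raise ValueError("need the same non-zero number of frames")

    total = 0.0
    for ref, est in zip(ref_frames, est_frames):
        if not ref or len(ref) != len(est):
            raise ValueError("frames must have the same non-zero length")
        squared_sum = 0.0
        for p, q in zip(ref, est):
            # Difference of the two spectra on a decibel scale.
            diff_db = 10.0 * math.log10(max(p, eps)) - 10.0 * math.log10(max(q, eps))
            squared_sum += diff_db ** 2
        # RMS over frequency bins for this frame.
        total += math.sqrt(squared_sum / len(ref))
    # Mean over frames.
    return total / len(ref_frames)
```

Under this definition, identical spectra give 0 dB, and an estimate whose power is uniformly ten times the reference gives 10 dB, so a reduction of about 0.5 dB means the quantized envelopes sit closer to the originals on the log scale.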

Publication
IEICE TRANSACTIONS on Fundamentals Vol.E105-A No.4 pp.647-654
Publication Date
2022/04/01
Publicized
2021/09/30
Online ISSN
1745-1337
DOI
10.1587/transfun.2021SMP0018
Type of Manuscript
Special Section PAPER (Special Section on Smart Multimedia & Communication Systems)
Category
Speech and Hearing, Digital Signal Processing

Authors

Tanasan SRIKOTR
  Shibaura Institute of Technology
Kazunori MANO
  Shibaura Institute of Technology

Keyword