Voice Conversion Using Input-to-Output Highway Networks

Yuki SAITO, Shinnosuke TAKAMICHI, Hiroshi SARUWATARI

Summary:

This paper proposes deep neural network (DNN)-based voice conversion (VC) using input-to-output highway networks. VC is a speech synthesis technique that converts input speech features into output speech parameters, and DNN-based acoustic models for VC estimate the output speech parameters from the input speech parameters. Because the input and output in VC often lie in the same domain (e.g., cepstrum), this paper proposes a VC method that uses highway networks connected from the input to the output. The acoustic models predict weighted spectral differentials between the input and output spectral parameters. This architecture not only alleviates the over-smoothing effects that degrade speech quality, but also effectively represents the characteristics of spectral parameters. Experimental results demonstrate that the proposed architecture outperforms feed-forward neural networks in terms of both the speech quality and the speaker individuality of the converted speech.
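
The summary gives no equations, so the following is only a minimal sketch of one plausible reading of an input-to-output highway connection: the output is the input plus a gated ("weighted") spectral differential. The formulation `y_hat = x + gate * diff`, the sigmoid gate, the layer sizes, the ReLU hidden units, and all function names (`feed_forward`, `highway_convert`, `init_layer`) are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function, used here for the gate (assumption)."""
    return 1.0 / (1.0 + np.exp(-z))


def init_layer(rng, n_in, n_out):
    """Hypothetical helper: small random weights and zero biases."""
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)


def feed_forward(x, layers):
    """Plain feed-forward pass: ReLU on hidden layers, linear output layer."""
    h = x
    for i, (W, b) in enumerate(layers):
        h = h @ W + b
        if i < len(layers) - 1:
            h = np.maximum(h, 0.0)
    return h


def highway_convert(x, diff_layers, gate_layers):
    """Assumed input-to-output highway form: y_hat = x + gate * diff.

    diff: predicted spectral differential between output and input.
    gate: per-dimension weight in (0, 1) applied to that differential.
    """
    diff = feed_forward(x, diff_layers)
    gate = sigmoid(feed_forward(x, gate_layers))
    return x + gate * diff, gate, diff


# Illustrative sizes only: 5 frames of 24-dimensional cepstral features.
rng = np.random.default_rng(0)
dim, hidden = 24, 64
diff_layers = [init_layer(rng, dim, hidden), init_layer(rng, hidden, dim)]
gate_layers = [init_layer(rng, dim, hidden), init_layer(rng, hidden, dim)]

x = rng.normal(size=(5, dim))  # stand-in for source-speaker features
y_hat, gate, diff = highway_convert(x, diff_layers, gate_layers)
```

Under this reading, a gate value near zero passes the input feature through almost unchanged, which is the sense in which the connection runs directly from input to output. A plain feed-forward model would instead have to regenerate every dimension of the output from scratch.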

Publication
IEICE TRANSACTIONS on Information and Systems, Vol.E100-D, No.8, pp.1925-1928
Publication Date
2017/08/01
Publicized
2017/04/28
Online ISSN
1745-1361
DOI
10.1587/transinf.2017EDL8034
Type of Manuscript
LETTER
Category
Speech and Hearing

Authors

Yuki SAITO
  The University of Tokyo
Shinnosuke TAKAMICHI
  The University of Tokyo
Hiroshi SARUWATARI
  The University of Tokyo

Keyword