Voice Conversion Based on Speaker-Dependent Restricted Boltzmann Machines

Toru NAKASHIKA; Tetsuya TAKIGUCHI; Yasuo ARIKI

doi:10.1587/transinf.E97.D.1403

IEICE TRANSACTIONS on Information

Summary:

This paper presents a voice conversion technique using speaker-dependent Restricted Boltzmann Machines (RBM) to build high-order eigen spaces of source/target speakers, where it is easier to convert the source speech to the target speech than in the traditional cepstrum space. We build a deep conversion architecture that concatenates the two speaker-dependent RBMs with neural networks, expecting that they automatically discover abstractions to express the original input features. Under this concept, if we train the RBMs using only the speech of an individual speaker that includes various phonemes while keeping the speaker individuality unchanged, it can be considered that there are fewer phonemes and relatively more speaker individuality in the output features of the hidden layer than original acoustic features. Training the RBMs for a source speaker and a target speaker, we can then connect and convert the speaker individuality abstractions using Neural Networks (NN). The converted abstraction of the source speaker is then back-propagated into the acoustic space (e.g., MFCC) using the RBM of the target speaker. We conducted speaker-voice conversion experiments and confirmed the efficacy of our method with respect to subjective and objective criteria, comparing it with the conventional Gaussian Mixture Model-based method and an ordinary NN.
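The summary describes a three-stage pipeline: a source-speaker RBM encodes an acoustic frame into hidden features, a neural network maps those to the target speaker's hidden features, and the target-speaker RBM projects the result back to the acoustic space. The pure-Python sketch below illustrates that data flow under stated assumptions. It is not the authors' implementation. It assumes Gaussian-Bernoulli RBMs with unit-variance visible units trained by one-step contrastive divergence (CD-1), and a single sigmoid layer as the mapping network. The names `GaussianBernoulliRBM`, `HiddenMapper`, and `convert_frame` are hypothetical. Parallel-data alignment and any joint fine-tuning used in the paper are not modeled.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class GaussianBernoulliRBM:
    """Hypothetical RBM: real-valued (unit-variance Gaussian) visible units,
    binary hidden units. Trained on frames of a single speaker only."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        # p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i W_ij)
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_mean(self, h):
        # E[v_i | h] = b_i + sum_j W_ij h_j (unit variance assumed)
        return [self.b[i] + sum(self.W[i][j] * h[j] for j in range(len(h)))
                for i in range(len(self.b))]

    def cd1_update(self, v0, lr=0.01):
        # One CD-1 step on a single frame: sample h0, reconstruct v1 (mean), recompute h1.
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_mean(h0_sample)
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
        for i in range(len(v0)):
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.c[j] += lr * (h0[j] - h1[j])


class HiddenMapper:
    """Hypothetical one-layer sigmoid network mapping source hidden
    features to target hidden features."""

    def __init__(self, n_in, n_out, seed=1):
        rng = random.Random(seed)
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(n_out)] for _ in range(n_in)]
        self.d = [0.0] * n_out

    def forward(self, hs):
        return [sigmoid(self.d[k] + sum(hs[j] * self.A[j][k] for j in range(len(hs))))
                for k in range(len(self.d))]

    def sgd_step(self, hs, ht, lr=0.1):
        # Gradient step on 0.5 * ||y - ht||^2; returns the loss before the update.
        y = self.forward(hs)
        delta = [(y[k] - ht[k]) * y[k] * (1.0 - y[k]) for k in range(len(y))]
        for j in range(len(hs)):
            for k in range(len(y)):
                self.A[j][k] -= lr * delta[k] * hs[j]
        for k in range(len(y)):
            self.d[k] -= lr * delta[k]
        return 0.5 * sum((y[k] - ht[k]) ** 2 for k in range(len(y)))


def convert_frame(x, rbm_src, mapper, rbm_tgt):
    hs = rbm_src.hidden_probs(x)     # source-speaker abstraction
    ht = mapper.forward(hs)          # converted abstraction
    return rbm_tgt.visible_mean(ht)  # back to the acoustic (e.g., MFCC) space
```

In this sketch, each RBM would be trained with `cd1_update` on one speaker's frames only. The mapper would then be trained with `sgd_step` on pairs of hidden representations computed from time-aligned source and target frames. Keeping the RBMs speaker-dependent is what lets the mapper operate between two separately learned feature spaces rather than directly on cepstra.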

Publication: IEICE TRANSACTIONS on Information Vol.E97-D No.6 pp.1403-1410

Publication Date: 2014/06/01

Online ISSN: 1745-1361

DOI: 10.1587/transinf.E97.D.1403

Type of Manuscript: Special Section PAPER (Special Section on Advances in Modeling for Real-world Speech Information Processing and its Application)

Category: Voice Conversion and Speech Enhancement

Authors

Toru NAKASHIKA
  Kobe University
Tetsuya TAKIGUCHI
  Kobe University
Yasuo ARIKI
  Kobe University

Keyword

voice conversion, restricted Boltzmann machine, deep learning, speaker individuality

Cite this

Toru NAKASHIKA, Tetsuya TAKIGUCHI, Yasuo ARIKI, "Voice Conversion Based on Speaker-Dependent Restricted Boltzmann Machines" in IEICE TRANSACTIONS on Information, vol. E97-D, no. 6, pp. 1403-1410, June 2014, doi: 10.1587/transinf.E97.D.1403.
Abstract: This paper presents a voice conversion technique using speaker-dependent Restricted Boltzmann Machines (RBM) to build high-order eigen spaces of source/target speakers, where it is easier to convert the source speech to the target speech than in the traditional cepstrum space. We build a deep conversion architecture that concatenates the two speaker-dependent RBMs with neural networks, expecting that they automatically discover abstractions to express the original input features. Under this concept, if we train the RBMs using only the speech of an individual speaker that includes various phonemes while keeping the speaker individuality unchanged, it can be considered that there are fewer phonemes and relatively more speaker individuality in the output features of the hidden layer than original acoustic features. Training the RBMs for a source speaker and a target speaker, we can then connect and convert the speaker individuality abstractions using Neural Networks (NN). The converted abstraction of the source speaker is then back-propagated into the acoustic space (e.g., MFCC) using the RBM of the target speaker. We conducted speaker-voice conversion experiments and confirmed the efficacy of our method with respect to subjective and objective criteria, comparing it with the conventional Gaussian Mixture Model-based method and an ordinary NN.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E97.D.1403/_p

@ARTICLE{e97-d_6_1403,
author={Toru NAKASHIKA and Tetsuya TAKIGUCHI and Yasuo ARIKI},
journal={IEICE TRANSACTIONS on Information},
title={Voice Conversion Based on Speaker-Dependent Restricted Boltzmann Machines},
year={2014},
volume={E97-D},
number={6},
pages={1403--1410},
abstract={This paper presents a voice conversion technique using speaker-dependent Restricted Boltzmann Machines (RBM) to build high-order eigen spaces of source/target speakers, where it is easier to convert the source speech to the target speech than in the traditional cepstrum space. We build a deep conversion architecture that concatenates the two speaker-dependent RBMs with neural networks, expecting that they automatically discover abstractions to express the original input features. Under this concept, if we train the RBMs using only the speech of an individual speaker that includes various phonemes while keeping the speaker individuality unchanged, it can be considered that there are fewer phonemes and relatively more speaker individuality in the output features of the hidden layer than original acoustic features. Training the RBMs for a source speaker and a target speaker, we can then connect and convert the speaker individuality abstractions using Neural Networks (NN). The converted abstraction of the source speaker is then back-propagated into the acoustic space (e.g., MFCC) using the RBM of the target speaker. We conducted speaker-voice conversion experiments and confirmed the efficacy of our method with respect to subjective and objective criteria, comparing it with the conventional Gaussian Mixture Model-based method and an ordinary NN.},
keywords={voice conversion, restricted Boltzmann machine, deep learning, speaker individuality},
doi={10.1587/transinf.E97.D.1403},
ISSN={1745-1361},
month={June},
}

TY  - JOUR
TI  - Voice Conversion Based on Speaker-Dependent Restricted Boltzmann Machines
T2  - IEICE TRANSACTIONS on Information
SP  - 1403
EP  - 1410
AU  - Toru NAKASHIKA
AU  - Tetsuya TAKIGUCHI
AU  - Yasuo ARIKI
PY  - 2014
DO  - 10.1587/transinf.E97.D.1403
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E97-D
IS  - 6
JA  - IEICE TRANSACTIONS on Information
Y1  - 2014/06/01/
AB  - This paper presents a voice conversion technique using speaker-dependent Restricted Boltzmann Machines (RBM) to build high-order eigen spaces of source/target speakers, where it is easier to convert the source speech to the target speech than in the traditional cepstrum space. We build a deep conversion architecture that concatenates the two speaker-dependent RBMs with neural networks, expecting that they automatically discover abstractions to express the original input features. Under this concept, if we train the RBMs using only the speech of an individual speaker that includes various phonemes while keeping the speaker individuality unchanged, it can be considered that there are fewer phonemes and relatively more speaker individuality in the output features of the hidden layer than original acoustic features. Training the RBMs for a source speaker and a target speaker, we can then connect and convert the speaker individuality abstractions using Neural Networks (NN). The converted abstraction of the source speaker is then back-propagated into the acoustic space (e.g., MFCC) using the RBM of the target speaker. We conducted speaker-voice conversion experiments and confirmed the efficacy of our method with respect to subjective and objective criteria, comparing it with the conventional Gaussian Mixture Model-based method and an ordinary NN.
ER  - 