Exemplar-Based Voice Conversion Using Sparse Representation in Noisy Environments

Ryoichi TAKASHIMA; Tetsuya TAKIGUCHI; Yasuo ARIKI

doi:10.1587/transfun.E96.A.1946

Summary:

This paper presents a voice conversion (VC) technique for noisy environments, in which parallel exemplars are introduced to encode the source speech signal and synthesize the target speech signal. The parallel exemplars (dictionary) consist of source exemplars and target exemplars, which contain the same texts uttered by the source and target speakers. The input source signal is decomposed into source exemplars, noise exemplars, and their weights (activities). Then, using the weights of the source exemplars, the converted signal is constructed from the target exemplars. We carried out speaker conversion tasks using clean speech data and noise-added speech data. The effectiveness of the method was confirmed by comparison with a conventional Gaussian Mixture Model (GMM)-based method.
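
The decomposition and reconstruction steps described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the KL-divergence objective with an L1 sparsity penalty, the multiplicative update rule, and all names (`estimate_activities`, `convert`, `A_src`, `A_tgt`, `A_noise`, `sparsity`) are assumptions chosen for illustration, following the common non-negative matrix factorization formulation with a fixed dictionary.

```python
import numpy as np


def estimate_activities(X, D, n_iter=200, sparsity=0.1, eps=1e-12):
    """Estimate non-negative activities H such that X is approximately D @ H.

    Assumed objective (not stated in the summary): KL divergence between
    X and D @ H plus an L1 penalty on H, minimized with multiplicative
    updates while the dictionary D stays fixed.
    """
    rng = np.random.default_rng(0)
    H = rng.random((D.shape[1], X.shape[1])) + eps
    col_sums = D.sum(axis=0)[:, None]  # D^T 1, shape (n_exemplars, 1)
    for _ in range(n_iter):
        ratio = X / (D @ H + eps)
        H *= (D.T @ ratio) / (col_sums + sparsity)
    return H


def convert(X_noisy, A_src, A_tgt, A_noise, **kwargs):
    """Convert a noisy source spectrogram using parallel exemplars.

    A_src and A_tgt hold parallel exemplars column by column; A_noise
    holds noise exemplars. Only the source activities are reused with
    the target dictionary, so the noise component is left out.
    """
    D = np.hstack([A_src, A_noise])      # joint source + noise dictionary
    H = estimate_activities(X_noisy, D, **kwargs)
    H_src = H[: A_src.shape[1]]          # weights of the source exemplars
    return A_tgt @ H_src                 # converted (target) spectrogram
```

All inputs are assumed to be non-negative magnitude spectra with one frame (or exemplar) per column. Because the noise activities are discarded before multiplying by the target dictionary, the same step performs both conversion and noise suppression in this sketch.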

Publication: IEICE TRANSACTIONS on Fundamentals Vol.E96-A No.10 pp.1946-1953

Publication Date: 2013/10/01

Online ISSN: 1745-1337

DOI: 10.1587/transfun.E96.A.1946

Type of Manuscript: Special Section PAPER (Special Section on Sparsity-aware Signal Processing)

Authors

Ryoichi TAKASHIMA
  Kobe University
Tetsuya TAKIGUCHI
  Kobe University
Yasuo ARIKI
  Kobe University

Keyword

voice conversion, exemplar-based, sparse coding, non-negative matrix factorization, noise robustness

Ryoichi TAKASHIMA, Tetsuya TAKIGUCHI, Yasuo ARIKI, "Exemplar-Based Voice Conversion Using Sparse Representation in Noisy Environments" in IEICE TRANSACTIONS on Fundamentals, vol. E96-A, no. 10, pp. 1946-1953, October 2013, doi: 10.1587/transfun.E96.A.1946.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.E96.A.1946/_p

@ARTICLE{e96-a_10_1946,
author={Ryoichi TAKASHIMA and Tetsuya TAKIGUCHI and Yasuo ARIKI},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Exemplar-Based Voice Conversion Using Sparse Representation in Noisy Environments},
year={2013},
volume={E96-A},
number={10},
pages={1946--1953},
abstract={This paper presents a voice conversion (VC) technique for noisy environments, where parallel exemplars are introduced to encode the source speech signal and synthesize the target speech signal. The parallel exemplars (dictionary) consist of the source exemplars and target exemplars, having the same texts uttered by the source and target speakers. The input source signal is decomposed into the source exemplars, noise exemplars and their weights (activities). Then, by using the weights of the source exemplars, the converted signal is constructed from the target exemplars. We carried out speaker conversion tasks using clean speech data and noise-added speech data. The effectiveness of this method was confirmed by comparing its effectiveness with that of a conventional Gaussian Mixture Model (GMM)-based method.},
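keywords={voice conversion, exemplar-based, sparse coding, non-negative matrix factorization, noise robustness},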
doi={10.1587/transfun.E96.A.1946},
ISSN={1745-1337},
month={October},}

TY - JOUR
TI - Exemplar-Based Voice Conversion Using Sparse Representation in Noisy Environments
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 1946
EP - 1953
AU - Ryoichi TAKASHIMA
AU - Tetsuya TAKIGUCHI
AU - Yasuo ARIKI
PY - 2013
DO - 10.1587/transfun.E96.A.1946
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E96-A
IS - 10
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - October 2013
AB - This paper presents a voice conversion (VC) technique for noisy environments, where parallel exemplars are introduced to encode the source speech signal and synthesize the target speech signal. The parallel exemplars (dictionary) consist of the source exemplars and target exemplars, having the same texts uttered by the source and target speakers. The input source signal is decomposed into the source exemplars, noise exemplars and their weights (activities). Then, by using the weights of the source exemplars, the converted signal is constructed from the target exemplars. We carried out speaker conversion tasks using clean speech data and noise-added speech data. The effectiveness of this method was confirmed by comparing its effectiveness with that of a conventional Gaussian Mixture Model (GMM)-based method.
ER -