This paper presents a voice conversion (VC) technique for noisy environments in which parallel exemplars are used to encode the source speech signal and synthesize the target speech signal. The parallel exemplars (the dictionary) consist of source exemplars and target exemplars taken from the same texts uttered by the source and target speakers. The input source signal is decomposed into source exemplars, noise exemplars, and their weights (activities). The converted signal is then constructed from the target exemplars using the weights of the source exemplars. We carried out speaker conversion tasks using clean speech data and noise-added speech data. The effectiveness of the method was confirmed by comparing it with a conventional Gaussian Mixture Model (GMM)-based method.
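The decomposition and resynthesis steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes magnitude spectra as features, a KL-divergence objective with an L1 sparsity penalty, and multiplicative updates for the activities, none of which the abstract specifies. The function names (`estimate_activities`, `convert`) and parameters (`sparsity`, `n_iter`) are hypothetical.

```python
import numpy as np

def estimate_activities(X, A, sparsity=0.1, n_iter=200, eps=1e-12):
    """Estimate non-negative activities H with X ~= A @ H, keeping A fixed.

    Assumption: KL-divergence cost plus an L1 penalty on H, minimized
    with multiplicative updates. X is (n_bins, n_frames), A is (n_bins, n_atoms).
    """
    rng = np.random.default_rng(0)
    H = rng.random((A.shape[1], X.shape[1])) + eps
    col_sums = A.sum(axis=0)[:, None]          # A^T 1, shape (n_atoms, 1)
    for _ in range(n_iter):
        ratio = X / (A @ H + eps)              # element-wise X / (A H)
        H *= (A.T @ ratio) / (col_sums + sparsity)
    return H

def convert(X_noisy, A_src, A_tgt, A_noise, **kwargs):
    """Decompose the noisy input over [source | noise] exemplars, then
    rebuild the output from the parallel target exemplars."""
    A = np.hstack([A_src, A_noise])            # joint dictionary
    H = estimate_activities(X_noisy, A, **kwargs)
    H_src = H[:A_src.shape[1]]                 # drop the noise activities
    return A_tgt @ H_src                       # converted spectrogram

# Toy usage with random non-negative data (illustrative shapes only).
rng = np.random.default_rng(1)
A_src, A_tgt = rng.random((16, 5)), rng.random((16, 5))
A_noise = rng.random((16, 3))
X = A_src @ rng.random((5, 7)) + A_noise @ rng.random((3, 7))
Y = convert(X, A_src, A_tgt, A_noise, n_iter=50)
print(Y.shape)
```

Because the source and target exemplars are parallel (column j of each corresponds to the same utterance frame), the source activities can be reused directly with the target dictionary, while the noise activities are simply discarded.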
Ryoichi TAKASHIMA
Kobe University
Tetsuya TAKIGUCHI
Kobe University
Yasuo ARIKI
Kobe University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Ryoichi TAKASHIMA, Tetsuya TAKIGUCHI, Yasuo ARIKI, "Exemplar-Based Voice Conversion Using Sparse Representation in Noisy Environments" in IEICE TRANSACTIONS on Fundamentals,
vol. E96-A, no. 10, pp. 1946-1953, October 2013, doi: 10.1587/transfun.E96.A.1946.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.E96.A.1946/_p
@ARTICLE{e96-a_10_1946,
author={Ryoichi TAKASHIMA and Tetsuya TAKIGUCHI and Yasuo ARIKI},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Exemplar-Based Voice Conversion Using Sparse Representation in Noisy Environments},
year={2013},
volume={E96-A},
number={10},
pages={1946-1953},
keywords={},
doi={10.1587/transfun.E96.A.1946},
ISSN={1745-1337},
month={October},}
TY - JOUR
TI - Exemplar-Based Voice Conversion Using Sparse Representation in Noisy Environments
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 1946
EP - 1953
AU - Ryoichi TAKASHIMA
AU - Tetsuya TAKIGUCHI
AU - Yasuo ARIKI
PY - 2013
DO - 10.1587/transfun.E96.A.1946
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E96-A
IS - 10
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - October 2013
ER -