Most current voice conversion methods rely on parallel speech, which is difficult to obtain in practice. This letter proposes a novel iterative speaker model alignment (ISMA) method to address this problem. First, the source and target speaker models are each trained from the background model using the maximum a posteriori (MAP) algorithm. Then, the ISMA method is applied to align and transform the spectral features. Finally, the ISMA approach is combined with a Gaussian mixture model (GMM) to further improve conversion performance. Objective and subjective experiments on the CMU ARCTIC dataset show that the proposed method significantly outperforms the state-of-the-art approach.
Peng SONG
Yantai University
Wenming ZHENG
Southeast University
Xinran ZHANG
Southeast University
Yun JIN
Southeast University
Cheng ZHA
Southeast University
Minghai XIN
Southeast University
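The abstract's first step, deriving each speaker model from a background model by MAP adaptation, can be illustrated with a small sketch. The abstract does not give the adaptation formulas, so the code below assumes the common relevance-factor form of MAP mean adaptation for a diagonal-covariance GMM, where only the means are updated. The function names, the relevance value of 16, and the toy data are all hypothetical and are not taken from the letter.

```python
import math


def gaussian_diag_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given frame x."""
    logs = [
        math.log(w) + gaussian_diag_logpdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Return MAP-adapted means (assumed relevance-factor update).

    Weights and variances of the background model are left unchanged.
    """
    k = len(weights)
    dim = len(means[0])
    counts = [0.0] * k                       # soft frame count per component
    sums = [[0.0] * dim for _ in range(k)]   # posterior-weighted frame sums
    for x in frames:
        post = posteriors(x, weights, means, variances)
        for c in range(k):
            counts[c] += post[c]
            for d in range(dim):
                sums[c][d] += post[c] * x[d]

    adapted = []
    for c in range(k):
        # alpha grows toward 1 as a component sees more speaker data
        alpha = counts[c] / (counts[c] + relevance)
        if counts[c] > 0.0:
            data_mean = [s / counts[c] for s in sums[c]]
        else:
            data_mean = list(means[c])
        adapted.append(
            [alpha * dm + (1.0 - alpha) * m for dm, m in zip(data_mean, means[c])]
        )
    return adapted


# Toy background model with two components (hypothetical values).
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0, 0.0], [5.0, 5.0]]
ubm_variances = [[1.0, 1.0], [1.0, 1.0]]

# Toy speaker data: 32 identical frames close to the first component.
speaker_frames = [[1.0, 1.0] for _ in range(32)]

adapted = map_adapt_means(speaker_frames, ubm_weights, ubm_means, ubm_variances)
```

In this toy setup the first component's mean moves part of the way from the background value toward the speaker frames, while the second component, which explains almost none of the data, stays essentially at its background value. Running the same adaptation once for the source speaker and once for the target speaker yields two models with matching component indices, which is one plausible reason a shared background model is convenient for non-parallel data.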
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Peng SONG, Wenming ZHENG, Xinran ZHANG, Yun JIN, Cheng ZHA, Minghai XIN, "A Novel Iterative Speaker Model Alignment Method from Non-Parallel Speech for Voice Conversion," in IEICE TRANSACTIONS on Fundamentals, vol. E98-A, no. 10, pp. 2178-2181, October 2015, doi: 10.1587/transfun.E98.A.2178.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.E98.A.2178/_p
@ARTICLE{e98-a_10_2178,
author={Peng SONG and Wenming ZHENG and Xinran ZHANG and Yun JIN and Cheng ZHA and Minghai XIN},
journal={IEICE TRANSACTIONS on Fundamentals},
title={A Novel Iterative Speaker Model Alignment Method from Non-Parallel Speech for Voice Conversion},
year={2015},
volume={E98-A},
number={10},
pages={2178-2181},
abstract={Most current voice conversion methods rely on parallel speech, which is difficult to obtain in practice. This letter proposes a novel iterative speaker model alignment (ISMA) method to address this problem. First, the source and target speaker models are each trained from the background model using the maximum a posteriori (MAP) algorithm. Then, the ISMA method is applied to align and transform the spectral features. Finally, the ISMA approach is combined with a Gaussian mixture model (GMM) to further improve conversion performance. Objective and subjective experiments on the CMU ARCTIC dataset show that the proposed method significantly outperforms the state-of-the-art approach.},
keywords={},
doi={10.1587/transfun.E98.A.2178},
ISSN={1745-1337},
month={October},}
TY - JOUR
TI - A Novel Iterative Speaker Model Alignment Method from Non-Parallel Speech for Voice Conversion
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 2178
EP - 2181
AU - Peng SONG
AU - Wenming ZHENG
AU - Xinran ZHANG
AU - Yun JIN
AU - Cheng ZHA
AU - Minghai XIN
PY - 2015
DO - 10.1587/transfun.E98.A.2178
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E98-A
IS - 10
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - October 2015
AB - Most current voice conversion methods rely on parallel speech, which is difficult to obtain in practice. This letter proposes a novel iterative speaker model alignment (ISMA) method to address this problem. First, the source and target speaker models are each trained from the background model using the maximum a posteriori (MAP) algorithm. Then, the ISMA method is applied to align and transform the spectral features. Finally, the ISMA approach is combined with a Gaussian mixture model (GMM) to further improve conversion performance. Objective and subjective experiments on the CMU ARCTIC dataset show that the proposed method significantly outperforms the state-of-the-art approach.
ER -