A Novel Iterative Speaker Model Alignment Method from Non-Parallel Speech for Voice Conversion

Peng SONG; Wenming ZHENG; Xinran ZHANG; Yun JIN; Cheng ZHA; Minghai XIN

doi:10.1587/transfun.E98.A.2178

IEICE TRANSACTIONS on Fundamentals

Summary:

Most current voice conversion methods rely on parallel speech, which is not easily obtained in practice. In this letter, a novel iterative speaker model alignment (ISMA) method is proposed to address this problem. First, the source and target speaker models are each trained from the background model using the maximum a posteriori (MAP) algorithm. Then, the ISMA method is used to align and transform the spectral features. Finally, the proposed ISMA approach is combined with a Gaussian mixture model (GMM) to improve the conversion performance. A series of objective and subjective experiments is carried out on the CMU ARCTIC dataset, and the results demonstrate that the proposed method significantly outperforms the state-of-the-art approach.
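The first step of the summary, deriving a speaker model from a background model by MAP adaptation, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a diagonal-covariance GMM as the background model, adapts only the component means (a common simplification), and uses a hypothetical relevance factor of 16. The function name `map_adapt_means` and all parameter names are chosen here for illustration.

```python
import numpy as np


def map_adapt_means(ubm_weights, ubm_means, ubm_vars, frames, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance GMM (illustrative).

    ubm_weights: (K,)   mixture weights of the background model
    ubm_means:   (K, D) component means
    ubm_vars:    (K, D) diagonal variances
    frames:      (T, D) spectral feature vectors of one speaker
    relevance:   assumed relevance factor controlling how far means may move
    """
    # Log-density of every frame under every component: shape (T, K).
    diff = frames[:, None, :] - ubm_means[None, :, :]
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / ubm_vars, axis=2)
        + np.sum(np.log(2.0 * np.pi * ubm_vars), axis=1)
    )

    # Component posteriors per frame, normalised in a numerically stable way.
    log_post = np.log(ubm_weights) + log_gauss
    log_post -= np.max(log_post, axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    # Zeroth- and first-order sufficient statistics.
    n_k = post.sum(axis=0)            # (K,)
    first_order = post.T @ frames     # (K, D)

    # Data-driven means, guarded against components with almost no data.
    data_means = first_order / np.maximum(n_k, 1e-10)[:, None]

    # Interpolate between the data means and the background means.
    alpha = n_k / (n_k + relevance)   # (K,)
    return alpha[:, None] * data_means + (1.0 - alpha)[:, None] * ubm_means
```

Running the function once on source-speaker frames and once on target-speaker frames yields two mean sets that share the background model's component indexing, which is the kind of correspondence a model-alignment step can then build on. Components that see little data stay close to the background means.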

Publication: IEICE TRANSACTIONS on Fundamentals Vol.E98-A No.10 pp.2178-2181

Publication Date: 2015/10/01

Online ISSN: 1745-1337

DOI: 10.1587/transfun.E98.A.2178

Type of Manuscript: LETTER

Category: Speech and Hearing

Authors

Peng SONG
  Yantai University
Wenming ZHENG
  Southeast University
Xinran ZHANG
  Southeast University
Yun JIN
  Southeast University
Cheng ZHA
  Southeast University
Minghai XIN
  Southeast University

Keywords

non-parallel speech, voice conversion, iterative speaker model alignment, Gaussian mixture model

Peng SONG, Wenming ZHENG, Xinran ZHANG, Yun JIN, Cheng ZHA, Minghai XIN, "A Novel Iterative Speaker Model Alignment Method from Non-Parallel Speech for Voice Conversion" in IEICE TRANSACTIONS on Fundamentals, vol. E98-A, no. 10, pp. 2178-2181, October 2015, doi: 10.1587/transfun.E98.A.2178.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.E98.A.2178/_p

@ARTICLE{e98-a_10_2178,
author={Peng SONG and Wenming ZHENG and Xinran ZHANG and Yun JIN and Cheng ZHA and Minghai XIN},
journal={IEICE TRANSACTIONS on Fundamentals},
title={A Novel Iterative Speaker Model Alignment Method from Non-Parallel Speech for Voice Conversion},
year={2015},
volume={E98-A},
number={10},
pages={2178--2181},
abstract={Most current voice conversion methods rely on parallel speech, which is not easily obtained in practice. In this letter, a novel iterative speaker model alignment (ISMA) method is proposed to address this problem. First, the source and target speaker models are each trained from the background model using the maximum a posteriori (MAP) algorithm. Then, the ISMA method is used to align and transform the spectral features. Finally, the proposed ISMA approach is combined with a Gaussian mixture model (GMM) to improve the conversion performance. A series of objective and subjective experiments is carried out on the CMU ARCTIC dataset, and the results demonstrate that the proposed method significantly outperforms the state-of-the-art approach.},
keywords={non-parallel speech, voice conversion, iterative speaker model alignment, Gaussian mixture model},
doi={10.1587/transfun.E98.A.2178},
ISSN={1745-1337},
month={October},
}

TY - JOUR
TI - A Novel Iterative Speaker Model Alignment Method from Non-Parallel Speech for Voice Conversion
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 2178
EP - 2181
AU - Peng SONG
AU - Wenming ZHENG
AU - Xinran ZHANG
AU - Yun JIN
AU - Cheng ZHA
AU - Minghai XIN
PY - 2015
DO - 10.1587/transfun.E98.A.2178
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E98-A
IS - 10
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - 2015/10/01
AB - Most current voice conversion methods rely on parallel speech, which is not easily obtained in practice. In this letter, a novel iterative speaker model alignment (ISMA) method is proposed to address this problem. First, the source and target speaker models are each trained from the background model using the maximum a posteriori (MAP) algorithm. Then, the ISMA method is used to align and transform the spectral features. Finally, the proposed ISMA approach is combined with a Gaussian mixture model (GMM) to improve the conversion performance. A series of objective and subjective experiments is carried out on the CMU ARCTIC dataset, and the results demonstrate that the proposed method significantly outperforms the state-of-the-art approach.
ER -