IEICE global.ieice.org Site


Author Search Result

[Author] Kei AKUZAWA (1 hit)


Joint Adversarial Training of Speech Recognition and Synthesis Models for Many-to-One Voice Conversion Using Phonetic Posteriorgrams
Yuki SAITO, Kei AKUZAWA, Kentaro TACHIBANA

PAPER-Speech and Hearing

Publicized: 2020/06/12
Vol: E103-D No: 9
Page(s): 1978-1987
This paper presents a method for many-to-one voice conversion (VC) using phonetic posteriorgrams (PPGs) based on adversarial training of deep neural networks (DNNs). A conventional method for many-to-one VC learns a mapping function from input acoustic features to target acoustic features through separately trained DNN-based speech recognition and synthesis models. However, the converted speech quality is degraded by 1) the differences among speakers observed in PPGs and 2) the over-smoothing of generated acoustic features. Our method performs domain-adversarial training of the recognition model to reduce the PPG differences. In addition, it incorporates a generative adversarial network into the training of the synthesis model to alleviate the over-smoothing effect. Unlike the conventional method, ours jointly trains the recognition and synthesis models so that they are optimized for many-to-one VC. Experimental evaluation demonstrates that the proposed method significantly improves the converted speech quality compared with conventional VC methods.
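The abstract describes the two adversarial objectives and the joint training only in words. The following is a minimal, framework-free sketch of how such losses are commonly composed. It computes loss values only (no gradients or networks). Several details are illustrative assumptions rather than the authors' formulation: the speaker-classifier term, the weights `lam` and `w_adv`, the non-saturating GAN term, the plain sum used for joint training, and every function name.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax; over phoneme logits it gives one PPG frame."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target class."""
    return -math.log(probs[target])


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between generated and natural acoustic features."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def recognition_loss(phone_logits: Sequence[float], phone_label: int,
                     speaker_logits: Sequence[float], speaker_label: int,
                     lam: float) -> Tuple[float, List[float]]:
    """Domain-adversarial recognizer objective (assumed form).

    The recognizer should predict phonemes well while making a speaker
    classifier (assumed to read the PPG; its logits are passed in here)
    fail, so the speaker cross-entropy is subtracted. Frameworks usually
    realize this sign flip with a gradient reversal layer, and the speaker
    classifier is trained separately to minimize its own cross-entropy.
    """
    ppg = softmax(phone_logits)
    phone_ce = cross_entropy(ppg, phone_label)
    speaker_ce = cross_entropy(softmax(speaker_logits), speaker_label)
    return phone_ce - lam * speaker_ce, ppg


def synthesis_loss(generated: Sequence[float], natural: Sequence[float],
                   d_prob_generated: float, w_adv: float) -> float:
    """MSE plus an assumed non-saturating GAN term, -log D(G(x)).

    d_prob_generated is the discriminator's probability that the generated
    features are natural; raising it counteracts over-smoothing.
    """
    return mse(generated, natural) - w_adv * math.log(d_prob_generated)


def joint_loss(rec_loss: float, syn_loss: float) -> float:
    """Joint training as one scalar (assumed plain sum), so synthesis
    errors can also flow back into the recognizer that produced the PPG."""
    return rec_loss + syn_loss
```

A usage example with made-up numbers:

```python
rec, ppg = recognition_loss([2.0, 0.5, -1.0], 0, [0.3, 0.1], 1, lam=0.5)
syn = synthesis_loss([0.9, 1.1], [1.0, 1.0], d_prob_generated=0.4, w_adv=0.25)
print(joint_loss(rec, syn))
```

Because the objective is a single scalar, a training loop built on it would update the recognizer to serve VC rather than phoneme recognition alone. That is the distinction the abstract draws against separately trained models.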
