This paper describes a novel approach to flexible control of speaker characteristics using a tensor representation of multiple Gaussian mixture models (GMMs). In voice conversion studies, conversion from/to an arbitrary speaker's voice is an important objective. For this purpose, eigenvoice conversion (EVC) based on an eigenvoice GMM (EV-GMM) was proposed. In EVC, a speaker space is constructed from GMM supervectors, which are high-dimensional vectors derived by concatenating the mean vectors of each speaker GMM. In this speaker space, each speaker is represented by a small number of weight parameters for the eigen-supervectors. In this paper, we revisit the construction of the speaker space by introducing tensor factor analysis of the training data set. In our approach, each speaker is represented as a matrix whose rows and columns correspond to the dimensions of the mean vector and the Gaussian components, respectively. The speaker space is derived by tensor factor analysis of the set of these matrices. Our approach can solve an inherent problem of the supervector representation, and it improves the performance of voice conversion. In addition, the effects of speaker adaptive training before factorization are investigated. Experimental results of one-to-many voice conversion demonstrate the effectiveness of the proposed approach.
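
The contrast between the supervector and the matrix representation of a speaker can be illustrated with a small NumPy sketch. It is not the paper's algorithm: plain PCA stands in for the eigenvoice space, a truncated higher-order SVD stands in for tensor factor analysis, and the GMM means are random toy data. All function names, sizes, and ranks below are hypothetical choices made for illustration only.

```python
import numpy as np


def to_supervectors(means):
    """Concatenate each speaker's component mean vectors into one long vector.

    means has shape (S, D, M): speaker, mean-vector dimension, Gaussian component.
    """
    S, D, M = means.shape
    return means.transpose(0, 2, 1).reshape(S, M * D)


def supervector_space(means, k):
    """PCA on supervectors (assumed stand-in for the eigenvoice speaker space)."""
    sv = to_supervectors(means)
    bias = sv.mean(axis=0)
    centered = sv - bias
    # Rows of vt are orthonormal principal directions ("eigen-supervectors").
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]
    weights = centered @ basis.T  # k weights per speaker
    return bias, basis, weights


def unfold(tensor, mode):
    """Mode-n unfolding: bring `mode` to the front and flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def matrix_space(means, kd, km):
    """Truncated HOSVD over the stacked speaker matrices (assumed stand-in).

    Each speaker is a D x M matrix; separate bases are estimated for the
    mean-vector dimension (U_d) and for the Gaussian component (U_m).
    """
    mean_mat = means.mean(axis=0)
    centered = means - mean_mat
    U_d = np.linalg.svd(unfold(centered, 1), full_matrices=False)[0][:, :kd]
    U_m = np.linalg.svd(unfold(centered, 2), full_matrices=False)[0][:, :km]
    # Per-speaker core matrix: U_d^T (X_s - mean) U_m, shape (kd, km).
    cores = np.einsum("dk,sdm,ml->skl", U_d, centered, U_m)
    return mean_mat, U_d, U_m, cores


# Toy data: 8 speakers, 3-dimensional means, 4 Gaussian components.
rng = np.random.default_rng(0)
toy_means = rng.normal(size=(8, 3, 4))

bias, basis, weights = supervector_space(toy_means, k=2)
mean_mat, U_d, U_m, cores = matrix_space(toy_means, kd=2, km=2)

print("supervector weights per speaker:", weights.shape[1:])
print("matrix-space core per speaker:", cores.shape[1:])
```

In the supervector view a speaker is summarized by a weight vector over a single basis, whereas in the matrix view the speaker is summarized by a small core matrix with one basis per axis. Which factorization and ranks the paper actually uses is not stated in the abstract, so the ranks here are arbitrary.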

- Publication: IEICE TRANSACTIONS on Information Vol.E103-D No.6 pp.1395-1405
- Publication Date: 2020/06/01
- Publicized: 2020/03/13
- Online ISSN: 1745-1361
- DOI: 10.1587/transinf.2019EDP7166
- Type of Manuscript: PAPER
- Category: Speech and Hearing

Daisuke SAITO

The University of Tokyo

Nobuaki MINEMATSU

The University of Tokyo

Keikichi HIROSE

The University of Tokyo

The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.

Daisuke SAITO, Nobuaki MINEMATSU, Keikichi HIROSE, "Tensor Factor Analysis for Arbitrary Speaker Conversion" in IEICE TRANSACTIONS on Information, vol. E103-D, no. 6, pp. 1395-1405, June 2020, doi: 10.1587/transinf.2019EDP7166.

URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2019EDP7166/_p

@ARTICLE{e103-d_6_1395,
  author={Daisuke SAITO and Nobuaki MINEMATSU and Keikichi HIROSE},
  journal={IEICE TRANSACTIONS on Information},
  title={Tensor Factor Analysis for Arbitrary Speaker Conversion},
  year={2020},
  volume={E103-D},
  number={6},
  pages={1395--1405},
  abstract={This paper describes a novel approach to flexible control of speaker characteristics using tensor representation of multiple Gaussian mixture models (GMM). In voice conversion studies, realization of conversion from/to an arbitrary speaker's voice is one of the important objectives. For this purpose, eigenvoice conversion (EVC) based on an eigenvoice GMM (EV-GMM) was proposed. In the EVC, a speaker space is constructed based on GMM supervectors which are high-dimensional vectors derived by concatenating the mean vectors of each of the speaker GMMs. In the speaker space, each speaker is represented by a small number of weight parameters of eigen-supervectors. In this paper, we revisit construction of the speaker space by introducing the tensor factor analysis of training data set. In our approach, each speaker is represented as a matrix of which the row and the column respectively correspond to the dimension of the mean vector and the Gaussian component. The speaker space is derived by the tensor factor analysis of the set of the matrices. Our approach can solve an inherent problem of supervector representation, and it improves the performance of voice conversion. In addition, in this paper, effects of speaker adaptive training before factorization are also investigated. Experimental results of one-to-many voice conversion demonstrate the effectiveness of the proposed approach.},
  keywords={},
  doi={10.1587/transinf.2019EDP7166},
  ISSN={1745-1361},
  month={June},
}

TY  - JOUR
TI  - Tensor Factor Analysis for Arbitrary Speaker Conversion
T2  - IEICE TRANSACTIONS on Information
SP  - 1395
EP  - 1405
AU  - Daisuke SAITO
AU  - Nobuaki MINEMATSU
AU  - Keikichi HIROSE
PY  - 2020
DO  - 10.1587/transinf.2019EDP7166
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E103-D
IS  - 6
JA  - IEICE TRANSACTIONS on Information
Y1  - June 2020
AB  - This paper describes a novel approach to flexible control of speaker characteristics using tensor representation of multiple Gaussian mixture models (GMM). In voice conversion studies, realization of conversion from/to an arbitrary speaker's voice is one of the important objectives. For this purpose, eigenvoice conversion (EVC) based on an eigenvoice GMM (EV-GMM) was proposed. In the EVC, a speaker space is constructed based on GMM supervectors which are high-dimensional vectors derived by concatenating the mean vectors of each of the speaker GMMs. In the speaker space, each speaker is represented by a small number of weight parameters of eigen-supervectors. In this paper, we revisit construction of the speaker space by introducing the tensor factor analysis of training data set. In our approach, each speaker is represented as a matrix of which the row and the column respectively correspond to the dimension of the mean vector and the Gaussian component. The speaker space is derived by the tensor factor analysis of the set of the matrices. Our approach can solve an inherent problem of supervector representation, and it improves the performance of voice conversion. In addition, in this paper, effects of speaker adaptive training before factorization are also investigated. Experimental results of one-to-many voice conversion demonstrate the effectiveness of the proposed approach.
ER  - 