Voice Timbre Control Based on Perceived Age in Singing Voice Conversion

Kazuhiro KOBAYASHI; Tomoki TODA; Hironori DOI; Tomoyasu NAKANO; Masataka GOTO; Graham NEUBIG; Sakriani SAKTI; Satoshi NAKAMURA

doi:10.1587/transinf.E97.D.1419

IEICE TRANSACTIONS on Information

Summary :

The perceived age of a singing voice is the age of the singer as perceived by the listener, and is one of the notable characteristics that determine how a song is perceived. In this paper, we describe an investigation of acoustic features that affect the perceived age, and a novel voice timbre control technique based on the perceived age for singing voice conversion (SVC). Singers can sing expressively by controlling prosody and voice timbre, but the variety of voices that a singer can produce is limited by physical constraints. Previous work has attempted to overcome this limitation through the use of statistical voice conversion. This technique makes it possible to convert the singing voice timbre of an arbitrary source singer into that of an arbitrary target singer. However, it is still difficult to intuitively control singing voice characteristics by manipulating parameters corresponding to specific physical traits, such as gender and age. In this paper, we first investigate the factors that play a part in the listener's perception of the singer's age. Then, we apply a multiple-regression Gaussian mixture model (MR-GMM) to SVC for the purpose of controlling voice timbre based on the perceived age, and we propose SVC based on a modified MR-GMM for manipulating the perceived age while maintaining the singer's individuality. The experimental results show that 1) the perceived age of singing voices corresponds relatively well to the actual age of the singer, 2) prosodic features have a larger effect on the perceived age than spectral features, 3) the individuality of a singer is influenced more heavily by segmental features than prosodic features, and 4) the proposed voice timbre control method makes it possible to change the singer's perceived age without adversely affecting the perceived individuality.
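
The summary does not give the MR-GMM equations, so the following is only a minimal sketch of the general idea behind a multiple-regression GMM as it is commonly described: the target mean of each mixture component is a linear function of a control score (here, a perceived-age score), and conversion is the usual GMM conditional expectation. All names (`mrgmm_convert`, `bias`, `slope`, `age_score`) and the toy parameter values are assumptions made for illustration; they are not taken from the paper, and the paper's modified MR-GMM is not reproduced here.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate Gaussian at a single vector x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (x.shape[0] * np.log(2.0 * np.pi) + logdet + maha)

def mrgmm_convert(x, age_score, weights, mu_x, cov_xx, cov_yx, bias, slope):
    """Convert one source feature vector x for a given age score.

    Assumed model (illustrative): the target mean of component m is
    bias[m] + slope[m] * age_score, so the score shifts the output timbre.
    """
    n_mix = len(weights)
    # Posterior probability of each component given the source vector.
    log_post = np.array([
        np.log(weights[m]) + gaussian_logpdf(x, mu_x[m], cov_xx[m])
        for m in range(n_mix)
    ])
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    # Posterior-weighted sum of per-component conditional means.
    y = np.zeros_like(bias[0])
    for m in range(n_mix):
        mu_y = bias[m] + slope[m] * age_score
        y += post[m] * (mu_y + cov_yx[m] @ np.linalg.solve(cov_xx[m], x - mu_x[m]))
    return y

# Toy one-component model with made-up parameters (not trained on any data).
toy = dict(
    weights=np.array([1.0]),
    mu_x=np.zeros((1, 2)),
    cov_xx=np.eye(2)[None],
    cov_yx=np.zeros((1, 2, 2)),
    bias=np.array([[1.0, 2.0]]),
    slope=np.array([[0.1, -0.1]]),
)
young = mrgmm_convert(np.zeros(2), 10.0, **toy)
old = mrgmm_convert(np.zeros(2), 50.0, **toy)
print(young, old)
```

In this sketch, changing only `age_score` moves the converted features along the per-component `slope` direction while the source-dependent term stays the same, which is the kind of single-parameter control the summary describes.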

Publication: IEICE TRANSACTIONS on Information Vol.E97-D No.6 pp.1419-1428

Publication Date: 2014/06/01

Online ISSN: 1745-1361

DOI: 10.1587/transinf.E97.D.1419

Type of Manuscript: Special Section PAPER (Special Section on Advances in Modeling for Real-world Speech Information Processing and its Application)

Category: Voice Conversion and Speech Enhancement

Authors

Kazuhiro KOBAYASHI
  Nara Institute of Science and Technology (NAIST)
Tomoki TODA
  Nara Institute of Science and Technology (NAIST)
Hironori DOI
  Nara Institute of Science and Technology (NAIST)
Tomoyasu NAKANO
  National Institute of Advanced Industrial Science and Technology (AIST)
Masataka GOTO
  National Institute of Advanced Industrial Science and Technology (AIST)
Graham NEUBIG
  Nara Institute of Science and Technology (NAIST)
Sakriani SAKTI
  Nara Institute of Science and Technology (NAIST)
Satoshi NAKAMURA
  Nara Institute of Science and Technology (NAIST)

Keyword

singing voice, voice conversion, perceived age, spectral and prosodic features, subjective evaluations

Cite this

Kazuhiro KOBAYASHI, Tomoki TODA, Hironori DOI, Tomoyasu NAKANO, Masataka GOTO, Graham NEUBIG, Sakriani SAKTI, Satoshi NAKAMURA, "Voice Timbre Control Based on Perceived Age in Singing Voice Conversion" in IEICE TRANSACTIONS on Information, vol. E97-D, no. 6, pp. 1419-1428, June 2014, doi: 10.1587/transinf.E97.D.1419.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E97.D.1419/_p

@ARTICLE{e97-d_6_1419,
author={Kazuhiro KOBAYASHI and Tomoki TODA and Hironori DOI and Tomoyasu NAKANO and Masataka GOTO and Graham NEUBIG and Sakriani SAKTI and Satoshi NAKAMURA},
journal={IEICE TRANSACTIONS on Information},
title={Voice Timbre Control Based on Perceived Age in Singing Voice Conversion},
year={2014},
volume={E97-D},
number={6},
pages={1419-1428},
keywords={},
doi={10.1587/transinf.E97.D.1419},
ISSN={1745-1361},
month={June},}

TY - JOUR
TI - Voice Timbre Control Based on Perceived Age in Singing Voice Conversion
T2 - IEICE TRANSACTIONS on Information
SP - 1419
EP - 1428
AU - Kazuhiro KOBAYASHI
AU - Tomoki TODA
AU - Hironori DOI
AU - Tomoyasu NAKANO
AU - Masataka GOTO
AU - Graham NEUBIG
AU - Sakriani SAKTI
AU - Satoshi NAKAMURA
PY - 2014
DO - 10.1587/transinf.E97.D.1419
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E97-D
IS - 6
JA - IEICE TRANSACTIONS on Information
Y1 - June 2014
ER -
