Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis

Kenichi FUJITA, Atsushi ANDO, Yusuke IJIMA

Summary :

This paper proposes a speech rhythm-based method for extracting speaker embeddings that model phoneme duration from only a few utterances by the target speaker. Along with acoustic features such as F0, speech rhythm is an essential speaker characteristic for reproducing an individual speaker's utterances in speech synthesis. The novel feature of the proposed method is that the embeddings are extracted from phonemes and their durations, which are known to be related to speaking rhythm, using a speaker identification model similar to the conventional spectral feature-based one. We conducted three experiments to evaluate performance: speaker embedding generation, speech synthesis with the generated embeddings, and embedding space analysis. The proposed method achieved moderate speaker identification performance (15.2% EER) even though it uses only phonemes and their durations. Objective and subjective evaluations showed that the proposed method can synthesize speech whose rhythm is closer to the target speaker's than speech synthesized with the conventional method. We also visualized the embeddings to examine the relationship between embedding distance and perceptual similarity. The visualization and the analysis of this relationship indicated that the distribution of the embeddings reflects both subjective and objective similarity.
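
The summary reports speaker identification performance as an equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. As a minimal sketch of how such a figure is commonly computed, the Python below sweeps a decision threshold over trial scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the assumption that higher scores mean "same speaker", and the toy scores are all illustrative assumptions.

```python
import numpy as np


def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumption (not from the paper): scores are similarities, so a trial is
    accepted as "same speaker" when score >= threshold.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)

    # Candidate thresholds: every distinct score seen in either trial set.
    thresholds = np.unique(np.concatenate([genuine, impostor]))

    # False acceptance rate: impostor trials that pass the threshold.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    # False rejection rate: genuine trials that fall below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest and
    # report their average as the EER estimate.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


# Toy scores, invented purely for illustration.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With a finite set of trials the two error rates may never be exactly equal, so this sketch averages them at the closest threshold; evaluation toolkits often interpolate along the error curve instead.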

Publication
IEICE TRANSACTIONS on Information Vol.E107-D No.1 pp.93-104
Publication Date
2024/01/01
Publicized
2023/10/06
Online ISSN
1745-1361
DOI
10.1587/transinf.2023EDP7039
Type of Manuscript
PAPER
Category
Speech and Hearing

Authors

Kenichi FUJITA
  NTT Corporation
Atsushi ANDO
  NTT Corporation
Yusuke IJIMA
  NTT Corporation

Keyword