IEICE TRANSACTIONS on Information

A Hybrid Approach to Electrolaryngeal Speech Enhancement Based on Noise Reduction and Statistical Excitation Generation

Kou TANAKA, Tomoki TODA, Graham NEUBIG, Sakriani SAKTI, Satoshi NAKAMURA

Summary :

This paper presents an electrolaryngeal (EL) speech enhancement method that significantly improves the naturalness of EL speech without degrading its intelligibility. An electrolarynx is an external device that artificially generates excitation sounds to enable laryngectomees to produce EL speech. Although proficient laryngectomees can produce quite intelligible EL speech, it sounds very unnatural because of the mechanical excitation produced by the device. Moreover, the excitation sounds often leak outside and are added to the EL speech as noise. Two conventional approaches to EL speech enhancement address these issues: noise reduction and statistical voice conversion (VC). The former usually causes no degradation in intelligibility but yields only small improvements in naturalness, because the mechanical excitation sounds remain essentially unchanged. The latter significantly improves the naturalness of EL speech by using spectral and excitation parameters of natural voices converted from acoustic parameters of EL speech, but it usually degrades intelligibility owing to conversion errors. We propose a hybrid approach that uses a noise reduction method to enhance the spectral parameters and a statistical voice conversion method to predict the excitation parameters. We further modify the prediction process of the excitation parameters to improve prediction accuracy and to reduce the adverse effects of unvoiced/voiced prediction errors. The experimental results demonstrate that the proposed method yields significant improvements in naturalness compared with EL speech while keeping intelligibility sufficiently high.
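
The division of labor described in the summary (spectral parameters from noise reduction, excitation parameters from a statistical predictor) can be sketched roughly as follows. This is an illustration under assumptions, not the authors' implementation: simple spectral subtraction stands in for the noise reduction method, which the summary does not specify, and `predict_f0` is a hypothetical placeholder for the statistical excitation predictor. All function names, the `floor` parameter, and the sample values are invented for the example.

```python
from typing import Callable, Dict, List, Sequence

Frame = Sequence[float]  # magnitude spectrum of one analysis frame


def spectral_subtraction(noisy_mag: Frame, noise_mag: Frame,
                         floor: float = 0.05) -> List[float]:
    """Assumed stand-in for noise reduction: subtract an estimate of the
    leaked excitation noise in each frequency bin, keeping at least
    `floor` times the observed magnitude so no bin goes negative."""
    return [max(s - n, floor * s) for s, n in zip(noisy_mag, noise_mag)]


def enhance(frames: Sequence[Frame], noise_mag: Frame,
            predict_f0: Callable[[Frame], float]) -> List[Dict[str, object]]:
    """Hybrid sketch: the spectrum comes from the noise reduction path,
    the excitation (here only F0) from a separate predictor."""
    enhanced = []
    for mag in frames:
        enhanced.append({
            "spectrum": spectral_subtraction(mag, noise_mag),
            "f0": predict_f0(mag),  # hypothetical predictor, not the paper's model
        })
    return enhanced


# Toy usage with made-up numbers and a constant-F0 dummy predictor.
frames = [[1.0, 0.5, 0.2], [0.8, 0.1, 0.4]]
noise = [0.2, 0.2, 0.2]
result = enhance(frames, noise, lambda mag: 120.0)
```

In this sketch the spectral path never passes through the predictor, which mirrors the motivation given in the summary: conversion errors can then affect only the excitation, not the spectral parameters that carry intelligibility.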

Publication
IEICE TRANSACTIONS on Information Vol.E97-D No.6 pp.1429-1437
Publication Date
2014/06/01
Online ISSN
1745-1361
DOI
10.1587/transinf.E97.D.1429
Type of Manuscript
Special Section PAPER (Special Section on Advances in Modeling for Real-world Speech Information Processing and its Application)
Category
Voice Conversion and Speech Enhancement

Authors

Kou TANAKA
  Nara Institute of Science and Technology (NAIST)
Tomoki TODA
  Nara Institute of Science and Technology (NAIST)
Graham NEUBIG
  Nara Institute of Science and Technology (NAIST)
Sakriani SAKTI
  Nara Institute of Science and Technology (NAIST)
Satoshi NAKAMURA
  Nara Institute of Science and Technology (NAIST)

Keyword