
Keyword Search Result

[Keyword] speech production (4 hits)

1-4 of 4 hits
  • Production-Oriented Models for Speech Recognition

    Erik MCDERMOTT  Atsushi NAKAMURA  

    PAPER-Speech Recognition

    Vol: E89-D No:3
    Page(s): 1006-1014

    Acoustic modeling in speech recognition uses very little knowledge of the speech production process; at many levels, our models continue to treat speech as a surface phenomenon. Typically, hidden Markov model (HMM) parameters operate primarily in the acoustic space or in a linear transformation thereof. State-to-state evolution is modeled only crudely, with no explicit relationship between states such as would be afforded by the phonetic features commonly used by linguists to describe speech phenomena, or by the continuity and smoothness of the production parameters governing speech. This survey article provides an overview of proposals by several researchers for improving acoustic modeling in these regards. Topics covered include the controversial Motor Theory of Speech Perception; work by Hogden explicitly using a continuity constraint in a pseudo-articulatory domain; the Kalman-filter-based Hidden Dynamic Model; and work by many groups showing the benefits of using articulatory features instead of phones as the underlying units of speech.

  • Exploring Human Speech Production Mechanisms by MRI

    Kiyoshi HONDA  Hironori TAKEMOTO  Tatsuya KITAMURA  Satoru FUJITA  Sayoko TAKANO  

    INVITED PAPER

    Vol: E87-D No:5
    Page(s): 1050-1058

    Recent investigations of the human speech organs using magnetic resonance imaging (MRI) have opened up new avenues of research. Visualization of the speech production system provides abundant information on the physiological and acoustic realization of human speech. This article summarizes the current status of MRI applications in speech research, as well as our own experience in discovering and re-evaluating acoustic events emanating from the vocal tract and the physiological mechanisms behind them.

  • Speaker Adaptation Method for Acoustic-to-Articulatory Inversion using an HMM-Based Speech Production Model

    Sadao HIROYA  Masaaki HONDA  

    PAPER

    Vol: E87-D No:5
    Page(s): 1071-1078

    We present a speaker adaptation method that makes it possible to determine articulatory parameters from an unknown speaker's speech spectrum using an HMM (hidden Markov model)-based speech production model. The model consists of HMMs of articulatory parameters for each phoneme and an articulatory-to-acoustic mapping that transforms the articulatory parameters into a speech spectrum for each HMM state; it is constructed statistically from actual articulatory-acoustic data. In the adaptation method, geometrical differences in the vocal tract, as well as the articulatory behavior of the reference model, are statistically adjusted to an unknown speaker. First, the articulatory parameters are estimated from the unknown speaker's speech spectrum using the reference model. Second, the articulatory-to-acoustic mapping is adjusted by maximizing the output probability of the acoustic parameters given the estimated articulatory parameters of the unknown speaker. With this adaptation, the RMS error between the estimated and observed articulatory parameters is 1.65 mm, an improvement of 56.1% over the speaker-independent model.
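    The second adaptation step, maximizing the output probability of the acoustic parameters given the estimated articulatory parameters, reduces to a least-squares refit when the mapping is linear with Gaussian output noise. A minimal sketch of that idea (the function name is hypothetical, and pooling all frames into one affine mapping is a simplification of the paper's per-state formulation):

    ```python
    import numpy as np

    def adapt_mapping(Y, X):
        """Re-fit an affine articulatory-to-acoustic mapping by least
        squares, the ML estimate under a fixed-covariance Gaussian
        output model. Y: (T, d_art) articulatory parameters estimated
        for the new speaker; X: (T, d_ac) observed acoustic parameters."""
        Y1 = np.hstack([Y, np.ones((len(Y), 1))])  # append affine (bias) term
        A, *_ = np.linalg.lstsq(Y1, X, rcond=None)  # solve Y1 @ A ~ X
        return A  # (d_art + 1, d_ac): maps [y, 1] -> acoustic spectrum
    ```

    In the actual model, a separate mapping would be re-estimated for each HMM state, weighted by state occupancy, rather than one global mapping over all frames.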

  • A Short-Time Speech Analysis Method with Mapping Using the Fejér Kernel

    Nobuhiro MIKI  Kenji TAKEMURA  Nobuo NAGAI  

    PAPER

    Vol: E77-A No:5
    Page(s): 792-799

    We discuss estimation error as a basic problem in formant estimation when analyzing speech of very short duration within the glottal closure of a vowel. Our simulations show that good estimation of the first formant is almost impossible with the ordinary method of simply cutting the waveform. We propose a new method in which the cut waveform, a discontinuous function of finite duration, is mapped to a continuous function defined over the whole time domain, and we show that this method greatly improves the estimation accuracy for low-frequency formants.
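    Mapping a cut segment to a continuous function with the Fejér kernel is, in Fourier terms, Cesàro summation: a triangular taper on the Fourier coefficients, which damps the Gibbs ripple caused by the discontinuity at the segment boundary. A minimal sketch of that classical operation (an illustration of Fejér smoothing in general, not a reconstruction of the paper's specific mapping):

    ```python
    import numpy as np

    def fejer_smooth(x):
        """Apply a triangular (Fejer/Cesaro) taper to the DFT of a finite
        segment; this is equivalent to convolving its periodic extension
        with a discrete Fejer kernel, yielding a smoother reconstruction
        with reduced Gibbs ripple at the cut boundary."""
        N = len(x)
        X = np.fft.fft(x)
        k = np.abs(np.fft.fftfreq(N) * N)  # harmonic index |k| of each bin
        w = 1.0 - k / (N // 2 + 1)         # triangular taper, 1 at DC
        return np.real(np.fft.ifft(X * w))
    ```

    Because the taper equals 1 at DC, the segment mean is preserved, while higher harmonics, which carry the boundary discontinuity, are progressively attenuated.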