Author Search Result

[Author] Mikio YAMAGUCHI(3hit)

1-3hit
  • Proposal and Evaluation of a Method for Accurate Analysis of Glottal Source Parameters

    John-Paul HOSOM  Mikio YAMAGUCHI  

     
    PAPER-Speech Processing

    Vol: E77-D  No: 10  Page(s): 1130-1141

    A new method for the accurate extraction of glottal source parameters is proposed. This method, called Heuristic Analysis-by-Synthesis (HAbS), has been developed specifically to overcome the weaknesses of other methods of glottal source parameter extraction. The specific features of this method are the use of the AbS method for extraction of glottal source and vocal tract parameters, the use of a parametric glottal source model during vocal tract analysis, the use of alternating glottal source and vocal tract analyses, and simultaneous, time-domain analysis of the glottal source parameters and the first formant. This method has been implemented in such a way that user interaction is not required. The performance of the HAbS method is evaluated using both synthetic-speech and natural-speech data. Error is measured in both the time domain and the spectral domain, and the standard deviation of extracted parameter values is computed. In addition, the error in analysis of each glottal-source parameter is computed using synthetic-speech data. In order to assess the accuracy of the HAbS method as compared to other methods, three other methods (LPC, AIF, and AbS) are evaluated using the same data and methods of error measurement. From these evaluations, it is clear that the HAbS method yields results that are more accurate than these other methods.

  • Development of a Rule-Based Speech Synthesizer Module for Embedded Use

    Mikio YAMAGUCHI  John-Paul HOSOM  

     
    PAPER

    Vol: E76-A  No: 11  Page(s): 1990-1998

    A module for rule-based Japanese speech synthesis has been developed. The synthesizer was constructed using the Multiple-Cascade Terminal Analog (MCTA) structure, and this structure has been improved in three respects: the voicing-source model has an increased number of variable parameters which allows for voicing-source waveforms that better approximate natural speech; the spectral characteristics of the fricative source have been improved; and the path used for nasal consonants has an increased number of resonators to better conform to theory. The current synthesis system uses a modified stored-pattern data structure which allows better transitions between syllables; however, time-invariant values are used in certain cases in order to decrease the amount of required memory. This system also has a new consolidated method for generating geminate obstruents and syllabic nasals. This synthesizer and synthesis system have been implemented in a re-developed rule-based speech-synthesis module. This module has been constructed using ASIC technology and has both small size (56 × 36 × 8 mm) and light weight (19 g); it is therefore possible to embed it in various types of portable or moving machinery. The module can be connected directly to a microprocessor bus and accepts as input sentences which are generated by the host computer. The input sentences are written with the Japanese katakana or romaji syllabaries and other symbols which describe the sentence structure. The syllable articulation rate for one hundred Japanese syllables (including palatalized sounds) is 65% and for sixty-seven syllables (not including palatalized sounds) is 74%. The word intelligibility, measured using phonetically-balanced words, is 88%.

  • Power Control of a Terminal Analog Synthesizer Using a Glottal Model

    Mikio YAMAGUCHI  

     
    PAPER

    Vol: E76-A  No: 11  Page(s): 1957-1963

    A terminal-analog synthesizer which uses a glottal model has already been proposed for rule-based speech synthesis, but the control strategy for glottal source intensity levels has not yet been defined. On the other hand, power-control rules which determine the target segmental power of synthetic speech have been proposed, based on statistical analysis of the power in natural speech. It is pointed out that there is a close correlation between observed fundamental frequency and power levels in natural speech; however, the theoretical reasons for this correlation have not been explained. This paper shows the relationship between fundamental frequency and resultant power in a terminal-analog synthesizer which uses a glottal model. From the equations it can be deduced that the tendency in natural speech for power to increase with fundamental frequency can be closely simulated by the sum of the effect of the radiation characteristic and the effect of the synthesis system's vocal tract transfer function. In addition, this paper proposes a method for adjusting the power of synthetic speech to any desired value. This control method can be executed in real-time.