The search functionality is under construction.

Author Search Result

[Author] Hiroya FUJISAKI(5hit)

1-5hit
  • Manifestation of Linguistic Information in the Voice Fundamental Frequency Contours of Spoken Japanese

    Hiroya FUJISAKI  Keikichi HIROSE  Noboru TAKAHASHI  

     
    PAPER

      Vol:
    E76-A No:11
      Page(s):
    1919-1926

    Prosodic features of the spoken Japanese play an important role in the transmission of linguistic information concerning the lexical word accent, the sentence structure and the discourse structure. In order to construct prosodic rules for synthesizing high-quality speech, therefore, prosodic features of speech should be quantitatively analyzed with respect to the linguistic information. With a special focus on the fundamental frequency contour, we first define four prosodic units for the spoken Japanese, viz., prosodic word, prosodic phrase, prosodic clause and prosodic sentence, based on a decomposition of the fundamental frequency contour using a functional model for the generation process. Syntactic units are also introduced which have rough correspondence to these prosodic units. The relationships between the linguistic information and the characteristics of the components of the fundamental frequency contour are then described on the basis of results obtained by the analysis of two sets of speech material. Analysis of weathercast and newscast sentences showed that prosodic boundaries given by the manner of continuation/termination of phrase components fall into three categories, and are primarily related to the syntactic boundaries. On the other hand, analysis of noun phrases with various combinations of word accent types, syntactic structures, and focal conditions, indicated that the magnitude and the shape of the accent components, which of course reflect the information concerning the lexical accent types of constituent words, are largely influenced by the focal structure. The results also indicated that there are cases where prosody fails to meet all the requirements presented by word accent, syntax and discourse.

  • A System for the Synthesis of High-Quality Speech from Texts on General Weather Conditions

    Keikichi HIROSE  Hiroya FUJISAKI  

     
    PAPER

      Vol:
    E76-A No:11
      Page(s):
    1971-1980

    A text-to-speech conversion system for Japanese has been developed for the purpose of producing high-quality speech output. This system consists of four processing stages: 1) linguistic processing, 2) phonological processing, 3) control parameter generation, and 4) speech waveform generation. Although the processing at the first stage is restricted to the texts on general weather conditions, the other three stages can also cope with texts of news and narrations on other topics. Since the prosodic features of speech are largely related to the linguistic information, such as word accent, syntactic structure and discourse structure, linguistic processing of a wider range than ever, at least a sentence, is indispensable to obtain good quality speech with respect to the prosody. From this point of view, input text was restricted to the weather forecast sentences and a method for linguistic processing was developed to conduct morpheme, syntactic and semantic analyses simultaneously. A quantitative model for generating fundamental frequency contours was adopted to make a good reflection of the linguistic information on the prosody of synthetic speech. A set of prosodic rules was constructed to generate prosodic symbols representing prosodic structures of the text from the linguistic information obtained at the first stage. A new speech synthesizer based on the terminal analog method was also developed to improve the segmental quality of synthetic speech. It consists of four paths of cascade connection of pole/zero filters and three waveform generators. The four paths are respectively used for the synthesis of vowels and vowel-like sounds, nasal murmur and buzz bar, friction, and plosion, while the three generators produce voicing source waveform approximated by polynomials, white Gaussian noise source for fricatives and impulse source for plosives. The validity of the approach above has been confirmed by the listening tests using speech synthesized by the developed system. Improvements both in the quality of prosodic features and in the quality of segmental features were realized for the synthetic speech.

  • A Scheme for Word Detection in Continuous Speech Using Likelihood Scores of Segments Modified by Their Context Within a Word

    Sumio OHNO  Keikichi HIROSE  Hiroya FUJISAKI  

     
    PAPER

      Vol:
    E78-D No:6
      Page(s):
    725-731

    In conventional word-spotting methods for automatic recognition of continuous speech, individual frames or segments of the input speech are assigned labels and local likelihood scores solely on the basis of their own acoustic characteristics. On the other hand, experiments on human speech perception conducted by the present authors and others show that human perception of words in connected speech is based, not only on the acoustic characteristics of individual segments, but also on the acoustic and linguistic contexts in which these segments occurs. In other words, individual segments are not correctly perceive by humans unless they are accompanied by their context. These findings on the process of human speech perception have to be applied in automatic speech recognition in order to improve the performance. From this point of view, the present paper proposes a new scheme for detecting words in continuous speech based on template matching where the likelihood of each segment of a word is determined not only by its own characteristics but also by the likelihood of its context within the framework of a word. This is accomplished by modifying the likelihood score of each segment by the likelihood score of its phonetic context, the latter representing the degree of similarity of the context to that of a candidate word in the lexicon. Higher enhancement is given to the segmental likelihood score if the likelihood score of its context is higher. The advantage of the proposed scheme over conventional schemes is demonstrated by an experiment on constructing a word lattice using connected speech of Japanese uttered by a male speaker. The result indicates that the scheme is especially effective in giving correct recognition in cases where there are two or more candidate words which are almost equal in raw segmental likelihood scores.

  • Automatic Extraction of Tone Command Parameters for the Model of F0 Contour Generation for Standard Chinese

    Wentao GU  Keikichi HIROSE  Hiroya FUJISAKI  

     
    PAPER

      Vol:
    E87-D No:5
      Page(s):
    1079-1085

    The model for the process of F0 contour generation, first proposed by Fujisaki and his coworkers, has been successfully applied to Standard Chinese, which is a typical tone language with a distinct feature that both positive and negative tone commands are required. However, the inverse problem, viz., automatic derivation of the model parameters from an observed F0 contour of speech, cannot be solved analytically. Moreover, the extraction of model parameters for Standard Chinese is more difficult than for Japanese and English, because the polarity of tone commands cannot be inferred directly from the F0 contour itself. In this paper, an efficient method is proposed to solve the problem by using information on syllable timing and tone labels. With the same framework as for the successive approximation method proposed for Japanese and English, the method presented here for Standard Chinese is focused on the first-order estimation of tone command parameters. A set of intra-syllable and inter-syllable rules are constructed to recognize the tone command patterns within each syllable. The experiment shows that the method works effectively and gives results comparable to those obtained by manual analysis.

  • Formulation of the Verbal Thought Process Based on Generative Rules

    Naoki SUEHIRO  Hiroya FUJISAKI  

     
    PAPER-Artificial Intelligence

      Vol:
    E67-E No:1
      Page(s):
    26-32

    An assumption is made on the generative nature of the verbal thought process, based on an analogy between language use and verbal thought. A procedure is then presented for acquiring the set of generative rules from a given set of concept strings, leading to an efficient representation of verbal knowledge. The non-terminal symbols derived in the acquisition process are found to correspond to concepts and superordinate concepts in the human process of verbal thought. The validity of the formulation and the efficiency of knowledge representation is demonstrated by an example in which knowledge of biological properties of animals is reorganized into a set of generative rules. The process of inductive inference is then defined as a generalization of the acquired knowledge, and the principle of maximum simplicity of rules is proposed as a possible criterion for such generalization. The proposal is also tested by an example in which only a small part of a systematic body of knowledge is utilized to make inferences on the unknown parts of the system.