
Author Search Result

[Author] Akira ICHIKAWA (5 hits)

Hits 1-5
  • Estimating Syntactic Structure from Prosody in Japanese Speech

    Tomoko OHSUGA  Yasuo HORIUCHI  Akira ICHIKAWA  

     
    PAPER-Speech Synthesis and Prosody

    Vol: E86-D No: 3, Page(s): 558-564

    In this study, we introduce a method for estimating the syntactic structure of Japanese speech from the F0 contour and pause duration. We define a prosodic unit (PU) as a segment delimited by a local minimum of the F0 contour or by a pause. By repeatedly combining PUs (each pair of PUs is merged into one PU), a tree structure is gradually generated. Which pair of PUs in a sequence of three PUs should be combined is decided by a discriminant function obtained from discriminant analysis of a corpus of speech data. We applied the method to the ATR Phonetically Balanced Sentences read by four Japanese speakers. With this method, the rate of correct judgements for each sequence of three PUs is 79%, and the estimation accuracy for the entire syntactic structure of each sentence is 26%. We consider this result to demonstrate a good degree of accuracy for the difficult task of estimating syntactic structure from prosody alone.

  • Spontaneous Speech Understanding Based on Cooperative Problem-Solving

    Akio KOMATSU  Eiji OOHIRA  Akira ICHIKAWA  

     
    PAPER-Speech Understanding

    Vol: E74-A No: 7, Page(s): 1845-1853

    Natural spontaneous speech is so ambiguous that a system for understanding it requires the cooperation of many knowledge sources. Thus, in order to integrate speech processing and language processing, it is necessary to provide a system with a mechanism for supporting such cooperation. We propose here a general framework for cooperative problem-solving, based on the blackboard model and a TMS (truth maintenance system), with an enhanced proving function. In this framework, a reasonably consistent interpretation is automatically kept on the blackboard, while each knowledge source performs its own inference and puts the results on the blackboard. Based on this framework, a model has been established for a system which can understand spontaneous speech through the cooperation of independent knowledge sources. Most notably, prosodic information is used as a suprasegmental cue to infer the structure of spontaneous speech. This allows robust parsing of spoken sentences. The feasibility and validity of our basic framework have been confirmed by computer simulation experiments on spontaneous speech.

  • Digital Encoding Applied to Sign Language Video

    Kaoru NAKAZONO  Yuji NAGASHIMA  Akira ICHIKAWA  

     
    PAPER-Service and System

    Vol: E89-D No: 6, Page(s): 1893-1900

    We report an encoding technique designed specifically for sign language video sequences, intended for sign telecommunication at a low bitrate, such as over mobile videophones. The technique is composed of three methods: gradient coding, precedence macroblock coding, and not-coded coding. These methods are based on the idea of allocating a certain number of bits to each macroblock according to the evaluated importance of that part of the picture. They were implemented on a computer, and the encoded data of a short clip of sign language dialogue was evaluated by deaf subjects. The evaluation confirmed the efficiency of the technique.

  • Two Probabilistic Algorithms for Planar Motion Detection

    Iris FERMIN  Atsushi IMIYA  Akira ICHIKAWA  

     
    PAPER-Image Processing, Computer Graphics and Pattern Recognition

    Vol: E80-D No: 3, Page(s): 371-381

    We introduce two probabilistic algorithms to determine the motion parameters of a planar shape without knowing a priori the point-to-point correspondences. If the target is limited to rigid objects, a Euclidean transformation can be expressed as a linear equation with six parameters, i.e., two translational parameters and four rotational parameters (the axis of rotation and the rotational speed about the axis). These parameters can be determined by applying the randomized Hough transform. One remarkable feature of our algorithms is that the translation and rotation parameters are calculated using points randomly selected from two image frames acquired at different times. The rotation parameters are estimated using one of two approaches, which we call the triangle search and the polygon search algorithms, respectively. Both methods focus on the intersection points of the boundary of the 2D shape and circles whose centers are located at the shape's centroid and whose radii are generated randomly. The triangle search algorithm randomly selects three different intersection points in each image, such that they form congruent triangles, and then estimates the rotation parameter using these two triangles. The polygon search algorithm, in contrast, employs all the intersection points in each image, i.e., all the intersection points in the two image frames form two polygons, and then estimates the rotation parameter with the aid of the vertices of these two polygons.

  • Dialogue Languages and Persons with Disabilities

    Akira ICHIKAWA  

     
    INVITED PAPER

    Vol: E87-D No: 6, Page(s): 1312-1319

    Utterances in dialogue, whether in spoken language or sign language, have functions that enable recipients to understand them easily in real time and to control the conversation smoothly despite the volatile nature of dialogue. In this paper, we present evidence of these functions obtained experimentally. Prosody plays a very important role not only in spoken language (aural language) but also in sign language (visual language) and finger braille (tactile language). Skilled users of a language may detect word boundaries in utterances and estimate sentence structure immediately using prosody. The gestures and glances of a recipient may influence the utterances of the sender, leading to amendments of the contents of utterances and smooth exchanges of turns. Individuality and emotion in utterances are also very important aspects of effective communication support systems for persons with disabilities, even more so than for non-disabled persons. The trials described herein are universal in design. Some trials carried out to develop these systems are also reported.