Author Search Result

[Author] Kiyoshi KOGURE (3 hits)

  • Integration of Speech Recognition and Language Processing in a Japanese to English Spoken Language Translation System

    Tsuyoshi MORIMOTO  Kiyohiro SHIKANO  Kiyoshi KOGURE  Hitoshi IIDA  Akira KUREMATSU  

     
    PAPER-Speech Understanding

    Vol: E74-A  No: 7  Page(s): 1889-1896

    An experimental spoken language translation system (SL-TRANS) has been implemented. It recognizes Japanese speech, translates it into English, and outputs synthesized English speech. One of the most important problems in realizing such a system is how to integrate, or connect, speech recognition and language processing. This paper describes a new method realized in the system. The method is composed of three processes: grammar-driven predictive speech recognition, Kakariuke-dependency-based candidate filtering, and HPSG-based lattice parsing supplemented with a sentence preference mechanism. Input speech is uttered phrase by phrase. The speech recognizer takes an input phrase utterance and outputs several candidates with recognition scores for each phrase. A Japanese phrasal grammar is used in recognition; it contributes to the output of grammatically well-formed phrase candidates as well as to the reduction of phone perplexity. The candidate filter takes a phrase lattice, which is a sequence of multiple candidates for each phrase, and outputs a reduced phrase lattice. It removes semantically inappropriate phrase candidates by applying the Kakariuke dependency relationship between phrases. Finally, the HPSG-based lattice parser takes a phrase lattice and chooses the most plausible sentence by checking syntactic and semantic legitimacy or evaluating sentential preference. Experimental results for the system are also reported, and the usefulness of the method is confirmed.

  • Noise Suppression Based on Multi-Model Compositions Using Multi-Pass Search with Multi-Label N-gram Models

    Takatoshi JITSUHIRO  Tomoji TORIYAMA  Kiyoshi KOGURE  

     
    PAPER-Noisy Speech Recognition

    Vol: E91-D  No: 3  Page(s): 402-410

    We propose a noise suppression method based on multi-model compositions and multi-pass search. In real environments, input speech for speech recognition includes many kinds of noise signals. To obtain good recognition candidates, it is important to suppress many kinds of noise signals at once and to find the target speech. To find speech and noise label sequences before noise suppression, we introduce a multi-pass search that uses acoustic models including many kinds of noise models and their compositions, together with their n-gram models and their lexicon. Noise suppression is then performed frame-synchronously using the multiple models selected by the recognized label sequences with time alignments. We evaluated this method using the E-Nightingale task, which contains voice memoranda spoken by nurses during actual work at hospitals. The proposed method achieved higher performance than the conventional method.

  • Applicability of Camera Works to Free Viewpoint Videos with Annotation and Planning

    Ryuuki SAKAMOTO  Itaru KITAHARA  Megumu TSUCHIKAWA  Kaoru TANAKA  Tomoji TORIYAMA  Kiyoshi KOGURE  

     
    PAPER

    Vol: E90-D  No: 10  Page(s): 1637-1648

    This paper shows the effectiveness of a cinematographic camera for controlling 3D video by measuring its effects on viewers with several typical camera works. 3D free-viewpoint video allows us to set its virtual camera at arbitrary positions and postures in 3D space. However, there have been no investigations of adaptability or of the dependencies between the parameters of the virtual camera (i.e., positions, postures, and transitions) and the impressions of viewers. Although expertise-based camera works for 3D video seem important for making intuitively understandable video, they have not yet been considered. When camera works are applied to 3D video using the planning techniques proposed in previous research, generating ideal output video is difficult because it may include defects due to image resolution limitations, calculation errors, or occlusions, as well as others caused by positioning errors of the virtual camera in the planning process. Therefore, we conducted an experiment with 29 subjects using camera-worked 3D videos created with simple annotation and planning techniques to determine the virtual camera parameters. The first point of the experiment examines the effects of defects on viewer impressions. To measure such impressions, we conducted a semantic differential (SD) test. Comparisons between ground truth and 3D videos with planned camera works show that the present defects of camera work do not significantly affect viewers. The second point of the experiment examines whether the cameras controlled by planning and annotations affected the subjects with intentional direction. For this purpose, we conducted a factor analysis of the SD test answers; the results indicate that the proposed virtual camera control, which exploits annotation and planning techniques, allows us to realize camera-work direction on 3D video.