Author Search Result

[Author] Takayuki ARAI (3 hits)

  • Automatic Language Identification Using Sequential Information of Phonemes

    Takayuki ARAI  

     
    PAPER

    Vol: E78-D  No: 6
    Page(s): 705-711

    This paper describes approaches to language identification based on the sequential information of phonemes. These approaches assume that each language can be identified from its own phoneme structure, or phonotactics. To extract this phoneme structure, we use phoneme classifiers and grammars for each language. The phoneme classifier for each language is implemented as a multi-layer perceptron trained on quasi-phonetic hand-labeled transcriptions. After the phoneme classifiers are trained, the grammar for each language is calculated as a set of transition probabilities for each phoneme pair. Because of the interest in automatic language identification for worldwide voice communication, we decided to use telephone speech for this study. The data were drawn from the OGI (Oregon Graduate Institute)-TS (telephone speech) corpus, a standard corpus for this type of research. To investigate the basic issues of this approach, two languages, Japanese and English, were selected. The language classification algorithms are based on Viterbi search constrained by a bigram grammar and by minimum and maximum durations. Using a phoneme classifier trained only on English phonemes, we achieved 81.1% accuracy. Using a phoneme classifier trained on Japanese phonemes, we achieved 79.3% accuracy. Using both the English and the Japanese phoneme classifiers together, we obtained our best result: 83.3%. Our results were comparable to those obtained by other methods, such as those based on hidden Markov models.
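
The idea of a grammar as a set of phoneme-pair transition probabilities can be sketched as follows. This is only an illustrative simplification, not the paper's method: it assumes phoneme strings are already available (no multi-layer perceptron classifier), scores them by plain bigram log-likelihood instead of a duration-constrained Viterbi search, and uses add-one smoothing. The function names and the toy phoneme strings are hypothetical.

```python
import math
from collections import defaultdict


def train_bigram(sequences, phonemes, alpha=1.0):
    """Estimate P(next | prev) for every phoneme pair with add-alpha smoothing."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1.0
    probs = {}
    for prev in phonemes:
        total = sum(counts[prev].values()) + alpha * len(phonemes)
        probs[prev] = {nxt: (counts[prev][nxt] + alpha) / total for nxt in phonemes}
    return probs


def log_likelihood(seq, probs):
    """Sum of log transition probabilities along the phoneme sequence."""
    return sum(math.log(probs[prev][nxt]) for prev, nxt in zip(seq, seq[1:]))


def identify(seq, grammars):
    """Return the language whose bigram grammar scores the sequence highest."""
    return max(grammars, key=lambda lang: log_likelihood(seq, grammars[lang]))


# Hypothetical toy data: single letters stand in for phoneme labels.
phonemes = sorted(set("kasti"))
training = {
    "japanese": [list("kasati"), list("takisa"), list("sakata")],  # CV-like patterns
    "english": [list("stakst"), list("skats"), list("tasks")],     # consonant clusters
}
grammars = {lang: train_bigram(seqs, phonemes) for lang, seqs in training.items()}

print(identify(list("kata"), grammars))
print(identify(list("stask"), grammars))
```

In this toy setup, sequences with alternating consonants and vowels score higher under the "japanese" grammar, while sequences with consonant clusters score higher under the "english" one.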

  • Visualization of Brain Activities of Single-Trial and Averaged Multiple-Trials MEG Data

    Yoshio KONNO  Jianting CAO  Takayuki ARAI  Tsunehiro TAKEDA  

     
    PAPER-Neuro, Fuzzy, GA

    Vol: E86-A  No: 9
    Page(s): 2294-2302

    A main approach in recent work applying independent component analysis (ICA) to neurobiological signal processing is to analyze either averaged multiple-trial data or non-averaged single-trial data. Averaging increases the signal-to-noise ratio (SNR), but some important information, such as the strength of an evoked response and its dynamics, is lost. Single-trial analysis, on the other hand, avoids this problem, but its SNR is very poor. In this study, we apply ICA to both non-averaged single-trial data and averaged multiple-trial data to determine the properties and advantages of each. Our results show that the analysis of averaged data is effective for finding the response and dipole location of evoked fields. Non-averaged single-trial analysis efficiently identifies the strength and dynamic components such as the α-wave. For determining both the range of evoked strength and the dipole location, an analysis of data averaged over a limited number of trials is a better option.
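
The trade-off between single-trial and averaged data can be illustrated with a small simulation. This is a sketch under strong assumptions, not the paper's analysis: it does not perform ICA, the "evoked response" is a synthetic sinusoid that is identical in every trial, and the noise is independent Gaussian noise. All names and parameter values are hypothetical.

```python
import math
import random


def make_trial(signal, noise_std, rng):
    """One simulated trial: the fixed response plus independent Gaussian noise."""
    return [s + rng.gauss(0.0, noise_std) for s in signal]


def average_trials(trials):
    """Sample-by-sample mean across trials."""
    n = len(trials)
    return [sum(column) / n for column in zip(*trials)]


def snr_db(observed, signal):
    """SNR in dB, treating (observed - signal) as the noise."""
    signal_power = sum(s * s for s in signal)
    noise_power = sum((o - s) ** 2 for o, s in zip(observed, signal))
    return 10.0 * math.log10(signal_power / noise_power)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_samples, n_trials, noise_std = 200, 100, 2.0
signal = [math.sin(2.0 * math.pi * 5.0 * i / n_samples) for i in range(n_samples)]
trials = [make_trial(signal, noise_std, rng) for _ in range(n_trials)]

single_snr = snr_db(trials[0], signal)
averaged_snr = snr_db(average_trials(trials), signal)
print(f"single trial: {single_snr:.1f} dB, average of {n_trials}: {averaged_snr:.1f} dB")
```

With independent noise, averaging N trials reduces the noise power by roughly a factor of N. The simulation builds in the assumption that the response never changes across trials, which is exactly the trial-to-trial variation in strength and dynamics that averaging discards in real data.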

  • What are the Essential Cues for Understanding Spoken Language?

    Steven GREENBERG  Takayuki ARAI  

     
    INVITED PAPER

    Vol: E87-D  No: 5
    Page(s): 1059-1070

    Classical models of speech recognition assume that a detailed, short-term analysis of the acoustic signal is essential for accurately decoding the speech signal and that this decoding process is rooted in the phonetic segment. This paper presents an alternative view, one in which the time scales required to accurately describe and model spoken language are both shorter and longer than the phonetic segment, and are inherently wedded to the syllable. The syllable reflects a singular property of the acoustic signal -- the modulation spectrum -- which provides a principled, quantitative framework to describe the process by which the listener proceeds from sound to meaning. The ability to understand spoken language (i.e., intelligibility) vitally depends on the integrity of the modulation spectrum within the core range of the syllable (3-10 Hz) and reflects the variation in syllable emphasis associated with the concept of prosodic prominence ("accent"). A model of spoken language is described in which the prosodic properties of the speech signal are embedded in the temporal dynamics associated with the syllable, a unit serving as the organizational interface among the various tiers of linguistic representation.
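
As a rough illustration of what the modulation spectrum measures, the sketch below takes an amplitude envelope, computes its spectrum with a direct discrete Fourier transform, and reports the fraction of modulation energy that falls in the 3-10 Hz range mentioned above. It assumes the envelope has already been extracted from the speech signal; the band-fraction measure, the function names, and the synthetic envelopes are illustrative assumptions, not the authors' procedure.

```python
import cmath
import math


def modulation_spectrum(envelope, fs):
    """Power of the mean-removed envelope at each DFT frequency above 0 Hz."""
    n = len(envelope)
    mean = sum(envelope) / n
    centered = [e - mean for e in envelope]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2)
    return freqs, power


def band_fraction(freqs, power, lo=3.0, hi=10.0):
    """Share of total modulation energy lying between lo and hi (in Hz)."""
    total = sum(power)
    in_band = sum(p for f, p in zip(freqs, power) if lo <= f <= hi)
    return in_band / total


fs = 100.0  # envelope sampling rate in Hz (hypothetical)
n = 200     # two seconds of envelope
# Synthetic envelopes: one modulated at a syllable-like 5 Hz, one at 20 Hz.
syllabic = [1.0 + 0.5 * math.sin(2.0 * math.pi * 5.0 * i / fs) for i in range(n)]
fast = [1.0 + 0.5 * math.sin(2.0 * math.pi * 20.0 * i / fs) for i in range(n)]

syllabic_fraction = band_fraction(*modulation_spectrum(syllabic, fs))
fast_fraction = band_fraction(*modulation_spectrum(fast, fs))
print(syllabic_fraction, fast_fraction)
```

The 5 Hz envelope concentrates nearly all of its modulation energy inside the 3-10 Hz band, while the 20 Hz envelope places almost none there.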