
Keyword Search Result

[Keyword] Al(20498hit)

18901-18920hit(20498hit)

  • A New Adaptive Convergence Factor Algorithm with the Constant Damping Parameter

    Isao NAKANISHI  Yutaka FUKUI  

     
    PAPER

    Vol: E78-A  No: 6  Page(s): 649-655

    This paper presents a new Adaptive Convergence Factor (ACF) algorithm that requires no adjustment of the damping parameter according to the input signal and/or the composition of the filter system. The damping parameter in ACF algorithms has a great influence on the convergence characteristics. In order to examine the relation between the damping parameter and the convergence characteristics, a normalization, realized by dividing the related signal terms by their respective maximum values, is introduced into the ACF algorithm. The normalized algorithm is applied to the modeling of unknown time-variable systems, which makes it possible to examine the relation between the parameters and the misadjustment in the adaptive algorithms. Considering the experimental and theoretical results, the optimum value of the damping parameter can be defined as the minimum value at which the total misadjustment becomes minimum. To keep the damping parameter optimum under any conditions, the new ACF algorithm is proposed by improving the invariability of the damping parameter in the normalized algorithm. The algorithm is investigated by computer simulations of the modeling of unknown time-variable systems and of system identification. The results of the simulations show that the proposed algorithm needs no adjustment of the optimum damping parameter and yields stable convergence characteristics even if the filter system is changed.

  • Speaker-Consistent Parsing for Speaker-Independent Continuous Speech Recognition

    Kouichi YAMAGUCHI  Harald SINGER  Shoichi MATSUNAGA  Shigeki SAGAYAMA  

     
    PAPER

    Vol: E78-D  No: 6  Page(s): 719-724

    This paper describes a novel speaker-independent speech recognition method, called "speaker-consistent parsing", which is based on an intra-speaker correlation called the speaker-consistency principle. We focus on the fact that a sentence or a string of words is uttered by an individual speaker even in a speaker-independent task. Thus, the proposed method searches through speaker variations in addition to the contents of utterances. As a result of the recognition process, an appropriate standard speaker is selected for speaker adaptation. This new method is experimentally compared with a conventional speaker-independent speech recognition method. Since the speaker-consistency principle best demonstrates its effect with a large number of training and test speakers, a small-scale experiment may not fully exploit this principle. Nevertheless, even the results of our small-scale experiment show that the new method significantly outperforms the conventional method. In addition, this framework's speaker selection mechanism can drastically reduce the likelihood map computation.

  • Uniform and Non-uniform Normalization of Vocal Tracts Measured by MRI Across Male, Female and Child Subjects

    Chang-Sheng YANG  Hideki KASUYA  

     
    PAPER

    Vol: E78-D  No: 6  Page(s): 732-737

    Three-dimensional vocal tract shapes of a male, a female, and a child subject are measured from magnetic resonance (MR) images during sustained phonation of the Japanese vowels /a, i, u, e, o/. Non-uniform dimensional differences in the vocal tract shapes of the subjects are quantitatively measured. Vocal tract area functions of the female and child subjects are normalized to those of the male on the basis of non-uniform and uniform scalings of the vocal tract length and compared with each other. A comparison is also made between the formant frequencies computed from the area functions normalized by the two different scalings. The comparisons suggest that non-uniformity in the vocal tract dimensions is not essential in the normalization of the five Japanese vowels.

  • Operation Scheduling by Annealed Neural Networks

    Tsuyoshi KAWAGUCHI  Tamio TODAKA  

     
    PAPER

    Vol: E78-A  No: 6  Page(s): 656-663

    Operation scheduling is an important subtask in the automatic synthesis of digital systems. Many greedy heuristics have been proposed for operation scheduling, but they cannot find the globally best schedule. In this paper we present an algorithm to construct near-optimal schedules. The algorithm combines characteristics of simulated annealing and neural networks. The neural network used in our scheduling algorithm is similar to that proposed by Hellstrom et al. However, while the problems of Refs. [11] and [12] have a single type of constraint, the problem considered in this paper has three types of constraints. As a result, the energy function of the proposed neural network is given by the weighted sum of three energy functions. To minimize the weighted sum of two or more energy functions, conventional methods try to find a good set of weights by trial and error. Our algorithm takes a different approach from these methods. Results of the experiments show that the proposed algorithm can be used as an alternative heuristic for solving the operation scheduling problem. In addition, the proposed algorithm can exploit the inherent parallelism of the neural network.

  • Tone Recognition of Chinese Dissyllables Using Hidden Markov Models

    Xinhui HU  Keikichi HIROSE  

     
    PAPER

    Vol: E78-D  No: 6  Page(s): 685-691

    A method of tone recognition has been developed for dissyllabic speech of Standard Chinese based on discrete hidden Markov modeling. As for the feature parameters of recognition, a combination of macroscopic and microscopic parameters of fundamental frequency contours was shown to give a better result than the isolated use of each parameter. Speaker normalization was realized by introducing an offset to the fundamental frequency. In order to avoid recognition errors due to syllable segmentation, a scheme of concatenated learning was adopted for training hidden Markov models. Based on observations of fundamental frequency contours of dissyllables, a scheme was introduced to the method, where a contour was represented with a series of three syllabic tone models, two for the first and the second syllables and one for the transition part around the syllabic boundary. Corresponding to the voiceless consonant of the second syllable, the fundamental frequency contour of a dissyllable may include a part without fundamental frequencies. This part was linearly interpolated in the current method. To prove the validity of the proposed method, it was compared with other methods, such as representing all of the dissyllabic contours as the concatenation of two models, assigning a special code to the voiceless part, and so on. Tone sandhi was also taken into account by introducing two additional models for the half-third tone and for the first 4th tone of the combination of two 4th tones. With the proposed method, an average recognition rate of 96% was achieved for 5 male and 5 female speakers.

  • Neural Predictive Hidden Markov Model for Speech Recognition

    Eiichi TSUBOKA  Yoshihiro TAKADA  

     
    PAPER

    Vol: E78-D  No: 6  Page(s): 676-684

    This paper describes new modeling methods that combine a neural network and a hidden Markov model and are applicable to modeling a time series such as a speech signal. The idea assumes that the sequence is nonstationary and is a nonlinear autoregressive process whose parameters are controlled by a hidden Markov chain. One is a model where a nonlinear predictor composed of a multi-layered neural network is defined at each state; the other is a model where a multi-layered neural network is defined so that the path from the input layer to the output layer is divided into path-groups, each of which corresponds to a state of the Markov chain. The latter is an extended model of the former. The parameter estimation methods for these models are shown, and other previously proposed models--one called Neural Prediction Model and another called Linear Predictive HMM--are shown to be special cases of the NPHMM proposed here. The experimental results support the validity of the proposed models.

  • A New HMnet Construction Algorithm Requiring No Contextual Factors

    Motoyuki SUZUKI  Shozo MAKINO  Akinori ITO  Hirotomo ASO  Hiroshi SHIMODAIRA  

     
    PAPER

    Vol: E78-D  No: 6  Page(s): 662-668

    Many methods have been proposed for constructing context-dependent phoneme models using Hidden Markov Models (HMMs) to improve performance. These conventional methods require previously defined contextual factors. If these factors are deficient, the methods exhibit poor recognition performance. In this paper, we propose a new construction algorithm for HMnet which does not require pre-defined contextual factors. Experiments demonstrated that the new algorithm could construct the HMnet even in cases where the Successive State Splitting (SSS) algorithm could not. The new algorithm produced better phoneme recognition characteristics than the SSS algorithm.

  • Duration Modeling with Decreased Intra-Group Temporal Variation for HMM-Based Phoneme Recognition

    Nobuaki MINEMATSU  Keikichi HIROSE  

     
    PAPER

    Vol: E78-D  No: 6  Page(s): 654-661

    A new clustering method was proposed to increase the effect of duration modeling on the HMM-based phoneme recognition. A precise observation on the temporal correspondences between a phoneme HMM with output probabilities by single Gaussian modeling and its training data indicated that there were two extreme cases, one with several types of correspondences in a phoneme class completely different from each other, and the other with only one type of correspondence. Although duration modeling was commonly used to incorporate the temporal information in the HMMs, a good modeling could not be obtained for the former case. Further observation for phoneme HMMs with output probabilities by Gaussian mixture modeling also showed that some HMMs still had multiple temporal correspondences, though the number of such phonemes was reduced as compared to the case of single Gaussian modeling. An appropriate duration modeling cannot be obtained for these phoneme HMMs by the conventional methods, where the duration distribution for each HMM state is represented by a distribution function. In order to cope with the problem, a new method was proposed which was based on the clustering of phoneme classes with plural types of temporal correspondences into sub-classes. The clustering was conducted so as to reduce the variations of the temporal correspondences in sub-classes. After the clustering, an HMM was constructed for each sub-class. Using the proposed method, speaker dependent recognition experiments were performed for phonemes segmented from isolated words. A few-percent increase was realized in the recognition rate, which was not obtained by another method based on the duration modeling with a Gaussian mixture.

  • An Utterance Prediction Method Based on the Topic Transition Model

    Yoichi YAMASHITA  Takashi HIRAMATSU  Osamu KAKUSHO  Riichiro MIZOGUCHI  

     
    PAPER

    Vol: E78-D  No: 6  Page(s): 622-628

    This paper describes a method for predicting the user's next utterances in spoken dialog based on the topic transition model, named TPN. Some templates are prepared for each utterance pair pattern modeled by SR-plan. They are represented in terms of five kinds of topic-independent constituents in sentences. The topic of an utterance is predicted based on the TPN model and it instantiates the templates. The language processing unit analyzes the speech recognition result using the templates. An experiment shows that the introduction of the TPN model improves the performance of utterance recognition and it drastically reduces the search space of candidates in the input bunsetsu lattice.

  • A Speech Dialogue System with Multimodal Interface for Telephone Directory Assistance

    Osamu YOSHIOKA  Yasuhiro MINAMI  Kiyohiro SHIKANO  

     
    PAPER

    Vol: E78-D  No: 6  Page(s): 616-621

    This paper describes a multimodal dialogue system employing speech input. This system uses three input methods (through a speech recognizer, a mouse, and a keyboard) and two output methods (through a display and using sound). For the speech recognizer, an algorithm is employed for large-vocabulary speaker-independent continuous speech recognition based on the HMM-LR technique. This system is implemented for telephone directory assistance to evaluate the speech recognition algorithm and to investigate the variations in speech structure that users utter to computers. Speech input is used in a multimodal environment. The collecting of dialogue data between computers and users is also carried out. Twenty telephone-number retrieval tasks are used to evaluate this system. In the experiments, all the users are equally trained in using the dialogue system with an interactive guidance system implemented on a workstation. Simplified city maps that indicate subscriber names and addresses are used to reduce the implicit restrictions imposed by written sentences, thus allowing each user to develop his own forms of expression. The task completion rate is 99.0% and approximately 75% of the users say that they prefer this system to using a telephone book. Moreover, there is a significant decrease in nonkeyword usage, i.e., the usage of words other than names and addresses, for users who receive more utterance practice.

  • Computation of the Field Distribution Generated by a Rectangular Aperture in a Four-Layered Lossy Dielectric Medium by Modal Analysis

    Shinya MIZOSHIRI  Katsumi ABE  Toshifumi SUGIURA  Shizuo MIZUSHINA  

     
    PAPER

    Vol: E78-B  No: 6  Page(s): 851-858

    An open-ended rectangular waveguide filled with a dielectric has been used as a contact-type antenna of a microwave radiometer for non-invasive measurement of temperature in a biological object. In this application, the thermal radiation emitted by the object is measured as the brightness temperature by the instrument via the antenna. The brightness temperature is related to the physical temperatures in the object through the radiometric weighting function. By virtue of the reciprocity of the antenna, the weighting function can be derived from the field distribution induced in the object by the antenna when it is operated in the active mode. In this work, we treat the problem of the rectangular waveguide antenna radiating into a four-layered medium by modal analysis. The results are first compared with those obtained by the FD-TD method, showing that the results of the two methods are in good agreement. The operation of an antenna used in a radiometer system in our laboratory is then analyzed by this method, the weighting functions at different frequencies are computed, and the results are presented and discussed.

  • An Objective Measure Based on an Auditory Model for Assessing Low-Rate Coded Speech

    Toshiro WATANABE  Shinji HAYASHI  

     
    PAPER

    Vol: E78-D  No: 6  Page(s): 751-757

    We propose an objective measure for assessing low-rate coded speech. The model for this objective measure, in which several known features of the perceptual processing of speech sounds by the human ear are emulated, is based on the Hertz-to-Bark transformation, critical-band filtering with preemphasis to boost higher frequencies, nonlinear conversion for subjective loudness, and temporal (forward) masking. The effectiveness of the measure, called the Bark spectral distortion rating (BSDR), was validated by second-order polynomial regression analysis between the computed BSDR values and subjective MOS ratings obtained for a large number of utterances coded by several versions of CELP coders and one VSELP coder under three degradation conditions: input speech levels, transmission error rates, and background noise levels. The BSDR values correspond better to MOS ratings than several commonly used measures. Thus, BSDR can be used to accurately predict subjective scores.

  • Error Analysis of Field Trial Results of a Spoken Dialogue System for Telecommunications Applications

    Shingo KUROIWA  Kazuya TAKEDA  Masaki NAITO  Naomi INOUE  Seiichi YAMAMOTO  

     
    PAPER

    Vol: E78-D  No: 6  Page(s): 636-641

    We carried out a one-year field trial of a voice-activated automatic telephone exchange service at KDD Laboratories, which has about 200 branch phones. This system has DSP-based continuous speech recognition hardware which can process incoming calls in real time using a vocabulary of 300 words. The recognition accuracy was found to be 92.5% for speech read from a written text under laboratory conditions, independent of the speaker. In this paper, we describe the performance of the system obtained as a result of the field trial. Apart from recognition accuracy, there was about 20% error due to out-of-vocabulary input and incorrect detection of speech endpoints, which had not been allowed for in the laboratory experiments. Also, we found that the recognition accuracy for actual speech was about 18% lower than for speech read from text, even if there were no out-of-vocabulary words. We examine error variations for individual data in order to pinpoint the cause of incorrect recognition. It was found from experiments on the collected data that the pause model used, the filled pause grammar, and differences in channel frequency response seriously affected recognition accuracy. With the help of simple techniques to overcome these problems, we finally obtained a recognition accuracy of 88.7% for real data.

  • Multimodal Interaction in Human Communication

    Keiko WATANUKI  Kenji SAKAMOTO  Fumio TOGAWA  

     
    PAPER

    Vol: E78-D  No: 6  Page(s): 609-615

    We are developing multimodal man-machine interfaces through which users can communicate by integrating speech, gaze, facial expressions, and gestures such as nodding and finger pointing. Such multimodal interfaces are expected to provide more flexible, natural and productive communications between humans and computers. To achieve this goal, we have taken the approach of modeling human behavior in the context of ordinary face-to-face conversations. As the first step, we have implemented a system which utilizes video and audio recording equipment to capture verbal and nonverbal information in interpersonal communications. Using this system, we have collected data from a task-oriented conversation between a guest (subject) and a receptionist at a company reception desk, and quantitatively analyzed these data with respect to multi-modalities which would be functional in fluid interactions. This paper presents detailed analyses of the data collected: (1) head nodding and eye-contact are related to the beginning and end of speaking turns, acting to supplement speech information; (2) listener responses occur after an average of 0.35 sec. from the receptionist's utterance of a keyword, and turn-taking for tag-questions occurs after an average of 0.44 sec.; and (3) there is a rhythmical coordination between speakers and listeners.

  • Effect of a Catheter on SAR Distribution around Interstitial Antenna for Microwave Hyperthermia

    Meng-Shien WU  Lira HAMADA  Koichi ITO  Haruo KASAI  

     
    PAPER

    Vol: E78-B  No: 6  Page(s): 845-850

    This paper shows that the dielectric characteristics of a catheter around the interstitial antenna have an effect on the wavelength for current, and that this effect results in variation of the SAR (Specific Absorption Rate) distribution around the antenna. A theoretical study of the SAR distribution around a coaxial-slot antenna is performed. The analytical technique used is the moment method. Results and discussion on the effect of the material and thickness of the catheter are presented. The wavelength for the current shortens with increasing dielectric constant or decreasing thickness of the catheter. Due to this variation of the wavelength for current, the SAR distributions take various shapes.

  • Dual Concentric Conductor Radiator for Microwave Hyperthermia with Improved Field Uniformity to Periphery of Aperture

    Paul R. STAUFFER  Marco LEONCINI  Vinicio MANFRINI  Guido Biffi GENTILI  Chris J. DIEDERICH  David BOZZO  

     
    PAPER

    Vol: E78-B  No: 6  Page(s): 826-835

    Electromagnetic radiation patterns of planar 915MHz Dual Concentric Conductor (DCC) antennas were investigated with theoretical finite difference time domain (FDTD) analyses and experimental measurements of power deposition in a homogeneous lossy dielectric load. Power deposition (SAR) patterns were characterized by scanning an electric field sensor in front of the radiating aperture 1 cm deep in liquid "muscle tissue" phantom. Results showed close agreement between the theoretical simulations and measured SAR patterns for a 3.5cm square aperture. Additional SAR measurements demonstrated the ability to vary aperture size from 3.5-6cm with minimal change in shape of the power deposition pattern. Both analyses indicated that effective power deposition (50% SARmax) extends to the periphery of the square apertures. These data support the conclusion that the DCC aperture constitutes an improved radiator to be used as the functional building block of larger array applicators which are required for adjustable heating of large superficial tissue regions in the treatment of cancer.

  • A Flexible Hybrid Channel Assignment Strategy Using an Artificial Neural Network in a Cellular Mobile Communication System

    Kazuhiko SHIMADA  Masakazu SENGOKU  Takeo ABE  

     
    PAPER

    Vol: E78-A  No: 6  Page(s): 693-700

    A novel algorithm, as an advanced Hybrid Channel Assignment strategy, is proposed for the channel assignment problem in a cellular system. The difference from the conventional Hybrid Channel Assignment method is that flexible fixed channel allocations, which are variable through the channel assignment, can be performed in order to cope with varying traffic. This strategy utilizes the Channel Rearrangement technique using the artificial neural network algorithm in order to enhance channel occupancy on the fixed channels. The strategy is applied to two simulation models, namely systems with spatially homogeneous and inhomogeneous traffic. The simulation results show that the strategy can effectively improve the blocking probability in comparison with a pure dynamic channel assignment strategy that uses only the Channel Rearrangement.

  • Coding for Multi-Pulse PPM with Imperfect Slot Synchronization in Optical Direct-Detection Channels

    Kazumi SATO  Tomoaki OHTSUKI  Iwao SASASE  

     
    PAPER-Optical Communication

    Vol: E78-B  No: 6  Page(s): 916-922

    The performance of coded multi-pulse pulse position modulation (MPPM) consisting of m slots and 2 pulses, denoted as (m, 2) MPPM, with imperfect slot synchronization is analyzed. Convolutional codes and Reed-Solomon (RS) codes are employed for (m, 2) MPPM, and the bit error probability of coded (m, 2) MPPM in the presence of the timing offset is derived. For each coded (m, 2) MPPM, we compare the performance of systems with different code rates. Moreover, we compare the performance of both systems at the same information bit rate. It is shown that in both coded systems, the performance of code rate-1/2 coded (m, 2) MPPM is the best when the timing offset is small. When the timing offset is somewhat large, however, uncoded (m, 2) MPPM is shown to perform better than coded (m, 2) MPPM. Further, convolutional coded (m, 2) MPPM with the constraint length k = 7 is shown to perform better than RS coded (m, 2) MPPM for the same code rate.

  • Analyses of Virtual Path Bandwidth Control Effects in ATM Networks

    Hisaya HADAMA  Ken-ichi SATO  Ikuo TOKIZAWA  

     
    PAPER-Communication Systems and Transmission Equipment

    Vol: E78-B  No: 6  Page(s): 907-915

    This paper presents a newly developed analytical method which evaluates the virtual path bandwidth control effects for a general topology ATM (Asynchronous Transfer Mode) transport network. The virtual path concept can enhance the controllability of path bandwidth. The link capacity required to attain a specified call blocking probability can be reduced by applying virtual path bandwidth control. This paper proposes an analytical method to evaluate the call blocking probability of a general topology ATM network that includes many virtual paths and uses virtual path bandwidth control. A method for designing the link capacities of the network is also proposed. These methods make it possible to design an optimum transport network with path bandwidth control. Finally, using a newly developed approximation technique, some analytical results on the effects of dynamic path bandwidth control are provided to demonstrate its effectiveness.

  • Performance of Spread Spectrum Medical Telemetry System in a Sharing Frequency Band with Current Telemetry System

    Masaki KYOSO  Toshiaki TAKANE  Akihiko UCHIYAMA  

     
    LETTER

    Vol: E78-B  No: 6  Page(s): 862-865

    To make medical telemetry systems more reliable in severe electromagnetic environments, we applied spread spectrum communication to the ECG data transmission method. Spread spectrum communication systems have shown superior performance to other systems, especially with respect to anti-jamming, which allows them to share the frequency band with current telemetry systems. In this study, we show the characteristics of a spread spectrum transmitter when it is used in the same frequency band as a narrow-band transmitter. The result shows that the spread spectrum telemetry system can use the same frequency band permitted for medical telemetry systems.
