
Keyword Search Result

[Keyword] Ti(30728hit)

28501-28520hit(30728hit)

  • Relationship between SAR of Eyeball and Position of Feeding Point of MRI Antenna

    Hisaaki OCHI  Etsuji YAMAMOTO  Kunio SAWAYA  

     
    LETTER

      Vol:
    E78-B No:6
      Page(s):
    859-861

    Analysis of the specific absorption rate (SAR) of a realistic head model generated with a 1.5-tesla MRI antenna is described. It is found that the SAR of the eyeball is strongly affected by the position of the feeding point, whereas the sensitivity of the antenna is virtually independent of the feeding point.

  • Multimodal Interaction in Human Communication

    Keiko WATANUKI  Kenji SAKAMOTO  Fumio TOGAWA  

     
    PAPER

      Vol:
    E78-D No:6
      Page(s):
    609-615

    We are developing multimodal man-machine interfaces through which users can communicate by integrating speech, gaze, facial expressions, and gestures such as nodding and finger pointing. Such multimodal interfaces are expected to provide more flexible, natural and productive communications between humans and computers. To achieve this goal, we have taken the approach of modeling human behavior in the context of ordinary face-to-face conversations. As the first step, we have implemented a system which utilizes video and audio recording equipment to capture verbal and nonverbal information in interpersonal communications. Using this system, we have collected data from a task-oriented conversation between a guest (subject) and a receptionist at a company reception desk, and quantitatively analyzed this data with respect to multi-modalities which would be functional in fluid interactions. This paper presents detailed analyses of the data collected: (1) head nodding and eye-contact are related to the beginning and end of speaking turns, acting to supplement speech information; (2) listener responses occur after an average of 0.35 sec. from the receptionist's utterance of a keyword, and turn-taking for tag-questions occurs after an average of 0.44 sec.; and (3) there is a rhythmical coordination between speakers and listeners.

  • Effect of a Catheter on SAR Distribution around Interstitial Antenna for Microwave Hyperthermia

    Meng-Shien WU  Lira HAMADA  Koichi ITO  Haruo KASAI  

     
    PAPER

      Vol:
    E78-B No:6
      Page(s):
    845-850

    This paper shows that the dielectric characteristics of a catheter around an interstitial antenna affect the wavelength of the current, and that this effect changes the SAR (Specific Absorption Rate) distribution around the antenna. A theoretical study of the SAR distribution around a coaxial-slot antenna is performed. The analytical technique used is the moment method. Results and discussion on the effect of the material and thickness of the catheter are presented. The wavelength of the current shortens with increasing dielectric constant or decreasing thickness of the catheter. Due to this variation of the wavelength of the current, the SAR distributions take various shapes.

  • Composite Dynamical System for Controlling Chaos

    Tetsushi UETA  Hiroshi KAWAKAMI  

     
    PAPER-Systems and Control

      Vol:
    E78-A No:6
      Page(s):
    708-714

    We propose a method for stabilizing unstable periodic orbits embedded in a chaotic attractor of a continuous-time system by using a discrete state feedback controller. The controller is designed systematically by the Poincaré mapping and its derivatives. Although the output of the controller is applied periodically and discontinuously to a system parameter as small perturbations, the controlled orbit accomplishes C^0. As the stability of a specific orbit is completely determined by the design of the controller, we can also use the method to destabilize a stable periodic orbit. The destabilization method may be effectively applied to escape from a local minimum in various optimization problems. As an example of the stabilization and destabilization, some numerical results for Duffing's equation are illustrated.

  • Microwave CT Imaging for a Human Forearm at 3GHz

    Takayuki NAKAJIMA  Hiroshi SAWADA  Itsuo YAMAURA  

     
    LETTER

      Vol:
    E78-B No:6
      Page(s):
    874-876

    This paper describes an imaging method for a human forearm using microwave transmission CT at 3 GHz. To improve the spatial resolution, a correction method for diffraction effects is adopted and high-directivity antennas are used. A cross-sectional image of the human forearm is obtained in vivo.

  • A Partially Ferrites Loaded Waveguide Applicator for Local Heating of Tissues

    Yoshio NIKAWA  Yasunori TOYOFUKU  Fumiaki OKADA  

     
    PAPER

      Vol:
    E78-B No:6
      Page(s):
    836-844

    A partially ferrite- and dielectric-loaded, water-filled waveguide applicator is presented which can be used for microwave heating of tissues. The applicator can change its heating pattern by changing the external DC magnetic field applied to the ferrites. The electromagnetic (EM) field distribution inside the applicator is obtained theoretically, and the simulated EM field inside the applicator is checked experimentally at 430 MHz. Furthermore, on the basis of the EM field distribution inside the applicator, simulations of the SAR distribution inside lossy homogeneous human tissue such as muscle are performed using the finite difference time domain (FD-TD) method. The simulated Specific Absorption Rate (SAR) distribution is compared with the experimental one. Simulations of the temperature distribution are also performed using the heat transfer equation, and the simulated temperature elevation distribution is compared with the experimental one. The simulated results agree well with the experimental ones, and it is confirmed that the heating pattern can be changed by the external DC magnetic field applied to the applicator. The results obtained here show that the partially ferrite- and dielectric-loaded, water-filled waveguide applicator, which operates at 430 MHz, can change its heating pattern without changing its setup and can heat a local target on the human body for hyperthermia treatment.

  • Recent Progress of Electromagnetic Techniques in Hyperthermia Treatment

    Makoto KIKUCHI  

     
    INVITED PAPER

      Vol:
    E78-B No:6
      Page(s):
    799-808

    In the early stage of hyperthermia, a great deal of engineering effort went into developing or improving heating and temperature measuring techniques. However, the results were not always clinically satisfactory. Thus, even now, various engineering research efforts, as well as electromagnetic techniques for hyperthermia, should be built up rapidly. This paper describes some of the highlights of developed or ongoing electromagnetic heating techniques in hyperthermia and identifies a trend of emerging electromagnetic heating. Furthermore, the author emphasizes that few medical engineering efforts have been made in the boundary field between pure physics and clinics, and that the proper way to develop hyperthermia equipment is to make the best use of successes in the three essential regions: Physics, Biology and Clinics.

  • Uniform and Non-uniform Normalization of Vocal Tracts Measured by MRI Across Male, Female and Child Subjects

    Chang-Sheng YANG  Hideki KASUYA  

     
    PAPER

      Vol:
    E78-D No:6
      Page(s):
    732-737

    Three-dimensional vocal tract shapes of a male, a female and a child subject are measured from magnetic resonance (MR) images during sustained phonation of the Japanese vowels /a, i, u, e, o/. Non-uniform dimensional differences in the vocal tract shapes of the subjects are quantitatively measured. Vocal tract area functions of the female and child subjects are normalized to those of the male on the basis of non-uniform and uniform scalings of the vocal tract length and compared with each other. A comparison is also made between the formant frequencies computed from the area functions normalized by the two different scalings. The comparisons suggest that non-uniformity in the vocal tract dimensions is not essential in the normalization of the five Japanese vowels.

  • Relationship among Recognition Rate, Rejection Rate and False Alarm Rate in a Spoken Word Recognition System

    Atsuhiko KAI  Seiichi NAKAGAWA  

     
    PAPER

      Vol:
    E78-D No:6
      Page(s):
    698-704

    Detection of an unknown word or non-vocabulary word uttered by the user is necessary in realizing a practical spoken language user-interface. This paper describes the evaluation of an unknown word processing method for a subword-unit-based spoken word recognizer. We have assessed the relationship between the word recognition accuracy of a system and the detection rate of unknown words, both by simulation and by experiment with the unknown word processing method. We found that the resultant detection accuracies using the unknown word processing are significantly influenced by the original word recognition accuracy, while the degree of this effect depends on the vocabulary size.

  • Speech Recognition Using Function-Word N-Grams and Content-Word N-Grams

    Ryosuke ISOTANI  Shoichi MATSUNAGA  Shigeki SAGAYAMA  

     
    PAPER

      Vol:
    E78-D No:6
      Page(s):
    692-697

    This paper proposes a new stochastic language model for speech recognition based on function-word N-grams and content-word N-grams. The conventional word N-gram models are effective for speech recognition, but they represent only local constraints within a few successive words and lack the ability to capture global syntactic or semantic relationships between words. To represent more global constraints, the proposed language model gives the N-gram probabilities of word sequences, with attention given only to function words or to content words. The sequences of function words and of content words are expected to represent syntactic and semantic constraints, respectively. Probabilities of function-word bigrams and content-word bigrams were estimated from a 10,000-sentence text database, and analysis using an information-theoretic measure showed that the expected constraints were extracted appropriately. As an application of this model to speech recognition, a post-processor was constructed to select the optimum sentence candidate from a phrase lattice obtained by a phrase recognition system. The phrase candidate sequence with the highest total acoustic and linguistic score was sought by dynamic programming. The results of experiments carried out on the utterances of 12 speakers showed that the proposed method is more accurate than a CFG-based method, thus demonstrating its effectiveness in improving speech recognition performance.

  • Tone Recognition of Chinese Dissyllables Using Hidden Markov Models

    Xinhui HU  Keikichi HIROSE  

     
    PAPER

      Vol:
    E78-D No:6
      Page(s):
    685-691

    A method of tone recognition has been developed for dissyllabic speech of Standard Chinese based on discrete hidden Markov modeling. As for the feature parameters for recognition, a combination of macroscopic and microscopic parameters of fundamental frequency contours was shown to give a better result than the isolated use of each parameter. Speaker normalization was realized by introducing an offset to the fundamental frequency. In order to avoid recognition errors due to syllable segmentation, a scheme of concatenated learning was adopted for training hidden Markov models. Based on observations of the fundamental frequency contours of dissyllables, a scheme was introduced to the method in which a contour was represented with a series of three syllabic tone models, two for the first and the second syllables and one for the transition part around the syllabic boundary. Corresponding to the voiceless consonant of the second syllable, the fundamental frequency contour of a dissyllable may include a part without fundamental frequencies. This part was linearly interpolated in the current method. To prove the validity of the proposed method, it was compared with other methods, such as representing all of the dissyllabic contours as the concatenation of two models, assigning a special code to the voiceless part, and so on. Tone sandhi was also taken into account by introducing two additional models for the half-third tone and for the first 4th tone of the combination of two 4th tones. With the proposed method, an average recognition rate of 96% was achieved for 5 male and 5 female speakers.

  • Neural Predictive Hidden Markov Model for Speech Recognition

    Eiichi TSUBOKA  Yoshihiro TAKADA  

     
    PAPER

      Vol:
    E78-D No:6
      Page(s):
    676-684

    This paper describes new modeling methods that combine a neural network and a hidden Markov model and are applicable to modeling a time series such as a speech signal. The idea assumes that the sequence is nonstationary and is a nonlinear autoregressive process whose parameters are controlled by a hidden Markov chain. One is a model where a non-linear predictor composed of a multi-layered neural network is defined at each state; the other is a model where a multi-layered neural network is defined so that the path from the input layer to the output layer is divided into path-groups, each of which corresponds to a state of the Markov chain. The latter is an extended model of the former. The parameter estimation methods for these models are shown, and other previously proposed models--one called Neural Prediction Model and another called Linear Predictive HMM--are shown to be special cases of the NPHMM proposed here. The experimental results support the validity of the proposed models.

  • A Comparative Study of Output Probability Functions in HMMs

    Seiichi NAKAGAWA  Li ZHAO  Hideyuki SUZUKI  

     
    PAPER

      Vol:
    E78-D No:6
      Page(s):
    669-675

    One of the most effective methods in speech recognition is the HMM, which has been used to model speech statistically. The discrete distribution and the continuous distribution HMMs have been widely used in various applications. However, in recent years, HMMs with various output probability functions have been proposed to further improve recognition performance, e.g. the Gaussian mixture continuous and the semi-continuous distributed HMMs. We have also recently proposed the RBF (radial basis function)-based HMM and the VQ-distortion based HMM, which use an RBF function and a VQ-distortion measure at each state instead of the output probability density function used by traditional HMMs. In this paper, we describe the RBF-based HMM and the VQ-distortion based HMM and compare their performance with the discrete distributed, the Gaussian mixture distributed and the semi-continuous distributed HMMs based on their speech recognition performance rates through experiments on speaker-independent spoken digit recognition. Our results confirmed that the RBF-based and VQ-distortion based HMMs are more robust and superior to traditional HMMs.

  • A Scheme for Word Detection in Continuous Speech Using Likelihood Scores of Segments Modified by Their Context Within a Word

    Sumio OHNO  Keikichi HIROSE  Hiroya FUJISAKI  

     
    PAPER

      Vol:
    E78-D No:6
      Page(s):
    725-731

    In conventional word-spotting methods for automatic recognition of continuous speech, individual frames or segments of the input speech are assigned labels and local likelihood scores solely on the basis of their own acoustic characteristics. On the other hand, experiments on human speech perception conducted by the present authors and others show that human perception of words in connected speech is based not only on the acoustic characteristics of individual segments, but also on the acoustic and linguistic contexts in which these segments occur. In other words, individual segments are not correctly perceived by humans unless they are accompanied by their context. These findings on the process of human speech perception have to be applied in automatic speech recognition in order to improve the performance. From this point of view, the present paper proposes a new scheme for detecting words in continuous speech based on template matching where the likelihood of each segment of a word is determined not only by its own characteristics but also by the likelihood of its context within the framework of a word. This is accomplished by modifying the likelihood score of each segment by the likelihood score of its phonetic context, the latter representing the degree of similarity of the context to that of a candidate word in the lexicon. Higher enhancement is given to the segmental likelihood score if the likelihood score of its context is higher. The advantage of the proposed scheme over conventional schemes is demonstrated by an experiment on constructing a word lattice using connected speech of Japanese uttered by a male speaker. The result indicates that the scheme is especially effective in giving correct recognition in cases where there are two or more candidate words which are almost equal in raw segmental likelihood scores.

  • A New HMnet Construction Algorithm Requiring No Contextual Factors

    Motoyuki SUZUKI  Shozo MAKINO  Akinori ITO  Hirotomo ASO  Hiroshi SHIMODAIRA  

     
    PAPER

      Vol:
    E78-D No:6
      Page(s):
    662-668

    Many methods have been proposed for constructing context-dependent phoneme models using Hidden Markov Models (HMMs) to improve performance. These conventional methods require previously defined contextual factors. If these factors are deficient, the methods exhibit poor recognition performance. In this paper, we propose a new construction algorithm for HMnet which does not require pre-defined contextual factors. Experiments demonstrated that the new algorithm could construct the HMnet even in cases where the Successive State Splitting (SSS) algorithm could not. The new algorithm produced better phoneme recognition characteristics than the SSS algorithm.

  • Speaker-Consistent Parsing for Speaker-Independent Continuous Speech Recognition

    Kouichi YAMAGUCHI  Harald SINGER  Shoichi MATSUNAGA  Shigeki SAGAYAMA  

     
    PAPER

      Vol:
    E78-D No:6
      Page(s):
    719-724

    This paper describes a novel speaker-independent speech recognition method, called "speaker-consistent parsing", which is based on an intra-speaker correlation called the speaker-consistency principle. We focus on the fact that a sentence or a string of words is uttered by an individual speaker even in a speaker-independent task. Thus, the proposed method searches through speaker variations in addition to the contents of utterances. As a result of the recognition process, an appropriate standard speaker is selected for speaker adaptation. This new method is experimentally compared with a conventional speaker-independent speech recognition method. Since the speaker-consistency principle best demonstrates its effect with a large number of training and test speakers, a small-scale experiment may not fully exploit this principle. Nevertheless, even the results of our small-scale experiment show that the new method significantly outperforms the conventional method. In addition, this framework's speaker selection mechanism can drastically reduce the likelihood map computation.

  • A Study on Speaker Adaptation for Mandarin Syllable Recognition with Minimum Error Discriminative Training

    Chih-Heng LIN  Chien-Hsing WU  Pao-Chung CHANG  

     
    PAPER

      Vol:
    E78-D No:6
      Page(s):
    712-718

    This paper investigates a different method of speaker adaptation for Mandarin syllable recognition. Based on the minimum classification error (MCE) criterion, we use the generalized probabilistic descent (GPD) algorithm to iteratively adjust the parameters of the hidden Markov models (HMMs). The experiments on the multi-speaker Mandarin syllable database of Telecommunication Laboratories (T.L.) yield the following results: 1) Efficient speaker adaptation can be achieved through discriminative training using the MCE criterion and the GPD algorithm. 2) The computations required can be reduced through the use of the confusion sets in Mandarin base syllables. 3) For the discriminative training, the adjustment of the mean values of the Gaussian mixtures has the most prominent effect on speaker adaptation. 4) The discriminative training approach can be used to enhance the speaker adaptation capability of the maximum a posteriori (MAP) approach.

  • Duration Modeling with Decreased Intra-Group Temporal Variation for HMM-Based Phoneme Recognition

    Nobuaki MINEMATSU  Keikichi HIROSE  

     
    PAPER

      Vol:
    E78-D No:6
      Page(s):
    654-661

    A new clustering method was proposed to increase the effect of duration modeling on the HMM-based phoneme recognition. A precise observation on the temporal correspondences between a phoneme HMM with output probabilities by single Gaussian modeling and its training data indicated that there were two extreme cases, one with several types of correspondences in a phoneme class completely different from each other, and the other with only one type of correspondence. Although duration modeling was commonly used to incorporate the temporal information in the HMMs, a good modeling could not be obtained for the former case. Further observation for phoneme HMMs with output probabilities by Gaussian mixture modeling also showed that some HMMs still had multiple temporal correspondences, though the number of such phonemes was reduced as compared to the case of single Gaussian modeling. An appropriate duration modeling cannot be obtained for these phoneme HMMs by the conventional methods, where the duration distribution for each HMM state is represented by a distribution function. In order to cope with the problem, a new method was proposed which was based on the clustering of phoneme classes with plural types of temporal correspondences into sub-classes. The clustering was conducted so as to reduce the variations of the temporal correspondences in sub-classes. After the clustering, an HMM was constructed for each sub-class. Using the proposed method, speaker dependent recognition experiments were performed for phonemes segmented from isolated words. A few-percent increase was realized in the recognition rate, which was not obtained by another method based on the duration modeling with a Gaussian mixture.

  • Automatic Determination of the Number of Mixture Components for Continuous HMMs Based on a Uniform Variance Criterion

    Tetsuo KOSAKA  Shigeki SAGAYAMA  

     
    PAPER

      Vol:
    E78-D No:6
      Page(s):
    642-647

    We discuss how to determine automatically the number of mixture components in continuous mixture density HMMs (CHMMs). A notable trend has been the use of CHMMs in recent years. One of the major problems with a CHMM is how to determine its structure, that is, how many mixture components and states it has and its optimal topology. The number of mixture components has been determined heuristically so far. To solve this problem, we first investigate the influence of the number of mixture components on model parameters and the output log likelihood value. As a result, in contrast to the "mixture number uniformity" which is applied in conventional approaches to determine the number of mixture components, we propose the principle of "distribution size uniformity". An algorithm is introduced for automatically determining the number of mixture components. The performance of this algorithm is shown through recognition experiments involving all Japanese phonemes. Two types of experiments are carried out. One assumes that the number of mixture components for each state is the same within a phonetic model but may vary between states belonging to different phonemes. The other assumes that each state has a variable number of mixture components. These two experiments give better results than the conventional method.

  • An Utterance Prediction Method Based on the Topic Transition Model

    Yoichi YAMASHITA  Takashi HIRAMATSU  Osamu KAKUSHO  Riichiro MIZOGUCHI  

     
    PAPER

      Vol:
    E78-D No:6
      Page(s):
    622-628

    This paper describes a method for predicting the user's next utterances in spoken dialog based on the topic transition model, named TPN. Some templates are prepared for each utterance pair pattern modeled by SR-plan. They are represented in terms of five kinds of topic-independent constituents in sentences. The topic of an utterance is predicted based on the TPN model and it instantiates the templates. The language processing unit analyzes the speech recognition result using the templates. An experiment shows that the introduction of the TPN model improves the performance of utterance recognition and it drastically reduces the search space of candidates in the input bunsetsu lattice.
