The search functionality is under construction.

Keyword Search Result

[Keyword] TE(21534hit)

19921-19940hit(21534hit)

  • Computation of the Field Distribution Generated by a Rectangular Aperture in a Four-Layered Lossy Dielectric Medium by Modal Analysis

    Shinya MIZOSHIRI  Katsumi ABE  Toshifumi SUGIURA  Shizuo MIZUSHINA  

     
    PAPER

    Vol: E78-B No:6  Page(s): 851-858

    An open-ended rectangular waveguide filled with a dielectric has been used as a contact-type antenna of a microwave radiometer for non-invasive measurement of temperature in a biological object. In this application, the thermal radiation emitted by the object is measured as the brightness temperature by the instrument via the antenna. The brightness temperature is related to the physical temperatures in the object through the radiometric weighting function. By virtue of the reciprocity of the antenna, the weighting function can be derived from the field distribution induced in the object by the antenna when it is operated in the active mode. In this work, we treat the problem of a rectangular waveguide antenna radiating into a four-layered medium by modal analysis. The results are first compared with those obtained by the FD-TD method, showing that the two methods are in good agreement. The operation of an antenna used in a radiometer system in our laboratory is then analyzed by this method, the weighting functions at different frequencies are computed, and the results are presented and discussed.

  • Characteristics of Multi-Layer Perceptron Models in Enhancing Degraded Speech

    Thanh Tung LE  John MASON  Tadashi KITAMURA  

     
    PAPER

    Vol: E78-D No:6  Page(s): 744-750

    A multi-layer perceptron (MLP) acting directly in the time-domain is applied as a speech signal enhancer, and its performance is examined in the context of three common classes of degradation, namely low bit-rate CELP degradation as a non-linear system degradation, additive noise, and convolution by a linear system. The investigation focuses on two topics: (i) the influence of non-linearities within the network and (ii) network topology, comparing single and multiple output structures. The objective is to examine how these characteristics influence network performance and whether this depends on the class of degradation. Experimental results show the importance of matching the enhancer to the class of degradation. In the case of the CELP coder the standard MLP with its inherently non-linear characteristics is shown to be consistently better than any equivalent linear structure (up to 3.2 dB compared with 1.6 dB SNR improvement). In contrast, when the degradation is from additive noise, a linear enhancer is always superior.

  • An Utterance Prediction Method Based on the Topic Transition Model

    Yoichi YAMASHITA  Takashi HIRAMATSU  Osamu KAKUSHO  Riichiro MIZOGUCHI  

     
    PAPER

    Vol: E78-D No:6  Page(s): 622-628

    This paper describes a method for predicting the user's next utterances in spoken dialog based on the topic transition model, named TPN. Some templates are prepared for each utterance pair pattern modeled by SR-plan. They are represented in terms of five kinds of topic-independent constituents in sentences. The topic of an utterance is predicted based on the TPN model and it instantiates the templates. The language processing unit analyzes the speech recognition result using the templates. An experiment shows that the introduction of the TPN model improves the performance of utterance recognition and it drastically reduces the search space of candidates in the input bunsetsu lattice.

  • Duration Modeling with Decreased Intra-Group Temporal Variation for HMM-Based Phoneme Recognition

    Nobuaki MINEMATSU  Keikichi HIROSE  

     
    PAPER

    Vol: E78-D No:6  Page(s): 654-661

    A new clustering method was proposed to increase the effect of duration modeling on the HMM-based phoneme recognition. A precise observation on the temporal correspondences between a phoneme HMM with output probabilities by single Gaussian modeling and its training data indicated that there were two extreme cases, one with several types of correspondences in a phoneme class completely different from each other, and the other with only one type of correspondence. Although duration modeling was commonly used to incorporate the temporal information in the HMMs, a good modeling could not be obtained for the former case. Further observation for phoneme HMMs with output probabilities by Gaussian mixture modeling also showed that some HMMs still had multiple temporal correspondences, though the number of such phonemes was reduced as compared to the case of single Gaussian modeling. An appropriate duration modeling cannot be obtained for these phoneme HMMs by the conventional methods, where the duration distribution for each HMM state is represented by a distribution function. In order to cope with the problem, a new method was proposed which was based on the clustering of phoneme classes with plural types of temporal correspondences into sub-classes. The clustering was conducted so as to reduce the variations of the temporal correspondences in sub-classes. After the clustering, an HMM was constructed for each sub-class. Using the proposed method, speaker dependent recognition experiments were performed for phonemes segmented from isolated words. A few-percent increase was realized in the recognition rate, which was not obtained by another method based on the duration modeling with a Gaussian mixture.

  • Relationship among Recognition Rate, Rejection Rate and False Alarm Rate in a Spoken Word Recognition System

    Atsuhiko KAI  Seiichi NAKAGAWA  

     
    PAPER

    Vol: E78-D No:6  Page(s): 698-704

    Detection of an unknown word or non-vocabulary word uttered by the user is necessary in realizing a practical spoken language user-interface. This paper describes the evaluation of an unknown word processing method for a subword-unit-based spoken word recognizer. We have assessed the relationship between the word recognition accuracy of a system and the detection rate of unknown words, both by simulation and by experiments with the unknown word processing method. We found that the resultant detection accuracies using the unknown word processing are significantly influenced by the original word recognition accuracy, while the degree of this effect depends on the vocabulary size.

  • A New HMnet Construction Algorithm Requiring No Contextual Factors

    Motoyuki SUZUKI  Shozo MAKINO  Akinori ITO  Hirotomo ASO  Hiroshi SHIMODAIRA  

     
    PAPER

    Vol: E78-D No:6  Page(s): 662-668

    Many methods have been proposed for constructing context-dependent phoneme models using Hidden Markov Models (HMMs) to improve performance. These conventional methods require previously defined contextual factors. If these factors are deficient, the methods exhibit poor recognition performance. In this paper, we propose a new construction algorithm for HMnet which does not require pre-defined contextual factors. Experiments demonstrated that the new algorithm could construct the HMnet even in cases where the Successive State Splitting (SSS) algorithm could not. The new algorithm produced better phoneme recognition characteristics than the SSS algorithm.

  • A Speech Dialogue System with Multimodal Interface for Telephone Directory Assistance

    Osamu YOSHIOKA  Yasuhiro MINAMI  Kiyohiro SHIKANO  

     
    PAPER

    Vol: E78-D No:6  Page(s): 616-621

    This paper describes a multimodal dialogue system employing speech input. This system uses three input methods (through a speech recognizer, a mouse, and a keyboard) and two output methods (through a display and using sound). For the speech recognizer, an algorithm is employed for large-vocabulary speaker-independent continuous speech recognition based on the HMM-LR technique. This system is implemented for telephone directory assistance to evaluate the speech recognition algorithm and to investigate the variations in speech structure that users utter to computers. Speech input is used in a multimodal environment. The collecting of dialogue data between computers and users is also carried out. Twenty telephone-number retrieval tasks are used to evaluate this system. In the experiments, all the users are equally trained in using the dialogue system with an interactive guidance system implemented on a workstation. Simplified city maps that indicate subscriber names and addresses are used to reduce the implicit restrictions imposed by written sentences, thus allowing each user to develop his own forms of expression. The task completion rate is 99.0% and approximately 75% of the users say that they prefer this system to using a telephone book. Moreover, there is a significant decrease in nonkeyword usage, i.e., the usage of words other than names and addresses, for users who receive more utterance practice.

  • Speech Recognition Using Function-Word N-Grams and Content-Word N-Grams

    Ryosuke ISOTANI  Shoichi MATSUNAGA  Shigeki SAGAYAMA  

     
    PAPER

    Vol: E78-D No:6  Page(s): 692-697

    This paper proposes a new stochastic language model for speech recognition based on function-word N-grams and content-word N-grams. The conventional word N-gram models are effective for speech recognition, but they represent only local constraints within a few successive words and lack the ability to capture global syntactic or semantic relationships between words. To represent more global constraints, the proposed language model gives the N-gram probabilities of word sequences, with attention given only to function words or to content words. The sequences of function words and of content words are expected to represent syntactic and semantic constraints, respectively. Probabilities of function-word bigrams and content-word bigrams were estimated from a 10,000-sentence text database, and analysis using an information-theoretic measure showed that the expected constraints were extracted appropriately. As an application of this model to speech recognition, a post-processor was constructed to select the optimum sentence candidate from a phrase lattice obtained by a phrase recognition system. The phrase candidate sequence with the highest total acoustic and linguistic score was sought by dynamic programming. The results of experiments carried out on the utterances of 12 speakers showed that the proposed method is more accurate than a CFG-based method, thus demonstrating its effectiveness in improving speech recognition performance.

  • Simultaneous Estimation of Vocal Tract and Voice Source Parameters Based on an ARX Model

    Wen DING  Hideki KASUYA  Shuichi ADACHI  

     
    PAPER

    Vol: E78-D No:6  Page(s): 738-743

    A novel adaptive pitch-synchronous analysis method is proposed to simultaneously estimate vocal tract (formant/antiformant) and voice source parameters from speech waveforms. We use the parametric Rosenberg-Klatt (RK) model to generate a glottal waveform and an autoregressive-exogenous (ARX) model to represent the voiced speech production process. The Kalman filter algorithm is used to estimate the formant/antiformant parameters from the coefficients of the ARX model, and the simulated annealing method is employed as a nonlinear optimization approach to estimate the voice source parameters. The two approaches work together in a system identification procedure to find the best set of parameters for both models. The new method has been compared with some other approaches using synthetic speech, in terms of the accuracy of the estimated parameter values, and has proved to be superior. We also show that the proposed method can accurately estimate the parameters from natural speech sounds. A major application of the analysis method lies in a concatenative formant synthesizer which allows flexible control of the voice quality of synthetic speech.

  • Simulation Study on Ground-Based Direction Finding of VLF/ELF Radio Waves by Wave Distribution Functions: a Bayesian Approach

    Mehrez HIRARI  Masashi HAYAKAWA  

     
    PAPER-Antennas and Propagation

    Vol: E78-B No:6  Page(s): 923-931

    In this paper we consider the determination of the direction of arrival of VLF/ELF radio waves and their energy distribution at the ionospheric base by means of the inversion of electromagnetic data observed on the ground. The observed data are too limited, leading us to deal with a severely ill-posed problem similar to those encountered in digital image enhancement and computerized tomography. To handle this situation, the a priori information, if available, is supposed to carry as much weight as the observed data do. We used a regularization based on the Bayesian information criterion to reconstruct the wave distribution function at the ionosphere, that is, to determine the wave arrival direction. Using computer-generated data, two main results were obtained. First, the electromagnetic field data observed on the ground are sufficient to give a good approximation to the exit region of VLF/ELF radio waves and to reconstruct the wave energy distribution well at the ionospheric base. Second, the Bayesian information criterion is shown to be efficient and very promising for handling situations where the number of data is too small compared with the number of unknowns, which is the case in most reconstruction problems.

  • Analyses of Virtual Path Bandwidth Control Effects in ATM Networks

    Hisaya HADAMA  Ken-ichi SATO  Ikuo TOKIZAWA  

     
    PAPER-Communication Systems and Transmission Equipment

    Vol: E78-B No:6  Page(s): 907-915

    This paper presents a newly developed analytical method which evaluates the virtual path bandwidth control effects for a general topology ATM (Asynchronous Transfer Mode) transport network. The virtual path concept can enhance the controllability of path bandwidth. The link capacity required to attain a specified call blocking probability can be reduced by applying virtual path bandwidth control. This paper proposes an analytical method to evaluate the call blocking probability of a general topology ATM network which includes many virtual paths and uses virtual path bandwidth control. A method for designing the link capacities of the network is also proposed. These methods make it possible to design an optimum transport network with path bandwidth control. Finally, using a newly developed approximation technique, some analytical results on the effects of dynamic path bandwidth control are provided to demonstrate its effectiveness.

  • Electromagnetic Near Fields of Rectangular Waveguide Antennas in Contact with Biological Objects Obtained by the FD-TD Method

    Katsumi ABE  Shinya MIZOSHIRI  Toshifumi SUGIURA  Shizuo MIZUSHINA  

     
    LETTER

    Vol: E78-B No:6  Page(s): 866-870

    Multifrequency microwave radiometry for non-invasive measurement of temperature in biological objects has been investigated in our laboratory. An open-ended rectangular waveguide filled with a dielectric has been used as a contact-type antenna of a radiometer operating over a 1-4GHz range. In the radiometric measurement, the radiometer measures the thermal radiation emitted by the object via the antenna as the brightness temperature. The brightness temperature is related to the physical temperatures in the object through the radiometric weighting function. By virtue of the reciprocity of the antenna, the weighting function can be derived from the field distribution induced in the object by the same antenna when it is operated in the active mode. In this paper, the FD-TD method is used to analyze the problem of coupling between the rectangular waveguide antenna and a biological object. The objects studied in this paper are a homogeneous and a four-layered lossy medium. The working frequency is 1.2GHz, which is the center frequency of the lowest-frequency band of our radiometer. Numerical results are presented in the form of SAR patterns. It is found that the SAR patterns tend to spread out in the lateral directions in the bolus, skin and fat layers due to diffraction, which becomes stronger at lower frequencies. Results also suggest that the lateral spreading can be controlled to a certain extent by choosing the size of the antenna flange properly.

  • A Scheme for Word Detection in Continuous Speech Using Likelihood Scores of Segments Modified by Their Context Within a Word

    Sumio OHNO  Keikichi HIROSE  Hiroya FUJISAKI  

     
    PAPER

    Vol: E78-D No:6  Page(s): 725-731

    In conventional word-spotting methods for automatic recognition of continuous speech, individual frames or segments of the input speech are assigned labels and local likelihood scores solely on the basis of their own acoustic characteristics. On the other hand, experiments on human speech perception conducted by the present authors and others show that human perception of words in connected speech is based not only on the acoustic characteristics of individual segments, but also on the acoustic and linguistic contexts in which these segments occur. In other words, individual segments are not correctly perceived by humans unless they are accompanied by their context. These findings on the process of human speech perception have to be applied in automatic speech recognition in order to improve the performance. From this point of view, the present paper proposes a new scheme for detecting words in continuous speech based on template matching where the likelihood of each segment of a word is determined not only by its own characteristics but also by the likelihood of its context within the framework of a word. This is accomplished by modifying the likelihood score of each segment by the likelihood score of its phonetic context, the latter representing the degree of similarity of the context to that of a candidate word in the lexicon. Higher enhancement is given to the segmental likelihood score if the likelihood score of its context is higher. The advantage of the proposed scheme over conventional schemes is demonstrated by an experiment on constructing a word lattice using connected speech of Japanese uttered by a male speaker. The result indicates that the scheme is especially effective in giving correct recognition in cases where there are two or more candidate words which are almost equal in raw segmental likelihood scores.

  • Performance of Spread Spectrum Medical Telemetry System in a Sharing Frequency Band with Current Telemetry System

    Masaki KYOSO  Toshiaki TAKANE  Akihiko UCHIYAMA  

     
    LETTER

    Vol: E78-B No:6  Page(s): 862-865

    To make medical telemetry systems more reliable in a severe electromagnetic environment, we applied spread spectrum communication to the ECG data transmission method. Spread spectrum communication systems have shown superior performance to other systems, especially with respect to anti-jamming, which allows them to share the frequency band with current telemetry systems. In this study, we show the characteristics of a spread spectrum transmitter when it is used in the same frequency band as a narrow-band transmitter. The result shows that the spread spectrum telemetry system can use the same frequency band permitted for medical telemetry systems.

  • Tone Recognition of Chinese Dissyllables Using Hidden Markov Models

    Xinhui HU  Keikichi HIROSE  

     
    PAPER

    Vol: E78-D No:6  Page(s): 685-691

    A method of tone recognition has been developed for dissyllabic speech of Standard Chinese based on discrete hidden Markov modeling. As for the feature parameters of recognition, combination of macroscopic and microscopic parameters of fundamental frequency contours was shown to give a better result as compared to the isolated use of each parameter. Speaker normalization was realized by introducing an offset to the fundamental frequency. In order to avoid recognition errors due to syllable segmentation, a scheme of concatenated learning was adopted for training hidden Markov models. Based on the observations of fundamental frequency contours of dissyllables, a scheme was introduced to the method, where a contour was represented with a series of three syllabic tone models, two for the first and the second syllables and one for the transition part around the syllabic boundary. Corresponding to the voiceless consonant of the second syllable, fundamental frequency contour of a dissyllable may include a part without fundamental frequencies. This part was linearly interpolated in the current method. To prove the validity of the proposed method, it was compared with other methods, such as representing all of the dissyllabic contours as the concatenation of two models, assigning a special code to the voiceless part, and so on. Tone sandhi was also taken into account by introducing two additional models for the half-third tone and for the first 4th tone of the combination of two 4th tones. With the proposed method, average recognition rate of 96% was achieved for 5 male and 5 female speakers.

  • Recent Trends in Medical Microwave Radiometry

    Shizuo MIZUSHINA  Hiroyuki OHBA  Katsumi ABE  Shinya MIZOSHIRI  Toshifumi SUGIURA  

     
    INVITED PAPER

    Vol: E78-B No:6  Page(s): 789-798

    Microwave radiometry has been investigated for non-invasive measurement of temperature in the human body. Recent trends are to explore the capability of retrieving a temperature profile or map from a set of brightness temperatures measured by a multifrequency radiometer operating in a 1-6GHz range. The retrieval of temperature from the multifrequency measurement data is formulated as an inverse problem in which the number of independent measurements or data is limited (7) and the data suffer from considerably large random fluctuations. The standard deviation of the data fluctuation is given by the brightness temperature resolution of the instrument (0.04-0.1K). Solutions are prone to instabilities and large errors unless proper solution methods are used. Solution methods developed during the last few years are reviewed: singular system analysis, bio-heat transfer solution matched with radiometric data, and model-fitting combined with a Monte Carlo technique. Typical results obtained by these methods are presented to indicate a cross-section of the present state of development in the field. This review concludes with discussions on the radiometric weighting function which connects physical temperatures in the object to the brightness temperature. Three-dimensional weighting functions derived by the modal analysis and the FDTD method for a rectangular waveguide antenna coupled to a four-layered lossy medium are discussed. Development of temperature retrieval procedures incorporating the 3-D weighting functions is an important and challenging task for future work in this field.

  • Microwave CT Imaging for a Human Forearm at 3GHz

    Takayuki NAKAJIMA  Hiroshi SAWADA  Itsuo YAMAURA  

     
    LETTER

    Vol: E78-B No:6  Page(s): 874-876

    This paper describes the imaging method for a human forearm in the microwave transmission CT at 3GHz. To improve the spatial resolution, the correction method of the diffraction effects is adopted and the high directivity antennas are used. A cross-sectional image of the human forearm is obtained in vivo.

  • Dual Concentric Conductor Radiator for Microwave Hyperthermia with Improved Field Uniformity to Periphery of Aperture

    Paul R. STAUFFER  Marco LEONCINI  Vinicio MANFRINI  Guido Biffi GENTILI  Chris J. DIEDERICH  David BOZZO  

     
    PAPER

    Vol: E78-B No:6  Page(s): 826-835

    Electromagnetic radiation patterns of planar 915MHz Dual Concentric Conductor (DCC) antennas were investigated with theoretical finite difference time domain (FDTD) analyses and experimental measurements of power deposition in a homogeneous lossy dielectric load. Power deposition (SAR) patterns were characterized by scanning an electric field sensor in front of the radiating aperture 1 cm deep in liquid "muscle tissue" phantom. Results showed close agreement between the theoretical simulations and measured SAR patterns for a 3.5cm square aperture. Additional SAR measurements demonstrated the ability to vary aperture size from 3.5-6cm with minimal change in shape of the power deposition pattern. Both analyses indicated that effective power deposition (50% SARmax) extends to the periphery of the square apertures. These data support the conclusion that the DCC aperture constitutes an improved radiator to be used as the functional building block of larger array applicators which are required for adjustable heating of large superficial tissue regions in the treatment of cancer.

  • Multimodal Interaction in Human Communication

    Keiko WATANUKI  Kenji SAKAMOTO  Fumio TOGAWA  

     
    PAPER

    Vol: E78-D No:6  Page(s): 609-615

    We are developing multimodal man-machine interfaces through which users can communicate by integrating speech, gaze, facial expressions, and gestures such as nodding and finger pointing. Such multimodal interfaces are expected to provide more flexible, natural and productive communications between humans and computers. To achieve this goal, we have taken the approach of modeling human behavior in the context of ordinary face-to-face conversations. As the first step, we have implemented a system which utilizes video and audio recording equipment to capture verbal and nonverbal information in interpersonal communications. Using this system, we have collected data from a task-oriented conversation between a guest (subject) and a receptionist at a company reception desk, and quantitatively analyzed these data with respect to multi-modalities which would be functional in fluid interactions. This paper presents detailed analyses of the data collected: (1) head nodding and eye-contact are related to the beginning and end of speaking turns, acting to supplement speech information; (2) listener responses occur after an average of 0.35 sec. from the receptionist's utterance of a keyword, and turn-taking for tag-questions occurs after an average of 0.44 sec.; and (3) there is a rhythmical coordination between speakers and listeners.

  • Effect of a Catheter on SAR Distribution around Interstitial Antenna for Microwave Hyperthermia

    Meng-Shien WU  Lira HAMADA  Koichi ITO  Haruo KASAI  

     
    PAPER

    Vol: E78-B No:6  Page(s): 845-850

    This paper shows that the dielectric characteristics of a catheter around the interstitial antenna affect the wavelength of the current, and that this effect results in variation of the SAR (Specific Absorption Rate) distribution around the antenna. A theoretical study of the SAR distribution around a coaxial-slot antenna is performed. The analytical technique used is the moment method. Results and discussion on the effect of the material and thickness of the catheter are presented. The wavelength of the current shortens with increasing dielectric constant or decreasing thickness of the catheter. Due to this variation of the wavelength of the current, the SAR distributions take various shapes.
