
Keyword Search Result

[Keyword] CTI(8214hit)

7641-7660hit(8214hit)

  • Uniform and Non-uniform Normalization of Vocal Tracts Measured by MRI Across Male, Female and Child Subjects

    Chang-Sheng YANG  Hideki KASUYA  

     
    PAPER

      Vol:
    E78-D No:6
      Page(s):
    732-737

    Three-dimensional vocal tract shapes of a male, a female and a child subject are measured from magnetic resonance (MR) images during sustained phonation of Japanese vowels /a, i, u, e, o/. Non-uniform dimensional differences in the vocal tract shapes of the subjects are quantitatively measured. Vocal tract area functions of the female and child subjects are normalized to those of the male on the basis of non-uniform and uniform scalings of the vocal tract length and compared with each other. A comparison is also made between the formant frequencies computed from the area functions normalized by the two different scalings. It is suggested by the comparisons that non-uniformity in the vocal tract dimensions is not essential in the normalization of the five Japanese vowels.

  • 4 kbps Improved Pitch Prediction CELP Speech Coding with 20 msec Frame

    Masahiro SERIZAWA  Kazunori OZAWA  

     
    PAPER

      Vol:
    E78-D No:6
      Page(s):
    758-763

    This paper proposes a new pitch prediction method for 4 kbps CELP (Code Excited LPC) speech coding with 20 msec frame, for the future ITU-T 4 kbps speech coding standardization. In the conventional CELP speech coding, synthetic speech quality deteriorates rapidly at 4 kbps, especially for female and children's speech with short pitch period. The pitch prediction performance is significantly degraded for such speech. The important reason is that when the pitch period is shorter than the subframe length, the simple repetition of the past excitation signal based on the estimated lag, not the pitch prediction, is usually carried out in the adaptive codebook operation. The proposed pitch prediction method can carry out the pitch prediction without the above approximation by utilizing the current subframe excitation codevector signal, when the pitch prediction parameters are determined. To further improve the performance, a split vector synthesis and perceptually spectral weighting method, and a low-complexity perceptually harmonic and spectral weighting method have also been developed. The informal listening test result shows that the 4 kbps speech coder with 20 msec frame, utilizing all of the proposed improvements, achieves 0.2 MOS higher results than the coder without them.

  • An Utterance Prediction Method Based on the Topic Transition Model

    Yoichi YAMASHITA  Takashi HIRAMATSU  Osamu KAKUSHO  Riichiro MIZOGUCHI  

     
    PAPER

      Vol:
    E78-D No:6
      Page(s):
    622-628

    This paper describes a method for predicting the user's next utterances in spoken dialog based on the topic transition model, named TPN. Some templates are prepared for each utterance pair pattern modeled by SR-plan. They are represented in terms of five kinds of topic-independent constituents in sentences. The topic of an utterance is predicted based on the TPN model and it instantiates the templates. The language processing unit analyzes the speech recognition result using the templates. An experiment shows that the introduction of the TPN model improves the performance of utterance recognition and it drastically reduces the search space of candidates in the input bunsetsu lattice.

  • A Comparative Study of Output Probability Functions in HMMs

    Seiichi NAKAGAWA  Li ZHAO  Hideyuki SUZUKI  

     
    PAPER

      Vol:
    E78-D No:6
      Page(s):
    669-675

    One of the most effective methods in speech recognition is the HMM, which has been used to model speech statistically. The discrete distribution and the continuous distribution HMMs have been widely used in various applications. However, in recent years, HMMs with various output probability functions have been proposed to further improve recognition performance, e.g. the Gaussian mixture continuous and the semi-continuous distributed HMMs. We have also recently proposed the RBF (radial basis function)-based HMM and the VQ-distortion based HMM, which use an RBF function and a VQ-distortion measure at each state instead of an output probability density function used by traditional HMMs. In this paper, we describe the RBF-based HMM and the VQ-distortion based HMM and compare their performance with the discrete distributed, the Gaussian mixture distributed and the semi-continuous distributed HMMs based on their speech recognition performance rates through experiments on speaker-independent spoken digit recognition. Our results confirmed that the RBF-based and VQ-distortion based HMMs are more robust and superior to traditional HMMs.

  • Neural Predictive Hidden Markov Model for Speech Recognition

    Eiichi TSUBOKA  Yoshihiro TAKADA  

     
    PAPER

      Vol:
    E78-D No:6
      Page(s):
    676-684

    This paper describes new modeling methods combining neural network and hidden Markov model applicable to modeling a time series such as speech signal. The idea assumes that the sequence is nonstationary and is a nonlinear autoregressive process whose parameters are controlled by a hidden Markov chain. One is the model where a non-linear predictor composed of a multi-layered neural network is defined at each state, another is the model where a multi-layered neural network is defined so that the path from the input layer to the output layer is divided into path-groups each of which corresponds to the state of the Markov chain. The latter is an extended model of the former. The parameter estimation methods for these models are shown, and other previously proposed models--one called Neural Prediction Model and another called Linear Predictive HMM--are shown to be special cases of the NPHMM proposed here. The experimental result affirms the justification of these proposed models.

  • Speech Recognition Using Function-Word N-Grams and Content-Word N-Grams

    Ryosuke ISOTANI  Shoichi MATSUNAGA  Shigeki SAGAYAMA  

     
    PAPER

      Vol:
    E78-D No:6
      Page(s):
    692-697

    This paper proposes a new stochastic language model for speech recognition based on function-word N-grams and content-word N-grams. The conventional word N-gram models are effective for speech recognition, but they represent only local constraints within a few successive words and lack the ability to capture global syntactic or semantic relationships between words. To represent more global constraints, the proposed language model gives the N-gram probabilities of word sequences, with attention given only to function words or to content words. The sequences of function words and of content words are expected to represent syntactic and semantic constraints, respectively. Probabilities of function-word bigrams and content-word bigrams were estimated from a 10,000-sentence text database, and analysis using information theoretic measure showed that expected constraints were extracted appropriately. As an application of this model to speech recognition, a post-processor was constructed to select the optimum sentence candidate from a phrase lattice obtained by a phrase recognition system. The phrase candidate sequence with the highest total acoustic and linguistic score was sought by dynamic programming. The results of experiments carried out on the utterances of 12 speakers showed that the proposed method is more accurate than a CFG-based method, thus demonstrating its effectiveness in improving speech recognition performance.
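
The bigram estimation described in the abstract above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the tiny `FUNCTION_WORDS` set, the helper names `split_sequences` and `bigram_probs`, the English toy corpus, and the unsmoothed maximum-likelihood estimate.

```python
from collections import Counter

# Hypothetical closed list of function words (assumption; the paper's
# actual word classification is not given in the abstract).
FUNCTION_WORDS = {"the", "a", "of", "to", "in", "is", "on"}


def split_sequences(words):
    """Split a sentence into its function-word and content-word subsequences."""
    func = [w for w in words if w in FUNCTION_WORDS]
    cont = [w for w in words if w not in FUNCTION_WORDS]
    return func, cont


def bigram_probs(sequences):
    """Unsmoothed maximum-likelihood bigram probabilities P(w2 | w1)."""
    pair_counts = Counter()
    left_counts = Counter()
    for seq in sequences:
        for w1, w2 in zip(seq, seq[1:]):
            pair_counts[(w1, w2)] += 1
            left_counts[w1] += 1
    return {pair: n / left_counts[pair[0]] for pair, n in pair_counts.items()}


# Toy corpus (assumption): two whitespace-tokenized sentences.
corpus = [
    "the cat is on the mat".split(),
    "the dog is in the house".split(),
]
func_seqs, cont_seqs = zip(*(split_sequences(s) for s in corpus))
func_bigrams = bigram_probs(func_seqs)  # bigrams over function words only
cont_bigrams = bigram_probs(cont_seqs)  # bigrams over content words only
```

In this sketch the function-word bigrams skip over intervening content words (and vice versa), which is how the two models can relate words that are not adjacent in the original sentence.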

  • Relationship among Recognition Rate, Rejection Rate and False Alarm Rate in a Spoken Word Recognition System

    Atsuhiko KAI  Seiichi NAKAGAWA  

     
    PAPER

      Vol:
    E78-D No:6
      Page(s):
    698-704

    Detection of an unknown word or non-vocabulary word uttered by the user is necessary in realizing a practical spoken language user-interface. This paper describes the evaluation of an unknown word processing method for a subword unit based spoken word recognizer. We have assessed the relationship between the word recognition accuracy of a system and the detection rate of unknown words both by simulation and by experiment of the unknown word processing method. We found that the resultant detection accuracies using the unknown word processing are significantly influenced by the original word recognition accuracy while the degree of such effect depends on the vocabulary size.

  • Microwave CT Imaging for a Human Forearm at 3GHz

    Takayuki NAKAJIMA  Hiroshi SAWADA  Itsuo YAMAURA  

     
    LETTER

      Vol:
    E78-B No:6
      Page(s):
    874-876

    This paper describes the imaging method for a human forearm in the microwave transmission CT at 3GHz. To improve the spatial resolution, the correction method of the diffraction effects is adopted and the high directivity antennas are used. A cross-sectional image of the human forearm is obtained in vivo.

  • Multimodal Interaction in Human Communication

    Keiko WATANUKI  Kenji SAKAMOTO  Fumio TOGAWA  

     
    PAPER

      Vol:
    E78-D No:6
      Page(s):
    609-615

    We are developing multimodal man-machine interfaces through which users can communicate by integrating speech, gaze, facial expressions, and gestures such as nodding and finger pointing. Such multimodal interfaces are expected to provide more flexible, natural and productive communications between humans and computers. To achieve this goal, we have taken the approach of modeling human behavior in the context of ordinary face-to-face conversations. As the first step, we have implemented a system which utilizes video and audio recording equipment to capture verbal and nonverbal information in interpersonal communications. Using this system, we have collected data from a task-oriented conversation between a guest (subject) and a receptionist at company reception desk, and quantitatively analyzed this data with respect to multi-modalities which would be functional in fluid interactions. This paper presents detailed analyses of the data collected: (1) head nodding and eye-contact are related to the beginning and end of speaking turns, acting to supplement speech information; (2) listener responses occur after an average of 0.35 sec. from the receptionist's utterance of a keyword, and turn-taking for tag-questions occurs after an average of 0.44 sec.; and (3) there is a rhythmical coordination between speakers and listeners.

  • Recent Trends in Medical Microwave Radiometry

    Shizuo MIZUSHINA  Hiroyuki OHBA  Katsumi ABE  Shinya MIZOSHIRI  Toshifumi SUGIURA  

     
    INVITED PAPER

      Vol:
    E78-B No:6
      Page(s):
    789-798

    Microwave radiometry has been investigated for non-invasive measurement of temperature in the human body. Recent trends are to explore the capability of retrieving a temperature profile or map from a set of brightness temperatures measured by a multifrequency radiometer operating in a 1-6GHz range. The retrieval of temperature from the multifrequency measurement data is formulated as an inverse problem in which the number of independent measurements or data is limited (7) and the data suffer from considerably large random fluctuations. The standard deviation of the data fluctuation is given by the brightness temperature resolution of the instrument (0.04-0.1K). Solutions are prone to instabilities and large errors unless proper solution methods are used. Solution methods developed during the last few years are reviewed: singular system analysis, bio-heat transfer solution matched with radiometric data, and model-fitting combined with Monte Carlo technique. Typical results obtained by these methods are presented to indicate a cross section of the present state of development in the field. This review concludes with discussions on the radiometric weighting function which connects physical temperatures in the object to the brightness temperature. Three-dimensional weighting functions derived by the modal analysis and the FDTD method for a rectangular waveguide antenna coupled to a four-layered lossy medium are discussed. Development of temperature retrieval procedures incorporating the 3-D weighting functions is an important and challenging task for future work in this field.

  • Electromagnetic Near Fields of Rectangular Waveguide Antennas in Contact with Biological Objects Obtained by the FD-TD Method

    Katsumi ABE  Shinya MIZOSHIRI  Toshifumi SUGIURA  Shizuo MIZUSHINA  

     
    LETTER

      Vol:
    E78-B No:6
      Page(s):
    866-870

    Multifrequency microwave radiometry for non-invasive measurement of temperature in biological objects has been investigated in our laboratory. An open-ended rectangular waveguide filled with a dielectric has been used as a contact-type antenna of a radiometer operating over a 1-4GHz range. In the radiometric measurement, the radiometer measures the thermal radiation emitted by the object via the antenna as the brightness temperature. The brightness temperature is related to the physical temperatures in the object through the radiometric weighting function. By virtue of the reciprocity of the antenna, the weighting function can be derived from the field distribution induced in the object by the same antenna when it is operated in the active mode. In this paper, the FD-TD method is used to analyze the problem of coupling between the rectangular waveguide antenna and a biological object. The objects studied in this paper are a homogeneous and a four-layered lossy medium. The working frequency is 1.2GHz, which is the center frequency of the lowest-frequency band of our radiometer. Numerical results are presented in the form of SAR patterns. It is found that the SAR patterns tend to spread out in the lateral directions in the bolus, skin and fat layers due to the diffraction, which becomes stronger at lower frequencies. Results also suggest that the lateral spreading can be controlled to a certain extent by choosing the size of the antenna flange properly.

  • An Objective Measure Based on an Auditory Model for Assessing Low-Rate Coded Speech

    Toshiro WATANABE  Shinji HAYASHI  

     
    PAPER

      Vol:
    E78-D No:6
      Page(s):
    751-757

    We propose an objective measure for assessing low-rate coded speech. The model for this objective measure, in which several known features of the perceptual processing of speech sounds by the human ear are emulated, is based on the Hertz-to-Bark transformation, critical-band filtering with preemphasis to boost higher frequencies, nonlinear conversion for subjective loudness, and temporal (forward) masking. The effectiveness of the measure, called the Bark spectral distortion rating (BSDR), was validated by second-order polynomial regression analysis between the computed BSDR values and subjective MOS ratings obtained for a large number of utterances coded by several versions of CELP coders and one VSELP coder under three degradation conditions: input speech levels, transmission error rates, and background noise levels. The BSDR values correspond better to MOS ratings than several commonly used measures. Thus, BSDR can be used to accurately predict subjective scores.
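
As an illustration of the first stage named in the abstract above, the following sketch converts frequencies in hertz to the Bark scale. The abstract does not say which Hertz-to-Bark formula the authors used; this sketch assumes Traunmüller's (1990) approximation, and the function name `hz_to_bark` is hypothetical.

```python
def hz_to_bark(f_hz):
    """Approximate critical-band rate in Bark for a frequency in Hz.

    Assumption: Traunmueller's approximation, z = 26.81 f / (1960 + f) - 0.53.
    The formula actually used for BSDR is not stated in the abstract.
    """
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


# Example: map a few frequencies onto the Bark axis.
freqs_hz = [250.0, 1000.0, 4000.0]
barks = [hz_to_bark(f) for f in freqs_hz]
```

The mapping is monotonic and compresses high frequencies, which is why equal steps in Bark correspond to progressively wider bands in hertz.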

  • Learning Logic Programs Using Definite Equality Theories as Background Knowledge

    Akihiro YAMAMOTO  

     
    PAPER-Computational Learning Theory

      Vol:
    E78-D No:5
      Page(s):
    539-544

    In this paper we investigate the learnability of relations in Inductive Logic Programming, by using equality theories as background knowledge. We assume that a hypothesis and an observation are respectively a definite program and a set of ground literals. The targets of our learning algorithm are relations. By using equality theories as background knowledge we introduce tree structure into definite programs. The structure enables us to narrow the search space of hypotheses. We give pairs of a hypothesis language and a knowledge language in order to discuss the learnability of relations from the viewpoint of inductive inference and PAC learning.

  • 1-V Josephson-Junction Array Voltage Standard and Development of 10-V Josephson Junction Array at ETL

    Tadashi ENDO  Yasuhiko SAKAMOTO  Yasushi MURAYAMA  Akio IWASA  Haruo YOSHIDA  

     
    INVITED PAPER-Voltage standard

      Vol:
    E78-C No:5
      Page(s):
    503-510

    Recently, the Josephson effect-based voltage standard has been realized by using the Josephson junction array, which is constructed by integrating many Josephson junctions. In this article, the 1-V Josephson-junction-array voltage standard used in routine calibration work and further development of the 10-V Josephson junction array at the Electrotechnical Laboratory (ETL) are introduced.

  • Optimizing Linear Recursive Formulas by Detaching Isolated Variables

    Xiaoyong DU  Naohiro ISHII  

     
    PAPER-Databases

      Vol:
    E78-D No:5
      Page(s):
    579-585

    Program transformation is a kind of optimization technique for logic programs, which aims at transforming a program into another, equivalent form by exploiting some properties or information of the program, so as to make the program cheaper to evaluate. In this paper, a new kind of property of logic programs, called reducibility, is exploited in program transformation. A recursive predicate is reducible if the values of some variables in the recursive predicate are independent of the remaining part and can be detached from the predicate after a finite number of expansions. After proving that the semantic notion of reducibility can be replaced by the syntactic notion of disconnectivity of an R-graph, which is a kind of graph model to represent the behavior of formula expansions, an efficient testing and factoring algorithm is proposed. The paper also extends some existing results on compiled formulas of linear sirups, and compares them with some related work.

  • Neuro-Base Josephson Flip-Flop

    Yoshinao MIZUGAKI  Koji NAKAJIMA  Tsutomu YAMASHITA  

     
    PAPER-Superconducting integrated circuits

      Vol:
    E78-C No:5
      Page(s):
    531-534

    We present a superconducting neural network which functions as an RS flip-flop. We employ a coupled-SQUID as a neuron, which is a combination of a single-junction SQUID and a double-junction SQUID. A resistor is used as a fixed synapse. The network consists of two neurons and two synapses. The operation of the network is simulated under the junction current density of 100 kA/cm². The result shows that the network is operated as an RS flip-flop with clock speed capability up to 50 GHz.

  • Development of Liquid Helium-Free Superconducting Magnet

    Junji SAKURABA  Mamoru ISHIHARA  Seiji YASUHARA  Kazunori JIKIHARA  Keiichi WATAZAWA  Tsuginori HASEBE  Chin Kung CHONG  Yutaka YAMADA  Kazuo WATANABE  

     
    INVITED PAPER-Applications of small-size high field superconducting magnet

      Vol:
    E78-C No:5
      Page(s):
    535-541

    Cryocooler cooled superconducting magnets using Bismuth based high-Tc current leads have been successfully demonstrated. The magnets mainly consisted of a superconducting coil, current leads and a radiation shield which are cooled by a two stage Gifford-McMahon cryocooler without using liquid helium. Our first liquid helium-free 4.6 T (Nb, Ti)3Sn superconducting magnet with a room temperature bore of 38 mm operated at 11 K has recorded a continuous operation at 3.7 T for 1,200 hours and total cooling time over 10,000 hours without trouble. As a next step, we constructed a (Nb, Ti)3Sn liquid helium-free superconducting magnet with a wider room temperature bore of 60 mm. The coil temperature reached 8.3 K in 37 hours after starting the cryocooler. The magnet generated 5.0 T at the center of the 60 mm room temperature bore at an operating current of 140 A. An operation at a field of 5 T was confirmed to be stable even if the cryocooler has been stopped for 4 minutes. These results show that the liquid helium-free superconducting magnets can provide an excellent performance for a new application of the superconducting magnet.

  • High-Tc Superconducting Quantum Interference Device with Additional Positive Feedback

    Akira ADACHI  Ken'ichi OKAJIMA  Youichi TAKADA  Saburo TANAKA  Hideo ITOZAKI  Haruhisa TOYODA  Hisashi KADO  

     
    PAPER-SQUID sensor and multi-channel SQUID system

      Vol:
    E78-C No:5
      Page(s):
    519-525

    This study shows that using the direct offset integration technique (DOIT) and additional positive feedback (APF) in a high-Tc dc superconducting quantum interference device (SQUID) improves the effective flux-to-voltage transfer function and reduces the flux noise of a magnetometer, thus improving the magnetic field noise. The effective flux-to-voltage transfer function and the flux noise with APF were measured at different values of the positive feedback parameter βa, which depends on the resistance of the APF circuit. These quantities were also compared between conditions with and without APF. This investigation showed that there is a value of βa that minimizes the flux noise of a magnetometer with APF, and that this value is βa=0.77. The effective flux-to-voltage transfer function with APF is about three times what it is without APF (93 µV/Φ0 vs. 32 µV/Φ0). The magnetic field noise of a magnetometer with APF is improved by a factor of about 3 (242 fT/Hz vs. 738 fT/Hz).

  • Heating Phenomena in the Vibrating Superconducting Magnet on Maglev

    Eiji SUZUKI  

     
    INVITED PAPER-Applications of small-size high field superconducting magnet

      Vol:
    E78-C No:5
      Page(s):
    549-556

    The superconducting magnet on a maglev vehicle vibrates and heats up inside under the influence of various disturbances in running. We have investigated the characteristics of heating in the superconducting magnet vibrating under the electro-magnetic disturbance from the ground coils. This magnetic disturbance has a frequency component ranging widely from 0 Hz to several hundred Hz, which is proportional to the speed of the maglev vehicle. It was revealed that an extreme increase of heat load on the inner vessel of the energized magnet occurred at a particular frequency and that it surpassed the capacity of the refrigerator installed in the tank of the superconducting magnet. As a result of the investigation, we could identify broadly three factors of heating, and now we have good prospects of largely suppressing the heating by reducing the disturbance through the folded arrangement of the ground coils and a structural improvement of the magnet.

  • Properties of Language Classes with Finite Elasticity

    Takashi MORIYAMA  Masako SATO  

     
    PAPER-Computational Learning Theory

      Vol:
    E78-D No:5
      Page(s):
    532-538

    This paper considers properties of language classes with finite elasticity from the viewpoint of set theoretic operations. Finite elasticity was introduced by Wright as a sufficient condition for language classes to be inferable from positive data, and as a property preserved by (not usual) union operation to generate a class of unions of languages. We show that the family of language classes with finite elasticity is closed under not only union but also various operations for language classes such as intersection, concatenation and so on, except complement operation. As a framework defining languages, we introduce restricted elementary formal systems (EFS's for short), called max length-bounded, by which any context-sensitive language is definable. We define various operations for EFS's corresponding to usual language operations and also for EFS classes, and investigate closure properties of the family Ge of max length-bounded EFS classes that define classes of languages with finite elasticity. Furthermore, we present theorems characterizing a max length-bounded EFS class in the family Ge, and that for the language class to be inferable from positive data, provided the class is closed under subset operation. From the former, it follows that for any n, a language class definable by max length-bounded EFS's with at most n axioms has finite elasticity. This means that Ge is sufficiently large.